Evaluation Framework
Russ Masco, Matt Heusman
Objectives
1. To understand the necessity of formal evaluation in the education process
2. To become familiar with components of evaluation, terminology, and a basic overview of some of the technical aspects of setting up an effective evaluation
3. To develop an awareness of the planning needed before implementation if a program is to be evaluated for impact
Outline
• Terminology
• Stating the case for formal evaluation in education
• Identify phases of evaluation
• Discuss types of evaluation
• Outline components of an ideal evaluation model
• Brief mention of ethics, logistics, and alternatives to the ideal model
What we will not cover
• Details of program implementation
• Data collection process in great detail
• Data analysis process
• Drawing conclusions from the analysis
Terminology
Baseline Data, Categorical Data, Cause, Census Data, Continuous Data, Control Group, Dependent Variable, Experimental Unit, External Validity, Extraneous Variable, Independent Variable, Internal Validity, Level of Analysis, n size, Population, Practical Significance, Reliability, Research Question, Sample (Set of Measurements), Statistical Significance, Treatment, Treatment Group, Unit of Analysis, Validity
• Baseline data - measurements collected from the units of analysis before any treatment is given
• Categorical data - measurements that are based on groupings or are binary in nature (unable to distinguish the amount of difference between measurements)
• Census data - the state of having a measurement or observation for every unit in the population of interest
• Continuous data - measurements composed of actual, non-grouped numbers (they form a steady progression/interval of real numbers)
• Control group - portion of a sample or population that does not receive any treatment
• Dependent variable - observed result of the independent variable being manipulated (we don't say that one caused the other)
• Experimental unit - the unit from which the sample is taken (student, school); the units themselves are not the sample
• External validity - the degree to which results can be applied to other groups outside of the experimental group
• Extraneous variable - a variable other than the variable being measured that may have an effect on the dependent variable (usually we try to control for or minimize it)
• Independent variable - attribute taken as given; the value being manipulated or changed in an experiment or evaluation process. Usually associated with the treatment
• Internal validity - the degree to which the experiment accurately measured what happened within the group (sample) in which the experiment took place
• Level of analysis - the organizational level at which your experiment takes place
• n size - the number of measurements in your sample (sample size)
• Population - the entire set of measurements for the group of interest
• Practical significance - an arbitrary limit beyond which some difference in variables is considered useful
• Relationship - a change in the amount of one variable is accompanied by a change in another variable (be careful when stating relationship versus cause!)
• Reliability - the degree to which someone could replicate your experiment and get the same results, or to which the same data would be collected in repeated observations
• Research question - the question you wish the experiment to answer
• Sample - set of measurements or observations (not the unit from which they were observed) taken from a larger group
• Statistical significance - an indication, based on probability, of whether a relationship between variables is more likely due to a true relationship than to sampling error
• Treatment - an agent of change administered to something or someone
• Treatment group - portion of a sample or population that receives a treatment or agent of change
• Unit of analysis - the entity from which the sample (measurement) is taken
• Validity - measuring what you are intending to measure
Why?
• One of the ways to get more support for programs is to convince people that programs are working
• May translate to money and monetary support for education
• Generate excitement and understanding in the community
• Exposure
• Share ideas across the state
• Impact policy
• Much less threatening to evaluate a program during than afterwards
Needs analysis
Program determination
Evaluation design
Baseline data collection
Program implementation
Post-implementation data collection
Data analysis
Conclusion
Terminate, continue, or expand program
Research Question
The importance of the research question cannot be overstated. Must start thinking about it early.
• Will be based on the objectives of the evaluation
• Will drive the rest of the evaluation (experiment)
– Sampling procedure
– Baseline data collected
– Level of analysis
– Unit of analysis
– Level of data collected (measurement): categorical (nominal, ordinal) or continuous (interval, ratio)
– Statistical procedure
– How the conclusion is stated
Program Determination
• Who is eligible
• Able to select some and not others
• Is it gold-plated, or is it feasible to implement the program on a larger scale if it is successful
Theory of Change
• Who is the target
• What are their needs
• What is the program seeking to change
• What is the precise program or part of program being evaluated
• Intermediate indicators
• Final outcomes
• What are the measures
Needs Assessment
• Who is the target population
• What is the nature of the problems being solved
• How does the service fit into the environment, curriculum
• Consideration for support and professional development
• Clear sense of the need the program will fill
• Clear sense of alternatives
• Use data (qualitative and quantitative) to identify needs
Logical Framework
• Needs: Low income and low learning, with limited or no access to technology
• Input: School district purchases tablet devices
• Output: Children use tablets and are able to study better
• Outcome: Higher test scores
• Impact: Better standard of living (long term)
Evaluation Components
• Needs Assessment
• Process Evaluation
• Impact Evaluation
• Review
• Cost-Benefit Analysis
• Extended Impacts – unexpected results or outcomes
Process Evaluation
• Are the services being delivered
• Can it be done at a lower cost
• Are the services reaching the intended population
• Support for participants
– Teachers
– Students
– Parents
– Other stakeholders
Impact Evaluation
• What impact did the program have, and on what?
• How does it relate to the theory of change?
– Intermediate indicators
– Final outcomes
• Distributional questions
• Program or treatment satisfaction
Part of the objective of evaluation is to determine whether a program (the treatment, or independent variable) impacts a dependent variable
Questions to think about related to the research question
To whom are the conclusions of the evaluation being applied?
• Just the group from which the control group and the treatment group were taken
• To a group larger than just the sample group
– District 00-0000 school children
– ESU 00 school children
– Nebraska school children
– USA school children
Sampling
Is all about to whom or what you want to generalize your results
Will make or break your external validity
The population from which you sample is as far as you can generalize your results
If you want to know about all students in the USA, each student in the US would need to have an equal probability of being chosen for the sample
The sample must represent the population if results are to be applied
Results cannot be applied to a separate population with different characteristics, even if one characteristic is similar
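The equal-probability requirement can be illustrated with a simple random sample. This is only a minimal sketch: the list of student ID numbers, the sample size, and the helper name `simple_random_sample` are hypothetical and stand in for whatever sampling frame an evaluation actually has.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    return rng.sample(population, n)


# Hypothetical sampling frame: ID numbers for 1,000 students in the population of interest.
student_ids = list(range(1, 1001))
sample = simple_random_sample(student_ids, n=100, seed=42)

# 100 distinct students, all drawn from the frame.
print(len(sample), len(set(sample)))
```

Results from such a sample generalize only as far as the frame it was drawn from; students who are not in `student_ids` had no chance of selection.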
Decision: Do you have census data?
Yes
• At times you do not need to worry about sampling procedure
• May still need to randomize assignment to control and treatment groups
• Often the analysis will be based on practical conclusions, not probabilities
• Results will only be applied to that group
No
• Must be concerned about sampling as well as assignment
• Analysis will be based on probabilities (statistics)
Note: There are different ways to think about the definition of census data
• All students in a school
• All students in an ESU's boundary
More Things to Consider
Scale of measurement for independent variable
Present/absent, or how much present/absent
• "How much" requires more than one treatment group
• Means a smaller sample size for each group (less powerful test)
Even More Things to Consider
Scale of measurement for dependent variable
Impact yes/no, or how much impact
Will determine the type of data collected (continuous or categorical), which determines the data analysis (statistical) procedure that is appropriate
• Usually best to collect the most detailed level of data possible
• If continuous data is available, collect at that level
• Easy to move to categorical from continuous, but not the other way
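A short sketch of why the move only works in one direction. The scores, cut points, and labels below are hypothetical; real cut scores would come from the assessment being used.

```python
def to_category(score, cut_points, labels):
    """Map a continuous score to a category label using ascending cut points."""
    for cut, label in zip(cut_points, labels):
        if score < cut:
            return label
    return labels[-1]  # at or above the highest cut point


# Hypothetical scale scores and cut points, for illustration only.
scores = [62.5, 84.0, 135.2, 99.9]
cut_points = [85, 135]
labels = ["below", "meets", "exceeds"]

categories = [to_category(s, cut_points, labels) for s in scores]
print(categories)
```

The scores 62.5 and 84.0 both land in the same category, so once only the category is recorded there is no way to recover how far apart the two students were.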
Ideal method (true experiment)
1. Determine the population from which to sample
2. Determine how many treatment groups you will need
• Will need a group for each treatment and one for control
• Many times one treatment group and one control group
Caution: The more groups you break your sample into, the more power you lose in your test as n size decreases
n size and power of test
Power of test = how likely the test is to detect a relationship between two variables when one exists
When n size decreases, the confidence interval will be larger (wider)
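To make the link between n size and interval width concrete, here is a rough sketch using the normal-approximation formula for a 95% confidence interval of a mean, half-width = 1.96 × sd / √n. The standard deviation of 10 and the group sizes are hypothetical, and for small groups a t-based interval would be somewhat wider than this approximation.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Approximate 95% confidence-interval half-width for a mean (normal approximation)."""
    return z * sd / math.sqrt(n)


sd = 10.0  # hypothetical standard deviation of the scores
for n in (100, 50, 25):
    # Smaller groups give a wider interval around the group mean.
    print(n, round(ci_half_width(sd, n), 2))
```

Cutting a group from 100 to 25 doubles the half-width, which is why splitting a fixed sample across many treatment groups costs power.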
How do we know that the variables we are measuring are related? Or is it other (extraneous) variables?
Student variables: home support, language fluency, learning style match, motivation
Reading program
• Treatment group: reading scores increased
• Control group: reading scores stayed the same
Which student variables are related to the increase in reading scores?
[Figure: reading scale score over the school year (Aug., Nov., Jan., March, May) for the treatment group and the control group, with baseline data marked]
How were the treatment and control group members assigned?
• All from one classroom
• All from one side of the classroom
Could those other variables be grouped by the classroom or by the side of the classroom?
Three ways to be sure other variables are not the ones showing the relationship, or to equalize their impact on the outcome
Method 1
• Control for all variables that may impact the result
– List all (independent) variables that may impact the results (dependent variable)
– Control for those "other" extraneous variables
– Means hold them constant
– Would need to quantify extraneous variables
Example of method 1
• Start with 100 students
• Only take those with a certain home support level X = 70 students
• Only use students with a certain learning style Y = 50 students
• Only take students with a motivation level Z = 30 students
• Small n = lower level of power in test
• May only have 15 in each group when divided into treatment and control
Method 2
Measure each of the extraneous variables
• Potentially creates a more statistically and logistically complex analysis, such as multiple regression models versus a comparison of means (t-test or ANOVA)
• May also increase the complexity of implementation
• Would need to be able to quantify the extraneous variables
Method 3
Randomly assign students to the control and treatment groups
• Random means that each subject has an equal probability of being selected for the treatment or control group
• Added benefit: still have 50 in each group = more powerful test!
Why Random?
Sample must represent the population if results are to be applied!
Subjects are randomly assigned to the treatment group and the control group
• Spreads out extraneous variables other than the reading program evenly in both groups
• So we can assume to a greater degree that the reading program is related to the change, because the other (extraneous) variables should impact both groups the same
Non-random assignment
Subject | Treatment group motivation | Control group motivation
1 | 10 | 3
2 | 8 | 4
3 | 9 | 3
4 | 9 | 3
If the reading program had no influence, the treatment group may still show post-test scores higher than the control group
Was it the reading program or motivation?
VS.
Random assignment
Subject | Treatment group motivation | Control group motivation
1 | 3 | 9
2 | 10 | 4
3 | 3 | 8
4 | 9 | 3
Now the impact of motivation is distributed roughly equally between the two groups
Results are more likely to indicate a relationship between the variables of interest
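The difference between the two assignments can be checked with simple arithmetic on the motivation scores listed above. This sketch only compares group means; the helper name `mean_gap` is made up for the illustration.

```python
from statistics import mean

# Motivation scores copied from the two assignment examples above.
non_random = {"treatment": [10, 8, 9, 9], "control": [3, 4, 3, 3]}
randomized = {"treatment": [3, 10, 3, 9], "control": [9, 4, 8, 3]}


def mean_gap(groups):
    """Difference in mean motivation: treatment minus control."""
    return mean(groups["treatment"]) - mean(groups["control"])


print(mean_gap(non_random))  # 9 - 3.25 = 5.75
print(mean_gap(randomized))  # 6.25 - 6 = 0.25
```

With non-random assignment the treatment group starts out 5.75 points more motivated on average; after random assignment the gap is 0.25, so motivation is much less likely to explain a difference in reading scores.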
Steps
Determine the population to be sampled: to whom do you wish to generalize the results?
Determine which level of subject to randomize over
What Level to Randomize
• School or students in a school
• Some of this will be determined by the research question
• The higher the level of randomization, the harder it will be to obtain a large sample size
• But the easier to generalize to larger groups
• Must find the balance
Assign Each Subject a Number
Use a random number generator or table to pick numbers for the
• Treatment group
• Control group
Assign subjects to the groups based on the results of the table or number generator
At this point researchers will sometimes check the groups for homogeneity on the variables they intended to randomize over
This can add considerable cost and time to the evaluation
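The numbering and assignment steps can be sketched with a seeded random number generator. This is one possible way to do it, assuming subjects are numbered 1 to 100 and split evenly; the function name `assign_groups` and the seed value are hypothetical.

```python
import random


def assign_groups(subject_ids, seed=None):
    """Shuffle subject numbers and split them into treatment and control groups."""
    rng = random.Random(seed)  # a recorded seed lets the assignment be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical subject numbers 1..100.
groups = assign_groups(range(1, 101), seed=2024)
print(len(groups["treatment"]), len(groups["control"]))
```

Because the split depends only on the shuffle, no subject characteristic (classroom, motivation, home support) can influence which group a subject lands in.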
Baseline data
For the variable you are measuring, collect values for that variable before you start the treatment
• Treatment group
• Control group
This may also be called pre-test data
Tests (NeSA, NRT, etc.) offer this automatically, but if the data is collected locally this must be planned for
Due Dates
"You'll let us submit late"
Sample Fall Calendar
Data Collection/Form | Due Date | Review Period Ending Date (FINAL, NO CHANGES)
Non Certificated Staff | 31-Oct | 15-Nov
Carl D. Perkins Career and Technical Education Act Report | 10-Aug | 15-Nov
Summer School Supplement | 31-Aug | 15-Nov
Elementary Class Size | 15-Oct | 15-Nov
Summer School Student Unit | 15-Oct | 15-Nov
Elementary Site Allowance | 15-Oct | 15-Nov
Assessed Valuation and Levies | 15-Oct | 15-Nov
Two-Year New School Adjustment Application | 15-Oct | 15-Nov
Student Growth Adjustment | 15-Oct | 15-Nov
Instructional Time | 15-Oct | 15-Nov
PK Instructional Program Hours/K Program | 15-Oct | 15-Nov
NSSRS Staff Data | 31-Oct | 15-Nov
Membership (last Friday in September) | 31-Oct | 15-Nov
SPED Child Count | 31-Oct | 15-Nov
EC Program Participation | 31-Oct | 15-Nov
Nonpublic Membership | 31-Oct | 15-Nov
Administer the treatment
Watch the process
One group getting the treatment may impact the results of the control group
Example: students who got books (treatment) share the books with friends who did not (control)
Post test
Collect post-test values
Compare (data analysis)
• Pre-test control to post-test control
• Pre-test treatment to post-test treatment
Greater difference between pre and post results in the treatment group = there is a relationship between the treatment (independent variable) and the outcome (dependent variable) being measured
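The pre/post comparison can be sketched as a difference in mean gains. All scores below are made up for illustration, and `mean_gain` is a hypothetical helper; a real evaluation would follow this with an appropriate statistical test (for example a t-test or ANOVA), which is outside the scope of this overview.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-test minus pre-test change for paired scores in one group."""
    return mean([after - before for before, after in zip(pre, post)])


# Hypothetical scale scores, for illustration only.
treatment_pre, treatment_post = [80, 85, 90, 95], [92, 95, 99, 108]
control_pre, control_post = [82, 84, 91, 93], [84, 85, 93, 96]

treatment_gain = mean_gain(treatment_pre, treatment_post)
control_gain = mean_gain(control_pre, control_post)

# A larger gain in the treatment group suggests a relationship, not proof of cause.
print(treatment_gain, control_gain, treatment_gain - control_gain)
```

Comparing gains rather than raw post-test scores is what makes the baseline data necessary: without pre-test values there is nothing to subtract.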
[Figure: reading scale score over the school year (Aug., Nov., Jan., March, May) for the treatment group and the control group, with baseline data marked]
Ethics
Is it OK for one group to receive the treatment and not another?
How can you deny one group the treatment?
Another way to think about it: randomizing is not always an ethics problem
• If the treatment is new, the control group is still getting the old services
• If you are worried that the new treatment will not be as good as the old, give both to the treatment group
Is the control group really any worse off than it was under the old program?
At times ethics may not allow for a true randomized experiment
Once you know the new program works better, it would be a problem to continue to deny it to someone; and if it appears to be harmful, you would not want to continue to give it to a group
Logistics
At times logistics may make a true experimental design impossible or cost-prohibitive
If logistics or ethics rule out a true randomized experiment, evaluation work is still possible
Quasi-experimental designs
• Time series design
• Nonequivalent control group
• Multiple time series design
Control of extraneous variables if sample size is large enough (method 1)
If true randomization is not possible there are other means of implementation such as
• Phase in
• Rotation
• Encouragement
Qualitative methods
• Can be used in addition to quantitative methods
• Often qualitative work comes before the quantitative design
• Can help outline possible variables to measure, as well as those (extraneous) that may influence the results
• Possibly used in the needs analysis