evaluation framework

Evaluation Framework

Russ Masco, Matt Heusman

Objectives

1. To understand the necessity of formal evaluation in the education process

2. To become familiar with components of evaluation, terminology, and a basic overview of some of the technical aspects of setting up an effective evaluation

3. To develop an awareness of the planning needed before implementation if a program is to be evaluated for impact

Outline
• Terminology
• Stating the case for formal evaluation in education
• Identify phases of evaluation
• Discuss types of evaluation
• Outline components of an ideal evaluation model
• Brief mention of ethics, logistics, and alternatives to the ideal model

What we will not cover

• Details of program implementation
• Data collection process in great detail
• Data analysis process
• Drawing conclusions from the analysis

Baseline Data, Categorical Data, Cause, Census Data, Continuous Data, Control Group, Dependent Variable, Experimental Unit, External Validity, Extraneous Variable, Independent Variable, Internal Validity, Level of Analysis, n size, Population, Practical Significance, Reliability, Research Question, Sample (Set of Measurements), Statistical Significance, Treatment, Treatment Group, Unit of Analysis, Validity

• Baseline data - measurements collected from the units of analysis before any treatment is given
• Categorical data - measurements that are based on groupings or are binary in nature (the amount of difference between measurements cannot be distinguished)
• Census data - having a measurement or observation for every unit in the population of interest
• Continuous data - measurements composed of actual, non-grouped numbers (they form a steady progression/interval of real numbers)
• Control group - portion of a sample or population that does not receive any treatment
• Dependent variable - observed result of the independent variable being manipulated (we don’t say that one caused the other)
• Experimental unit - the unit from which the sample is taken (student, school); these units are not themselves the sample
• External validity - the degree to which results can be applied to other groups outside of the experimental group
• Extraneous variable - a variable other than the variable being measured that may have an effect on the dependent variable (usually we try to control for or minimize it)
• Independent variable - attribute taken as given; the value being manipulated or changed in an experiment or evaluation process. Usually associated with the treatment
• Internal validity - the degree to which the experiment accurately measured what happened within the group (sample) in which it took place
• Level of analysis - the organizational level at which your experiment takes place
• n size - the number of measurements in your sample (sample size)
• Population - the entire set of measurements for the group of interest
• Practical significance - an arbitrary threshold at which a difference in variables becomes large enough to be useful
• Relationship - a change in the amount of one variable is associated with a change in another variable (be careful when stating relationship versus cause!)
• Reliability - the degree to which someone could replicate your experiment and get the same results, or to which the same data would be collected in repeated observations
• Research question - the question you wish the experiment to answer
• Sample - set of measurements or observations (not the unit from which they were observed) taken from a larger group
• Statistical significance - a probability-based judgment of whether a relationship between variables reflects a true relationship or could be attributed to sampling error
• Treatment - an agent of change administered to something or someone
• Treatment group - portion of a sample or population that receives a treatment or agent of change
• Unit of analysis - the entity from which the sample (measurement) is taken
• Validity - measuring what you are intending to measure

Why?
• One of the ways to get more support for programs is to convince people that programs are working
• May translate to money and monetary support for education
• Generate excitement and understanding in the community
• Exposure
• Share ideas across the state
• Impact policy
• Much less threatening to evaluate a program while it is running than afterwards

Prezi
http://prezi.com/iz2lojq4aozz/evaluation-framework/

1. Needs analysis
2. Program determination
3. Evaluation design
4. Baseline data collection
5. Program implementation
6. Post-implementation data collection
7. Data analysis
8. Conclusion
9. Terminate, continue, or expand program

Research Question

The importance of the research question cannot be overstated. You must start thinking about it early.

• Will be based on the objectives of the evaluation
• Will drive the rest of the evaluation (experiment)
– Sampling procedure
– Baseline data collected
– Level of analysis
– Unit of analysis
– Level of data collected (measurement): categorical (nominal, ordinal) or continuous (interval, ratio)
– Statistical procedure
– How the conclusion is stated

Program Determination

• Who is eligible
• Able to select some and not others
• Is it gold plated, or is it feasible to implement the program on a larger scale if successful

Theory of Change

• Who is the target
• What are their needs
• What is the program seeking to change
• What is the precise program or part of program being evaluated
• Intermediate indicators
• Final outcomes
• What are the measures

Needs Assessment
• Who is the target population
• What is the nature of the problems being solved
• How does the service fit into the environment, curriculum
• Consideration for support and professional development
• Clear sense of the need the program will fill
• Clear sense of alternatives
• Use data (qualitative and quantitative) to identify needs

Logical Framework

Needs: Low income and low learning, with limited or no access to technology
Input: School district purchases tablet devices
Output: Children use tablets and are able to study better
Outcome: Higher test scores
Impact: Better standard of living (long term)

Evaluation Components

• Needs Assessment
• Process Evaluation
• Impact Evaluation
• Review
• Cost-Benefit Analysis
• Extended Impacts – unexpected results or outcomes

Process Evaluation

• Are the services being delivered
• Can it be done at a lower cost
• Are the services reaching the intended population
• Support for participants
– Teachers
– Students
– Parents
– Other stakeholders

Impact Evaluation

• What impact did the program have on??
• How does it relate to the theory of change?
– Intermediate indicators
– Final outcomes
• Distributional questions
• Program or treatment satisfaction

Part of the objective of evaluation is to determine whether a program (the treatment, or independent variable) has an impact on a dependent variable

Research Question

Questions to think about related to research question

To whom are the conclusions of the evaluation being applied?

• Just the group from which the control group and the treatment group were taken
• To a group larger than just the sample group
– District 00-0000 school children
– ESU 00 school children
– Nebraska school children
– USA school children

Sampling

Is all about to whom or what you want to generalize your results

Will make or break your external validity

The population from which you sample is as far as you can generalize your results

If you want to know about all students in the USA, each student in the US would need to have an equal probability of being chosen for the sample

The sample must represent the population if results are to be applied

Results cannot be applied to a separate population with different characteristics, even if one characteristic is similar
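The equal-probability requirement can be illustrated with a short Python sketch. It is only an illustration: the population of 500 numbered students, the sample size of 50, and the function name `simple_random_sample` are assumptions made for the example, not part of the presentation.

```python
import random

def simple_random_sample(population_ids, n, seed=None):
    """Draw n IDs without replacement; every ID has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for auditing
    return rng.sample(list(population_ids), n)

# Hypothetical population: 500 numbered students in the group of interest.
student_ids = range(1, 501)
sample = simple_random_sample(student_ids, n=50, seed=2024)

print(len(sample), len(set(sample)))  # 50 distinct students
```

Results from such a sample would generalize only to the 500 students it was drawn from, which is the point made above about the population setting the limit of generalization.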

Decision: Do you have census data?

Yes
• You do not need to worry about sampling procedure at times
• May still need to randomize assignment to control and treatment groups
• Oftentimes the analysis will be based on practical conclusions, not probabilities
• Results will only be applied to that group

No
• Must be concerned about sampling as well as assignment
• Analysis will be based on probabilities (statistics)

Note: There are different ways to think about the definition of census data
• All students in a school
• All students in an ESU’s boundary

More Things to Consider

Scale of measurement for independent variable

Present/Absent
or
How much present/absent
• Requires more than one treatment group
• Means smaller sample size for each group (less powerful test)

Even more things to consider

Scale of measurement for dependent variable

Impact Yes/No or How much impact

Will determine the type of data collected (continuous or categorical), which determines the data analysis (statistical) procedure that is appropriate

• Usually best to collect the most detailed level of data possible
• If continuous data is available, collect at that level
• Easy to move to categorical from continuous but not the other way
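A small Python sketch shows why the move only works in one direction. The cut points (40 and 70) and the labels are made up for illustration; they are not taken from any real assessment.

```python
import bisect

# Hypothetical cut points and labels, chosen only for this example.
CUT_POINTS = [40, 70]
LABELS = ["Below", "Meets", "Exceeds"]

def to_category(score):
    """Map a continuous score to the band it falls in."""
    return LABELS[bisect.bisect_right(CUT_POINTS, score)]

scores = [35, 52, 68, 71, 90]
categories = [to_category(s) for s in scores]
print(categories)  # ['Below', 'Meets', 'Meets', 'Exceeds', 'Exceeds']
```

The scores 52 and 68 both become "Meets", so once only the category is stored the original 16-point difference can no longer be recovered.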

Ideal method (true experiment)

1. Determine the population from which to sample

2. Determine how many treatment groups you will need
– Will need a group for each treatment and one for control
– Many times one treatment group and one control group

Caution: The more groups you break your sample into, the more power you lose in your test as n size decreases

As n size decreases, the power of the test decreases

Power of test = the probability of detecting a relationship between two variables when one really exists
When n size decreases, the confidence interval becomes larger (wider)
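As a rough illustration of the link between n size and the confidence interval, the sketch below uses the common large-sample approximation for a 95% interval around a mean, half-width = 1.96 × standard deviation / √n. The standard deviation of 15 score points and the group sizes are assumed values, not figures from the presentation.

```python
import math

def ci_half_width(std_dev, n, z=1.96):
    """Approximate 95% confidence-interval half-width for a group mean."""
    return z * std_dev / math.sqrt(n)

# Hypothetical spread of 15 score points; halve the group size twice.
for n in (100, 50, 25):
    print(n, round(ci_half_width(15.0, n), 2))
# 100 2.94
# 50 4.16
# 25 5.88
```

Cutting a group from 100 to 25 students doubles the width of the interval, which is why splitting a sample into many treatment groups costs power.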

How do we know that the variables we are measuring are related? Or is it other (extraneous) variables?

Student variables: home support, language fluency, learning style match, motivation

Reading program
• Treatment group: reading scores increased
• Control group: reading scores stayed the same

Which student variables are related to the increase in reading scores?

[Figure: reading scale score (y-axis) across the school year (x-axis: Aug., Nov., Jan., March, May) for the treatment group and the control group, with baseline data marked]

How were the treatment and control group members assigned?

• All from one classroom
• All from one side of the classroom

Could those other variables be grouped by the classroom or by the side of the classroom?

Three ways to be sure other variables are not the ones showing the relationship, or to equalize their impact on the outcome

Method 1

• Control for all variables that may impact the result
• List all (independent) variables that may impact the results (dependent variable)
• Control for those “other” extraneous variables
– Means hold them constant
– Would need to quantify extraneous variables

Example of method 1

Start with 100 students
Only take those with a certain home support level X = 70 students
Only use students with a certain learning style Y = 50 students
Only take students with a motivation level Z = 30 students

Small n = lower level of power in the test
May only have 15 in each group when divided into treatment and control

Method 2

Measure each of the extraneous variables

Potentially creates a more statistically and logistically complex analysis, such as multiple regression models versus a comparison of means (t-test or ANOVA)

May also increase the complexity of implementation
Would need to be able to quantify the extraneous variables

Method 3

Randomly assign students to the control and treatment groups

Random meaning that each subject has an equal probability of being selected for the treatment or control group

Added benefit: still have 50 in each group = more powerful test!

Why Random?

Treatment group and control group: randomly assigned

Sample must represent the population if results are to be applied

Spreads out extraneous variables other than the reading program evenly in both groups

So we can assume to a greater degree that the reading program is related to the change, because the other (extraneous) variables should impact both groups the same.

Non-random assignment

Treatment Group | Control Group
Subject 1 motivation = 10 | Subject 1 motivation = 3
Subject 2 motivation = 8 | Subject 2 motivation = 4
Subject 3 motivation = 9 | Subject 3 motivation = 3
Subject 4 motivation = 9 | Subject 4 motivation = 3

If the reading program had no influence, the treatment group may still show post-test scores higher than the control group

Was it the reading program or motivation?

VS.

Random assignment

Treatment Group | Control Group
Subject 1 motivation = 3 | Subject 1 motivation = 9
Subject 2 motivation = 10 | Subject 2 motivation = 4
Subject 3 motivation = 3 | Subject 3 motivation = 8
Subject 4 motivation = 9 | Subject 4 motivation = 3

Now the impact of motivation is distributed roughly equally between the two groups
Results are more likely to reflect a real relationship between the reading program and the scores
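The two assignments can be compared by averaging the motivation scores in each group. The sketch below uses the eight scores from each example; the helper name `motivation_gap` is only an illustration.

```python
from statistics import mean

# Motivation scores from the non-random and random assignment examples.
non_random = {"treatment": [10, 8, 9, 9], "control": [3, 4, 3, 3]}
random_assigned = {"treatment": [3, 10, 3, 9], "control": [9, 4, 8, 3]}

def motivation_gap(groups):
    """Difference in mean motivation between treatment and control."""
    return mean(groups["treatment"]) - mean(groups["control"])

print(motivation_gap(non_random))       # 5.75 (means 9 vs 3.25)
print(motivation_gap(random_assigned))  # 0.25 (means 6.25 vs 6)
```

With a gap of 5.75 points, motivation alone could explain higher post-test scores in the non-random treatment group; with a gap of 0.25, it is a much less plausible explanation.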

Steps

Determine the population to be sampled: to whom do you wish to generalize the results?

Determine which level of subject to randomize over

What Level to Randomize

• School or students in a school
• Some of this will be determined by the research question
• The higher the level of randomization, the harder it will be to obtain a large sample size
• But the easier to generalize to larger groups
• Must find the balance

Assign Each Subject a Number

Use a random number generator or table to pick numbers for the
• Treatment group
• Control group

Assign subjects to the groups based on the results of the table or number generator

At this point researchers will sometimes check the groups for homogeneity on the variables chosen to randomize over

This can add considerable cost and time to the evaluation
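The numbering, assignment, and homogeneity check can be sketched in Python. Everything specific here is an assumption for illustration: the 100 numbered students, the made-up motivation scores, the seeds, and the function name `assign_groups`.

```python
import random
from statistics import mean

def assign_groups(subject_ids, seed=None):
    """Shuffle the subject numbers and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical data: 100 numbered students with made-up motivation scores (1-10).
score_rng = random.Random(7)
motivation = {sid: score_rng.randint(1, 10) for sid in range(1, 101)}

treatment, control = assign_groups(motivation.keys(), seed=11)

# Optional homogeneity check: the two group means should be close.
print(len(treatment), len(control))
print(mean(motivation[s] for s in treatment),
      mean(motivation[s] for s in control))
```

Recording the seed lets someone else reproduce the assignment later, which may help if the group membership is ever questioned.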

Baseline data

For the variable you are measuring, collect values before you start the treatment, for both the
• Treatment group
• Control group

This may also be called pre-test data

Tests (NeSA, NRT, etc.) offer this automatically, but if the data is locally collected this must be planned for.

Due Dates

You’ll let us submit late

Sample Fall Calendar

Data Collection/Form | Due Date | Review Period Ending Date (FINAL, NO CHANGES)
Non Certificated Staff | 31-Oct | 15-Nov
Carl D. Perkins Career and Technical Education Act Report | 10-Aug | 15-Nov
Summer School Supplement | 31-Aug | 15-Nov
Elementary Class Size | 15-Oct | 15-Nov
Summer School Student Unit | 15-Oct | 15-Nov
Elementary Site Allowance | 15-Oct | 15-Nov
Assessed Valuation and Levies | 15-Oct | 15-Nov
Two-Year New School Adjustment Application | 15-Oct | 15-Nov
Student Growth Adjustment | 15-Oct | 15-Nov
Instructional Time | 15-Oct | 15-Nov
PK Instructional Program Hours/K Program | 15-Oct | 15-Nov
NSSRS Staff Data | 31-Oct | 15-Nov
Membership (last Friday in September) | 31-Oct | 15-Nov
SPED Child Count | 31-Oct | 15-Nov
EC Program Participation | 31-Oct | 15-Nov
Nonpublic Membership | 31-Oct | 15-Nov

Administer the treatment

Watch the process

One group getting the treatment can affect the results of the control group

Example: students who got books (treatment) share the books with friends who did not (control)

Post Test

Collect post-test values

Compare

Data analysis

• Pre-test control to post-test control
• Pre-test treatment to post-test treatment

Greater difference between pre and post results in the treatment group = there is a relationship between the treatment (independent variable) and the outcome (dependent variable) being measured
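The comparison can be sketched as a difference between the two groups' pre-to-post gains. The scores below are invented for illustration, and the sketch stops short of the statistical test that would be needed to judge whether the difference is significant.

```python
from statistics import mean

def mean_gain(pre, post):
    """Average post-test score minus average pre-test score for one group."""
    return mean(post) - mean(pre)

# Made-up reading scores for four students per group.
treatment_pre, treatment_post = [40, 42, 38, 44], [50, 53, 47, 54]
control_pre, control_post = [41, 39, 43, 41], [43, 40, 45, 44]

treatment_gain = mean_gain(treatment_pre, treatment_post)
control_gain = mean_gain(control_pre, control_post)

print(treatment_gain, control_gain, treatment_gain - control_gain)  # 10 2 8
```

Subtracting the control group's gain removes growth that would have happened anyway over the school year, so the remaining 8 points is the part associated with the treatment.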

[Figure: reading scale score (y-axis) across the school year (x-axis: Aug., Nov., Jan., March, May) for the treatment group and the control group, with baseline data marked]

Ethics

Is it ok for one group to receive the treatment and not another?

How can you deny one group the treatment?

Another way to think about it

Randomizing is not always an ethics problem

• If the treatment is new, the control group is still getting the old services

• If you are worried that the new treatment will not be as good as the old, give both to the treatment group

Is the control group really any worse off than they were under the old program?

At times ethics may not allow for a true randomized experiment

Once you know the new program works better, it would be a problem to continue to deny it to someone; or, if it appears to be harmful, you would not want to continue to give it to a group

Logistics

At times logistics may make a true experimental design impossible or cost prohibitive

If logistics or ethics rule out a true randomized experiment, evaluation work is still possible

Quasi-experimental designs

• Time series design
• Nonequivalent control group
• Multiple time series design

Control of extraneous variables if sample size is large enough (method 1)

If true randomization is not possible there are other means of implementation such as

• Phase in

• Rotation

• Encouragement

Qualitative methods

• Can be used in addition to quantitative methods
• Oftentimes qualitative work comes before the quantitative design
• Can help outline possible variables to measure as well as those (extraneous) that may influence the results
• Possibly used in the Needs Analysis
