
Evaluation Requirements for MSP and Characteristics of Designs to Estimate Impacts with Confidence

Ellen Bobronnikov

February 16, 2011

Evaluation is a requirement of all ED MSPs

• According to the statute:

– Each MSP project is required to develop an evaluation plan that includes rigorous objectives that measure the impact of activities.

• However, the type of evaluation design is not specified. The MSP Program only requires projects to report on two aspects of evaluation findings:

1) Teacher gains in content knowledge based on pre- and post-testing; and

2) Proficiency levels on state-level assessments of students of teachers who received professional development.

• Rigorous evaluations demonstrating that impacts are caused by the program are encouraged, but they may not be appropriate for small projects in their early stages. If your project prepares an evaluation report, this should be attached to the APR.


According to the 2007 Report of the Academic Competitiveness Council

“Successful, large-scale interventions to improve STEM education are unlikely to arise without serious study and trial and error. There is a critical pathway for the development of successful educational interventions and activities, starting generally with small-scale studies to test new ideas and generate hypotheses, leading to increasingly larger and more rigorous studies to test the effect of a given intervention or activity on a variety of students in a variety of settings. Different research methodologies are used along the development pathway, and corresponding evaluation strategies must be used to assess their progress.”

Hierarchy of Study Designs for Evaluating Effectiveness

Overview

• Criteria for Classifying Designs of MSP Evaluations (“the rubric”) created through the Data Quality Initiative (DQI)

• Rubric’s key criteria for a rigorous design

• Common issues with evaluation reports

• Recommendations for better reporting

• Preliminary framework for review standards for all evaluation reports


Evaluations Reviewed Using the Rubric

• All final year evaluations that report using an experimental or quasi-experimental design are considered for review

• Evaluations need to include a comparison group to ultimately be reviewed with the rubric

• Within each project, we review evaluations of teacher content knowledge, classroom practices, and student achievement


6 criteria used in rubric

Rubric comprises six criteria:

1. Equivalence of groups at baseline

2. Adequate sample size

3. Use of valid & reliable measurement instruments

4. Use of consistent data collection methods

5. Sufficient response and retention rates

6. Reporting of relevant statistics


Criterion 1 – Baseline Equivalence

Requirement

• Study demonstrates no significant differences in key characteristics between treatment and comparison groups at baseline (for the analytic sample) OR

• Adequate steps were taken to address the lack of baseline equivalence in the statistical analysis

Purpose – Helps rule out alternative explanations for differences between groups


Criterion 2 – Sample Size

Requirement

• Sample size is adequate to detect a difference, based on a power analysis using:

– Significance level = 0.05,

– Power = 0.8

– Minimum detectable effect informed by the literature or otherwise justified

• Alternatively, meets or exceeds “rule of thumb” sample sizes:

– School/district-level interventions: 12 schools

– Teacher-level interventions: 60 teachers (teacher outcomes) or 18 teachers (student outcomes)

Purpose – Increases the likelihood of finding an impact
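The power analysis described above can be sketched in a few lines. This is a minimal illustration, not the rubric's own procedure: it assumes a simple two-group comparison of means with equal group sizes, uses the normal approximation, and ignores clustering of teachers or students within schools. The function name `sample_size_per_group` and the example effect sizes are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Uses the normal approximation n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2,
    where d is the minimum detectable effect in standard deviation units.
    Assumption: independent observations (no clustering), equal group sizes.
    """
    if min_detectable_effect <= 0:
        raise ValueError("min_detectable_effect must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / min_detectable_effect ** 2
    return math.ceil(n)


# Hypothetical minimum detectable effects, in standard deviation units.
for effect in (0.3, 0.5, 0.8):
    print(effect, sample_size_per_group(effect))
```

Smaller minimum detectable effects require much larger samples, which is why the effect size should be justified from the literature rather than chosen for convenience. Designs in which students are nested within teachers or schools generally need a power analysis that accounts for that clustering.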


Criterion 3 – Measurement Instruments

Requirement – Data collection instruments used were shown to be valid and reliable to measure key outcomes

• Use existing instruments that have already been deemed valid and reliable (refer to the TCK instrument database developed by the MSP Knowledge Management and Dissemination Project at http://mspkmd.net/), OR

• Create new instruments that have either been:

– Sufficiently tested with subjects comparable to the study sample and found to be valid and reliable, OR

– Created using scales and items from pre-existing data collection instruments that have been validated and found to be reliable

The resulting instrument needs to include at least 10 items, and at least 70 percent of the items must come from the validated and reliable instrument(s)

Purpose – Ensure that instruments used accurately capture the intended outcomes
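The item-count rule for instruments assembled from existing scales is simple arithmetic, and could be checked as in the sketch below. The function name `meets_item_rule` and the example counts are hypothetical; the thresholds (10 items, 70 percent) are the ones stated above.

```python
def meets_item_rule(total_items, validated_items, min_items=10, min_validated_pct=70):
    """Check the rule for instruments built from pre-existing items.

    total_items: number of items on the new instrument
    validated_items: number of those items taken from validated, reliable instruments
    """
    if total_items < 0 or validated_items < 0 or validated_items > total_items:
        raise ValueError("item counts must satisfy 0 <= validated_items <= total_items")
    if total_items < min_items:
        return False
    # Integer comparison avoids floating-point rounding at the 70 percent boundary.
    return validated_items * 100 >= min_validated_pct * total_items


# Hypothetical instruments: (total items, items drawn from validated instruments)
print(meets_item_rule(20, 14))  # exactly 70 percent of 20 items
print(meets_item_rule(20, 13))  # 65 percent
print(meets_item_rule(8, 8))    # all validated, but fewer than 10 items
```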


Criterion 4 – Data Collection Methods

• Requirement – Methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups are comparable

• Purpose – Limits possibility that observed differences can be attributed to factors besides the program, such as passage of time and differences in testing conditions


Criterion 5 – Attrition

Requirement

• Need to measure key outcomes for at least 70% of original sample (both treatment and control groups), or evidence that attrition is unrelated to treatment

• If the difference in attrition rates between groups equals or exceeds 15 percentage points, the difference should be accounted for in the statistical analysis

Purpose – Helps ensure that sample attrition does not bias results as participants/control group members who drop out may systematically differ from those who remain
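The two attrition thresholds can be checked directly from the original and retained sample sizes. The sketch below is illustrative only: it assumes the 70 percent threshold is applied to each group separately, and the function names and sample counts are hypothetical.

```python
def retention_rate(original_n, retained_n):
    """Share of the original sample with key outcomes measured."""
    if original_n <= 0 or not 0 <= retained_n <= original_n:
        raise ValueError("need 0 <= retained_n <= original_n and original_n > 0")
    return retained_n / original_n


def review_attrition(treatment, comparison, min_retention=0.70, max_gap_points=15.0):
    """Each group is an (original_n, retained_n) tuple.

    Assumption: the 70 percent retention threshold applies to each group.
    """
    t_rate = retention_rate(*treatment)
    c_rate = retention_rate(*comparison)
    # Retention and attrition sum to 1, so the gap in attrition rates
    # equals the gap in retention rates.
    gap_points = abs(t_rate - c_rate) * 100
    return {
        "treatment_retention": t_rate,
        "comparison_retention": c_rate,
        "meets_retention": t_rate >= min_retention and c_rate >= min_retention,
        "differential_points": gap_points,
        "needs_adjustment": gap_points >= max_gap_points,
    }


# Hypothetical study: 60 teachers per group at baseline,
# 51 treatment and 45 comparison teachers with post-test data.
print(review_attrition(treatment=(60, 51), comparison=(60, 45)))
```

Reporting pre and post sample sizes for both groups is what makes this calculation possible for a reviewer.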


Criterion 6 – Relevant Statistics Reported

Requirement

• Include treatment and comparison group post-test means and tests of significance for key outcomes, OR

• Provide sufficient information for calculation of statistical significance (e.g., mean, sample size, standard deviation/standard error)

Purpose – Provides context for interpreting results, indicating where observed differences between groups are most likely larger than what chance alone might cause
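When a report gives only the mean, standard deviation, and sample size for each group, a reviewer can still compute a test statistic. The sketch below shows one way to do that, using Welch's unequal-variance t-statistic; this is an illustrative choice, not a test prescribed by the rubric. The p-value uses a normal approximation, which is reasonable only for larger samples, and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def welch_t_from_summary(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Welch t-statistic and degrees of freedom from group summary statistics."""
    var_t = sd_t ** 2 / n_t  # squared standard error, treatment group
    var_c = sd_c ** 2 / n_c  # squared standard error, comparison group
    t_stat = (mean_t - mean_c) / math.sqrt(var_t + var_c)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_t + var_c) ** 2 / (var_t ** 2 / (n_t - 1) + var_c ** 2 / (n_c - 1))
    return t_stat, df


def approx_two_sided_p(t_stat):
    """Two-sided p-value from the normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Hypothetical post-test results: treatment mean 75, comparison mean 70,
# standard deviation 10 and 50 participants in each group.
t_stat, df = welch_t_from_summary(75, 10, 50, 70, 10, 50)
print(t_stat, df, approx_two_sided_p(t_stat))
```

If standard errors are reported instead of standard deviations, the squared standard errors can be used directly in place of `var_t` and `var_c`.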


Common Issues Found in Evaluation Reports

• Information critical for complete assessment of all criteria is often not reported, inconsistently reported, or only reported for the treatment group

– Pre & post sample sizes for both groups, means, and standard deviations/errors are frequently missing; these are needed for statistical testing and to calculate attrition rates

– Varying sample sizes throughout report without explanations for changes

– Validity and reliability testing not reported for locally developed instruments or cited for pre-existing instruments

– Data collection methods are not discussed

Key Recommendation – Report the Details

• Describe the intervention in detail, including such information as: the format of the training, the background of the trainer(s), the material covered, the frequency of classes, and the numbers of participants.

• Report pre & post sample sizes for both groups and explain changes in sample sizes; if reporting sub-groups, indicate their sample sizes as well

• Report key characteristics associated with outcomes at baseline (e.g., pretest scores, teaching experience)

• Document and describe the data collection procedures

• Report means and standard deviations/errors for both groups on key outcomes; if using a regression model, describe it

• Report results from appropriate significance testing of differences observed between groups (e.g., t-statistics or p-values)

Preliminary Framework for Review Standards for Evaluation Reports

• Preliminary framework based on generally acceptable standards for high quality research studies.

– General Framework and Background

– Data Sources and Methods

– Presentation Of Findings And Placing Them In Context

– Report Summary

Mathematics and Science Partnership (MSP) Programs

U.S. Department of Education

Baltimore Regional Meeting

February 16, 2011

