What Does Research Tell Us About Identifying Effective Teachers?
Jonah Rockoff
Columbia Business School
Nonprofit Leadership Forum, May 2010
First, Let’s Define “Effective”
● Can be an inputs-based concept
– Observable actions or characteristics
● Can be an outcomes-based concept
– Measured by student success
● Recent work of economists focuses on outcomes
– Use a value-added approach
– Outcomes measured are typically standardized exams in math and reading, usually elementary/middle school
● Movement to bring rigorous analysis to teacher evaluations based on in-class observation
Basics of Value Added Analysis
● VA is all about comparing actual student outcomes to a counterfactual expectation
● Suppose we knew the “right” counterfactual expectation for each child, call it A*
– Expected achievement w/ some basic level of educational quality (e.g., “the average teacher”)
● Subtract expectation (A*) from actual student achievement (A); call this G
● To get VA for a teacher, take the average G across all of the students she taught
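The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming A* is already known for each student; the function name, teacher labels, and scores are hypothetical and not taken from any study.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Average G = A - A* over each teacher's students.

    records: iterable of (teacher, actual A, expected A*) tuples.
    """
    gains = defaultdict(list)
    for teacher, actual, expected in records:
        gains[teacher].append(actual - expected)  # G = A - A*
    return {teacher: mean(g) for teacher, g in gains.items()}


# Hypothetical scores in student-level standard deviation units.
records = [
    ("Teacher X", 0.50, 0.25),
    ("Teacher X", 0.00, -0.25),
    ("Teacher Y", 0.25, 0.50),
    ("Teacher Y", -0.50, -0.25),
]
va = value_added(records)
print(va)
```

In this made-up data, Teacher X's students beat their expectations and Teacher Y's fall short, so X gets a positive VA and Y a negative one.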
Setting Expectations
● How to set up the counterfactual expectation is the big question in value-added work
● Typically, we estimate expectations with data
– Example: set expectation (A*) as the average achievement of students w/ the same prior test scores
● Quality of estimates contingent on quality of the data and the process that generates it
– Expectations set too low make teachers look good; expectations set too high make them look bad
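The example above (A* as the average achievement of students with the same prior score) might look like the sketch below. The field names and data are hypothetical, and real value-added models typically condition on much more than a single prior score.

```python
from collections import defaultdict
from statistics import mean


def expected_by_prior_score(students):
    """Estimate A* as the mean current score among students sharing a prior score."""
    by_prior = defaultdict(list)
    for s in students:
        by_prior[s["prior"]].append(s["score"])
    return {prior: mean(scores) for prior, scores in by_prior.items()}


# Hypothetical students: prior-year score level and current-year score.
students = [
    {"teacher": "X", "prior": 1, "score": 2.0},
    {"teacher": "Y", "prior": 1, "score": 1.0},
    {"teacher": "X", "prior": 2, "score": 3.0},
    {"teacher": "Y", "prior": 2, "score": 2.0},
]
a_star = expected_by_prior_score(students)

# G = A - A* for each student, using the estimated expectation.
gains = [(s["teacher"], s["score"] - a_star[s["prior"]]) for s in students]
print(a_star, gains)
```

Lowering every entry of a_star by the same amount would raise every student's G, which is the sense in which expectations set too low make teachers look good.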
Potential Statistical Problems
● #1: Systematic sorting of students
– Concern here is bias
– Unfair treatment of teachers that is systematic
• Example: the principal’s friends get “easier” kids
● #2: Instability of VA estimates
– Concern here is imprecision
– If estimates are very noisy, using them for rewards/consequences means lots of mistakes
– Also means it may be a poor motivational tool
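One rough way to see the imprecision concern is to attach a standard error to a teacher's VA. The sketch below assumes student gains G are independent draws, which is a simplification, and uses hypothetical data. It illustrates that an average over few students is noisier than an average over many.

```python
from math import sqrt
from statistics import mean, stdev


def va_with_standard_error(gains):
    """Return (VA, standard error), where VA is the mean of student gains G."""
    n = len(gains)
    return mean(gains), stdev(gains) / sqrt(n)


# Hypothetical gains: same spread, but four times as many students.
small_class = [0.5, -0.5, 0.5, -0.5]
large_class = small_class * 4

va_small, se_small = va_with_standard_error(small_class)
va_large, se_large = va_with_standard_error(large_class)
print(se_small, se_large)
```

Both hypothetical teachers have the same VA, but the estimate based on more students has a smaller standard error.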
Basic Findings from VA Research
● Substantial variation in VA across teachers
– 1 s.d. in VA corresponds to 0.1 to 0.2 s.d. in achievement
– A bit more variation in math than reading/ELA
– Much of the variation is within schools
● VA estimates appear to contain real power to predict teacher effectiveness as measured by student achievement
– Stability across years is enough to appear useful in teacher evaluation
– Bias is not a big deal overall, though it could matter for individual teachers
Results on Stability from KS, KRS
[Figure: two panels; x-axis: Students' Percentile Rank, Relative to Peers; series: Bottom Quartile, Top Quartile]
(1) Group Teachers, Years 1/2
(2) Compare in Years 3/4
(3) Large Persistent Differences
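Steps (1) and (2) can be sketched roughly as follows. This is an illustration only, not the procedure used in the cited studies. The function name and data are hypothetical, and it compares later VA directly rather than students' percentile ranks.

```python
from statistics import mean


def compare_quartiles(early_va, later_va):
    """Group teachers by early-period VA, then compare the groups later.

    early_va, later_va: dicts mapping teacher -> VA estimate.
    Returns (mean later VA of bottom quartile, mean later VA of top quartile).
    """
    ranked = sorted(early_va, key=early_va.get)  # lowest early VA first
    k = max(1, len(ranked) // 4)                 # teachers per quartile
    bottom, top = ranked[:k], ranked[-k:]
    return (
        mean([later_va[t] for t in bottom]),
        mean([later_va[t] for t in top]),
    )


# Hypothetical VA estimates for eight teachers.
early = {"A": -0.4, "B": -0.3, "C": -0.1, "D": 0.0,
         "E": 0.1, "F": 0.2, "G": 0.3, "H": 0.4}
later = {"A": -0.25, "B": -0.25, "C": 0.0, "D": 0.0,
         "E": 0.0, "F": 0.0, "G": 0.25, "H": 0.25}

bottom_later, top_later = compare_quartiles(early, later)
print(bottom_later, top_later)
```

A persistent gap between the two groups in the later period is what step (3) refers to.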
Why Get Excited About Value Added?
● Why not just hire good teachers?
– “Wise selection is the best means of improving the school system, and the greatest lack of economy exists wherever teachers have been poorly chosen.”
• Frank Pierrepont Graves, NYS Commissioner, 1932
● Because it is, unfortunately, easier said than done
– Decades of work on type of certification, graduate education, exam scores, GPA, college selectivity
– (Very) small, positive effects on student outcomes
● Rockoff et al. (2008): non-traditional predictors
– Personality, content knowledge, cognitive ability, self-efficacy, commercial teacher selection test score
– Result: no silver bullets, but moderate power to distinguish when measures are pooled into an index
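As a loose illustration of pooling measures into an index, the sketch below standardizes each measure and takes an equal-weight average. This is one simple choice made here for illustration and is not necessarily the method used in Rockoff et al. (2008). The measure names and values are hypothetical.

```python
from statistics import mean, pstdev


def pooled_index(measures):
    """Equal-weight average of z-scored measures, one value per candidate.

    measures: dict mapping measure name -> list of values (same candidate order).
    Assumes each measure varies across candidates (nonzero standard deviation).
    """
    z_scores = []
    for values in measures.values():
        m, s = mean(values), pstdev(values)
        z_scores.append([(v - m) / s for v in values])
    # zip(*z_scores) regroups the z-scores by candidate.
    return [mean(per_candidate) for per_candidate in zip(*z_scores)]


# Hypothetical measures for three candidates.
measures = {
    "content_knowledge": [1.0, 2.0, 3.0],
    "cognitive_ability": [10.0, 30.0, 20.0],
}
index = pooled_index(measures)
print(index)
```

Z-scoring puts measures with different units on a common scale before they are averaged.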
What You Get is What You See?
● Why not identify individuals likely to be effective teachers through direct observation of teaching?
● There is consistent evidence that subjective evaluations of existing teachers are strongly related to gains in student achievement
– Research extends back nearly a century
• Hill (’21), Gotham (’45), Brookover (’45), Anderson (’54)
– More recent analysis focuses on rubric-based teaching evaluations and principal opinions
• Schacter & Thum, Milanowski, Tyler et al., Rockoff & Speroni, Jacob & Lefgren, Harris & Sass, Rockoff et al.
Less Math, but No Less Difficult
● One nice aspect of subjective evaluation is it does not rely on complicated formulae
● However, the details of how evaluation is done present issues similar to VA analysis
– Context (Does one size fit all?)
– Focus (What goes on the evaluation form?)
– Bias (Are evaluators fair and impartial?)
– Imprecision (Do a few lessons represent a whole year?)
A (Modest?) Proposal
● Provide VA estimates to principals
– Help them with the problem of estimating A*
– Let them combine VA with other information (e.g., observation) to evaluate teachers
● NYC has done this: “Teacher Data Reports”
– Piloted using a randomized control trial
• “Treatment principals” received reports and training
– Rockoff et al. (2009) study this pilot using baseline and follow-up surveys of principals
Principals’ Evaluations and VA
● Substantial variation in baseline evaluations
● Strong relationship with VA estimates
[Figure: Teacher Impacts on Math Performance, by Principal's Rating; x-axis: Value Added Score for Math; y-axis: Proportion of Teachers; series: Very Poor, Poor, Fair, Good, Very Good, Exceptional]
Note: The Value Added Score is measured 'All Schools, All Teachers, Same Grade'. For the Math test the student-level standard deviation is approximately 0.70 in 4th grade, 0.77 in 5th, 0.81 in 6th, 0.79 in 7th, and 0.74 in 8th.
[Figure: Baseline Overall Evaluation; x-axis: Very Poor, Poor, Fair, Good, Very Good, Exceptional; y-axis: Percent of Teachers]
New and Useful Information?
● Were treatment principals’ evaluations affected by the VA reports?
– Are the effects greater for more precise VA?
Confidence Interval of Teacher's Value-Added Estimate

                                     More Precise than Median Teacher     Less Precise than Median Teacher
                                     Treatment   Control    Difference    Treatment   Control    Difference
Math
Value-added Score                    0.360**     -0.075     0.436**       0.208**     0.067      0.140*
                                     (0.111)     (0.083)    [0]           (0.060)     (0.042)    [0.022]
Principal's Pre-experiment Rating    0.552**     0.724**    -0.172+       0.555**     0.705**    -0.15
                                     (0.079)     (0.077)    [0.069]       (0.094)     (0.081)    [0.146]
R-squared                            0.684       0.734                    0.713       0.635
Sample Size                          340         336                      275         295
English Language Arts (ELA)
Value-added Score                    0.285*      0.049      0.236+        0.021       0.062      -0.041
                                     (0.125)     (0.091)    [0.07]        (0.065)     (0.057)    [0.572]
Principal's Pre-experiment Rating    0.539**     0.747**    -0.208*       0.590**     0.632**    -0.042
                                     (0.091)     (0.082)    [0.045]       (0.085)     (0.101)    [0.702]
R-squared                            0.645       0.72                     0.662       0.635
Sample Size                          313         322                      270         285

                                                   Treatment   Control    Difference
Math
Value-added Score, Multi-year, Peer Comparison     0.185**     0.034      0.150**
                                                   (0.036)     (0.035)    [0.002]
Principal's Pre-experiment Rating                  0.631**     0.718**    -0.087
                                                   (0.046)     (0.042)    [0.157]
Experience Controls                                Y           Y
R-squared                                          0.473       0.468
Sample Size                                        615         631
English Language Arts (ELA)
Value-added Score, Multi-year, Peer Comparison     0.025       0.003      0.022
                                                   (0.038)     (0.036)    [0.672]
Principal's Pre-experiment Rating                  0.671**     0.696**    -0.025
                                                   (0.044)     (0.046)    [0.69]
Experience Controls                                Y           Y
R-squared                                          0.439       0.419
Sample Size                                        583         607
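In the results above, each Difference entry appears to be the Treatment coefficient minus the Control coefficient, up to rounding. The short check below uses the Value-added Score coefficients copied from the slide. The row labels and the 0.002 rounding tolerance are choices made here for illustration.

```python
# (label, treatment coefficient, control coefficient, reported difference)
rows = [
    ("Math, more precise VA", 0.360, -0.075, 0.436),
    ("Math, less precise VA", 0.208, 0.067, 0.140),
    ("Math, pooled", 0.185, 0.034, 0.150),
    ("ELA, more precise VA", 0.285, 0.049, 0.236),
    ("ELA, less precise VA", 0.021, 0.062, -0.041),
    ("ELA, pooled", 0.025, 0.003, 0.022),
]


def max_rounding_gap(rows):
    """Largest gap between (treatment - control) and the reported difference."""
    return max(abs((t - c) - reported) for _, t, c, reported in rows)


gap = max_rounding_gap(rows)
for label, t, c, reported in rows:
    print(f"{label}: {t - c:+.3f} (reported {reported:+.3f})")
```

Read this way, the Difference column is the estimated change in how much weight principals put on VA when they receive the reports.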
In Conclusion
● Identifying highly effective teachers is nearly impossible if all you have to go on is a CV
● Value-added and in-class observation offer potential insight into this problem
– Both, of course, are imperfect
● Innovative evaluation policies that begin to harness this information can raise teacher quality and improve student outcomes