
What Does Research Tell Us About Identifying Effective Teachers?

Jonah Rockoff, Columbia Business School

Nonprofit Leadership Forum, May 2010


First, Let’s Define “Effective”

● Can be an inputs-based concept
  – Observable actions or characteristics

● Can be an outcomes-based concept
  – Measured by student success

● Recent work by economists focuses on outcomes
  – Uses a value-added (VA) approach
  – Outcomes measured are typically standardized exams in math and reading, usually in elementary/middle school

● Also a movement to bring rigorous analysis to teacher evaluations based on in-class observation


Basics of Value Added Analysis

● VA is all about comparing actual student outcomes to a counterfactual expectation

● Suppose we knew the “right” counterfactual expectation for each child; call it A*
  – Expected achievement with some basic level of educational quality (e.g., “the average teacher”)

● Subtract the expectation (A*) from actual student achievement (A); call the difference G, so G = A − A*

● To get VA for a teacher, take the average G across all of the students she taught
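Here is a minimal Python sketch of the two steps above. It assumes the expectation A* is already known for every student, which is the hard part in practice. The function `teacher_value_added`, the tuple layout and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def teacher_value_added(records):
    """Return each teacher's VA: the average of G = A - A* over her students.

    records: iterable of (teacher_id, actual_A, expected_A_star) tuples.
    This layout is a hypothetical choice for illustration.
    """
    gaps = defaultdict(list)
    for teacher, actual, expected in records:
        gaps[teacher].append(actual - expected)  # G for one student
    return {teacher: mean(g) for teacher, g in gaps.items()}


# Hypothetical scores, in student-level standard deviation units.
records = [
    ("T1", 0.50, 0.20),   # G = 0.30
    ("T1", 0.10, 0.00),   # G = 0.10
    ("T2", -0.30, 0.00),  # G = -0.30
    ("T2", 0.10, 0.20),   # G = -0.10
]
va = teacher_value_added(records)
print(va)  # T1 about 0.2, T2 about -0.2
```

Teacher T1's students beat their expectations on average, so T1 has positive VA. T2's students fell short, so T2 has negative VA.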


Setting Expectations

● How to set up the counterfactual expectation is the big question in value-added work

● Typically, we estimate expectations with data
  – Example: set the expectation (A*) as the average achievement of students with the same prior test scores

● Quality of estimates is contingent on the quality of the data and the process that generates it
  – Expectations set too low make teachers look good; expectations set too high make them look bad
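One way to implement the example expectation above is to group students by prior score and use each group's average current score as A*. The sketch assumes prior scores are discrete (for example, performance levels) and uses hypothetical data. The function name is illustrative only.

```python
from collections import defaultdict
from statistics import mean


def expectations_by_prior_score(students):
    """Set A* for each student to the mean current score of all students who
    share that student's prior-year score.

    students: list of (student_id, prior_score, current_score) tuples.
    Assumes prior scores are discrete; continuous scores would need binning
    or a regression instead.
    """
    current_by_prior = defaultdict(list)
    for _, prior, current in students:
        current_by_prior[prior].append(current)
    group_mean = {p: mean(scores) for p, scores in current_by_prior.items()}
    return {sid: group_mean[prior] for sid, prior, _ in students}


students = [  # hypothetical data
    ("s1", 2, 2.4),
    ("s2", 2, 1.6),
    ("s3", 3, 3.3),
    ("s4", 3, 2.7),
    ("s5", 3, 3.0),
]
a_star = expectations_by_prior_score(students)
# s1 and s2 get A* of about 2.0; s3, s4 and s5 get about 3.0
```

Each student's own score enters the group mean here. If A* came out too low for one teacher's students, every G = A − A* for those students would be inflated, and that teacher would look better than she is.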


Potential Statistical Problems

● #1: Systematic sorting of students
  – Concern here is bias
  – Unfair treatment of teachers that is systematic
    • Example: the principal’s friends get “easier” kids

● #2: Instability of VA estimates
  – Concern here is imprecision
  – If estimates are very noisy, using them for rewards/consequences means lots of mistakes
  – Also means VA may be a poor motivational tool
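A small simulation can illustrate the imprecision concern. Each teacher's estimate is modeled as her true VA plus the sampling noise that comes from averaging G over one class. The parameter values and the function name are assumptions chosen for illustration, not estimates from the research:

- true VA s.d. of 0.15
- student-level noise s.d. of 0.5
- 5,000 teachers

```python
import random
from statistics import median


def misclassification_rate(class_size, n_teachers=5000, true_sd=0.15,
                           student_sd=0.5, seed=0):
    """Share of teachers placed on the wrong side of the median when VA is
    estimated from a single class of `class_size` students.

    true_sd (s.d. of true VA) and student_sd (s.d. of student-level G around
    the teacher's true VA) are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    true_va = [rng.gauss(0, true_sd) for _ in range(n_teachers)]
    # Rather than simulating each student, draw the class-average noise
    # directly: averaging class_size independent student draws gives
    # s.d. student_sd / sqrt(class_size).
    noise_sd = student_sd / class_size ** 0.5
    est_va = [v + rng.gauss(0, noise_sd) for v in true_va]
    true_med, est_med = median(true_va), median(est_va)
    wrong = sum((t > true_med) != (e > est_med)
                for t, e in zip(true_va, est_va))
    return wrong / n_teachers


for size in (10, 25, 100):
    print(size, round(misclassification_rate(size), 3))
```

Fewer students per estimate means noisier VA, so more teachers land on the wrong side of the median. Pooling more students, for example across several years, plays roughly the same role as a larger class.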


Basic Findings from VA Research

● Substantial variation in VA across teachers
  – 1 s.d. in VA → 0.1 to 0.2 s.d. in achievement
  – A bit more variation in math than in reading/ELA
  – Much of the variation is within schools

● VA estimates appear to have real power to predict teacher effectiveness as measured by student achievement
  – Stability across years is sufficient for VA to appear useful in teacher evaluation
  – Bias is not a big deal overall, though it could matter for individual teachers
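To put the 0.1 to 0.2 s.d. effect size in more familiar terms, the sketch below converts a gain in standard deviations into a percentile. It rests on two simplifying assumptions: achievement is normally distributed, and the peer distribution does not change. The function name is hypothetical.

```python
from statistics import NormalDist


def percentile_after_gain(effect_sd, start_percentile=50.0):
    """New percentile for a student who gains `effect_sd` standard deviations,
    assuming normally distributed achievement and an unchanged peer distribution."""
    z = NormalDist().inv_cdf(start_percentile / 100)
    return 100 * NormalDist().cdf(z + effect_sd)


for effect in (0.1, 0.2):
    print(f"{effect} s.d. -> {percentile_after_gain(effect):.0f}th percentile")
```

Under these assumptions, a median student taught by a teacher 1 s.d. above average in VA would move to roughly the 54th to 58th percentile.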


Results on Stability from KS, KRS

(1) Group teachers by VA in years 1/2
(2) Compare them in years 3/4
(3) Large persistent differences

[Figure: two panels comparing teachers in the Bottom Quartile and Top Quartile; x-axis: Students' Percentile Rank, Relative to Peers.]
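Steps (1) through (3) can be mimicked on toy data. Sort teachers into quartiles using early-year VA, then compare the two extreme groups' VA in later years. This is a hypothetical sketch: the data, the function name and the use of `statistics.quantiles` for cut points are illustrative choices, and the KS, KRS analyses may differ in detail.

```python
from statistics import mean, quantiles


def persistence_by_quartile(early_va, later_va):
    """Group teachers by quartile of early VA (e.g., years 1/2) and return the
    mean later VA (e.g., years 3/4) of the bottom and top quartiles.

    early_va, later_va: dicts mapping teacher_id -> VA estimate.
    Needs at least two teachers (a requirement of statistics.quantiles).
    """
    q1, _, q3 = quantiles(early_va.values(), n=4)
    bottom = [later_va[t] for t, v in early_va.items() if v <= q1]
    top = [later_va[t] for t, v in early_va.items() if v >= q3]
    return mean(bottom), mean(top)


# Hypothetical VA estimates; later VA is built to keep half the early signal.
early = {"t1": -0.3, "t2": -0.2, "t3": -0.1, "t4": 0.0,
         "t5": 0.05, "t6": 0.1, "t7": 0.2, "t8": 0.3}
later = {t: 0.5 * v for t, v in early.items()}
bottom_mean, top_mean = persistence_by_quartile(early, later)
print(bottom_mean, top_mean)
```

In this toy example, the top quartile's later VA (about 0.125) stays well above the bottom quartile's (about -0.125). That is the kind of persistent gap step (3) describes.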


Why Get Excited About Value Added?

● Why not just hire good teachers?
  – “Wise selection is the best means of improving the school system, and the greatest lack of economy exists wherever teachers have been poorly chosen.”
    • Frank Pierrepont Graves, NYS Commissioner, 1932

● Because it is, unfortunately, easier said than done
  – Decades of work on type of certification, graduate education, exam scores, GPA, college selectivity
  – (Very) small, positive effects on student outcomes

● Rockoff et al. (2008): non-traditional predictors
  – Personality, content knowledge, cognitive ability, self-efficacy, commercial teacher selection test score
  – Result: no silver bullets, but moderate power to distinguish effective teachers when measures are pooled into an index
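A simple way to pool several measures into an index is to standardize each one (convert it to z-scores) and average them with equal weights. The sketch below does exactly that. It is an assumed construction for illustration, not necessarily the index Rockoff et al. (2008) used. Measure names and scores are hypothetical, and higher scores are assumed to be better on every measure.

```python
from statistics import mean, pstdev


def pooled_index(measures):
    """Equal-weight average of standardized (z-scored) measures per candidate.

    measures: dict mapping measure name -> list of scores, one per candidate,
    with candidates in the same order in every list. Assumes higher is better
    and that each measure varies across candidates (nonzero s.d.).
    """
    z_lists = []
    for scores in measures.values():
        mu, sd = mean(scores), pstdev(scores)
        z_lists.append([(s - mu) / sd for s in scores])
    # zip(*z_lists) groups each candidate's z-scores across measures
    return [mean(candidate_z) for candidate_z in zip(*z_lists)]


measures = {  # hypothetical scores for three candidates
    "content_knowledge": [60, 70, 80],
    "cognitive_ability": [100, 115, 130],
    "self_efficacy": [3.0, 4.0, 3.5],
}
index = pooled_index(measures)
print([round(x, 3) for x in index])
```

Here self-efficacy ranks the candidates differently from the other two measures. The index still combines all three into one ranking, which is the intuition behind pooling individually weak predictors.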


What You Get is What You See?

● Why not identify individuals likely to be effective teachers through direct observation of teaching?

● There is consistent evidence that subjective evaluations of existing teachers are strongly related to gains in student achievement
  – Research extends back nearly a century
    • Hill (’21), Gotham (’45), Brookover (’45), Anderson (’54)
  – More recent analysis focuses on rubric-based teaching evaluations and principal opinions
    • Schacter & Thum, Milanowski, Tyler et al., Rockoff & Speroni, Jacob & Lefgren, Harris & Sass, Rockoff et al.


Less Math, but No Less Difficult

● One nice aspect of subjective evaluation is that it does not rely on complicated formulae

● However, the details of how evaluation is done present issues similar to those in VA analysis
  – Context (Does one size fit all?)
  – Focus (What goes on the evaluation form?)
  – Bias (Are evaluators fair and impartial?)
  – Imprecision (Do a few lessons represent a whole year?)


A (Modest?) Proposal

● Provide VA estimates to principals
  – Help them with the problem of estimating A*
  – Let them combine VA with other information (e.g., observation) to evaluate teachers

● NYC has done this: “Teacher Data Reports”
  – Piloted using a randomized controlled trial
    • “Treatment principals” received reports and training
  – Rockoff et al. (2009) study this pilot using baseline and follow-up surveys of principals


Principals’ Evaluations and VA

[Figure: Teacher Impacts on Math Performance by Principal's Rating. Proportion of Teachers plotted against Value Added Score for Math (from -0.4 to 0.4), separately for teachers rated Very Poor, Poor, Fair, Good, Very Good, and Exceptional.]

Note: The Value Added Score is the 'All Schools, All Teachers, Same Grade' measure. For the math test, the student-level standard deviation is approximately 0.70 in 4th grade, 0.77 in 5th, 0.81 in 6th, 0.79 in 7th, and 0.74 in 8th.

● Substantial variation in baseline evaluations

● Strong relationship with VA estimates

[Figure: Percent of Teachers by Baseline Overall Evaluation (Very Poor, Poor, Fair, Good, Very Good, Exceptional).]


New and Useful Information?

● Were treatment principals’ evaluations affected by the VA reports?
  – Are the effects greater for more precise VA?

Confidence interval of teacher's value-added estimate: first three columns, more precise than median teacher; last three columns, less precise than median teacher.

| | Treatment | Control | Difference | Treatment | Control | Difference |
|---|---|---|---|---|---|---|
| Math | | | | | | |
| Value-added score | 0.360** (0.111) | -0.075 (0.083) | 0.436** [0] | 0.208** (0.060) | 0.067 (0.042) | 0.140* [0.022] |
| Principal's pre-experiment rating | 0.552** (0.079) | 0.724** (0.077) | -0.172+ [0.069] | 0.555** (0.094) | 0.705** (0.081) | -0.15 [0.146] |
| R-squared | 0.684 | 0.734 | | 0.713 | 0.635 | |
| Sample size | 340 | 336 | | 275 | 295 | |
| English Language Arts (ELA) | | | | | | |
| Value-added score | 0.285* (0.125) | 0.049 (0.091) | 0.236+ [0.07] | 0.021 (0.065) | 0.062 (0.057) | -0.041 [0.572] |
| Principal's pre-experiment rating | 0.539** (0.091) | 0.747** (0.082) | -0.208* [0.045] | 0.590** (0.085) | 0.632** (0.101) | -0.042 [0.702] |
| R-squared | 0.645 | 0.72 | | 0.662 | 0.635 | |
| Sample size | 313 | 322 | | 270 | 285 | |

| | Treatment | Control | Difference |
|---|---|---|---|
| Math | | | |
| Value-added score, multi-year, peer comparison | 0.185** (0.036) | 0.034 (0.035) | 0.150** [0.002] |
| Principal's pre-experiment rating | 0.631** (0.046) | 0.718** (0.042) | -0.087 [0.157] |
| Experience controls | Y | Y | |
| R-squared | 0.473 | 0.468 | |
| Sample size | 615 | 631 | |
| English Language Arts (ELA) | | | |
| Value-added score, multi-year, peer comparison | 0.025 (0.038) | 0.003 (0.036) | 0.022 [0.672] |
| Principal's pre-experiment rating | 0.671** (0.044) | 0.696** (0.046) | -0.025 [0.69] |
| Experience controls | Y | Y | |
| R-squared | 0.439 | 0.419 | |
| Sample size | 583 | 607 | |

Standard errors in parentheses; p-values for treatment-control differences in brackets.


In Conclusion

● Identifying highly effective teachers is nearly impossible if all you have to go on is a CV

● Value-added and in-class observation offer potential insight into this problem
  – Both, of course, are imperfect

● Innovative evaluation policies that begin to harness this information can raise teacher quality and improve student outcomes
