
Review of Student Assessment Data

Reading First in Massachusetts

Presented Online

April 13, 2009

Jennifer R. Gordon, M.P.P.

Research Manager

2

Questions Addressed Today

• Have student assessment results in participating schools improved over time?

• Is there evidence that RF is closing the performance gap for key demographic subgroups?

• How effective is instruction for students who entered the year with varying levels of performance?

• How do students in participating schools perform on the third grade MCAS?

• What are the key factors differentiating students who do and do not attain proficiency on the state’s 3rd grade reading test?

3

Cross-sectional analysis of grade-level changes

• Changes in the demographic profile over time likely to impact observed outcomes

• Analysis utilizes a mixed model regression procedure (similar to HLM) controlling for demographic differences in the schools and students being measured

– Multi-level repeated measures model with observations (students) nested within larger units (schools)

– Student outcomes (changes over time) modeled as a function of both student-level and school-level factors

– Statistical significance (p ≤ 0.05) indicates that the observed outcome is more than just a function of the change in demography
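The slides do not give the model specification. As a hedged illustration only, a two-level mixed model of the kind described above is often written as follows, where every symbol is an assumption introduced here: student i in school j assessed in year t, student-level covariates x, school-level covariates z, a school random effect u, and a residual ε.

```latex
y_{ijt} = \beta_0 + \beta_1\,\mathrm{Year}_t
        + \boldsymbol{\beta}_2^{\top}\mathbf{x}_{ijt}
        + \boldsymbol{\beta}_3^{\top}\mathbf{z}_{jt}
        + u_j + \varepsilon_{ijt},
\qquad u_j \sim \mathcal{N}(0,\sigma_u^2),
\quad \varepsilon_{ijt} \sim \mathcal{N}(0,\sigma^2)
```

Under this sketch, a statistically significant estimate of the year coefficient (p ≤ 0.05) would correspond to a change over time that remains after the demographic covariates are accounted for.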

4

Have student assessment results in participating schools improved over time?

Massachusetts relies primarily on results from the DIBELS ORF and GRADE assessments to address the following federal evaluation criteria:

• Increase in percentage of students performing “at or above grade-level”

– DIBELS “Low Risk” and GRADE “Average/Strength”

• Decrease in percentage of students with “serious reading difficulties”

– DIBELS “At Risk” and GRADE “Weak”

• Overall results show that Massachusetts has met these criteria for all grade-levels

5

DIBELS ORF – RF Cohort 1 Percent Low Risk by grade

Grade    Spring 2004  Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  50           60           63           65           66
Grade 2  41           50           55           59           60
Grade 3  37           43           47           54           53

All cumulative changes from 2004 to 2008 are statistically significant

6

DIBELS ORF – RF Cohort 1 Percent At Risk by grade

Grade    Spring 2004  Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  22           16           14           14           12
Grade 2  36           30           25           23           22
Grade 3  28           24           20           17           17

All cumulative changes from 2004 to 2008 are statistically significant

7

DIBELS ORF – RF Cohort 1 Change in Mean Score (Words Correct per Minute)

Grade  Benchmark  Spring 2004 N  Spring 2004 Mean Score  Spring 2008 N  Spring 2008 Mean Score  Change
1      40         3756           46.43                   3688           57.80                   11.37
2      90         3679           81.08                   3522           95.34                   14.26
3      110        3676           97.00                   3352           110.20                  13.20

• All improvements in mean scores from 2004 to 2008 are statistically significant after controlling for demographic shifts over time.

• All spring 2008 means are higher than spring 2007 means (not shown).

• In spring 2004, only the first grade mean score met the benchmark. By spring 2008, mean scores for all grades were at or above benchmark.

8

DIBELS ORF – RF Cohort 2 Percent Low Risk by grade

Grade    Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  48           55           56           60
Grade 2  38           47           53           51
Grade 3  37           42           45           48

All cumulative changes from 2005 to 2008 are statistically significant

9

DIBELS ORF – RF Cohort 2 Percent At Risk by grade

Grade    Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  26           21           20           20
Grade 2  39           33           28           30
Grade 3  29           26           24           21

All cumulative changes from 2005 to 2008 are statistically significant

10

DIBELS ORF – RF Cohort 2 Change in Mean Score (Words Correct per Minute)

Grade  Benchmark  Spring 2005 N  Spring 2005 Mean Score  Spring 2008 N  Spring 2008 Mean Score  Change
1      40         1821           42.99                   1714           50.82                   7.83
2      90         1769           77.82                   1785           87.17                   9.35
3      110        1875           95.55                   1746           103.29                  7.74

• All improvements in mean scores from 2005 to 2008 are statistically significant after controlling for demographic shifts over time.

• First grade mean scores for both spring 2005 and spring 2008 exceed the benchmark.

11

GRADE Total Test – RF Cohort 1 Percent Average/Strength by grade

Grade    Spring 2004  Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  62           68           70           71           72
Grade 2  60           63           67           66           69
Grade 3  61           63           66           67           68

All cumulative changes from 2004 to 2008 are statistically significant

12

GRADE Total Test – RF Cohort 1 Percent Weak by grade

Grade    Spring 2004  Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  25           20           19           18           17
Grade 2  23           20           17           18           15
Grade 3  23           22           18           19           18

All cumulative changes from 2004 to 2008 are statistically significant

13

Interpretation of Changes in Mean Standard Score

Magnitude of Gain (SU = standard units)  Interpretation
0.10 – 0.19 SU                           meaningful; worth mentioning
0.20 – 0.29 SU                           quite good
0.30 SU or greater                       substantial; impressive

• Source: Journal of School Improvement, formerly published by the North Central Association Commission on Accreditation and School Improvement (www.ncacasi.org/jsi/2000v1i2/standard_score)

14

GRADE Total Test – RF Cohort 1 Change in Mean Std. Score

Grade  Spring 2004 N  Spring 2004 Mean Std. Score  Spring 2008 N  Spring 2008 Mean Std. Score  Change in Std Units  Interpretation
1      3729           100.81                       3713           104.88                       0.27                 Quite Good
2      3636           99.08                        3536           102.33                       0.22                 Quite Good
3      3648           99.18                        3369           101.43                       0.15                 Meaningful

• Standard score of 100 is average for student’s grade. Standard deviation of standard score is 15.

• All changes in mean score (not shown) are statistically significant

• Interpretation taken from Journal of School Improvement
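The change in standard units reported for Cohort 1 appears to be the difference in mean standard scores divided by the standard deviation of 15. A minimal Python sketch of that arithmetic, using the Cohort 1 means from the table; the function names and the "below scale" label for gains under 0.10 are assumptions introduced here, not part of the presentation.

```python
# Change in mean standard score expressed in standard units (SU).
SD = 15.0  # standard deviation of the GRADE standard score, as stated on the slide


def change_in_std_units(mean_before, mean_after, sd=SD):
    """Return the gain in mean standard score divided by the standard deviation."""
    return (mean_after - mean_before) / sd


def interpret(gain):
    """Map a gain in SU to the Journal of School Improvement labels."""
    g = round(gain, 2)  # the scale is stated to two decimals
    if g >= 0.30:
        return "substantial; impressive"
    if g >= 0.20:
        return "quite good"
    if g >= 0.10:
        return "meaningful; worth mentioning"
    return "below scale"  # assumption: the scale does not label gains under 0.10


# RF Cohort 1 means from the table: grade -> (spring 2004, spring 2008)
cohort1 = {1: (100.81, 104.88), 2: (99.08, 102.33), 3: (99.18, 101.43)}

for grade, (before, after) in cohort1.items():
    gain = change_in_std_units(before, after)
    print(f"Grade {grade}: {gain:.2f} SU ({interpret(gain)})")
```

Rounding to two decimals before applying the cut points mirrors how the scale is published; published means are themselves rounded, so recomputed values can differ slightly from reported ones.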

15

GRADE Total Test – RF Cohort 2 Percent Average/Strength by grade

Grade    Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  55           62           62           63
Grade 2  50           54           58           56
Grade 3  49           53           55           55

Cumulative changes for grades 1 and 2 are statistically significant

16

GRADE Total Test – RF Cohort 2 Percent Weak by grade

Grade    Spring 2005  Spring 2006  Spring 2007  Spring 2008
Grade 1  31           25           24           24
Grade 2  32           27           23           26
Grade 3  35           30           29           26

Cumulative changes for grades 1 and 3 are statistically significant

17

GRADE Total Test – RF Cohort 2 Change in Mean Std. Score

Grade  Spring 2005 N  Spring 2005 Mean Std. Score  Spring 2008 N  Spring 2008 Mean Std. Score  Change in Std Units  Interpretation
1      1836           97.50                        1702           100.66                       0.21                 Quite Good
2      1806           94.83                        1766           97.56                        0.18                 Meaningful
3      1918           94.33                        1741           97.28                        0.25                 Quite Good

• Standard score of 100 is average for student’s grade. Standard deviation of standard score is 15.

• All changes in mean score (not shown) are statistically significant

• Interpretation taken from Journal of School Improvement

18

GRADE – Schools with 80% or more at benchmark: All Reading First cohorts

• Haverhill/Walnut Square (92%)
• Plymouth/West (90%)
• Westfield/Moseley (89%)
• Narragansett/Baldwinville (86%)
• Plymouth/South (86%)
• Revere Garfield (85%)
• Taunton/Walker (84%)
• Cambridge/Haggerty (82%)
• Community Day Charter (81%)
• Methuen/Tenney (81%)
• Westfield/Franklin Ave (80%)
• Boston Renaissance (80%)

Since they began program implementation, about 70 percent of RF schools have demonstrated increases in the proportion of students in the average/strength category AND decreases in the proportion of students in the weak category.

These included about 27 percent of schools which showed substantial improvement, with average/strength increases AND weak decreases of at least 10 percentage points.

19

Is there evidence that RF is closing the performance gap for key demographic subgroups?

• Nearly all demographic subgroups have shown improvement in overall reading skills as measured by GRADE.

– The exception is for African American students in RF Cohort 2 who have shown a very small decline in A/S performance

• Of particular note are subgroups with levels of improvement which meaningfully exceed the general population

– An indication that the performance gap for these students is narrowing

– Cohort 1: SPED, LEP, Hispanic

– Cohort 2: LEP

• There were no subgroups with levels of improvement that were meaningfully smaller than the general population

– Such a lag would have indicated that the performance gap for these students is widening

20

GRADE Total Test – Third Grade Subgroups: RF Cohort 1 Percent Average/Strength

Subgroup  Spring 2004  Spring 2005  Spring 2006  Spring 2007  Spring 2008
SPED      23           23           29           28           29
LEP       28           26           29           30           34
Low Inc   50           55           58           60           62

Cumulative change for low income students is statistically significant

21

GRADE Total Test – Third Grade Subgroups (cont): RF Cohort 1 Percent Average/Strength

Subgroup  Spring 2004  Spring 2005  Spring 2006  Spring 2007  Spring 2008
White     74           75           78           81           81
Black     57           65           66           61           62
Hispanic  41           45           50           52           57

Cumulative change for Hispanic students is statistically significant

22

RF Cohort 1 Subgroups: Change in GRADE Mean Std Score – 2004 vs. 2008

Group         Grade 1  Grade 2  Grade 3
All Students  0.27     0.22     0.15
SPED          * 0.46   * 0.33   0.23
LEP           * 0.44   0.30     0.08
Low Income    0.35     0.24     0.22
Black         0.26     0.27     0.11
Hispanic      * 0.39   0.28     * 0.31

Subgroup results compared to All Students: ** Quite good improvement; * Meaningful improvement; ^ Meaningful lag
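The flags in this table are consistent with applying the Journal of School Improvement magnitude scale to the difference between each subgroup's gain and the All Students gain. The slides do not state that rule explicitly, so the Python sketch below treats it as an assumption, including the symmetric cut point of -0.10 for a "meaningful lag"; the function and variable names are hypothetical.

```python
def flag(subgroup_gain, all_students_gain):
    """Return the comparison marker for one subgroup/grade cell (assumed rule)."""
    diff = round(subgroup_gain - all_students_gain, 2)
    if diff >= 0.20:
        return "**"  # quite good improvement relative to All Students
    if diff >= 0.10:
        return "*"   # meaningful improvement relative to All Students
    if diff <= -0.10:
        return "^"   # meaningful lag (assumed symmetric threshold)
    return ""


# Cohort 1 gains in standard units for grades 1, 2, 3, taken from the table
ALL_STUDENTS = (0.27, 0.22, 0.15)
SUBGROUPS = {
    "SPED": (0.46, 0.33, 0.23),
    "LEP": (0.44, 0.30, 0.08),
    "Low Income": (0.35, 0.24, 0.22),
    "Black": (0.26, 0.27, 0.11),
    "Hispanic": (0.39, 0.28, 0.31),
}

flags = {
    group: [flag(gain, base) for gain, base in zip(gains, ALL_STUDENTS)]
    for group, gains in SUBGROUPS.items()
}

for group, marks in flags.items():
    print(group, marks)
```

Under this assumed rule the computed markers match the asterisks shown for Cohort 1, which is some evidence that the rule is close to the one actually used.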

23

GRADE Total Test – Third Grade Subgroups: RF Cohort 2 Percent Average/Strength

Subgroup  Spring 2005  Spring 2006  Spring 2007  Spring 2008
LEP       21           29           33           33
Low Inc   44           47           50           52

Cumulative changes are not statistically significant

24

GRADE Total Test – Third Grade Subgroups (cont): RF Cohort 2 Percent Average/Strength

Subgroup  Spring 2005  Spring 2006  Spring 2007  Spring 2008
White     72           75           75           77
Hispanic  37           44           48           49

Cumulative change for Hispanic students is statistically significant

25

RF Cohort 2 Subgroups: Change in GRADE Mean Std Score – 2005 vs. 2008

Group         Grade 1  Grade 2  Grade 3
All Students  0.21     0.18     0.25
SPED          Insufficient numbers for analysis
LEP           * 0.33   0.25     0.32
Low Income    0.25     0.24     0.23
Black         Insufficient numbers for analysis
Hispanic      0.23     0.26     0.29

Subgroup results compared to All Students: ** Quite good improvement; * Meaningful improvement; ^ Meaningful lag

26

How effective is instruction for students who entered the year with varying levels of performance?

These effectiveness measures were developed by the Florida Center for Reading Research (FCRR) using DIBELS. Massachusetts uses GRADE to provide a measure of overall reading ability.

• Effectiveness for Average/Strength Students: calculated for students scoring in the average/strength categories in the fall; provides the percentage of those students who are still scoring at that level in the spring.

• Effectiveness for Low Average Students: calculated for students scoring in the low average category in the fall; provides the percentage of those students scoring at the average/strength level in the spring.

• Effectiveness for Weak Students: calculated for students scoring in the weak category in the fall; provides the percentage of those students scoring at low average or above in the spring.
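The three effectiveness measures defined above can be computed from paired fall and spring stanines. A minimal Python sketch, assuming the stanine bands used in this presentation's findings (weak 1-3, low average 4, average/strength 5-9); the function names and the sample pairs are hypothetical and are not program data.

```python
def category(stanine):
    """Classify a GRADE stanine into the three fall starting categories."""
    if stanine >= 5:
        return "average/strength"
    if stanine == 4:
        return "low average"
    return "weak"


# Spring success criterion for each fall category, following the definitions above
SUCCESS = {
    "average/strength": lambda spring: spring >= 5,  # still average/strength
    "low average": lambda spring: spring >= 5,       # moved up to average/strength
    "weak": lambda spring: spring >= 4,              # low average or above
}


def effectiveness(pairs):
    """Return percent meeting the spring criterion for each fall category.

    pairs: iterable of (fall_stanine, spring_stanine) for individual students.
    A category with no students yields None.
    """
    counts = {cat: [0, 0] for cat in SUCCESS}  # cat -> [successes, total]
    for fall, spring in pairs:
        cat = category(fall)
        counts[cat][1] += 1
        if SUCCESS[cat](spring):
            counts[cat][0] += 1
    return {
        cat: (100.0 * hits / total if total else None)
        for cat, (hits, total) in counts.items()
    }


# Hypothetical students: (fall stanine, spring stanine)
sample = [(5, 6), (7, 4), (6, 6), (4, 5), (4, 4), (2, 4), (3, 3), (1, 5)]
result = effectiveness(sample)
print(result)
```

Each student contributes to exactly one measure, determined by the fall category, so the three percentages have different denominators and are not expected to sum to 100.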

27

Findings: Instructional Effectiveness

Among students who began the school year:

• In Average/Strength categories (stanines 5-9)

– About 95% ended the year at or above benchmark

– More than half improved their performance by one or more stanine

• In the Low Average category (stanine 4)

– About 70% ended the year in average/strength

– Instruction had a substantial impact at all grade levels and was most effective for first graders, especially in regard to moving from low average to strength

• In the Weak Category (stanines 1-3)

– More than half ended the year in low average or higher

– Instruction was most effective for first graders; about 47% moved from weak to average/strength

28

Effectiveness for “Average/Strength” Students (2007-2008 All RF Cohorts)

Spring outcome (percent of fall A/S students)  Grade 1  Grade 2  Grade 3
Declined w/in A/S                              10       16       9
Same stanine                                   15       34       36
Improved in level                              19       22       29
Avg to Strength                                49       24       23
Total (all four categories)                    93       96       97
Improved in level + Avg to Strength            68       46       52

29

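Effectiveness for “Low Average” Students (2007-2008 All RF Cohorts)

[Stacked bar chart: percent of fall low average students by spring outcome (Same stanine, Low Avg to Avg, Low Avg to Strength) for Grade 1, Grade 2, and Grade 3. Percent reaching average/strength: Grade 1 76, Grade 2 66, Grade 3 68.]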

30

Effectiveness for “Weak” Students (2007-2008 All RF Cohorts)

Spring outcome (percent of fall weak students)  Grade 1  Grade 2  Grade 3
Improved w/in Weak                              11       26       21
Weak to Low Avg                                 18       29       26
Weak to Avg                                     34       25       17
Weak to Strength                                13       2        1
Total moving to Low Avg or higher               65       56       44

31

How do students in participating schools perform on the third grade MCAS?

• Despite improvement on the DIBELS and GRADE, skills have not improved enough to yield improvement on the more challenging MCAS Reading test.

• Overall performance levels are lower, but the performance trend for RF students is consistent with statewide data showing declines in proficiency and increases in warning.

• Needs Improvement is more consistent with “grade-level” performance on nationally-normed assessments

– In 2008, 89 percent of students statewide met or exceeded NI as did 77 percent of RF students

32

Key Differences: GRADE Level 3 Form B compared to 2008 Grade 3 MCAS Reading

• Nature of the items

– GRADE measures a combination of decoding and comprehension skills, whereas MCAS is almost exclusively comprehension questions

– GRADE includes only multiple-choice items, whereas MCAS also includes two open-response items

• Passage “difficulty”

– GRADE totals 849 words with an average of 121 words per passage. Passages range from 45 to 196 words. Predominantly text constructed specifically for the test.

– MCAS totals 4,221 words with an average of 603 words per passage. Passages range from 136 to 1,005 words. All text taken from literature.

33

MCAS Third Grade Reading Test: Statewide Results – 2003 to 2008

Performance level (percent)  2003  2004  2005  2006  2007  2008
Warning *                    7     7     7     8     9     11
Needs Improvement *          30    30    31    34    32    33
Proficient *                 63    63    62    58    59    56

34

MCAS Third Grade Reading Test: Cohort 1 Results – 2003 to 2008

Performance level (percent)  2003  2004  2005  2006  2007  2008
Warning *                    15    13    14    15    17    21
Needs Improvement            42    43    42    47    43    43
Proficient                   43    44    44    38    40    36

35

MCAS Third Grade Reading Test: Cohort 2 Results – 2004 to 2008

Performance level (percent)  2004  2005  2006  2007  2008
Warning *                    18    21    22    24    29
Needs Improvement            48    48    49    47    47
Proficient                   34    31    29    29    24

36

“Needs Improvement” is more consistent with “grade-level” performance on nationally-normed tests

2008 MCAS Performance Level (All RF Cohorts)

GRADE Stanine  Warning  Needs Improvement  Proficient or Above
5              9.6%     74.9%              15.5%
6              1.6%     53.0%              45.4%
7              0.0%     24.6%              75.4%
8              0.0%     8.3%               91.7%
9              0.0%     2.2%               97.8%

37

2008 MCAS results – School-level

• Wide disparities in MCAS performance among schools

– Proficiency

• 6 schools equal to or better than the statewide rate of 56%

• 28 schools at 25% proficiency or less

– Warning

• 11 schools had warning rates equal to or better than the statewide rate of 11%, including 3 schools at 0%

• 19 schools had warning rates of 33% or more

• Only 11 schools showed both increases in proficiency and decreases in warning

– 3 schools with substantial improvement (10 or more points)

38

2008 MCAS – Top Performing RF Schools

Warning at or below state average
• Westfield – Moseley (0%)
• Plymouth – South Elem. (0%)
• Westfield – Franklin Ave (0%)
• Boston Renaissance Charter (3%)
• Gill-Montague – Sheffield (3%)
• Boston – Perkins (3%)
• Plymouth – West Elem. (5%)
• Chicopee – Stefanik (5%)
• Robert M. Hughes Academy (9%)
• North Adams – Brayton (10%)
• West Springfield – Coburn (11%)

• Statewide warning is 11%

Proficiency at or above state average
• Westfield – Moseley (78%)
• Plymouth – South Elem. (75%)
• Westfield – Franklin Ave (73%)
• Boston Renaissance Charter (65%)
• North Adams – Brayton (60%)
• Plymouth – West Elem. (56%)

• Statewide proficiency is 56%

39

MCAS – Schools with “Substantial” Improvement: Proficiency Increases and Warning Decreases of 10+ points

School                Cohort  Proficiency  Warning
Chicopee – Stefanik   1       +24 pts      -17 pts
Westfield – Moseley   1       +20 pts      -10 pts
Lawrence – Arlington  1       +11 pts      -15 pts

40

What key factors differentiate students who do and do not attain proficiency on the MCAS?

• Conducted analysis for all RF and Silber 3rd graders with spring 2008 GRADE results in the average/strength categories (stanine 5-9)

• Compared performance of proficient and not-proficient students on the following items:

– DIBELS ORF: percent low risk

– GRADE subtests: percent at or above benchmark

– Individual MCAS passages and test questions (including multiple-choice vs. open-response items)

41

Key Factors in Proficiency: All RF and Silber Cohorts

NP = not proficient on MCAS; P = proficient on MCAS (number of students in parentheses)

                                    GRADE stanine 7-9    GRADE stanine 6     GRADE stanine 5
Measure                             NP (333)  P (1746)   NP (850)  P (719)   NP (1445)  P (253)
DIBELS ORF – percent low risk       * 71%     86%        * 56%     74%       * 46%      59%
GRADE Passage Comp – percent A/S    * 92%     99%        * 83%     96%       * 70%      78%
GRADE Listening Comp – percent A/S  * 70%     83%        66%       70%       * 53%      64%

* Differences in percentage between proficient and non-proficient students with similar GRADE performance are statistically significant (chi-square)
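As an illustration of the chi-square comparison noted above, the Python sketch below computes a Pearson chi-square statistic for a 2x2 table. The presentation reports only percentages and group sizes, so the cell counts here are reconstructed by rounding and are an assumption (the number of students with valid scores on each measure may differ from the group sizes shown). The 0.05 threshold and the absence of a continuity correction are also assumptions; the helper names are hypothetical.

```python
CRITICAL_VALUE_05_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def counts_from_percent(n, pct):
    """Approximate (met, not met) counts from a group size and a rounded percent."""
    met = round(n * pct / 100)
    return met, n - met


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def compare(n_np, pct_np, n_p, pct_p):
    """Return (statistic, significant?) for not-proficient vs. proficient students."""
    a, b = counts_from_percent(n_np, pct_np)
    c, d = counts_from_percent(n_p, pct_p)
    stat = chi_square_2x2(a, b, c, d)
    return stat, stat > CRITICAL_VALUE_05_DF1


# Stanine 7-9, DIBELS ORF percent low risk: NP (333) 71% vs. P (1746) 86%
orf_stat, orf_sig = compare(333, 71, 1746, 86)

# Stanine 6, GRADE Listening Comp percent A/S: NP (850) 66% vs. P (719) 70%
lc_stat, lc_sig = compare(850, 66, 719, 70)

print(f"ORF, stanine 7-9: chi-square = {orf_stat:.2f}, significant = {orf_sig}")
print(f"Listening Comp, stanine 6: chi-square = {lc_stat:.2f}, significant = {lc_sig}")
```

With these reconstructed counts, the first comparison exceeds the critical value and the second does not, which agrees with where the asterisks appear in the table.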

42

Key Factors in Proficiency (continued): Individual MCAS Passages

                                                       GRADE stanine 7-9    GRADE stanine 6     GRADE stanine 5
Mean Percent Correct                                   NP (333)  P (1746)   NP (850)  P (719)   NP (1445)  P (253)
MCAS Total Test                                        69%       84%        64%       80%       59%        79%
Passage 4: “Star Pictures” and “Canis Major” (poetry)  * 71%     90%        67%       84%       61%        83%
Passage 6: “Mercury and the Workmen” (play)            74%       91%        * 66%     87%       61%        82%
Passage 7: Soil Circle                                 * 56%     82%        * 49%     71%       * 42%      72%

* Difference between NP and P students is disproportionate to the difference in their overall MCAS results (4+ pts greater than the total test mean pct correct)
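The asterisk rule stated above (a passage gap at least 4 points larger than the total-test gap) can be checked directly against the table. A minimal Python sketch using the percentages shown; the data layout and function names are hypothetical, and because the published percentages are rounded, borderline cases could differ from a calculation on unrounded values.

```python
STANINE_GROUPS = ("7-9", "6", "5")

# (NP mean percent correct, P mean percent correct) for each stanine group
TOTAL_TEST = {"7-9": (69, 84), "6": (64, 80), "5": (59, 79)}
PASSAGES = {
    "Passage 4": {"7-9": (71, 90), "6": (67, 84), "5": (61, 83)},
    "Passage 6": {"7-9": (74, 91), "6": (66, 87), "5": (61, 82)},
    "Passage 7": {"7-9": (56, 82), "6": (49, 71), "5": (42, 72)},
}


def gap(np_pct, p_pct):
    """Proficient minus not-proficient mean percent correct."""
    return p_pct - np_pct


def is_disproportionate(passage_scores, total_scores, threshold=4):
    """True when the passage gap exceeds the total-test gap by `threshold`+ points."""
    return gap(*passage_scores) - gap(*total_scores) >= threshold


flagged = {
    name: [g for g in STANINE_GROUPS if is_disproportionate(scores[g], TOTAL_TEST[g])]
    for name, scores in PASSAGES.items()
}

for name, groups in flagged.items():
    print(name, "flagged for stanine groups:", groups)
```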

43

The Open Response Challenge

MC = multiple choice; OR = open response; NP = not proficient; P = proficient

                   GRADE stanine 7-9  GRADE stanine 6  GRADE stanine 5
Mean Pct Correct   NP     P           NP     P         NP     P
Joanna Cole MC     75%    89%         71%    86%       66%    83%
Joanna Cole OR     32%    46%         34%    44%       32%    47%
Hello, Goodbye MC  88%    96%         83%    94%       77%    93%
Hello, Goodbye OR  42%    56%         41%    55%       40%    60%

On the two passages with both multiple choice and open response items, RF students perform much better on the multiple choice items than the open response items – regardless of their MCAS proficiency and GRADE scores.

44

Findings – Opportunities for improving MCAS performance

• Developing faster and more accurate decoding skills

• Practicing with longer and more difficult authentic text – including high quality expository text

• Building receptive vocabulary

• Developing strategies to infer meaning from text

• Helping students respond to literature – especially in writing

45

Summary

• In Massachusetts, RF has had positive measurable impacts on student skills including improving the performance of students who begin the year at moderate or substantial risk.

• Yet it remains important for the state to develop a better understanding of the challenges that limit improvement, particularly on MCAS, and to provide the necessary PD and support to move forward.

• Survey responses indicate that RF staff are generally quite positive about the program’s impact on their knowledge and practice with regard to effective reading instruction. In the long run, this holds the potential to positively impact students’ reading skills once program funding is gone.

46

For additional information, please contact:

Jennifer Gordon, Research Manager

508-856-1349

[email protected]

UMass Donahue Institute

333 South Street, Suite 400

Shrewsbury, MA 01545

www.donahue.umassp.edu