TRANSCRIPT
Students in the gap(s): Research findings on who they
are, what they need, and implications for the 2%
flexibility option
Gaye Fedorchak, New Hampshire Department of Education
Sue Bechard, Measured Progress
Using Data to Improve Instruction
• Understand differences between instruction and assessment contexts
• Investigate component skills underlying grade level expectations
• Discern students’ assessment needs
• Make decisions about the 2% flexibility option
New England Compact Enhanced Assessment Grant
Funded by US Department of Education
Four states: NH, ME, RI, VT
2005-2007
Challenge: describe the students in the gap, and design an assessment that will meet their needs
8th grade mathematics
– 8th grade to look at complexity
– Mathematics to avoid reading comprehension issues
Project history
Original goals
– Identify students in the gaps
– Develop varied assessment modules
– Pilot/validate assessment modules
Issues we faced
– Not easy to identify the gap, or the students
– Impossible to develop an assessment without knowing target students’ needs
Revised goals
• Identify students in the gaps through multiple methods, triangulating evidence
• Define common criteria for identifying students in the gap
• Plan and develop task module assessment strategies (assessment prototypes)
• Recommend core components of an assessment structure that would lessen the gaps
• Disseminate products to others considering assessments for students in the gap
Accountability context
• Project began February 2005
• Modified achievement standards announced April 2005
• Proposed “2%” regulations released December 2005
• Studies were designed before 2%, not in response to 2%
• Findings speak to the needs of all students not effectively assessed in the current system, not necessarily dovetailing with 2% definitions
Five Studies and a Literature Review
• Who are the students in the gaps? (Parker & Saxon)
• Of all the students who are not proficient, how can states identify those who are in the assessment gaps? (Bechard & Godin)
• What are the attributes of students in the gaps, and how do these students perform? (Bechard & Godin)
• What issues in the assessments themselves contribute to the gaps? (Dolan)
• Are there specific aspects of multiple-choice items used in state assessments that contribute to the assessment gaps? (Famularo & Russell)
Gap identification process
1. Conduct exploratory interviews with teachers to identify the assessment gaps
2. Review student assessment data
3. Review teacher judgment data
4. Operationalize gap criteria
5. Conduct focused teacher interviews to confirm gap criteria
Parker and Saxon: Teacher views of students and assessments
Bechard and Godin: Finding the real assessment gaps
The process for investigating gap profiles
Bechard and Godin: Who are the students in the gaps?
Conduct focused teacher interviews to confirm gap criteria
Investigate characteristics of students in gap 1
Investigate characteristics of students in gap 2
Investigate achievement patterns of students in gap 1
Investigate achievement patterns of students in gap 2
Develop profiles of students in gap 1
Develop profiles of students in gap 2
Alternative test items
1. Hypothesize alternate test items
2. Decompose items into requisite skills/knowledge
3. Provide alternative formats: item format, item content, visuals, multimedia
4. Review with mathematics experts
5. Pilot and evaluate items
Russell and Famularo: Utility of a prototype assessment
Dolan et al.: Providing students with choice
Literature Review
• Students likely to be in the gaps:
– Mild mental retardation
– Learning disabilities
– English language learners
• Target population: middle school
• Target academic content: mathematics
Middle School Math Instruction and Assessment for Students with LD, Students with MMR, & ELL Students:A Review of the Literature
Bob Dolan, Boo Murray, and Nicole Strangman
Purpose
• Comprehensive literature review of research-based practices during instruction and assessment of students with learning disabilities (LD), students with mild mental retardation (MMR), and students who are English language learners (ELL).
• Instructional techniques include instructional approaches as well as scaffolds and supports used in the classroom.
• Assessment techniques consider test design and delivery, with emphasis on testing accommodations.
• Focus on identifying common approaches, despite large heterogeneity within each group.
• Goal: to support states in understanding how these students may be represented within a definition of “students in the gap.”
• Only considering students who would take the general assessment (i.e., not considering students who would qualify for AA-AAS).
Included Student Populations
Mild Mental Retardation (MMR)
• Students identified as having MMR, being “educable mentally retarded,” or described as having mental retardation and an IQ within the range of 50-55 to approximately 70 (DSM-IV).
Learning Disabilities (LD)
• Students identified as having LD, a specific LD, or a reading disability as determined by the study author(s).
• Students not diagnosed with a specific LD but having had low performance in math calculations that would presumably meet the IDEA definition for LD.
• No attempt was made to evaluate the methodology used to diagnose LD (e.g., discrepancy model, response to intervention model); it was assumed that authors followed the standard IDEA definition of LD.
English Language Learners (ELL)
• ELL students, English as a second language (ESL) students, and limited English proficient (LEP) students.
Conclusions
• Large disconnect between the level and types of instructional supports and testing accommodations for all three student populations.
• Instructional supports focus largely on pedagogical approaches toward reducing barriers to learning; test designs, modes of administration, and accommodations are largely limited to reducing accessibility barriers.
• This discontinuity reflects the limited nature of current large-scale assessment techniques and the psychometric approaches toward their design.
• Concern over compromising the validity of test inferences or the comparability of scores between students who do and don’t receive such supports.
• General failure to consider the heterogeneity of the student population, which could significantly impact the effectiveness of test design factors, modes of administration, and accommodations.
• As a result, techniques that may be largely successful in allowing these students to learn effectively may not be available at the point when students must demonstrate learning.
Considerations for Future Research
• Approaches toward assessment that better dovetail with the supports students receive instructionally.
• Additional research focused on methodologies for test development that consider construct-relevant vs. construct-irrelevant factors, such as Evidence Centered Design (Mislevy et al.) and construct deconstruction.
• Approaches that consider individual student differences, such as through application of universal design (Mace et al., 1996) and Universal Design for Learning (Rose & Meyer, 2002) principles, to create and administer tests that consider diverse students from the start (Thompson et al., 2002) and flexible tests that include built-in supports for diverse students (Bryant & Rivera, 1997; Dolan & Hall, 2001; Dolan, Hall, Banerjee, Chun, & Strangman, 2005; Ketterlin-Geller, 2005).
Are there specific aspects of multiple-choice items used in state assessments that contribute to the assessment gaps?
Famularo and Russell
Examining the Utility of a Prototype Assessment for Assessing Students in the Gap
Lisa Famularo and Michael Russell, Technology and Assessment Study Collaborative (inTASC)
Overview
• Goal: Develop and pilot-test a prototype for assessing students in the gaps.
• 4 complex algebra problems were the foundation for the prototype:
– Linear Patterns
– Equality
– Rate of Change
– Evaluating an Expression
• Modifications were made to determine what, if any, changes would enable students to solve them:
– Changing problem context from words to pictures
– Removing the context of the problem
– Changing how information was presented
– Simplifying the problem
Purpose of the study
• Assess the quality and usefulness of items designed to decompose the skills/knowledge required to solve complex problems.
• Examine the extent to which students who perform well on the complex items also perform well on the decomposed items.
• Examine the extent to which students in the gap are able to succeed on decomposed items while struggling with the complex items.
Gap Definitions
• Gap 1: The validity gap contains low-scoring students whose teachers rated their performance in class as proficient. In other words, there is a discrepancy between their performance on the test and their teachers’ rating of their proficiency.
• Gap 2: The relevance gap consists of students who scored in the lowest achievement level on the test and were rated as low performing in class by their teachers. The large-scale assessment, which is aligned to grade-level achievement standards, is not sensitive to their progress, even with appropriate accommodations and effective instruction.
Prototype test
• One test containing 43 MC items
• Four sets of algebra items: “item families”
• One item family each from these stems within the algebra strand:
– Linear Patterns
– Evaluating an Equation
– Equality
– Rate of Change
• Each item family contained:
– One “Parent” item taken from the NECAP grade 8 math test
– One isomorph “Sibling” representation of the parent
– 10-11 deconstructed component items: “Child” items
• Data were collected (Spring 2006) from 2,365 8th grade students from schools in NH, RI, and VT.
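The family structure described above (one parent item, one sibling, 10-11 children) and the parent-vs-child comparisons reported in the findings lend themselves to a simple sketch. Everything below — the class fields, the helper function, and the toy responses — is hypothetical scaffolding for illustration, not the study's actual analysis code.

```python
from dataclasses import dataclass, field

# Hypothetical data model for one "item family": a complex parent item,
# an isomorphic sibling, and the deconstructed child (component) items.
@dataclass
class ItemFamily:
    name: str                     # e.g. "Linear Patterns"
    parent_id: str                # NECAP grade 8 item the family is built from
    sibling_id: str               # isomorphic representation of the parent
    child_ids: list = field(default_factory=list)  # 10-11 component items

def child_success_given_parent(responses, family, parent_correct):
    """Proportion of child items answered correctly by students who got
    the parent item right (parent_correct=True) or wrong (False).
    `responses` maps student_id -> {item_id: 1 or 0}."""
    hits, total = 0, 0
    for student_items in responses.values():
        if student_items.get(family.parent_id) == (1 if parent_correct else 0):
            for cid in family.child_ids:
                if cid in student_items:
                    hits += student_items[cid]
                    total += 1
    return hits / total if total else None

# Toy example (fabricated response data, for illustration only):
fam = ItemFamily("Linear Patterns", "P1", "S1", ["C1", "C2"])
responses = {
    "s1": {"P1": 1, "C1": 1, "C2": 1},  # parent right, children right
    "s2": {"P1": 0, "C1": 1, "C2": 0},  # parent wrong, mixed children
}
print(child_success_given_parent(responses, fam, parent_correct=True))   # 1.0
print(child_success_given_parent(responses, fam, parent_correct=False))  # 0.5
```

A comparison like this is what underlies the later finding that students who answered a parent correctly usually answered its children correctly, while students who missed the parent were inconsistent on the children.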
Process used to develop item families
Criteria used to select the “Parent” items:
• Complex problems that required multiple skills/concepts.
• Of moderate difficulty (item difficulty range: .56 to .66).
Criteria for developing “Child” items:
• Developed to probe students’ understanding of the component skills both individually and in combination.
• Alternate representations of parent items were developed to determine if modifications to the original problem would enable students to solve it.
Example
Illustrates the impact of:
– Simplifying the values
– Removing the context of the problem
Simplifying the Values [item screenshots omitted]
Gap 1: 53% / Gap 2: 51%
Gap 1: 46% / Gap 2: 38%

Simplifying the Values & Removing the Context [item screenshots omitted]
Gap 1: 77% / Gap 2: 75%
Gap 1: 53% / Gap 2: 51%
Findings
• A factor analysis revealed that the items do cluster together by family, and a reliability analysis showed moderate to high internal consistency.
• As expected, most of the child items were easier than their parent.
• We expected that within each family the parent and sibling items would have roughly the same item difficulty, but our analyses revealed that the siblings were easier. (Practice effect?)
Findings
• Students who answered a parent item correctly usually answered the child items in that family correctly.
• Students who answered a parent item incorrectly were less consistent in their performance on the child items: sometimes correct, sometimes incorrect.
• Below grade level items reduced performance differences between gap and comparison groups more than grade level items.
• However, some below grade level items in the rate of change and linear patterns families did not appear to reduce the gap in performance as much as the on grade level items.
Findings
• In the equality family, the two items that reduce the gap in performance the most differ from the parent in that they are single-step rather than multi-step problems.
• Removing the problem situation does not reduce the gap in performance, but simplifying the problem (by using whole numbers or variables equal to 1) and having the student demonstrate understanding of algebraic expressions without requiring them to solve an equation might reduce the gap.
• Simplifying the problems presented appeared to enable some students in the gap to solve them correctly. In many cases, however, simplifying the problem resulted in items that were considered below grade level.
Summary
Findings suggest that the dual goal of:
• Measuring student achievement relative to grade level expectations, and
• Providing teachers with information about what students can and cannot do
might be accomplished through a modular test design that employed “parent” items to measure student achievement relative to grade level expectations and “child” items to measure the component skills required to accurately answer complex parent items.
Summary
• Item changes found to have a positive effect on gap student performance (impact was typically greater for Gap 1 students than for Gap 2):
– Simplifying by using whole numbers
– Using whole numbers & removing the context
– Simplifying information in the table (for Gap 1)
– Identifying the correct algebraic expression as opposed to solving an equation
Summary
• Item changes found to be not effective for gap students:
– Changing the table format from vertical to horizontal
– Removing the context & changing the numerical sequence from decreasing to increasing
• Item changes that need more study:
– Removing the problem context: sometimes reduces difficulty, sometimes not. How does this happen?
– Changing problem context from abstract symbols to pictures (instead of removing problem context).
Recommendations for future research
Item design research is needed:
• Need well described, clearer categories of component (child) items that produce lower difficulty for gap 1 and gap 2 kids. Some questions to answer:
– What does it mean to “remove” context from a math problem? It did not seem to reduce difficulty for gap kids reliably, and when it did, the problem was also below grade level. Does removing context (as described here) cause linguistic complexity to decrease, or does it cause linguistic complexity to increase? What features control this impact?
– If not removed, how can presentation context be changed? Can changing from words to pictures help reduce problem difficulty (or linguistic complexity) without lowering grade level?
– What does it mean to “simplify” a problem? Does this primarily impact the memory load students are handling while solving a problem? If so, how else might we reduce memory load during testing without altering the test construct or reducing grade level?
Policy question:
– If a student could solve each component part of a problem, but not all parts of the problem at once, should it count as grade level?
Who are the students in the gaps?
Parker and Saxon
Gaps in Large-Scale Assessment: Teacher Views
Caroline E. Parker, EDC
Susan Saxon, Ed Alliance at Brown
Background
• Single exploratory study
• Findings in two areas:
– Teacher views of students
– Teacher views of assessments
• Two separate exploratory papers
– Student characteristics triangulated in other studies
– Teacher views of assessments still exploratory, not triangulated
Methods
• 40 teachers/administrators from ME, NH, RI, VT
– 23 mathematics teachers
– 14 special education teachers
– 3 administrators with special education and mathematics expertise
• Semi-structured interviews
• Convenience sample
• Total of 9 schools and 1 district (teachers from 3 schools)
• Analysis: iterative coding
Explaining the gap between classroom achievement and assessment results
Conclusion
• Two assessment gaps
– First gap includes students who appear to be proficient in class but are not proficient on the assessment
– Second gap includes students far below grade level in class, with very low scores on the assessment
– Both gaps include students with disabilities, ELLs, and general education students, but in different percentages
• Three assessment characteristics:
– Structure
– Relationship/Scaffolding
– Relevance
• The study could not delineate between the assessment gap and the instruction gap, or the effects of teacher expectations, content coverage, and opportunities to learn.
Of all the students who are not proficient, how can states identify those who are in the assessment gaps?
What are the attributes of students in the gaps, and how do these students perform?
Bechard and Godin
Identifying the gaps in state assessment systems
Sue Bechard and Ken GodinMeasured Progress
Data sources
State assessment data – grade 8 mathematics results from two systems:
• General large-scale test results
• Demographics (special programs, ethnicity, gender)
• Teachers’ judgments of students’ classroom work
• Student questionnaires completed at time of test
• Accommodations used at time of test
State databases for additional student demographic data:
• Disability classification
• Free/reduced lunch
• Attendance
Student-focused teacher interviews
Why use teacher judgment of students’ classroom performance?
Validity Gap: the test may not reflect classroom performance
Teachers see students performing proficiently in class, but test results are below proficient.
Relevance Gap: the test may not be relevant for instructional planning
Teachers rate students’ class work as low as possible and test results are at “chance” level. No information is generated on what students can do.
Teacher judgment instructions
The instructions were clear that this was to be a judgment of the student’s demonstrated achievement on GLE-aligned academic material in the classroom, not a prediction of test performance.
The teacher judgment field consisted of 12 possibilities – each of the 4 achievement levels had low, medium, and high divisions.
Research on validity of teacher judgment
While there are some conflicting results, the most accurate judgments were found when:
• teachers were given specific evaluation criteria
• levels of competency were clearly delineated
• criterion-referenced tests in mathematics or reading were the matching measure
• criterion-referenced tests reflected the same content as did classroom assessments
• judgments were of older students who had no exceptional characteristics, and
• teachers were asked to assign ratings to students, not to rank-order them
Validation of teacher judgment data
Data were collected to serve as “Round 1” cutpoints (of 3 rounds) during standard setting.
Validation studies were conducted which asked:
• Were there differences between the sample of students with non-missing teacher judgment data and the rest of the population?
• Were there suspicious trends in the judgment data suggesting that teachers did not take the task seriously?
• How did teacher judgments compare with students’ actual test scores?
Results of these investigations were considered supportive of using the teacher judgment data for standard setting.
Teacher judgment vs. test performance

Mathematics Achievement Levels – Student Performance and Teacher Judgments

Achievement Level                   Overall Mathematics        Teacher Judgments*
                                    Performance (N=36,708)     (n=24,168)
4 Proficient with Distinction       12.9%                      17.9%
3 Proficient                        40.6%                      39.7%
  (At or above Proficient)          53.5%                      57.6%
2 Partially Proficient              21.6%                      31.0%
1 Substantially Below Proficient    24.9%                      11.4%
  (Below Proficient)                46.5%                      42.4%
Test Floor† (4.6%)

* Collapsed from 12 to 4 categories
† Students within error of the bottom of the scale (i.e., chance score); a subset of Achievement Level 1.
Operationalizing the gap definitions using teacher judgment

Operationalizations of the Two Gaps (Grade 8 Mathematics Test)

Validity Gap 1: student performance ≤ 1 S.E.M. below the sub-proficient/proficient cutscore, but teacher judgment ≥ Proficient.
Non-gap 1: student performance ≤ 1 S.E.M. below the sub-proficient/proficient cutscore, and teacher judgment matched the score: a level 2 judgment if the score was within 1 S.E.M. of achievement level 2 boundaries, or a level 1 judgment if the score was within 1 S.E.M. of achievement level 1 boundaries.
Relevance Gap 2: student performance within 1 S.E.M. of the floor of the test, and teacher judgment matched as closely as possible within the assessment system (NECAP: lowest available within level 1; MEA: Level 1).
Non-gap 2: student performance within 1 S.E.M. of the floor of the test, but teacher judgment too high (NECAP: next higher available within level 1; MEA: Level 2).
Comparison: student performance ≥ 1 S.E.M. above the sub-proficient/proficient cutscore and teacher judgment ≥ Proficient.
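A minimal sketch of these classification rules in code. The cutscore, S.E.M., and test-floor values below are invented placeholders (the actual NECAP/MEA values are not given here), and the non-gap 1 level-matching rule is omitted for brevity; only the decision logic mirrors the operationalization above.

```python
# Assumed placeholder values -- not the actual NECAP/MEA parameters.
SEM = 3.0             # standard error of measurement
PROFICIENT_CUT = 840  # sub-proficient/proficient cutscore
TEST_FLOOR = 800      # chance-level score at the bottom of the scale

def classify(score, teacher_judgment_level, lowest_judgment_level=1):
    """Assign a student to validity gap 1, relevance gap 2, non-gap 2, or
    the comparison group per the criteria above. `teacher_judgment_level`
    is the collapsed 1-4 achievement-level judgment (3 = Proficient)."""
    if score <= PROFICIENT_CUT - SEM and teacher_judgment_level >= 3:
        return "validity gap 1"   # low score, but judged Proficient or above
    if score <= TEST_FLOOR + SEM:
        # Chance-level performance: gap 2 if the judgment is also as low as
        # possible, non-gap 2 if the judgment is higher than that.
        return ("relevance gap 2"
                if teacher_judgment_level == lowest_judgment_level
                else "non-gap 2")
    if score >= PROFICIENT_CUT + SEM and teacher_judgment_level >= 3:
        return "comparison"
    return "other"  # non-gap 1 and remaining cases, not modeled here

print(classify(score=830, teacher_judgment_level=3))  # validity gap 1
print(classify(score=801, teacher_judgment_level=1))  # relevance gap 2
print(classify(score=850, teacher_judgment_level=4))  # comparison
```

Note that a student can satisfy both gap criteria at once, which matches the later footnote that 8.7% of non-gap 1 students also fit the gap 2 criterion.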
Student questionnaires (answered after taking the test)
1. How difficult was the mathematics test?
A. Harder than my regular mathematics schoolwork
B. About the same as my regular mathematics schoolwork
C. Easier than my regular mathematics schoolwork
2. How hard did you try on the mathematics test?
A. I tried harder on this test than I do on my regular mathematics schoolwork.
B. I tried about the same as I do on my regular mathematics schoolwork.
C. I did not try as hard on this test as I do on my regular mathematics schoolwork.
Accommodations (used during the mathematics test)
16 accommodations listed by category:
• Setting
• Scheduling/timing
• Presentation formats
• Response formats
Student-focused teacher interviews
Student profile data:
• math test scores (both overall and on subtests)
• specific responses to released math test items
• student’s responses to the questionnaire
• special program status
• accommodations used during testing
Teacher interview questions:
• Questions regarding perceptions of the students in each gap on various aspects of the gap criteria
• 17 Likert-scale questions on the student’s class work and participation in classroom activities
Student-focused teacher interview samples
• 20 8th grade math and special education teachers
• 7 schools across three states (NH, RI, and VT)
• 51 students: gap 1 = 19, gap 2 = 18, and comparison group = 14
Results: Percentages of students in the gaps

Breakdown of Gap Group Designations (N=24,168)
Validity Gap 1: 8.6%             Non-gap 1: 8.8%†
Relevance Gap 2: 0.8% [2.3%]*    Non-gap 2: 1.5% [1.2%]*
Comparison: 39.0%
† 188 (i.e., 8.7% of) non-gap 1 students scored so low that they also fit the criterion for gap 2
* Shown in brackets: values if teacher judgments were collapsed to four achievement levels

Relevance gap 2 and non-gap 2 percentages differ depending on whether fine- or gross-grained ratings are used.
Accommodations use

Mathematics Accommodation Frequencies within Gap and Comparison Groups
Number of accommodations:   0        1        2-3      4-6      7+
Gap 1 (n=2,070)             89.8%+   3.1%-    5.6%-    1.6%-    none-
Non-gap 1 (n=2,129)         54.3%-   10.4%+   23.7%+   10.1%+   1.6%+
Gap 2 (n=188)               26.5%    15.1%    30.8%    22.2%    5.4%
Non-gap 2 (n=369)           33.9%    16.3%    32.5%    15.5%    1.9%
Comparison (n=9,429)        97.9%+   1.3%-    0.6%-    0.2%-    none-
Overall Population          89.8%    3.1%     5.6%     1.6%     none
+ Statistically higher than expected; - statistically lower than expected

• Students in validity gap 1 were significantly less likely to use accommodations than students in non-gap 1.
• Only a small percentage of students in validity gap 1 used any accommodations at all.
• The majority of students in both relevance gap 2 and non-gap 2 used one or more accommodations.
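The “+ / -” flags in tables like the one above mark cells whose observed counts sit significantly above or below what independence between group and accommodation count would predict. A rough illustration of that kind of computation follows; the counts are fabricated toy numbers, not the study's data, and the study's actual statistical procedure is not specified in these slides.

```python
def expected_counts(table):
    """Expected cell counts under row/column independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def residual_flags(table, threshold=2.0):
    """'+' if the standardized (Pearson) residual exceeds the threshold,
    '-' if it is below -threshold, '' otherwise -- roughly a two-sided
    z-test per cell."""
    expected = expected_counts(table)
    flags = []
    for obs_row, exp_row in zip(table, expected):
        row_flags = []
        for obs, exp in zip(obs_row, exp_row):
            z = (obs - exp) / exp ** 0.5
            row_flags.append("+" if z > threshold
                             else "-" if z < -threshold
                             else "")
        flags.append(row_flags)
    return flags

# Toy 2x2 table: rows = two groups, columns = 0 vs. 1+ accommodations.
toy = [[90, 10],
       [50, 50]]
print(residual_flags(toy))  # [['+', '-'], ['-', '+']]
```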
Performance of students in validity gap 1 compared to non-gap 1

Subpopulation Mean Mathematics Scaled Scores* within Gap Group Designations
Group                  IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=2,070)        830.9+     829.7+     827.7+    833.2+
Non-gap 1 (n=2,129)    819.7-     819.1-     815.8-    829.3-
Comparison (n=9,429)   847.4      848.8      none      850.2
Overall Population     828.2      827.3      817.5     842.3
* Achievement level (AL) scale score ranges — below proficient: AL 1: 800-833, AL 2: 834-839; above proficient: AL 3: 840-851, AL 4: 852-880
+ Statistically higher than expected; - statistically lower than expected
Special program status of students in validity gap 1

Breakdown of Subpopulations within Gap 1 and Comparison Groups
Group                  IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=2,070)        14.2%-     2.3%       0.1%      83.4%+
Non-gap 1 (n=2,129)    50.8%+     5.0%       0.9%      43.3%-
Comparison (n=9,429)   2.2%       0.5%       none      97.3%
Overall Population     15.1%      1.9%       0.2%      82.8%
+ Statistically higher than expected; - statistically lower than expected

• The majority of students in validity gap 1 were in general education.
• Students with IEPs were under-represented in validity gap 1 and over-represented in non-gap 1.
Disability designations in validity gap
Learning disabilities:
• Validity Gap 1: 57.7% of the IEP gap 1 group (n=208)
• Non-gap 1: 49.7% of the IEP non-gap 1 group (n=860)
• Comparison: 49.2% of the IEP comparison group (n=83)
• Total population: 52% of students with IEPs (N=4,465)
Disability designations only seen in non-gap 1:
• Students with learning impairments (MR), deafness, multiple disabilities, and traumatic brain injury
Additional characteristics of students in validity gap 1 compared to non-gap 1
Validity gap students:
• Were more likely female and white
• Had the fewest absences
• Had higher SES
• Found the state test about the same level of difficulty as class work
• Exhibited academic and mathematics-appropriate behaviors in class
Performance of students in relevance gap 2 on the test
By definition, students in both relevance gap 2 and non-gap 2 scored no better than chance on the assessment.
Special program status of students in relevance gap 2

Breakdown of Subpopulations within Gap 2 and Comparison Groups
Group                  IEP only   ELL only   IEP&ELL   General Ed
Gap 2 (n=185)          80.0%      6.5%       2.7%      10.8%-
Non-gap 2 (n=369)      69.4%      9.8%       1.6%      19.2%
Comparison (n=9,429)   2.2%       0.5%       none      97.3%
Overall Population     15.1%      1.9%       0.2%      82.8%
- Statistically lower than expected

The vast majority of students in relevance gap 2 and non-gap 2 were students with IEPs.
Disability designations in relevance gap
• Learning disabilities: fewer than half of the students in the relevance gap 2 groups had learning disabilities.
• Students who were deaf/blind and those with multiple disabilities were only found in gap 2.
• Students with hearing impairments, deafness, and traumatic brain injury were only found in non-gap 2.
Additional characteristics of students in relevance gap 2 compared to non-gap 2
• Students in relevance gap 2 were very similar to students in non-gap 2 on most variables.
• Students from both groups felt that the test was as hard as or harder than their schoolwork.
• They tried as hard on the test as they do in class, or harder.
• They used mathematics tools in the classroom (e.g., calculators).
Summary: How many students are in the gaps?
10.9% - 11.4% of the total student population in the two systems are in assessment gaps.
NECAP: Validity Gap 1 = 8.6%, Relevance Gap 2 = 2.3%
MEA: Validity Gap 1 = 7.1%, Relevance Gap 2 = 4.3%
Summary
We found substantial differences between the composition of the validity gap 1 and non-gap 1 groups, and these differences held in both the NECAP and MEA systems.
Validity gap 1 students may have characteristics and behaviors that mask their difficulties.
Non-gap 1 students are those generally thought to be in the “achievement gap”.
Summary (cont.)
Low performing students in relevance gap 2 and non-gap 2 share many characteristics.
Their extremely low performances in both classroom activities and the test raise issues about the relevancy of the general assessment for them.
Conclusions
For students in validity gap 1, increase focus on classroom supports and training on how to transfer their knowledge and skills from classroom to assessment environments.
For students in non-gap 1, examine expectations and opportunities to learn. Providing a different test based on modified academic achievement standards is premature.
Students with IEPs in relevance gap 2 and non-gap 2 may benefit from the 2% option for AYP and an alternate assessment based on modified academic achievement standards (AA-MAAS).
There will be challenges designing a test based on MAAS that is strictly aligned with grade level content.