TRANSCRIPT
Students in the gap(s): Research findings on who they
are, what they need, and implications for the 2%
flexibility option
Gaye Fedorchak, New Hampshire Department of Education
Sue Bechard, Measured Progress
Using Data to Improve Instruction
• Understand differences between instruction and assessment contexts
• Investigate component skills underlying grade level expectations
• Discern students’ assessment needs
• Make decisions about the 2% flexibility option
New England Compact Enhanced Assessment Grant
Funded by US Department of Education
Four states: NH, ME, RI, VT
2005-2007
Challenge: describe the students in the gap, and design an assessment that will meet their needs
8th grade mathematics
– 8th grade to look at complexity
– Mathematics to avoid reading comprehension issues
Project history
Original goals
– Identify students in the gaps
– Develop varied assessment modules
– Pilot/validate assessment modules
Issues we faced
– Not easy to identify the gap, or the students
– Impossible to develop an assessment without knowing target students’ needs
Revised goals
• Identify students in the gaps through multiple methods, triangulating evidence
• Define common criteria for identifying students in the gap
• Plan and develop task module assessment strategies (assessment prototypes)
• Recommend core components of an assessment structure that would lessen the gaps
• Disseminate products to others considering assessments for students in the gap
Accountability context
• Project began February 2005
• Modified achievement standards announced April 2005
• Proposed “2%” regulations released December 2005
• Studies were designed before 2%, not in response to 2%
• Findings speak to the needs of all students not effectively assessed in the current system, not necessarily dovetailing with 2% definitions
Five Studies and a Literature Review
• Who are the students in the gaps? (Parker & Saxon)
• Of all the students who are not proficient, how can states identify those who are in the assessment gaps? (Bechard & Godin)
• What are the attributes of students in the gaps, and how do these students perform? (Bechard & Godin)
• What issues in the assessments themselves contribute to the gaps? (Dolan)
• Are there specific aspects of multiple-choice items used in state assessments that contribute to the assessment gaps? (Famularo & Russell)
Gap identification process
1. Conduct exploratory interviews with teachers to identify the assessment gaps
2. Review student assessment data
3. Review teacher judgment data
4. Operationalize gap criteria
5. Conduct focused teacher interviews to confirm gap criteria
Parker and Saxon: Teacher views of students and assessments
Bechard and Godin: Finding the real assessment gaps
The process for investigating gap profiles
Bechard and Godin: Who are the students in the gaps?
Conduct focused teacher interviews to confirm gap criteria
Investigate characteristics of students in gap 1
Investigate characteristics of students in gap 2
Investigate achievement patterns of students in gap 1
Investigate achievement patterns of students in gap 2
Develop profiles of students in gap 1
Develop profiles of students in gap 2
Alternative test items
1. Hypothesize alternate test items
2. Decompose items into requisite skills/knowledge
3. Provide alternative formats: item format, item content, visuals, multimedia
4. Review with mathematics experts
5. Pilot and evaluate items
Russell and Famularo: Utility of a prototype assessment
Dolan et al.: Providing students with choice
Literature Review
• Students likely to be in the gaps:
– Mild mental retardation
– Learning disabilities
– English language learners
• Target population: middle school
• Target academic content: mathematics
Middle School Math Instruction and Assessment for Students with LD, Students with MMR, & ELL Students:A Review of the Literature
Bob Dolan, Boo Murray, and Nicole Strangman
Purpose
• Comprehensive literature review of research-based practices during instruction and assessment of students with learning disabilities (LD), students with mild mental retardation (MMR), and students who are English language learners (ELL).
• Instructional techniques include instructional approaches as well as scaffolds and supports used in the classroom.
• Assessment techniques consider test design and delivery, with emphasis on testing accommodations.
• Focus on identifying common approaches, despite large heterogeneity within each group.
• Goal: to support states in understanding how these students may be represented within a definition of “students in the gap.”
• Only considering students who would take the general assessment (i.e., not considering students who would qualify for AA-AAS).
Included Student Populations
Mild Mental Retardation (MMR)
• Students identified as having MMR, being “educable mentally retarded,” or described as having mental retardation and an IQ within the range of 50-55 to approximately 70 (DSM-IV).
Learning Disabilities (LD)
• Students identified as having LD, a specific LD, or a reading disability as determined by the study author(s).
• Students not diagnosed with a specific LD but having had low performance in math calculations that would presumably meet the IDEA definition for LD.
• No attempt was made to evaluate the methodology used to diagnose LD (e.g., discrepancy model, response to intervention model); it was assumed that authors followed the standard IDEA definition of LD.
English Language Learners (ELL)
• ELL students, English as a second language (ESL) students, and limited English proficient (LEP) students.
Conclusions
• Large disconnect between the level and types of instructional supports and testing accommodations for all three student populations.
• Instructional supports focus largely on pedagogical approaches toward reducing barriers to learning; test designs, modes of administration, and accommodations are largely limited to reducing accessibility barriers.
• This discontinuity reflects the limited nature of current large-scale assessment techniques and the psychometric approaches toward their design.
• Concern over compromising the validity of test inferences or the comparability of scores between students who do and don’t receive such supports.
• General failure to consider the heterogeneity of the student population, which could significantly impact the effectiveness of test design factors, modes of administration, and accommodations.
• As a result, techniques that may be largely successful in allowing these students to learn effectively may not be available at the point when students must demonstrate learning.
Considerations for Future Research
• Approaches toward assessment that better dovetail with the supports students receive instructionally.
• Additional research focused on methodologies for test development that consider construct-relevant vs. construct-irrelevant factors, such as Evidence Centered Design (Mislevy et al.) and construct deconstruction.
• Approaches that consider individual student differences, such as through application of universal design (Mace et al., 1996) and Universal Design for Learning (Rose & Meyer, 2002) principles, to create and administer tests that consider diverse students from the start (Thompson et al., 2002) and flexible tests that include built-in supports for diverse students (Bryant & Rivera, 1997; Dolan & Hall, 2001; Dolan, Hall, Banerjee, Chun, & Strangman, 2005; Ketterlin-Geller, 2005).
Are there specific aspects of multiple-choice items used in state assessments that contribute to the assessment gaps?
Famularo and Russell
Examining the Utility of a Prototype Assessment for Assessing Students in the Gap
Lisa Famularo and Michael Russell, Technology and Assessment Study Collaborative (inTASC)
Overview
• Goal: Develop and pilot-test a prototype for assessing students in the gaps.
• 4 complex algebra problems were the foundation for the prototype:
– Linear Patterns
– Equality
– Rate of Change
– Evaluating an Expression
• Modifications were made to determine what, if any, changes would enable students to solve them:
– Changing problem context from words to pictures
– Removing the context of the problem
– Changing how information was presented
– Simplifying the problem
Purpose of the study
• Assess the quality and usefulness of items designed to decompose the skills/knowledge required to solve complex problems.
• Examine the extent to which students who perform well on the complex items also perform well on the decomposed items.
• Examine the extent to which students in the gap are able to succeed on decomposed items while struggling with the complex items.
Gap Definitions
• Gap 1: The validity gap contains low-scoring students whose teachers rated their performance in class as proficient. In other words, there is a discrepancy between their performance on the test and their teachers’ rating of their proficiency.
• Gap 2: The relevance gap consists of students who scored in the lowest achievement level on the test and were rated as low performing in class by their teachers. The large-scale assessment, which is aligned to grade-level achievement standards, is not sensitive to their progress, even with appropriate accommodations and effective instruction.
Prototype test
• One test containing 43 MC items
• Four sets of algebra items: “item families”
• One item family each from these stems within the algebra strand:
– Linear Patterns
– Evaluating an Equation
– Equality
– Rate of Change
• Each item family contained:
– One “Parent” item taken from the NECAP grade 8 math test
– One isomorph “Sibling” representation of the parent
– 10-11 deconstructed component items: “Child” items
• Data were collected (Spring 2006) from 2,365 8th grade students from schools in NH, RI, and VT.
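The family structure described above (one parent item, one sibling, 10-11 children) and the parent-vs-child comparisons reported in the findings lend themselves to a simple sketch. Everything below — the class fields, the helper function, and the toy responses — is hypothetical scaffolding for illustration, not the study's actual analysis code.

```python
from dataclasses import dataclass, field

# Hypothetical data model for one "item family": a complex parent item,
# an isomorphic sibling, and the deconstructed child (component) items.
@dataclass
class ItemFamily:
    name: str                     # e.g. "Linear Patterns"
    parent_id: str                # NECAP grade 8 item the family is built from
    sibling_id: str               # isomorphic representation of the parent
    child_ids: list = field(default_factory=list)  # 10-11 component items

def child_success_given_parent(responses, family, parent_correct):
    """Proportion of child items answered correctly by students who got
    the parent item right (parent_correct=True) or wrong (False).
    `responses` maps student_id -> {item_id: 1 or 0}."""
    hits, total = 0, 0
    for student_items in responses.values():
        if student_items.get(family.parent_id) == (1 if parent_correct else 0):
            for cid in family.child_ids:
                if cid in student_items:
                    hits += student_items[cid]
                    total += 1
    return hits / total if total else None

# Toy example (fabricated response data, for illustration only):
fam = ItemFamily("Linear Patterns", "P1", "S1", ["C1", "C2"])
responses = {
    "s1": {"P1": 1, "C1": 1, "C2": 1},  # parent right, children right
    "s2": {"P1": 0, "C1": 1, "C2": 0},  # parent wrong, mixed children
}
print(child_success_given_parent(responses, fam, parent_correct=True))   # 1.0
print(child_success_given_parent(responses, fam, parent_correct=False))  # 0.5
```

A comparison like this is what underlies the later finding that students who answered a parent correctly usually answered its children correctly, while students who missed the parent were inconsistent on the children.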
Process used to develop item families
Criteria used to select the “Parent” items:
• Complex problems that required multiple skills/concepts.
• Of moderate difficulty (item difficulty range: .56 to .66).
Criteria for developing “Child” items:
• Developed to probe students’ understanding of the component skills both individually and in combination.
• Alternate representations of parent items were developed to determine if modifications to the original problem would enable students to solve it.
Example
Illustrates the impact of:
– Simplifying the values
– Removing the context of the problem
Simplifying the Values [item screenshots omitted]
Gap 1: 53% / Gap 2: 51%
Gap 1: 46% / Gap 2: 38%

Simplifying the Values & Removing the Context [item screenshots omitted]
Gap 1: 77% / Gap 2: 75%
Gap 1: 53% / Gap 2: 51%
Findings
• A factor analysis revealed that the items do cluster together by family, and a reliability analysis showed moderate to high internal consistency.
• As expected, most of the child items were easier than their parent.
• We expected that within each family the parent and sibling items would have roughly the same item difficulty, but our analyses revealed that the siblings were easier. (Practice effect?)
Findings
• Students who answered a parent item correctly usually answered the child items in that family correctly.
• Students who answered a parent item incorrectly were less consistent in their performance on the child items: sometimes correct, sometimes incorrect.
• Below grade level items reduced performance differences between gap and comparison groups more than grade level items.
• However, some below grade level items in the rate of change and linear patterns families did not appear to reduce the gap in performance as much as the on grade level items.
Findings
• In the equality family, the two items that reduce the gap in performance the most differ from the parent in that they are single-step rather than multi-step problems.
• Removing the problem situation does not reduce the gap in performance, but simplifying the problem (by using whole numbers or variables equal to 1) and having the student demonstrate understanding of algebraic expressions without requiring them to solve an equation might reduce the gap.
• Simplifying the problems presented appeared to enable some students in the gap to solve them correctly. In many cases, however, simplifying the problem resulted in items that were considered below grade level.
Summary
Findings suggest that the dual goal of:
• Measuring student achievement relative to grade level expectations, and
• Providing teachers with information about what students can and cannot do
might be accomplished through a modular test design that employed “parent” items to measure student achievement relative to grade level expectations and “child” items to measure the component skills required to accurately answer complex parent items.
Summary
• Item changes found to have a positive effect on gap student performance (impact was typically greater for Gap 1 students than for Gap 2):
– Simplifying by using whole numbers
– Using whole numbers & removing the context
– Simplifying information in the table (for Gap 1)
– Identifying the correct algebraic expression as opposed to solving an equation
Summary
• Item changes found to be not effective for gap students:
– Changing the table format from vertical to horizontal
– Removing the context & changing the numerical sequence from decreasing to increasing
• Item changes that need more study:
– Removing the problem context: sometimes reduces difficulty, sometimes not. How does this happen?
– Changing problem context from abstract symbols to pictures (instead of removing problem context).
Recommendations for future research
Item design research is needed:
• Need well described, clearer categories of component (child) items that produce lower difficulty for gap 1 and gap 2 kids. Some questions to answer:
– What does it mean to “remove” context from a math problem? It did not seem to reduce difficulty for gap kids reliably, and when it did, the problem was also below grade level. Does removing context (as described here) cause linguistic complexity to decrease, or does it cause linguistic complexity to increase? What features control this impact?
– If not removed, how can presentation context be changed? Can changing from words to pictures help reduce problem difficulty (or linguistic complexity) without lowering grade level?
– What does it mean to “simplify” a problem? Does this primarily impact the memory load students are handling while solving a problem? If so, how else might we reduce memory load during testing without altering the test construct or reducing grade level?
Policy question:
– If a student could solve each component part of a problem, but not all parts of the problem at once, should it count as grade level?
Who are the students in the gaps?
Parker and Saxon
Gaps in Large-Scale Assessment: Teacher Views
Caroline E. Parker, EDC
Susan Saxon, Ed Alliance at Brown
Background
• Single exploratory study
• Findings in two areas:
– Teacher views of students
– Teacher views of assessments
• Two separate exploratory papers
– Student characteristics triangulated in other studies
– Teacher views of assessments still exploratory, not triangulated
Methods
• 40 teachers/administrators from ME, NH, RI, VT
– 23 mathematics teachers
– 14 special education teachers
– 3 administrators with special education and mathematics expertise
• Semi-structured interviews
• Convenience sample
• Total of 9 schools and 1 district (teachers from 3 schools)
• Analysis: iterative coding
Explaining the gap between classroom achievement and assessment results
Conclusion
• Two assessment gaps
– First gap includes students who appear to be proficient in class but are not proficient on the assessment
– Second gap includes students far below grade level in class, with very low scores on the assessment
– Both gaps include students with disabilities, ELLs, and general education students, but in different percentages
• Three assessment characteristics:
– Structure
– Relationship/Scaffolding
– Relevance
• The study could not delineate between the assessment gap and the instruction gap, or the effects of teacher expectations, content coverage, and opportunities to learn.
Of all the students who are not proficient, how can states identify those who are in the assessment gaps?
What are the attributes of students in the gaps, and how do these students perform?
Bechard and Godin
Identifying the gaps in state assessment systems
Sue Bechard and Ken GodinMeasured Progress
Data sources
State assessment data – grade 8 mathematics results from two systems:
• General large-scale test results
• Demographics (special programs, ethnicity, gender)
• Teachers’ judgments of students’ classroom work
• Student questionnaires completed at time of test
• Accommodations used at time of test
State databases for additional student demographic data:
• Disability classification
• Free/reduced lunch
• Attendance
Student-focused teacher interviews
Why use teacher judgment of students’ classroom performance?
Validity Gap: the test may not reflect classroom performance
Teachers see students performing proficiently in class, but test results are below proficient.
Relevance Gap: the test may not be relevant for instructional planning
Teachers rate students’ class work as low as possible and test results are at “chance” level. No information is generated on what students can do.
Teacher judgment instructions
The instructions were clear that this was to be a judgment of the student’s demonstrated achievement on GLE-aligned academic material in the classroom, not a prediction of test performance.
The teacher judgment field consisted of 12 possibilities – each of the 4 achievement levels had low, medium, and high divisions.
Research on validity of teacher judgment
While there are some conflicting results, the most accurate judgments were found when:
• teachers were given specific evaluation criteria
• levels of competency were clearly delineated
• criterion-referenced tests in mathematics or reading were the matching measure
• criterion-referenced tests reflected the same content as did classroom assessments
• judgments were of older students who had no exceptional characteristics, and
• teachers were asked to assign ratings to students, not to rank-order them
Validation of teacher judgment data
Data were collected to serve as “Round 1” cutpoints (of 3 rounds) during standard setting.
Validation studies were conducted which asked:
• Were there differences between the sample of students with non-missing teacher judgment data and the rest of the population?
• Were there suspicious trends in the judgment data suggesting that teachers did not take the task seriously?
• How did teacher judgments compare with students’ actual test scores?
Results of these investigations were considered supportive of using the teacher judgment data for standard setting.
Teacher judgment vs. test performance

Mathematics Achievement Levels – Student Performance and Teacher Judgments

Achievement Level                   Overall Mathematics        Teacher Judgments*
                                    Performance (N=36,708)     (n=24,168)
4 Proficient with Distinction       12.9%                      17.9%
3 Proficient                        40.6%                      39.7%
  (At or above Proficient)          53.5%                      57.6%
2 Partially Proficient              21.6%                      31.0%
1 Substantially Below Proficient    24.9%                      11.4%
  (Below Proficient)                46.5%                      42.4%
Test Floor† (4.6%)

* Collapsed from 12 to 4 categories
† Students within error of the bottom of the scale (i.e., chance score); a subset of Achievement Level 1.
Operationalizing the gap definitions using teacher judgment

Operationalizations of the Two Gaps (Grade 8 Mathematics Test)

Validity Gap 1: student performance ≤ 1 S.E.M. below the sub-proficient/proficient cutscore, but teacher judgment ≥ Proficient.
Non-gap 1: student performance ≤ 1 S.E.M. below the sub-proficient/proficient cutscore, and teacher judgment matched the score: a level 2 judgment if the score was within 1 S.E.M. of achievement level 2 boundaries, or a level 1 judgment if the score was within 1 S.E.M. of achievement level 1 boundaries.
Relevance Gap 2: student performance within 1 S.E.M. of the floor of the test, and teacher judgment matched as closely as possible within the assessment system (NECAP: lowest available within level 1; MEA: Level 1).
Non-gap 2: student performance within 1 S.E.M. of the floor of the test, but teacher judgment too high (NECAP: next higher available within level 1; MEA: Level 2).
Comparison: student performance ≥ 1 S.E.M. above the sub-proficient/proficient cutscore and teacher judgment ≥ Proficient.
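A minimal sketch of these classification rules in code. The cutscore, S.E.M., and test-floor values below are invented placeholders (the actual NECAP/MEA values are not given here), and the non-gap 1 level-matching rule is omitted for brevity; only the decision logic mirrors the operationalization above.

```python
# Assumed placeholder values -- not the actual NECAP/MEA parameters.
SEM = 3.0             # standard error of measurement
PROFICIENT_CUT = 840  # sub-proficient/proficient cutscore
TEST_FLOOR = 800      # chance-level score at the bottom of the scale

def classify(score, teacher_judgment_level, lowest_judgment_level=1):
    """Assign a student to validity gap 1, relevance gap 2, non-gap 2, or
    the comparison group per the criteria above. `teacher_judgment_level`
    is the collapsed 1-4 achievement-level judgment (3 = Proficient)."""
    if score <= PROFICIENT_CUT - SEM and teacher_judgment_level >= 3:
        return "validity gap 1"   # low score, but judged Proficient or above
    if score <= TEST_FLOOR + SEM:
        # Chance-level performance: gap 2 if the judgment is also as low as
        # possible, non-gap 2 if the judgment is higher than that.
        return ("relevance gap 2"
                if teacher_judgment_level == lowest_judgment_level
                else "non-gap 2")
    if score >= PROFICIENT_CUT + SEM and teacher_judgment_level >= 3:
        return "comparison"
    return "other"  # non-gap 1 and remaining cases, not modeled here

print(classify(score=830, teacher_judgment_level=3))  # validity gap 1
print(classify(score=801, teacher_judgment_level=1))  # relevance gap 2
print(classify(score=850, teacher_judgment_level=4))  # comparison
```

Note that a student can satisfy both gap criteria at once, which matches the later footnote that 8.7% of non-gap 1 students also fit the gap 2 criterion.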
Student questionnaires (answered after taking the test)
1. How difficult was the mathematics test?
A. Harder than my regular mathematics schoolwork
B. About the same as my regular mathematics schoolwork
C. Easier than my regular mathematics schoolwork
2. How hard did you try on the mathematics test?
A. I tried harder on this test than I do on my regular mathematics schoolwork.
B. I tried about the same as I do on my regular mathematics schoolwork.
C. I did not try as hard on this test as I do on my regular mathematics schoolwork.
Accommodations (used during the mathematics test)
16 accommodations listed by category:
• Setting
• Scheduling/timing
• Presentation formats
• Response formats
Student-focused teacher interviews
Student profile data:
• math test scores (both overall and on subtests)
• specific responses to released math test items
• student’s responses to the questionnaire
• special program status
• accommodations used during testing
Teacher interview questions:
• Questions regarding perceptions of the students in each gap on various aspects of the gap criteria
• 17 Likert-scale questions on the student’s class work and participation in classroom activities
Student-focused teacher interview samples
• 20 8th grade math and special education teachers
• 7 schools across three states (NH, RI, and VT)
• 51 students: gap 1 = 19, gap 2 = 18, and comparison group = 14
Results: Percentages of students in the gaps

Breakdown of Gap Group Designations (N=24,168)
Validity Gap 1: 8.6%             Non-gap 1: 8.8%†
Relevance Gap 2: 0.8% [2.3%]*    Non-gap 2: 1.5% [1.2%]*
Comparison: 39.0%
† 188 (i.e., 8.7% of) non-gap 1 students scored so low that they also fit the criterion for gap 2
* Shown in brackets: values if teacher judgments were collapsed to four achievement levels

Relevance gap 2 and non-gap 2 percentages differ depending on whether fine- or gross-grained ratings are used.
Accommodations use

Mathematics Accommodation Frequencies within Gap and Comparison Groups
Number of accommodations:   0        1        2-3      4-6      7+
Gap 1 (n=2,070)             89.8%+   3.1%-    5.6%-    1.6%-    none-
Non-gap 1 (n=2,129)         54.3%-   10.4%+   23.7%+   10.1%+   1.6%+
Gap 2 (n=188)               26.5%    15.1%    30.8%    22.2%    5.4%
Non-gap 2 (n=369)           33.9%    16.3%    32.5%    15.5%    1.9%
Comparison (n=9,429)        97.9%+   1.3%-    0.6%-    0.2%-    none-
Overall Population          89.8%    3.1%     5.6%     1.6%     none
+ Statistically higher than expected; - statistically lower than expected

• Students in validity gap 1 were significantly less likely to use accommodations than students in non-gap 1.
• Only a small percentage of students in validity gap 1 used any accommodations at all.
• The majority of students in both relevance gap 2 and non-gap 2 used one or more accommodations.
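The “+ / -” flags in tables like the one above mark cells whose observed counts sit significantly above or below what independence between group and accommodation count would predict. A rough illustration of that kind of computation follows; the counts are fabricated toy numbers, not the study's data, and the study's actual statistical procedure is not specified in these slides.

```python
def expected_counts(table):
    """Expected cell counts under row/column independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def residual_flags(table, threshold=2.0):
    """'+' if the standardized (Pearson) residual exceeds the threshold,
    '-' if it is below -threshold, '' otherwise -- roughly a two-sided
    z-test per cell."""
    expected = expected_counts(table)
    flags = []
    for obs_row, exp_row in zip(table, expected):
        row_flags = []
        for obs, exp in zip(obs_row, exp_row):
            z = (obs - exp) / exp ** 0.5
            row_flags.append("+" if z > threshold
                             else "-" if z < -threshold
                             else "")
        flags.append(row_flags)
    return flags

# Toy 2x2 table: rows = two groups, columns = 0 vs. 1+ accommodations.
toy = [[90, 10],
       [50, 50]]
print(residual_flags(toy))  # [['+', '-'], ['-', '+']]
```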
Performance of students in validity gap 1 compared to non-gap 1

Subpopulation Mean Mathematics Scaled Scores* within Gap Group Designations
Group                  IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=2,070)        830.9+     829.7+     827.7+    833.2+
Non-gap 1 (n=2,129)    819.7-     819.1-     815.8-    829.3-
Comparison (n=9,429)   847.4      848.8      none      850.2
Overall Population     828.2      827.3      817.5     842.3
* Achievement level (AL) scale score ranges — below proficient: AL 1: 800-833, AL 2: 834-839; above proficient: AL 3: 840-851, AL 4: 852-880
+ Statistically higher than expected; - statistically lower than expected
Special program status of students in validity gap 1

Breakdown of Subpopulations within Gap 1 and Comparison Groups
Group                  IEP only   ELL only   IEP&ELL   General Ed
Gap 1 (n=2,070)        14.2%-     2.3%       0.1%      83.4%+
Non-gap 1 (n=2,129)    50.8%+     5.0%       0.9%      43.3%-
Comparison (n=9,429)   2.2%       0.5%       none      97.3%
Overall Population     15.1%      1.9%       0.2%      82.8%
+ Statistically higher than expected; - statistically lower than expected

• The majority of students in validity gap 1 were in general education.
• Students with IEPs were under-represented in validity gap 1 and over-represented in non-gap 1.
Disability designations in validity gap
Learning disabilities:
• Validity Gap 1: 57.7% of the IEP gap 1 group (n=208)
• Non-gap 1: 49.7% of the IEP non-gap 1 group (n=860)
• Comparison: 49.2% of the IEP comparison group (n=83)
• Total population: 52% of students with IEPs (N=4,465)
Disability designations only seen in non-gap 1:
• Students with learning impairments (MR), deafness, multiple disabilities, and traumatic brain injury
Additional characteristics of students in validity gap 1 compared to non-gap 1
Validity gap students:
• Were more likely female and white
• Had the fewest absences
• Had higher SES
• Found the state test about the same level of difficulty as class work
• Exhibited academic and mathematics-appropriate behaviors in class
Performance of students in relevance gap 2 on the test
By definition, students in both relevance gap 2 and non-gap 2 scored no better than chance on the assessment.
Special program status of students in relevance gap 2

Breakdown of Subpopulations within Gap 2 and Comparison Groups
Group                  IEP only   ELL only   IEP&ELL   General Ed
Gap 2 (n=185)          80.0%      6.5%       2.7%      10.8%-
Non-gap 2 (n=369)      69.4%      9.8%       1.6%      19.2%
Comparison (n=9,429)   2.2%       0.5%       none      97.3%
Overall Population     15.1%      1.9%       0.2%      82.8%
- Statistically lower than expected

The vast majority of students in relevance gap 2 and non-gap 2 were students with IEPs.
Disability designations in relevance gap
• Learning disabilities: fewer than half of the students in the relevance gap 2 groups had learning disabilities.
• Students who were deaf/blind and those with multiple disabilities were only found in gap 2.
• Students with hearing impairments, deafness, and traumatic brain injury were only found in non-gap 2.
Additional characteristics of students in relevance gap 2 compared to non-gap 2
• Students in relevance gap 2 were very similar to students in non-gap 2 on most variables.
• Students from both groups felt that the test was as hard as or harder than their schoolwork.
• They tried as hard on the test as they do in class, or harder.
• They used mathematics tools in the classroom (e.g., calculators).
Summary: How many students are in the gaps?
10.9% - 11.4% of the total student population in the two systems are in assessment gaps.
NECAP: Validity Gap 1 = 8.6%, Relevance Gap 2 = 2.3%
MEA: Validity Gap 1 = 7.1%, Relevance Gap 2 = 4.3%
Summary
We found substantial differences between the composition of the validity gap 1 and non-gap 1 groups, and these differences held in both the NECAP and MEA systems.
Validity gap 1 students may have characteristics and behaviors that mask their difficulties.
Non-gap 1 students are those generally thought to be in the “achievement gap”.
Summary (cont.)
Low performing students in relevance gap 2 and non-gap 2 share many characteristics.
Their extremely low performances in both classroom activities and the test raise issues about the relevancy of the general assessment for them.
Conclusions
For students in validity gap 1, increase focus on classroom supports and training on how to transfer their knowledge and skills from classroom to assessment environments.
For students in non-gap 1, examine expectations and opportunities to learn. Providing a different test based on modified academic achievement standards is premature.
Students with IEPs in relevance gap 2 and non-gap 2 may benefit from the 2% option for AYP and an alternate assessment based on modified academic achievement standards (AA-MAAS).
There will be challenges designing a test based on MAAS that is strictly aligned with grade level content.