AERA 2010 Robert L. Linn Lecture, May 1, 2010
Integrating Measurement and Sociocognitive Perspectives in Educational Assessment
Robert J. Mislevy, University of Maryland
Robert L. Linn Distinguished Address Sponsored by AERA Division D. Presented at the Annual Meeting of the American Educational Research Association, Denver, CO, May 1, 2010.
This work was supported by a grant from the Spencer Foundation.
Messick, 1994
[W]hat complex of knowledge, skills, or other attributes should be assessed... Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?
Snow & Lohman, 1989
Summary test scores, and factors based on them, have often been thought of as “signs” indicating the presence of underlying, latent traits. …
An alternative interpretation of test scores as samples of cognitive processes and contents … is equally justifiable and could be theoretically more useful.
Roadmap
• Rationale
• Model-based reasoning
• A sociocognitive perspective
• Assessment arguments
• Measurement models & concepts
• Why are these issues important?
• Conclusion
Rationale
An articulated way to think about assessment:
• Understand task & use situations in “emic” sociocognitive terms.
• Identify the shift into “etic” terms in task-level assessment arguments.
• Examine the synthesis of evidence across tasks in terms of model-based reasoning.
• Reconceive measurement concepts.
• Draw implications for assessment practice.
Model-Based Reasoning
[Figure: model-based reasoning, shown in three builds. Labels: Real-World Situation; Entities and relationships; Representational Forms A and B (y=ax+b; (y-b)/a=x); Mappings among representational systems; Reconceived Real-World Situation. The second build adds the labels Measurement concepts and Measurement models. The third build distinguishes entities and relationships in a lower-level model from reconceived entities and relationships in a higher-level model, with the labels Sociocognitive concepts, Measurement concepts, and Measurement models.]
A Sociocognitive Perspective
Some Foundations
• Themes from, e.g., cognitive psychology, linguistics, neuroscience, anthropology:
  » Connectionist metaphor, associative memory, complex systems (variation, stability, attractors)
• Situated cognition & information processing
  » E.g., Kintsch’s Construction-Integration (CI) theory of comprehension; diSessa’s “knowledge in pieces”
• Intrapersonal & extrapersonal patterns
Some Foundations
• Extrapersonal patterns (linguistic / cultural / substantive, abbreviated L/C/S):
  » Linguistic: grammar, conventions, constructions
  » Cultural models: what ‘being sick’ means, restaurant script, apology situations
  » Substantive: F=MA, genres, plumbing, etc.
• Intrapersonal resources:
  » Connectionist metaphor for learning
  » Patterns from experience at many levels
[Figure: Persons A and B interacting. What passes between A and B is observable; what is inside A and inside B is not observable.]
[Figure: A and B interacting within a context; “Inside A” and “Inside B” mark what is internal to each person.]
À la Kintsch: propositional content of text / speech … and internal and external aspects of context …
The C in CI theory is Construction: activation of both relevant and irrelevant bits from long-term memory (LTM) and past experience. All L/C/S levels are involved. Example: chemistry problems in German.
• If a pattern hasn’t been developed in past experience, it can’t be activated (although it may get constructed in the interaction).
• A relevant pattern from LTM may be activated in some contexts but not others (e.g., physics models).
The I in CI theory is Integration:
• Situation model: synthesis of coherent / reinforced activated L/C/S patterns
The situation model is also the basis of planning and action.
[Figure: A and B interacting; the label “Context” now appears at several points in the diagram.]
Ideally, activation of relevant and compatible intrapersonal patterns leads to (sufficiently) shared understanding; i.e., co-constructed meaning.
• Persons’ capabilities, situations, and performances are intertwined.
• Meaning is co-determined, through L/C/S patterns.
What can we say about individuals?
• Use of resources in appropriate contexts in appropriate ways; i.e., attunement to targeted L/C/S patterns:
  » Recognize markers of externally-viewed patterns?
  » Construct internal meanings in their light?
  » Act in ways appropriate to targeted L/C/S patterns?
• What are the range and circumstances of activation? (variation of performance across contexts)
Assessment Arguments
Messick, 1994
[W]hat complex of knowledge, skills, or other attributes should be assessed... Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?
Toulmin’s Argument Structure
[Figure: Toulmin’s argument structure. Data support a Claim (“so”), “since” a Warrant, which rests “on account of” Backing, “unless” an Alternative explanation holds.]
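The argument structure can be written down as a small data structure. The sketch below is only an illustration: the class name `ToulminArgument`, its field names, and the example content are assumptions introduced here, not something from the lecture.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToulminArgument:
    """One Toulmin-style argument: data support a claim, licensed by a warrant."""
    claim: str
    data: List[str]
    warrant: str
    backing: str = ""
    alternative_explanations: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Read the argument out using Toulmin's connectives."""
        parts = [
            "Data: " + "; ".join(self.data),
            "so Claim: " + self.claim,
            "since Warrant: " + self.warrant,
        ]
        if self.backing:
            parts.append("on account of Backing: " + self.backing)
        for alt in self.alternative_explanations:
            parts.append("unless Alternative explanation: " + alt)
        return "\n".join(parts)


# Hypothetical content, chosen only to show the shape of the structure.
example = ToulminArgument(
    claim="The student can solve two-step linear equations",
    data=["Solved 8 of 10 two-step equation tasks correctly"],
    warrant="Students who can do this tend to solve such tasks correctly",
    backing="Prior instructional experience with similar tasks",
    alternative_explanations=["The student memorized these particular items"],
)
print(example.render())
```

Keeping alternative explanations as a list reflects that an assessment argument usually has to address several of them at once.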
[Figure: Assessment argument in Toulmin form. The Student acting in assessment situation yields Data concerning student performance and Data concerning task situation; together with Other information concerning student vis-à-vis assessment situation, these support (“so”) a Claim about student, “since” Warrant concerning assessment, Warrant concerning task design, and Warrant concerning evaluation, “on account of” Backing concerning assessment situation, “unless” Alternative explanations.]
Callouts on the figure:
• Concerns features of (possibly evolving) context as seen from the view of the assessor; in particular, those seen as relevant to targets of inference.
• Evaluation of performance seeks evidence of attunement to features of targeted L/C/S patterns.
• Note the move from the emic to the etic! Choice in light of assessment purpose and conception of capabilities.
• Depends on contextual features implicitly, since evaluated in light of targeted L/C/S patterns.
• “Hidden” aspects of context, not in the test theory model but essential to the argument: What attunements to linguistic / cultural / substantive patterns can be presumed or arranged for among examinees, to condition inference regarding targeted L/C/S patterns?
• Fundamental to the situated meaning of student variables in measurement models; both critical and implicit.
[Figure: Unfolding situated performance over time. Macro features of performance and micro features of performance are set against macro features of the situation and micro features of the situation as it evolves.]
• Features of context arise over time as student acts / interacts.
• Features of performance evaluated in light of emerging context.
• Especially important in simulation, game, and extended performance contexts (e.g., Shute).
[Figure: The assessment argument, labeled Design Argument, linked to a Use Argument (Bachman). In the Use Argument, the Claim about student, Data concerning use situation, and Other information concerning student vis-à-vis use situation support a Claim about student in use situation, “since” Warrant concerning use situation, “on account of” Backing concerning use situation, “unless” Alternative explanations.]
• Claim about student is the output of the assessment argument and the input to the use argument.
• How it is cast depends on psychological perspective and intended use.
• When measurement models are used, the claim is an etic synthesis of evidence, expressed as values of student-model variable(s).
• Warrant for inference: increased likelihood of activation in the use situation if it was activated in task situations.
• What features do tasks and use situations share?
  » Implicit in trait arguments
  » Explicit in sociocognitive arguments
• Empirical question: degrees of stability, ranges and conditions of variability (Chalhoub-Deville)
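One way to read the warrant for inference is as a probability statement. The notation is introduced here for illustration only: A_task and A_use stand for "the targeted pattern is activated" in the task situation and in the use situation, respectively.

```latex
P(A_{\text{use}} \mid A_{\text{task}}) > P(A_{\text{use}})
```

How large the gap is, and under what conditions it holds, is the empirical question of stability and variability noted on the slide.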
What features do tasks and use situations not have in common?
• Use situation features call for other L/C/S patterns that weren’t in the task and may or may not be in the examinee’s resources.
• Target patterns activated in task but not use context.
• Target patterns activated in use but not task context.
Issues of validity & generalizability, e.g., “method factors”
• Knowing about the relation of target examinees and use situations strengthens inferences.
• “Bias for the best” (Swain, 1985)
Multiple Tasks
[Figure: A single Claim about student supported by n task-level arguments, each with its own performance data (Dp1 … Dpn), situation data (Ds1 … Dsn), other information (OI1 … OIn), and elements labeled A1 … An.]
• Synthesize evidence from multiple tasks, in terms of proficiency variables in a measurement model.
• Snow & Lohman’s sampling.
• What accumulates? L/C/S patterns, but with variation.
• What is similar from the analyst’s perspective need not be from the examinee’s.
Measurement Models & Concepts
• AS IF: tendencies for certain kinds of performance in certain kinds of situations are expressed as student-model variables θ.
• Probability models for individual performances (X), modeled as probabilistic functions of θ; these capture variability.
• Probability models permit sophisticated reasoning about evidentiary relationships in complex and subtle situations,
• BUT they are models, with all the limitations implied!
Measurement Models & Concepts
• Xs result from particular persons calling upon resources in particular contexts (or not, or how).
• Mechanically, θs simply accumulate information across situations.
• Our choosing situations and what to observe drives their situated meaning.
• The situated meaning of θs is tendencies toward these actions in these situations that call for certain interactional resources, via L/C/S patterns.
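The idea that the student-model variable mechanically accumulates information across situations can be sketched as Bayesian updating on a grid. Everything specific below is an assumption made for illustration: the Rasch-type response function, the flat prior, the grid of values, and the three hypothetical observations.

```python
import math
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """P(X = 1 | theta, b) under a Rasch-type model (assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def update_posterior(grid: Sequence[float], prior: Sequence[float],
                     response: int, b: float) -> List[float]:
    """One Bayesian update of a discrete distribution over theta."""
    likelihood = [
        rasch_prob(t, b) if response == 1 else 1.0 - rasch_prob(t, b)
        for t in grid
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_mean(grid: Sequence[float], probs: Sequence[float]) -> float:
    return sum(t * p for t, p in zip(grid, probs))


grid = [-3.0 + 0.1 * k for k in range(61)]      # theta values from -3 to 3
posterior = [1.0 / len(grid)] * len(grid)       # flat prior (an assumption)
observations = [(1, -0.5), (1, 0.0), (0, 1.0)]  # hypothetical (response, difficulty)

means = [posterior_mean(grid, posterior)]
for response, b in observations:
    posterior = update_posterior(grid, posterior, response, b)
    means.append(posterior_mean(grid, posterior))
print(means)
```

Each observation only reweights the distribution; what the resulting value means still depends on which situations were chosen and what was observed in them.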
Classical Test Theory
• Probability model: “true score” = stability along implied dimension, “error” = variation
• Situated meaning from task features & evaluation
• Can organize around traits, task features, or both, depending on task sets and performance features.
• Profile differences unaddressed
[Figure: The multiple-task argument structure, with the claim expressed as a true score τ and the evidence summarized in an observed score X.]
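The true-score-plus-error decomposition can be illustrated with a small simulation. This is a sketch under assumed values: normally distributed true scores and errors, a fixed seed, and standard deviations picked only for the example.

```python
import random
import statistics
from typing import List, Tuple


def reliability(true_var: float, error_var: float) -> float:
    """Share of observed-score variance attributable to true scores."""
    return true_var / (true_var + error_var)


def simulate_scores(n_persons: int, true_sd: float, error_sd: float,
                    seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw true scores, then observed scores X = true score + error."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n_persons)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed


true_scores, observed = simulate_scores(n_persons=5000, true_sd=1.0, error_sd=0.5)
empirical = statistics.pvariance(true_scores) / statistics.pvariance(observed)
theoretical = reliability(1.0 ** 2, 0.5 ** 2)
print(theoretical, empirical)
```

The single error term lumps together every source of variation across tasks, which is one way to see why profile differences go unaddressed in this model.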
Item Response Theory
• θ = propensity to act in targeted way, b_j = typical evocation, IRT function = typical variation
• Situated meaning from task features & evaluation
• Task features still implicit
• Profile differences / misfit highlight where the narrative doesn’t fit, for sociocognitive reasons
[Figure: The multiple-task argument structure, with a single θ underlying the responses X1, X2, …, Xn.]
• Complex systems concepts: attractors & stability as regularities in response patterns, quantified in parameters; typical variation as the probability model
• Will work best when most nontargeted L/C/S patterns are familiar … item-parameter invariance vs. population dependence (Tatsuoka, Linn, Tatsuoka, & Yamamoto, 1988)
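A short sketch shows how misfit can flag a response pattern that the model's narrative does not fit. The slide names only θ and b_j, so the Rasch form is an assumption here, as are the item difficulties, the value of θ, and the two response patterns. The fit index is the usual mean of squared standardized residuals.

```python
import math
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """P(X_j = 1 | theta, b_j) under a Rasch model (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residuals(theta: float, difficulties: Sequence[float],
                           responses: Sequence[int]) -> List[float]:
    """(observed - expected) / standard deviation, item by item."""
    residuals = []
    for b, x in zip(difficulties, responses):
        p = rasch_prob(theta, b)
        residuals.append((x - p) / math.sqrt(p * (1.0 - p)))
    return residuals


def outfit_mean_square(theta: float, difficulties: Sequence[float],
                       responses: Sequence[int]) -> float:
    """Mean of squared standardized residuals; values well above 1 signal misfit."""
    z = standardized_residuals(theta, difficulties, responses)
    return sum(r * r for r in z) / len(z)


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical b_j values
theta = 0.5                                 # treated as known for illustration
expected_pattern = [1, 1, 1, 0, 0]          # succeeds on the easier items
surprising_pattern = [0, 0, 1, 1, 1]        # fails easy items, passes hard ones

print(outfit_mean_square(theta, difficulties, expected_pattern))
print(outfit_mean_square(theta, difficulties, surprising_pattern))
```

A large value does not say why the pattern is surprising; that explanation has to come from the sociocognitive side, for example an unfamiliar nontargeted pattern in the easy items.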
Multivariate Item Response Theory (MIRT)
• θs = propensities to act in targeted ways in situations with different mixes of L/C/S demands.
• Good for controlled mixes of situations
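As an illustration of different mixes of demands, the sketch below uses a compensatory logistic form, which is one common MIRT model and an assumption here; the slide does not specify a functional form. The loadings, intercepts, and student values are hypothetical.

```python
import math
from typing import Sequence


def mirt_prob(thetas: Sequence[float], loadings: Sequence[float],
              intercept: float) -> float:
    """Compensatory MIRT: logistic function of a weighted sum of proficiencies."""
    if len(thetas) != len(loadings):
        raise ValueError("thetas and loadings must have the same length")
    z = sum(a * t for a, t in zip(loadings, thetas)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


student = [1.0, -1.0]            # strong on dimension 1, weak on dimension 2
task_mostly_first = [1.5, 0.2]   # task that draws mainly on dimension 1
task_mostly_second = [0.2, 1.5]  # task that draws mainly on dimension 2

p_first = mirt_prob(student, task_mostly_first, 0.0)
p_second = mirt_prob(student, task_mostly_second, 0.0)
print(p_first, p_second)
```

The same student looks quite different on the two tasks, which is why the mix of demands across tasks needs to be controlled for the θs to be interpretable.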
Structured Item Response Theory
• Explicitly model task situations in terms of L/C/S demands. Links task design (TD) with the sociocognitive view.
• Work explicitly with features in controlled and evolved situations (design / agents)
• Can use with MIRT; cognitive diagnosis models
[Figure: The multiple-task argument structure with θ underlying X1, X2, …, Xn, with additional labels q1, v_i1, q2, v_i2, …, qn, v_in attached to the tasks.]
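One well-known way to model task situations explicitly is the linear logistic test model (LLTM), in which item difficulty is a weighted sum of coded task features. The slide does not name a specific model, so LLTM is used here as an assumed example; the feature codes and weights are hypothetical.

```python
import math
from typing import Sequence


def lltm_difficulty(features: Sequence[float], weights: Sequence[float]) -> float:
    """Item difficulty as a weighted sum of task-feature codes."""
    return sum(q * eta for q, eta in zip(features, weights))


def lltm_prob(theta: float, features: Sequence[float],
              weights: Sequence[float]) -> float:
    """Rasch-type response probability with feature-based difficulty."""
    b = lltm_difficulty(features, weights)
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical weights: how much each task feature adds to difficulty,
# e.g. unfamiliar vocabulary, multiple steps, abstract representation.
weights = [0.8, 0.5, 1.2]
easy_task = [0, 1, 0]  # only the multi-step feature is present
hard_task = [1, 1, 1]  # all three features are present

print(lltm_difficulty(easy_task, weights), lltm_difficulty(hard_task, weights))
print(lltm_prob(0.5, easy_task, weights), lltm_prob(0.5, hard_task, weights))
```

Coding the features forces the task designer to state which demands a task is believed to carry, which is where the link to design comes from.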
Mixtures of IRT Models
• Different IRT models for different unobserved groups of people
• Modeling different attractor states
• Can be theory driven or discovered in data
[Figure: Two alternative versions of the multiple-task argument structure, each with its own θ underlying X1, X2, …, Xn, joined by OR.]
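A minimal sketch of the mixture idea: given a response pattern, compute the posterior probability that the person belongs to each unobserved group, where each group has its own item difficulties. This simplifies a real mixture IRT analysis by treating θ within each group as known. The Rasch form, the two groups, and all numbers are assumptions for illustration.

```python
import math
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_likelihood(theta: float, difficulties: Sequence[float],
                       responses: Sequence[int]) -> float:
    """Probability of a whole response pattern, assuming local independence."""
    like = 1.0
    for b, x in zip(difficulties, responses):
        p = rasch_prob(theta, b)
        like *= p if x == 1 else 1.0 - p
    return like


def class_posteriors(responses: Sequence[int],
                     class_difficulties: Sequence[Sequence[float]],
                     class_weights: Sequence[float],
                     class_thetas: Sequence[float]) -> List[float]:
    """Posterior probability of each latent class given the responses."""
    joint = [
        w * pattern_likelihood(t, bs, responses)
        for w, bs, t in zip(class_weights, class_difficulties, class_thetas)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical groups: the first finds items 1-2 easy and 3-4 hard,
# the second finds the reverse.
class_difficulties = [[-1.0, -1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]
class_weights = [0.5, 0.5]
class_thetas = [0.0, 0.0]

print(class_posteriors([1, 1, 0, 0], class_difficulties, class_weights, class_thetas))
print(class_posteriors([1, 0, 1, 0], class_difficulties, class_weights, class_thetas))
```

The first pattern points strongly to the first group, while the second pattern is equally likely under both, so it carries no information about group membership.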
Measurement Concepts
• Validity
  » Soundness of model for local inferences
  » Breadth of scope is an empirical question
  » Construct representation in L/C/S terms
  » Construct-irrelevant sources of variation in L/C/S terms
• Reliability
  » Through model, strength of evidence for inferences about tendencies, given variabilities … or about characterizations of variability.
Measurement Concepts
• Method Effects
  » What accumulates in terms of L/C/S patterns in assessment situations but not use situations
• Generalizability Theory (Cronbach)
  » Watershed in emphasizing evidentiary reasoning rather than simply measurement
  » Focus on external features of context; can be recast in L/C/S terms, & attend to correlates of variability
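For a simple persons-by-tasks design, the generalizability coefficient shows how evidence strengthens as more tasks are sampled. The sketch assumes a single-facet crossed design, and the variance components are hypothetical values rather than estimates from any data.

```python
def generalizability_coefficient(var_person: float, var_residual: float,
                                 n_tasks: int) -> float:
    """Relative G coefficient for a crossed persons-by-tasks design.

    var_person: variance component for persons.
    var_residual: person-by-task interaction confounded with error.
    n_tasks: number of tasks averaged over.
    """
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    relative_error = var_residual / n_tasks
    return var_person / (var_person + relative_error)


# Hypothetical variance components.
for n in (1, 3, 9):
    print(n, generalizability_coefficient(0.25, 0.75, n))
```

The person-by-task component is where variation in which L/C/S patterns a task evokes for a given person would show up, so a large value is itself informative.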
Why are these issues important?
• Connect assessment/measurement with current psychological research
  » Connect assessment with learning
• Appropriate constraints on interpreting large-scale assessments
• Inference in complex assessments
  » Games, simulations, performances
  » Assessment modifications & accommodations
  » Individualized yet comparable assessments
Conclusion
• Communication at the interface
• We have work we need to do, together.