Michigan Assessment Consortium
Common Assessment Development Series
Establishing Validity
Developed by
Bruce R. Fay, PhD, Wayne RESA
James Gullen, PhD, Oakland Schools
Support
The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with …
What You Will Learn
Validity: what it is, what it isn’t, why it’s important
Types/Sources of Evidence for Validity
Validity: The Old(er) View
Validity is the degree to which a test measures what it is intended to measure.
This view suggests that validity is a property of a test.
Validity: The New(er) View
Is not a property of a test
Relates to the meaningful use of results
Key question: Is it appropriate to use the results of this test to make the decision(s) we are trying to make?
Validity & Proposed Use
“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of the tests.”
(AERA, APA, & NCME, 1999, p. 9)
Validity as Evaluation
“Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.”
(Messick, 1989, p. 13)
Meaning in Context
Validity is contextual; it does not exist in a vacuum
Validity is not an all-or-nothing thing; it has to do with the degree to which test results can be meaningfully interpreted and correctly used with respect to a question to be answered or a decision to be made
Prerequisites to Validity
Certain things have to be in place before validity can be addressed
Reliability
A property of the test
Statistical in nature
“Consistency” or repeatability
The test actually measures something
Fairness
Fairness means freedom from bias with respect to:
  Content
  Item construction
  Test administration/environment
  Anything else that would cause differential performance based on factors other than knowledge/ability with respect to the subject matter
The Natural Order of Things
Reliability precedes Fairness precedes Validity.
Only if a test is reliable can you then determine if it is fair, and only if it is fair can you then make any defensible use of the results.
However, having a reliable, fair test does not guarantee valid use.
Validity Recap
Not a property of the test
Not essentially statistical
Interpretation of results
Establishing meaning in context
Requires professional judgment
Types/Sources of Validity
Internal Validity
  Face
  Content
  Response
  Criterion (int)
  Construct
External Validity
  Criterion (ext)
    Concurrent
    Predictive
  Consequential
Internal Validity
Practical
  Content
  Response
  Criterion (int)
Not so much
  Face
  Construct
External Validity
Criterion (ext)
  Usually statistical (measures of association or correlation)
  Requires the existence of other tests or points of quantitative comparison
  May require a “known good” assumption
Consequential
  Relates directly to the “correctness” of decisions based on results
  Usually established over multiple cases and time
To Validate or Not to Validate…
…is that the question?
Decision-making without data…
is just guessing.
But use of improperly validated data…
…leads to confidently arriving at potentially false conclusions.
Practical Realities
Although validity is not a statistical property of a test, both quantitative and qualitative methods are used to establish evidence for the validity of any particular use
Many of these methods are beyond the scope of what most schools/districts can do for themselves…but there are things you can do
Clear Purpose
Be clear and explicit about the intended purpose for which a test is developed and how the results are to be used
Documented Process
Implementing the process outlined in this training, with fidelity, will provide a big step in this direction, especially if you document what you are doing
First Internal, Then External
Focus first on Internal Validity
  Content
  Response
  Criterion
Focus next on External Validity
  Concurrent
  Predictive
  Consequential
Content & Criterion Evidence
Using test blueprints to design and explicitly document the relationship (alignment and coverage) of the items on a test to content standards
Specifying appropriate numbers, types, and levels of items for the content to be assessed
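One way to make the blueprint check concrete is a small script that compares the items on a draft test against the blueprint's targets. The sketch below is illustrative only: the standard codes, target counts, and the names `blueprint`, `items`, and `coverage_report` are hypothetical, and a real blueprint would usually also track item type and cognitive level.

```python
from collections import Counter

# Hypothetical blueprint: target number of items per content standard.
blueprint = {"N.1": 2, "N.2": 2, "A.1": 2}

# Hypothetical draft test: (item id, standard the item is tagged to).
items = [(1, "N.1"), (2, "N.1"), (3, "N.2"), (4, "A.1"), (5, "A.1"), (6, "G.9")]


def coverage_report(blueprint, items):
    """Compare item counts per standard against the blueprint targets."""
    actual = Counter(standard for _, standard in items)
    report = {}
    for standard, target in blueprint.items():
        count = actual.get(standard, 0)
        # A positive gap means the draft is short of the target.
        report[standard] = {"target": target, "actual": count, "gap": target - count}
    # Items tagged to a standard that the blueprint does not list.
    unaligned = sorted(item_id for item_id, standard in items if standard not in blueprint)
    return report, unaligned


report, unaligned = coverage_report(blueprint, items)
for standard, row in report.items():
    print(standard, row)
print("Items not aligned to the blueprint:", unaligned)
```

Keeping the output of a check like this alongside the blueprint is one simple way to document alignment and coverage.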
More on Content & Criterion Validity
Have test items written and reviewed by people with content/assessment expertise
Use a defined process such as the one described in this series
Review for bias and other criteria
Create rubrics, scoring guides, or answer keys as needed
Check everything for accuracy
It’s Not Just the Items…
Establish/document administration procedures
Determine how the results will be reported and to whom. Develop draft reporting formats.
Field Testing and Scoring
Field test your assessment
Evaluate the test administration
For open-ended items, train scorers and check that scoring is consistent (establish inter-rater reliability if possible)
Create annotated scoring guides using actual (anonymous) student papers as exemplars
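Scoring consistency on open-ended items can be summarized with an agreement statistic. The sketch below uses Cohen's kappa, which is one common choice and not one the series itself prescribes; the rubric scores and the function name `cohens_kappa` are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")
    n = len(rater_a)
    # Proportion of papers on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave every paper the same single score
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (0-3) from two scorers on the same eight papers.
scorer_1 = [0, 1, 2, 2, 3, 1, 0, 2]
scorer_2 = [0, 1, 2, 1, 3, 1, 0, 3]
print(round(cohens_kappa(scorer_1, scorer_2), 3))
```

Kappa treats every disagreement as equally serious. For rubric scores, where a one-point difference matters less than a three-point difference, a weighted variant may be more appropriate.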
Field Test Results Analysis
Analyze the field test results for reliability, bias, and response patterns
Make adjustments based on this analysis
Report results to field testers and evaluate their ability to interpret the data and make correct inferences/decisions
Repeat the field testing if needed
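As a rough illustration of the reliability and response-pattern part of this analysis, the sketch below computes item difficulty (proportion correct) and Cronbach's alpha from a small matrix of right/wrong scores. These two statistics are assumed here as typical choices; the slides do not name specific ones. The data and function names are hypothetical, and a bias review would need additional information (such as student group membership) that is not modeled.

```python
from statistics import pvariance


def item_difficulty(scores):
    """Proportion of students answering each item correctly (0/1 scoring)."""
    n_students = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_students for i in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha, an internal-consistency estimate of reliability."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("Alpha needs at least two items.")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("All students have the same total score.")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical field test: rows are students, columns are items (1 = correct).
field_test = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print("Item difficulty:", item_difficulty(field_test))
print("Cronbach's alpha:", round(cronbach_alpha(field_test), 2))
```

Items that nearly everyone gets right or wrong, or a low alpha, are prompts to revisit specific items, not automatic verdicts; interpreting them still requires professional judgment.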
How Good is Good Enough?
Establish your initial performance standards in light of your field test data, and adjust if needed
Consider external validity by “comparing” pilot results to results from other “known good” tests or data points
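One simple way to “compare” pilot results with another measure is a correlation between each student's pilot score and the same student's score on the reference test. The sketch below computes a Pearson correlation by hand; the scores are invented, and treating the reference test as “known good” is itself an assumption that needs to be justified.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need at least two paired scores.")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Scores on one of the measures do not vary.")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired scores for the same six students.
pilot_scores = [12, 15, 9, 20, 17, 11]
reference_scores = [48, 55, 40, 70, 62, 45]
print(round(pearson_r(pilot_scores, reference_scores), 3))
```

A strong correlation is evidence for the proposed use, not proof of it, and with only a handful of students the estimate is very unstable.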
Ready, Set, Go! (?)
When the test “goes live”:
  Take steps to ensure that it is administered properly
  Monitor and document this
  Note any anomalies
Behind the Scenes
Ensure that tests are scored accurately.
Pay particular attention to the scoring of open-ended items.
Use a process that allows you to check on inter-rater reliability, at least on a sample basis
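Checking on a sample basis can be as simple as drawing a random subset of papers for a second, independent scoring and then looking at how often the two scores agree. The sketch below assumes a 10% sample and reports exact and adjacent (within one point) agreement; the fraction, the paper IDs, and the function names are all hypothetical choices.

```python
import random


def draw_rescoring_sample(paper_ids, fraction=0.1, seed=0):
    """Pick a random subset of papers to be scored a second time."""
    ids = list(paper_ids)
    rng = random.Random(seed)  # fixed seed so the draw can be documented and repeated
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))


def agreement_rates(first_scores, second_scores):
    """Exact and adjacent agreement on the papers that were rescored."""
    if not second_scores:
        raise ValueError("No rescored papers to compare.")
    diffs = [abs(first_scores[pid] - second_scores[pid]) for pid in second_scores]
    exact = sum(d == 0 for d in diffs) / len(diffs)
    adjacent = sum(d <= 1 for d in diffs) / len(diffs)
    return exact, adjacent


# Hypothetical use: 100 papers, 10% pulled for a second scoring.
sample_ids = draw_rescoring_sample(range(1, 101), fraction=0.1, seed=1)
print("Papers to rescore:", sample_ids)

# Hypothetical scores: paper id -> rubric score from each scoring pass.
first_pass = {1: 3, 2: 2, 3: 4, 4: 1}
second_pass = {1: 3, 2: 3, 4: 3}
print(agreement_rates(first_pass, second_pass))
```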
Making Meaning
Ensure that test results are reported:
  Using previously developed formats
  To the correct users
  In a timely fashion
Follow up on whether the users can/do make meaningful use of the results