Michigan Assessment Consortium

Common Assessment Development Series

Establishing Validity

Developed by

Bruce R. Fay, PhD, Wayne RESA

James Gullen, PhD, Oakland Schools

Support

The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with …

What You Will Learn

Validity: what it is, what it isn’t, why it’s important

Types/Sources of Evidence for Validity

Validity: The Old(er) View

Validity is the degree to which a test measures what it is intended to measure.

This view suggests that validity is a property of a test.

Validity: The New(er) View

Is not a property of a test

Relates to the meaningful use of results

Key question: Is it appropriate to use the results of this test to make the decision(s) we are trying to make?

Validity & Proposed Use

“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of the tests.”

(AERA, APA, & NCME, 1999, p. 9)

Validity as Evaluation

“Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.”

(Messick, 1989, p. 13)

Meaning in Context

Validity is contextual; it does not exist in a vacuum.

Validity is not all-or-nothing; it has to do with the degree to which test results can be meaningfully interpreted and correctly used with respect to a question to be answered or a decision to be made.

Prerequisites to Validity

Certain things have to be in place before validity can be addressed

Reliability

A property of the test
Statistical in nature
“Consistency” or repeatability
The test actually measures something
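Because reliability is statistical in nature, it can be estimated from item-level scores. The sketch below computes Cronbach's alpha, one common internal-consistency estimate. The slides do not name a specific statistic, so the choice of alpha, the function name cronbach_alpha, and the sample scores are all assumptions made for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate internal consistency from a students-by-items score table.

    scores: list of rows, one row per student, one score per item.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item's scores across students
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of each student's total score
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical field-test data: 5 students, 4 items scored 0/1
sample_scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(sample_scores), 3))
```

Values closer to 1 indicate more consistent items; what counts as acceptable depends on the stakes of the decision and is a matter of professional judgment.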

Fairness means freedom from bias with respect to:

Content
Item construction
Test administration/environment
Anything else that would cause differential performance based on factors other than knowledge/ability with respect to the subject matter

The Natural Order of Things

Reliability precedes Fairness precedes Validity.

Only if a test is reliable can you then determine if it is fair, and only if it is fair can you then make any defensible use of the results.

However, having a reliable, fair test does not guarantee valid use.

Validity Recap

Not a property of the test
Not essentially statistical
Interpretation of results
Establishing meaning in context
Requires professional judgment

Types/Sources of Validity

Internal Validity
  Face
  Content
  Response
  Criterion (int)
  Construct

External Validity
  Criterion (ext)
    Concurrent
    Predictive
  Consequential

Internal Validity

Practical
  Content
  Response
  Criterion (int)

Not so much
  Face
  Construct

External Validity

Criterion (ext)
  Usually statistical (measures of association or correlation)
  Requires the existence of other tests or points of quantitative comparison
  May require a “known good” assumption

Consequential
  Relates directly to the “correctness” of decisions based on results
  Usually established over multiple cases and time

To Validate or Not to Validate…

…is that the question?

Decision-making without data…

…is just guessing.

But use of improperly validated data…

…leads to confidently arriving at potentially false conclusions.

Practical Realities

Although validity is not a statistical property of a test, both quantitative and qualitative methods are used to establish evidence for the validity of any particular use

Many of these methods are beyond the scope of what most schools/districts can do for themselves…but there are things you can do

Clear Purpose

Be clear and explicit about the intended purpose for which a test is developed and how the results are to be used

Documented Process

Implementing the process outlined in this training, with fidelity, will provide a big step in this direction, especially if you document what you are doing

First Internal, Then External

Focus first on Internal Validity
  Content
  Response
  Criterion

Focus next on External Validity
  Concurrent
  Predictive
  Consequential

Content & Criterion Evidence

Using test blueprints to design and explicitly document the relationship (alignment and coverage) of the items on a test to content standards

Specifying appropriate numbers, types, and levels of items for the content to be assessed
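One way to document alignment and coverage is to record the blueprint as data and compare it against the items actually written. The sketch below is a minimal illustration under stated assumptions: the standard codes, the blueprint and items structures, and the function name coverage_gaps are all hypothetical, and a real blueprint would usually also track item type and cognitive level.

```python
from collections import Counter

# Hypothetical blueprint: content standard -> planned number of items
blueprint = {"STD.1": 2, "STD.2": 2, "STD.3": 1}

# Hypothetical draft test: each item is tagged with the standard it targets
items = [
    {"id": "Q1", "standard": "STD.1"},
    {"id": "Q2", "standard": "STD.1"},
    {"id": "Q3", "standard": "STD.2"},
    {"id": "Q4", "standard": "STD.9"},  # not in the blueprint
]


def coverage_gaps(blueprint, items):
    """Return standards whose actual item count differs from the plan."""
    actual = Counter(item["standard"] for item in items)
    gaps = {}
    # Standards in the blueprint that are over- or under-covered
    for standard, planned in blueprint.items():
        if actual[standard] != planned:
            gaps[standard] = {"planned": planned, "actual": actual[standard]}
    # Items aligned to standards the blueprint never asked for
    for standard, count in actual.items():
        if standard not in blueprint:
            gaps[standard] = {"planned": 0, "actual": count}
    return gaps


for standard, gap in sorted(coverage_gaps(blueprint, items).items()):
    print(standard, gap)
```

An empty result means the draft matches the blueprint's planned counts; it says nothing about item quality, which still requires expert review.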

More on Content & Criterion Validity

Have test items written and reviewed by people with content/assessment expertise
Use a defined process such as the one described in this series
Review for bias and other criteria
Create rubrics, scoring guides, or answer keys as needed
Check everything for accuracy

It’s Not Just the Items…

Establish/document administration procedures

Determine how the results will be reported and to whom. Develop draft reporting formats.

Field Testing and Scoring

Field test your assessment
Evaluate the test administration
For open-ended items, train scorers and check that scoring is consistent (establish inter-rater reliability if possible)
Create annotated scoring guides using actual (anonymous) student papers as exemplars
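Scoring consistency on open-ended items can be checked by having two trained scorers rate the same set of papers. The sketch below computes Cohen's kappa, a common chance-corrected agreement statistic. The slides do not prescribe a particular statistic, so kappa, the function name cohen_kappa, and the rubric scores shown are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same papers."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(rater_a)
    # Proportion of papers where the two raters gave the same score
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score distribution
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one score")
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (0-3) from two scorers on eight papers
scorer_1 = [3, 2, 2, 1, 0, 3, 1, 2]
scorer_2 = [3, 2, 1, 1, 0, 3, 2, 2]
print(round(cohen_kappa(scorer_1, scorer_2), 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance. Low values suggest the rubric or scorer training needs another pass before results are used.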

Field Test Results Analysis

Analyze the field test results for reliability, bias, and response patterns
Make adjustments based on this analysis
Report results to field testers and evaluate their ability to interpret the data and make correct inferences/decisions
Repeat the field testing if needed
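A simple look at response patterns is a classical item analysis of the field test data. The sketch below reports, for each item, its difficulty (proportion correct) and an upper-lower discrimination index. The function name item_analysis, the 27% grouping convention, and the sample responses are assumptions for illustration and are not taken from the training itself.

```python
def item_analysis(responses, group_fraction=0.27):
    """Per-item difficulty and upper-lower discrimination.

    responses: list of rows, one per student; entries are 1 (correct) or 0.
    group_fraction: share of students placed in the upper and lower groups.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for i in range(n_items):
        difficulty = sum(row[i] for row in responses) / n_students
        p_upper = sum(row[i] for row in upper) / k
        p_lower = sum(row[i] for row in lower) / k
        results.append({
            "item": i + 1,
            "difficulty": difficulty,
            "discrimination": p_upper - p_lower,
        })
    return results


# Hypothetical field-test responses: 6 students, 3 items
field_test = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in item_analysis(field_test):
    print(row)
```

Items that nearly everyone gets right or wrong, or whose discrimination is near zero or negative, are candidates for review; the analysis flags them but does not decide what to change.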

How Good is Good Enough?

Establish your initial performance standards in light of your field test data, and adjust if needed

Consider external validity by “comparing” pilot results to results from other “known good” tests or data points
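One way to make this comparison concrete is to correlate each student's pilot score with the same student's score on the “known good” measure. The sketch below computes a Pearson correlation by hand; the function name pearson_r and the paired scores are hypothetical, and the slides do not specify which measure of association to use.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for the same six students
pilot_scores = [12, 15, 9, 18, 14, 7]
known_good_scores = [55, 62, 48, 71, 60, 41]
print(round(pearson_r(pilot_scores, known_good_scores), 3))
```

A strong positive correlation is evidence for the intended use, not proof of it, and it is only as trustworthy as the “known good” assumption behind the comparison measure.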

Ready, Set, Go! (?)

When the test “goes live”:
  Take steps to ensure that it is administered properly
  Monitor and document this
  Note any anomalies

Behind the Scenes

Ensure that tests are scored accurately.

Pay particular attention to the scoring of open-ended items.

Use a process that allows you to check on inter-rater reliability, at least on a sample basis
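Checking on a sample basis means choosing some papers to be scored independently by a second scorer. The sketch below draws a reproducible random sample of paper IDs for double scoring. The function name pick_double_score_sample, the 10% default, and the ID format are assumptions for illustration; the appropriate sample size is a local decision.

```python
import random


def pick_double_score_sample(paper_ids, fraction=0.1, seed=None):
    """Randomly choose papers to be scored by a second scorer.

    paper_ids: anonymous identifiers for the scored papers.
    fraction: share of papers to double-score (at least one is chosen).
    seed: fix this value to make the selection reproducible and auditable.
    """
    paper_ids = list(paper_ids)
    if not paper_ids:
        raise ValueError("no papers to sample from")
    rng = random.Random(seed)
    k = min(len(paper_ids), max(1, round(len(paper_ids) * fraction)))
    return sorted(rng.sample(paper_ids, k))


# Hypothetical anonymous paper IDs
papers = [f"P{n:03d}" for n in range(1, 41)]
print(pick_double_score_sample(papers, fraction=0.1, seed=2024))
```

Recording the seed alongside the selected IDs documents how the sample was drawn, which supports the documented-process goal of this series.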

Making Meaning

Ensure that test results are reported:
  Using previously developed formats
  To the correct users
  In a timely fashion

Follow up on whether the users can/do make meaningful use of the results
