Validation of Score Meaning and Justification of a Score Use: A Comprehensive Model of Defensible Testing Practice Triangle Assessment Networking Group 22 February 2017 Gregory J. Cizek Guy B. Phillips Distinguished Professor of Educational Measurement and Evaluation School of Education, University of North Carolina-Chapel Hill


TRANSCRIPT

Page 1: Validation of Score Meaning and Justification of a Score Use: A Comprehensive Model of Defensible Testing Practice

Validation of Score Meaning and Justification of a Score Use:

A Comprehensive Model of Defensible Testing Practice

Triangle Assessment Networking Group
22 February 2017

Gregory J. Cizek
Guy B. Phillips Distinguished Professor of Educational Measurement and Evaluation
School of Education, University of North Carolina-Chapel Hill



Overview

I. Some Foundations
II. A Major Flaw in Current Validity Theory
III. A Revised Framework
IV. Questions, Comments...


Some Foundations

* Measurement instruments are integral to training, practice, development and research in the social sciences.


* The quality of the data yielded by these instruments is the focus of measurement specialists and an essential concern of test users and consumers of test information.


Some Foundations (cont’d)

Professional standards applicable to these instruments have existed for over 50 years:

* Technical Recommendations for Psychological Tests and Diagnostic Techniques (APA, 1954)

* Standards for Educational and Psychological Testing (AERA, APA, NCME, 2014)


Validity is important.

“one of the major deities in the pantheon of the psychometrician”

(Ebel, 1961, p. 640)


“the most fundamental consideration in developing tests and evaluating tests” (AERA, APA, NCME, 2014, p. 11)


Validity has problems.

“For a concept that is the foundation of virtually all aspects of our measurement work, it seems that the term validity continues to be one of the most misunderstood or widely misused of all.”

(Frisbie, 2005, p. 21)


The Canon

1. Validity pertains to score inferences.

“I don’t want to know which questions you answered correctly. I want to know how much....you know. I need to leap from what I know and don’t want to what I want but can’t know. That’s called inference.”

(Wright, 1994)


The Canon (cont’d)

2. Validity is not a characteristic of a test.

“One validates, not a test, but an interpretation of data arising from a specified procedure.”

(Cronbach, 1971, p. 447)


The Canon (cont’d)

3. Validity is a unitary concept; all validity is construct validity.

“What is singular in the unified theory is the kind of validity: All validity is of one kind, namely, construct validity.” (Messick, 1998, p. 37)


The Canon (cont’d)

4. Validity is a matter of degree.

“Validity statements are not dichotomous (valid/invalid) but rather are described on a continuum.”

(Zumbo, 2007, p. 50)


The Canon (cont’d)

5. Validation involves gathering and evaluating evidence bearing on intended test score inferences.

“Validity is an integrated, or unified, evaluation of the [score] interpretation.”

(Kane, 2001, p. 329)


The Canon (cont’d)

6. Validation is an on-going endeavor.

“Every treatise on the topic [indicates that] construct validation is a never-ending process.”

(Shepard, 1993, p. 407)


The Canon (cont’d)

Five Sources of Validity Evidence: Evidence based on...

1) test content
2) response process
3) internal structure
4) relations to other variables
5) consequences of testing


A Major Flaw

“Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores....”

(Messick, 1989, p. 13, emphasis added)


Two Problems:


1. conflates validity (a property) with validation (a process)

2. requires integration of that which cannot be integrated


Is it even possible?….

* “Validity for a specific test use” is nonsensical:

Once evidence bearing on the intended score interpretation has been gathered, the validity—that is, the meaning of scores—does not change depending on the consequences of test use, or whenever a new use is envisioned.


* The value of evidence based on most of the currently accepted sources of validity is straightforward; however, evidence based on consequences is not unequivocally interpretable as positive or negative.


* Evidence about the extent to which a test yields accurate inferences and information about real or potential consequences of using the test are not compensatory in any logical sense.


* Any attempted integration confounds conclusions about both the scientific meaning of the score and the desirability of using the test.


Example

* Consider a total raw score on a test comprising 20 items measuring French vocabulary and 20 items measuring Geometry.

* An overall score could be calculated, but what would be a defensible interpretation of a total score of, say, 30?
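
The ambiguity of such a composite can be made concrete with a small sketch. It assumes each item is scored 0 or 1 (the slide does not state the scoring rule), and the helper name `subscore_pairs` is hypothetical. It lists every French/Geometry subscore pair that is consistent with a given total.

```python
def subscore_pairs(total, n_french=20, n_geometry=20):
    """Return every (french, geometry) subscore pair that sums to `total`.

    Assumes dichotomous (0/1) item scoring, which the example does not state.
    """
    return [
        (french, total - french)
        for french in range(n_french + 1)
        if 0 <= total - french <= n_geometry
    ]


pairs = subscore_pairs(30)
print(len(pairs))           # how many subscore profiles share a total of 30
print(pairs[0], pairs[-1])  # the two most extreme profiles
```

Under these assumptions a total of 30 is consistent with 11 different profiles, from (10, 20) to (20, 10), so the total alone supports no clear inference about either French vocabulary or Geometry.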


The Most Telling Finding:

In over 20 years, no example of Messick’s proposed outcome (i.e., a synthesis of theoretical, empirical, and social consequences data yielding an overall, integrated judgment about validity) has ever been produced.

Validation practice is the victim.


First: An Improved Definition

“Validity is the degree to which scores on an appropriately administered instrument support inferences about variation in the characteristic that the instrument was developed to measure.”

(Cizek, 2012, p. 5)


Second: A Comprehensive Framework for Defensible Testing


A Comprehensive Framework for Defensible Testing

Validation of Score Meaning
“What is the evidence that the score can be interpreted as intended?”

Justification of Score Use
“What is the evidence that the score should be used as proposed?”


How the Standards for Educational and Psychological Testing Must Change


The Standards for Educational and Psychological Testing

Evidence based on...


1) Test Content

2) Response Process

3) Internal Structure

4) Relations to Other Variables

5) Consequences of Testing


A Comprehensive Framework for Defensible Testing

Standards for Educational and Psychological Test Use

Justification of Score Use
“What is the evidence that the score should be used as proposed?”


Standards for Educational and Psychological Test Use

A beginning outline of sources of evidence


Sources of Evidence for Justifying Test Use

Source (with examples)

Evidence Based on Consequences of Testing
* Evaluation of anticipated benefits
* Consideration of negative consequences (e.g., opt-out, curriculum narrowing, etc.)
* Consideration of false positive/negative rates

Evidence Based on Costs of Testing
* Overall cost of testing
* Cost-benefit evaluation
* Consideration of opportunity costs (e.g., instructional time)

Evidence Based on Alternatives to Testing
* Evaluation of relative value of alternative testing methods, formats, or procedures
* Evaluation of non-test alternatives to accomplish intended goals

Evidence Based on Fundamental Fairness
* Evaluation of stakeholder inclusion
* Investigation of opportunity to learn
* Provision of due notice
* Examination of differential impact across relevant subpopulations
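
One item in the list above, consideration of false positive/negative rates, lends itself to a short worked sketch. The counts below are hypothetical and the function name `error_rates` is an assumption for illustration. The sketch also presumes that each examinee's true status is known, which in practice has to be estimated.

```python
def error_rates(true_pos, false_pos, true_neg, false_neg):
    """Return (false_positive_rate, false_negative_rate) from decision counts.

    false positive rate = FP / (FP + TN): share of examinees who truly do
        not meet the standard but pass anyway
    false negative rate = FN / (FN + TP): share of examinees who truly meet
        the standard but fail anyway
    """
    fpr = false_pos / (false_pos + true_neg)
    fnr = false_neg / (false_neg + true_pos)
    return fpr, fnr


# Hypothetical pass/fail counts, not data from the presentation.
fpr, fnr = error_rates(true_pos=80, false_pos=10, true_neg=90, false_neg=20)
print(f"false positive rate: {fpr:.2f}")  # 10 / (10 + 90)
print(f"false negative rate: {fnr:.2f}")  # 20 / (20 + 80)
```

Which of the two error rates matters more depends on the proposed use, which is why the framework treats this as evidence for justifying use and not as evidence about score meaning.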


The Standards for Educational and Psychological Testing

Evidence based on...

1) Test Content

2) Response Process

3) Internal Structure

4) Relations to Other Variables


The Standards for Educational and Psychological Testing

Evidence based on...

1) Test Content

2) Response Process

3) Hypothesized Relationships among Variables

4) Test Development and Administration Procedures


Sources of Evidence Supporting Score Meaning

Source (with examples)

Evidence Based on Test Content
* Grounding in relevant theoretical dimensions or relationships
* Job analyses
* Content/curricular alignment studies

Evidence Based on Response Processes
* Cognitive labs
* Think-aloud protocols
* Cognitive mapping


Evidence Based on Analyses of Hypothesized Relationships

Internal
* Coefficient alpha, KR-20
* Confirmatory factor analysis
* Correlations among subscores

External
* Correlations with criterion variables; convergent and discriminant analyses
* Investigations of mean differences for relevant groups (e.g., treated/untreated; pre-treatment/post-treatment; males/females, etc.)
* Multi-trait, multi-method analyses

Evidence Based on Test Development and Administration Processes
* Item/task generation procedures
* Bias/sensitivity reviews
* Test administration and scoring procedures
* Test security procedures
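
As an illustration of the internal-structure evidence listed above, coefficient alpha can be computed with the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below uses a hypothetical response matrix and a hypothetical function name, `coefficient_alpha`. For 0/1 items this quantity is equal to KR-20.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    With dichotomous (0/1) items this is the same quantity as KR-20.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 examinees (rows) by 4 dichotomous items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(responses), 3))
```

For these made-up data the item variances sum to 0.8 and the total-score variance is 2, giving an alpha of 0.8. In the framework presented here, such a coefficient bears on score meaning only and says nothing about whether a proposed use is justified.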


Conclusion

“In developing [the Standards], the organizations have put the requirement for evaluation of proposed interpretations and uses under the heading of validity. We can change that and put some or all of these issues under some other heading... but if we do so, we will have to reformulate much of the standard advice provided to test developers and test users.”

(Kane, 2009, p. 62)


Questions? Comments?

Thank you...
