Validity and Validation: An introduction
Note: I have included explanatory notes for each slide. To access these, you will probably have to save the file to your computer, then look at it in the “normal view”.
Conceptions of Validity
1. Does this test measure what it is supposed to measure?
2. Does the measure correlate with other (perhaps more expensive or invasive) measures of the concept?
3. Validity is present when the attribute to be measured causes variations in the measurement outcomes
4. What interpretations can fairly be placed on scores produced by this measurement?
5. How well does this measure fit in with a hypothesized network of related, and contrasting, constructs?
The Poetry of Donald Rumsfeld
may offer insights into validity...?
The Unknown
As we know,
There are known knowns: there are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.
(Feb. 12, 2002, Department of Defense news briefing)
Clarity
I think what you'll find,
I think what you'll find is,
Whatever it is we do substantively,
There will be near-perfect clarity
As to what it is.
And it will be known,
And it will be known to the Congress,
And it will be known to you,
Probably before we decide it,
But it will be known.
(Feb. 28, 2003, Department of Defense briefing)
Source: http://www.slate.com/id/2081042/
Reliability and Validity
[Figure: a 2 × 2 grid of dot patterns, with reliability (low, high) on one axis and validity (low, high) on the other.]
Annotations on the figure:
• Biased result!
• The average of these inaccurate results is not bad. This is probably how screening questionnaires (e.g., for depression) work.
Validity viewed as error in measurement
Types of Error in Repeated Measurements
[Figure: individual measures plotted against the true value. The labels mark bias and random error.]
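The distinction between bias and random error in repeated measurements can be illustrated numerically. The sketch below is only an illustration: the readings, the true value of 50 and the function name `error_components` are all made up, and the sample standard deviation is used as one simple summary of random error.

```python
from statistics import mean, stdev

def error_components(measurements, true_value):
    """Split the error in repeated measurements into bias and random error."""
    centre = mean(measurements)
    bias = centre - true_value          # systematic offset from the true value
    random_error = stdev(measurements)  # sample SD: scatter around the mean
    return bias, random_error

# Hypothetical readings of a quantity whose true value is assumed to be 50.
true_value = 50.0
tight_but_off = [54.8, 55.1, 55.0, 54.9, 55.2]         # consistent, yet shifted
scattered_on_target = [44.0, 56.0, 50.0, 47.0, 53.0]   # noisy, centred on truth

for label, readings in [("tight but off", tight_but_off),
                        ("scattered, on target", scattered_on_target)]:
    bias, random_error = error_components(readings, true_value)
    print(f"{label}: bias={bias:+.2f}, random error (SD)={random_error:.2f}")
```

In this made-up data the first set is reliable but biased, while the second is unreliable yet averages out close to the true value.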
The Sequence of Validation Checks (1)
“Does it measure what it’s supposed to measure?”
1. First, decide and define what you want to measure (if it’s an abstract concept, explain the conceptual basis)
2. Select indicators or items that represent that topic (this involves content validity: a sampling of potential questions)
3. Check that items are clear, comprehensible and relevant (face validity, “sensibility”)
4. This produces a pool of items ready for the item analysis stage, which involves administering the test and analyzing responses (next slide).
Validation Sequence (2): Checking the internal structure
“Item analysis” refers to a series of checks on the performance of each “item” (or question). Some analyses fall under the heading of reliability, some validity. Faulty items are discarded or replaced. Analyses include:
2.1 Item distributions & missing values: an item that does not vary or that people don’t answer cannot measure anything
2.2 Correlations among items, maybe using factor analysis
2.3 Item response theory (IRT) analyses.
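The first two item-analysis checks can be sketched in a few lines. This is a minimal illustration, not a full item analysis: the ratings are invented, `item_report` and `pearson` are hypothetical helper names, respondents with any missing answer are dropped before correlating (listwise deletion, an assumption), and the corrected item-rest correlation stands in for the fuller correlation and factor analyses. IRT is not attempted here.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def item_report(responses):
    """responses: list of respondents, each a list of item scores (None = missing)."""
    n_items = len(responses[0])
    complete = [r for r in responses if None not in r]  # listwise deletion
    report = []
    for i in range(n_items):
        answered = [r[i] for r in responses if r[i] is not None]
        missing_rate = 1 - len(answered) / len(responses)
        variance = pvariance(answered) if answered else 0.0
        # Corrected item-rest correlation: the item against the sum of the others.
        item = [r[i] for r in complete]
        rest = [sum(r) - r[i] for r in complete]
        report.append({"item": i + 1, "missing": missing_rate,
                       "variance": variance, "item_rest_r": pearson(item, rest)})
    return report

# Hypothetical 1-5 ratings from six respondents on four items.
data = [
    [1, 2, 3, 1],
    [2, 2, 3, 2],
    [3, 4, 3, None],
    [4, 4, 3, 4],
    [5, 5, 3, 5],
    [2, 3, 3, 1],
]
report = item_report(data)
for row in report:
    print(row)
```

In this invented data, item 3 never varies, so it has zero variance and cannot correlate with anything, and item 4 has one missing answer. Both are the kind of flag that would lead to an item being reviewed, discarded or replaced.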
Validation Sequence (3): External associations
3.1 Compare the measure against a criterion, if a “gold standard” exists. Sensitivity & specificity are the normal statistics.
3.2 Where there is no single, clear gold standard you can use correlations with other indicators. This leads to construct validation. You begin from a set of hypotheses covering the expected relationships among as wide a range of indicators as possible.
3.3 Correlations are often divided into convergent and discriminant coefficients, according to the hypothesized associations.
These analyses tend to use the entire test administered to selected samples; inadequate performance leads back to basic design.
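The statistics named in steps 3.1 and 3.3 can be sketched as follows. Everything here is assumed for illustration: the scores, the gold-standard classification, the cut-point of 10, the two comparison measures and the function names are all invented, and Pearson's r is used as one possible correlation coefficient.

```python
from statistics import mean

def sensitivity_specificity(test_positive, has_condition):
    """Compare binary test results with a gold-standard classification."""
    pairs = list(zip(test_positive, has_condition))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fn = sum(1 for t, c in pairs if not t and c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives
    fp = sum(1 for t, c in pairs if t and not c)      # false positives
    return tp / (tp + fn), tn / (tn + fp)

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data for ten respondents.
scores = [3, 12, 15, 7, 9, 14, 11, 4, 10, 6]      # the new measure
gold = [False, True, True, False, True,
        True, False, False, True, False]          # assumed gold standard
similar = [4, 11, 16, 8, 8, 13, 12, 5, 9, 7]      # measure of the same construct
unrelated = [5, 6, 4, 7, 5, 6, 4, 6, 7, 5]        # measure of a different construct

# 3.1 Criterion validity: a score of 10 or more counts as positive (assumed cut-point).
positive = [s >= 10 for s in scores]
sens, spec = sensitivity_specificity(positive, gold)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")

# 3.3 Construct validity: convergent r should be high, discriminant r low.
convergent_r = pearson(scores, similar)
discriminant_r = pearson(scores, unrelated)
print(f"convergent r={convergent_r:.2f}, discriminant r={discriminant_r:.2f}")
```

The hypotheses come first: which correlations should be high and which low is decided before looking at the data, and the coefficients are then read against those expectations.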
Validation Sequence (4): Group Discrimination
Once you have shown that test scores correlate with other measures as intended, the test's actual performance in rating groups of respondents is evaluated. Analyses generally use representative samples.
4.1 “Known groups” (can it distinguish well from sick? Similar to criterion validity)
4.2 Responsiveness: sensitivity to change over time (which is important in an evaluative measure)
4.3 Do scores show ceiling or floor effects?
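The three checks above can each be summarised with a simple statistic. The sketch below is one possible choice among several, with invented data on an assumed 0 to 20 scale: Cohen's d for known groups, the standardized response mean for responsiveness, and the share of respondents at the scale limits for floor and ceiling effects. The function names are hypothetical.

```python
from statistics import mean, stdev, variance

def known_groups_difference(group_a, group_b):
    """Difference in group means in pooled-SD units (Cohen's d)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) +
                  (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

def standardized_response_mean(before, after):
    """Mean change divided by the SD of change, one common responsiveness index."""
    change = [a - b for b, a in zip(before, after)]
    return mean(change) / stdev(change)

def floor_ceiling(scores, min_score, max_score):
    """Proportion of respondents at the lowest and highest possible score."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling

# Hypothetical scores on a 0-20 scale (higher = worse health).
sick = [14, 16, 12, 18, 15]
well = [4, 6, 3, 7, 5]
after_treatment = [10, 11, 9, 12, 12]   # the sick group, measured again later
survey = [0, 0, 3, 20, 7, 0, 12, 20, 5, 9]

d = known_groups_difference(sick, well)
srm = standardized_response_mean(sick, after_treatment)
at_floor, at_ceiling = floor_ceiling(survey, 0, 20)
print(f"known groups d={d:.2f}")
print(f"standardized response mean={srm:.2f}")
print(f"at floor={at_floor:.0%}, at ceiling={at_ceiling:.0%}")
```

A large share of scores at either limit means the measure cannot register further deterioration or improvement, which in turn limits its responsiveness.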
Conclusion
• Validation is rarely complete. Many instruments continue to be checked for validity years after their invention. Times change, shifts in language make old items obsolete, and an instrument's validity can also be tested for different purposes.
• Validation is long and expensive. Basic test development and validation may take 3 to 5 years: it’s not a thesis project.
• Remember: validity is about the interpretation of scores. It is a relative concept: a test is not valid or invalid, but only valid or not for a particular application.
• But recall the Viagra principle: a test intended for one purpose may prove good for an unanticipated application.