Slide 1

Validity/Reliability Matters. Really?
Beverly Mitchell, Kennesaw State University
Slide 2
Can a test be valid and not be reliable?
Slide 3
Can a test be reliable and not be valid?
Slide 4
Justifiable
Relevant
True to its purpose (consistently)
Validity
Slide 5
Validity
Design Issues
Application Issues
Slide 6
Validity
Design Issues
Application Issues
Slide 7
Design: Creating the Instrument
1. Inference
2. Complexity
Slide 8
Inference
Low High
Slide 9
High Inference
To draw a conclusion
To guess, surmise
To suggest, hint
Slide 10
Low Inference
Straightforward
Language = precise & targeted
Clear – no competing interpretations of words
No doubt as to what point is being made
Slide 11
Inference
Low High
Slide 12
Complexity
Low High
Slide 13
High Complexity
Complicated
Composed of interrelated parts or sections
Developed with great care or with much detail
Slide 14
Low Complexity
Simplistic
Plain
Unsophisticated
Slide 15
Complexity
Low High
Slide 16
Inference (Low to High) × Complexity (Low to High)

How They Are Related
Slide 17
Inference (Low to High) × Complexity (Low to High)

Designing the Instrument
Slide 18
Inference (Low to High) × Complexity (Low to High)

Due “Yesterday”!
Slide 19
Inference (Low to High) × Complexity (Low to High)

“Overachieving”
Slide 20
Inference (Low to High) × Complexity (Low to High)

How Much Error Are You Willing to Risk? (regions of the grid marked “Error”)
Slide 21
Inference (Low to High) × Complexity (Low to High)

Compromise
Slide 22
Does the OBSERVED behavior = TRUE behavior?

Observed SCORE ≠ TRUE SCORE

ERROR
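The observed-versus-true distinction on this slide is the classical test theory model: every observed score is the (unknowable) true score plus random measurement error. A minimal simulation sketch of that idea, with purely illustrative numbers not taken from the presentation:

```python
# Illustrative sketch of the classical test theory model:
# observed score = true score + random error.
import random

random.seed(0)  # make the example reproducible

TRUE_SCORE = 80.0  # hypothetical "true" level of the behavior


def observe(true_score, error_sd=5.0):
    """Return one observed score: the true score plus random error."""
    return true_score + random.gauss(0, error_sd)


observations = [observe(TRUE_SCORE) for _ in range(1000)]
mean_observed = sum(observations) / len(observations)

# No single observation equals the true score, but the error averages
# out, so the mean of many observations lands close to it.
print(round(mean_observed, 1))
```

This is why a single observation of a student teacher is a noisy estimate: the error term dominates any one measurement, and only repeated, consistent measurement lets the true level show through.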
Slide 23
Design: Creating the Instrument
1. Inference
General rubric: high
Qualitative analytic rubric: low
2. Complexity
Easy to develop (question worthiness, guidance, single interpretation): low
Time to develop (labor intensive, onerous, long): high
Slide 24
Validity
Design Issues
Application Issues
Slide 25
Application Issues
Designated Use
Limitations/Conditions
Slide 26
Application Issues
Designated Use: Don’t borrow from your neighbor!
Slide 27
Application Issues
Limitations/Conditions: One size does not fit all or apply to all circumstances.
Slide 28
Ways to Increase Probability for Accuracy
Compare language: standards & concepts
The concepts/expectations in the standards are apparent in the assessments, at the same depth and breadth (a good example of content validity)
The behavior (performance) expected in the standard matches the performance expected in the assessment, i.e., knowledge of… demonstrating skill…
Identify key/critical items/concepts to evaluate
Give it away for analysis (many eyes): invite external “expert” review and be receptive to feedback
Surveys from P-12 partners and candidates
Regular evaluation and analysis: revise, revise, revise
Awareness of design and application issues
Slide 29
Ways to Increase Reliability
Begin with a valid instrument. Two reliability issues:
Reliability of the instrument: repeated use of the instrument by the same evaluators. If problematic: revise, re-think, or abandon.
Reliability of the scoring: the performance is rated the same by different evaluators, i.e., objectivity. If problematic: ensure the qualifications of evaluators, check the rubric, check the language, and minimize generalized concepts applied to all subject areas.
Train evaluators frequently.
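Scoring reliability across evaluators can be quantified. A minimal sketch using hypothetical ratings (not the workshop data): simple percent agreement between two raters, plus Cohen's kappa, which discounts the agreement that would occur by chance alone.

```python
# Illustrative sketch: quantifying inter-rater ("scoring") reliability.
# The ratings below are hypothetical, not taken from the workshop.
from collections import Counter

rater_a = [2, 3, 3, 1, 2, 2, 3, 1, 2, 3]  # evaluator 1, 3-level scale
rater_b = [2, 3, 2, 1, 2, 3, 3, 1, 2, 3]  # evaluator 2, same performances

n = len(rater_a)

# Percent agreement: fraction of performances rated identically.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability the raters coincide if each rated at
# random according to their own distribution of rating levels.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2

# Cohen's kappa rescales observed agreement against chance agreement.
kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(kappa, 2))  # prints: 0.8 0.69
```

Percent agreement alone overstates reliability on coarse scales, because two raters who both favor the middle level will agree often by accident; kappa corrects for exactly that, which is why it is the more defensible statistic when reporting rater training results.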
Slide 30
AN APPLICATION: A KSU Workshop (Handouts Available)
Thirty experienced teachers participated in a daylong workshop to help us evaluate three student teaching observation rating forms.
Slide 31
Three Instruments
Traditional Candidate Performance Instrument (CPI) Observation of Student Teaching: The observer is asked to indicate strengths, weaknesses, and areas for improvement in three broad outcomes (Subject Matter, Facilitation of Learning, and Collaborative Professional).
Modified CPI Observation of Student Teaching: The observer is asked to explicitly rate each proficiency within each outcome and then provide a narrative indicating any strengths, weaknesses, and suggestions for improvement.
Class Keys Formative Analysis: The observer is asked to rate 26 elements from the Georgia Department of Education’s Class Keys. No narrative is required.
Slide 32
Generally, we were interested in two areas:
Validity/Accuracy: Which instrument provides the best inference about the presence of positive behaviors (proficiencies) we deem important?
Reliability/Consistency: Which instrument demonstrates the best inter-rater reliability?
Slide 33
Study Design

| Instrument | Group 1 | Group 2 | Group 3 |
| --- | --- | --- | --- |
| Period 1: Traditional CPI (Narrative) | Video A | Video B | Video C |
| Period 2: Modified CPI (Rating and Narrative) | Video B | Video C | Video A |
| Period 3: Class Keys Formative Analysis | Video C | Video A | Video B |
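The rotation of videos across periods is a 3×3 Latin square: each group sees every video exactly once, and every instrument is applied to every video across the three groups, so instrument effects are not confounded with a particular video. A sketch of how that counterbalancing can be generated (labels illustrative):

```python
# Sketch of the counterbalancing behind the study design: a 3x3 Latin
# square built by rotating the video order one position per period.
instruments = [
    "Period 1: Traditional CPI (narrative)",
    "Period 2: Modified CPI (rating and narrative)",
    "Period 3: Class Keys formative analysis",
]
videos = ["Video A", "Video B", "Video C"]

# Row i is the video list rotated left by i: no group repeats a video,
# and each instrument/video pairing occurs exactly once.
design = {inst: videos[i:] + videos[:i] for i, inst in enumerate(instruments)}

for inst, row in design.items():
    print(inst, "->", ", ".join(row))
```

The same rotation trick scales to any number of conditions, which is why Latin squares are the standard way to counterbalance order effects when every rater cannot score every stimulus with every instrument.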
Slide 34
Reliability
The strongest inter-rater agreement was for the Modified CPI with a performance-level rating, followed by the Class Keys Formative Assessment instrument with a performance-level rating.
There was very little agreement among the behaviors noted in the Traditional CPI narratives, and no performance-level ratings were available; it is probably not a reliable instrument for rating student-teaching behaviors.
Slide 35
Validity
Both the Traditional CPI and the Modified CPI are explicitly aligned with institutional (and other) standards, but the Traditional CPI is a global assessment while the Modified CPI requires a rating and a narrative for each proficiency.
However, the Traditional CPI has not demonstrated reliability, so participants were also asked to provide information about the language, clarity, and ease of use of all three instruments.