![Page 1: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/1.jpg)
Psychometrics 101: Foundational Knowledge for Testing Professionals
Steve Saladin, Ph.D.
University of Idaho
![Page 2: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/2.jpg)
Criterion-referenced vs norm-referenced
• Is performance rated against pre-established cut points, or is it based on comparisons with others?
• Classroom grading is generally criterion-referenced
  90% right = A, 80% = B, 70% = C, etc.
  Typically reported as a percentage correct or P/F
• Grading on the curve means the grade is based on comparison with the rest of the class (norm-referenced)
  80% might be a B, an A, a C, or something else
![Page 3: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/3.jpg)
Criterion-referenced vs norm-referenced
• Standardized tests are typically norm-referenced
  SAT, ACT, GRE, IQ tests
  Typically reported as a percentile or standard score
• Certification exams are often criterion-referenced
  Proctor certification, licensing exams
  Typically reported as percentage correct or P/F
• Sometimes you get a mix
  GED uses norms to establish cut-scores
• Important to note the difference between percentile and percentage correct
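To make the percentile versus percentage-correct distinction concrete, here is a minimal Python sketch. The two classes and their scores are hypothetical, and "percentile rank" is taken here as the percent of the comparison group scoring strictly below a given score, which is one common convention among several.

```python
def percent_correct(items_right, items_total):
    """Criterion-style score: share of items answered correctly."""
    return 100.0 * items_right / items_total


def percentile_rank(score, group_scores):
    """Norm-style score: percent of the comparison group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical classes; each entry is one student's percent correct.
easy_test_class = [80, 85, 88, 90, 92, 94, 95, 96, 98, 99]
hard_test_class = [40, 45, 50, 55, 60, 62, 65, 70, 75, 80]

my_score = percent_correct(40, 50)  # 80.0 percent correct on either test

print(percentile_rank(my_score, easy_test_class))  # 0.0
print(percentile_rank(my_score, hard_test_class))  # 90.0
```

The same 80% correct sits at the bottom of one group and near the top of the other, which is why a curved grade for 80% can be a B, an A, a C, or something else.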
![Page 4: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/4.jpg)
Damn the Statistics & full speed ahead!
• Testing is all about quantifying something about people (skills, knowledge, behavior, etc.)
• Stats are just a way to describe the numbers
  Make it more understandable
  Reveal relationships
• To understand norm-referenced test scores, you need to know two general things
  What is the typical score?
  To what degree did others score differently?
![Page 5: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/5.jpg)
What’s typical?
Scores: 10 10 20 20 30 30 40 40 40 40 50
• Mean = arithmetic average = 30
• Median = # in the middle = 30
• Mode = most frequently occurring # = 40
How different are the scores?
• Range = highest – lowest = 40
• Variance = average of squared differences from mean = 163.6
• Standard Deviation = square root of Variance = 12.8
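The figures on this slide can be reproduced with Python's standard `statistics` module. This is a sketch, not part of the original deck. It uses the population forms (`pvariance`, `pstdev`), which divide by N, because that matches the slide's definition of variance as the plain average of squared differences.

```python
import statistics

# The eleven scores shown on the slide.
scores = [10, 10, 20, 20, 30, 30, 40, 40, 40, 40, 50]

mean = statistics.mean(scores)            # arithmetic average
median = statistics.median(scores)        # middle value of the sorted scores
mode = statistics.mode(scores)            # most frequently occurring value
score_range = max(scores) - min(scores)   # highest minus lowest

# Population variance: average squared difference from the mean.
variance = statistics.pvariance(scores)
# Population standard deviation: square root of the variance.
sd = statistics.pstdev(scores)

print(mean, median, mode, score_range)
print(round(variance, 1), round(sd, 1))
```

Using `statistics.variance` and `statistics.stdev` instead would divide by N - 1 and give slightly larger values than those on the slide.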
![Page 6: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/6.jpg)
Standard Normal Distribution
• Normal Curve
• Assumes trait is normally distributed in population
[Figure: normal curve with the mean and standard deviation marked]
![Page 7: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/7.jpg)
The Normal Curve
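| %tile | <1% | 2.5% | 16% | 50% | 84% | 97.5% | 99.5% |
|-------|-----|------|-----|-----|-----|-------|-------|
| GRE   | 200 | 300  | 400 | 500 | 600 | 700   | 800   |
| SAT   | 200 | 300  | 400 | 500 | 600 | 700   | 800   |
| IQ    | 55  | 70   | 85  | 100 | 115 | 130   | 145   |
| ACT   | 1   | 6    | 12  | 18  | 24  | 30    | 36    |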
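The table can be read as a lookup from standard score to percentile. A rough sketch of that conversion follows, using `statistics.NormalDist` from the Python standard library. The means and standard deviations are read off the table (the 50th-percentile column and the step between columns), and the helper name `percentile_for` is made up for illustration. It assumes the trait is normally distributed.

```python
from statistics import NormalDist


def percentile_for(score, mean, sd):
    """Percent of a normal population expected to score below `score`."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


# (mean, standard deviation) as implied by the table columns.
IQ = (100, 15)
GRE = (500, 100)

print(round(percentile_for(115, *IQ)))   # one SD above the mean
print(round(percentile_for(400, *GRE)))  # one SD below the mean
print(round(percentile_for(130, *IQ), 1))  # two SDs above the mean
```

The exact normal-curve value two standard deviations above the mean is about 97.7%, so the table's 97.5% is a rounded figure.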
![Page 8: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/8.jpg)
How are these things related?
• GRE scores and grad school grades
• CLEP scores and final exam scores
• Compass/Accuplacer scores and success in entry classes
• Motivation and cheating
• Correlation tells us if things vary or change in a related way
  Higher GRE scores go with higher grades
  Lower motivation suggests higher levels of cheating
![Page 9: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/9.jpg)
Some Facts About Correlation
• Ranges from +1.0 to -1.0
• Sign tells you direction of correlation
  + as A gets bigger, so does B
  - as A gets bigger, B gets smaller
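As an illustration of the range and sign, here is a small sketch that computes the Pearson correlation coefficient by hand. The deck does not name a specific coefficient, so Pearson's r is an assumption, and all data values are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: grades rise in lockstep with GRE scores.
gre = [400, 500, 600, 700]
grades = [2.0, 2.5, 3.0, 3.5]

# Hypothetical data: cheating falls as motivation rises.
motivation = [1, 2, 3, 4, 5]
cheating = [9, 7, 6, 4, 2]

print(pearson_r(gre, grades))           # positive: A and B rise together
print(pearson_r(motivation, cheating))  # negative: B falls as A rises
```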
![Page 10: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/10.jpg)
How To Lie With Statistics!
• Test Taking linked to Longevity! A recent study found that people who had taken more tests during early adulthood tended to live longer. The number of tests taken between the ages of 16 and 30 correlated strongly with the age of death. The more tests you take, the longer you will live!
![Page 11: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/11.jpg)
Some Facts About Correlation
• It is not causation, but can be used to predict
• Small samples may miss a relationship
• Heterogeneous samples may miss a relationship
[Figure: three example correlations, 0.87, 0.78, and 0.42]
![Page 12: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/12.jpg)
Error, Error Everywhere
• No test is perfect, no measurement is perfect
• Get more precise, but never get exact
• Score = Truth + Error
![Page 13: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/13.jpg)
Error, Error Everywhere
• Error can be lots of things, including
  The environment
  The test-taker
  Procedural variations
  The test itself
• Since error makes scores inconsistent or unreliable, a measure of reliability of scores is important
![Page 14: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/14.jpg)
Reliability
• Test-Retest
  Test a group on two different occasions and correlate the results
  Are results stable over time?
• Internal Consistency
  Correlate score on each item to the total
  Are they all measuring the same thing?
• Alternate Forms
  Develop two versions of the same test and correlate scores on each
  Are your versions comparable?
• All are correlations, so all are subject to the same problems
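The internal-consistency check described on this slide (correlate each item with the total) might be sketched as below. The response matrix is invented, items are scored 1 for right and 0 for wrong, and the total here includes the item itself. Some analysts prefer a "corrected" total that leaves the item out.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical responses: one row per test-taker, one column per item.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

totals = [sum(row) for row in responses]

# Correlate each item's column of scores with the total scores.
item_total = [
    pearson_r([row[i] for row in responses], totals)
    for i in range(len(responses[0]))
]

for i, r in enumerate(item_total, start=1):
    print(f"item {i}: r = {r:.2f}")
```

In this made-up data the fourth item correlates negatively with the total, the kind of result that would suggest it is not measuring the same thing as the other items.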
![Page 15: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/15.jpg)
So what’s good?
• GRE has reported reliability of 0.89 (Quantitative), 0.92 (Verbal)
  GRE Guide to Use of Scores, 2007-2008
• ACT Technical Manual reports Composite score reliability of .97
• SAT reports reliabilities of .89-.93
  Test Characteristics of the SAT, http://professionals.collegeboard.com/data-reports-research/sat/data-tables
• COMPASS alternate forms reliability reported to be .73-.90
  http://www.nationalcommissiononadultliteracy.org/content/assessmentmellard.pdf
![Page 16: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/16.jpg)
Reliability & Error
• Can’t totally get rid of error, but can estimate how much is there
• Using reliability, you can estimate how much a person’s score would vary due to error
• Standard Error of the Measurement
  SEM = SD × √(1 − reliability)
  An index of the extent to which an individual’s scores vary over multiple administrations
  Gives the range within which the true score is likely to exist
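A minimal sketch of the SEM calculation, using the standard formula SEM = SD × √(1 − reliability). The function name and the example values (SD of 15, reliability of 0.96) are illustrative assumptions, not figures from any test manual.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: standard deviation of 15, reliability of 0.96.
print(round(standard_error_of_measurement(15, 0.96), 2))

# The two extremes: a perfectly reliable test has no measurement error,
# and a test with zero reliability has an SEM equal to the full SD.
print(standard_error_of_measurement(15, 1.0))
print(standard_error_of_measurement(15, 0.0))
```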
![Page 17: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/17.jpg)
SEM for some tests
• GRE Verbal SEM is 34, Quantitative 51, so the 68% confidence interval for a score of 500 is 470-530 for Verbal, 450-550 for Quantitative
  Scores are only reported in increments of 10
  GRE Guide to Use of Scores, 2007-2008
• ACT Composite SEM is .91, so the 68% confidence interval for a score of 20 is 19-21
  ACT Technical Manual
• WAIS-IV FSIQ SEM is 2.16, so the 68% confidence interval for a score of 100 is 98-102
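The 68% intervals on this slide are the observed score plus or minus one SEM. The sketch below reproduces them. The helper name `confidence_band` is made up, and the GRE Verbal example assumes an SEM of 34 points on the 200-800 scale, which is what the 470-530 interval implies once rounded to the nearest 10.

```python
def confidence_band(score, sem, n_sems=1.0):
    """Return (low, high) = score -/+ n_sems * SEM.

    One SEM on each side is roughly a 68% band; about two SEMs
    (1.96) would give roughly a 95% band.
    """
    return (score - n_sems * sem, score + n_sems * sem)


# WAIS-IV FSIQ: score of 100, SEM of 2.16.
low, high = confidence_band(100, 2.16)
print(round(low), round(high))

# ACT Composite: score of 20, SEM of 0.91.
low, high = confidence_band(20, 0.91)
print(round(low), round(high))

# GRE Verbal (assumed SEM of 34), rounded to the nearest 10.
low, high = confidence_band(500, 34)
print(round(low / 10) * 10, round(high / 10) * 10)
```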
![Page 18: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/18.jpg)
Does Reliability = Validity?
NO!
• Getting a consistent result means reliability
• Having that result be meaningful is validity
• Validity is based on inferences you make from results
  Test has to be reliable to be valid
  Test does not have to be valid to be reliable
![Page 19: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/19.jpg)
Validity
• Any evidence that a test measures what it says it is measuring
• Any evidence that inferences made from the test are useful and meaningful
• 3 types of evidence
  Content
  Criterion-Related
  Construct
![Page 20: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/20.jpg)
Content Validity
• Think of a test as a sample of possible problems/items
  4th grade spelling test should be a representative sample of 4th grade spelling words
  GRE Quantitative should be a representative sample of the math problems a grad school applicant might be expected to solve
• Should be part of design
  Identifying how many algebra, trig, calculus, etc. items should be on the test (table of specifications)
• Frequently evaluated by item analysis or expert opinions
![Page 21: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/21.jpg)
Criterion-Related Validity
• How does the test score correlate with some external measure (criterion)?
  Placement test score and performance in class
  Admission test score and GPA for first semester
• Sometimes called Predictive or Concurrent Validity
• Correlation is affected by error in the test and error in the criterion
  Only top students take GRE
  Graduate School grade restriction
![Page 22: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/22.jpg)
To use or not to use….
• Depends on the question….
  What is the impact of the decision?
  What is the cost of using? Of not using?
• Decision Theory can be a guide to determining incremental validity
  Net gain in using scores
![Page 23: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/23.jpg)
Decision Theory
[Figure: GPA (C to A) plotted against GRE score (200 to 800), divided into four regions: false negative, true positive, true negative, false positive. Label: Maximize success]
![Page 24: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/24.jpg)
Decision Theory
[Figure: GPA (C to A) plotted against GRE score (200 to 800), divided into four regions: false negative, true positive, true negative, false positive. Label: Maximize opportunity]
![Page 25: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/25.jpg)
Predictive Utility
• Effectiveness = (True Positive + True Negative) / (True Pos + False Pos + True Neg + False Neg)
• Have to weigh effectiveness against cost
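The four decision outcomes and the effectiveness ratio can be tallied as in the sketch below. The applicant data, the cutoffs, and the function names are all hypothetical. "Selected" means scoring at or above the test cutoff, and "succeeded" means reaching the criterion cutoff (here a GPA of 3.0).

```python
def decision_outcomes(applicants, test_cutoff, success_cutoff):
    """Count true/false positives and negatives for a given test cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for test_score, criterion in applicants:
        selected = test_score >= test_cutoff
        succeeded = criterion >= success_cutoff
        if selected and succeeded:
            counts["TP"] += 1   # admitted and did well
        elif selected:
            counts["FP"] += 1   # admitted but did not do well
        elif succeeded:
            counts["FN"] += 1   # rejected but would have done well
        else:
            counts["TN"] += 1   # rejected and would not have done well
    return counts


def effectiveness(counts):
    """Share of all decisions that were correct: (TP + TN) / total."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical (GRE score, GPA) pairs.
applicants = [
    (700, 3.8), (650, 2.5), (450, 3.5), (400, 2.0),
    (600, 3.2), (500, 2.8), (550, 3.6), (300, 2.2),
]

lenient = decision_outcomes(applicants, test_cutoff=500, success_cutoff=3.0)
strict = decision_outcomes(applicants, test_cutoff=600, success_cutoff=3.0)

print(lenient, effectiveness(lenient))
print(strict, effectiveness(strict))
```

With this made-up data both cutoffs are equally effective overall, but the stricter one trades false positives for false negatives, which is the tension between maximizing success and maximizing opportunity.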
![Page 26: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/26.jpg)
Construct Validity
• Most important for psychological tests, where what you are measuring is abstract or theoretical
  Intelligence
  Personality characteristics
  Attitudes and beliefs
• Usually involves multiple pieces of evidence
![Page 27: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/27.jpg)
Construct Validity
• Convergent—correlates with measures of same thing
• Divergent—does not correlate with measures of something else
• Scores show expected changes after treatment, education, maturation, etc.
• Factor analysis supports expected factor structure
![Page 28: Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho](https://reader035.vdocument.in/reader035/viewer/2022062619/55178f5755034645368b55af/html5/thumbnails/28.jpg)
Things to remember
• The normal curve
• Correlation
• Reliability
• Standard Error of the Measurement
• Validity
• Decision Theory