Psychometrics
Timothy A. Steenbergh and Christopher J. Devers
Indiana Wesleyan University
A. Psychometrics
• Psychological measurement
• Reliability
• Validity
• Tests
• Items
(Jones & Thissen, 2007; Kaplan & Saccuzzo, 2012)
Adding it up…
Observed Score = Depression Level (True Score) + Measurement Error
X = T + E
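The true-score model can be made concrete with a small simulation. This is a sketch under assumed values (normally distributed true scores with mean 50 and SD 10, random error with SD 5); it also uses the standard classical test theory definition of reliability as the share of observed-score variance that is true-score variance, var(T) / var(X), which the slide does not spell out.

```python
import random
import statistics

# Hypothetical simulation of the classical test theory model X = T + E.
# The means and SDs below are illustrative assumptions, not values from the slides.
random.seed(1)
N = 10_000

true_scores = [random.gauss(50, 10) for _ in range(N)]   # T: each person's actual depression level
errors = [random.gauss(0, 5) for _ in range(N)]          # E: random error, mean 0, unrelated to T
observed = [t + e for t, e in zip(true_scores, errors)]  # X: the score we actually record

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)

# Because E is uncorrelated with T, var(X) is approximately var(T) + var(E).
# Reliability is the proportion of observed variance that is true-score variance.
reliability = var_t / var_x
print(f"var(T) = {var_t:.1f}, var(E) = {var_e:.1f}, var(X) = {var_x:.1f}")
print(f"reliability = var(T) / var(X) = {reliability:.2f}")
```

With these assumed SDs the ratio lands near 100 / 125 = .80; shrinking the error SD pushes it toward 1.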
C. Reliability
• What does it mean to be reliable?
• Consistency of scores over time, across test forms, or across variable testing conditions
• Types of reliability
  • Test-Retest
  • Inter-item (internal)
  • Inter-rater
(Anastasi, 1988)
C.1. Test-Retest Reliability
• Are test scores stable over time?
• Give the test to the same group at two points in time and correlate the test scores
• Must consider the stability of the construct when
  • establishing the test-retest interval
  • interpreting the test-retest correlation
C.2. Internal (inter-item) Consistency
• Assumption: a composite score has to be made up of items that are measuring the same phenomenon
• Heterogeneous items will produce a lower internal consistency reliability coefficient
• Measures of internal consistency:
  • Split-half
  • Cronbach's alpha (coefficient α)
  • Kuder-Richardson 20 (KR-20; for dichotomous items)
(Pedhazur & Schmelkin, 1991)
Interpreting Reliability Coefficients
• What is a reasonable level of reliability?
  • Research: ≥ .80
  • Clinical: ≥ .90
• Factors to consider when evaluating a reliability coefficient:
  • Stability of the construct
  • Dimensional nature of the construct (unidimensional vs. multidimensional)
  • Number of items (short tests are less reliable)
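The effect of test length is usually quantified with the Spearman-Brown prophecy formula, which is not named on the slide. A sketch, assuming a hypothetical 10-item scale with reliability .70 and assuming any added items are parallel to (as good as) the existing ones:

```python
import math

def spearman_brown(r, n):
    """Predicted reliability after changing test length by a factor n (n=2 doubles it)."""
    return n * r / (1 + (n - 1) * r)

def length_factor_needed(r, target):
    """Length factor n that Spearman-Brown predicts is needed to reach a target reliability."""
    return target * (1 - r) / (r * (1 - target))

base_items, base_r = 10, 0.70  # hypothetical 10-item scale with reliability .70

for items in (5, 10, 20, 40):
    predicted = spearman_brown(base_r, items / base_items)
    print(f"{items:>2} items: predicted reliability = {predicted:.2f}")

for target in (0.80, 0.90):  # the research and clinical benchmarks above
    n = length_factor_needed(base_r, target)
    print(f"reach {target:.2f}: about {math.ceil(base_items * n)} items")
```

Under these assumptions the scale would need about 18 items to reach .80 and about 39 to reach .90, which illustrates why short tests struggle to meet clinical standards.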
C.3. Inter-Rater Reliability
• Accuracy (consistency) with which different raters arrive at the same scores
• Extremely important for tests that require any rater judgment (e.g., WAIS Vocabulary)
• Agreement is computed with the kappa statistic (κ)
  • Ranges from -1.0 to +1.0
  • κ = 1.0: perfect agreement; κ = 0: chance agreement; κ < 0: less-than-chance agreement
  • < .40 "poor"; .40 to .75 "fair to good"; > .75 "excellent"
(Fleiss, 1981)
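A sketch of the two-rater case using Cohen's kappa; the slide does not say which kappa variant it means, and designs with more than two raters need a multi-rater generalization. The ratings are hypothetical 0/1/2-point scores given by two raters to the same ten responses.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each score the same n responses."""
    n = len(rater_a)
    # Observed agreement: proportion of responses given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, product of the two raters' marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

def fleiss_label(kappa):
    """Benchmarks attributed to Fleiss (1981): < .40 poor, .40 to .75 fair to good, > .75 excellent."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"

# Hypothetical 0/1/2-point scores given by two raters to the same 10 responses.
rater_1 = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0]
rater_2 = [2, 1, 0, 2, 1, 1, 0, 2, 2, 0]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f} ({fleiss_label(kappa)})")
```

Here the raters agree on 80% of responses, but their marginal distributions alone would produce 34% agreement by chance, so kappa is noticeably lower than raw percent agreement.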
D. Validity
• If something is valid, what does that mean?
• Validity: degree to which a test measures that which it purports to measure
• Types
  • Content
  • Criterion-related
  • Construct
D.1. Content Validity
• How well does the instrument sample from the domain of interest?
• Lack of adequate item sampling can lead to invalid findings
• Examples
  • GBQ (see p. 144 of article)
  • WAIS
• Assess with expert raters
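One widely used way to summarize expert ratings, not named on the slide, is Lawshe's content validity ratio (CVR), computed per item from how many panelists rate it "essential". A sketch with a hypothetical 10-member panel:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: +1 if every expert rates the item essential, 0 if half do, -1 if none do."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts; counts of experts rating each item "essential".
essential_counts = {"item_1": 10, "item_2": 8, "item_3": 5, "item_4": 2}

for item, n_e in essential_counts.items():
    print(f"{item}: CVR = {content_validity_ratio(n_e, 10):+.2f}")
```

Items with a low or negative CVR would be candidates for revision or removal; the cut-off a given study uses depends on panel size and should be taken from the method's original tables rather than from this sketch.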
D.2. Criterion-Related Validity
• Does the test score correlate with other measures as we would expect?
• Concurrent validity: test score relates to a criterion measured at the same time
• Predictive validity: test score predicts a future criterion
• Validity coefficient: correlation coefficient between test score and criterion measure
D.3. Construct Validity
• Is there evidence that the measure adequately assesses the construct of interest?
• Do test scores change over time or as a result of certain events, as theorized?
• Are items homogeneous, or do certain items “hang together?” (Factor Analysis)
Factor Analysis
• Statistical method for examining underlying constructs (latent traits) within a test
• Uses correlation matrices to identify underlying relationships among test items
• Example: GBQ
Overview
• Psychometrics
  • Psychological measurement
• Classical Test Theory
• Reliability
  • Test-Retest
  • Inter-item (internal)
  • Inter-rater
• Validity
  • Content
  • Criterion-related
  • Construct
(Trochim, 2006)
Resources
• Software
  • SPSS
  • PSPP
  • R
• Videos
  • Educator.com
  • CLI: Research Seminars
  • Andy Field
• Websites
  • Social Research Methods
  • Institute for Digital Research and Education
  • Statistics Help for Students
  • Stat Pages
References
Anastasi, A. (1988). Psychological testing (6th ed.). New York, NY: Macmillan.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York, NY: John Wiley & Sons.
Jones, L. V., & Thissen, D. (2007). A history and overview of psychometrics. Handbook of statistics, 26, 1-28.
Kaplan, R., & Saccuzzo, D. (2012). Psychological testing: Principles, applications, and issues. Belmont, CA: Cengage Learning.
Kline, T. J. B. (2005). Classical test theory: Assumptions, equations, limitations, and item analyses. In T. J. B. Kline, Psychological testing: A practical approach to design and evaluation (pp. 91-106). Thousand Oaks, CA: Sage.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum.
Trochim, W. M. K. (2006). Reliability and validity. Retrieved from http://www.socialresearchmethods.net/kb/relandval.php