MEASUREMENT: RELIABILITY Lu Ann Aday, Ph.D. The University of Texas School of Public Health

Upload: miriam-sager

Post on 14-Dec-2015

MEASUREMENT: RELIABILITY

Lu Ann Aday, Ph.D.
The University of Texas School of Public Health

RELIABILITY: Definition

Extent of random variation in answers to questions as a function of when they are asked (test-retest), who asked them (inter-rater), and the fact that a given question is one of a number of questions that could have been asked to measure the concept of interest (internal consistency).

RELIABILITY: Types

Test-retest reliability
Inter-rater reliability
Internal consistency reliability

RELIABILITY: Computation

Requires repeated measures to estimate stability over time (test-retest), equivalence across data gatherers (inter-rater), or equivalence across questions/items intended to measure the same underlying concept (internal consistency).

RELIABILITY: Test-retest

Definition: correlation between answers to same question by same respondent at two different points in time

RELIABILITY: Test-retest

Factors affecting:

Vague question wording
Transient personal states, e.g., physical or mental
Situational factors, e.g., presence of other people

RELIABILITY: Test-retest

Computation: Compute correlation coefficient between answers to same question by same respondent at two different points in time:

Respondent   Q1, Time 1   Q1, Time 2
1            Agree        Agree
2            Agree        Agree
3            Agree        Agree
4            Agree        Disagree
5            Agree        Agree

RELIABILITY: Test-retest

Correlation coefficients:

Interval: Pearson r
Ordinal: Spearman rho
Nominal: Chi-square-based measures of association

Correlation desired: .70+
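
As an illustration of the interval and ordinal coefficients named above, the following is a minimal sketch using only the Python standard library. The ratings in `time1` and `time2` are hypothetical (they do not come from the slides), and the function names are chosen here for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variation.
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks of the answers."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical answers (1-5 scale) from eight respondents at two points in time.
time1 = [4, 2, 5, 3, 1, 4, 2, 5]
time2 = [4, 3, 5, 3, 1, 5, 2, 4]

r = pearson_r(time1, time2)
rho = spearman_rho(time1, time2)
print(f"Pearson r = {r:.2f}, meets .70 threshold: {r >= 0.70}")
print(f"Spearman rho = {rho:.2f}, meets .70 threshold: {rho >= 0.70}")
```

Note that when every respondent gives the same answer at one time point, the correlation is undefined because that column has no variation.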

RELIABILITY: Test-retest

Comparisons of means:

Interval: paired t-test, repeated measures analysis of variance

Advantages:

more accurately take into account that the first and second measurements are not independent
more directly compare the actual answers at the two points in time
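
The paired t-test mentioned above can be sketched as follows. This is a minimal illustration with hypothetical scores; it returns only the t statistic and degrees of freedom, and looking up the p-value (for example in a t table) is left out.

```python
from math import sqrt


def paired_t(time1, time2):
    """Paired t statistic and degrees of freedom for two repeated measurements.

    Works on the within-respondent differences, which is how the test
    accounts for the two measurements not being independent.
    """
    diffs = [b - a for a, b in zip(time1, time2)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = sqrt(var_d / n)  # zero if all differences are identical
    return mean_d / se, n - 1


# Hypothetical interval-level scores for four respondents at two time points.
t_stat, df = paired_t([1, 2, 3, 4], [2, 4, 3, 7])
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```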

RELIABILITY: Inter-rater

Definition: correlation between answers to same question by same respondent obtained by different data gatherers at (approximately) the same point in time

RELIABILITY: Inter-rater

Factors affecting:

Lack of adequate interviewer training
Lack of standardization of data collection protocols and procedures

RELIABILITY: Inter-rater

Computation: Compute correlation coefficient between answers to same question by same respondent obtained by different data gatherers:

Respondent   Q1, Int. A   Q1, Int. B
1            BP=140/90    BP=140/90
2            BP=150/80    BP=150/80
3            BP=145/95    BP=145/95
4            BP=145/95    BP=120/80
5            BP=140/90    BP=140/90

RELIABILITY: Inter-rater

Correlation coefficients (coefficients for 3+ data gatherers noted in parentheses):

Interval: Pearson r (eta)

Ordinal: Spearman rho (chi-square)

Nominal: Kappa (chi-square)

Correlation desired: .80+
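
For nominal answers recorded by two data gatherers, kappa can be sketched as below. This assumes the two-rater (Cohen) form of kappa, which the slide does not spell out, and the category labels and ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Undefined (division by zero) if both raters use a single identical category.
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten respondents by interviewers A and B.
int_a = ["high", "high", "normal", "normal", "high",
         "normal", "high", "normal", "normal", "high"]
int_b = ["high", "high", "normal", "normal", "high",
         "normal", "high", "normal", "normal", "normal"]

kappa = cohen_kappa(int_a, int_b)
print(f"kappa = {kappa:.2f}, meets .80 threshold: {round(kappa, 2) >= 0.80}")
```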

RELIABILITY: Internal Consistency

Definition: correlation between answers by same respondent to different questions about the same underlying concept (usually summarized in scales)

RELIABILITY: Internal Consistency

Factors affecting:

Number of different questions asked to capture the underlying concept

Level of association (correlation) between answers the same respondents give to different questions about the concept

RELIABILITY: Internal Consistency

Computation: Compute internal consistency (underlying correlation) coefficients between answers by same respondent to different questions about the same concept:

Respondent   Q1      Q2         Q3
1            Agree   Disagree   Agree
2            Agree   Disagree   Agree
3            Agree   Disagree   Agree
4            Agree   Agree      Agree
5            Agree   Disagree   Agree

RELIABILITY: Internal Consistency

Internal consistency coefficients:

Corrected item-total correlation*
Split-half reliability coefficient
Cronbach alpha coefficient

Coefficient desired:

.70+ (group)
.90+ (individual)
.40+ (corrected item-total)*

RELIABILITY: Internal Consistency

Computation: Corrected item-total correlation

Add up the scores for answers to different questions about the same concept to create a total score

Subtract the score for answer to a given question from the total score to create item-specific “corrected” total scores

Compute Pearson correlation coefficients between score for each of the items and corresponding “corrected” total score
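
The three steps above can be sketched as follows. The item scores are hypothetical, and `corrected_item_total` is a name chosen here for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_total(scores):
    """Corrected item-total correlation for each item.

    scores: one row per respondent, one numeric score per item.
    """
    totals = [sum(row) for row in scores]              # step 1: total score
    correlations = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        corrected = [t - row[j] for t, row in zip(totals, scores)]  # step 2
        correlations.append(pearson_r(item, corrected))            # step 3
    return correlations


# Hypothetical answers from six respondents to four items (1-5 scale).
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 5],
]

for number, r in enumerate(corrected_item_total(scores), start=1):
    print(f"Item {number}: r = {r:.2f}, meets .40 threshold: {r >= 0.40}")
```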

RELIABILITY: Internal Consistency

Computation: Split-half reliability coefficient

Randomly divide a series of questions about the same concept into halves and add up the scores for answers to the questions in the respective halves

Compute Spearman-Brown prophecy coefficient for correlation between the scores for each half, adjusting for the fact that the respective scores are based on only half the original number of items
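
A minimal sketch of the split-half procedure described above, assuming an even number of items and hypothetical scores. The `seed` parameter is an addition made here so the random split is repeatable; it is not part of the procedure itself.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(scores, seed=0):
    """Split-half reliability with the Spearman-Brown adjustment.

    scores: one row per respondent, one numeric score per item.
    """
    items = list(range(len(scores[0])))
    random.Random(seed).shuffle(items)      # random division into halves
    half = len(items) // 2
    first, second = items[:half], items[half:]
    sum_first = [sum(row[j] for j in first) for row in scores]
    sum_second = [sum(row[j] for j in second) for row in scores]
    r_half = pearson_r(sum_first, sum_second)
    # Each half has only half the items, so project to full length (k = 2).
    return 2 * r_half / (1 + r_half)


# Hypothetical answers from six respondents to four items (1-5 scale).
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 5],
]
print(f"Split-half reliability = {split_half_reliability(scores):.2f}")
```

Different random splits generally give somewhat different coefficients, so the result depends on the seed.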

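RELIABILITY: Spearman-Brown prophecy adjustments

Rows give the original alpha; columns give the change in scale length (-.75, -.67, -.50 = scale shortened by that proportion; 2x, 3x, 4x = scale lengthened by that factor).

Original alpha   -.75   -.67   -.50   2x    3x    4x
.50              .20    .25    .33    .67   .75   .80
.60              .27    .33    .43    .75   .82   .86
.70              .37    .44    .54    .82   .88   .90
.80              .50    .57    .67    .89   .92   .94
.90              .69    .75    .82    .95   .96   .97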

RELIABILITY: Spearman-Brown prophecy formula

Computation: $\dfrac{k \cdot r_o}{1 + (k-1) \cdot r_o}$, where

k = factor by which scale is increased or decreased

$r_o$ = alpha based on original length

Example: $\dfrac{2 \times .70}{1 + (2-1) \times .70} = .82$
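
The formula can be written as a one-line function. This is a minimal sketch; the function and parameter names are chosen here for illustration. A scale shortened by half corresponds to k = 0.5, and one shortened by 75% to k = 0.25.

```python
def spearman_brown(r_original, k):
    """Projected reliability when scale length changes by a factor of k.

    r_original: alpha based on the original length
    k: factor by which the scale is increased (k > 1) or decreased (k < 1)
    """
    return k * r_original / (1 + (k - 1) * r_original)


# The worked example: doubling a scale whose original alpha is .70.
print(round(spearman_brown(0.70, 2), 2))

# Projected alpha for an original alpha of .80 at several length factors.
for k in (0.25, 0.5, 2, 3, 4):
    print(f"k = {k}: {spearman_brown(0.80, k):.2f}")
```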

RELIABILITY: Cronbach alpha coefficient

Computation: $\dfrac{k \cdot r_a}{1 + (k-1) \cdot r_a}$, where

k = number of items in the scale

$r_a$ = average Pearson r between items

Example: $\dfrac{10 \times .32}{1 + (10-1) \times .32} = .82$
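
A minimal sketch of this computation, using the average inter-item correlation as in the formula above (often called the standardized form of alpha; the covariance-based form can give a slightly different value). The item scores and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def alpha_from_mean_r(k, mean_r):
    """Alpha from the number of items k and the average inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def standardized_alpha(scores):
    """Alpha from raw scores: one row per respondent, one score per item."""
    k = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(k)]
    pair_rs = [pearson_r(a, b) for a, b in combinations(columns, 2)]
    return alpha_from_mean_r(k, sum(pair_rs) / len(pair_rs))


# The worked example: 10 items with an average inter-item r of .32.
print(round(alpha_from_mean_r(10, 0.32), 2))

# Hypothetical answers from six respondents to four items (1-5 scale).
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 5],
]
print(f"alpha = {standardized_alpha(scores):.2f}")
```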

WHEN TO UNDERTAKE RELIABILITY ANALYSIS

QUESTIONS
Test-retest: Concerned about stability of wording
Inter-rater: Concerned about equivalence of data gatherers
Internal consistency: Constructing summary scales of attitudes or other abstract concepts

STUDIES
Test-retest: Especially important in longitudinal or experimental designs
Inter-rater: Monitored, but not usually measured directly in surveys
Internal consistency: Especially used in attitudinal surveys

STAGES
Test-retest: Pilot test or pretest
Inter-rater: Pretest plus monitor in final study
Internal consistency: Pretest or final study

REFERENCES

DeVellis, Robert F. (2003). Scale Development: Theory and Applications. Second Edition. Thousand Oaks, CA: Sage.

Ware, J.E., Jr., & Gandek, B., for the IQOLA Project (1998). Methods for testing data quality, scaling assumptions, and reliability: The IQOLA Project Approach. J. Clinical Epidemiology, 51 (11), 945-952.