MEASUREMENT: RELIABILITY

Lu Ann Aday, Ph.D.
The University of Texas School of Public Health

RELIABILITY: Definition

Extent of random variation in answers to questions as a function of when they are asked (test-retest), who asked them (inter-rater), and the fact that a given question is one of a number of questions that could have been asked to measure the concept of interest (internal consistency).

RELIABILITY: Types

Test-retest reliability
Inter-rater reliability
Internal consistency reliability

RELIABILITY: Computation

Requires repeated measures to estimate stability over time (test-retest), equivalence across data gatherers (inter-rater), or equivalence across questions/items intended to measure the same underlying concept (internal consistency).

RELIABILITY: Test-retest

Definition: correlation between answers to same question by same respondent at two different points in time


Factors affecting:
Vague question wording
Transient personal states, e.g., physical or mental
Situational factors, e.g., presence of other people

RELIABILITY: Test-retest

Computation: Compute correlation coefficient between answers to same question by same respondent at two different points in time:

Respondent   Q1, Time 1   Q1, Time 2
1            Agree        Agree
2            Agree        Agree
3            Agree        Agree
4            Agree        Disagree
5            Agree        Agree


Correlation coefficients:
Interval: Pearson r
Ordinal: Spearman rho
Nominal: Chi-square-based measures of association

Correlation desired: .70+
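
The test-retest computation can be sketched in a few lines of Python. This is only an illustration: the two answer lists are hypothetical 5-point scores for eight respondents (they are not data from these slides), and the helper names `pearson_r`, `ranks`, and `spearman_rho` are made up for the example. Spearman rho is computed here as the Pearson r of the average ranks.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numeric scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranked scores."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical answers to the same question at two points in time
# (1 = strongly disagree ... 5 = strongly agree).
time1 = [4, 5, 3, 4, 2, 5, 1, 3]
time2 = [4, 4, 3, 5, 2, 5, 2, 3]

r = pearson_r(time1, time2)
rho = spearman_rho(time1, time2)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
print("Meets the .70+ guideline:", r >= 0.70)
```

Note that a question every respondent answers identically at both times has no variance, so a correlation cannot be computed for it; the sketch does not guard against that case.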

RELIABILITY: Test-retest

Comparisons of means:
Interval: paired t-test, repeated measures analysis of variance

Advantages:
More accurately take into account that the first and second measurements are not independent
More directly compare the actual answers at the two points in time
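
As a rough illustration of the paired comparison, the sketch below computes the paired t statistic from the per-respondent differences, which is how the test accounts for the two measurements not being independent. The scores are hypothetical and `paired_t` is a name invented for this example; obtaining a p-value would additionally require a t distribution table or a statistics package.

```python
from math import sqrt

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired measurements.

    Works on the difference for each respondent, so it assumes the
    differences are not all identical (otherwise the standard error is 0).
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = sqrt(var_diff / n)
    return mean_diff / standard_error, n - 1

# Hypothetical interval-level answers from the same respondents at two times.
time1 = [4, 5, 3, 4, 2, 5, 1, 3]
time2 = [4, 4, 3, 5, 2, 5, 2, 3]

t_stat, df = paired_t(time1, time2)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A t statistic close to zero suggests the average answer did not shift between the two administrations.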

RELIABILITY: Inter-rater

Definition: correlation between answers to same question by same respondent obtained by different data gatherers at (approximately) the same point in time

RELIABILITY: Inter-rater

Factors affecting: Lack of adequate interviewer training

Lack of standardization of data collection protocols and procedures

RELIABILITY: Inter-rater

Computation: Compute correlation coefficient between answers to same question by same respondent obtained by different data gatherers:

Respondent   Q1, Int. A   Q1, Int. B
1            BP=140/90    BP=140/90
2            BP=150/80    BP=150/80
3            BP=145/95    BP=145/95
4            BP=145/95    BP=120/80
5            BP=140/90    BP=140/90

RELIABILITY: Inter-rater

Correlation coefficients (coefficients for 3+ data gatherers noted in parentheses):

Interval: Pearson r (eta)

Ordinal: Spearman rho (chi-square)

Nominal: Kappa (chi-square)

Correlation desired: .80+
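
For nominal answers recorded by two data gatherers, kappa compares the observed agreement with the agreement expected by chance. The following is a minimal sketch of Cohen's kappa for two raters; the "high"/"normal" classifications are hypothetical and the function name `cohen_kappa` is chosen for this example only.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same respondents."""
    n = len(rater_a)
    # Proportion of respondents on whom the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical blood pressure classifications by interviewers A and B.
int_a = ["high", "high", "normal", "normal", "high",
         "normal", "normal", "high", "normal", "normal"]
int_b = ["high", "high", "normal", "normal", "normal",
         "normal", "normal", "high", "high", "normal"]

kappa = cohen_kappa(int_a, int_b)
print(f"kappa = {kappa:.2f}")
print("Meets the .80+ guideline:", kappa >= 0.80)
```

In this made-up example the interviewers agree on 8 of 10 respondents, yet kappa falls well below .80 because about half of that agreement would be expected by chance.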

RELIABILITY: Internal Consistency

Definition: correlation between answers by same respondent to different questions about the same underlying concept (usually summarized in scales)

RELIABILITY: Internal Consistency

Factors affecting:

Number of different questions asked to capture the underlying concept

Level of association (correlation) between answers the same respondents give to different questions about the concept

RELIABILITY: Internal Consistency

Computation: Compute internal consistency (underlying correlation) coefficients between answers by same respondent to different questions about the same concept:

Respondent   Q1      Q2         Q3
1            Agree   Disagree   Agree
2            Agree   Disagree   Agree
3            Agree   Disagree   Agree
4            Agree   Agree      Agree
5            Agree   Disagree   Agree

RELIABILITY: Internal Consistency

Internal consistency coefficients:
Corrected item-total correlation*
Split-half reliability coefficient
Cronbach alpha coefficient

Coefficient desired:
.70+ (group)
.90+ (individual)
.40+ (corrected item-total)*


Computation: Corrected item-total correlation

Add up the scores for answers to different questions about the same concept to create a total score

Subtract the score for the answer to a given question from the total score to create item-specific “corrected” total scores

Compute Pearson correlation coefficients between the score for each item and the corresponding “corrected” total score
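
The three steps above can be sketched as follows. The item scores are hypothetical (three questions, six respondents), and `corrected_item_total` is a name invented for this illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numeric scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def corrected_item_total(items):
    """items: one list of scores per question, aligned by respondent."""
    # Step 1: total score per respondent across all questions.
    totals = [sum(scores) for scores in zip(*items)]
    correlations = []
    for item in items:
        # Step 2: remove this item's score from each respondent's total.
        corrected = [total - score for total, score in zip(totals, item)]
        # Step 3: correlate the item with its "corrected" total.
        correlations.append(pearson_r(item, corrected))
    return correlations

# Hypothetical scores on three questions about the same concept.
q1 = [4, 5, 3, 4, 2, 5]
q2 = [4, 4, 3, 5, 2, 4]
q3 = [3, 5, 2, 4, 1, 5]

for name, r in zip(["Q1", "Q2", "Q3"], corrected_item_total([q1, q2, q3])):
    print(f"{name}: corrected item-total r = {r:.2f}, meets .40+: {r >= 0.40}")
```

Removing the item from the total before correlating avoids inflating the coefficient by correlating the item partly with itself.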


Computation: Split-half reliability coefficient

Randomly divide a series of questions about the same concept into halves and add up the scores for answers to the questions in the respective halves

Compute Spearman-Brown prophecy coefficient for correlation between the scores for each half, adjusting for the fact that the respective scores are based on only half the original number of items
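
A minimal sketch of the split-half procedure is shown below, assuming four hypothetical items and a fixed random seed so the split is reproducible. The adjustment uses the Spearman-Brown formula with k = 2, i.e. 2r / (1 + r); the function name and the `seed` parameter are assumptions made for this example.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numeric scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(items, seed=0):
    """items: one list of scores per question, aligned by respondent."""
    # Randomly assign the questions to two halves.
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    first, second = indices[:half], indices[half:]
    # Add up each respondent's scores within each half.
    n_respondents = len(items[0])
    score_a = [sum(items[i][r] for i in first) for r in range(n_respondents)]
    score_b = [sum(items[i][r] for i in second) for r in range(n_respondents)]
    # Correlate the halves, then adjust to the full scale length (k = 2).
    r_half = pearson_r(score_a, score_b)
    return 2 * r_half / (1 + r_half)

# Hypothetical scores on four questions about the same concept.
q1 = [4, 5, 3, 4, 2, 5]
q2 = [4, 4, 3, 5, 2, 4]
q3 = [3, 5, 2, 4, 1, 5]
q4 = [4, 5, 3, 5, 2, 4]

reliability = split_half_reliability([q1, q2, q3, q4])
print(f"Split-half reliability = {reliability:.2f}")
```

Because the result depends on which questions land in which half, different seeds can give somewhat different coefficients for the same data.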

RELIABILITY: Spearman-Brown prophecy adjustments

Original   Scale length -/+
alpha      -.75   -.67   -.50   2x    3x    4x
.50        .20    .25    .33    .67   .75   .80
.60        .27    .33    .43    .75   .82   .86
.70        .37    .44    .54    .82   .88   .90
.80        .50    .57    .67    .89   .92   .94
.90        .69    .75    .82    .95   .96   .97

RELIABILITY: Spearman-Brown prophecy formula

Computation: (k * ro) / (1 + [(k-1) * ro])

where,
k = factor by which scale is increased or decreased
ro = alpha based on original length

Example: (2 * .70) / (1 + [(2-1) * .70]) = 1.40 / 1.70 = .82
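
The formula is easy to check numerically. The sketch below wraps it in a small function (the name `spearman_brown` is chosen for this example) and recomputes the worked example plus one row of adjustments. It assumes that a "-.75" change in scale length means the scale is cut to one quarter of its original length (k = .25), and likewise k = 1/3 for "-.67" and k = .5 for "-.50".

```python
def spearman_brown(original_alpha, k):
    """Predicted reliability when scale length is multiplied by the factor k."""
    return (k * original_alpha) / (1 + (k - 1) * original_alpha)

# Worked example: doubling a scale whose original alpha is .70.
print(f"{spearman_brown(0.70, 2):.2f}")

# One row of adjustments for an original alpha of .80.
# Assumed mapping of length changes to k: -.75 -> 1/4, -.67 -> 1/3, -.50 -> 1/2.
factors = {"-.75": 1 / 4, "-.67": 1 / 3, "-.50": 1 / 2, "2x": 2, "3x": 3, "4x": 4}
for label, k in factors.items():
    print(f"{label:>5}: {spearman_brown(0.80, k):.2f}")
```

Shortening a scale lowers the predicted reliability and lengthening raises it, with diminishing returns as k grows.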

RELIABILITY: Cronbach alpha coefficient

Computation: (k * ra) / (1 + [(k-1) * ra])

where,
k = number of items in the scale
ra = average Pearson r between items

Example: (10 * .32) / (1 + [(10-1) * .32]) = 3.20 / 3.88 = .82
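
As an illustration, the sketch below applies this formula twice: once to the worked example (k = 10, ra = .32) and once to hypothetical item scores, where ra is first obtained by averaging the Pearson r over every pair of items. The function names and the data are assumptions made for the example. This is the form of alpha based on the average inter-item correlation, as given above, not the variance-based form.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numeric scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_inter_item_r(items):
    """Average Pearson r over every pair of items."""
    pairs = list(combinations(items, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

def alpha_from_mean_r(k, mean_r):
    """Alpha from the number of items k and the average inter-item r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Worked example: 10 items with an average inter-item r of .32.
print(f"{alpha_from_mean_r(10, 0.32):.2f}")

# Hypothetical scores on four questions about the same concept.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [3, 5, 2, 4, 1, 5],
    [4, 5, 3, 5, 2, 4],
]
alpha = alpha_from_mean_r(len(items), mean_inter_item_r(items))
print(f"alpha = {alpha:.2f}, meets .70+ (group): {alpha >= 0.70}")
```

The formula shows both factors that drive internal consistency: alpha rises with more items (k) and with a higher average correlation between them (ra).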

WHEN TO UNDERTAKE RELIABILITY ANALYSIS

RELIABILITY DIMENSIONS: TEST-RETEST, INTER-RATER, INTERNAL CONSISTENCY

QUESTIONS
Test-retest: Concerned about stability of wording
Inter-rater: Concerned about equivalence of data gatherers
Internal consistency: Constructing summary scales of attitudes or other abstract concepts

STUDIES
Test-retest: Esp. important in longitudinal or experimental designs
Inter-rater: Monitored, but not usually measured directly in surveys
Internal consistency: Esp. used in attitudinal surveys

STAGES
Test-retest: Pilot test or pretest
Inter-rater: Pretest plus monitor in final study
Internal consistency: Pretest or final study

REFERENCES

DeVellis, Robert F. (2003). Scale Development: Theory and Applications. Second Edition. Thousand Oaks, CA: Sage.

Ware, J.E., Jr., & Gandek, B., for the IQOLA Project (1998). Methods for testing data quality, scaling assumptions, and reliability: The IQOLA Project Approach. J. Clinical Epidemiology, 51 (11), 945-952.
