
Reliability Testing for Item Analysis

Dr. Debdulal Dutta Roy, Ph.D., Psychology Research Unit

Indian Statistical Institute, 203, B.T. Road

Kolkata - 700108. E-mail: [email protected]

Fax: 91-33-25776680; Tel (o): 91-33-2575 3454

Presentation at the Department of Clinical Psychology, Ram Chandra University, Chennai

21.2.09

http://www.isical.ac.in/%7Epsy/index.html


Reliability Analysis

Reliability refers to the consistency of scores obtained by the same persons when reexamined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions (Anastasi, 1990). It indicates the extent to which individual differences in test scores are attributable to “true” differences in the characteristics under consideration and the extent to which they are attributable to chance errors. The reliability of a test is the proportion of total score variance that is true variance, i.e. variance arising from the characteristic under consideration, as opposed to error variance arising from factors irrelevant to it.
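Under the classical true-score model (observed score = true score + independent random error), this proportion can be illustrated by simulation. The sketch below uses hypothetical data, not data from this presentation: it assumes normally distributed true scores (SD 8) and errors (SD 4), so the theoretical reliability is 64 / (64 + 16) = 0.8, and it checks that both the variance ratio and the correlation between two parallel measurements come out close to that value. The helper names are illustrative.

```python
import random
from statistics import fmean, pvariance


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, true_sd=8.0, error_sd=4.0, seed=1):
    """Hypothetical data: each observed score = true score + independent error."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true]  # first measurement
    form_b = [t + rng.gauss(0.0, error_sd) for t in true]  # parallel measurement
    return true, form_a, form_b


true, form_a, form_b = simulate()
theoretical = 8.0**2 / (8.0**2 + 4.0**2)             # true variance / total variance
variance_ratio = pvariance(true) / pvariance(form_a)  # sample estimate of that ratio
parallel_r = pearson(form_a, form_b)                  # correlation of parallel measurements
print(round(theoretical, 3), round(variance_ratio, 3), round(parallel_r, 3))
```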

Reliability Analysis

Time consistency
– Test-retest reliability
– Test-retest multi-item response consistency
– Test-retest multi-trait consistency

Internal consistency
– Split-half reliability
– Split-half canonical correlation

Others
– Rational equivalence (non-metric data)
– Item-item correspondence
– Cronbach's alpha (metric data)
– Correspondence map of traits

Test-Retest Reliability

Johnny, Johnny! Yes, Papa. Eating sugar? No, Papa. Telling lies? No, Papa. Open your mouth. O! Ha! Ha! Ha!

Test retest reliability (8 months interval)

                          First session    Second session
Dimension     Item        Mean    SD       Mean    SD       t-ratio (df=71)   Correlation
Application   1A          0.75    0.44     0.65    0.48      1.41
              6B          0.78    0.42     0.72    0.45      0.81
              7A          0.96    0.2      0.92    0.28      1
              9B          0.85    0.36     0.94    0.23     -1.84
              17A         0.71    0.46     0.69    0.46      0.19
              21A         0.78    0.42     0.93    0.26     -2.99
              Total       4.82    1.14     4.86    1.01     -0.23             0.706
Knowledge     1B          0.25    0.44     0.35    0.44     -1.41
              2A          0.94    0.23     0.94    0.23      0
              4A          0.85    0.36     0.89    0.32     -0.83
              14B         0.89    0.32     0.86    0.35      0.53
              16B         0.79    0.41     0.63    0.49      2.43
              19A         0.74    0.44     0.83    0.38     -1.41
              Total       4.46    1.05     4.5     1.01     -0.27             0.93**

Test retest reliability of Reading Motivation Questionnaire (n=72) using both t-test and correlation coefficients
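The two statistics in the table answer different questions: the paired t-ratio (df = n - 1 = 71 for 72 students) tests whether the mean shifted between sessions, while the product-moment correlation tests whether students kept their relative positions. A minimal sketch of both computations on a small hypothetical set of scores (not the questionnaire data); following the sign convention the table appears to use (positive t when the first-session mean is higher), the difference is taken as first minus second. The helper names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(first, second):
    """Paired-samples t-ratio on the differences (first - second); returns (t, df)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))  # stdev uses the n - 1 denominator
    return t, n - 1


# Hypothetical total scores of five respondents in two sessions
first = [4, 5, 3, 6, 5]
second = [5, 5, 4, 6, 6]
t, df = paired_t(first, second)
r = pearson(first, second)
print(f"t({df}) = {t:.3f}, r = {r:.3f}")  # prints: t(4) = -2.449, r = 0.891
```

A high r together with a non-significant t is the pattern reported for both totals in the table: a stable rank order with no systematic change in the mean.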

Test-Retest Multi-item Response

Items do not always behave in the same fashion. Identify the items in the set whose responses are inconsistent across periods.

The Last Supper: Leonardo da Vinci

Test-Retest Multi-item Response Consistency (8 months interval)

[Figure: Tree diagrams for the 6 "Reading for application" items (complete linkage, Euclidean distances of items; x-axis: linkage distance). Panel 1, after 8 months: leaf order APX, APAB, APP, APN, APM, APH (axis 2.5 to 6.0). Panel 2: leaf order APAB, APP, APN, APM, APX, APH (axis 2.5 to 5.5).]
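Clustering items (rather than respondents) requires a distance between every pair of item columns, computed across respondents. A minimal sketch of that step, assuming plain Euclidean distance over raw item scores (the presentation does not state the exact software settings). For 0/1 items this distance is simply the square root of the number of respondents who answered the two items differently. The response matrix and the function name are hypothetical.

```python
from math import dist  # Python 3.8+


def item_distance_matrix(responses):
    """responses: one row per respondent, one column per item.
    Returns the matrix of Euclidean distances between item columns."""
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    return [[dist(a, b) for b in items] for a in items]


# Hypothetical 0/1 responses of four respondents to three items
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
]
d = item_distance_matrix(responses)
for row in d:
    print([round(x, 3) for x in row])
```

Computing such a matrix separately for each session and clustering it is one way to see whether an item changes its neighbours over the interval.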

Test-Retest Multi-Trait Consistency (8 months interval)


[Figure: Tree diagram of the seven trait totals (complete linkage, Euclidean distances; x-axis: linkage distance, 5 to 40). Leaf order: HATOT, RCTOT, AETOT, AFFTOT, ACHTOT, KNTOT, APPTOT.]

Euclidean distances (test retest reliability.sta)

          APPTOT   KNTOT   AFFTOT   RCTOT   AETOT   HATOT   ACHTOT
APPTOT     0.00    13.34   27.96    22.07   25.34   36.22   16.25
KNTOT     13.34     0.00   24.45    19.21   21.86   34.50   12.33
AFFTOT    27.96    24.45    0.00    13.23   12.88   15.75   18.00
RCTOT     22.07    19.21   13.23     0.00   13.15   20.66   13.89
AETOT     25.34    21.86   12.88    13.15    0.00   18.00   15.17
HATOT     36.22    34.50   15.75    20.66   18.00    0.00   26.91
ACHTOT    16.25    12.33   18.00    13.89   15.17   26.91    0.00


[Figure: Tree diagram of the seven trait totals (complete linkage, Euclidean distances; x-axis: linkage distance, 5 to 40). Leaf order: HATOT, AETOT, RCTOT, AFFTOT, ACHTOT, KNTOT, APPTOT.]

Euclidean distances (test retest reliability.sta)

          APPTOT   KNTOT   AFFTOT   RCTOT   AETOT   HATOT   ACHTOT
APPTOT     0.00    11.66   29.15    26.17   22.98   37.27   15.26
KNTOT     11.66     0.00   26.65    23.81   19.34   34.91   14.18
AFFTOT    29.15    26.65    0.00    14.11   16.06   16.03   23.73
RCTOT     26.17    23.81   14.11     0.00   14.25   18.11   21.73
AETOT     22.98    19.34   16.06    14.25    0.00   20.47   18.89
HATOT     37.27    34.91   16.03    18.11   20.47    0.00   31.72
ACHTOT    15.26    14.18   23.73    21.73   18.89   31.72    0.00

After 8 months. Tool: Reading Motivation Questionnaire (Dutta Roy, 2002); N = 72 students of the same school.
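The tree diagrams can be rebuilt from the distance matrices by complete-linkage (farthest-neighbour) agglomeration: repeatedly merge the two clusters whose most distant members are closest. A minimal pure-Python sketch, written for illustration rather than as the routine that produced the figures, applied to the first trait-distance matrix above; the function names are hypothetical.

```python
from itertools import combinations

# First trait-distance matrix (Euclidean distances between trait totals)
LABELS = ["APPTOT", "KNTOT", "AFFTOT", "RCTOT", "AETOT", "HATOT", "ACHTOT"]
DIST = [
    [0.00, 13.34, 27.96, 22.07, 25.34, 36.22, 16.25],
    [13.34, 0.00, 24.45, 19.21, 21.86, 34.50, 12.33],
    [27.96, 24.45, 0.00, 13.23, 12.88, 15.75, 18.00],
    [22.07, 19.21, 13.23, 0.00, 13.15, 20.66, 13.89],
    [25.34, 21.86, 12.88, 13.15, 0.00, 18.00, 15.17],
    [36.22, 34.50, 15.75, 20.66, 18.00, 0.00, 26.91],
    [16.25, 12.33, 18.00, 13.89, 15.17, 26.91, 0.00],
]


def farthest(dist, a, b):
    """Complete-linkage distance: the largest distance between members of a and b."""
    return max(dist[i][j] for i in a for j in b)


def complete_linkage(dist):
    """Return the merge history as (cluster_a, cluster_b, height) tuples."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: farthest(dist, *pair))
        history.append((a, b, farthest(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


def cut(history, n, k):
    """Replay the first n - k merges to obtain k clusters."""
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in history[: n - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters


history = complete_linkage(DIST)
for a, b, height in history:
    print(sorted(LABELS[i] for i in a | b), height)
two = cut(history, len(LABELS), 2)
print([sorted(LABELS[i] for i in c) for c in two])
```

The merge heights (12.33, 12.88, 13.23, 16.25, 20.66, 36.22) fall within the linkage-distance axis, and cutting at two clusters separates APPTOT, KNTOT and ACHTOT from AFFTOT, RCTOT, AETOT and HATOT. Running the same function on the second matrix gives heights 11.66, 14.11, 15.26, 16.06, 20.47 and 37.27 with the same two-cluster split.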

Alternate Form

Correlating the scores of two parallel forms of a single test:
– The number of items in both forms should be the same.
– Both forms should have uniform content and the same range of agreement and disagreement.
– The means and standard deviations of both forms should be equal.
– The mode of administration and scoring of both forms should be uniform.

Split-half

The upper and lower parts of a questionnaire sometimes differ in item content; not all items always reflect the same content.

Split-half Reliability

Correlating equal halves of the test, e.g. correlating the odd-numbered and even-numbered items. The full-test reliability is then obtained with the Spearman-Brown formula:

$$ r_{tt} = \frac{2\, r_{hh}}{1 + r_{hh}} $$

where $r_{hh}$, the reliability of the half test, is the product-moment correlation between the two halves.

Advantage: on-the-spot reliability.
Disadvantage: failure to assess temporal stability.
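A minimal sketch of the odd-even split followed by the Spearman-Brown step, on a hypothetical 5-respondent × 4-item score matrix; the function names are illustrative.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.
    scores: one row per respondent, items in questionnaire order."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_hh = pearson(odd, even)                  # reliability of the half test
    return r_hh, 2 * r_hh / (1 + r_hh)         # r_tt for the full-length test


# Hypothetical scores: five respondents, four items
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
r_hh, r_tt = split_half(scores)
print(round(r_hh, 3), round(r_tt, 3))  # prints: 0.911 0.953
```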

Split-half Canonical Correlation

Split-half canonical correlation provides information about the percentage of variance in one set of variables (one half of the items) that is explained by the other set along a given dimension (canonical root).

Study: Split-half Canonical Correlation

A 12-item, 5-point Likert-type scale assessing attitude towards workers' education was administered to 1600 rural workers of WB.

[Figure: Box-and-whisker plots (median, 25%-75% range, min-max) for items REV_1 to REV_6 and for items REV_7 to REV_12.]

Split-half r_tt = 0.85; Cronbach's alpha = 0.87.

Canonical correlation coefficient between the two sets (first 6 and last 6 items) = 0.78, Chi-square(36) = 1558.3, p < 0.0001.

Correlations between first-half items (rows) and second-half items (columns):

         REV_7   REV_8   REV_9   REV_10   REV_11   REV_12
REV_1    0.46    0.39    0.39    0.33     0.31     0.22
REV_2    0.37    0.34    0.38    0.36     0.39     0.28
REV_3    0.35    0.32    0.33    0.42     0.45     0.39
REV_4    0.34    0.31    0.32    0.29     0.32     0.24
REV_5    0.48    0.43    0.41    0.39     0.36     0.30
REV_6    0.57    0.48    0.48    0.37     0.36     0.32

[Figure: Plot of canonical correlations; value (0.0 to 0.9) against number of canonical roots (1 to 6).]
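The reported chi-square with 36 degrees of freedom is consistent with Bartlett's chi-square approximation for canonical correlations, whose df is p × q = 6 × 6 = 36; the output does not name the test, so this is an assumption. The squared canonical correlation, 0.78² ≈ 0.61, is the proportion of variance shared by the first pair of canonical variates (not by the raw item sets as a whole). A minimal sketch of both quantities; the function name is hypothetical.

```python
import math


def bartlett_chi_square(canonical_rs, n, p, q):
    """Bartlett's chi-square test that the given canonical correlations are all zero.
    n = sample size; p and q = number of variables in the two sets."""
    wilks_lambda = math.prod(1 - r * r for r in canonical_rs)  # Python 3.8+
    chi_square = -(n - 1 - (p + q + 1) / 2) * math.log(wilks_lambda)
    return chi_square, p * q  # df = p * q when all roots are tested


shared = 0.78 ** 2  # variance shared by the first pair of canonical variates
chi_first, df = bartlett_chi_square([0.78], n=1600, p=6, q=6)
print(round(shared, 4), round(chi_first, 1), df)
```

Using only the first root gives a statistic of about 1493, below the reported 1558.3; under the same assumption, the difference would come from the five smaller canonical roots, whose values are not reported here.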

Internal Consistency

Internal consistency measures whether several items that purport to measure the same general construct produce similar scores. For example, if a respondent expressed agreement with the statements "I like to ride bicycles" and "I've enjoyed riding bicycles in the past", and disagreement with the statement "I hate bicycles", this would indicate good internal consistency of the test.
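The bicycle example relies on reverse-scoring: disagreement with a negatively worded statement counts as agreement with the construct. A minimal sketch with hypothetical 5-point responses, assuming the common convention that an item scored 1 to 5 is reversed as 6 minus the response; after reversal all inter-item correlations should be positive, and their mean is one simple summary of internal consistency. The variable and function names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def reverse_score(responses, low=1, high=5):
    """Reverse a negatively worded item: 1 <-> 5, 2 <-> 4, 3 stays 3 on a 1-5 scale."""
    return [low + high - r for r in responses]


# Hypothetical answers of five respondents (1 = strongly disagree, 5 = strongly agree)
like_riding = [5, 4, 2, 1, 3]     # "I like to ride bicycles"
enjoyed_riding = [5, 5, 2, 1, 3]  # "I've enjoyed riding bicycles in the past"
hate_bicycles = [1, 2, 4, 5, 3]   # "I hate bicycles" (negatively worded)

raw_r = pearson(like_riding, hate_bicycles)  # negative before reversal
items = [like_riding, enjoyed_riding, reverse_score(hate_bicycles)]
mean_r = fmean(pearson(a, b) for a, b in combinations(items, 2))
print(round(raw_r, 3), round(mean_r, 3))
```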

Coefficient Alpha

Cronbach's coefficient alpha is a useful index for assessing the internal consistency of a scale. It is equivalent to Hoyt's ANOVA procedure. The formula is:

$$ \alpha = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N} \sigma_i^2}{\sigma_X^2}\right) $$

where $N$ is the number of items, $\sigma_i^2$ is the variance of item $i$, and $\sigma_X^2$ is the variance of the total composite score.
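A minimal sketch of the formula on a hypothetical 5-respondent × 4-item score matrix, alongside Hoyt's ANOVA estimate, (MS_persons - MS_residual) / MS_persons from a persons × items analysis of variance without replication, to show that the two agree. Population (divide-by-N) variances are used in the alpha formula; since the same divisor appears in the numerator and the denominator of the variance ratio, the choice does not affect the result. The function names are illustrative.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item."""
    n_items = len(scores[0])
    item_variances = [pvariance(item) for item in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def hoyt_reliability(scores):
    """Hoyt's ANOVA estimate from a persons x items table without replication."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(col) / n_p for col in zip(*scores)]
    ss_persons = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_persons - ss_items  # persons x items interaction
    ms_persons = ss_persons / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_i - 1))
    return (ms_persons - ms_residual) / ms_persons


# Hypothetical scores: five respondents, four items
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(scores)
hoyt = hoyt_reliability(scores)
print(round(alpha, 3), round(hoyt, 3))  # prints: 0.962 0.962
```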

EXAMPLE

Item-Item Correspondence: Internal consistency among 42 items of the Reading Motivation Questionnaire

[Figure: Reading Motivation Questionnaire, 2D plot of column coordinates (Dimensions 1 × 2). Input table (rows × columns): 72 cases × 42 items; standardization: row and column profiles. Dimension 1 eigenvalue .16550 (16.55% of inertia); Dimension 2 eigenvalue .09853 (9.853% of inertia). Plotted labels: the 42 items, six each with the prefixes AP, KN, AF, RC, AE, HA and ACH.]

The correspondence map shows that the intrinsic reading-motivation items form a cluster, whereas the extrinsic-motivation items are widely scattered.
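In correspondence analysis the eigenvalues (principal inertias) partition the table's total inertia, which equals the Pearson chi-square statistic of the table divided by its grand total; the "% of Inertia" figure is each eigenvalue as a share of that total. A minimal sketch of the total-inertia computation on hypothetical tables, assuming every row and column total is positive; the function names are illustrative.

```python
def total_inertia(table):
    """Pearson chi-square of a two-way frequency table divided by its grand total."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return chi_square / n


def percent_of_inertia(eigenvalues, inertia):
    """Each eigenvalue as a percentage of the total inertia."""
    return [100 * e / inertia for e in eigenvalues]


independent = [[5, 5], [5, 5]]   # rows and columns unrelated: no inertia
associated = [[10, 0], [0, 10]]  # perfect association in a 2 x 2 table
print(total_inertia(independent), total_inertia(associated))
print(percent_of_inertia([1.0], total_inertia(associated)))
```

Read the other way round, the item map's eigenvalues .16550 (16.55%) and .09853 (9.853%) both imply a total inertia of 1.0 for the 72 × 42 table.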

Correspondence Map of Traits

[Figure: 2D plot of column coordinates (Dimensions 1 × 2); input table (rows × columns): 14 × 6. Dimension 1 eigenvalue .24804 (70.26% of inertia); Dimension 2 eigenvalue .07517 (21.29% of inertia). Points: C1 to C6.]

[Figure: 2D plot of row and column coordinates (Dimensions 1 × 2); input table (rows × columns): 14 items × 6 response categories. Dimension 2 eigenvalue .07517 (21.29% of inertia). Points: items Row1 to Row14 and categories C1 to C6.]

[Figure: 2D plot of row coordinates (Dimensions 1 × 2); input table (rows × columns): 14 × 6. Dimension 2 eigenvalue .07517 (21.29% of inertia). Points: items Row1 to Row14.]

Rational Equivalence

The (Kuder-Richardson) formula is given below:

$$ r_{tt} = \frac{n}{n-1}\cdot\frac{\sigma_t^2 - \sum pq}{\sigma_t^2} $$

in which
$r_{tt}$ = reliability coefficient of the whole test
$n$ = number of items
$\sigma_t$ = the SD of the total scores
$p$ = proportion of the group giving 'yes' responses
$q = 1 - p$ = proportion of the group giving 'no' responses
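A minimal sketch of the formula for 0/1 items on hypothetical data. The sum of pq runs over items, and the total-score variance is computed with the population (divide-by-N) variance so that it is on the same footing as pq; with that convention the result coincides with coefficient alpha applied to the same 0/1 data. The function name is illustrative.

```python
from statistics import pvariance


def kr20(responses):
    """responses: one row per respondent, 1 = 'yes', 0 = 'no'."""
    n_items = len(responses[0])
    n_people = len(responses)
    sum_pq = 0.0
    for item in zip(*responses):
        p = sum(item) / n_people  # proportion answering 'yes'
        sum_pq += p * (1 - p)     # q = 1 - p
    var_total = pvariance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (var_total - sum_pq) / var_total


# Hypothetical answers of five respondents to four yes/no items
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
print(round(kr20(responses), 3))  # prints: 0.79
```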

Attitudes                     No. of items   Kuder-Richardson reliability coefficient
Cleanliness                    5             0.58
Safety                         7             0.68
Comfort                        5             0.42
Adequacy                      12             0.58
Exploring                     12             0.5
Reliability                    5             0.5
Easiness                       7             0.68
Equal Opportunity              5             0.63
Willingness to Participate    10             0.5

Reliability Coefficients of Attitude towards School Infrastructure Questionnaire (N=175)

Ref: Dutta Roy, D. (2008). Attitude towards school infrastructure in rural areas. Unpublished project report submitted to Indian Statistical Institute, p. 45.

Factors influencing test scores

Extrinsic factors
– Group variability
– Guessing
– Environmental conditions

Intrinsic factors
– Length of test
– Homogeneity of items
– Discrimination value
– Scorer reliability

Maximize your efficiency

– Groups should be heterogeneous.
– Items should be homogeneous.
– The scale should preferably be a longer one.
– Items should be discriminating.
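The advice to prefer a longer scale can be quantified with the Spearman-Brown prophecy formula, r_k = k r / (1 + (k - 1) r), where k is the factor by which the test is lengthened; the split-half correction is the special case k = 2. The formula assumes the added items are parallel to the existing ones, which real items only approximate. A minimal sketch, using the Comfort scale from the school-infrastructure table (5 items, coefficient 0.42) as input and a hypothetical target of 0.70; the function names are illustrative.

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by k (parallel items assumed)."""
    return k * r / (1 + (k - 1) * r)


def lengthening_factor(r, target):
    """Factor k by which the test must be lengthened to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))


r_comfort = 0.42                                # 5-item Comfort scale
doubled = spearman_brown(r_comfort, 2)          # 10 parallel items
k_needed = lengthening_factor(r_comfort, 0.70)  # hypothetical target of 0.70
print(round(doubled, 3), round(k_needed, 2), round(5 * k_needed, 1))  # prints: 0.592 3.22 16.1
```

Under these assumptions, doubling the Comfort scale would raise its coefficient to about 0.59, and reaching 0.70 would take roughly 16 comparable items.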
