Reliability Testing for Item Analysis
Dr. Debdulal Dutta Roy, Ph.D.
Psychology Research Unit
Indian Statistical Institute
203, B.T. Road
Kolkata - 700108
Fax: 91-33-25776680; Tel (o): 91-33-2575 3454
Presentation at the Department of Clinical Psychology, Ram Chandra University, Chennai
21.2.09
Reliability Analysis
Reliability refers to the consistency of scores obtained by the same persons when reexamined with the same test on different occasions, with different sets of equivalent items, or under other variable examining conditions (Anastasi, 1990). It indicates the extent to which individual differences in test scores are attributable to "true" differences in the characteristics under consideration and the extent to which they are attributable to chance errors. The reliability of a test is the proportion of total score variance that is true variance, arising from the characteristic under consideration; the remainder is error variance, arising from factors irrelevant to the present situation.
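In classical test theory terms, the definition above says reliability = σ²_T / σ²_X, where σ²_X = σ²_T + σ²_E when errors are uncorrelated with true scores. The sketch below assumes we could observe true scores and errors separately, which in practice we cannot; that is why the methods that follow estimate reliability indirectly. The function name is illustrative only.

```python
from statistics import pvariance

def reliability_from_components(true_scores, errors):
    """Proportion of observed-score variance that is true-score variance.

    Illustrative only: assumes observed = true + error and that the
    errors are uncorrelated with the true scores (classical test theory).
    """
    observed = [t + e for t, e in zip(true_scores, errors)]
    return pvariance(true_scores) / pvariance(observed)

# True variance 5, error variance 1, observed variance 6 -> reliability 5/6.
print(reliability_from_components([10, 12, 14, 16], [1, -1, -1, 1]))
```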
Reliability Analysis
Time consistency
– Test-retest reliability
– Test-retest multi-item response consistency
– Test-retest multi-trait consistency
Internal consistency
– Split-half reliability
– Split-half canonical correlation
Others
– Rational equivalence (non-metric data)
– Item-item correspondence
– Cronbach's alpha (metric data)
– Correspondence map of traits
Test-Retest Reliability
Johnny, Johnny, Yes, Papa. Eating sugar? No, Papa. Telling lies? No, Papa. Open your mouth. Ha! Ha! Ha!
Test-retest reliability of the Reading Motivation Questionnaire (n = 72; 8-month interval), using both t-tests and correlation coefficients

                      First session    Second session
Scale         Item    Mean    SD       Mean    SD       t-ratio (df = 71)   Correlation
Application   1A      0.75    0.44     0.65    0.48      1.41
              6B      0.78    0.42     0.72    0.45      0.81
              7A      0.96    0.2      0.92    0.28      1
              9B      0.85    0.36     0.94    0.23     -1.84
              17A     0.71    0.46     0.69    0.46      0.19
              21A     0.78    0.42     0.93    0.26     -2.99
              Total   4.82    1.14     4.86    1.01     -0.23               0.706
Knowledge     1B      0.25    0.44     0.35    0.44     -1.41
              2A      0.94    0.23     0.94    0.23      0
              4A      0.85    0.36     0.89    0.32     -0.83
              14B     0.89    0.32     0.86    0.35      0.53
              16B     0.79    0.41     0.63    0.49      2.43
              19A     0.74    0.44     0.83    0.38     -1.41
              Total   4.46    1.05     4.5     1.01     -0.27               0.93**
Test-Retest Multi-Item Response
Not all items behave in the same fashion every time. Identify the items in the set that are inconsistent across periods.
The Last Supper: Leonardo da Vinci
Test-Retest Multi-Item Response Consistency (8-month interval)
[Figure: Tree diagram for 6 variables (Reading for application items), complete linkage, Euclidean distances of items, after 8 months; leaf order APX, APAB, APP, APN, APM, APH; x-axis: linkage distance]
[Figure: Tree diagram for 6 variables, complete linkage, Euclidean distances; leaf order APAB, APP, APN, APM, APX, APH; x-axis: linkage distance]
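The tree diagrams are built from Euclidean distances between item columns (each item is a vector of the students' scores). A minimal sketch of that distance step, plus a simple heuristic of our own (not the slide's procedure): compute the matrix for each session and flag items whose nearest neighbouring item changes between sessions as candidates for inconsistency. Both function names are hypothetical.

```python
import math

def item_distance_matrix(responses):
    """Euclidean distances between the item columns of a persons x items matrix."""
    n_items = len(responses[0])
    columns = [[row[j] for row in responses] for j in range(n_items)]
    return [[math.dist(columns[a], columns[b]) for b in range(n_items)]
            for a in range(n_items)]

def nearest_item(dist, labels):
    """For each item, the other item at the smallest Euclidean distance."""
    nearest = {}
    for i, name in enumerate(labels):
        others = [j for j in range(len(labels)) if j != i]
        nearest[name] = labels[min(others, key=lambda j: dist[i][j])]
    return nearest

# Four persons answering three 0/1 items.
d = item_distance_matrix([[1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1]])
print(nearest_item(d, ["A", "B", "C"]))
```

For 0/1 items the distance is the square root of the number of persons who answered the two items differently.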
Test-Retest Multi-Trait Consistency (8-month interval)
[Figure: Tree diagram for 7 variables, complete linkage, Euclidean distances; leaf order HATOT, RCTOT, AETOT, AFFTOT, ACHTOT, KNTOT, APPTOT; x-axis: linkage distance]
Euclidean distances (test retest reliability.sta)
          APPTOT   KNTOT   AFFTOT   RCTOT   AETOT   HATOT   ACHTOT
APPTOT     0.00    13.34   27.96    22.07   25.34   36.22   16.25
KNTOT     13.34     0.00   24.45    19.21   21.86   34.50   12.33
AFFTOT    27.96    24.45    0.00    13.23   12.88   15.75   18.00
RCTOT     22.07    19.21   13.23     0.00   13.15   20.66   13.89
AETOT     25.34    21.86   12.88    13.15    0.00   18.00   15.17
HATOT     36.22    34.50   15.75    20.66   18.00    0.00   26.91
ACHTOT    16.25    12.33   18.00    13.89   15.17   26.91    0.00
[Figure: Tree diagram for 7 variables, complete linkage, Euclidean distances; leaf order HATOT, AETOT, RCTOT, AFFTOT, ACHTOT, KNTOT, APPTOT; x-axis: linkage distance]
Euclidean distances (test retest reliability.sta)
          APPTOT   KNTOT   AFFTOT   RCTOT   AETOT   HATOT   ACHTOT
APPTOT     0.00    11.66   29.15    26.17   22.98   37.27   15.26
KNTOT     11.66     0.00   26.65    23.81   19.34   34.91   14.18
AFFTOT    29.15    26.65    0.00    14.11   16.06   16.03   23.73
RCTOT     26.17    23.81   14.11     0.00   14.25   18.11   21.73
AETOT     22.98    19.34   16.06    14.25    0.00   20.47   18.89
HATOT     37.27    34.91   16.03    18.11   20.47    0.00   31.72
ACHTOT    15.26    14.18   23.73    21.73   18.89   31.72    0.00
After 8 months
Tool: Reading Motivation Questionnaire (Dutta Roy, 2002); N = 72 students of the same school
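The merge structure of the trait dendrograms can be reproduced from the distance matrices with complete linkage, where the distance between two clusters is the largest distance between any of their members. A minimal sketch using the first matrix on the slide; the function name is hypothetical, and the software used for the slide may order the leaves differently even when the merges agree.

```python
LABELS = ["APPTOT", "KNTOT", "AFFTOT", "RCTOT", "AETOT", "HATOT", "ACHTOT"]
DIST = [  # first Euclidean distance matrix from the slide
    [0.00, 13.34, 27.96, 22.07, 25.34, 36.22, 16.25],
    [13.34, 0.00, 24.45, 19.21, 21.86, 34.50, 12.33],
    [27.96, 24.45, 0.00, 13.23, 12.88, 15.75, 18.00],
    [22.07, 19.21, 13.23, 0.00, 13.15, 20.66, 13.89],
    [25.34, 21.86, 12.88, 13.15, 0.00, 18.00, 15.17],
    [36.22, 34.50, 15.75, 20.66, 18.00, 0.00, 26.91],
    [16.25, 12.33, 18.00, 13.89, 15.17, 26.91, 0.00],
]

def complete_linkage(dist, labels):
    """Agglomerative clustering with complete linkage on a distance matrix.

    Returns the merge steps as (members_a, members_b, linkage_distance).
    """
    index = {name: i for i, name in enumerate(labels)}
    clusters = [frozenset([name]) for name in labels]

    def cluster_distance(a, b):
        # Complete linkage: the farthest pair of members decides.
        return max(dist[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        height, i, j = min((cluster_distance(a, b), i, j)
                           for i, a in enumerate(clusters)
                           for j, b in enumerate(clusters) if i < j)
        a, b = clusters[i], clusters[j]
        merges.append((sorted(a), sorted(b), height))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges

for step in complete_linkage(DIST, LABELS):
    print(step)
```

Running the same function on the second matrix and comparing which traits join first is one way to judge multi-trait consistency across the two sessions.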
Alternate Form
Correlating the scores of two parallel forms of a single test:
– The number of items in both forms should be the same.
– Both forms should have uniform content and a uniform range of agreement and disagreement.
– The means and standard deviations of both forms should be equal.
– The mode of administration and scoring of both forms should be uniform.
Split-Half
The upper and lower parts of a questionnaire sometimes differ in item content; not all items reflect the same content.
Split-Half Reliability
Correlate two equal halves of the test, e.g. the odd-numbered items against the even-numbered items, and step the result up with the Spearman-Brown formula:
$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}$$
where r_hh, the reliability of the half test, is the product-moment correlation between the two halves.
Advantage: on-the-spot reliability (a single administration suffices).
Disadvantage: it fails to assess temporal stability.
Split-Half Canonical Correlation
Split-half canonical correlation treats the two halves of the test as two sets of variables. The squared canonical correlation gives the proportion of variance that one set shares with the other along a given dimension.
Study: Split-Half Canonical Correlation
A 12-item, 5-point Likert-type scale assessing attitude towards workers' education was administered to 1600 rural workers of WB.
[Figure: Box-and-whisker plots (median, 25%-75% range, min-max) of items REV_1 to REV_6 and of items REV_7 to REV_12]
Split-half r_tt = 0.85; Cronbach's alpha = 0.87
Canonical correlation coefficient between the two sets (first 6 and last 6 items) = 0.78, χ²(36) = 1558.3, p < .0001
Correlations between the first six items (rows) and the last six items (columns):
         REV_7   REV_8   REV_9   REV_10   REV_11   REV_12
REV_1    0.46    0.39    0.39    0.33     0.31     0.22
REV_2    0.37    0.34    0.38    0.36     0.39     0.28
REV_3    0.35    0.32    0.33    0.42     0.45     0.39
REV_4    0.34    0.31    0.32    0.29     0.32     0.24
REV_5    0.48    0.43    0.41    0.39     0.36     0.30
REV_6    0.57    0.48    0.48    0.37     0.36     0.32
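The slide reports χ²(36) for the canonical correlation. The degrees of freedom match Bartlett's chi-square test that all canonical correlations between a p-variable set and a q-variable set are zero, which has p × q = 6 × 6 = 36 df. The slide does not name the test, so reading it as Bartlett's is an assumption. A minimal sketch (function name hypothetical):

```python
import math

def bartlett_chi_square(canonical_rs, n, p, q):
    """Bartlett's test that all canonical correlations are zero.

    chi2 = -(n - 1 - (p + q + 1) / 2) * sum(ln(1 - r_i^2)), df = p * q
    """
    factor = n - 1 - (p + q + 1) / 2
    chi2 = -factor * sum(math.log(1 - r * r) for r in canonical_rs)
    return chi2, p * q

# Only the first root is reported on the slide.
print(bartlett_chi_square([0.78], n=1600, p=6, q=6))
```

With the first root alone this gives about 1493. The reported 1558.3 would also include the contributions of the five smaller roots, which the slide does not list.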
[Figure: Plot of canonical correlations; x-axis: number of canonical roots (1 to 6); y-axis: value]
Internal Consistency
Internal consistency measures whether several items that purport to measure the same general construct produce similar scores. For example, if a respondent expressed agreement with the statements "I like to ride bicycles" and "I've enjoyed riding bicycles in the past", and disagreement with the statement "I hate bicycles", this would be indicative of good internal consistency of the test.
Coefficient Alpha
Cronbach's coefficient alpha is a useful index of the internal consistency of a scale. It is equivalent to Hoyt's ANOVA procedure. The formula is:
$$\alpha = \frac{N}{N-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_X^2}\right)$$
where N is the number of items, Σσ_i² is the sum of the item variances and σ_X² is the variance of the total composite.
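A minimal sketch of the alpha formula above for a persons x items score matrix. Sample variances are used for both the items and the total; alpha is unchanged as long as the same divisor is used throughout. The function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha: N/(N-1) * (1 - sum of item variances / total variance)."""
    n_items = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

print(cronbach_alpha([[1, 2, 3], [2, 3, 3], [3, 4, 5], [4, 4, 5]]))
```

Identical items give alpha = 1, because the total-score variance is then N² times each item's variance.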
Example
Item-Item Correspondence: internal consistency among the 42 items of the Reading Motivation Questionnaire
[Figure: Reading Motivation Questionnaire, 2D plot of column coordinates, dimensions 1 x 2; input table 72 cases x 42 items; standardization: row and column profiles; Dimension 1 eigenvalue .16550 (16.55% of inertia), Dimension 2 eigenvalue .09853 (9.853% of inertia); points labelled by item code (AP, KN, AF, RC, AE, HA and ACH items)]
The correspondence map shows that the intrinsic reading motivation items form a cluster, whereas the extrinsic motivation items are widely scattered.
Correspondence Map of Traits
[Figure: 2D plot of column coordinates (response categories C1 to C6), dimensions 1 x 2; input table 14 x 6; standardization: row and column profiles; Dimension 1 eigenvalue .24804 (70.26% of inertia), Dimension 2 eigenvalue .07517 (21.29% of inertia)]
[Figure: 2D plot of row and column coordinates (items Row1 to Row14, categories C1 to C6), dimensions 1 x 2; input table 14 items x 6 response categories; standardization: row and column profiles; Dimension 1 eigenvalue .24804 (70.26% of inertia), Dimension 2 eigenvalue .07517 (21.29% of inertia)]
[Figure: 2D plot of row coordinates (items Row1 to Row14), dimensions 1 x 2; input table 14 x 6; standardization: row and column profiles; Dimension 1 eigenvalue .24804 (70.26% of inertia), Dimension 2 eigenvalue .07517 (21.29% of inertia)]
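In correspondence analysis each dimension's eigenvalue is its principal inertia. The percentage shown on the maps is that eigenvalue divided by the table's total inertia, which equals the Pearson chi-square statistic divided by the grand total. For the 14 × 6 table, .24804 at 70.26% implies a total inertia of about 0.353, and .07517 / 0.353 ≈ 21.29%, matching Dimension 2. A stdlib sketch of that bookkeeping follows; the function names are hypothetical, it assumes no empty rows or columns, and the full analysis would also need a singular value decomposition, which is omitted here.

```python
def total_inertia(table):
    """Total inertia of a contingency table: Pearson chi-square / grand total."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # independence model
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / n

def percent_of_inertia(eigenvalue, inertia):
    """Share of the total inertia carried by one dimension, in percent."""
    return 100 * eigenvalue / inertia

print(total_inertia([[10, 0], [0, 10]]))  # perfect association in a 2 x 2 table
```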
Rational Equivalence
The formula is given below:
$$r_{tt} = \frac{n}{n-1} \times \frac{\sigma_t^2 - \sum pq}{\sigma_t^2}$$
in which
r_tt = reliability coefficient of the whole test
n = number of items
σ_t = the SD of the total scores
p = proportion of the group giving 'yes' responses
q = (1 − p) = the proportion of the group giving 'no' responses
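The rational-equivalence formula above is the Kuder-Richardson formula 20 (KR-20). Below is a sketch for 0/1 (no/yes) responses. Because pq is a population variance, the total-score variance is also computed with the population divisor to keep the two consistent; the slide does not say which divisor was used, so this is an assumption. The function name is hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (1 = yes, 0 = no) items."""
    n_persons = len(responses)
    n_items = len(responses[0])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_persons  # proportion 'yes'
        sum_pq += p * (1 - p)                             # q = 1 - p
    var_total = pvariance([sum(row) for row in responses])  # sigma_t squared
    return n_items / (n_items - 1) * (var_total - sum_pq) / var_total

print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```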
Reliability Coefficients of Attitude towards School Infrastructure Questionnaire (N = 175)

Attitude                      No. of items   Kuder-Richardson reliability coefficient
Cleanliness                   5              0.58
Safety                        7              0.68
Comfort                       5              0.42
Adequacy                      12             0.58
Exploring                     12             0.5
Reliability                   5              0.5
Easiness                      7              0.68
Equal Opportunity             5              0.63
Willingness to Participate    10             0.5
Ref: Dutta Roy, D. (2008). Attitude towards school infrastructure in rural areas. Unpublished project report submitted to Indian Statistical Institute, p. 45.
Factors Influencing the Reliability of Test Scores
Extrinsic factors
– Group variability
– Guessing
– Environmental conditions
Intrinsic factors
– Length of test
– Homogeneity of items
– Discrimination value
– Scorer reliability
Maximize Your Efficiency
– Groups should be heterogeneous.
– Items should be homogeneous.
– The scale should preferably be a longer one.
– Items should be discriminating.
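The advice to prefer a longer scale can be quantified with the general Spearman-Brown prophecy formula, r_k = k·r / (1 + (k − 1)·r). It assumes the added items are comparable to the existing ones. For example, doubling the 5-item Comfort scale (KR coefficient 0.42) with comparable items would predict about 0.59. A minimal sketch (function names hypothetical):

```python
def spearman_brown_prophecy(r, k):
    """Predicted reliability after lengthening a test k-fold with comparable items."""
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r, target):
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

print(spearman_brown_prophecy(0.42, 2))   # Comfort scale doubled to 10 items
print(length_factor_needed(0.5, 0.8))     # a 0.5-reliable test must be 4x longer
```

With k = 2 the prophecy formula reduces to the split-half correction 2r / (1 + r).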