© UCLES 2013
Assessing the Fit of IRT Models in Language Testing
Muhammad Naveed Khalid
Ardeshir Geranpayeh
Outline
• Item Response Theory (IRT)
• Importance of Model Fit within IRT
• Fit Procedures
• Issues and Limitations
• Lagrange Multiplier (LM) Test
• An empirical study using LM Fit statistics
• Sharing Results
• Conclusions
Item Response Theory (IRT)
A family of mathematical models that provide a common framework for describing people and items
Examinee performance can be predicted in terms of the underlying trait
Provides a means for estimating abilities of people and characteristics of items
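As a rough illustration of "estimating abilities", here is a minimal Python sketch that assumes the Rasch model and treats the item difficulties as already known (in practice they are estimated as well). The maximum-likelihood ability is the θ at which the expected raw score Σ P_i(θ) equals the observed raw score, and Newton-Raphson finds it in a few steps. The function names and difficulty values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta_rasch(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability estimate, assuming known item difficulties.

    The estimate is infinite for zero or perfect raw scores, so those
    response patterns are rejected.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("ML ability estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                    # d log-likelihood / d theta
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical three-item test: two correct answers out of three.
difficulties = [-1.0, 0.0, 1.0]
theta_hat = estimate_theta_rasch([1, 1, 0], difficulties)
print(round(theta_hat, 3))
```

Because the raw score is a sufficient statistic under the Rasch model, the pattern [0, 1, 1] yields the same estimate as [1, 1, 0].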
IRT Models
Dichotomous or Discrete
One-Parameter Logistic Model (1PL) / Rasch Model
Two-Parameter Logistic Model (2PL)
Three-Parameter Logistic Model (3PL)
Polytomous or Scalar
Partial Credit Model (PCM)
Generalized Partial Credit Model (GPCM)
Graded Response Model (GRM)
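The three dichotomous models are nested, which a short Python sketch makes concrete. It uses the logistic metric without the 1.7 scaling constant that some texts include; a is the discrimination, b the difficulty and c the lower asymptote (pseudo-guessing). The Rasch form fixes a = 1 (some 1PL formulations instead estimate one common a for all items). Function names and parameter values are illustrative.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def irf_2pl(theta, a, b):
    """2PL: the 3PL with no lower asymptote (c = 0)."""
    return irf_3pl(theta, a=a, b=b, c=0.0)


def irf_rasch(theta, b):
    """Rasch / 1PL: the 2PL with discrimination fixed at 1."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)


# Hypothetical item parameters, evaluated at a few abilities.
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(irf_rasch(theta, b=0.5), 3),
          round(irf_2pl(theta, a=1.5, b=0.5), 3),
          round(irf_3pl(theta, a=1.5, b=0.5, c=0.2), 3))
```

At θ = b the Rasch and 2PL curves pass through 0.5, while the 3PL passes through c + (1 − c)/2; as θ decreases, the 3PL approaches c rather than 0.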
Shape of Item Response Function
Model for an Item with Five Response Categories
(Figure not reproduced. Labels: Probability; Response Category.)
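A five-category item has four step (threshold) parameters. The Python sketch below computes the category probabilities under the Generalized Partial Credit Model, P(X = k | θ) ∝ exp(Σ_{v≤k} a(θ − b_v)); setting a = 1 gives the Partial Credit Model. The Graded Response Model uses cumulative category boundaries instead and is not covered here. The step values and function name are hypothetical.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities (scores 0..m) for one GPCM item.

    steps: step parameters b_1..b_m, so the item has m + 1 categories.
    With a = 1.0 this reduces to the Partial Credit Model.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has the empty sum 0.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    # Shift by the maximum before exponentiating to avoid overflow.
    top = max(exponents)
    weights = [math.exp(z - top) for z in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with five categories (four steps).
steps = [-1.5, -0.5, 0.5, 1.5]
for theta in (-2.0, 0.0, 2.0):
    print(theta, [round(p, 3) for p in gpcm_probs(theta, a=1.0, steps=steps)])
```

The curves of adjacent categories k − 1 and k cross at θ = b_k, which is how step parameters can be read off a plot of category response curves.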
IRT Applications
In language testing, IRT is mainly used in:
• Test development
• Item banking
• Differential item functioning (DIF)
• Computerized adaptive testing (CAT)
• Test equating, linking and scaling
• Standard setting
The usefulness of an IRT model depends on how accurately the model reflects the data.
Model Fit from the Item Perspective
Measurement Invariance (MI): Item responses can be described by the same parameters in all sub-populations.
Item Characteristic Curve (ICC): Describes the relation between the latent variable and the observable responses to items.
Local Independence (LI): Responses to different items are independent given the value of the latent trait.
Global aspects: Unidimensionality; Speededness
Consequences of Misfit
Yen (2000) and Wainer & Thissen (2003) have shown that inadequate model-data fit has adverse consequences.
Some of these consequences are:
• Biased ability estimates
• Unfair rankings
• Wrongly equated scores
• Student misclassification
• Reduced score precision
• Threats to validity
Existing Item Fit Procedures
Chi-Square Statistics: tests of the discrepancy between the observed and expected frequencies.
Pearson-Type Item-Fit Indices (Yen, 1984; Bock, 1972).
Likelihood Ratio Based Item-Fit Indices (McKinley & Mills, 1985).
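Pearson-type indices compare, within groups of examinees with similar estimated ability, the observed proportion correct O_g with the model-expected proportion E_g, giving X² = Σ_g N_g (O_g − E_g)² / [E_g(1 − E_g)]. The minimal Python sketch below computes this statistic from grouped counts. How groups are formed and how E_g is obtained (here simply supplied) vary between published indices, and the group data are hypothetical. The sketch stops at the statistic, since choosing the degrees of freedom for such statistics is not straightforward.

```python
def pearson_item_fit(groups):
    """Pearson-type item-fit statistic summed over ability groups.

    groups: iterable of (n, n_correct, expected_p), where expected_p is the
    model-predicted proportion correct for that group (e.g. the mean item
    response function value at the group members' ability estimates).
    """
    stat = 0.0
    for n, n_correct, expected_p in groups:
        observed_p = n_correct / n
        stat += n * (observed_p - expected_p) ** 2 / (expected_p * (1.0 - expected_p))
    return stat


# Hypothetical low, middle and high ability groups for one item.
groups = [(200, 70, 0.30), (200, 110, 0.50), (200, 150, 0.72)]
print(round(pearson_item_fit(groups), 2))

# Same proportions, twice the examinees per group.
doubled = [(2 * n, 2 * k, p) for n, k, p in groups]
print(round(pearson_item_fit(doubled), 2))
```

Because each term is weighted by N_g, doubling every group size while keeping the proportions fixed doubles the statistic, which is one reason such indices are sensitive to sample size.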
Issues in Existing Fit Procedures
The standard asymptotic theory for chi-square statistics does not hold.
They fail to take into account the stochastic nature of the item parameter estimates.
Subgroups of test takers are formed on the basis of model-dependent trait estimates.
The appropriate number of degrees of freedom is unclear.
They are sensitive to test length and sample size.
Lagrange Multiplier (LM) Test
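Glas (1999) proposed applying the LM test to the evaluation of model fit.
LM tests are used to test a restricted model against a more general alternative.
Consider a null hypothesis about a model with parameters $\boldsymbol{\eta}_0$. This model is a special case of a general model with parameters $\boldsymbol{\eta}$, obtained by fixing some parameters of the general model at known constants:
$$\boldsymbol{\eta}_0' = (\boldsymbol{\eta}_{01}', \mathbf{c}')$$
where $\boldsymbol{\eta}_{01}$ are the free parameters and $\mathbf{c}$ are the fixed values (typically zero). The LM statistic is
$$LM(\mathbf{c}) = \mathbf{h}(\mathbf{c})' \, \mathbf{W}^{-1} \, \mathbf{h}(\mathbf{c})$$
where $\mathbf{h}(\mathbf{c})$ contains the first-order derivatives of the log-likelihood with respect to the fixed parameters, evaluated at the maximum-likelihood estimates of the null model, and $\mathbf{W}$ is the covariance matrix of $\mathbf{h}(\mathbf{c})$. Under the null hypothesis, $LM(\mathbf{c})$ has an asymptotic chi-square distribution with degrees of freedom equal to the number of fixed parameters.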
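A minimal Python sketch of the simplest case, one restricted parameter, may make LM(c) = h(c)′ W⁻¹ h(c) concrete. It tests H0: δ = 0 for a single Rasch item whose log-odds are shifted by δ for focal-group members. Unlike the marginal maximum-likelihood setting in which Glas (1999) develops the test, this sketch assumes the abilities θ_n are known, so only the item difficulty β is estimated under the null model. Then h is the focal group's observed minus expected number correct, and W is the information for δ after adjusting for the estimation of β. The function name and data are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def lm_dif_test(thetas, responses, focal, tol=1e-10, max_iter=100):
    """LM test of H0: delta = 0 for one Rasch item, assuming known abilities.

    Alternative model: P(x = 1) = logistic(theta - beta + delta * focal),
    where focal is 1 for focal-group members and 0 otherwise.
    Returns (LM statistic, chi-square p-value with 1 df).
    """
    # 1. Estimate beta under the null model (delta = 0) by Newton-Raphson.
    beta = 0.0
    for _ in range(max_iter):
        probs = [logistic(t - beta) for t in thetas]
        residual = sum(x - p for x, p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        step = -residual / info  # more correct than expected -> easier item
        beta += step
        if abs(step) < tol:
            break
    probs = [logistic(t - beta) for t in thetas]

    # 2. h: derivative of the log-likelihood w.r.t. delta at delta = 0,
    #    i.e. observed minus expected number correct in the focal group.
    h = sum(g * (x - p) for g, x, p in zip(focal, responses, probs))

    # 3. Information blocks; W adjusts the delta information for beta.
    w = [p * (1.0 - p) for p in probs]
    i_bb = sum(w)
    i_dd = sum(g * g * wi for g, wi in zip(focal, w))
    i_bd = sum(g * wi for g, wi in zip(focal, w))
    W = i_dd - i_bd ** 2 / i_bb

    lm = h * h / W
    # Upper tail of chi-square with 1 df: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2.0))


thetas = [-1.0, -0.5, 0.0, 0.5, 1.0] * 2
focal = [0] * 5 + [1] * 5
same = [0, 0, 1, 1, 1] * 2                   # both groups answer alike
shifted = [0, 0, 0, 1, 1] + [0, 1, 1, 1, 1]  # focal group does better
print(lm_dif_test(thetas, same, focal))
print(lm_dif_test(thetas, shifted, focal))
```

With identical groups h vanishes and LM is zero; with such tiny hypothetical data the second statistic stays far below the 5% critical value of 3.84. The point is only the mechanics: h grows with the focal group's excess of observed over expected correct answers, the same kind of observed and expected values that LM output reports.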
LM Item Fit Statistics
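In each alternative model one parameter is added to the IRT model; for the Rasch model, $\alpha_i = 1$ for all items.

MI / DIF
$$P(X_{ni}=1 \mid \theta_n, y_n) = \frac{\exp\big(\alpha_i(\theta_n-\beta_i) + y_n\delta_i\big)}{1+\exp\big(\alpha_i(\theta_n-\beta_i) + y_n\delta_i\big)}$$
where $y_n = 1$ if respondent $n$ belongs to the focal group and $y_n = 0$ otherwise.
Null model: $\delta_i = 0$. Alternative model: $\delta_i \neq 0$.

ICC
$$P(X_{ni}=1 \mid \theta_n, g) = \frac{\exp\big(\alpha_i(\theta_n-\beta_i) + \delta_{ig}\big)}{1+\exp\big(\alpha_i(\theta_n-\beta_i) + \delta_{ig}\big)}$$
where $g$ is the score-level subgroup to which respondent $n$ belongs.
Null model: $\delta_{ig} = 0$. Alternative model: $\delta_{ig} \neq 0$.

LI
$$P(X_{ni}=1, X_{nl}=1 \mid \theta_n, \delta_{il}) = \frac{\exp\big(\alpha_i(\theta_n-\beta_i) + \alpha_l(\theta_n-\beta_l) + \delta_{il}\big)}{1 + \exp\big(\alpha_i(\theta_n-\beta_i)\big) + \exp\big(\alpha_l(\theta_n-\beta_l)\big) + \exp\big(\alpha_i(\theta_n-\beta_i) + \alpha_l(\theta_n-\beta_l) + \delta_{il}\big)}$$
Null model: $\delta_{il} = 0$ (local independence). Alternative model: $\delta_{il} \neq 0$.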
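Under the local-dependence alternative for an item pair (i, l), the four response patterns (x_i, x_l) have probabilities proportional to exp(x_i·a_i(θ − b_i) + x_l·a_l(θ − b_l) + x_i·x_l·δ_il). The minimal Python sketch below, with hypothetical parameter values and function names, checks two properties: the four probabilities sum to one, and with δ_il = 0 the joint probability of two correct answers equals the product of the two marginal response probabilities, which is what local independence means. A positive δ_il makes joint success more likely than under local independence.

```python
import math
from itertools import product


def pair_pattern_probs(theta, a_i, b_i, a_l, b_l, delta_il):
    """Joint probabilities of the four response patterns (x_i, x_l) of an item pair.

    delta_il = 0 is local independence; delta_il != 0 is the alternative.
    """
    z_i = a_i * (theta - b_i)
    z_l = a_l * (theta - b_l)
    weights = {
        (x_i, x_l): math.exp(x_i * z_i + x_l * z_l + x_i * x_l * delta_il)
        for x_i, x_l in product((0, 1), repeat=2)
    }
    total = sum(weights.values())
    return {pattern: w / total for pattern, w in weights.items()}


def irf(theta, a, b):
    """Marginal probability of a correct response under the 2PL (Rasch if a = 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical Rasch item pair (a = 1) at ability 0.2.
indep = pair_pattern_probs(0.2, 1.0, -0.3, 1.0, 0.4, delta_il=0.0)
dep = pair_pattern_probs(0.2, 1.0, -0.3, 1.0, 0.4, delta_il=0.8)
print(indep[(1, 1)], irf(0.2, 1.0, -0.3) * irf(0.2, 1.0, 0.4))
print(dep[(1, 1)])
```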
Empirical Example
Data from Cambridge English First (FCE):
– Reading: 3 parts, 30 questions
– Listening: 4 parts, 30 questions
Sample size: over 35,000
The approach can be applied to any other language examination.
Lagrange Multiplier Tests of MI for the Rasch Model
---------------------------------------------------------------
                             Focal Group    Reference      Abs.
    Item      LM  df  Prob   Obs.   Exp.    Obs.   Exp.    Dif.
---------------------------------------------------------------
 1  Item1   0.60   1  0.44   0.74   0.72    0.75   0.76    0.01
 2  Item2   0.34   1  0.56   0.94   0.94    0.96   0.95    0.00
 3  Item3   0.04   1  0.84   0.70   0.71    0.75   0.75    0.00
 4  Item4   2.10   1  0.15   0.78   0.75    0.78   0.79    0.02
 5  Item5   1.77   1  0.18   0.82   0.80    0.81   0.82    0.02
 6  Item6   0.15   1  0.69   0.70   0.71    0.75   0.75    0.01
 7  Item7   1.43   1  0.23   0.71   0.68    0.70   0.71    0.02
 8  Item8   0.40   1  0.53   0.87   0.87    0.89   0.90    0.01
 9  Item9   0.17   1  0.68   0.89   0.88    0.90   0.90    0.00
10  Item10  0.85   1  0.36   0.77   0.78    0.83   0.82    0.01
11  Item11  0.97   1  0.32   0.87   0.85    0.87   0.87    0.01
12  Item12  0.09   1  0.76   0.87   0.87    0.89   0.89    0.00
13  Item13  7.10   1  0.01   0.45   0.50    0.59   0.56    0.04
14  Item14  2.04   1  0.15   0.51   0.55    0.61   0.60    0.02
15  Item15  0.00   1  0.97   0.72   0.72    0.75   0.75    0.00
16  Item16  0.03   1  0.85   0.62   0.62    0.68   0.68    0.00
17  Item17  2.63   1  0.10   0.48   0.52    0.60   0.59    0.03
18  Item18  0.01   1  0.91   0.44   0.44    0.49   0.49    0.00
19  Item19  0.36   1  0.55   0.78   0.79    0.83   0.83    0.01
20  Item20  1.05   1  0.31   0.66   0.69    0.73   0.72    0.02
21  Item21  2.77   1  0.10   0.80   0.83    0.88   0.87    0.02
22  Item22  4.17   1  0.04   0.71   0.75    0.81   0.80    0.02
23  Item23  0.58   1  0.44   0.87   0.85    0.87   0.87    0.01
24  Item24  0.13   1  0.71   0.83   0.83    0.87   0.87    0.00
25  Item25  0.94   1  0.33   0.92   0.93    0.95   0.95    0.01
26  Item26  5.05   1  0.02   0.60   0.55    0.59   0.61    0.03
27  Item27  4.55   1  0.03   0.64   0.60    0.64   0.65    0.03
28  Item28  2.76   1  0.10   0.49   0.45    0.49   0.50    0.03
29  Item29  0.26   1  0.61   0.62   0.61    0.66   0.67    0.01
30  Item30  3.07   1  0.08   0.70   0.66    0.69   0.71    0.03
---------------------------------------------------------------
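The Prob column of the MI table appears to be the upper-tail probability of a chi-square distribution with the listed df. The stdlib-only Python sketch below recomputes it for a few rows (values copied from the table) using the closed forms for df = 1, erfc(√(x/2)), and df = 2, exp(−x/2), and marks rows below a conventional α = .05, a threshold assumed here rather than stated on the slides. The function name is illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or df = 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# (item, LM, df, reported Prob) copied from rows of the MI table.
rows = [(1, 0.60, 1, 0.44), (13, 7.10, 1, 0.01), (22, 4.17, 1, 0.04),
        (26, 5.05, 1, 0.02), (27, 4.55, 1, 0.03), (30, 3.07, 1, 0.08)]
for item, lm, df, reported in rows:
    p = chi2_sf(lm, df)
    print(item, round(p, 3), reported, "flag" if p < 0.05 else "")
```

The df = 2 branch covers the ICC tests, which compare three groups.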
Lagrange Multiplier Tests of the ICC for the Rasch Model
------------------------------------------------------------------------------
                             Group 1        Group 2        Group 3        Abs.
    Item      LM  df  Prob   Obs.   Exp.    Obs.   Exp.    Obs.   Exp.    Dif.
------------------------------------------------------------------------------
 1  Item1   3.56   2  0.17   0.56   0.55    0.72   0.71    0.82   0.83    0.01
 2  Item2   1.98   2  0.37   0.60   0.59    0.79   0.78    0.89   0.90    0.01
 3  Item3   1.25   2  0.54   0.54   0.56    0.76   0.74    0.86   0.87    0.01
 4  Item4   1.23   2  0.54   0.67   0.66    0.83   0.83    0.91   0.92    0.01
 5  Item5   2.81   2  0.24   0.71   0.71    0.86   0.84    0.91   0.92    0.01
 6  Item6   2.96   2  0.23   0.58   0.57    0.68   0.71    0.84   0.83    0.02
 7  Item7   2.65   2  0.27   0.17   0.19    0.33   0.31    0.49   0.49    0.01
 8  Item8   4.82   2  0.09   0.65   0.66    0.76   0.77    0.87   0.86    0.01
 9  Item9   4.40   2  0.11   0.20   0.20    0.33   0.36    0.60   0.58    0.02
10  Item10  3.89   2  0.14   0.24   0.23    0.51   0.54    0.84   0.82    0.02
11  Item11  1.62   2  0.44   0.73   0.72    0.86   0.88    0.95   0.95    0.01
12  Item12 19.55   2  0.00   0.42   0.37    0.50   0.57    0.77   0.76    0.04
13  Item13  0.94   2  0.63   0.43   0.44    0.76   0.75    0.91   0.92    0.01
14  Item14  2.82   2  0.24   0.64   0.63    0.89   0.88    0.96   0.97    0.01
15  Item15 11.03   2  0.00   0.36   0.36    0.65   0.63    0.81   0.84    0.02
16  Item16  3.88   2  0.14   0.52   0.51    0.83   0.83    0.95   0.96    0.01
17  Item17  0.84   2  0.66   0.51   0.51    0.77   0.77    0.92   0.92    0.01
18  Item18  0.85   2  0.65   0.25   0.25    0.41   0.41    0.59   0.60    0.01
19  Item19  0.99   2  0.61   0.49   0.50    0.70   0.70    0.86   0.85    0.01
20  Item20  0.90   2  0.64   0.34   0.33    0.59   0.59    0.81   0.81    0.00
21  Item21  1.02   2  0.60   0.18   0.17    0.27   0.28    0.44   0.43    0.01
22  Item22  2.92   2  0.23   0.43   0.44    0.72   0.72    0.90   0.89    0.01
23  Item23  0.26   2  0.88   0.73   0.73    0.93   0.93    0.98   0.98    0.00
24  Item24  1.47   2  0.48   0.69   0.70    0.91   0.90    0.97   0.97    0.01
25  Item25  0.61   2  0.74   0.45   0.46    0.61   0.59    0.71   0.72    0.01
26  Item26  8.56   2  0.01   0.53   0.56    0.74   0.71    0.81   0.82    0.02
27  Item27  2.76   2  0.25   0.36   0.36    0.56   0.58    0.79   0.78    0.01
28  Item28  1.64   2  0.44   0.38   0.36    0.53   0.56    0.76   0.75    0.02
29  Item29  0.31   2  0.86   0.55   0.55    0.78   0.79    0.92   0.92    0.00
30  Item30  2.21   2  0.33   0.37   0.39    0.53   0.50    0.62   0.63    0.02
------------------------------------------------------------------------------
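The Abs. Dif. column summarises the residuals on the proportion scale. Assuming (the slides do not define it) that it is the mean absolute difference between observed and expected proportions correct across the groups, the Python sketch below recomputes it for three rows of the ICC table. Because the tabled proportions are themselves rounded, a recomputed value can differ from the printed one in the last digit. Unlike the LM statistic, a residual of this kind does not grow simply because the sample is larger, which helps in judging the practical size of a misfit. The function name is hypothetical.

```python
def mean_abs_dif(observed, expected):
    """Mean absolute difference between observed and expected proportions.

    Assumed (not stated on the slides) to be how the Abs. Dif. column is formed.
    """
    diffs = [abs(o - e) for o, e in zip(observed, expected)]
    return sum(diffs) / len(diffs)


# Observed and expected proportions correct in groups 1-3, from the ICC table.
item12 = mean_abs_dif([0.42, 0.50, 0.77], [0.37, 0.57, 0.76])
item15 = mean_abs_dif([0.36, 0.65, 0.81], [0.36, 0.63, 0.84])
item23 = mean_abs_dif([0.73, 0.93, 0.98], [0.73, 0.93, 0.98])
print(round(item12, 2), round(item15, 2), round(item23, 2))
```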
Lagrange Multiplier Tests of LI for the Rasch Model
------------------------------------------------------------
Itm Itm    LM  df  Prob   Observed / Expected        Abs.Dif
------------------------------------------------------------
 2   1   0.15   1  0.70   0.55   0.55    0.62   0.63    0.01
 3   2   6.31   1  0.04   0.57   0.59    0.71   0.69    0.01
 4   3   1.79   1  0.18   0.62   0.64    0.72   0.71    0.02
 5   4   0.26   1  0.61   0.72   0.73    0.77   0.77    0.01
 6   5   0.07   1  0.79   0.75   0.75    0.82   0.82    0.01
 7   6   0.02   1  0.88   0.51   0.52    0.62   0.61    0.03
 8   7  23.95   1  0.00   0.53   0.59    0.70   0.66    0.03
 9   8   0.27   1  0.61   0.61   0.61    0.76   0.76    0.01
10   9   1.97   1  0.16   0.40   0.42    0.68   0.67    0.01
11  10   1.20   1  0.27   0.61   0.60    0.78   0.79    0.01
12  11  24.08   1  0.00   0.72   0.77    0.93   0.91    0.05
13  12   2.11   1  0.15   0.53   0.56    0.81   0.80    0.01
14  13   4.24   1  0.06   0.68   0.71    0.91   0.90    0.01
15  14  41.66   1  0.00   0.14   0.25    0.62   0.60    0.05
16  15   4.02   1  0.07   0.70   0.69    0.84   0.85    0.02
17  16   7.04   1  0.01   0.66   0.70    0.87   0.86    0.01
18  17   4.37   1  0.08   0.51   0.55    0.80   0.79    0.01
19  18  13.69   1  0.00   0.52   0.57    0.84   0.82    0.04
20  19   2.04   1  0.12   0.69   0.70    0.93   0.91    0.02
21  20   3.85   1  0.05   0.41   0.46    0.67   0.66    0.01
22  21   1.71   1  0.11   0.80   0.82    0.92   0.91    0.01
23  22   2.01   1  0.16   0.79   0.82    0.94   0.94    0.01
24  23  10.60   1  0.00   0.62   0.72    0.93   0.92    0.03
25  24   1.02   1  0.31   0.61   0.58    0.84   0.84    0.02
26  25   2.34   1  0.13   0.58   0.60    0.82   0.82    0.01
27  26   2.10   1  0.09   0.41   0.45    0.67   0.65    0.02
28  27   1.62   1  0.92   0.86   0.85    0.89   0.91    0.02
29  28   0.17   1  0.68   0.48   0.47    0.63   0.63    0.01
30  29   0.47   1  0.49   0.77   0.77    0.86   0.86    0.01
------------------------------------------------------------
Conclusions
• LM statistics overcome the issues of the existing fit procedures
• They are less computationally intensive
• The size of the residuals, reported as Abs. Dif., is highly informative
• The IRT model fits the FCE data reasonably well
• Items flagged: MI (4); ICC (3); LI (7 item pairs)
• The magnitude of the violations is not severe
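The flagged counts in the conclusions can be reproduced from the Prob columns of the three tables, assuming an item (or item pair) is flagged when its reported probability is strictly below .05. Under that rule the LI pair of items 21 and 20, whose Prob is exactly 0.05, is not counted. A minimal Python sketch, with the probabilities copied from the tables and a hypothetical helper name:

```python
# Prob columns copied from the MI, ICC and LI tables (LI: 29 adjacent item pairs).
mi_prob = [0.44, 0.56, 0.84, 0.15, 0.18, 0.69, 0.23, 0.53, 0.68, 0.36,
           0.32, 0.76, 0.01, 0.15, 0.97, 0.85, 0.10, 0.91, 0.55, 0.31,
           0.10, 0.04, 0.44, 0.71, 0.33, 0.02, 0.03, 0.10, 0.61, 0.08]
icc_prob = [0.17, 0.37, 0.54, 0.54, 0.24, 0.23, 0.27, 0.09, 0.11, 0.14,
            0.44, 0.00, 0.63, 0.24, 0.00, 0.14, 0.66, 0.65, 0.61, 0.64,
            0.60, 0.23, 0.88, 0.48, 0.74, 0.01, 0.25, 0.44, 0.86, 0.33]
li_prob = [0.70, 0.04, 0.18, 0.61, 0.79, 0.88, 0.00, 0.61, 0.16, 0.27,
           0.00, 0.15, 0.06, 0.00, 0.07, 0.01, 0.08, 0.00, 0.12, 0.05,
           0.11, 0.16, 0.00, 0.31, 0.13, 0.09, 0.92, 0.68, 0.49]

ALPHA = 0.05  # assumed significance level


def count_flagged(probs, alpha=ALPHA):
    """Number of tests whose reported probability is below alpha."""
    return sum(p < alpha for p in probs)


print(count_flagged(mi_prob), count_flagged(icc_prob), count_flagged(li_prob))
```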
Thank you!
Questions