Applications of IRT Models
![Page 1: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/1.jpg)
Applications of IRT Models
DIF (Differential Item Functioning) and CAT (Computerized Adaptive Testing)
![Page 2: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/2.jpg)
Which of these describes a biased test?
The average scores of males and females on an item are not the same.
The correlation between males' scores on an item is stronger than that for females' scores.
A group of males and females with exactly the same ability achieve different scores on an item.
![Page 3: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/3.jpg)
Disentangling the Terminology
Item impact: examinees from different groups have differing probabilities of responding correctly to (or endorsing) an item because there are true differences between the groups in the underlying ability measured by the item.
DIF: the differential probability of a correct response for examinees at the same trait level but from different groups. DIF occurs when examinees from different groups show differing probabilities of success on (or endorsing) the item after matching on the underlying ability that the item is intended to measure.
Item bias: examinees of one group are less likely to answer an item correctly (or endorse an item) than examinees of another group because of some characteristic of the test item or testing situation that is not relevant to the test purpose.
Adverse impact: a legal term describing the situation in which group differences in test performance result in disproportionate examinee selection or related decisions (e.g., promotion). Adverse impact by itself is not evidence of test bias.
![Page 4: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/4.jpg)
No DIF
![Page 5: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/5.jpg)
There are two types of DIF
Uniform DIF: the reference group has a higher probability of a correct response than the focal group at every ability level.
Non-uniform DIF: the direction of one group's advantage in the likelihood of a correct response changes in different regions of the ability scale.
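To make the two types concrete, here is a minimal sketch assuming a two-parameter logistic (2PL) item with scaling constant D = 1.7; the item parameters are invented for illustration. Shifting only the difficulty b gives uniform DIF, while different discriminations a make the curves cross, which is non-uniform DIF.

```python
import math

D = 1.7  # logistic scaling constant (an assumption; 1.0 or 1.702 are also common)


def irf_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def dif_pattern(ref, foc, thetas, tol=1e-9):
    """Label the DIF pattern from the sign of P_ref - P_foc over a theta grid.

    ref, foc: (a, b) parameters of the same item in the reference and focal groups.
    """
    diffs = [irf_2pl(t, *ref) - irf_2pl(t, *foc) for t in thetas]
    if all(abs(d) < tol for d in diffs):
        return "no DIF"
    if all(d > -tol for d in diffs) or all(d < tol for d in diffs):
        return "uniform"      # one group is never disadvantaged
    return "non-uniform"      # the advantage changes direction


grid = [x / 10 for x in range(-30, 31)]  # theta from -3.0 to 3.0
print(dif_pattern((1.0, 0.0), (1.0, 0.0), grid))  # no DIF: identical curves
print(dif_pattern((1.0, 0.0), (1.0, 0.5), grid))  # uniform: item harder for the focal group
print(dif_pattern((1.5, 0.0), (0.7, 0.0), grid))  # non-uniform: curves cross at theta = 0
```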
![Page 6: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/6.jpg)
Uniform DIF
![Page 7: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/7.jpg)
Non-uniform DIF
![Page 8: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/8.jpg)
Differential Test Functioning
[Figure: DTF Against Reference Group. Proportion-correct true score curves for the Focal and Reference groups; x-axis: Theta (-3.0 to 3.0); y-axis: Proportion Correct True Score (0.0 to 1.0).]
![Page 9: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/9.jpg)
Relationship between IRT and CTST Models
It has been shown that there is a relationship between the 2PL normal-ogive IRT model and the single-factor FA model (Lord & Novick, 1968). With a standardized latent response variable:
The b-parameter equals the item threshold divided by the item factor loading: $b_i = \tau_i / \lambda_i$.
The discrimination parameter equals the factor loading divided by the square root of the item's unique variance (one minus its communality): $a_i = \lambda_i / \sqrt{1 - \lambda_i^2}$.
Highly discriminating items will have high factor loadings.
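The conversion can be written as a small helper. This is a sketch assuming a standardized factor and latent response variable, so the parameters are in the normal-ogive metric (multiplying by D = 1.702 in the logistic model approximates the same curve); the function names are illustrative.

```python
import math


def fa_to_irt(loading, threshold):
    """Single-factor FA (loading, threshold) -> normal-ogive IRT (a, b).

    Assumes standardized variables, so the unique variance is 1 - loading**2
    and the communality is loading**2.
    """
    if not 0.0 < abs(loading) < 1.0:
        raise ValueError("loading must be nonzero and strictly between -1 and 1")
    a = loading / math.sqrt(1.0 - loading ** 2)  # discrimination
    b = threshold / loading                       # difficulty
    return a, b


def irt_to_fa(a, b):
    """Inverse mapping: recover the loading and threshold from a and b."""
    loading = a / math.sqrt(1.0 + a ** 2)
    return loading, b * loading


print(fa_to_irt(0.6, 0.3))  # approximately (0.75, 0.5)
```

A loading of 0.6 becomes a = 0.75, and as the loading approaches 1 the discrimination grows without bound, which is why highly discriminating items have high loadings.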
![Page 10: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/10.jpg)
Examining Measurement Invariance in CTST
Examining factorial invariance:
Configural invariance: the pattern of zero and non-zero loadings is the same across groups.
Pattern (metric) invariance: the factor loadings are equal across groups.
Scalar (strong) invariance: the factor loadings and intercepts are equal across groups. Any group differences in means can be attributed to the common factors, which allows meaningful group mean comparisons.
Strict invariance: factor loadings, intercepts, and unique variances are equal across groups. Any systematic differences in group means, variances, or covariances are due to the common factors.
![Page 11: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/11.jpg)
Examining DIF in IRT
IRT tests of DIF examine whether the IRC (item response curve) is the same for the reference group as it is for the focal group.
The focal group is the smaller group in question (the minority group).
The reference group is the larger group, which generally has the established parameters.
If the curves differ, then the probability that an individual in one group with ability x responds correctly differs from the probability that an individual with the same ability x in the other group gets the item correct.
DTF refers to a difference in the test characteristic curves, obtained by summing the item response functions for each group.
DTF is perhaps more important for selection, because decisions are made based on test scores, not individual item responses.
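The DTF idea can be sketched as follows, assuming 2PL items with D = 1.7 and a hypothetical four-item test in which only one item shows DIF. Each group's test characteristic curve is the sum of its item response functions; dividing the difference by the number of items puts it on the proportion-correct true-score scale.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(irf_2pl(theta, a, b) for a, b in items)


def dtf_curve(ref_items, foc_items, thetas):
    """Reference minus focal expected proportion-correct true score."""
    n = len(ref_items)
    return [(tcc(t, ref_items) - tcc(t, foc_items)) / n for t in thetas]


# Hypothetical parameters: identical except that item 3 is harder for the focal group.
ref = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.0, 1.0)]
foc = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.0, 1.0)]
thetas = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0
for t, d in zip(thetas, dtf_curve(ref, foc, thetas)):
    print(f"theta={t:+.1f}  DTF={d:.3f}")
```

Because DIF in individual items can cancel when summed, a test may show little DTF even when several items show DIF in opposite directions.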
![Page 12: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/12.jpg)
Procedures for Detecting DIF/DTF
Parametric procedures:
Compare item parameters from two groups of examinees: Lord's chi-square; likelihood ratio test.
Compare IRFs from two groups of examinees by measuring the areas between them: Raju's area measures.
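Lord's chi-square compares the two groups' parameter estimates directly. A minimal sketch for one 2PL item, assuming the estimates are already linked to a common scale and that each group's 2x2 sampling covariance matrix comes from the calibration software; the statistic is $(v_R - v_F)^\top(\Sigma_R + \Sigma_F)^{-1}(v_R - v_F)$ with 2 degrees of freedom.

```python
import math


def lords_chi_square(params_ref, params_foc, cov_ref, cov_foc):
    """Lord's chi-square for one 2PL item; params are (a, b) on a common scale.

    cov_*: 2x2 sampling covariance matrices of (a, b). Returns (statistic, p_value).
    """
    d = [params_ref[0] - params_foc[0], params_ref[1] - params_foc[1]]
    s = [[cov_ref[i][j] + cov_foc[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det <= 0 or s[0][0] <= 0:
        raise ValueError("combined covariance matrix must be positive definite")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    stat = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for df = 2
    return stat, p_value


# Hypothetical estimates and covariances.
cov = [[0.01, 0.0], [0.0, 0.02]]
print(lords_chi_square((1.2, 0.3), (1.0, 0.8), cov, cov))  # statistic about 8.25
```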
![Page 13: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/13.jpg)
Likelihood Ratio Test
The statistic is distributed as a chi-square with degrees of freedom equal to the difference in the number of parameters estimated in the compact and the augmented model.
The compact model assumes the item parameters are the same for both groups.
The augmented model constrains the anchor items to be equal, but allows the items of interest to have parameters that vary across groups.
$$G^2_j = -2\log L(\text{compact model}) + 2\log L(\text{augmented model})$$
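Once both models have been fitted, the test itself is simple arithmetic. The sketch below assumes the two log-likelihoods and parameter counts are supplied by IRT calibration software (the numbers in the example are hypothetical) and computes the chi-square tail probability with the standard library instead of SciPy.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1.

    Starts from df = 1 or 2 and climbs with the identity
    Q(k + 2, x) = Q(k, x) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_dif(loglik_compact, loglik_augmented, n_compact, n_augmented):
    """G^2 = -2 log L(compact) + 2 log L(augmented) and its chi-square p-value."""
    g2 = -2.0 * loglik_compact + 2.0 * loglik_augmented
    df = n_augmented - n_compact
    return g2, df, chi2_sf(g2, df)


# Hypothetical values: freeing a and b of one studied 2PL item adds 2 parameters.
print(likelihood_ratio_dif(-5230.4, -5224.1, 40, 42))  # G^2 about 12.6 with df = 2
```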
![Page 14: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/14.jpg)
Raju's Area Measures
Signed and unsigned areas indicate the area between two IRCs.
They require separate calibrations of the item parameters in each group, followed by a linear transformation that puts both sets of parameters on the same scale.
For two 2PL curves with parameters $(a_1, b_1)$ and $(a_2, b_2)$:
$$\text{Signed area} = b_2 - b_1$$
$$\text{Unsigned area} = |b_2 - b_1| \quad \text{if } a_1 = a_2$$
$$\text{Unsigned area} = \left| \frac{2(a_2 - a_1)}{D a_1 a_2} \ln\left[1 + \exp\left(\frac{D a_1 a_2 (b_2 - b_1)}{a_2 - a_1}\right)\right] - (b_2 - b_1) \right| \quad \text{if } a_1 \neq a_2$$
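The closed-form areas translate directly into code. This is a sketch assuming 2PL curves with D = 1.7 whose parameters have already been linked; the log(1 + exp(k)) term is evaluated in an overflow-safe way, and the closed form can be checked against numerical integration of the curve difference.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def raju_areas(a1, b1, a2, b2):
    """Raju's signed and unsigned areas between two linked 2PL curves."""
    signed = b2 - b1
    if math.isclose(a1, a2):
        return signed, abs(signed)
    k = D * a1 * a2 * (b2 - b1) / (a2 - a1)
    # log(1 + exp(k)) without overflow for large |k|
    log1pexp = k + math.log1p(math.exp(-k)) if k > 0 else math.log1p(math.exp(k))
    unsigned = abs(2.0 * (a2 - a1) / (D * a1 * a2) * log1pexp - (b2 - b1))
    return signed, unsigned


print(raju_areas(1.0, 0.0, 1.5, 0.5))  # curves cross, so unsigned > |signed|
```

The signed area lets differences of opposite sign cancel, so with non-uniform DIF it can be near zero while the unsigned area is large.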
![Page 15: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/15.jpg)
Procedures for Detecting DIF/DTF
Non-parametric procedures: examine the bivariate frequencies between item responses and group membership, conditional on levels of the ability or trait estimate.
Simultaneous Item Bias Test (SIBTEST)
Mantel-Haenszel (MH)
Logistic Regression
![Page 16: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/16.jpg)
Procedures for Detecting DIF/DTF
Simultaneous Item Bias Test (SIBTEST)
Examinees are matched on a true-score estimate of ability.
A weighted mean difference between the reference and focal groups is computed and then tested statistically.
The means are adjusted with a regression correction procedure to correct for differences in the groups' ability distributions.
Some research has examined how Type I error rates change when the percentage of DIF items is large.
![Page 17: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/17.jpg)
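SIBTEST
$$H_0: \beta_{UNI} = 0 \qquad H_1: \beta_{UNI} \neq 0$$
$$\beta_{UNI} = \int B(\theta)\, f_F(\theta)\, d\theta, \qquad B(\theta) = P(\theta, R) - P(\theta, F)$$
where $P(\theta, R)$ and $P(\theta, F)$ are the probabilities of a correct response at $\theta$ in the reference and focal groups, $f_F(\theta)$ is the density function for $\theta$ in the focal group, and $d\theta$ is the differential of theta.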
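In practice the integral is replaced by a sum over matching-score levels, weighting each level by the proportion of focal-group examinees there (the empirical counterpart of $f_F(\theta)$). The sketch below is a simplified estimator under that assumption: it omits SIBTEST's regression correction and its standard error, so it is not the full procedure.

```python
from collections import defaultdict
from statistics import mean


def sibtest_beta_uni(match_ref, item_ref, match_foc, item_foc):
    """Simplified SIBTEST-style estimate of beta_UNI (no regression correction).

    match_*: matching-subtest scores; item_*: 0/1 scores on the studied item.
    Positive values mean the item favors the reference group.
    """
    ref_by_k, foc_by_k = defaultdict(list), defaultdict(list)
    for k, y in zip(match_ref, item_ref):
        ref_by_k[k].append(y)
    for k, y in zip(match_foc, item_foc):
        foc_by_k[k].append(y)
    # Only score levels populated in both groups can be compared.
    levels = [k for k in foc_by_k if k in ref_by_k]
    if not levels:
        raise ValueError("no matching-score level contains both groups")
    n_foc = sum(len(foc_by_k[k]) for k in levels)
    return sum(
        len(foc_by_k[k]) / n_foc * (mean(ref_by_k[k]) - mean(foc_by_k[k]))
        for k in levels
    )


# Hypothetical responses at two matching-score levels.
print(sibtest_beta_uni([0, 0, 1, 1], [1, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 0]))
```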
![Page 18: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/18.jpg)
Mantel-Haenszel (MH)
Compares the item performance of two groups who were previously matched on the ability scale.
Total test score can be used for matching.
K 2×2 contingency tables are made for each item, one for each of the K ability levels.
DIF is shown if the odds of correctly answering the item at a given score level are different for the two groups.
![Page 19: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/19.jpg)
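Mantel-Haenszel (MH)
Response to the suspect item at score level j:

| Group j | Right (1) | Wrong (0) |
|---|---|---|
| Reference group | A_j | B_j |
| Focal group | C_j | D_j |

The odds ratio at level j is
$$\alpha_j = \frac{p_{R_j} / (1 - p_{R_j})}{p_{F_j} / (1 - p_{F_j})}$$
where $p_{R_j} = A_j / (A_j + B_j)$ and $p_{F_j} = C_j / (C_j + D_j)$ are the proportions of each group answering correctly.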
![Page 20: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/20.jpg)
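Mantel-Haenszel (MH)
The statistic for detecting DIF in an item is
$$\chi^2_{MH} = \frac{\left(\left|\sum_{j=1}^{K} A_j - \sum_{j=1}^{K} E(A_j)\right| - 0.5\right)^2}{\sum_{j=1}^{K} \operatorname{Var}(A_j)}$$
$$\hat{\alpha}_{MH} = \frac{\sum_{j=1}^{K} A_j D_j / N_j}{\sum_{j=1}^{K} B_j C_j / N_j}, \qquad \Delta_{MH} = -2.35 \ln(\hat{\alpha}_{MH})$$
where $N_j$ is the total number of examinees at score level $j$.
• Type A items: negligible DIF, with $|\Delta_{MH}| < 1$
• Type B items: moderate DIF, with $1 \le |\Delta_{MH}| \le 1.5$ and a statistically significant MH test
• Type C items: large DIF, with $|\Delta_{MH}| > 1.5$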
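The MH quantities can be computed from the K tables in one pass. This sketch assumes the usual hypergeometric moments $E(A_j) = n_{Rj} m_{1j} / N_j$ and $\operatorname{Var}(A_j) = n_{Rj} n_{Fj} m_{1j} m_{0j} / (N_j^2 (N_j - 1))$, where $n_{Rj}, n_{Fj}$ are the group sizes and $m_{1j}, m_{0j}$ the right/wrong totals at level j. The A/B/C labels follow the rules listed above; treating a non-significant item with $1 \le |\Delta_{MH}| \le 1.5$ as Type A is an assumption.

```python
import math


def mantel_haenszel(tables):
    """MH statistics from K tables (A, B, C, D) using the A_j, B_j, C_j, D_j cells.

    A/B: reference right/wrong; C/D: focal right/wrong at matching level j.
    Returns (alpha_MH, delta_MH, chi2_MH, p_value).
    """
    num = den = sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a level with fewer than 2 examinees carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc, m_right, m_wrong = a + b, c + d, a + c, b + d
        sum_a += a
        sum_e += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    if num == 0 or den == 0 or sum_var == 0:
        raise ValueError("tables too sparse to estimate the common odds ratio")
    alpha = num / den
    delta = -2.35 * math.log(alpha)
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_var  # continuity-corrected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))        # chi-square, df = 1
    return alpha, delta, chi2, p_value


def ets_category(delta, p_value, level=0.05):
    """Type A/B/C classification from |delta_MH| and MH test significance."""
    if abs(delta) > 1.5:
        return "C"
    if abs(delta) >= 1.0 and p_value < level:
        return "B"
    return "A"


# Hypothetical counts at two score levels.
alpha, delta, chi2, p = mantel_haenszel([(30, 10, 20, 20), (40, 10, 30, 20)])
print(alpha, delta, ets_category(delta, p))  # alpha about 2.82, delta about -2.43, "C"
```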
![Page 21: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/21.jpg)
Logistic Regression
$$P(u = 1 \mid X) = \frac{e^{f(x)}}{1 + e^{f(x)}}, \qquad f(x) = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (XG)$$
$P(u = 1 \mid X)$ is the conditional probability of obtaining a correct answer given the independent variables.
$G$ is the independent (group) variable.
$X$ is the matching criterion (normally test score).
If the group effect is significant and the interaction is not, then there is uniform DIF.
If the interaction is significant, then there is non-uniform DIF.
Conduct model comparisons by adding each successive model term.
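The successive model comparisons can be sketched with a small Newton-Raphson logistic fit in pure Python. Assumptions: the matching criterion X is numeric, G is coded 0/1, each added term is tested with a 1-df likelihood ratio, and the data come from a hypothetical simulation with a built-in uniform DIF effect. A statistics package would normally do the fitting.

```python
import math
import random


def _solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    m = [row[:] + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _log_sigmoid(eta):
    """log(1 / (1 + exp(-eta))) without overflow."""
    return -math.log1p(math.exp(-eta)) if eta >= 0 else eta - math.log1p(math.exp(eta))


def fit_logistic(rows, y, iters=50, tol=1e-10):
    """Newton-Raphson maximum likelihood; returns (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad, info = [0.0] * k, [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = math.exp(_log_sigmoid(sum(b * xi for b, xi in zip(beta, x))))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * xi for b, xi in zip(beta, x))
        loglik += _log_sigmoid(eta) if yi else _log_sigmoid(eta) - eta  # log(1 - p)
    return beta, loglik


def logistic_dif(score, group, item):
    """LR tests for X -> X + G (uniform DIF) -> X + G + XG (non-uniform DIF)."""
    ll1 = fit_logistic([[1.0, x] for x in score], item)[1]
    ll2 = fit_logistic([[1.0, x, g] for x, g in zip(score, group)], item)[1]
    ll3 = fit_logistic([[1.0, x, g, x * g] for x, g in zip(score, group)], item)[1]

    def p_value(stat):  # chi-square survival function, df = 1
        return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

    lr_u, lr_n = 2.0 * (ll2 - ll1), 2.0 * (ll3 - ll2)
    return {"uniform": (lr_u, p_value(lr_u)), "nonuniform": (lr_n, p_value(lr_n))}


def simulate_item(n, group_effect, seed=1):
    """Hypothetical data: matching score X ~ N(0, 1), group G in {0, 1}, 0/1 item."""
    rng = random.Random(seed)
    score, group, item = [], [], []
    for i in range(n):
        g, x = i % 2, rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(0.2 + 1.2 * x + group_effect * g)))
        score.append(x)
        group.append(g)
        item.append(1 if rng.random() < p else 0)
    return score, group, item


print(logistic_dif(*simulate_item(2000, -1.0)))
```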
![Page 22: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/22.jpg)
Computerized Adaptive Testing (CAT)
Goal: to obtain precision of measurement equal to that of a linear test, but with greater efficiency.
Give people only the items that are informative about them.
Reduce testing time and the opportunity for error.
![Page 23: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/23.jpg)
CAT System
Initial ability estimate: mean, or a prior.
Select the first item: most discriminating or least discriminating.
Estimate ability: MLE or Bayesian methods.
Select items: maximum information, exposure control, content specifications.
Check the stopping rule: SE stopping rule, maximum number of items; continue estimating ability and selecting items until the rule is met.
Estimate final ability.
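The loop above can be sketched in a few dozen lines. Assumptions: a 2PL item bank with D = 1.7, a N(0, 1) prior, Bayesian EAP estimation on a fixed grid, maximum-information selection (which also picks the first item at the prior mean), an SE stopping rule of 0.3, and a 20-item maximum. Exposure control and content specifications are left out.

```python
import math

D = 1.7  # logistic scaling constant (assumed)
GRID = [-4.0 + 0.05 * i for i in range(161)]  # quadrature points for theta


def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def item_info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p2pl(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)


def eap(responses, bank):
    """EAP estimate and posterior SD under a N(0, 1) prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for idx, u in responses:
            p = p2pl(t, *bank[idx])
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(GRID, weights)) / total
    return est, math.sqrt(var)


def run_cat(bank, answer, se_target=0.3, max_items=20):
    """bank: list of (a, b); answer(idx) returns the examinee's 1/0 response."""
    theta, se = 0.0, 1.0  # initial estimate: prior mean and SD
    responses, used = [], set()
    while se > se_target and len(responses) < min(max_items, len(bank)):
        idx = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: item_info(theta, *bank[i]))  # maximum information
        used.add(idx)
        responses.append((idx, answer(idx)))
        theta, se = eap(responses, bank)  # re-estimate after every response
    return theta, se, responses
```

For example, with a hypothetical bank `[(1.0 + 0.1 * (i % 5), -2.0 + 0.2 * i) for i in range(21)]`, `run_cat(bank, answer)` administers items until the posterior SD drops below 0.3 or 20 items have been given.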
![Page 24: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/24.jpg)
Issues of Research in a CAT System
Early issues:
Precision of measurement: estimation procedure, prior estimates
Equivalence: reliability of the estimate, test form equivalence (test information), testing mode
Efficiency: item selection methods, test length
Newer issues:
Security: item exposure
Testlet models
![Page 25: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/25.jpg)
Item Exposure and Item Selection Methods
Sympson-Hetter
Directly controls item exposure probabilistically.
Places a filter between item selection and item administration.
Items are administered at or below a prespecified maximum exposure rate.
P(S): probability that an item is selected as the best item.
P(A): probability that an item is administered.
P(A|S): conditional probability that an item is administered given that it is selected (the item exposure parameter).
$$P(A) = P(A \mid S)\, P(S) \le r_{\max}$$
P(A|S) is easy to determine if P(S) is known, but P(S) must be determined through an iterative process.
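The iterative process can be sketched with repeated simulations: estimate P(S) under the current exposure parameters, then set $P(A \mid S) = \min(1, r_{\max} / P(S))$ and repeat. As a simplifying assumption, each hypothetical simulee has a fixed best-first item ranking instead of re-ranking items by information after every response, so this illustrates the filter and the calibration loop rather than a full CAT simulation.

```python
import random


def simulate_exposure(rankings, k, test_length, rng):
    """One batch of simulated tests under the Sympson-Hetter filter.

    rankings: per-simulee item indices ordered best-first (hypothetical stand-in
    for information-based selection). k: exposure parameters P(A|S) per item.
    Returns estimated P(S) and P(A) for every item.
    """
    n_items = len(k)
    selected, administered = [0] * n_items, [0] * n_items
    for ranking in rankings:
        given = 0
        for item in ranking:
            if given == test_length:
                break
            selected[item] += 1           # item is the current best choice
            if rng.random() < k[item]:    # filter: administer with probability P(A|S)
                administered[item] += 1
                given += 1
            # otherwise the item is set aside for this simulee; try the next best
    n = len(rankings)
    return [s / n for s in selected], [a / n for a in administered]


def calibrate_sh(rankings, r_max, test_length, n_rounds=30, seed=0):
    """Iteratively set P(A|S) = min(1, r_max / P(S)) from repeated simulations."""
    rng = random.Random(seed)
    k = [1.0] * len(rankings[0])
    for _ in range(n_rounds):
        p_s, _ = simulate_exposure(rankings, k, test_length, rng)
        k = [1.0 if ps <= r_max else r_max / ps for ps in p_s]
    return k
```

If every simulee prefers the same items, the most popular item is selected for everyone, so its exposure parameter settles at exactly r_max and the filter passes the remaining administrations down the ranking.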
![Page 26: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/26.jpg)
Item Exposure and Item Selection Methods
Conditional Sympson-Hetter or SLC (Stocking and Lewis, 1998)
SH controls item exposure for the population as a whole, but at particular ability levels the exposure rates can still be quite high.
P(A|S) is determined at specific trait levels rather than across the population.
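The difference from the unconditional version is only where P(S) is computed. This one-round sketch uses hypothetical selection counts for two items at two ability levels and compares per-level exposure parameters with parameters from the pooled counts.

```python
def sh_parameters(selected, n, r_max):
    """P(A|S) = min(1, r_max / P(S)) for one set of selection counts out of n simulees."""
    return [1.0 if c / n <= r_max else r_max * n / c for c in selected]


def conditional_sh_parameters(selected_by_level, n_by_level, r_max):
    """Stocking-Lewis style: a separate P(A|S) for every ability level."""
    return [sh_parameters(sel, n, r_max) for sel, n in zip(selected_by_level, n_by_level)]


# Hypothetical counts: rows are ability levels, columns are items.
selected_by_level = [[100, 20], [10, 60]]
n_by_level = [100, 100]
print(conditional_sh_parameters(selected_by_level, n_by_level, 0.25))
pooled = [sum(col) for col in zip(*selected_by_level)]
print(sh_parameters(pooled, sum(n_by_level), 0.25))
```

With the pooled parameter for item 0 (about 0.45), simulees at the first level, who always select that item, would see it about 45% of the time, well above r_max = 0.25; the conditional parameter for that level keeps it at 0.25.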
![Page 27: Applications of IRT Models](https://reader033.vdocument.in/reader033/viewer/2022061600/56815a00550346895dc74dbb/html5/thumbnails/27.jpg)
Item Exposure and Item Selection Methods
a-stratified design (STR CAT; Chang & Ying, 1996, 1999)
Partition the item pool into multiple levels (strata) according to the discrimination (a) parameters, and divide the test into a matching number of stages.
Start with the less discriminating items.
This approach seems to improve item pool utilization and to balance item exposure rates.
Within each stage, use a b-matching item selection procedure (choose the item whose b is closest to the current ability estimate).
It is less computationally complex.
No other restrictions on item exposure are imposed.
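A minimal sketch of a-stratified selection, assuming equal-sized strata and equally long stages; the six-item pool and the function names are hypothetical. Items are sorted by a, split into strata from low to high a, and within the current stage the unused item whose b is closest to the ability estimate is chosen.

```python
def stratify_by_a(bank, n_strata):
    """Split item indices into n_strata nearly equal groups, lowest a first."""
    order = sorted(range(len(bank)), key=lambda i: bank[i][0])
    size, extra = divmod(len(order), n_strata)
    strata, start = [], 0
    for s in range(n_strata):
        end = start + size + (1 if s < extra else 0)
        strata.append(order[start:end])
        start = end
    return strata


def stratum_for_position(position, test_length, n_strata):
    """Stage (0-based) for the item at 0-based position; stages are equally long."""
    return position * n_strata // test_length


def select_b_matching(stratum, bank, theta, used):
    """Pick the unused item in the stratum whose b is closest to theta."""
    candidates = [i for i in stratum if i not in used]
    return min(candidates, key=lambda i: abs(bank[i][1] - theta))


# Hypothetical six-item pool of (a, b) pairs.
bank = [(2.0, 0.0), (0.5, 0.5), (1.0, -0.5), (1.5, 1.0), (0.8, -1.0), (1.2, 0.2)]
strata = stratify_by_a(bank, 3)
print(strata)  # [[1, 4], [2, 5], [3, 0]]
print(select_b_matching(strata[0], bank, theta=0.0, used=set()))  # 1
```

Holding the highly discriminating items back for later stages, when the ability estimate is more accurate, is what spreads exposure more evenly than pure maximum-information selection.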