Running head: IMPUTATION METHODS FOR NULL CATEGORIES
Imputation Methods for Handling Null Categories in Polytomous Items
Leslie Keng
The University of Texas at Austin
Ahmet Turhan
Pearson Educational Measurement
This paper will be presented at the annual meeting of the National Council on Measurement
in Education, Chicago, IL, April 2007.
Correspondence concerning this article should be addressed to Leslie Keng, Department of
Educational Psychology, 1 University Station, Mail Station D5800, University of Texas, Austin, TX,
78712. E-mail: lkeng@mail.utexas.edu.
Imputation Methods for Null Categories 1
Abstract
In large-scale assessments, low examinee motivation on field-tests can lead to the lack of
observed scores at the highest score levels of polytomous items. Such a score level is referred to as
an extreme null category. Imputation methods can be used to assign examinees to the extreme null
category so that the polytomous item can be calibrated. The current study compares three methods of
imputing extreme null categories. The three methods differ in the amount of information used to
decide on the target and frequency of imputation. Item parameters from 27 field-test forms are
taken from a recent administration of a statewide ELA assessment. These item parameters are used
to simulate item responses for 4,050,000 total examinees with ability parameters generated from a
N(-0.5, 1) distribution. The three imputation methods are applied to the simulated datasets and are
compared on several goodness-of-recovery (GOR) measures. The study finds that, when the overall
imputation demand is high, the use of historical performance data and random sampling to impute
scores leads to poor parameter recovery and negatively biased estimation of the highest step values
for imputed items. In contrast, the use of information from the current test form to impute extreme
null categories produces good overall parameter recovery, and is recommended as the ideal
imputation method because of its ease of implementation. Educational implications and limitations
of the study findings are also discussed.
Imputation Methods for Handling Null Categories in Polytomous Items
In high-stakes assessments, items are typically field-tested prior to inclusion on an
operational test. Because the quality and characteristics of the item are being tested, an examinee’s
performance on the field-test items typically does not count towards his or her overall score. In
many cases, field-test items are embedded into operational test forms so that the examinees cannot
distinguish between field-test items and operational items. Sometimes, however, embedding field-
test items is not feasible, and separate field-test forms are administered. In these cases, examinees
usually realize that their test scores do not count. As a result, their motivation on the test is low and
they do not try as hard as they would in an operational setting. Researchers have examined this field-
test effect and shown that students generally do not perform as well on low-stakes field tests as on
the high-stakes operational administrations (DeMars, 2000; Wolf & Smith, 1995).
One consequence of the field-test effect on a polytomously-scored item is that it can
negatively skew the item’s score distribution. In cases where the item itself is more difficult, the
field-test effect can skew the distribution to a point where no students receive a score in the highest
category. For example, for a 4-point essay item on an English Language Arts test, there may be no
students who attain a score point of “4” on the field test. This poses a problem during item
calibration. If, for instance, we use the Partial Credit model (Masters, 1982) to calibrate a
polytomous item, then the lack of scores in the top category would preclude us from estimating a
step value for that category.
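To see why, consider the Partial Credit model's category probabilities: each category's probability is built from a cumulative sum of (θ − step difficulty), and a category that no examinee reaches supplies no responses from which its step value can be estimated. A minimal sketch with hypothetical step values (Python here, purely for illustration):

```python
import math

def pcm_probs(theta, steps):
    """Partial Credit model category probabilities for scores 0..m,
    given ability theta and step difficulties [d1, ..., dm]."""
    cum = [0.0]                       # score 0 has an empty (zero) sum
    for d in steps:
        cum.append(cum[-1] + (theta - d))
    exps = [math.exp(v) for v in cum]
    total = sum(exps)
    return [e / total for e in exps]

# A hard 4-point item (hypothetical step values): even at theta = 0 the top
# category is rare, so a modest field-test sample may contain no 3s at all.
probs = pcm_probs(theta=0.0, steps=[-0.5, 0.8, 2.5])
print([round(p, 3) for p in probs])   # → [0.29, 0.478, 0.215, 0.018]
```

With the top category expected for fewer than 2% of such examinees, a few thousand unmotivated field-test takers can easily produce zero observations there.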
Wilson and Masters (1993) have termed a score category with zero frequency a null category.
Several approaches have been employed to deal with null categories. One common approach is to
collapse out a null score category by reducing the score of any higher categories by one. This
approach alters the relationship between the substantive framework and the scoring scheme for such
an item, and is generally not recommended (Wilson, 1991). Wilson and Masters (1993) describe a
simple reformulation of the Partial Credit model that allows all categories to be retained and their
step values estimated. This approach has been implemented in several Rasch calibration programs,
such as with the “STKEEP” option in WINSTEPS (Linacre, 2003). The approach, however,
requires the null category to be an intermediate one and does not apply when it is an extreme one, such
as the highest or lowest score category.
One way to handle an extreme null category is with imputation. With imputation, a group of
examinees is chosen and assigned the score value of the extreme null category. The group of
examinees to impute may be chosen based on their item response information for similar test items
or related test sections. Thus, a variety of imputation methods are possible in a given context.
Imputation Methods
Two key issues need to be considered when imputing data. The first is the frequency; that is,
how many examinees should we impute? The second is the target; that is, for which examinees
should we impute the highest score value? How a practitioner addresses these two issues leads to
imputation methods that vary in the amount of examinee item response information used. These
methods can be broadly classified into three types. The three types of imputation methods are
described below from one that uses the least amount of information to one that uses the most.
1. Use Fixed Percentage and Random Sampling. For imputation methods in this category, the frequency
of examinees imputed is a fixed percentage, and the target for imputation is sampled from the
entire group of examinees. The fixed percentage can be determined from historical data, such as
the mean or median percentage of examinees who achieved the highest score level for similar
items on previous administrations of the assessment. The fixed percentage of examinees is then
randomly sampled from all examinees who took the test form in question; and the sampled
examinees are assigned the highest score value for the polytomous item with a null category.
2. Use Information from Current Test Form Only. For imputation methods in this category, the
frequency and target of imputation are based on the performance of examinees in a related
section of the same test form. For example, one could use the performance of the examinees in
the dichotomously-scored multiple-choice section on the same test form. All examinees who
achieve the highest total score in the multiple-choice section can be assigned the highest score
value for the polytomous item in question.
3. Use Information from Related Test Forms. For this method, the performance of examinees on other
related test forms is used to determine the imputation frequency and target. For example, the
performance of examinees on similar polytomous items on the other test forms in the same field-
test administration can be examined. Similar items can be, for instance, items that are the same
item type and in the same item position on the other test forms. For all similar items that do not
have a null category, the percentage of students who have achieved the highest score level can be
computed. That same percentage (frequency) of examinees from the test form in question can
then be imputed. The examinees to impute (target) should match the ones who achieved the
highest score level on the other test forms. This could be done by sampling examinees that have
equivalent percentile ranks on total scores in a related section, such as the multiple-choice items.
Research Objective
Our review of the literature failed to reveal any research that evaluated or compared imputation
methods for handling extreme null categories. In practice, however, imputation is routinely used in
high-stakes assessments on polytomous items from field-test forms that have null categories for
their highest score level. The purpose of our study is to compare different methods of imputing
data for this type of null category. Specifically, our study aims to answer the question: What type of
information used in the imputation method leads to the most accurate item parameter estimates?
Common sense would lead one to believe that it is better to use as much information as
possible. However, using more information requires more time and effort on the part of the
practitioner to implement the method, and the trade-off between accuracy of parameter estimation
and implementation time and effort may not be favorable. Thus, to address our research question,
implementations of the three types of imputation methods described above are applied to simulated
item response data, and their performances are compared. The findings should help inform
researchers and practitioners on the appropriate amount of information to use when it is necessary
to impute data for field-test administrations of large-scale assessments.
Method
Sample
The study used simulated datasets based on the item parameters from a real dataset. The
real dataset was taken from a recent field-test administration of a statewide English language arts
(ELA) assessment. For this separate field-test administration, a total of 92,996 students took 31
field-test forms such that each test form had on average 3,000 students responding. The ELA
assessment consists of 42 multiple-choice items, 3 open-ended short response items, and 1 extended
response essay item. The maximum score an examinee could receive on an open-ended short
response item was 3; thus, open-ended items were scored on a 4-point scale (i.e. possible open-ended
item scores of 0, 1, 2, or 3). The maximum score for the extended response essay item was 4, so
essay items were scored on a 5-point scale (i.e. possible essay item scores of 0, 1, 2, 3, or 4). The Partial
Credit model (Masters, 1982) was used to calibrate all field-test items. The Partial Credit model
simplifies to the Rasch model (Rasch, 1980) in the calibration of the dichotomously-scored multiple
choice items.
The simulated datasets were constructed based on the estimated item parameters for 27 of
the 31 field-test forms in the real dataset. The item parameter estimates for these 27 forms were
used as the true parameter values in the simulated datasets. Thus, each test form in the simulated
dataset consisted of 42 multiple-choice items with true Rasch item difficulties matching those in one
of the real test forms. It also consisted of 3 open-ended items and 1 essay item with true step values
taken from the corresponding real test form.
Item responses were generated for 3,000 students per test form and a total of 50 replications
were conducted for each test form. This resulted in a total of (3,000 examinees × 50 replications =)
150,000 sets of simulated examinee responses per test form across the replications; and a grand total
of (150,000 examinee response sets × 27 test forms =) 4,050,000 examinee response sets across all
test forms replications. Within each test form replication, the 3,000 examinee ability parameters (θ)
were generated from a normal distribution with a mean of -0.5 and standard deviation of 1. A mean
of -0.5 was used to account for the field-test effect. This value is what has been historically
observed as the mean difference in ability estimates between the field tests and operational tests for
ELA students at this grade level. Each examinee’s item responses were generated based on the
Partial Credit model (Masters, 1982), which reduces to the Rasch model (Rasch, 1980) for the
dichotomous multiple-choice items. The item response generation code was implemented in SAS
(SAS Institute Inc., 2001).
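A minimal Python sketch of this generation step (the study's actual code was written in SAS; the step values below are hypothetical):

```python
import math
import random

def simulate_pcm_item(thetas, steps, rng):
    """Draw one Partial Credit model response per examinee for a single item."""
    responses = []
    for theta in thetas:
        # Category probabilities: exponentiated cumulative sums of (theta - d_j),
        # normalized over scores 0..m (the score-0 sum is empty, i.e. 0).
        cum = [0.0]
        for d in steps:
            cum.append(cum[-1] + (theta - d))
        exps = [math.exp(v) for v in cum]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Inverse-CDF draw of a score category.
        u, acc = rng.random(), 0.0
        score = len(probs) - 1  # fallback guards against floating-point undershoot
        for s, p in enumerate(probs):
            acc += p
            if u < acc:
                score = s
                break
        responses.append(score)
    return responses

rng = random.Random(2007)
# 3,000 abilities from N(-0.5, 1), mirroring the field-test effect in the study.
thetas = [rng.gauss(-0.5, 1.0) for _ in range(3000)]
scores = simulate_pcm_item(thetas, steps=[-0.4, 0.9, 2.6], rng=rng)
print(len(scores), min(scores), max(scores))
```

With a single-item loop like this run over 42 multiple-choice items (one step value each) and 4 polytomous items per form, the full 27-form, 50-replication design follows directly.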
Procedures
Implementations of the three types of imputation methods described above were used to
assign the highest score point for any open-ended or essay item in the simulated datasets that had
extreme null categories. The three imputation methods were also implemented in SAS (SAS Institute
Inc., 2001).
For the first imputation method (Method 1), a fixed percentage of examinees was randomly
sampled and imputed. The fixed percentage was determined from historical data. Specifically, the
percentage of examinees who attained the highest score level for each open-ended and essay item
was obtained for the past four (2003-2006) operational ELA administrations at this grade level. The
median of the four percentages was computed for each item and used as the fixed percentage to
impute. The median percentage, instead of the mean, was used to mitigate the influence of outliers;
in particular, the unusually large percentage of score point 4’s observed for the essay item in 2005.
Table 1 gives the historical percentages and median percentage for the open-ended and essay items.
Table 1: Historical (2003-2006) Percentages for Highest Score Level on the Open-ended and Essay Items for the Operational ELA Assessment

Year     Open-ended #1   Open-ended #2   Open-ended #3   Essay
2003         0.17%           0.44%           0.30%       3.55%
2004         0.52%           0.36%           0.89%       3.86%
2005         0.41%           0.69%           0.62%       6.50%
2006         0.31%           0.76%           0.22%       3.96%
Median       0.36%           0.57%           0.26%       3.70%
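A Python sketch of Method 1 (the study's implementation was in SAS). The toy responses below are randomly generated, and the historical percentages are the essay values from Table 1; the resulting count is illustrative and is not meant to reproduce the study's exact imputation frequencies:

```python
import random
import statistics

def method1_impute(scores, historical_pcts, top_score, rng):
    """Method 1 sketch: impute a fixed percentage (the median of historical
    percentages of top scores) by random sampling from all examinees."""
    pct = statistics.median(historical_pcts)
    n_impute = round(len(scores) * pct / 100.0)
    imputed = list(scores)
    for i in rng.sample(range(len(scores)), n_impute):
        imputed[i] = top_score
    return imputed

rng = random.Random(0)
# A toy essay item with an extreme null category: no examinee scored 4.
scores = [rng.choice([0, 1, 2, 3]) for _ in range(3000)]
post = method1_impute(scores, historical_pcts=[3.55, 3.86, 6.50, 3.96],
                      top_score=4, rng=rng)
print(post.count(4))
```

Because the percentage is fixed in advance and the form size is constant, the same number of examinees is imputed on every form, regardless of how examinees actually performed.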
For the second imputation method (Method 2), all examinees who achieved the highest total
score on the 42 multiple-choice items on the same test form were imputed. Thus, for example,
suppose that on Test Form #18 in the current replication, one of the open-ended items had a null
category (i.e. no examinees with a score point 3). Suppose further that the highest total score on
the multiple-choice section for the current form was 38 and it was attained by two of the simulated
examinees. Then, this method would assign a score point of 3 to these two examinees for the
open-ended item in question.
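The example above can be sketched directly in Python (hypothetical toy data; the study's implementation was in SAS):

```python
def method2_impute(poly_scores, mc_totals, top_score):
    """Method 2 sketch: assign the top polytomous score to every examinee who
    attained the highest multiple-choice total on the same test form."""
    best = max(mc_totals)
    return [top_score if mc == best else s
            for s, mc in zip(poly_scores, mc_totals)]

# Five toy examinees; the highest MC total (38) is attained by two of them,
# and the open-ended item has no score-3 responses observed.
mc_totals = [30, 38, 25, 38, 31]
poly_scores = [1, 2, 0, 2, 1]
print(method2_impute(poly_scores, mc_totals, top_score=3))  # → [1, 3, 0, 3, 1]
```

Note that the imputation frequency falls out of the data: however many examinees tie for the top multiple-choice total is how many scores are imputed.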
For the third imputation method (Method 3), the frequency and target of imputation were
based on the performance of examinees on similar items on other related test forms. Specifically, for
each of the 27 simulated test forms in a replication, examinees with percentile ranks of 75 or higher
(i.e. the top 25% of examinees) on the form's multiple-choice section were first identified. The proportion
of these top 25% examinees that attained the highest score point was determined and then averaged
across all 27 test forms for each polytomous item. For any test form with an extreme null category
in one of its polytomous items, the average proportion of top 25% examinees who achieved the
highest score point for the item in the same position was used to impute from the top 25%
examinees on the test form in question. So, for example, suppose that on Test Form #18, the first
open-ended item had an extreme null category. Suppose also that across all 27 forms in the current
replication, the mean percentage of top 25% examinees with a score point 3 on the first open-ended
item was .50%. Then, this method would randomly choose .50% of the top 25% examinees on Test
Form #18 and assign them a score of 3 for the first open-ended item.
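A sketch of this procedure with tiny hypothetical forms (8 examinees each, so the "top 25%" is simply the top two by multiple-choice total; the study's SAS version worked with 3,000 examinees per form):

```python
import random

def top25(mc_totals):
    """Indices of the top 25% of examinees by multiple-choice total (a simple cut)."""
    k = max(1, len(mc_totals) // 4)
    order = sorted(range(len(mc_totals)), key=lambda i: -mc_totals[i])
    return order[:k]

def method3_impute(null_form, related_forms, top_score, rng):
    """Method 3 sketch: the mean share of top-25% examinees earning the top score
    on related forms sets the imputation rate among the top 25% on the null form."""
    rates = []
    for poly, mc in related_forms:
        idx = top25(mc)
        rates.append(sum(1 for i in idx if poly[i] == top_score) / len(idx))
    rate = sum(rates) / len(rates)

    poly, mc = null_form
    idx = top25(mc)
    imputed = list(poly)
    for i in rng.sample(idx, round(rate * len(idx))):
        imputed[i] = top_score
    return imputed

# Two tiny related forms: on each, one of the top-2 MC scorers earned the top
# open-ended score of 3, so the mean rate is 0.5.
related = [
    ([3, 2, 1, 0, 1, 2, 0, 1], [42, 40, 30, 20, 10, 9, 8, 7]),
    ([2, 3, 1, 0, 0, 1, 2, 0], [35, 33, 28, 22, 11, 5, 4, 2]),
]
null_form = ([2, 1, 2, 0, 1, 0, 2, 1], [40, 38, 30, 25, 12, 6, 3, 1])  # no 3s observed
post = method3_impute(null_form, related, top_score=3, rng=random.Random(1))
print(post.count(3))   # → 1
```

The tie-handling at the 75th-percentile cutoff is a simplification here; any reasonable percentile-rank rule would serve in practice.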
Data Analysis
Within each replication, the original item responses on all test forms were imputed using the
three imputation methods. This resulted in (3 methods × 27 forms × 50 replications =) 4,050 post-
imputed datasets with item responses. Each post-imputed dataset is calibrated independently with
WINSTEPS (Linacre, 2003). WINSTEPS calibrates items using the Partial Credit model (Masters,
1982), which simplifies to the Rasch model (Rasch, 1980) for the dichotomous multiple-choice
items.
The performances of the three imputation methods were then evaluated and compared on
how well they recovered the true item parameter values. Two well-known and preferred goodness-
of-recovery (GOR) measures (Maris, 1999) were used to analyze parameter recovery. The first
GOR measure was the BIAS. The BIAS is the average difference between the estimated parameter
value and the true parameter value. That is,

$$\mathrm{BIAS}(\beta_j) = \frac{1}{50}\sum_{r=1}^{50}\left(b_{jr} - \beta_j\right)$$

where $\beta_j$ is the true value of parameter $j$, and $b_{jr}$ is the estimate of parameter $j$ in the $r$th replicated
dataset ($r = 1 \ldots 50$).
The second GOR measure was the root mean square deviation (RMSD). The RMSD is
defined as the square root of the average squared differences between the estimated and true
parameter values. That is,
$$\mathrm{RMSD}(\beta_j) = \sqrt{\frac{1}{50}\sum_{r=1}^{50}\left(b_{jr} - \beta_j\right)^2}$$

where $\beta_j$ is the true value of parameter $j$, and $b_{jr}$ is the estimate of parameter $j$ in the $r$th replicated
dataset ($r = 1 \ldots 50$). Lower mean BIAS and RMSD measures are considered more accurate in terms
of item parameter recovery.
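Both measures are straightforward to compute; a small sketch with toy replication estimates:

```python
import math

def bias(estimates, true_value):
    """Mean signed difference between replicated estimates and the true value."""
    return sum(b - true_value for b in estimates) / len(estimates)

def rmsd(estimates, true_value):
    """Root mean squared deviation of replicated estimates from the true value."""
    return math.sqrt(sum((b - true_value) ** 2 for b in estimates) / len(estimates))

# Four toy replication estimates of a step value whose true value is 1.0
# (the study used 50 replications per form).
ests = [0.8, 1.1, 0.9, 1.2]
print(round(bias(ests, 1.0), 6), round(rmsd(ests, 1.0), 4))  # → 0.0 0.1581
```

BIAS keeps the sign of the errors, so systematic over- or underestimation shows up even when the RMSD is modest; RMSD penalizes variability as well as systematic error.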
For each method, the BIAS and RMSD were computed for every item on each test form
within each replication. The results were aggregated over all 42 multiple-choice items on each test form.
The GOR measures were then averaged across the 27 test forms so that the mean BIAS and RMSD
of the multiple-choice items could be reported and compared for the three imputation methods,

$$\mathrm{BIAS}_{MC} = \frac{\displaystyle\sum_{k=1}^{27}\sum_{i=1}^{42}\mathrm{BIAS}(\beta_{ki})}{27 \times 42} \quad\text{and}\quad \mathrm{RMSD}_{MC} = \frac{\displaystyle\sum_{k=1}^{27}\sum_{i=1}^{42}\mathrm{RMSD}(\beta_{ki})}{27 \times 42}$$

where $\beta_{ki}$ is the true Rasch difficulty value of the $i$th multiple-choice item on test form $k$ ($i = 1 \ldots 42$
and $k = 1 \ldots 27$).
In addition, we separately computed the BIAS and RMSD of the individual step
difficulty values for the three open-ended items and the essay item on every test form. These GOR
measures were then averaged across all test forms so that the mean BIAS and RMSD could be
reported and compared for the three imputation methods,
$$\mathrm{BIAS}_{OE(m(j))} = \frac{\displaystyle\sum_{k=1}^{27}\mathrm{BIAS}(\beta_{m(j)k})}{27} \quad\text{and}\quad \mathrm{RMSD}_{OE(m(j))} = \frac{\displaystyle\sum_{k=1}^{27}\mathrm{RMSD}(\beta_{m(j)k})}{27}$$

where $\beta_{m(j)k}$ is the true $j$th step difficulty value for the $m$th polytomous item (open-ended or essay) on
test form $k$ ($k = 1 \ldots 27$).
Results
Imputation Frequencies
Table 2 gives the number of replications for which each test form required imputation on
the four polytomous items. It also gives the mean number (frequency) of examinee scores on each
test form that were imputed under the three imputation methods for the four polytomous items.
Table 2: Frequency of Imputation for the Three Imputation Methods (listed by test form)

             Open-Ended #1            Open-Ended #2            Open-Ended #3                Essay
Form   Reps   M1   M2   M3      Reps   M1   M2   M3      Reps   M1   M2   M3      Reps   M1    M2   M3
  1      -     -    -    -       18    17  1.7  1.9        9     9  1.9  1.0       16   134   1.5  1.1
  2      -     -    -    -       16    17  1.4  2.0       12     9  1.3  1.0        4   134   1.3  1.0
  3      3    11  1.3  5.3       21    17  1.6  2.0       36     9  1.6  1.0       19   134   1.5  1.1
  4      1    11  2.0  4.0        2    17  1.0  2.0       21     9  1.3  1.0       15   134   1.4  1.1
  5      -     -    -    -        3    17  1.3  2.0       13     9  1.3  1.0        4   134   1.3  1.0
  6      1    11  1.0  6.0        3    17  2.3  2.0       20     9  1.3  1.0       28   134   1.3  1.1
  7     39    11  1.6  5.3        4    17  1.0  2.3       21     9  1.7  1.0       27   134   1.4  1.1
  8      4    11  1.0  4.8        6    17  1.7  1.7       19     9  2.0  1.0       27   134   1.7  1.1
  9      -     -    -    -        2    17  1.5  2.0        5     9  1.2  1.0       33   134   1.6  1.1
 10     31    11  1.5  5.1       19    17  1.7  1.8       22     9  1.5  1.0        4   134   1.3  1.0
 11      -     -    -    -        9    17  1.7  1.9        9     9  2.1  1.0       20   134   2.0  1.1
 12      6    11  1.7  5.0       19    17  1.3  1.9       27     9  1.5  1.0       23   134   1.3  1.1
 13      6    11  1.8  4.8        -     -    -    -       10     9  1.3  1.0       20   134   1.5  1.0
 14      -     -    -    -        -     -    -    -       30     9  1.7  1.0       29   134   1.4  1.1
 15      9    11  2.1  4.9        -     -    -    -       33     9  1.7  1.0       32   134   1.8  1.0
 16      -     -    -    -       19    17  1.4  1.9        3     9  2.7  1.0       27   134   1.6  1.0
 17     13    11  1.5  4.9       16    17  1.6  1.9       34     9  1.5  1.0       30   134   1.7  1.1
 18     19    11  1.4  5.1       26    17  1.6  2.0       18     9  1.9  1.0       47   134   1.7  1.0
 19      -     -    -    -       26    17  1.7  2.0       35     9  1.7  1.0       11   134   1.9  1.0
 20      -     -    -    -       27    17  1.4  2.0       44     9  1.5  1.0       24   134   1.5  1.1
 21     11    11  1.4  5.2       33    17  1.4  1.9       38     9  1.5  1.0        8   134   1.4  1.1
 22     13    11  1.9  4.9        5    17  2.2  2.0       40     9  1.7  1.0        3   134   1.0  1.0
 23     12    11  1.7  5.4        3    17  1.3  2.0       47     9  1.6  1.0       16   134   1.5  1.0
 24      -     -    -    -        2    17  3.0  2.0       46     9  1.5  1.0       41   134   1.5  1.1
 25      -     -    -    -        7    17  1.6  2.1       29     9  1.9  1.0       19   134   2.3  1.1
 26      -     -    -    -        1    17  1.0  2.0       25     9  1.6  1.0        3   134   2.3  1.0
 27      -     -    -    -        5    17  1.6  2.0       48     9  1.9  1.0       19   134   1.8  1.1

a. Reps is the number of replications for this test form that required imputation for the item
b. M1 is "Method 1": the imputation method using fixed percentages and random sampling
c. M2 is "Method 2": the imputation method using information from the current test form only
d. M3 is "Method 3": the imputation method using information from related test forms
Figure 1 compares the mean number of examinees imputed for each polytomous item under
the three imputation methods, aggregated over the 27 test forms.
Figure 1: Mean Frequency of Imputation for the Three Imputation Methods (across all test forms)

[Bar chart. Mean examinees imputed per form for Open-Ended #1, #2, #3, and Essay, respectively: Method 1: 11.0, 17.0, 9.0, and 134.0; Method 2: 1.6 for all four items; Method 3: 5.1, 2.0, 1.0, and 1.1.]
Table 2 and Figure 1 both clearly indicate that Method 1 imputed scores for the most
examinees. This was particularly apparent for the essay item, where 134 examinees on each test
form were randomly sampled and imputed with an essay score of 4. The high frequency was a result
of the higher percentage of examinees that historically scored 4s on the essay item for the
operational ELA tests. Note that because the frequency of imputation for Method 1 was based on
historical percentages and the number of simulated examinees was the same (3,000) across test
forms, the same number of examinees was always imputed for a particular polytomous item,
regardless of test form.
Method 2 and Method 3 generally imputed scores for a similar number of examinees. Both
methods tended to impute scores for between 1 and 3 examinees, with one exception being the first
open-ended item under Method 3. In that case, a higher number of examinees (4 to 6) were
imputed. This is due to the fact that a relatively smaller number of test forms required imputing for
the first open-ended item. The average percentage of examinees that scored 3s for this item was
therefore considerably higher across all test forms and as a consequence, Method 3 imputed more
scores. In comparison, under Method 2, the imputation target and frequency were chosen based
solely on the examinees’ performances on the multiple-choice section on the current test form. Thus,
the frequency of imputation was quite stable across test forms (as seen in Table 2) and across the
four polytomous items (as seen in Figure 1).
An additional observation can be made from Table 2 about the general characteristics of the
polytomous items. Based on the number of times each item required imputing across test forms and
replications (i.e. the Reps column for each item), we can infer that the essay item was generally the
most difficult item (hence requiring the most imputing), while the first open-ended item tended to
be the easiest. This observation was consistent with the true item parameters used to generate the
item responses as the essay item on each test form tended to have the highest average step difficulty
value while the first open-ended item usually had the lowest average step value.
GOR for Multiple Choice Items
Table 3 lists, by test form and imputation method, the mean BIAS and RMSD measures
observed for the 42 multiple-choice items across 50 replications. It also aggregates the mean BIAS
and RMSD across the 27 test forms to give an overall mean BIAS and RMSD for each method (i.e.
$\mathrm{BIAS}_{MC}$ and $\mathrm{RMSD}_{MC}$).
Table 3: Mean BIAS and RMSD of the Multiple-Choice Items for the Three Imputation Methods (listed by form)

             BIAS_MC                  RMSD_MC
Form    M1     M2     M3         M1     M2     M3
  1    0.02   0.00   0.00       0.05   0.05   0.05
  2    0.01   0.00   0.00       0.05   0.05   0.05
  3    0.04   0.00   0.00       0.07   0.05   0.05
  4    0.01  -0.01  -0.01       0.05   0.05   0.05
  5    0.00  -0.01  -0.01       0.05   0.05   0.05
  6    0.02  -0.01  -0.01       0.06   0.05   0.05
  7    0.05   0.00   0.02       0.07   0.05   0.05
  8    0.02  -0.01  -0.01       0.06   0.05   0.05
  9    0.02  -0.01  -0.01       0.06   0.05   0.05
 10    0.03   0.00   0.01       0.06   0.05   0.05
 11    0.02  -0.01  -0.01       0.05   0.05   0.05
 12    0.03   0.00   0.00       0.07   0.05   0.05
 13    0.02   0.00   0.00       0.06   0.05   0.05
 14    0.03   0.00   0.00       0.06   0.05   0.05
 15    0.04   0.00   0.00       0.06   0.05   0.05
 16    0.02   0.00   0.00       0.06   0.05   0.05
 17    0.06   0.01   0.01       0.08   0.05   0.05
 18    0.07   0.00   0.01       0.08   0.05   0.05
 19    0.04   0.01   0.01       0.07   0.05   0.05
 20    0.07   0.02   0.02       0.09   0.06   0.06
 21    0.04   0.00   0.01       0.07   0.06   0.06
 22    0.02   0.00   0.00       0.06   0.05   0.05
 23    0.05   0.01   0.01       0.07   0.05   0.05
 24    0.06   0.01   0.02       0.08   0.05   0.05
 25    0.03   0.00   0.00       0.06   0.05   0.05
 26    0.01   0.00   0.00       0.05   0.05   0.05
 27    0.04   0.01   0.01       0.07   0.05   0.05
Mean   0.03   0.00   0.00       0.06   0.05   0.05
Table 3 shows that the three imputation methods performed similarly in recovering the true
parameter values for the multiple-choice items. At the test form level, the mean BIAS values for the
multiple-choice items ranged from about -0.01 to 0.07 for the three methods, and the overall mean
BIAS ($\mathrm{BIAS}_{MC}$) values were all close to zero. The same observation can be made about the RMSD
measures, as the three methods had mean RMSD values ranging from around 0.05 to 0.09 for the multiple-
choice items, and the overall mean RMSD ($\mathrm{RMSD}_{MC}$) values were all approximately 0.05. Thus, it appears
that the method with which we chose to impute scores for the polytomous items on a test did not
have any notable effects on the estimation of the dichotomous item parameters.
GOR for Open-Ended Items
Tables 4 and 5 summarize the aggregated mean BIAS and RMSD values ($\mathrm{BIAS}_{OE(m(j))}$ and
$\mathrm{RMSD}_{OE(m(j))}$) obtained for the four polytomous items.1 The mean GOR measures are given for
each step difficulty value (b1…b3 for the open-ended items and b1…b4 for the essay item) as well as
for the average step value (b-bar).
Table 4: Mean BIAS for Step Difficulty Values of Polytomous Items (aggregated across 27 forms)

                         Method 1                        Method 2                        Method 3
Item            b1    b2    b3    b4   b-bar    b1    b2    b3    b4   b-bar    b1    b2    b3    b4   b-bar
Open-Ended #1  0.01  0.08 -0.14   -   -0.01   -0.02  0.08  0.26   -    0.10   -0.02  0.08  0.03   -    0.03
Open-Ended #2  0.02  0.07 -0.68   -   -0.20   -0.01  0.08  0.06   -    0.04   -0.01  0.08 -0.11   -   -0.01
Open-Ended #3  0.03  0.10 -1.96   -   -0.61    0.01  0.14 -0.72   -   -0.19    0.01  0.13 -0.68   -   -0.18
Essay         -0.02  0.02  0.04 -2.34 -0.58   -0.04  0.05  0.15 -0.10  0.02   -0.03  0.05  0.15 -0.10  0.02
Table 5: Mean RMSD for Step Difficulty Values of Polytomous Items (aggregated across 27 forms)

                         Method 1                        Method 2                        Method 3
Item            b1    b2    b3    b4   b-bar    b1    b2    b3    b4   b-bar    b1    b2    b3    b4   b-bar
Open-Ended #1  0.06  0.13  0.80   -    0.27    0.05  0.13  0.63   -    0.21    0.05  0.13  0.68   -    0.23
Open-Ended #2  0.05  0.16  1.34   -    0.43    0.05  0.16  0.66   -    0.21    0.05  0.16  0.65   -    0.21
Open-Ended #3  0.07  0.23  2.32   -    0.74    0.06  0.25  1.13   -    0.34    0.06  0.24  1.00   -    0.30
Essay          0.05  0.08  0.22  3.37  0.87    0.06  0.08  0.23  0.68  0.16    0.06  0.08  0.23  0.59  0.14
In examining Tables 4 and 5, we see that the three imputation methods performed
equivalently well in recovering the lower step value parameters. That is, the mean BIAS and RMSD
values were similar and low for b1 and b2 of the three open-ended items and b1 to b3 for the essay
item. The mean BIAS values for these parameters ranged from -0.04 to 0.15 and the mean RMSD
measures ranged from 0.05 to 0.25. Thus, the method used to impute scores in the extreme null
1 For those interested in the BIAS and RMSD measures obtained for the polytomous items on each test form, please refer to the tables in the Appendix (Appendix Tables 1 to 8).
category did not appear to have an effect on the parameter estimation of the non-extreme, non-null
categories.
The contrast in the three imputation methods, however, could be seen in the estimation of
the highest step value for each of the four polytomous items. Figures 2 and 3 visually compare the
three methods’ mean BIAS and RMSD for the highest step values of the four polytomous items.
Figure 2: Comparison of Mean BIAS for the highest step value of each polytomous item (aggregated across 27 forms)

[Bar chart of absolute mean $\mathrm{BIAS}_{OE}$ for Open-Ended #1 (b3), Open-Ended #2 (b3), Open-Ended #3 (b3), and Essay (b4), by imputation method; vertical axis 0.00 to 3.50.]
Figure 3: Comparison of Mean RMSD for the highest step value of each polytomous item (aggregated across 27 forms)

[Bar chart of mean $\mathrm{RMSD}_{OE}$ for Open-Ended #1 (b3), Open-Ended #2 (b3), Open-Ended #3 (b3), and Essay (b4), by imputation method; vertical axis 0.00 to 3.50.]
It is clear from both figures that Method 1 performed poorly, especially in estimating b3 of
open-ended item #3 and b4 of the essay item. The absolute mean BIAS values for these two
parameters are around 2.00 or greater and the mean RMSD values are also greater than 2.00. This
was in contrast to the relatively low mean GOR measures for Methods 2 and 3. Also, the direction of
the mean BIAS values for Method 1 was negative for all four highest step values, meaning that the
method underestimated these extreme step values. This makes sense given that Method 1 imputed
far more scores than the other two methods, especially for the essay item, where 134 scores were
imputed for each form requiring imputation (see Table 2). Methods 2 and 3, on the other hand,
appear to have imputed a more reasonable number of examinees. Both of these methods produced
low absolute mean BIAS and mean RMSD values in recovering the highest step values, with one exception
being the estimation of b3 for open-ended item #3. Even though the GOR measures in that case
were substantially lower than those for Method 1, the mean BIAS was still around -0.70, while the mean
RMSD was approximately 1.00.
In addition to comparing the GOR measures of the individual step difficulty values, we also
considered how well the imputation methods estimated the average step difficulty values (b-bar) of
the four polytomous items. The average step difficulty value is simply the mean of a polytomous item’s
step values. It is often used as an aggregate indicator of a polytomous item’s overall difficulty so
that it can be more directly compared to other polytomous items as well as dichotomous items.
The GOR measures for the b-bar values were already given in Table 4. Figures 4 and 5 visually
compare the three methods’ absolute mean BIAS and mean RMSD in recovering the average step
difficulty value of each polytomous item.
Figure 4: Comparison of Absolute Mean BIAS for the average step value (b-bar) of polytomous items (aggregated across 27 forms)

[Bar chart of absolute mean BIAS for the average step value of each polytomous item, by imputation method; vertical axis 0.00 to 3.50.]
Figure 5: Comparison of Mean RMSD for the average step value (b-bar) of polytomous items (aggregated across 27 forms)

[Bar chart of mean RMSD for the average step value of each polytomous item, by imputation method; vertical axis 0.00 to 3.50.]
These two figures reflect the effect of Method 1's poor estimation of the highest step values on
the items' overall perceived difficulty. This is especially apparent for open-ended item #3 and the
essay item. From Table 4, we see that, as with the estimation of the highest step value, the direction
of the BIAS for the average step difficulty was negative for Method 1. This implies that the underestimation of
the highest step values in Method 1 led to an underestimation of the average step difficulty value,
making the imputed items appear easier than they actually are. In contrast, because Methods 2 and 3
performed well in recovering the individual step values, they also estimated the average step
difficulty fairly accurately. Figures 4 and 5 show that the absolute mean BIAS and mean RMSD
were equally low for the two methods, meaning that both produced good estimates of the
overall difficulty of the imputed items.
Discussion
In summary, our simulation found that all three methods performed equally well in
recovering the parameters of the multiple-choice items and the lower step values of the polytomous
items. Methods 2 and 3 were also fairly accurate in their estimation of the highest step values of the
polytomous items. Method 1, however, performed poorly in its recovery of such extreme step values.
This led to underestimation of the true overall difficulties of imputed items, making these items
appear easier than they actually are.
One may attribute the poor performance of Method 1 to the disproportionately high
number of examinees per form that were chosen for imputation. This was especially apparent for
the essay item, where score points of 4 were assigned to 134 examinees on each form that required imputing. While that does appear to be part of the reason, an additional factor should also be
considered. Looking back at the imputation frequencies in Table 2, we see that Method 1 did indeed
impute substantially higher numbers of examinee scores than the other two methods for all four
polytomous items. However, Figures 2 and 3 show that it performed as well as the other two
methods in estimating b3 for open-ended item #1. Also, of the three open-ended items, Method 1
actually imputed the fewest scores (9) per form for open-ended item #3 (compared to 11 and 17 for
open-ended item #1 and #2 respectively). Yet the estimation of b3 for open-ended item #3 was not
as good as the other two open-ended items. If the number of examinees imputed per form were the
only reason for the poorer performance of Method 1, then we would have expected the estimation to
be about equally poor for all three open-ended items. These trends, however, suggest that when
more test forms require imputing for a particular item (such as open-ended #3 or the essay item),
then the fact that Method 1 imputes a disproportionately high number of examinees per form leads
to its considerably larger BIAS and RMSD in estimating the highest step values. Thus, a more
plausible explanation for Method 1’s poor performance is the overall number of examinees across
test forms that require imputation.
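This explanation comes down to simple arithmetic: overall imputation demand is the per-form imputation count multiplied by the number of forms that need imputing. In the sketch below, the per-form counts (11, 17, 9, 134) come from the study, but the counts of affected forms are hypothetical placeholders for illustration.

```python
# Scores imputed per form under Method 1 (per-form counts reported in the study)
per_form = {"Open-Ended #1": 11, "Open-Ended #2": 17,
            "Open-Ended #3": 9, "Essay": 134}

# Number of the 27 forms requiring imputation -- hypothetical values
forms_needing_imputation = {"Open-Ended #1": 6, "Open-Ended #2": 12,
                            "Open-Ended #3": 20, "Essay": 22}

# Overall imputation demand across the whole administration
demand = {item: per_form[item] * forms_needing_imputation[item] for item in per_form}
for item, total in demand.items():
    print(f"{item}: {total} imputed scores in total")
```

Even though Open-Ended #3 has the smallest per-form count, a larger number of affected forms can give it a higher overall demand than Open-Ended #1, which matches the pattern described above.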
The implication of this finding for practitioners is that if a large-scale field-test assessment has
only a few test forms that require imputation and the number of scores that need to be imputed on
each form is reasonably low (e.g., fewer than 20), then the method of imputation does not seem to have a substantive effect on parameter estimation. In such cases, randomly imputing a fixed percentage of examinees based on historical performance is as good a method as other, more sophisticated imputation methods.
On the other hand, if a considerable number of test forms have polytomous items with
extreme null categories, then the method of imputation does make a difference. In our simulation,
Methods 2 and 3 both performed noticeably better than Method 1 in recovering the highest step
values of imputed items. This implies that the use of more information in deciding on the frequency
and target of imputation did lead to more accurate parameter estimation for such items. However,
the fact that Methods 2 and 3 performed equally well suggests that the use of information from other
test forms is not necessary, and it is reasonable to base imputation decisions solely on item
responses from the same test form. Imputation methods based solely on information from the
current test form are less cumbersome to implement and provide the same high degree of accuracy
in item calibration. Thus, methods such as Method 2 are recommended based on the results of our
study.
Study Limitations
The current study represents an initial examination of imputation methods applied to a scenario frequently encountered in practice but, to date, scarcely explored in research. It sheds some light on what information should be considered in deciding how many and which examinees to impute when scores must be imputed for extreme null categories. Several issues, however, still require further research.
One issue is to determine the effect that the imputation target and imputation frequency
each have on the accuracy of parameter estimation. In the current study, it is difficult to distinguish
how much of the poor performance of Method 1 is due to the large number of examinees imputed
and how much is due to the characteristics of the examinees chosen. Similarly, one wonders
whether the accuracy in parameter estimation for Methods 2 and 3 is because of the small number
of examinees imputed, the more proficient examinees that were chosen for imputation, or a
combination of the two factors. For example, would an imputation method that simply selects one
of the more proficient examinees for imputation (fixed frequency with selective target) do just as
well as Methods 2 and 3? Or, would a method that bases its imputation frequency on test form
information, but always randomly samples from the entire group of examinees (variable frequency
with non-selective target) be equally accurate in parameter estimation? Comparing implementations
of such methods with the ones in this study could further our understanding of the differential
effects of imputation target and frequency.
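The two hypothetical designs proposed above could be prototyped along these lines. Here `theta_hat` stands for any proficiency proxy (e.g., score on the rest of the form); the fixed count of one imputed examinee and the data layout are assumptions of this sketch, not the study's Methods 1-3.

```python
import random

def impute_fixed_selective(examinees, top_score, k=1):
    """Fixed frequency, selective target: assign the extreme null category
    to the k most proficient examinees on the form."""
    chosen = sorted(examinees, key=lambda e: e["theta_hat"], reverse=True)[:k]
    for e in chosen:
        e["item_score"] = top_score
    return chosen

def impute_variable_nonselective(examinees, top_score, n_to_impute):
    """Variable frequency, non-selective target: impute a form-determined
    number of scores by random sampling from the entire group."""
    chosen = random.sample(examinees, n_to_impute)
    for e in chosen:
        e["item_score"] = top_score
    return chosen

# Toy form: five examinees; the item's top category (3) is null
form = [{"id": i, "theta_hat": t, "item_score": s}
        for i, (t, s) in enumerate([(0.8, 2), (-0.2, 1), (1.5, 2), (-1.0, 0), (0.3, 2)])]

impute_fixed_selective(form, top_score=3)
print([e["id"] for e in form if e["item_score"] == 3])  # the single most proficient examinee
```

Running both variants on simulated forms and comparing their BIAS and RMSD against the methods in this study would separate the contribution of imputation frequency from that of the imputation target.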
Another future direction is to explore the relationship between the characteristics of
polytomous items and imputation effectiveness. In the current study, Methods 2 and 3 performed
well in estimating every item parameter except for the highest step value (b3) of open-ended item #3.
No explanation could be found for this apparent aberration based on the design of the current
study. For example, Method 2 imputed, on each test form, the exact same set of examinees for all
four polytomous items. The number of test forms that require imputation is also similar for open-
ended item #3 and the essay item. Thus, it is somewhat notable that this particular parameter was
not estimated well across the forms and replications. One plausible explanation lies in the characteristics of the items themselves, specifically those designated as the third open-ended item.
However, given that the true item parameters for this study are based on a large real dataset, it is
difficult to identify what distinguishes such an item from the other polytomous items. A study that
systematically compares item characteristics, such as the degree of separation of the step difficulty
values, may be able to shed some light on this unresolved phenomenon.
Lastly, the examinee abilities (θ) in this study were generated from a normal distribution with a mean of -0.5 to emulate the field-test effect. Given that there is a lower bound (0) on the score an examinee can achieve on a test, and given the low motivation typically observed in field-test administrations, it might be reasonable to consider other distributions, such as one that is negatively skewed, as well as varying degrees of field-test effect. It would be interesting to examine the effect that the distribution of θ and the size of the field-test effect have on the item response distribution and, consequently, on the effectiveness of different imputation methods.
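As one concrete direction, such a follow-up could draw abilities both from the N(-0.5, 1) distribution used here and from a negatively skewed alternative. Reflecting an Exponential(1) draw is one simple way to induce negative skew while holding the mean fixed; that construction is an assumption of this sketch, not part of the study design.

```python
import random

random.seed(42)
N = 10_000

# Field-test effect as modeled in this study: theta ~ N(-0.5, 1)
theta_normal = [random.gauss(-0.5, 1.0) for _ in range(N)]

# Negatively skewed alternative with the same mean (-0.5): reflect an
# Exponential(1) draw so the long tail points toward low ability, shift by +0.5
theta_skewed = [0.5 - random.expovariate(1.0) for _ in range(N)]

mean_normal = sum(theta_normal) / N
mean_skewed = sum(theta_skewed) / N
print(round(mean_normal, 2), round(mean_skewed, 2))  # both close to -0.5
```

Holding the mean fixed isolates the effect of distribution shape on the item response distribution; shifting the mean instead would vary the size of the field-test effect.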
Educational Implications
The findings of this and any further studies should help inform practitioners on what
information is important in deciding on the frequency and target for imputation. These decisions
have practical implications for the field-test calibration of high-stakes statewide assessments. Field-test item statistics are often used in the scoring of retest administrations. In these cases, the test-taking population is generally smaller, so no post-equating of the retest forms is conducted. Because retest scores are high stakes for examinees, motivation is generally not an issue in these administrations. Consequently, the full range of score points for polytomous items is usually observed on the retests. However, if the step value for an item’s highest score category was not estimated during field testing, then that item cannot be used to estimate the abilities of students in the highest score category. Knowing how to handle null categories in such polytomous items is essential to ensuring the integrity and defensibility of the testing program.
Appendix
Appendix Table 1: BIAS for Step Difficulty Values of Open-Ended Item #1 (listed by form)

Form | Method 1: b1 b2 b3 b-bar | Method 2: b1 b2 b3 b-bar | Method 3: b1 b2 b3 b-bar
1 | 0.00 0.04 0.14 0.06 | -0.03 0.03 0.20 0.07 | -0.03 0.03 0.18 0.06
2 | -0.03 0.06 0.14 0.06 | -0.04 0.05 0.17 0.06 | -0.04 0.05 0.16 0.06
3 | 0.03 0.10 0.16 0.10 | -0.02 0.09 0.46 0.18 | -0.01 0.09 0.28 0.12
4 | -0.01 0.06 0.22 0.09 | -0.03 0.06 0.32 0.12 | -0.03 0.06 0.28 0.10
5 | -0.05 0.05 0.17 0.06 | -0.06 0.04 0.18 0.05 | -0.06 0.04 0.17 0.05
6 | 0.02 0.08 0.24 0.12 | -0.02 0.08 0.39 0.15 | -0.01 0.08 0.32 0.13
7 | 0.01 0.05 -2.79 -0.91 | -0.05 0.06 -0.80 -0.26 | -0.03 0.06 -2.09 -0.69
8 | -0.01 0.06 0.09 0.05 | -0.04 0.06 0.42 0.15 | -0.04 0.06 0.24 0.08
9 | 0.02 0.07 0.19 0.09 | -0.02 0.07 0.28 0.11 | -0.02 0.07 0.27 0.11
10 | 0.02 0.14 -1.87 -0.57 | 0.00 0.15 -0.35 -0.07 | 0.01 0.14 -1.33 -0.40
11 | 0.00 0.07 0.23 0.10 | -0.03 0.07 0.29 0.11 | -0.03 0.07 0.28 0.11
12 | 0.03 0.09 0.13 0.09 | -0.01 0.08 0.52 0.20 | -0.01 0.09 0.31 0.13
13 | 0.00 0.06 0.02 0.03 | -0.02 0.06 0.36 0.13 | -0.02 0.06 0.19 0.08
14 | 0.00 0.09 0.15 0.08 | -0.04 0.08 0.26 0.10 | -0.04 0.08 0.23 0.09
15 | 0.04 0.07 0.04 0.05 | -0.01 0.08 0.56 0.21 | 0.00 0.08 0.30 0.12
16 | -0.01 0.09 0.24 0.11 | -0.04 0.09 0.34 0.13 | -0.04 0.09 0.31 0.12
17 | 0.04 0.06 -0.58 -0.16 | -0.01 0.05 0.14 0.06 | -0.01 0.05 -0.26 -0.07
18 | 0.05 0.12 -0.98 -0.27 | -0.02 0.13 0.34 0.15 | -0.01 0.13 -0.47 -0.12
19 | -0.01 0.10 0.23 0.11 | -0.05 0.07 0.26 0.10 | -0.04 0.07 0.25 0.09
20 | 0.06 0.13 0.25 0.15 | 0.01 0.12 0.40 0.18 | 0.01 0.12 0.34 0.16
21 | 0.04 0.10 -0.37 -0.08 | 0.00 0.09 0.29 0.12 | 0.00 0.09 -0.14 -0.02
22 | 0.01 0.11 -0.31 -0.06 | -0.02 0.11 0.34 0.15 | -0.01 0.11 -0.05 0.01
23 | 0.04 0.08 -0.56 -0.15 | 0.00 0.08 0.11 0.06 | 0.00 0.07 -0.32 -0.08
24 | 0.07 0.12 0.26 0.15 | 0.00 0.10 0.37 0.16 | 0.01 0.10 0.34 0.15
25 | 0.01 0.05 0.27 0.11 | -0.02 0.05 0.37 0.13 | -0.02 0.05 0.33 0.12
26 | -0.01 0.10 0.33 0.14 | -0.02 0.10 0.38 0.15 | -0.02 0.09 0.35 0.14
27 | 0.01 0.08 0.20 0.10 | -0.03 0.07 0.31 0.12 | -0.03 0.07 0.26 0.10
Mean | 0.01 0.08 -0.14 -0.01 | -0.02 0.08 0.26 0.10 | -0.02 0.08 0.03 0.03
Appendix Table 2: RMSD for Step Difficulty Values of Open-Ended Item #1 (listed by form)

Form | Method 1: b1 b2 b3 b-bar | Method 2: b1 b2 b3 b-bar | Method 3: b1 b2 b3 b-bar
1 | 0.04 0.08 0.37 0.13 | 0.05 0.07 0.42 0.14 | 0.05 0.07 0.40 0.13
2 | 0.05 0.11 0.47 0.16 | 0.06 0.11 0.48 0.16 | 0.06 0.11 0.48 0.16
3 | 0.05 0.14 0.66 0.23 | 0.04 0.14 0.77 0.27 | 0.04 0.14 0.63 0.22
4 | 0.06 0.16 0.58 0.19 | 0.06 0.16 0.63 0.21 | 0.06 0.16 0.59 0.20
5 | 0.07 0.10 0.41 0.14 | 0.08 0.10 0.42 0.14 | 0.08 0.10 0.42 0.14
6 | 0.05 0.11 0.58 0.20 | 0.04 0.11 0.70 0.24 | 0.04 0.11 0.62 0.21
7 | 0.05 0.12 2.95 0.97 | 0.06 0.12 0.99 0.32 | 0.05 0.13 2.20 0.72
8 | 0.06 0.11 0.65 0.22 | 0.07 0.11 0.71 0.24 | 0.07 0.11 0.57 0.19
9 | 0.05 0.12 0.43 0.16 | 0.05 0.12 0.49 0.17 | 0.05 0.12 0.48 0.17
10 | 0.05 0.22 2.20 0.68 | 0.05 0.23 0.63 0.17 | 0.05 0.23 1.55 0.47
11 | 0.05 0.11 0.35 0.13 | 0.05 0.10 0.40 0.14 | 0.05 0.10 0.38 0.14
12 | 0.06 0.12 0.77 0.26 | 0.05 0.12 0.78 0.27 | 0.04 0.12 0.67 0.23
13 | 0.06 0.12 0.63 0.21 | 0.05 0.12 0.62 0.21 | 0.05 0.12 0.50 0.17
14 | 0.05 0.12 0.54 0.19 | 0.06 0.11 0.60 0.21 | 0.06 0.11 0.57 0.20
15 | 0.05 0.11 0.88 0.29 | 0.04 0.12 0.83 0.29 | 0.04 0.12 0.73 0.25
16 | 0.07 0.14 0.58 0.20 | 0.07 0.14 0.63 0.22 | 0.07 0.14 0.61 0.21
17 | 0.07 0.12 1.08 0.35 | 0.04 0.13 0.90 0.29 | 0.04 0.12 0.68 0.22
18 | 0.07 0.15 1.46 0.44 | 0.04 0.16 0.67 0.24 | 0.04 0.16 0.89 0.27
19 | 0.05 0.12 0.33 0.13 | 0.06 0.10 0.36 0.12 | 0.06 0.10 0.34 0.12
20 | 0.08 0.16 0.50 0.21 | 0.05 0.15 0.62 0.24 | 0.05 0.15 0.56 0.22
21 | 0.06 0.15 0.99 0.31 | 0.04 0.14 0.58 0.21 | 0.04 0.14 0.67 0.22
22 | 0.04 0.18 1.04 0.34 | 0.04 0.17 0.65 0.24 | 0.04 0.17 0.69 0.23
23 | 0.06 0.13 1.05 0.34 | 0.04 0.13 0.62 0.20 | 0.04 0.13 0.74 0.24
24 | 0.08 0.15 0.43 0.18 | 0.05 0.13 0.51 0.19 | 0.05 0.13 0.48 0.18
25 | 0.05 0.12 0.55 0.18 | 0.04 0.12 0.59 0.20 | 0.04 0.12 0.57 0.19
26 | 0.05 0.15 0.67 0.24 | 0.05 0.15 0.72 0.25 | 0.05 0.15 0.69 0.24
27 | 0.05 0.11 0.51 0.18 | 0.05 0.10 0.59 0.20 | 0.05 0.10 0.54 0.18
Mean | 0.06 0.13 0.80 0.27 | 0.05 0.13 0.63 0.21 | 0.05 0.13 0.68 0.23
Appendix Table 3: BIAS for Step Difficulty Values of Open-Ended Item #2 (listed by form)

Form | Method 1: b1 b2 b3 b-bar | Method 2: b1 b2 b3 b-bar | Method 3: b1 b2 b3 b-bar
1 | 0.01 0.06 -1.74 -0.56 | -0.01 0.08 -0.67 -0.20 | -0.01 0.08 -0.84 -0.26
2 | 0.00 0.04 -1.44 -0.46 | -0.01 0.05 -0.52 -0.16 | 0.00 0.05 -0.70 -0.22
3 | 0.02 0.06 -1.75 -0.56 | -0.02 0.08 -0.36 -0.10 | -0.01 0.07 -0.70 -0.21
4 | -0.01 0.04 -0.07 -0.01 | -0.03 0.03 0.13 0.04 | -0.03 0.03 0.06 0.02
5 | -0.03 0.03 0.09 0.03 | -0.04 0.02 0.28 0.09 | -0.04 0.02 0.24 0.07
6 | 0.00 0.08 -0.14 -0.02 | -0.04 0.07 0.11 0.05 | -0.04 0.07 0.07 0.04
7 | 0.05 0.12 -0.07 0.03 | 0.01 0.13 0.46 0.20 | 0.02 0.13 0.22 0.12
8 | 0.02 0.02 -0.14 -0.03 | -0.01 0.05 0.31 0.12 | -0.01 0.04 0.26 0.10
9 | 0.01 0.09 0.07 0.06 | -0.02 0.11 0.33 0.14 | -0.02 0.11 0.27 0.12
10 | 0.01 0.07 -1.15 -0.36 | -0.02 0.06 -0.06 -0.01 | -0.01 0.06 -0.22 -0.06
11 | 0.00 0.03 -0.72 -0.23 | -0.02 0.04 -0.11 -0.03 | -0.02 0.03 -0.24 -0.07
12 | 0.01 0.05 -1.26 -0.40 | -0.03 0.06 -0.01 0.01 | -0.03 0.06 -0.30 -0.09
13 | 0.02 0.05 0.15 0.07 | -0.01 0.05 0.25 0.10 | 0.00 0.06 0.22 0.09
14 | 0.03 0.11 0.12 0.09 | -0.01 0.11 0.26 0.12 | -0.01 0.11 0.22 0.11
15 | 0.03 0.09 0.26 0.13 | -0.01 0.09 0.44 0.17 | -0.01 0.09 0.37 0.15
16 | 0.01 0.05 -1.26 -0.40 | -0.02 0.06 -0.11 -0.02 | -0.02 0.06 -0.32 -0.09
17 | 0.05 0.05 -1.18 -0.36 | 0.00 0.06 -0.15 -0.03 | 0.00 0.06 -0.32 -0.09
18 | 0.07 0.08 -1.78 -0.54 | 0.00 0.11 0.02 0.04 | 0.01 0.11 -0.40 -0.09
19 | 0.06 0.07 -1.94 -0.60 | 0.04 0.10 -0.35 -0.07 | 0.04 0.09 -0.69 -0.19
20 | 0.07 0.10 -2.13 -0.65 | 0.03 0.15 -0.34 -0.05 | 0.03 0.14 -0.78 -0.20
21 | 0.04 0.11 -2.34 -0.73 | 0.01 0.14 -0.33 -0.06 | 0.01 0.14 -0.75 -0.20
22 | 0.01 0.08 -0.20 -0.03 | -0.01 0.07 0.13 0.06 | -0.01 0.07 0.05 0.04
23 | 0.03 0.07 0.06 0.05 | -0.01 0.06 0.36 0.13 | -0.01 0.06 0.25 0.10
24 | 0.04 0.08 -0.02 0.03 | -0.01 0.07 0.28 0.11 | -0.01 0.07 0.20 0.09
25 | 0.02 0.07 -0.14 -0.02 | -0.01 0.08 0.37 0.15 | -0.01 0.08 0.24 0.10
26 | 0.00 0.05 0.20 0.08 | -0.01 0.05 0.32 0.12 | -0.01 0.05 0.27 0.10
27 | 0.04 0.10 0.06 0.07 | 0.00 0.11 0.52 0.21 | 0.00 0.10 0.39 0.16
Mean | 0.02 0.07 -0.68 -0.20 | -0.01 0.08 0.06 0.04 | -0.01 0.08 -0.11 -0.01
Appendix Table 4: RMSD for Step Difficulty Values of Open-Ended Item #2 (listed by form)

Form | Method 1: b1 b2 b3 b-bar | Method 2: b1 b2 b3 b-bar | Method 3: b1 b2 b3 b-bar
1 | 0.04 0.19 2.13 0.70 | 0.04 0.20 0.89 0.28 | 0.04 0.20 0.97 0.30
2 | 0.04 0.17 1.94 0.64 | 0.04 0.17 0.69 0.22 | 0.04 0.17 0.82 0.26
3 | 0.05 0.17 2.24 0.73 | 0.04 0.17 0.67 0.21 | 0.04 0.17 0.84 0.26
4 | 0.04 0.12 0.71 0.22 | 0.05 0.12 0.66 0.21 | 0.05 0.12 0.58 0.18
5 | 0.05 0.12 0.83 0.26 | 0.06 0.13 0.73 0.23 | 0.06 0.13 0.69 0.22
6 | 0.04 0.14 0.71 0.24 | 0.05 0.14 0.54 0.19 | 0.05 0.14 0.52 0.18
7 | 0.07 0.21 0.81 0.27 | 0.05 0.21 0.87 0.31 | 0.05 0.22 0.62 0.23
8 | 0.05 0.14 0.87 0.29 | 0.04 0.15 0.69 0.23 | 0.04 0.15 0.65 0.22
9 | 0.05 0.19 0.74 0.24 | 0.05 0.20 0.72 0.25 | 0.05 0.20 0.66 0.22
10 | 0.04 0.15 1.76 0.57 | 0.05 0.15 0.53 0.16 | 0.04 0.15 0.46 0.14
11 | 0.04 0.10 1.29 0.42 | 0.05 0.10 0.58 0.18 | 0.05 0.10 0.54 0.17
12 | 0.05 0.13 1.83 0.59 | 0.04 0.13 0.51 0.17 | 0.04 0.13 0.52 0.17
13 | 0.06 0.12 0.60 0.19 | 0.05 0.12 0.61 0.20 | 0.05 0.12 0.60 0.20
14 | 0.06 0.16 0.63 0.22 | 0.05 0.16 0.65 0.23 | 0.05 0.16 0.64 0.22
15 | 0.06 0.15 0.63 0.22 | 0.05 0.15 0.75 0.26 | 0.05 0.15 0.69 0.24
16 | 0.04 0.15 1.92 0.61 | 0.04 0.16 0.49 0.14 | 0.04 0.16 0.58 0.17
17 | 0.07 0.14 1.72 0.55 | 0.05 0.15 0.88 0.28 | 0.05 0.15 0.54 0.16
18 | 0.08 0.16 2.22 0.70 | 0.05 0.17 0.53 0.18 | 0.04 0.17 0.56 0.16
19 | 0.08 0.19 2.43 0.78 | 0.06 0.20 0.59 0.17 | 0.06 0.20 0.79 0.23
20 | 0.09 0.23 2.56 0.81 | 0.06 0.25 0.59 0.17 | 0.06 0.25 0.89 0.24
21 | 0.06 0.24 2.71 0.87 | 0.06 0.25 0.53 0.15 | 0.06 0.25 0.85 0.24
22 | 0.05 0.13 0.79 0.25 | 0.04 0.12 0.57 0.19 | 0.04 0.12 0.49 0.16
23 | 0.05 0.12 0.74 0.24 | 0.05 0.12 0.70 0.23 | 0.04 0.12 0.62 0.20
24 | 0.06 0.12 0.66 0.22 | 0.04 0.12 0.60 0.22 | 0.04 0.12 0.56 0.20
25 | 0.06 0.19 0.99 0.32 | 0.05 0.20 0.74 0.25 | 0.05 0.20 0.62 0.21
26 | 0.05 0.14 0.69 0.23 | 0.05 0.14 0.66 0.22 | 0.05 0.14 0.63 0.21
27 | 0.06 0.18 0.90 0.30 | 0.04 0.18 0.74 0.27 | 0.04 0.18 0.64 0.23
Mean | 0.05 0.16 1.34 0.43 | 0.05 0.16 0.66 0.21 | 0.05 0.16 0.65 0.21
Appendix Table 5: BIAS for Step Difficulty Values of Open-Ended Item #3 (listed by form)

Form | Method 1: b1 b2 b3 b-bar | Method 2: b1 b2 b3 b-bar | Method 3: b1 b2 b3 b-bar
1 | 0.02 0.09 -0.18 -0.02 | 0.00 0.10 0.29 0.13 | 0.00 0.10 0.32 0.14
2 | 0.01 0.08 -0.32 -0.08 | -0.01 0.08 0.23 0.10 | -0.01 0.08 0.25 0.11
3 | 0.04 0.11 -2.50 -0.78 | 0.01 0.17 -0.71 -0.18 | 0.02 0.16 -0.71 -0.18
4 | 0.00 0.05 -1.00 -0.32 | -0.02 0.06 -0.04 0.00 | -0.02 0.05 -0.01 0.01
5 | -0.03 0.07 -0.48 -0.15 | -0.03 0.07 0.09 0.04 | -0.03 0.07 0.12 0.05
6 | 0.02 0.03 -0.86 -0.27 | -0.01 0.05 0.15 0.06 | -0.01 0.04 0.14 0.06
7 | 0.05 0.09 -1.22 -0.36 | 0.00 0.10 -0.08 0.01 | 0.01 0.11 -0.16 -0.01
8 | 0.02 0.12 -0.88 -0.25 | -0.01 0.15 -0.10 0.01 | -0.01 0.15 0.10 0.08
9 | 0.04 0.08 0.01 0.04 | 0.01 0.11 0.39 0.17 | 0.01 0.10 0.36 0.16
10 | 0.02 0.10 -1.25 -0.38 | -0.01 0.10 -0.23 -0.05 | 0.00 0.10 -0.22 -0.04
11 | 0.02 0.04 -0.12 -0.02 | 0.00 0.06 0.37 0.14 | 0.00 0.05 0.36 0.14
12 | 0.02 0.09 -1.74 -0.54 | -0.01 0.12 -0.41 -0.10 | -0.01 0.11 -0.40 -0.10
13 | 0.02 0.10 -0.94 -0.27 | 0.00 0.12 -0.35 -0.08 | 0.00 0.12 -0.39 -0.09
14 | 0.04 0.08 -2.06 -0.64 | 0.01 0.13 -0.71 -0.19 | 0.01 0.12 -0.56 -0.14
15 | 0.03 0.07 -2.38 -0.76 | 0.00 0.16 -0.75 -0.20 | 0.01 0.14 -0.70 -0.18
16 | 0.02 0.12 0.15 0.10 | -0.01 0.13 0.37 0.16 | -0.01 0.13 0.37 0.16
17 | 0.05 0.14 -2.86 -0.89 | 0.01 0.19 -1.14 -0.31 | 0.01 0.19 -1.14 -0.31
18 | 0.06 0.10 -0.92 -0.25 | 0.00 0.13 0.20 0.11 | 0.01 0.13 0.12 0.09
19 | 0.04 0.09 -3.15 -1.01 | 0.02 0.12 -1.53 -0.46 | 0.02 0.11 -1.46 -0.44
20 | 0.08 0.21 -5.37 -1.69 | 0.06 0.32 -3.05 -0.89 | 0.06 0.29 -3.14 -0.93
21 | 0.05 0.12 -2.90 -0.91 | 0.02 0.17 -1.12 -0.31 | 0.03 0.15 -1.07 -0.30
22 | 0.02 0.16 -3.25 -1.02 | 0.01 0.22 -1.47 -0.41 | 0.01 0.20 -1.34 -0.37
23 | 0.04 0.15 -4.52 -1.44 | 0.02 0.23 -2.39 -0.71 | 0.02 0.20 -2.25 -0.67
24 | 0.08 0.07 -4.99 -1.61 | 0.06 0.23 -2.61 -0.77 | 0.06 0.19 -2.60 -0.79
25 | 0.02 0.07 -2.06 -0.66 | -0.01 0.09 -0.81 -0.24 | -0.01 0.08 -0.67 -0.20
26 | 0.03 0.09 -2.47 -0.79 | 0.02 0.11 -1.38 -0.41 | 0.02 0.11 -1.28 -0.38
27 | 0.05 0.23 -4.71 -1.48 | 0.02 0.30 -2.69 -0.79 | 0.02 0.28 -2.37 -0.69
Mean | 0.03 0.10 -1.96 -0.61 | 0.01 0.14 -0.72 -0.19 | 0.01 0.13 -0.68 -0.18
Appendix Table 6: RMSD for Step Difficulty Values of Open-Ended Item #3 (listed by form)

Form | Method 1: b1 b2 b3 b-bar | Method 2: b1 b2 b3 b-bar | Method 3: b1 b2 b3 b-bar
1 | 0.06 0.17 0.82 0.26 | 0.05 0.18 0.67 0.23 | 0.05 0.18 0.69 0.24
2 | 0.05 0.17 0.98 0.31 | 0.05 0.17 0.60 0.21 | 0.05 0.17 0.59 0.21
3 | 0.07 0.25 2.72 0.85 | 0.05 0.29 0.96 0.27 | 0.06 0.28 0.78 0.19
4 | 0.06 0.13 1.46 0.47 | 0.06 0.14 0.41 0.13 | 0.06 0.14 0.29 0.10
5 | 0.05 0.15 1.08 0.35 | 0.06 0.15 0.36 0.13 | 0.06 0.15 0.35 0.13
6 | 0.06 0.13 1.36 0.43 | 0.06 0.14 0.50 0.17 | 0.06 0.14 0.39 0.14
7 | 0.07 0.16 1.61 0.50 | 0.05 0.17 0.63 0.20 | 0.05 0.17 0.45 0.14
8 | 0.05 0.21 1.37 0.43 | 0.05 0.24 0.86 0.28 | 0.05 0.23 0.44 0.17
9 | 0.07 0.17 0.68 0.23 | 0.05 0.19 0.71 0.25 | 0.05 0.18 0.69 0.24
10 | 0.05 0.17 1.64 0.52 | 0.05 0.17 0.44 0.13 | 0.04 0.17 0.34 0.09
11 | 0.05 0.14 0.82 0.27 | 0.04 0.14 0.72 0.24 | 0.04 0.14 0.65 0.22
12 | 0.07 0.19 2.04 0.65 | 0.06 0.20 0.67 0.20 | 0.06 0.20 0.51 0.14
13 | 0.05 0.20 1.30 0.41 | 0.05 0.21 0.58 0.17 | 0.05 0.21 0.56 0.16
14 | 0.07 0.19 2.37 0.74 | 0.05 0.22 0.89 0.25 | 0.05 0.22 0.65 0.17
15 | 0.07 0.26 2.60 0.83 | 0.06 0.31 0.95 0.27 | 0.06 0.29 0.75 0.20
16 | 0.06 0.18 0.68 0.25 | 0.05 0.19 0.71 0.26 | 0.05 0.19 0.73 0.26
17 | 0.06 0.26 3.07 0.96 | 0.04 0.29 1.37 0.40 | 0.04 0.29 1.18 0.32
18 | 0.08 0.18 1.34 0.41 | 0.05 0.20 0.57 0.21 | 0.05 0.20 0.42 0.16
19 | 0.07 0.24 3.34 1.07 | 0.06 0.26 1.62 0.50 | 0.06 0.25 1.49 0.45
20 | 0.11 0.41 5.43 1.71 | 0.10 0.47 3.14 0.92 | 0.10 0.46 3.16 0.93
21 | 0.07 0.26 3.07 0.97 | 0.06 0.30 1.21 0.34 | 0.06 0.28 1.10 0.30
22 | 0.06 0.32 3.42 1.08 | 0.06 0.37 1.59 0.44 | 0.05 0.35 1.38 0.38
23 | 0.06 0.32 4.56 1.45 | 0.05 0.37 2.48 0.74 | 0.05 0.35 2.26 0.68
24 | 0.11 0.44 5.05 1.63 | 0.10 0.50 2.69 0.79 | 0.10 0.47 2.64 0.79
25 | 0.05 0.20 2.32 0.75 | 0.04 0.21 0.98 0.31 | 0.04 0.20 0.72 0.22
26 | 0.07 0.28 2.72 0.87 | 0.07 0.28 1.47 0.45 | 0.07 0.28 1.33 0.40
27 | 0.06 0.33 4.73 1.48 | 0.05 0.39 2.76 0.81 | 0.04 0.37 2.38 0.69
Mean | 0.07 0.23 2.32 0.74 | 0.06 0.25 1.13 0.34 | 0.06 0.24 1.00 0.30
Appendix Table 7: BIAS for Step Difficulty Values of Essay Item (listed by form)

Form | Method 1: b1 b2 b3 b4 b-bar | Method 2: b1 b2 b3 b4 b-bar | Method 3: b1 b2 b3 b4 b-bar
1 | -0.02 0.02 0.08 -1.60 -0.38 | -0.03 0.04 0.17 0.17 0.09 | -0.03 0.04 0.16 0.18 0.09
2 | -0.04 0.02 0.08 -0.05 0.00 | -0.06 0.01 0.08 0.39 0.11 | -0.05 0.01 0.08 0.38 0.11
3 | 0.01 0.05 0.05 -1.93 -0.45 | -0.01 0.08 0.17 0.28 0.13 | 0.00 0.09 0.16 0.21 0.11
4 | -0.04 0.02 0.04 -1.31 -0.32 | -0.04 0.03 0.11 0.30 0.10 | -0.04 0.03 0.11 0.30 0.10
5 | -0.06 0.02 0.06 -0.11 -0.03 | -0.07 0.01 0.07 0.30 0.08 | -0.07 0.01 0.07 0.31 0.08
6 | -0.02 0.00 -0.02 -3.23 -0.82 | -0.03 0.06 0.15 -0.16 0.01 | -0.03 0.06 0.15 -0.18 0.00
7 | 0.00 0.02 0.03 -3.38 -0.83 | -0.03 0.04 0.18 -0.29 -0.02 | -0.02 0.06 0.18 -0.43 -0.05
8 | -0.03 -0.02 0.00 -3.13 -0.79 | -0.03 0.05 0.18 -0.20 0.00 | -0.03 0.04 0.17 -0.13 0.01
9 | -0.05 -0.02 -0.02 -4.20 -1.07 | -0.04 0.07 0.22 -0.67 -0.10 | -0.04 0.07 0.20 -0.53 -0.07
10 | -0.02 0.06 0.08 -0.16 -0.01 | -0.05 0.03 0.07 0.33 0.10 | -0.04 0.04 0.08 0.28 0.09
11 | -0.04 0.01 0.06 -2.13 -0.53 | -0.04 0.03 0.16 -0.05 0.02 | -0.04 0.03 0.15 0.04 0.05
12 | 0.00 0.03 0.05 -2.47 -0.60 | -0.02 0.06 0.20 0.19 0.11 | -0.01 0.07 0.19 0.10 0.08
13 | -0.03 0.01 0.05 -2.29 -0.57 | -0.04 0.04 0.18 -0.08 0.02 | -0.04 0.04 0.17 -0.05 0.03
14 | -0.03 -0.01 -0.02 -3.78 -0.96 | -0.04 0.05 0.17 -0.58 -0.10 | -0.04 0.05 0.16 -0.59 -0.11
15 | -0.04 0.01 0.02 -4.07 -1.02 | -0.06 0.05 0.19 -0.67 -0.12 | -0.06 0.06 0.18 -0.55 -0.09
16 | -0.04 -0.01 -0.01 -3.32 -0.84 | -0.05 0.03 0.14 -0.47 -0.09 | -0.05 0.03 0.13 -0.36 -0.06
17 | -0.02 0.02 0.00 -3.79 -0.95 | -0.05 0.05 0.16 -0.42 -0.07 | -0.04 0.05 0.15 -0.50 -0.08
18 | -0.01 -0.03 -0.08 -6.67 -1.69 | -0.04 0.05 0.25 -1.22 -0.24 | -0.03 0.06 0.22 -1.32 -0.27
19 | -0.02 0.04 0.10 -0.95 -0.21 | -0.05 0.03 0.13 0.23 0.08 | -0.05 0.03 0.13 0.25 0.09
20 | 0.01 0.06 0.06 -2.82 -0.67 | -0.03 0.06 0.16 -0.17 0.01 | -0.02 0.07 0.16 -0.22 0.00
21 | 0.01 0.06 0.10 -0.68 -0.13 | -0.03 0.04 0.11 0.25 0.09 | -0.03 0.04 0.11 0.19 0.08
22 | 0.00 0.07 0.11 0.10 0.07 | -0.02 0.05 0.11 0.50 0.16 | -0.01 0.06 0.12 0.44 0.15
23 | 0.01 0.03 0.05 -1.78 -0.42 | -0.02 0.04 0.13 0.03 0.04 | -0.01 0.04 0.12 0.01 0.04
24 | 0.01 0.00 0.00 -5.17 -1.29 | -0.01 0.07 0.23 -0.67 -0.09 | -0.01 0.07 0.22 -0.66 -0.09
25 | -0.01 0.03 0.04 -2.20 -0.54 | -0.02 0.07 0.18 -0.23 0.00 | -0.02 0.07 0.17 -0.10 0.03
26 | -0.05 0.04 0.10 0.02 0.03 | -0.06 0.04 0.10 0.34 0.11 | -0.06 0.04 0.10 0.36 0.11
27 | 0.00 0.04 0.03 -2.14 -0.52 | -0.01 0.07 0.16 -0.04 0.05 | -0.02 0.07 0.15 0.00 0.05
Mean | -0.02 0.02 0.04 -2.34 -0.58 | -0.04 0.05 0.15 -0.10 0.02 | -0.03 0.05 0.15 -0.10 0.02
Appendix Table 8: RMSD for Step Difficulty Values of Essay Item (listed by form)

Form | Method 1: b1 b2 b3 b4 b-bar | Method 2: b1 b2 b3 b4 b-bar | Method 3: b1 b2 b3 b4 b-bar
1 | 0.05 0.07 0.20 2.89 0.74 | 0.05 0.06 0.22 0.54 0.15 | 0.05 0.06 0.22 0.48 0.14
2 | 0.07 0.06 0.12 1.31 0.34 | 0.07 0.05 0.12 0.69 0.18 | 0.07 0.05 0.12 0.70 0.18
3 | 0.06 0.11 0.20 3.16 0.81 | 0.06 0.11 0.25 0.64 0.19 | 0.06 0.12 0.24 0.51 0.16
4 | 0.06 0.07 0.17 2.61 0.68 | 0.06 0.07 0.16 0.61 0.16 | 0.06 0.07 0.16 0.58 0.16
5 | 0.08 0.05 0.12 1.22 0.31 | 0.08 0.05 0.12 0.61 0.15 | 0.08 0.05 0.12 0.62 0.16
6 | 0.05 0.09 0.21 4.17 1.08 | 0.05 0.08 0.24 0.53 0.11 | 0.05 0.08 0.23 0.41 0.09
7 | 0.04 0.07 0.24 4.27 1.09 | 0.05 0.07 0.26 0.63 0.13 | 0.04 0.08 0.26 0.59 0.10
8 | 0.06 0.09 0.27 4.11 1.07 | 0.06 0.08 0.27 0.63 0.14 | 0.06 0.08 0.27 0.43 0.10
9 | 0.07 0.11 0.28 4.94 1.28 | 0.06 0.10 0.33 0.85 0.16 | 0.06 0.10 0.32 0.63 0.11
10 | 0.05 0.09 0.13 1.20 0.31 | 0.07 0.07 0.12 0.70 0.18 | 0.06 0.08 0.13 0.66 0.18
11 | 0.06 0.06 0.22 3.35 0.87 | 0.07 0.06 0.23 0.53 0.13 | 0.07 0.06 0.23 0.36 0.10
12 | 0.05 0.09 0.26 3.62 0.94 | 0.05 0.09 0.28 0.53 0.16 | 0.05 0.09 0.27 0.43 0.13
13 | 0.05 0.09 0.30 3.51 0.90 | 0.06 0.07 0.33 0.50 0.12 | 0.05 0.07 0.33 0.46 0.10
14 | 0.05 0.08 0.28 4.57 1.19 | 0.05 0.07 0.27 0.74 0.15 | 0.05 0.07 0.26 0.74 0.15
15 | 0.06 0.06 0.25 4.77 1.22 | 0.07 0.07 0.28 0.87 0.18 | 0.07 0.07 0.26 0.64 0.13
16 | 0.06 0.07 0.21 4.22 1.09 | 0.07 0.06 0.22 0.67 0.15 | 0.07 0.06 0.22 0.50 0.11
17 | 0.05 0.07 0.24 4.60 1.17 | 0.07 0.08 0.26 0.71 0.15 | 0.06 0.08 0.25 0.59 0.11
18 | 0.04 0.07 0.29 6.80 1.73 | 0.06 0.08 0.35 1.38 0.28 | 0.05 0.08 0.33 1.36 0.27
19 | 0.06 0.07 0.16 2.19 0.55 | 0.07 0.06 0.17 0.60 0.15 | 0.07 0.06 0.17 0.59 0.16
20 | 0.05 0.09 0.20 3.76 0.94 | 0.06 0.08 0.22 0.54 0.13 | 0.05 0.09 0.22 0.47 0.10
21 | 0.04 0.08 0.18 1.94 0.49 | 0.05 0.07 0.17 0.58 0.16 | 0.05 0.07 0.17 0.54 0.14
22 | 0.05 0.09 0.18 1.20 0.33 | 0.05 0.08 0.17 0.83 0.24 | 0.05 0.08 0.17 0.78 0.22
23 | 0.05 0.08 0.22 2.96 0.77 | 0.05 0.08 0.20 0.55 0.13 | 0.05 0.08 0.19 0.48 0.12
24 | 0.04 0.07 0.22 5.55 1.40 | 0.04 0.09 0.30 0.81 0.15 | 0.04 0.09 0.29 0.75 0.12
25 | 0.06 0.11 0.31 3.44 0.90 | 0.05 0.11 0.31 0.70 0.13 | 0.06 0.11 0.30 0.45 0.08
26 | 0.07 0.07 0.17 1.26 0.32 | 0.07 0.07 0.17 0.73 0.19 | 0.07 0.07 0.17 0.75 0.20
27 | 0.05 0.10 0.31 3.35 0.87 | 0.04 0.10 0.31 0.58 0.12 | 0.04 0.10 0.30 0.52 0.11
Mean | 0.05 0.08 0.22 3.37 0.87 | 0.06 0.08 0.23 0.68 0.16 | 0.06 0.08 0.23 0.59 0.14