Modeling Guessing Properties of Multiple-Choice Items in
the Measurement of Political Knowledge∗
Tsung-han Tsai† Chang-chih Lin‡
This Draft: December 18, 2014
Abstract
Due to the crucial role of political knowledge in political participation, the measurement of
political knowledge has been a major concern in political science. The conventional measure
of political knowledge is to sum a respondent’s correct answers on a battery of questions,
including open-ended and multiple-choice items. In this paper, we propose a Bayesian IRT
guessing model that accommodates different types of guessing behaviors for multiple-choice
items and address the issue regarding the difference between incorrect responses and DKs in
both closed-ended and open-ended items. The proposed model successfully describes guessing
behaviors based on respondents’ levels of political knowledge and item characteristics in
Taiwan.
keywords: political knowledge, guessing behavior, don’t know response, measurement, item
response theory, Bayesian methods
∗Paper prepared for presentation at the 2015 Asian Political Methodology Conference in Taipei, Taiwan, January 9-10, 2015. The previous version of this manuscript was presented at the 2014 Annual Meeting of the American Political Science Association in Washington, D.C., August 28-31, 2014.
†Assistant Professor of Political Science, National Chengchi University, Taipei, Taiwan ([email protected]).
‡Post-doctoral Researcher at the Election Study Center and Adjunct Assistant Professor of Political Science, National Chengchi University, Taipei, Taiwan ([email protected]).
1 Introduction
Political knowledge is a central construct in theories of democracy, most of which suggest that
knowledgeable citizens are one of the components of a well-functioning democracy. The logic behind
these theories is as follows: the more knowledgeable and informed a citizen is, the more prudent
the decisions s/he makes in democratic processes (see, e.g., Campbell et al., 1960; Galston, 2001;
Lassen, 2005). To test these theories empirically, many efforts have been made to measure political
knowledge in studies of public opinion and political behavior (e.g., Luskin, 1987; Delli Carpini
and Keeter, 1996; Mondak, 1999). The conventional approach to measuring levels of political
knowledge is to ask survey respondents a brief battery of factual questions about
politics and to count the number of correct answers. Common formats include closed-ended
(multiple-choice) items and open-ended identification items.
The selection of item formats involves two related issues: items that are likely to be guessed
correctly and the way to handle “don’t know” (hereafter, DK) responses in the measurement pro-
cedure. On closed-ended items, one primary issue is that respondents who have no knowledge at all
or those who have partial knowledge about the question are able to select an answer from provided
choice options. Failing to distinguish between “lucky guessing” or “blind guessing” (guessing the
correct answer randomly) and “informed guessing” (guessing the correct answer based on partial
knowledge) thus masks real levels of political knowledge. On the other hand, open-ended
questions are usually more difficult than closed-format items for respondents to answer. Uninformed and
partially informed respondents who are uncertain of the answer to the question are likely to say
“don’t know.” This leads to not only a high proportion of DK responses but also poor discrimi-
nation between levels of political knowledge (Mondak, 1999, 2001; Mondak and Davis, 2001).
In this paper, we focus on multiple-choice items and distinguish between lucky guessing and
informed guessing as they appear in the measurement of political knowledge. First, building on the
methods of item response theory (IRT), we propose an IRT model with a “guessing component”
that accommodates the chance of guessing the correct answer for a multiple-choice item based on a
respondent’s level of political knowledge and item difficulties. It is stressed that the proposed two-
parameter logistic (2PL) IRT guessing model identifies items likely to have a guessing component.
Second, although we do not explicitly deal with the possible variation in DK responses, we treat
DKs as “missing values” rather than collapsing DKs and incorrect responses into the same category,
which reflects our ignorance of these DK respondents’ knowledge levels. Finally, a Bayesian
approach is adopted to estimate the proposed two-parameter logistic (2PL) IRT guessing model,
which offers flexibility for complex model specifications and the estimation of missing values along
with model parameters.
The proposed model is applied to survey data collected in 2012 by Taiwan’s
Election and Democratization Study (TEDS) project.1 The results show that the proposed
2PL-IRT guessing model accurately describes the characteristics of the political knowledge items
employed in Taiwan. In particular, the guessing property of these items corresponds to what we
expect for the public’s political knowledge in Taiwan. While the focus of this paper is on the
measurement of political knowledge, the model proposed here applies to the same item format for
measuring other types of knowledge in surveys.
The remainder of this paper proceeds as follows. Section 2 reviews the literature on the
measurement of political knowledge and Section 3 discusses the proposed IRT model. Section
4 introduces the analyzed data and measures and presents the results of analysis. Section 5
concludes.
2 Measurement of Political Knowledge
In most public opinion and survey research in political science, the common procedure to mea-
sure political knowledge is to ask respondents a series of questions about politics.
These questions usually ask about a respondent’s awareness and cognition of “textbook facts,” “current
events,” and “historical facts” (Delli Carpini and Keeter, 1991, 1993).
1Data analyzed in this paper were from Taiwan’s Election and Democratization Studies, 2012: Presidential and Legislative Elections (TEDS2012) (NSC 100-2420-H002-030). The coordinator of the multi-year TEDS project is Professor Chi Huang (National Chengchi University). TEDS2012 is a yearly project on the 2012 presidential and legislative elections. The principal investigator is Professor Yun-han Chu. More information is available on the TEDS website (http://www.tedsnet.org). The authors appreciate the assistance of the aforementioned institute and individuals in providing data. The authors alone are responsible for the views expressed herein.
The responses to
political knowledge questions fall into three categories: correct answers, incorrect answers, and
don’t knows (DKs). Conventionally, DKs are treated as incorrect responses and then the levels
of knowledge are constructed by counting the number of correct answers. The procedure for the
measurement of political knowledge involves two related issues. One is that item formats are
closely associated with guessing. The other is about the way to deal with DKs.
Common formats for knowledge items in survey research mainly include open-ended, closed-
ended, and true-false (yes-no) items (Mondak, 2001; Mondak and Canache, 2004). Items with
different formats, in general, impose different demands on the respondents (Tourangeau, Rips
and Rasinski, 2000, p. 35-38). First, on open-ended items, respondents are asked to supply
answers about political events and facts without any other information or hints. Second, closed-
ended questions, usually shown as two- to four-category multiple-choice items, list some choice
options and ask respondents to choose the correct one among the options. Finally, on true-false
items, which also can be regarded as two-category multiple-choice items, respondents are asked to
judge the statements true or false.
Different item formats, which involve different opportunities to guess, affect the validity
and reliability of the measurement of political knowledge. Some respondents who have no idea at
all of the correct answer may guess, whereas others tend to choose DK (Mondak, 2001). On closed-
ended items, respondents will have incentives to guess one answer from given options because of
a non-zero probability of hitting upon the right answer, even if they are completely uninformed.
Additionally, among respondents who are somewhat sure, but not certain, that they know the correct
answer, some will choose DK and others will try to guess and offer substantive answers.
Compared with closed-ended items, open-ended ones tend to increase the load on respondents’
working memory and require them to be more knowledgeable to answer correctly,
which makes them relatively immune to blind guessing (Mondak and Davis, 2001). Thus, closed-ended
items usually have higher percentages of correct responses than open-ended ones, which in turn
leads to different knowledge scores (Luskin and Bullock, 2011).
Moreover, it is argued that the measurement of political knowledge is considerably associated
with respondents’ tendency to guess, especially in closed-ended items (Mondak, 2001; Mondak and
Davis, 2001). In general, respondents with identical knowledge levels will receive different scores
if they differ in their tendency to guess the answer. For respondents with a strong tendency
to guess rather than giving DKs, their “observed knowledge” will be overestimated and
usually higher than their “actual knowledge” because they correctly answer some questions just
by chance rather than by their actual knowledge. In other words, knowledge scores not only reflect
actual knowledge but also propensity to guess.
To diminish the contaminating effect of guessing propensity on the measurement of knowledge
level, Delli Carpini and Keeter (1996, p. 305) advocate that the DK option should be encouraged
to increase the validity of measurement in the survey.2 It will somewhat reduce the tendency for
respondents to guess, especially for the completely uninformed or partially informed ones. However,
Mondak (1999, 2001) argues that respondents’ propensity to guess is not completely eliminated by
encouraging DK options, which threatens the validity of knowledge measures. Uninformed respon-
dents who guess will receive overestimated knowledge scores while partially informed respondents
who resist guessing will answer DK, leading to underestimated knowledge. Rather, one should
discourage DK responses in surveys because propensity to guess is eliminated if all respondents
are forced to choose an answer despite their tendency to guess and, thus, knowledge scores reflect
only one systematic factor, actual knowledge levels (Mondak, 2001; Mondak and Davis, 2001).
When DK responses are discouraged, DKs account for only a small proportion of responses
and suggest a firm resistance to guessing. To further eliminate DK responses, one can do a simple post
hoc correction by randomly assigning DKs to the substantive choice categories (Mondak, 1999,
2001). However, some research contends that Mondak’s forced choice format works only if knowl-
edge lies hidden within DK responses in surveys and, more importantly, shows that discouraging
DKs does not reveal a substantially more knowledgeable public (Luskin and Bullock, 2011; Sturgis,
Allum and Smith, 2008). In other words, when survey respondents who initially answer DKs on
2In that case, survey interviewers will give respondents a prompt such as: “Many people don’t know the answers to these questions, so if there are some you don’t know just tell me and we’ll go on.”
political knowledge items are forced to provide a guess, they are unable to provide correct answers
at better than chance rates, which means that they really don’t know.
Although there is disagreement about the practice of randomly assigning DKs to the substantive
choice options, more and more scholars argue that it is inappropriate to simply pool DKs and
incorrect responses together as a single absence-of-knowledge category (Gibson and Caldeira, 2009;
Mondak and Anderson, 2004; Mondak and Davis, 2001; Miller and Orr, 2008). Many respondents
who offer DKs may actually know the answer or may be momentarily unable to recall it in
response to open-ended questions. They may also say they don’t know on closed-ended questions
despite having vague information about correct answers rather than being completely ignorant.
In other words, there may be variation among the levels of political knowledge for respondents
who say they don’t know.
According to the above discussion, there are two issues that need to be addressed when we
measure levels of political knowledge. First, as Mondak (2001) argues, when DKs are not encour-
aged, respondents who do not know or those who partially know the correct answer will guess,
especially for multiple-choice items. That is to say, there are two types of guessing that
need to be identified: blind guessing and (partially) informed guessing. The latter occurs more
frequently on political knowledge questions than the former (Mondak and Davis, 2001). Second,
respondents who say they don’t know may actually have partial knowledge. Although this is
seen more often in open-ended items (Mondak, 2001; Luskin and Bullock, 2011), it may also occur in
multiple-choice items. In the following, we propose a model to address the first issue and leave
the second one for future research.
3 Item Response Theory for Guessing Behaviors
Due to the crucial role of political knowledge in political participation, the measurement of political
knowledge has been a major concern in political science (Delli Carpini and Keeter, 1996; Luskin,
1987, 1990; Mondak, 1999). Typically, political knowledge is assumed to be an unobserved, latent
variable that can be measured by a number of manifest variables or indicators. These indicators
usually are questions asking about a respondent’s awareness of officeholders and political systems. The
conventional measure of political knowledge is to sum a respondent’s correct answers on a battery
of questions. However, this approach, which implicitly assumes the same difficulty level for different
items, is problematic due to the omission of item properties from the measurement procedure.
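As a brief illustration, the conventional additive score simply counts correct answers, weighting every item equally. The response matrix below is hypothetical, with DKs collapsed into incorrect answers as in the conventional treatment:

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = respondents, columns = items;
# np.nan marks a "don't know" (DK) response.
responses = np.array([
    [1, 1, 0, np.nan, 1],
    [1, 0, 0, 0, np.nan],
    [np.nan, 1, 1, 1, 1],
])

# Conventional additive score: DKs scored as incorrect (nansum treats nan as 0).
score_dk_wrong = np.nansum(responses, axis=1)

# Every item carries the same weight, so two respondents with the same count
# receive the same score even if the items they answered differ in difficulty.
print(score_dk_wrong)  # -> [3. 1. 4.]
```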
A solution to the problem is utilizing methods of item response theory, which describe how
changes in trait level relate to changes in the probability of providing a correct answer to an
item (Embretson and Reise, 2000). IRT has received great attention from political scientists for
inference about latent variables (Jackman, 2008) and has been applied to a variety of studies
such as congressional roll-call data analysis (Clinton, Jackman and Rivers, 2004; Jackman, 2001),
levels of democracy (Treier and Jackman, 2008), decisions of the Supreme Court (Martin and
Quinn, 2002), and the invariant property of political knowledge scales (Pietryka and MacIntosh,
2013). In contrast to conventional IRT models, which cannot distinguish between blind guessing
and informed guessing, we propose an IRT guessing model to deal with uninformed and partially
informed guessing effects for multiple-choice items in this paper.
3.1 The IRT Guessing Model
Suppose that, for each item k = 1, · · · , K, respondent i = 1, · · · , N provides a response (yi,k),
which is either correct (yi,k = 1) or incorrect (yi,k = 0). We assume that these items measure
a unidimensional latent variable θi, i.e., political knowledge in our discussion. A two-parameter
logistic (2PL) item response model is
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k) = \Lambda[\beta_k(\theta_i - \alpha_k)] = \frac{\exp[\beta_k(\theta_i - \alpha_k)]}{1 + \exp[\beta_k(\theta_i - \alpha_k)]}, \qquad (1)$$
where Λ(·) denotes the logistic cumulative distribution function (cdf), αk item-difficulty param-
eters, βk item-discrimination parameters, and θi levels of political knowledge (Baker and Kim,
2004; Embretson and Reise, 2000; Lord and Novick, 1968). When βk are fixed to 1, Equation (1)
becomes the Rasch model (Rasch, 1960) of the form:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k) = \Lambda(\theta_i - \alpha_k) = \frac{\exp(\theta_i - \alpha_k)}{1 + \exp(\theta_i - \alpha_k)}. \qquad (2)$$
One key assumption involved in IRT models is the so-called “local independence,” which means that
the responses are conditionally independent across items and subjects given the latent variable
and the item parameters.
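The response probabilities in Equations (1) and (2) can be sketched in a few lines of Python; the function names and parameter values below are illustrative, not part of the original analysis:

```python
import math

def icc_2pl(theta, alpha, beta):
    """2PL model, Eq. (1): probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-beta * (theta - alpha)))

def icc_rasch(theta, alpha):
    """Rasch model, Eq. (2): the 2PL with beta fixed to 1."""
    return icc_2pl(theta, alpha, 1.0)

# A respondent whose knowledge level equals the item's difficulty answers
# correctly with probability 0.5, whatever the discrimination.
print(icc_2pl(0.5, 0.5, 2.0))  # -> 0.5
```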
The 2PL-IRT model of Equation (1) and/or the Rasch model of Equation (2) have been used
to assess the item properties of survey responses (e.g., Delli Carpini and Keeter, 1996; Jackman,
2000b). However, for the measurement of political knowledge, respondents who have no idea at
all or are uncertain of the correct response may guess, especially on multiple-choice items. Some
multiple-choice items are more likely to be guessed than others. Without considering the effects of
guessing, items with correctly guessed responses appear easier than they actually are.
In other words, the values of the estimated item-difficulty parameters are smaller than they
should be. This bias in the estimates of item parameters affects the accuracy of the estimates
of respondents’ latent trait levels. Since difficult items appear easier than they actually are, the
knowledge level of informed respondents who are able to answer difficult items correctly will be
underestimated while that of uninformed and partially informed ones will be overestimated.
For the guessing property of an item, it has been observed that the lower tail of the empirical
item characteristic curve sometimes is asymptotic to a value greater than 0. A three-parameter
logistic (3PL) item response model which describes this asymptotic behavior is presented as follows:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k, c_k) = c_k + (1 - c_k)\Lambda[\beta_k(\theta_i - \alpha_k)] = c_k + (1 - c_k)\left(\frac{\exp[\beta_k(\theta_i - \alpha_k)]}{1 + \exp[\beta_k(\theta_i - \alpha_k)]}\right), \qquad (3)$$
where ck is the asymptotic probability of correct response for θ → −∞ (Birnbaum, 1968). In the
literature, ck is commonly referred to as the “guessing” parameter (Hambleton and Cook, 1977).
The guessing parameter ck indicates the probability of item success for the lowest trait level.
However, some research argues that ck should not be interpreted as a guessing parameter (Lord,
1968, 1970). Instead, ck should be considered as the lower bound for the item characteristic curve.
This becomes clearer when we rearrange Equation (3) as follows:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k, c_k) = \Lambda[\beta_k(\theta_i - \alpha_k)] + \left(1 - \Lambda[\beta_k(\theta_i - \alpha_k)]\right) c_k. \qquad (4)$$
Equation (4) shows that the probability of correct response to item k for person i results from
two components. The first component is the probability that a respondent works on the item to
find the correct answer based on his/her level of latent trait. This component is represented by
the first term on the right-hand side of Equation (4), which is the functional form of a standard
2PL-IRT model of Equation (1). The second component is that a respondent answers the item
correctly by guessing, which is indicated by the second term on the
right in Equation (4). The second term indicates the probability of successful guess for an item,
which is the value of ck moderated by the probability of an incorrect response 1− Λ[βk(θi − αk)].
Furthermore, Equation (4) also shows that the greater the value of the first term, the smaller the
value of the second term and thus the smaller the impact of guessing on the item (Andrich, Marais
and Humphry, 2012, p. 427).
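The equivalence of the 3PL form in Equation (3) and its decomposition in Equation (4) is easy to verify numerically. The following Python sketch uses arbitrary illustrative parameter values:

```python
import math

def logistic(z):
    """Logistic cdf, the Lambda(.) of the text."""
    return 1.0 / (1.0 + math.exp(-z))

def icc_3pl(theta, alpha, beta, c):
    """3PL model, Eq. (3): c + (1 - c) * Lambda[beta * (theta - alpha)]."""
    return c + (1.0 - c) * logistic(beta * (theta - alpha))

def icc_3pl_decomposed(theta, alpha, beta, c):
    """Rearranged form, Eq. (4): an ability term plus a guessing term."""
    p = logistic(beta * (theta - alpha))
    return p + (1.0 - p) * c

# The two forms agree for any parameter values (these are illustrative).
theta, alpha, beta, c = -1.2, 0.3, 1.5, 0.25
assert abs(icc_3pl(theta, alpha, beta, c)
           - icc_3pl_decomposed(theta, alpha, beta, c)) < 1e-12
```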
Considering the second term on the right in Equation (4) as the probability of guessing a
correct answer, some research works on its functional form to allow individual differences and/or
item properties in the guessing component. For example, San Martín, Del Pino and De Boeck
(2006) propose an IRT model with individual differences in the guessing component, which is
formulated as follows:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \gamma_k, b) = \Lambda(\theta_i - \alpha_k) + [1 - \Lambda(\theta_i - \alpha_k)]\,\frac{\exp(b\theta_i + \gamma_k)}{1 + \exp(b\theta_i + \gamma_k)}, \qquad (5)$$
where b is the general weight of the trait level in the guessing component and γk is the “guessing
parameter” corresponding to a respondent with average trait level on the logistic scale.3
We extend the model presented in Equation (5) to include the item-discrimination parameters
βk and replace γk with −αk. Moreover, we allow the weighting
parameter of the trait level to vary across items. As a result, the proposed model, called 2PL-IRT
guessing model, is specified as
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k, b_k) = \Lambda[\beta_k(\theta_i - \alpha_k)] + \left(1 - \Lambda[\beta_k(\theta_i - \alpha_k)]\right)\left(\frac{\exp(b_k\theta_i - \alpha_k)}{1 + (M - 1)\exp(b_k\theta_i - \alpha_k)}\right), \qquad (6)$$
where M is the number of options for a multiple-choice item. Like Equation (5), the model
presented by Equation (6) has two terms contributing to the probability of item success. The first
term contributes to the success probability due to ability and the second term contributes to the
success probability due to guessing.
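A minimal Python sketch of Equation (6) (with illustrative parameter values) confirms the lower asymptote: with b = 0 and α = 0, a four-option item gives a success probability of 1/M to a respondent with a very low trait level:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def icc_guessing(theta, alpha, beta, b, M):
    """Proposed 2PL-IRT guessing model, Eq. (6)."""
    ability = logistic(beta * (theta - alpha))
    e = math.exp(b * theta - alpha)
    guess = e / (1.0 + (M - 1) * e)  # success probability due to guessing
    return ability + (1.0 - ability) * guess

# With b = 0 and alpha = 0, random guessing on a four-option item succeeds
# with probability 1/4 for a nearly uninformed respondent.
print(round(icc_guessing(-30.0, 0.0, 1.0, 0.0, 4), 4))  # -> 0.25
```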
3.2 Properties of the Proposed Model
Some properties of the proposed 2PL-IRT guessing model, compared to the conventional 2PL-
and 3PL-IRT models, are discussed in this subsection. Following the literature on IRT, item
characteristic curves (ICCs) are displayed to describe how changes in trait level relate to changes
in the probability of a specified response. First, the inclusion of M−1 in the guessing component of
Equation (6) indicates that the asymptotic probability of successful random guessing for θ → −∞
is equal to 1/M , given that α = 0 and b = 0. For example, suppose that M = 4, that is, there are
four options for a multiple-choice item. A respondent with the lowest trait level has a probability
of 1/4 to randomly choose a correct response (b = 0) to an item with average difficulty (α = 0),
3Cao and Stokes (2008) work on extensions of the 2PL-IRT model to deal with guessing behaviors. Moreover, some research considers guessing of a correct response, including random guessing and informed guessing, as a function of a person’s trait level relative to the difficulty of an item, rather than as a property of an item, and works on procedures for removing guessing in the estimation of item parameters and latent trait levels (Waller, 1976, 1989; Andrich, Marais and Humphry, 2012; Andrich and Marais, 2014).
which is represented by the gray solid curve in the left panel of Figure 1.
Second, the weighting parameter b indicates how important the level of ability is for the
probability of successful guess for an item. The left panel of Figure 1 shows that, as the value
of b increases, the probability of successful guessing for respondents with the lowest trait level
decreases and the probability of successful guessing becomes more concentrated around the average trait level. In the
context of political knowledge, this property basically reflects the fact that both partially informed
(respondents with average level of knowledge) and uninformed respondents are more likely to
guess than informed ones, but partially informed respondents are more likely to have successful
guessing than uninformed ones. Respondents with low levels of knowledge would make guesses with a
low probability of being correct; their chance of guessing the correct answer is even lower than that
of random guessing because they are highly susceptible to attractive
distractors. Furthermore, the reason why informed respondents have a relatively low probability
of successful guessing is that their high trait levels play the major role in contributing to the
probability of item success. Therefore, partially informed respondents have higher probability of
successful guessing than both uninformed and informed ones.
Third, and related to the second property, respondents with lower trait levels are considerably
affected by the weight of the trait level, b, in terms of the probability of item success. The right
panel of Figure 1 shows the ICCs for different values of b given that α = 0 and β = 1. As can be
seen, the probability of item success for respondents with lower trait level decreases when correct
guessing requires high ability levels, which is reflected by the increase in the value of b. Moreover,
no matter how large b is, the ICCs suggest that the success probability due to guessing always
contributes to the probability of item success for partially informed respondents, compared to the
ICC of a 2PL-IRT model displayed by the black solid curve, which has no guessing component.
In contrast, the success probability due to guessing contributes less to the probability of item success
for informed respondents since respondents with high trait level can answer these items correctly.
Fourth, the gray solid curve in the right panel of Figure 1 is equal to the ICC of a 3PL-IRT
model with c = 0.25. In a conventional 3PL-IRT model, i.e., Equation (3), the lower asymptote
[Figure 1: two panels plot the probability of successful guess (left) and of correct response (right) against θ from -5 to 5, for b = 0, 0.1, 0.5, 1, and 2, with α = 0 and β = 1; the right panel also includes the 2PL IRT curve for comparison.]
Figure 1: The probability of successful guess and that of correct response for four-category multiple-choice items. Item parameters α and β are fixed while b is varied.
[Figure 2: two panels plot the probability of successful guess (left) and of correct response (right) against θ from -5 to 5, for α = -1, 0, 1, and 2, with β = 1 and b = 0.5; the right panel also includes the 2PL IRT curve for comparison.]
Figure 2: The probability of successful guess and that of correct response for four-category multiple-choice items. Item parameters b and β are fixed while α is varied.
is an unknown parameter to be estimated.4 More importantly, it implies that guessing behavior
is an item property that applies to all respondents regardless of differences in item characteristics and
levels of latent trait. In contrast, in the proposed model, the asymptotic probability is determined
4Theoretically, without additional constraints, the parameter c in a 3PL-IRT model of Equation (3) lies in the interval between 0 and 1 regardless of the number of options in a multiple-choice item. Empirically, estimates of c are usually smaller than 1/M (Embretson and Reise, 2000, p. 73), which leaves unexplained why it is lower than the random-guessing probability for respondents with the lowest trait level.
by both item characteristics (M and α) and levels of latent trait (depending on the magnitude of
b), which makes the 3PL-IRT model a special case of the proposed model.
Finally, the proposed model allows not only differences in trait levels but also those in item
difficulty levels in the guessing component. The left panel in Figure 2 shows how changes in
item difficulty levels relate to changes in the probability of successful guess, which suggests that, as
items become more difficult, higher ability is required for respondents to have a higher probability
of successful guess. The corresponding ICCs are shown in the right panel in Figure 2.
3.3 Prior Distributions and Identification
It is well known that item-response theory models suffer from two identification problems: scale
invariance and rotational invariance (see, e.g., Albert, 1992; Johnson and Albert, 1999). The
problem of scale invariance occurs because the metric (location and scale) of the latent trait is only
known up to a linear transformation. Therefore, one must anchor the metric of the latent traits.
Additionally, the problem of rotational invariance refers to the fact that, for the unidimensional
case, multiplying all of the model parameters by −1 would not change the value of the likelihood
function.
We estimate the model presented in Equation (6) by a Bayesian approach, so we complete
the model specification by defining the prior distributions. In the Bayesian context, the use of
informative prior distributions can resolve these two identification problems (Johnson and Albert,
1999; Martin and Quinn, 2002). First, in typical IRT models, latent variables are assumed to be
sampled from a normal distribution with mean 0 and finite variance $\sigma^2_\theta$, that is,
$$\theta_i \sim N(0, \sigma^2_\theta), \quad \text{for } i = 1, \cdots, N. \qquad (7)$$
To solve the problem of scale invariance, we can simply assume that $\sigma^2_\theta = 1$ (Jackman, 2009,
p. 460).
Second, for item parameters βk, αk, and bk from Equation (6), we assume that
$$\beta_k \sim N(1, 2)\, I(\beta_k > 0), \quad \text{for } k = 1, \cdots, K, \qquad (8)$$
$$\alpha_k \sim N(0, \sigma^2_\alpha), \quad \text{for } k = 1, \cdots, K, \qquad (9)$$
$$b_k \sim \text{Gamma}(1, 1), \quad \text{for } k = 1, \cdots, K, \qquad (10)$$
where $I(\cdot)$ denotes an indicator function and $\sigma^2_\alpha$ follows the conjugate inverse-gamma prior
$$\sigma^2_\alpha \sim \text{Inv-Gamma}(0.01, 0.01). \qquad (11)$$
Notice that the rotational invariance problem is solved by restricting the item-discrimination
parameters to be positive. This is because ICCs with positive item-discrimination parameters
assume that respondents with higher ability are more likely to answer test items correctly. This form
of constraint is sometimes known as ‘anchoring’ (Skrondal and Rabe-Hesketh, 2004, p. 66). The
truncated normal prior with mean 1 and variance 2 for β reflects the fact that item-discrimination
parameters usually take values in the interval between 0.5 and 3 (Fox, 2010, p. 21). The hyperprior
Inv-Gamma(0.01, 0.01) for σ2α is set to be non-informative. We assume a gamma prior Gamma(c0, d0)
for the parameter bk since bk is nonnegative, and we choose c0 = d0 = 1 so that a priori the proposed
model approximates the 3PL-IRT model with parameter ck = 0.25. If the estimates of bk differ from 0,
this provides evidence supporting the proposed model against the 3PL-IRT model.
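For illustration, draws from these priors can be simulated as follows. This Python/NumPy sketch is not the JAGS implementation used in the paper; the seed, the number of items, the rejection sampler for the truncated normal, and the underflow guard are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 7  # number of items, matching the TEDS2012 battery

# Truncated normal N(1, 2) restricted to beta_k > 0, via rejection sampling.
draws = rng.normal(1.0, np.sqrt(2.0), size=100 * K)
beta = draws[draws > 0][:K]

# sigma^2_alpha ~ Inv-Gamma(0.01, 0.01): draw Gamma(shape=0.01, rate=0.01),
# i.e. scale = 1/0.01, and invert; the floor guards against numerical
# underflow in this small sketch.
g = max(rng.gamma(0.01, 1.0 / 0.01), 1e-300)
sigma2_alpha = 1.0 / g
alpha = rng.normal(0.0, np.sqrt(sigma2_alpha), size=K)

# b_k ~ Gamma(1, 1) keeps the guessing weights nonnegative.
b = rng.gamma(1.0, 1.0, size=K)

print(beta.size, (beta > 0).all(), (b >= 0).all())
```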
In the estimation process, DKs are treated as missing values rather than incorrect responses.
As discussed above, there may be variation among the levels of political knowledge for respondents
who say they don’t know, and we do not arbitrarily treat these respondents as having
low levels of political knowledge. Instead, by treating DKs as missing values in a Bayesian setup,
we obtain the joint posterior of all random quantities in the model, including the levels of political
knowledge, the item parameters, and the partially observed item responses (DKs) via Bayesian
simulation (Jackman, 2000a,b).
4 Data and Analysis
The dataset we analyze comes from the survey conducted for Taiwan’s 2012 presidential and legislative
elections (TEDS2012) by the Election Study Center of National Chengchi University. This dataset,
collected by face-to-face interviews after the 2012 election, includes three open-ended items and four
multiple-choice items about political affairs to investigate the Taiwanese public’s political knowledge.
These knowledge questions are DK-neutral items. In the survey process, interviewers are instructed
to accept but not to offer the DK option in advance. In other words, interviewers neither encourage
nor discourage DK responses and simply record respondents’ answers, including DKs.
Table 1 shows the seven knowledge items used in our analysis and corresponding distributions
of responses. Based on their distributions, these items can be roughly classified into three groups:
(1) Items 1, 2, and 6; (2) Items 3, 4, and 5; and (3) Item 7. As can be seen, first, Items 1, 2, and
6 are relatively easy for survey respondents to answer. The percentages of correct responses for
these three items are 75.85%, 63.14%, and 87.02%, respectively. Further, comparing Item 1 with
Item 6, we observe a higher percentage of DKs in the open-ended item (Item 1) than in the
multiple-choice one (Item 6), which implies guessing behaviors in Item 6.
Second, the percentages of correct responses for Items 3, 4, and 5 are 28.81%, 34.56%, and
33.68%, respectively. These items are somewhat difficult for the public in Taiwan to answer. In other
words, respondents require a certain level of political knowledge to get these items correct. Table 1
shows that one third of the respondents choose the DK option on the two closed-ended items (Items
4 and 5), which suggests that these respondents likely do not know the answers. Furthermore,
comparing Item 3 with Items 4 and 5, the proportion of DKs is slightly higher in the open-ended
item (Item 3) than in the closed-ended items (Items 4 and 5). This result implies that some
partially informed respondents choose DK on Item 3 but may guess the answers on Items 4 and 5.
Finally, as shown in Table 1, there are only 18.67% of the respondents correctly answer Item
7, which implies that Item 7 is a much harder question among these seven items. This highly
difficult item also results in high percentage (53.89%) of DKs. Notice that incorrect responses are
higher than correct ones, which is the evidence of guessing behaviors. It is reasonable to assume
Table 1: The Percentage of Responses to Knowledge Items in TEDS 2012

Political Knowledge Items                                           Correct  Incorrect  "Don't Know"
1. Who is the current president of the United States?                 75.85       2.14         22.01
2. Who is the current premier of our country?                         63.14       8.87         27.99
3. What institution has the power to interpret the constitution?      28.81      27.76         43.43
4. Which of these persons was the finance minister
   before the recent election?                                        34.56      31.98         33.46
5. What was the current unemployment rate in Taiwan
   as of the end of last year (2011)?                                 33.68      33.30         33.02
6. Which party came in second in seats in the Legislative Yuan?       87.02       3.07          9.91
7. Who is the current Secretary-General of the United Nations?        18.67      27.44         53.89

Note: The first three knowledge items are open-ended while the other four are multiple-choice items.
that some partially informed respondents may choose a wrong answer, since guessing the correct
one requires a higher level of political knowledge.
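The open-ended versus multiple-choice DK gaps discussed above can be checked directly against the percentages in Table 1; the snippet below is a quick recomputation of those comparisons, not part of the original analysis.

```python
# DK percentages from Table 1 (Items 1-3 are open-ended; Items 4-7 are multiple-choice).
dk = {1: 22.01, 2: 27.99, 3: 43.43, 4: 33.46, 5: 33.02, 6: 9.91, 7: 53.89}

# Easy pair: open-ended Item 1 vs. multiple-choice Item 6.
gap_easy = dk[1] - dk[6]

# Harder pair: open-ended Item 3 vs. the average of closed-ended Items 4 and 5.
gap_hard = dk[3] - (dk[4] + dk[5]) / 2

# Both gaps are positive: DKs are more common in the open-ended items,
# consistent with guessing in the multiple-choice items.
print(f"Item 1 vs. 6 DK gap: {gap_easy:.2f} points")
print(f"Item 3 vs. 4/5 DK gap: {gap_hard:.2f} points")
```

Both differences exceed ten percentage points, which is the descriptive pattern the text reads as evidence of guessing in the closed-ended formats.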
4.1 Results of Analysis
We apply the proposed model to the TEDS2012 dataset. The model is estimated via
Markov chain Monte Carlo (MCMC) methods (Albert and Chib, 1993; Fox, 2010) implemented in
JAGS 3.3.0 (Plummer, 2003), called from R version 3.0.2 through the R2jags package (Su and Yajima, 2012).5 Before
we investigate guessing behaviors in the seven knowledge items, we compare the estimates of item
parameters from three models: the 2PL-IRT model represented in Equation (1), the 3PL-IRT
model represented in Equation (2), and the proposed 2PL-IRT guessing model represented in
Equation (5).
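For concreteness, the 2PL and 3PL response functions compared here take the standard logistic forms sketched below. This is generic IRT code, not the authors' implementation; the 2PL-IRT guessing model of Equation (5) differs in that its guessing component depends on the respondent's knowledge level rather than being a constant floor.

```python
import math

def p_2pl(theta, a, b):
    """2PL: probability of a correct response with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """3PL: adds a constant guessing floor c, so the probability tends to c as theta decreases."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

# A respondent far below the item's difficulty still succeeds with probability about c.
# This is why ignoring guessing makes multiple-choice items look easier than they are.
print(p_2pl(-4, 1.5, 1.0))        # essentially zero without a guessing floor
print(p_3pl(-4, 1.5, 1.0, 0.25))  # close to the 0.25 floor
```

The floor effect in the last two lines is the mechanism behind the difficulty comparison in Figure 3: once guessed successes are attributed to the floor rather than to ability, the estimated difficulty of the multiple-choice items rises.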
Figure 3 shows the 90% credible intervals of item-difficulty parameters on the top and item-
5The estimation was performed with three parallel chains of 50,000 iterations each, to be conservative. The first half of the iterations was discarded as a burn-in period, and a thinning interval of 10 was applied, yielding 7,500 retained posterior samples. Convergence diagnostics for the MCMC chains show no evidence of non-convergence.
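The sample bookkeeping in footnote 5 works out as follows: each of the three chains runs 50,000 iterations, the first half is discarded, and thinning keeps every tenth draw. A quick arithmetic check:

```python
chains, iters, thin = 3, 50_000, 10
burn_in = iters // 2                       # first half of each chain discarded
kept_per_chain = (iters - burn_in) // thin # every tenth post-burn-in draw retained
total_kept = chains * kept_per_chain
print(total_kept)  # 7500, matching the sample count reported in footnote 5
```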
Figure 3: The 90% credible intervals of item parameters for the 2PL-IRT, 3PL-IRT, and 2PL-IRT guessing models. The top panels plot item difficulty and the bottom panels item discrimination, for Items 1 through 7.
discrimination parameters on the bottom. As can be seen in Figure 3, the top two panels show that
Items 1, 2, and 6 have lower item-difficulty values, indicating that they are relatively easy
questions, whereas Items 3, 4, 5, and 7 are relatively hard. Notice that, although the three
models provide almost equivalent estimates of item difficulty for the open-ended items, the
3PL-IRT model and the 2PL-IRT guessing model yield larger item-difficulty estimates for the
multiple-choice items, particularly Items 4 and 5. This result reflects the fact that some of
the multiple-choice items are actually harder once possible guessed responses are taken into
account.
The estimates of the item-discrimination parameters are shown in the bottom panels of Figure 3.
These panels show that Items 1 and 4 perform well in discriminating different levels of political
Figure 4: The probabilities of successful guessing and of correct response across levels of political knowledge θ, for the 3PL-IRT model and for the 2PL-IRT guessing model. Parameter estimates under the 3PL model: Item 4 (α = 0.263, β = 2.413, b = 0.154), Item 5 (α = 0.995, β = 1.441, b = 0.294), Item 6 (α = −2.031, β = 1.879, b = 0.512), Item 7 (α = 0.977, β = 1.863, b = 0.176). Under the 2PL guessing model: Item 4 (α = 0.353, β = 2.247, b = 1.067), Item 5 (α = 0.934, β = 0.574, b = 0.984), Item 6 (α = −2.709, β = 1.504, b = 0.923), Item 7 (α = 1.065, β = 1.473, b = 0.636).
knowledge. It is reasonable to state that, in Taiwan, Item 1 discriminates respondents with low
knowledge levels from those with average levels, while Item 4 distinguishes respondents with
average knowledge levels from those with relatively high levels.
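The claim that Item 1 sorts low-knowledge respondents while Item 4 sorts average from high-knowledge respondents follows from where each item's characteristic curve is steepest: for a logistic ICC the slope is maximal at θ = b, where it equals a/4, so an item discriminates best among respondents whose knowledge is near its difficulty. The check below uses a generic 2PL curve with made-up parameter values, not the estimated TEDS2012 parameters.

```python
import math

def icc(theta, a, b):
    # 2PL item characteristic curve
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope(theta, a, b, eps=1e-5):
    # central-difference numerical derivative of the ICC
    return (icc(theta + eps, a, b) - icc(theta - eps, a, b)) / (2 * eps)

a, b = 2.0, -1.0  # hypothetical: an easy, highly discriminating item
# The curve is steepest at theta = b, so this item separates respondents around theta = -1.
print(slope(b, a, b))      # approximately a / 4 = 0.5
print(slope(b + 2, a, b))  # much flatter away from the difficulty point
```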
Figure 4 displays the probability of successful guessing and the probability of a correct response
across different levels of political knowledge under both the 3PL-IRT model and the proposed 2PL-
IRT guessing model. The distinction between blind and informed guessing is quite obvious in
Figure 4. The top two panels suggest that uninformed respondents have a higher probability of
correctly guessing the answers to multiple-choice items than partially informed and fully informed
respondents. In contrast, the bottom two panels show informed guessing for items that require
relatively high levels of political knowledge to guess the correct answers. The
results from the 2PL-IRT guessing model are largely consistent with conventional wisdom about
Taiwanese citizens' information channels and how those channels relate to levels of political
knowledge. On the one hand, Item 7 involves international affairs, about which information is
relatively limited in Taiwan's traditional and electronic media. Therefore, respondents have to
be relatively knowledgeable to correctly guess or answer Item 7.
On the other hand, Items 4, 5, and 6 are questions about domestic politics. Item 6 asks
respondents which party came in second in seats in the Legislative Yuan. Information about
party politics of this sort is common in media reports, and citizens are therefore more aware
of party competition in both the electoral and legislative arenas, so respondents with low
knowledge levels are likely to guess the correct answer. Although Items 4 and 5 also concern
facts of Taiwan's politics, the public in Taiwan is less aware of the finance minister and of
the 2011 unemployment rate. The finance minister is less well known than the premier because
of limited media exposure. With regard to Item 5, the public may have enough partial information
from the media to understand the domestic economic situation without necessarily knowing the
concrete unemployment figure. Therefore, relatively high knowledge levels are required to
correctly guess or answer Items 4 and 5, but not Item 6.
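The two guessing patterns can be caricatured with simple logistic curves: blind guessing is most likely among the least knowledgeable respondents and fades as knowledge rises, while informed guessing requires some knowledge before it pays off. The functional forms and parameter values below are purely illustrative and are not the paper's Equation (5).

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def blind_guess(theta):
    # decreasing in theta: uninformed respondents guess at random (as with Item 6)
    return 0.5 * logistic(-2.0 * theta)

def informed_guess(theta):
    # increasing in theta: partial knowledge helps eliminate wrong options (as with Item 7)
    return 0.6 * logistic(1.5 * (theta - 0.5))

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(blind_guess(theta), 3), round(informed_guess(theta), 3))
```

The opposite monotonicity of the two curves is the qualitative pattern visible in the top and bottom panels of Figure 4.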
5 Concluding Remark
Blind guessing and informed guessing are ubiquitous in educational testing and survey question-
naires. It is argued that item formats and the presence of DK options lead to the problem
of guessing. In this paper, we distinguish blind guessing from informed guessing in multiple-choice
items with DK options accepted but not offered. The proposed Bayesian 2PL-IRT guessing model
successfully describes the guessing properties of political knowledge items in Taiwan, based on
respondents' knowledge levels and item characteristics. In the next stage, we will investigate
whether this model improves the estimates of political knowledge.
References
Albert, James H. 1992. "Bayesian Estimation of Normal Ogive Item Response Curves Using Gibbs Sampling." Journal of Educational and Behavioral Statistics 17(3):251–269.
Albert, James H. and Siddhartha Chib. 1993. "Bayesian Analysis of Binary and Polychotomous Response Data." Journal of the American Statistical Association 88(442):669–679.
Andrich, David and Ida Marais. 2014. "Person Proficiency Estimates in the Dichotomous Rasch Model When Random Guessing Is Removed From Difficulty Estimates of Multiple Choice Items." Applied Psychological Measurement 38(6):432–449.
Andrich, David, Ida Marais and Stephen Humphry. 2012. "Using a Theorem by Andersen and the Dichotomous Rasch Model to Assess the Presence of Random Guessing in Multiple Choice Items." Journal of Educational and Behavioral Statistics 37(3):417–442.
Baker, Frank B. and Seock-Ho Kim. 2004. Item Response Theory: Parameter Estimation Techniques. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC.
Birnbaum, Allan. 1968. Some Latent Trait Models and Their Use in Inferring An Examinee's Ability. In Statistical Theories of Mental Test Scores, ed. Frederic M. Lord and Melvin R. Novick. Reading, MA: Addison-Wesley pp. 397–479.
Campbell, Angus, Philip E. Converse, Warren E. Miller and Donald E. Stokes. 1960. The American Voter. New York: John Wiley.
Cao, Jing and S. Lynne Stokes. 2008. "Bayesian IRT Guessing Models for Partial Guessing Behaviors." Psychometrika 73(2):209–230.
Clinton, Joshua D., Simon Jackman and Douglas Rivers. 2004. "The Statistical Analysis of Roll Call Data." American Political Science Review 98(2):355–370.
Delli Carpini, Michael X. and Scott Keeter. 1991. "Stability and Change in the US Public's Knowledge of Politics." Public Opinion Quarterly 55(4):583–612.
Delli Carpini, Michael X. and Scott Keeter. 1993. "Measuring Political Knowledge: Putting First Things First." American Journal of Political Science 37(4):1179–1206.
Delli Carpini, Michael X. and Scott Keeter. 1996. What Americans Know About Politics and Why It Matters. New Haven, CT: Yale University Press.
Embretson, Susan E. and Steven P. Reise. 2000. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum.
Fox, Jean-Paul. 2010. Bayesian Item Response Modeling: Theory and Applications. New York, NY: Springer.
Galston, William A. 2001. "Political Knowledge, Political Engagement, and Civic Education." Annual Review of Political Science 4(1):217–234.
Gibson, James L. and Gregory A. Caldeira. 2009. "Knowing the Supreme Court? A Reconsideration of Public Ignorance of the High Court." The Journal of Politics 71(2):429–441.
Hambleton, Ronald K. and Linda L. Cook. 1977. "Latent Trait Models and Their Use in the Analysis of Educational Test Data." Journal of Educational Measurement 14(2):75–96.
Jackman, Simon. 2000a. "Estimation and Inference are Missing Data Problems: Unifying Social Science Statistics via Bayesian Simulation." Political Analysis 8(4):307–332.
Jackman, Simon. 2000b. "Estimation and Inference via Bayesian Simulation: An Introduction to Markov chain Monte Carlo." American Journal of Political Science 44(2):375–404.
Jackman, Simon. 2001. "Multidimensional Analysis of Roll Call Data via Bayesian Simulation: Identification, Estimation, Inference, and Model Checking." Political Analysis 9(3):227.
Jackman, Simon. 2008. Measurement. In The Oxford Handbook of Political Methodology, ed. Janet Box-Steffensmeier, Henry Brady and David Collier. Oxford: Oxford University Press.
Jackman, Simon. 2009. Bayesian Analysis for the Social Sciences. Chichester, UK: Wiley.
Johnson, Valen E. and James H. Albert. 1999. Ordinal Data Modeling. New York, NY: Springer-Verlag.
Lassen, David Dreyer. 2005. "The Effect of Information on Voter Turnout: Evidence from a Natural Experiment." American Journal of Political Science 49(1):103–118.
Lord, Frederic M. 1968. "An Analysis of the Verbal Scholastic Aptitude Test Using Birnbaum's Three-Parameter Logistic Model." Educational and Psychological Measurement 28(4):989–1020.
Lord, Frederic M. 1970. "Item Characteristic Curves Estimated without Knowledge of Their Mathematical Form–A Confrontation of Birnbaum's Logistic Model." Psychometrika 35(1):43–50.
Lord, Frederic M. and Melvin R. Novick. 1968. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.
Luskin, Robert C. 1987. "Measuring Political Sophistication." American Journal of Political Science 31(4):856–899.
Luskin, Robert C. 1990. "Explaining Political Sophistication." Political Behavior 12(4):331–361.
Luskin, Robert C. and John G. Bullock. 2011. "'Don't Know' Means 'Don't Know': DK Responses and the Public's Level of Political Knowledge." Journal of Politics 73(2):547–557.
Martin, Andrew D. and Kevin M. Quinn. 2002. "Dynamic Ideal Point Estimation via Markov chain Monte Carlo for the US Supreme Court, 1953–1999." Political Analysis 10(2):134–152.
Miller, Melissa K. and Shannon K. Orr. 2008. "Experimenting with a 'Third Way' in Political Knowledge Estimation." Public Opinion Quarterly 72(4):768–780.
Mondak, Jeffery J. 1999. "Reconsidering the Measurement of Political Knowledge." Political Analysis 8(1):57–82.
Mondak, Jeffery J. 2001. "Developing Valid Knowledge Scales." American Journal of Political Science 45(1):224–238.
Mondak, Jeffery J. and Belinda Creel Davis. 2001. "Asked and Answered: Knowledge Levels When We Will Not Take 'Don't Know' for an Answer." Political Behavior 23(3):199–224.
Mondak, Jeffery J. and Damarys Canache. 2004. "Knowledge Variables in Cross-National Social Inquiry." Social Science Quarterly 85(3):539–558.
Mondak, Jeffery J. and Mary R. Anderson. 2004. "The Knowledge Gap: A Reexamination of Gender-Based Differences in Political Knowledge." Journal of Politics 66(2):492–512.
Pietryka, Matthew T. and Randall C. MacIntosh. 2013. "An Analysis of ANES Items and Their Use in the Construction of Political Knowledge Scales." Political Analysis 21(4):407–429.
Plummer, Martyn. 2003. JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing. pp. 20–22.
Rasch, Georg. 1960. Probabilistic Models for Some Intelligence and Attainment Tests. The Danish Institute for Educational Research.
San Martín, Ernesto, Guido Del Pino and Paul De Boeck. 2006. "IRT Models for Ability-Based Guessing." Applied Psychological Measurement 30(3):183–203.
Skrondal, Anders and Sophia Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
Sturgis, Patrick, Nick Allum and Patten Smith. 2008. "An Experiment on the Measurement of Political Knowledge in Surveys." Public Opinion Quarterly 72(1):90–102.
Su, Yu-Sung and Masanao Yajima. 2012. R2jags ver. 0.03-08 (R package). URL: http://cran.r-project.org/web/packages/R2jags/
Tourangeau, Roger, Lance J. Rips and Kenneth Rasinski. 2000. The Psychology of Survey Response. Cambridge, UK: Cambridge University Press.
Treier, Shawn and Simon Jackman. 2008. "Democracy as a Latent Variable." American Journal of Political Science 52(1):201–217.
Waller, Michael I. 1976. Estimating Parameters in the Rasch Model: Removing the Effects of Random Guessing. Princeton, NJ: Educational Testing Service.
Waller, Michael I. 1989. "Modeling Guessing Behavior: A Comparison of Two IRT Models." Applied Psychological Measurement 13(3):233–243.