Modeling Guessing Properties of Multiple-Choice Items in
the Measurement of Political Knowledge∗
Tsung-han Tsai† Chang-chih Lin‡
This Draft: December 18, 2014
Abstract
Due to the crucial role of political knowledge in political participation, the measurement of
political knowledge has been a major concern in political science. The conventional measure
of political knowledge is to sum a respondent’s correct answers on a battery of questions,
including open-ended and multiple-choice items. In this paper, we propose a Bayesian IRT
guessing model that accommodates different types of guessing behaviors for multiple-choice
items and address the issue regarding the difference between incorrect responses and DKs in
both closed-ended and open-ended items. The proposed model successfully describes guessing
behaviors based on respondents’ levels of political knowledge and item characteristics in
Taiwan.
keywords: political knowledge, guessing behavior, don’t know response, measurement, item
response theory, Bayesian methods
∗Paper prepared for presentation at the 2015 Asian Political Methodology Conference in Taipei, Taiwan, January 9-10, 2015. The previous version of this manuscript was presented at the 2014 Annual Meeting of the American Political Science Association in Washington, D.C., August 28-31, 2014.
†Assistant Professor of Political Science, National Chengchi University, Taipei, Taiwan ([email protected]).
‡Post-doctoral Researcher at the Election Study Center and Adjunct Assistant Professor of Political Science, National Chengchi University, Taipei, Taiwan ([email protected]).
1 Introduction
Political knowledge is a central construct in theories of democracy, most of which suggest that
knowledgeable citizens are one of the components of a well-functioning democracy. The logic behind
these theories is as follows: the more knowledgeable and informed a citizen is, the more prudent
the decisions s/he makes in democratic processes (see, e.g., Campbell et al., 1960; Galston, 2001;
Lassen, 2005). To test these theories empirically, many efforts have been made to measure political
knowledge in studies of public opinion and political behavior (e.g., Luskin, 1987; Delli Carpini
and Keeter, 1996; Mondak, 1999). The conventional approach to measuring levels of political
knowledge is to ask survey respondents a brief battery of factual questions about
politics and to count the number of correct answers. Common formats include closed-ended
(multiple-choice) items and open-ended identification items.
The selection of item formats involves two related issues: items that are likely to be guessed
correctly and the way to handle “don’t know” (hereafter, DK) responses in the measurement pro-
cedure. On closed-ended items, one primary issue is that respondents who have no knowledge at all
or those who have partial knowledge about the question are able to select an answer from provided
choice options. Failing to distinguish between “lucky guessing” or “blind guessing” (guessing the
correct answer randomly) and “informed guessing” (guessing the correct answer based on partial
knowledge) thus masks real levels of political knowledge. On the other hand, open-ended
questions are usually more difficult than closed-format items for respondents to answer. Uninformed and
partially informed respondents who are uncertain of the answer to the question are likely to say
“don’t know.” This leads to not only a high proportion of DK responses but also poor discrimi-
nation between levels of political knowledge (Mondak, 1999, 2001; Mondak and Davis, 2001).
In this paper, we focus on multiple-choice items and distinguish between lucky guessing and
informed guessing as they appear in the measurement of political knowledge. First, building on the
methods of item response theory (IRT), we propose an IRT model with a “guessing component”
that accommodates the chance of guessing the correct answer for a multiple-choice item based on a
respondent’s level of political knowledge and item difficulties. It is stressed that the proposed two-
parameter logistic (2PL) IRT guessing model identifies items likely to have a guessing component.
Second, although we do not explicitly deal with the possible variation in DK responses, we treat
DKs as “missing values” rather than collapsing DKs and incorrect responses into the same category,
which reflects our ignorance of these DK respondents’ knowledge levels. Finally, a Bayesian
approach is adopted to estimate the proposed two-parameter logistic (2PL) IRT guessing model,
which offers flexibility for complex model specifications and the estimation of missing values along
with model parameters.
The proposed model is applied to survey data collected in 2012 by Taiwan’s
Election and Democratization Study (TEDS) project.1 The results show that the proposed
2PL-IRT guessing model accurately describes the characteristics of the political knowledge items
employed in Taiwan. In particular, the guessing property of these items corresponds to what we
expect for the public’s political knowledge in Taiwan. While the focus of this paper is on the
measurement of political knowledge, the model proposed here applies to the same item format for
measuring other types of knowledge in surveys.
The remainder of this paper proceeds as follows. Section 2 reviews the literature on the
measurement of political knowledge and Section 3 discusses the proposed IRT model. Section
4 introduces the analyzed data and measures and presents the results of analysis. Section 5
concludes.
2 Measurement of Political Knowledge
In most public opinion and survey research in political science, the common procedure to mea-
sure political knowledge is to ask respondents a series of questions about politics.
These questions usually ask about a respondent’s awareness and cognition of “textbook facts,” “current
events,” and “historical facts” (Delli Carpini and Keeter, 1991, 1993).
1Data analyzed in this paper were from Taiwan’s Election and Democratization Studies, 2012: Presidential and Legislative Elections (TEDS2012) (NSC 100-2420-H002-030). The coordinator of the multi-year TEDS project is Professor Chi Huang (National Chengchi University). TEDS2012 is a yearly project on the 2012 presidential and legislative elections. The principal investigator is Professor Yun-han Chu. More information is available on the TEDS website (http://www.tedsnet.org). The authors appreciate the assistance of the aforementioned institute and individuals in providing data. The authors alone are responsible for the views expressed herein.
The responses to
political knowledge questions fall into three categories: correct answers, incorrect answers, and
don’t knows (DKs). Conventionally, DKs are treated as incorrect responses and then the levels
of knowledge are constructed by counting the number of correct answers. The procedure for the
measurement of political knowledge involves two related issues. One is that item formats are
closely associated with guessing. The other is about the way to deal with DKs.
Common formats for knowledge items in survey research mainly include open-ended, closed-
ended, and true-false (yes-no) items (Mondak, 2001; Mondak and Canache, 2004). Items with
different formats, in general, impose different demands on the respondents (Tourangeau, Rips
and Rasinski, 2000, p. 35-38). First, on open-ended items, respondents are asked to supply
answers about political events and facts without any other information or hints. Second, closed-
ended questions, usually shown as two- to four-category multiple-choice items, list some choice
options and ask respondents to choose the correct one among the options. Finally, on true-false
items, which also can be regarded as two-category multiple-choice items, respondents are asked to
judge the statements true or false.
Different item formats, which involve different opportunities to guess, affect the validity
and reliability of the measurement of political knowledge. Some respondents who have no idea at
all of the correct answer may guess, whereas others tend to choose DK (Mondak, 2001). On closed-
ended items, respondents will have incentives to guess one answer from given options because of
a non-zero probability of hitting upon the right answer, even if they are completely uninformed.
Additionally, among respondents who are somewhat sure, but not certain, that they know the correct
answer, some will choose DK and others will try to guess and offer substantive answers.
Compared with closed-ended items, open-ended ones tend to increase the load on respondents’
working memory and require them to be more knowledgeable to answer correctly,
which makes them relatively immune to blind guessing (Mondak and Davis, 2001). Thus, closed-ended
items usually have higher percentages of correct responses than open-ended ones, which in turn
leads to different knowledge scores (Luskin and Bullock, 2011).
Moreover, it is argued that the measurement of political knowledge is considerably associated
with respondents’ tendency to guess, especially in closed-ended items (Mondak, 2001; Mondak and
Davis, 2001). In general, respondents with identical knowledge levels will receive different scores
if they differ in their tendency to guess the answer. For respondents with a strong tendency
to guess rather than giving DKs, their “observed knowledge” will be overestimated and
usually higher than their “actual knowledge” because they correctly answer some questions just
by chance rather than by their actual knowledge. In other words, knowledge scores not only reflect
actual knowledge but also propensity to guess.
To diminish the contaminating effect of guessing propensity on the measurement of knowledge
level, Delli Carpini and Keeter (1996, p. 305) advocate that the DK option should be encouraged
to increase the validity of measurement in the survey.2 It will somewhat reduce the tendency for
respondents to guess, especially for the completely uninformed or partially informed ones. However,
Mondak (1999, 2001) argues that respondents’ propensity to guess is not completely eliminated by
encouraging DK options, which threatens the validity of knowledge measures. Uninformed respon-
dents who guess will receive overestimated knowledge scores while partially informed respondents
who resist guessing will answer DK, leading to underestimated knowledge. Rather, one should
discourage DK responses in surveys because propensity to guess is eliminated if all respondents
are forced to choose an answer despite their tendency to guess and, thus, knowledge scores reflect
only one systematic factor, actual knowledge levels (Mondak, 2001; Mondak and Davis, 2001).
When DK responses are discouraged, DKs account for only a small proportion of responses
and suggest a firm resistance to guessing. To further eliminate DK responses, one can do a simple post
hoc correction by randomly assigning DKs to the substantive choice categories (Mondak, 1999,
2001). However, some research contends that Mondak’s forced choice format works only if knowl-
edge lies hidden within DK responses in surveys and, more importantly, shows that discouraging
DKs does not reveal a substantially more knowledgeable public (Luskin and Bullock, 2011; Sturgis,
Allum and Smith, 2008). In other words, when survey respondents who initially answer DKs on
2In that case, survey interviewers will give respondents a prompt such as: “Many people don’t know the answers to these questions, so if there are some you don’t know just tell me and we’ll go on.”
political knowledge items are forced to provide a guess, they are unable to provide correct answers
at better than chance rates, which means that they really don’t know.
Although there is disagreement about the practice of randomly assigning DKs to the substantive
choice options, more and more scholars argue that it is inappropriate to simply pool DKs and
incorrect responses together as a single absence-of-knowledge category (Gibson and Caldeira, 2009;
Mondak and Anderson, 2004; Mondak and Davis, 2001; Miller and Orr, 2008). Many respondents
who offer DKs may actually know the answer or may be momentarily unable to recall it in
response to open-ended questions. They may also say they don’t know on closed-ended questions
despite having vague information about correct answers rather than being completely ignorant.
In other words, there may be variation among the levels of political knowledge for respondents
who say they don’t know.
According to the above discussion, there are two issues that need to be addressed when we
measure levels of political knowledge. First, as Mondak (2001) argues, when DKs are not encour-
aged, respondents who do not know or those who partially know the correct answer will guess,
especially for multiple-choice items. That is to say, there are two types of guessing that
need to be identified: blind guessing and (partially) informed guessing. The latter occurs more
frequently on political knowledge questions than the former (Mondak and Davis, 2001). Second,
respondents who say they don’t know may actually have partial knowledge. Although this is
seen more often in open-ended items (Mondak, 2001; Luskin and Bullock, 2011), it may also occur in
multiple-choice items. In the following, we propose a model to address the first issue and leave
the second one for future research.
3 Item Response Theory for Guessing Behaviors
Due to the crucial role of political knowledge in political participation, the measurement of political
knowledge has been a major concern in political science (Delli Carpini and Keeter, 1996; Luskin,
1987, 1990; Mondak, 1999). Typically, political knowledge is assumed to be an unobserved, latent
variable that can be measured by a number of manifest variables or indicators. These indicators
usually are questions asking about a respondent’s awareness of officeholders and political systems. The
conventional measure of political knowledge is to sum a respondent’s correct answers on a battery
of questions. However, this approach, which implicitly assumes the same difficulty level for different
items, is problematic due to the omission of item properties from the measurement procedure.
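As a brief illustration, the conventional additive score simply counts correct answers, weighting every item equally. The response matrix below is hypothetical, with DKs collapsed into incorrect answers as in the conventional treatment:

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = respondents, columns = items;
# np.nan marks a "don't know" (DK) response.
responses = np.array([
    [1, 1, 0, np.nan, 1],
    [1, 0, 0, 0, np.nan],
    [np.nan, 1, 1, 1, 1],
])

# Conventional additive score: DKs scored as incorrect (nansum treats nan as 0).
score_dk_wrong = np.nansum(responses, axis=1)

# Every item carries the same weight, so two respondents with the same count
# receive the same score even if the items they answered differ in difficulty.
print(score_dk_wrong)  # -> [3. 1. 4.]
```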
A solution to the problem is utilizing methods of item response theory, which describe how
changes in trait level relate to changes in the probability of providing a correct answer to an
item (Embretson and Reise, 2000). IRT has received great attention from political scientists for
inference about latent variables (Jackman, 2008) and has been applied to a variety of studies
such as congressional roll-call data analysis (Clinton, Jackman and Rivers, 2004; Jackman, 2001),
levels of democracy (Treier and Jackman, 2008), decisions of the Supreme Court (Martin and
Quinn, 2002), and the invariant property of political knowledge scales (Pietryka and MacIntosh,
2013). In contrast to conventional IRT models, which cannot distinguish between blind guessing
and informed guessing, we propose an IRT guessing model to deal with uninformed and partially
informed guessing effects for multiple-choice items in this paper.
3.1 The IRT Guessing Model
Suppose that, for each item k = 1, · · · , K, respondent i = 1, · · · , N provides a response (yi,k),
which is either correct (yi,k = 1) or incorrect (yi,k = 0). We assume that these items measure
a unidimensional latent variable θi, i.e., political knowledge in our discussion. A two-parameter
logistic (2PL) item response model is
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k) = \Lambda[\beta_k(\theta_i - \alpha_k)] = \frac{\exp[\beta_k(\theta_i - \alpha_k)]}{1 + \exp[\beta_k(\theta_i - \alpha_k)]}, \qquad (1)$$
where Λ(·) denotes the logistic cumulative distribution function (cdf), αk item-difficulty param-
eters, βk item-discrimination parameters, and θi levels of political knowledge (Baker and Kim,
2004; Embretson and Reise, 2000; Lord and Novick, 1968). When βk are fixed to 1, Equation (1)
becomes the Rasch model (Rasch, 1960) of the form:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k) = \Lambda(\theta_i - \alpha_k) = \frac{\exp(\theta_i - \alpha_k)}{1 + \exp(\theta_i - \alpha_k)}. \qquad (2)$$
One key assumption involved in IRT models is the so-called “local independence,” which means that
the responses are conditionally independent across items and subjects given the latent variable
and the item parameters.
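The response probabilities in Equations (1) and (2) can be sketched in a few lines of Python; the function names and parameter values below are illustrative, not part of the original analysis:

```python
import math

def icc_2pl(theta, alpha, beta):
    """2PL model, Eq. (1): probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-beta * (theta - alpha)))

def icc_rasch(theta, alpha):
    """Rasch model, Eq. (2): the 2PL with beta fixed to 1."""
    return icc_2pl(theta, alpha, 1.0)

# A respondent whose knowledge level equals the item's difficulty answers
# correctly with probability 0.5, whatever the discrimination.
print(icc_2pl(0.5, 0.5, 2.0))  # -> 0.5
```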
The 2PL-IRT model of Equation (1) and/or the Rasch model of Equation (2) have been used
to assess the item properties of survey responses (e.g., Delli Carpini and Keeter, 1996; Jackman,
2000b). However, for the measurement of political knowledge, respondents who have no idea at
all or are uncertain of the correct response may guess, especially on multiple-choice items. Some
multiple-choice items are more likely to be guessed than others. Without considering the effects of
guessing, items with correctly guessed responses appear easier than they actually are.
In other words, the values of the estimated item-difficulty parameters are smaller than they
should be. This bias in the estimates of item parameters affects the accuracy of the estimates
of respondents’ latent trait levels. Since difficult items appear easier than they actually are, the
knowledge level of informed respondents who are able to answer difficult items correctly will be
underestimated while that of uninformed and partially informed ones will be overestimated.
For the guessing property of an item, it has been observed that the lower tail of the empirical
item characteristic curve sometimes is asymptotic to a value greater than 0. A three-parameter
logistic (3PL) item response model which describes this asymptotic behavior is presented as follows:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k, c_k) = c_k + (1 - c_k)\Lambda[\beta_k(\theta_i - \alpha_k)] = c_k + (1 - c_k)\left(\frac{\exp[\beta_k(\theta_i - \alpha_k)]}{1 + \exp[\beta_k(\theta_i - \alpha_k)]}\right), \qquad (3)$$
where ck is the asymptotic probability of correct response for θ → −∞ (Birnbaum, 1968). In the
literature, ck is commonly referred to as the “guessing” parameter (Hambleton and Cook, 1977).
The guessing parameter ck indicates the probability of item success for the lowest trait level.
However, some research argues that ck should not be interpreted as a guessing parameter (Lord,
1968, 1970). Instead, ck should be considered as the lower bound for the item characteristic curve.
This becomes clearer when we rearrange Equation (3) as follows:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k, c_k) = \Lambda[\beta_k(\theta_i - \alpha_k)] + \left(1 - \Lambda[\beta_k(\theta_i - \alpha_k)]\right) c_k. \qquad (4)$$
Equation (4) shows that the probability of correct response to item k for person i results from
two components. The first component is the probability that a respondent works on the item to
find the correct answer based on his/her level of latent trait. This component is represented by
the first term on the right-hand side of Equation (4), which is the functional form of a standard
2PL-IRT model of Equation (1). The second component is that a respondent answers the item
correctly by guessing, which is indicated by the second term on the
right in Equation (4). The second term indicates the probability of successful guess for an item,
which is the value of ck moderated by the probability of an incorrect response 1− Λ[βk(θi − αk)].
Furthermore, Equation (4) also shows that the greater the value of the first term, the smaller the
value of the second term and thus the smaller the impact of guessing on the item (Andrich, Marais
and Humphry, 2012, p. 427).
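The equivalence of the 3PL form in Equation (3) and its decomposition in Equation (4) is easy to verify numerically. The following Python sketch uses arbitrary illustrative parameter values:

```python
import math

def logistic(z):
    """Logistic cdf, the Lambda(.) of the text."""
    return 1.0 / (1.0 + math.exp(-z))

def icc_3pl(theta, alpha, beta, c):
    """3PL model, Eq. (3): c + (1 - c) * Lambda[beta * (theta - alpha)]."""
    return c + (1.0 - c) * logistic(beta * (theta - alpha))

def icc_3pl_decomposed(theta, alpha, beta, c):
    """Rearranged form, Eq. (4): an ability term plus a guessing term."""
    p = logistic(beta * (theta - alpha))
    return p + (1.0 - p) * c

# The two forms agree for any parameter values (these are illustrative).
theta, alpha, beta, c = -1.2, 0.3, 1.5, 0.25
assert abs(icc_3pl(theta, alpha, beta, c)
           - icc_3pl_decomposed(theta, alpha, beta, c)) < 1e-12
```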
Considering the second term on the right in Equation (4) as the probability of guessing a
correct answer, some research works on its functional form to allow individual differences and/or
item properties in the guessing component. For example, San Martín, Del Pino and De Boeck
(2006) propose an IRT model with individual differences in the guessing component, which is
formulated as follows:
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \gamma_k, b) = \Lambda(\theta_i - \alpha_k) + [1 - \Lambda(\theta_i - \alpha_k)]\,\frac{\exp(b\theta_i + \gamma_k)}{1 + \exp(b\theta_i + \gamma_k)}, \qquad (5)$$
where b is the general weight of the trait level in the guessing component and γk is the “guessing
parameter” corresponding to a respondent with average trait level on the logistic scale.3
We extend the model presented in Equation (5) to include the item-discrimination parameters
βk and replace γk with −αk. Moreover, we allow the weighting
parameter of the trait level to vary across items. As a result, the proposed model, called 2PL-IRT
guessing model, is specified as
$$\Pr(y_{i,k} = 1 \mid \theta_i, \alpha_k, \beta_k, b_k) = \Lambda[\beta_k(\theta_i - \alpha_k)] + \left(1 - \Lambda[\beta_k(\theta_i - \alpha_k)]\right)\left(\frac{\exp(b_k\theta_i - \alpha_k)}{1 + (M - 1)\exp(b_k\theta_i - \alpha_k)}\right), \qquad (6)$$
where M is the number of options for a multiple-choice item. Like Equation (5), the model
presented by Equation (6) has two terms contributing to the probability of item success. The first
term contributes to the success probability due to ability and the second term contributes to the
success probability due to guessing.
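A minimal Python sketch of Equation (6) (with illustrative parameter values) confirms the lower asymptote: with b = 0 and α = 0, a four-option item gives a success probability of 1/M to a respondent with a very low trait level:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def icc_guessing(theta, alpha, beta, b, M):
    """Proposed 2PL-IRT guessing model, Eq. (6)."""
    ability = logistic(beta * (theta - alpha))
    e = math.exp(b * theta - alpha)
    guess = e / (1.0 + (M - 1) * e)  # success probability due to guessing
    return ability + (1.0 - ability) * guess

# With b = 0 and alpha = 0, random guessing on a four-option item succeeds
# with probability 1/4 for a nearly uninformed respondent.
print(round(icc_guessing(-30.0, 0.0, 1.0, 0.0, 4), 4))  # -> 0.25
```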
3.2 Properties of the Proposed Model
Some properties of the proposed 2PL-IRT guessing model, compared to the conventional 2PL-
and 3PL-IRT models, are discussed in this subsection. Following the literature on IRT, item
characteristic curves (ICCs) are displayed to describe how changes in trait level relate to changes
in the probability of a specified response. First, the inclusion of M−1 in the guessing component of
Equation (6) indicates that the asymptotic probability of successful random guessing for θ → −∞
is equal to 1/M , given that α = 0 and b = 0. For example, suppose that M = 4, that is, there are
four options for a multiple-choice item. A respondent with the lowest trait level has a probability
of 1/4 to randomly choose a correct response (b = 0) to an item with average difficulty (α = 0),
3Cao and Stokes (2008) work on extensions of the 2PL-IRT model to deal with guessing behaviors. Moreover, some research considers guessing of a correct response, including random guessing and informed guessing, as a function of a person’s trait level relative to the difficulty of an item, rather than as a property of an item, and works on procedures for removing guessing in the estimation of item parameters and latent trait levels (Waller, 1976, 1989; Andrich, Marais and Humphry, 2012; Andrich and Marais, 2014).
which is represented by the gray solid curve in the left panel of Figure 1.
Second, the weighting parameter b indicates how important the level of ability is for the
probability of successful guess for an item. The left panel of Figure 1 shows that, as the value
of b increases, the probability of successful guessing for respondents with the lowest trait level
decreases and the probability of successful guessing becomes more concentrated around the average trait level. In the
context of political knowledge, this property basically reflects the fact that both partially informed
(respondents with average level of knowledge) and uninformed respondents are more likely to
guess than informed ones, but partially informed respondents are more likely to have successful
guessing than uninformed ones. Respondents with low levels of knowledge would make guesses with a
low probability of being correct; their chance of guessing the correct answer is even lower than that
of random guessing because they are highly susceptible to attractive
distractors. Furthermore, the reason why informed respondents have a relatively low probability
of successful guessing is that their high trait levels play the major role in contributing to the
probability of item success. Therefore, partially informed respondents have higher probability of
successful guessing than both uninformed and informed ones.
Third, and related to the second property, respondents with lower trait levels are considerably
affected by the weight of the trait level, b, in terms of the probability of item success. The right
panel of Figure 1 shows the ICCs for different values of b given that α = 0 and β = 1. As can be
seen, the probability of item success for respondents with lower trait level decreases when correct
guessing requires high ability levels, which is reflected by the increase in the value of b. Moreover,
no matter how large b is, the ICCs suggest that the success probability due to guessing always
contributes to the probability of item success for partially informed respondents, compared to the
ICC of a 2PL-IRT model displayed by the black solid curve, which has no guessing component.
In contrast, the success probability due to guessing contributes less to the probability of item success
for informed respondents since respondents with high trait level can answer these items correctly.
Fourth, the gray solid curve in the right panel of Figure 1 is equal to the ICC of a 3PL-IRT
model with c = 0.25. In a conventional 3PL-IRT model, i.e., Equation (3), the lower asymptote
[Figure 1: two panels plot the probability of successful guess (left) and of correct response (right) against θ from -5 to 5, for b = 0, 0.1, 0.5, 1, and 2, with α = 0 and β = 1; the right panel also includes the 2PL IRT curve for comparison.]
Figure 1: The probability of successful guess and that of correct response for four-category multiple-choice items. Item parameters α and β are fixed while b is varied.
[Figure 2: two panels plot the probability of successful guess (left) and of correct response (right) against θ from -5 to 5, for α = -1, 0, 1, and 2, with β = 1 and b = 0.5; the right panel also includes the 2PL IRT curve for comparison.]
Figure 2: The probability of successful guess and that of correct response for four-category multiple-choice items. Item parameters b and β are fixed while α is varied.
is an unknown parameter to be estimated.4 More importantly, it implies that guessing behavior
is an item property that applies to all respondents regardless of differences in item characteristics and
levels of latent trait. In contrast, in the proposed model, the asymptotic probability is determined
4Theoretically, without additional constraints, the parameter c in a 3PL-IRT model of Equation (3) lies in the interval between 0 and 1 regardless of the number of options in a multiple-choice item. Empirically, estimates of c are usually smaller than 1/M (Embretson and Reise, 2000, p. 73), which leaves unexplained why it is lower than the random-guessing probability for respondents with the lowest trait level.
by both item characteristics (M and α) and levels of latent trait (depending on the magnitude of
b), which makes the 3PL-IRT model a special case of the proposed model.
Finally, the proposed model allows not only differences in trait levels but also those in item
difficulty levels in the guessing component. The left panel in Figure 2 shows how changes in
item difficulty levels relate to changes in the probability of successful guess, which suggests that, as
items become more difficult, higher ability is required for respondents to have a higher probability
of successful guess. The corresponding ICCs are shown in the right panel in Figure 2.
3.3 Prior Distributions and Identification
It is well known that item-response theory models suffer from two identification problems: scale
invariance and rotational invariance (see, e.g., Albert, 1992; Johnson and Albert, 1999). The
problem of scale invariance occurs because the metric (location and scale) of the latent trait is only
known up to a linear transformation. Therefore, one must anchor the metric of the latent traits.
Additionally, the problem of rotational invariance refers to the fact that, for the unidimensional
case, multiplying all of the model parameters by −1 would not change the value of the likelihood
function.
We estimate the model presented in Equation (6) by a Bayesian approach, so we complete
the model specification by defining the prior distributions. In the Bayesian context, the use of
informative prior distributions can resolve these two identification problems (Johnson and Albert,
1999; Martin and Quinn, 2002). First, in typical IRT models, latent variables are assumed to be
sampled from a normal distribution with mean 0 and finite variance $\sigma^2_\theta$, that is,
$$\theta_i \sim N(0, \sigma^2_\theta), \quad \text{for } i = 1, \cdots, N. \qquad (7)$$
To solve the problem of scale invariance, we can simply assume that $\sigma^2_\theta = 1$ (Jackman, 2009,
p. 460).
Second, for item parameters βk, αk, and bk from Equation (6), we assume that
$$\beta_k \sim N(1, 2)\, I(\beta_k > 0), \quad \text{for } k = 1, \cdots, K, \qquad (8)$$
$$\alpha_k \sim N(0, \sigma^2_\alpha), \quad \text{for } k = 1, \cdots, K, \qquad (9)$$
$$b_k \sim \text{Gamma}(1, 1), \quad \text{for } k = 1, \cdots, K, \qquad (10)$$
where $I(\cdot)$ denotes an indicator function and $\sigma^2_\alpha$ follows the conjugate inverse-gamma prior
$$\sigma^2_\alpha \sim \text{Inv-Gamma}(0.01, 0.01). \qquad (11)$$
Notice that the rotational invariance problem is solved by restricting the item-discrimination
parameters to be positive. This is because ICCs with positive item-discrimination parameters
assume that respondents with higher ability are more likely to answer test items correctly. This form
of constraint is sometimes known as ‘anchoring’ (Skrondal and Rabe-Hesketh, 2004, p. 66). The
truncated normal prior with mean 1 and variance 2 for β reflects the fact that item-discrimination
parameters usually take values in the interval between 0.5 and 3 (Fox, 2010, p. 21). The hyperprior
Inv-Gamma(0.01, 0.01) for σ2α is set to be non-informative. We assume a gamma prior Gamma(c0, d0)
for the parameter bk since bk is nonnegative, and we choose c0 = d0 = 1 so that a priori the proposed
model approximates the 3PL-IRT model with parameter ck = 0.25. If the estimates of bk differ from 0,
this provides evidence supporting the proposed model against the 3PL-IRT model.
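For illustration, draws from these priors can be simulated as follows. This Python/NumPy sketch is not the JAGS implementation used in the paper; the seed, the number of items, the rejection sampler for the truncated normal, and the underflow guard are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 7  # number of items, matching the TEDS2012 battery

# Truncated normal N(1, 2) restricted to beta_k > 0, via rejection sampling.
draws = rng.normal(1.0, np.sqrt(2.0), size=100 * K)
beta = draws[draws > 0][:K]

# sigma^2_alpha ~ Inv-Gamma(0.01, 0.01): draw Gamma(shape=0.01, rate=0.01),
# i.e. scale = 1/0.01, and invert; the floor guards against numerical
# underflow in this small sketch.
g = max(rng.gamma(0.01, 1.0 / 0.01), 1e-300)
sigma2_alpha = 1.0 / g
alpha = rng.normal(0.0, np.sqrt(sigma2_alpha), size=K)

# b_k ~ Gamma(1, 1) keeps the guessing weights nonnegative.
b = rng.gamma(1.0, 1.0, size=K)

print(beta.size, (beta > 0).all(), (b >= 0).all())
```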
In the estimation process, DKs are treated as missing values rather than incorrect responses.
As discussed above, there may be variation among the levels of political knowledge for respondents
who say they don’t know, and we do not arbitrarily treat these respondents as having
low levels of political knowledge. Instead, by treating DKs as missing values in a Bayesian setup,
we obtain the joint posterior of all random quantities in the model, including the levels of political
knowledge, the item parameters, and the partially observed item responses (DKs) via Bayesian
simulation (Jackman, 2000a,b).
4 Data and Analysis
The dataset we analyze comes from the survey conducted for Taiwan’s 2012 presidential and legislative
elections (TEDS2012) by the Election Study Center of National Chengchi University. This dataset,
collected by face-to-face interviews after the 2012 election, includes three open-ended items and four
multiple-choice items about political affairs to investigate the Taiwanese public’s political knowledge.
These knowledge questions are DK-neutral items. In the survey process, interviewers are instructed
to accept but not to offer the DK option in advance. In other words, interviewers neither encourage
nor discourage DK responses and simply record respondents’ answers, including DKs.
Table 1 shows the seven knowledge items used in our analysis and corresponding distributions
of responses. Based on their distributions, these items can be roughly classified into three groups:
(1) Items 1, 2, and 6; (2) Items 3, 4, and 5; and (3) Item 7. As can be seen, first, Items 1, 2, and
6 are relatively easy for survey respondents to answer. The percentages of correct responses for
these three items are 75.85%, 63.14%, and 87.02%, respectively. Further, comparing Item 1 with
Item 6, we observe a higher percentage of DKs in the open-ended item (Item 1) than in the
multiple-choice one (Item 6), which implies guessing behaviors in Item 6.
Second, the percentages of correct responses for Items 3, 4, and 5 are 28.81%, 34.56%, and
33.68%, respectively. These items are somewhat difficult for the public in Taiwan to answer. In other
words, respondents require a certain level of political knowledge to get these items correct. Table 1
shows that one third of the respondents choose the DK option on the two closed-ended items (Items
4 and 5), which suggests that these respondents likely do not know the answers. Furthermore,
comparing Item 3 with Items 4 and 5, the proportion of DKs is slightly higher in the open-ended
item (Item 3) than in the closed-ended items (Items 4 and 5). This result implies that some
partially informed respondents choose DK on Item 3 but may guess the answers on Items 4 and 5.
Finally, as shown in Table 1, there are only 18.67% of the respondents correctly answer Item
7, which implies that Item 7 is a much harder question among these seven items. This highly
difficult item also results in high percentage (53.89%) of DKs. Notice that incorrect responses are
higher than correct ones, which is the evidence of guessing behaviors. It is reasonable to assume
Table 1: The Percentage of Responses to Knowledge Items in TEDS 2012

Political Knowledge Items                                           Correct  Incorrect  "Don't Know"
1. Who is the current president of the United States?                 75.85       2.14         22.01
2. Who is the current premier of our country?                         63.14       8.87         27.99
3. What institution has the power to interpret the constitution?      28.81      27.76         43.43
4. Which of these persons was the finance minister
   before the recent election?                                        34.56      31.98         33.46
5. What was the current unemployment rate in Taiwan
   as of the end of last year (2011)?                                 33.68      33.30         33.02
6. Which party came in second in seats in the Legislative Yuan?       87.02       3.07          9.91
7. Who is the current Secretary-General of the United Nations?        18.67      27.44         53.89

Note: The first three knowledge items are open-ended while the other four are multiple-choice items.
that some partially informed respondents may choose a wrong answer, since guessing the correct
one requires a higher level of political knowledge.
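The open-ended versus multiple-choice DK gaps discussed above can be checked directly against the percentages in Table 1; the snippet below is a quick recomputation of those comparisons, not part of the original analysis.

```python
# DK percentages from Table 1 (Items 1-3 are open-ended; Items 4-7 are multiple-choice).
dk = {1: 22.01, 2: 27.99, 3: 43.43, 4: 33.46, 5: 33.02, 6: 9.91, 7: 53.89}

# Easy pair: open-ended Item 1 vs. multiple-choice Item 6.
gap_easy = dk[1] - dk[6]

# Harder pair: open-ended Item 3 vs. the average of closed-ended Items 4 and 5.
gap_hard = dk[3] - (dk[4] + dk[5]) / 2

# Both gaps are positive: DKs are more common in the open-ended items,
# consistent with guessing in the multiple-choice items.
print(f"Item 1 vs. 6 DK gap: {gap_easy:.2f} points")
print(f"Item 3 vs. 4/5 DK gap: {gap_hard:.2f} points")
```

Both differences exceed ten percentage points, which is the descriptive pattern the text reads as evidence of guessing in the closed-ended formats.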
4.1 Results of Analysis
We apply the proposed model to the TEDS2012 dataset. The model is estimated via
Markov chain Monte Carlo (MCMC) methods (Albert and Chib, 1993; Fox, 2010) implemented in
JAGS 3.3.0 (Plummer, 2003), called from R version 3.0.2 through the R2jags package (Su and Yajima, 2012).5 Before
we investigate guessing behaviors in the seven knowledge items, we compare the estimates of item
parameters from three models: the 2PL-IRT model represented in Equation (1), the 3PL-IRT
model represented in Equation (2), and the proposed 2PL-IRT guessing model represented in
Equation (5).
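For concreteness, the 2PL and 3PL response functions compared here take the standard logistic forms sketched below. This is generic IRT code, not the authors' implementation; the 2PL-IRT guessing model of Equation (5) differs in that its guessing component depends on the respondent's knowledge level rather than being a constant floor.

```python
import math

def p_2pl(theta, a, b):
    """2PL: probability of a correct response with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """3PL: adds a constant guessing floor c, so the probability tends to c as theta decreases."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

# A respondent far below the item's difficulty still succeeds with probability about c.
# This is why ignoring guessing makes multiple-choice items look easier than they are.
print(p_2pl(-4, 1.5, 1.0))        # essentially zero without a guessing floor
print(p_3pl(-4, 1.5, 1.0, 0.25))  # close to the 0.25 floor
```

The floor effect in the last two lines is the mechanism behind the difficulty comparison in Figure 3: once guessed successes are attributed to the floor rather than to ability, the estimated difficulty of the multiple-choice items rises.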
Figure 3 shows the 90% credible intervals of item-difficulty parameters on the top and item-
5The estimation was performed with three parallel chains of 50,000 iterations each, to be conservative. The first half of the iterations was discarded as a burn-in period, and a thinning interval of 10 was applied, yielding 7,500 retained posterior samples. Convergence diagnostics for the MCMC chains show no evidence of non-convergence.
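The sample bookkeeping in footnote 5 works out as follows: each of the three chains runs 50,000 iterations, the first half is discarded, and thinning keeps every tenth draw. A quick arithmetic check:

```python
chains, iters, thin = 3, 50_000, 10
burn_in = iters // 2                       # first half of each chain discarded
kept_per_chain = (iters - burn_in) // thin # every tenth post-burn-in draw retained
total_kept = chains * kept_per_chain
print(total_kept)  # 7500, matching the sample count reported in footnote 5
```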
Figure 3: The 90% credible intervals of item parameters for the 2PL-IRT, 3PL-IRT, and 2PL-IRT guessing models. The top panels plot item difficulty and the bottom panels item discrimination, for Items 1 through 7.
discrimination parameters on the bottom. As can be seen in Figure 3, the top two panels show that
Items 1, 2, and 6 have lower item-difficulty values, indicating that they are relatively easy
questions, whereas Items 3, 4, 5, and 7 are relatively hard. Notice that, although the three
models provide almost equivalent estimates of item difficulty for the open-ended items, the
3PL-IRT model and the 2PL-IRT guessing model yield larger item-difficulty estimates for the
multiple-choice items, particularly Items 4 and 5. This result reflects the fact that some of
the multiple-choice items are actually harder once possible guessed responses are taken into
account.
The estimates of the item-discrimination parameters are shown in the bottom panels of Figure 3.
These panels show that Items 1 and 4 perform well in discriminating different levels of political
Figure 4: The probabilities of successful guessing and of correct response across levels of political knowledge θ, for the 3PL-IRT model and for the 2PL-IRT guessing model. Parameter estimates under the 3PL model: Item 4 (α = 0.263, β = 2.413, b = 0.154), Item 5 (α = 0.995, β = 1.441, b = 0.294), Item 6 (α = −2.031, β = 1.879, b = 0.512), Item 7 (α = 0.977, β = 1.863, b = 0.176). Under the 2PL guessing model: Item 4 (α = 0.353, β = 2.247, b = 1.067), Item 5 (α = 0.934, β = 0.574, b = 0.984), Item 6 (α = −2.709, β = 1.504, b = 0.923), Item 7 (α = 1.065, β = 1.473, b = 0.636).
knowledge. It is reasonable to state that, in Taiwan, Item 1 discriminates respondents with low
knowledge levels from those with average levels, while Item 4 distinguishes respondents with
average knowledge levels from those with relatively high levels.
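The claim that Item 1 sorts low-knowledge respondents while Item 4 sorts average from high-knowledge respondents follows from where each item's characteristic curve is steepest: for a logistic ICC the slope is maximal at θ = b, where it equals a/4, so an item discriminates best among respondents whose knowledge is near its difficulty. The check below uses a generic 2PL curve with made-up parameter values, not the estimated TEDS2012 parameters.

```python
import math

def icc(theta, a, b):
    # 2PL item characteristic curve
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope(theta, a, b, eps=1e-5):
    # central-difference numerical derivative of the ICC
    return (icc(theta + eps, a, b) - icc(theta - eps, a, b)) / (2 * eps)

a, b = 2.0, -1.0  # hypothetical: an easy, highly discriminating item
# The curve is steepest at theta = b, so this item separates respondents around theta = -1.
print(slope(b, a, b))      # approximately a / 4 = 0.5
print(slope(b + 2, a, b))  # much flatter away from the difficulty point
```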
Figure 4 displays the probability of successful guessing and the probability of a correct response
across different levels of political knowledge under both the 3PL-IRT model and the proposed 2PL-
IRT guessing model. The distinction between blind and informed guessing is quite obvious in
Figure 4. The top two panels suggest that uninformed respondents have a higher probability of
correctly guessing the answers to multiple-choice items than partially informed and fully informed
respondents. In contrast, the bottom two panels show informed guessing for items that require
relatively high levels of political knowledge to guess the correct answers. The
results from the 2PL-IRT guessing model are largely consistent with conventional wisdom about
Taiwanese citizens' information channels and how those channels relate to levels of political
knowledge. On the one hand, Item 7 involves international affairs, about which information is
relatively limited in Taiwan's traditional and electronic media. Therefore, respondents have to
be relatively knowledgeable to correctly guess or answer Item 7.
On the other hand, Items 4, 5, and 6 are questions about domestic politics. Item 6 asks
respondents which party came in second in seats in the Legislative Yuan. Information about
party politics of this sort is common in media reports, and citizens are therefore more aware
of party competition in both the electoral and legislative arenas, so respondents with low
knowledge levels are likely to guess the correct answer. Although Items 4 and 5 also concern
facts of Taiwan's politics, the public in Taiwan is less aware of the finance minister and of
the 2011 unemployment rate. The finance minister is less well known than the premier because
of limited media exposure. With regard to Item 5, the public may have enough partial information
from the media to understand the domestic economic situation without necessarily knowing the
concrete unemployment figure. Therefore, relatively high knowledge levels are required to
correctly guess or answer Items 4 and 5, but not Item 6.
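The two guessing patterns can be caricatured with simple logistic curves: blind guessing is most likely among the least knowledgeable respondents and fades as knowledge rises, while informed guessing requires some knowledge before it pays off. The functional forms and parameter values below are purely illustrative and are not the paper's Equation (5).

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def blind_guess(theta):
    # decreasing in theta: uninformed respondents guess at random (as with Item 6)
    return 0.5 * logistic(-2.0 * theta)

def informed_guess(theta):
    # increasing in theta: partial knowledge helps eliminate wrong options (as with Item 7)
    return 0.6 * logistic(1.5 * (theta - 0.5))

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(blind_guess(theta), 3), round(informed_guess(theta), 3))
```

The opposite monotonicity of the two curves is the qualitative pattern visible in the top and bottom panels of Figure 4.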
5 Concluding Remark
Blind guessing and informed guessing are ubiquitous in educational testing and survey question-
naires. It is argued that item formats and the presence of DK options lead to the problem
of guessing. In this paper, we distinguish blind guessing from informed guessing in multiple-choice
items with DK options accepted but not offered. The proposed Bayesian 2PL-IRT guessing model
successfully describes the guessing properties of political knowledge items in Taiwan, based on
respondents' knowledge levels and item characteristics. In the next stage, we will investigate
whether this model improves the estimates of political knowledge.
References
Albert, James H. 1992. "Bayesian Estimation of Normal Ogive Item Response Curves Using Gibbs Sampling." Journal of Educational and Behavioral Statistics 17(3):251–269.
Albert, James H. and Siddhartha Chib. 1993. "Bayesian Analysis of Binary and Polychotomous Response Data." Journal of the American Statistical Association 88(442):669–679.
Andrich, David and Ida Marais. 2014. "Person Proficiency Estimates in the Dichotomous Rasch Model When Random Guessing Is Removed From Difficulty Estimates of Multiple Choice Items." Applied Psychological Measurement 38(6):432–449.
Andrich, David, Ida Marais and Stephen Humphry. 2012. "Using a Theorem by Andersen and the Dichotomous Rasch Model to Assess the Presence of Random Guessing in Multiple Choice Items." Journal of Educational and Behavioral Statistics 37(3):417–442.
Baker, Frank B. and Seock-Ho Kim. 2004. Item Response Theory: Parameter Estimation Techniques. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC.
Birnbaum, Allan. 1968. Some Latent Trait Models and Their Use in Inferring An Examinee's Ability. In Statistical Theories of Mental Test Scores, ed. Frederic M. Lord and Melvin R. Novick. Reading, MA: Addison-Wesley pp. 397–479.
Campbell, Angus, Philip E. Converse, Warren E. Miller and Donald E. Stokes. 1960. The American Voter. New York: John Wiley.
Cao, Jing and S. Lynne Stokes. 2008. "Bayesian IRT Guessing Models for Partial Guessing Behaviors." Psychometrika 73(2):209–230.
Clinton, Joshua D., Simon Jackman and Douglas Rivers. 2004. "The Statistical Analysis of Roll Call Data." American Political Science Review 98(2):355–370.
Delli Carpini, Michael X. and Scott Keeter. 1991. "Stability and Change in the US Public's Knowledge of Politics." Public Opinion Quarterly 55(4):583–612.
Delli Carpini, Michael X. and Scott Keeter. 1993. "Measuring Political Knowledge: Putting First Things First." American Journal of Political Science 37(4):1179–1206.
Delli Carpini, Michael X. and Scott Keeter. 1996. What Americans Know About Politics and Why It Matters. New Haven, CT: Yale University Press.
Embretson, Susan E. and Steven P. Reise. 2000. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum.
Fox, Jean-Paul. 2010. Bayesian Item Response Modeling: Theory and Applications. New York, NY: Springer.
Galston, William A. 2001. "Political Knowledge, Political Engagement, and Civic Education." Annual Review of Political Science 4(1):217–234.
Gibson, James L. and Gregory A. Caldeira. 2009. "Knowing the Supreme Court? A Reconsideration of Public Ignorance of the High Court." The Journal of Politics 71(2):429–441.
Hambleton, Ronald K. and Linda L. Cook. 1977. "Latent Trait Models and Their Use in the Analysis of Educational Test Data." Journal of Educational Measurement 14(2):75–96.
Jackman, Simon. 2000a. "Estimation and Inference are Missing Data Problems: Unifying Social Science Statistics via Bayesian Simulation." Political Analysis 8(4):307–332.
Jackman, Simon. 2000b. "Estimation and Inference via Bayesian Simulation: An Introduction to Markov chain Monte Carlo." American Journal of Political Science 44(2):375–404.
Jackman, Simon. 2001. "Multidimensional Analysis of Roll Call Data via Bayesian Simulation: Identification, Estimation, Inference, and Model Checking." Political Analysis 9(3):227.
Jackman, Simon. 2008. Measurement. In The Oxford Handbook of Political Methodology, ed. Janet Box-Steffensmeier, Henry Brady and David Collier. Oxford: Oxford University Press.
Jackman, Simon. 2009. Bayesian Analysis for the Social Sciences. Chichester, UK: Wiley.
Johnson, Valen E. and James H. Albert. 1999. Ordinal Data Modeling. New York, NY: Springer-Verlag.
Lassen, David Dreyer. 2005. "The Effect of Information on Voter Turnout: Evidence from a Natural Experiment." American Journal of Political Science 49(1):103–118.
Lord, Frederic M. 1968. "An Analysis of the Verbal Scholastic Aptitude Test Using Birnbaum's Three-Parameter Logistic Model." Educational and Psychological Measurement 28(4):989–1020.
Lord, Frederic M. 1970. "Item Characteristic Curves Estimated without Knowledge of Their Mathematical Form–A Confrontation of Birnbaum's Logistic Model." Psychometrika 35(1):43–50.
Lord, Frederic M. and Melvin R. Novick. 1968. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.
Luskin, Robert C. 1987. "Measuring Political Sophistication." American Journal of Political Science 31(4):856–899.
Luskin, Robert C. 1990. "Explaining Political Sophistication." Political Behavior 12(4):331–361.
Luskin, Robert C. and John G. Bullock. 2011. "'Don't Know' Means 'Don't Know': DK Responses and the Public's Level of Political Knowledge." Journal of Politics 73(2):547–557.
Martin, Andrew D. and Kevin M. Quinn. 2002. "Dynamic Ideal Point Estimation via Markov chain Monte Carlo for the US Supreme Court, 1953–1999." Political Analysis 10(2):134–152.
Miller, Melissa K. and Shannon K. Orr. 2008. "Experimenting with a 'Third Way' in Political Knowledge Estimation." Public Opinion Quarterly 72(4):768–780.
Mondak, Jeffery J. 1999. "Reconsidering the Measurement of Political Knowledge." Political Analysis 8(1):57–82.
Mondak, Jeffery J. 2001. "Developing Valid Knowledge Scales." American Journal of Political Science 45(1):224–238.
Mondak, Jeffery J. and Belinda Creel Davis. 2001. "Asked and Answered: Knowledge Levels When We Will Not Take 'Don't Know' for an Answer." Political Behavior 23(3):199–224.
Mondak, Jeffery J. and Damarys Canache. 2004. "Knowledge Variables in Cross-National Social Inquiry." Social Science Quarterly 85(3):539–558.
Mondak, Jeffery J. and Mary R. Anderson. 2004. "The Knowledge Gap: A Reexamination of Gender-Based Differences in Political Knowledge." Journal of Politics 66(2):492–512.
Pietryka, Matthew T. and Randall C. MacIntosh. 2013. "An Analysis of ANES Items and Their Use in the Construction of Political Knowledge Scales." Political Analysis 21(4):407–429.
Plummer, Martyn. 2003. JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing. pp. 20–22.
Rasch, Georg. 1960. Probabilistic Models for Some Intelligence and Attainment Tests. The Danish Institute for Educational Research.
San Martín, Ernesto, Guido Del Pino and Paul De Boeck. 2006. "IRT Models for Ability-Based Guessing." Applied Psychological Measurement 30(3):183–203.
Skrondal, Anders and Sophia Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
Sturgis, Patrick, Nick Allum and Patten Smith. 2008. "An Experiment on the Measurement of Political Knowledge in Surveys." Public Opinion Quarterly 72(1):90–102.
Su, Yu-Sung and Masanao Yajima. 2012. R2jags ver. 0.03-08 (R package). URL: http://cran.r-project.org/web/packages/R2jags/
Tourangeau, Roger, Lance J. Rips and Kenneth Rasinski. 2000. The Psychology of Survey Response. Cambridge, UK: Cambridge University Press.
Treier, Shawn and Simon Jackman. 2008. "Democracy as a Latent Variable." American Journal of Political Science 52(1):201–217.
Waller, Michael I. 1976. Estimating Parameters in the Rasch Model: Removing the Effects of Random Guessing. Princeton, NJ: Educational Testing Service.
Waller, Michael I. 1989. "Modeling Guessing Behavior: A Comparison of Two IRT Models." Applied Psychological Measurement 13(3):233–243.