Short is better: Evaluating the quality of answers in online surveys through
screener questions
Riccardo Ladini, University of Trento
Moreno Mancosu, Collegio Carlo Alberto
Cristiano Vezzoni, University of Trento
Paper prepared for the ECPR General Conference, Prague, September 7-10, 2016
Section: Experimental Designs in Political Science
Panel: Survey Experiments I: Application of the Design
Draft August 26, 2016. Please do not quote without permission of the authors.
Abstract
The article shows that a large proportion of respondents (≥ 50%) in an online survey does not pay adequate attention to the wording of questions, especially if the text is long and complex. This result is obtained through an experiment on screener questions, specifically developed to evaluate respondents' attention. To pass a screener question successfully, a respondent must follow the instructions embedded in a text that contains a varying amount of misleading information. Screeners turn out to be effective tools for evaluating the quality of respondents' answers, especially when the cognitive load required to pass the test is low. The analyses are based on ITANES (Italian National Election Study) data collected in 2015, with n = 3,000.
Keywords: Screener, cognitive strain, survey experiment, online surveys
Introduction
Answering a survey requires an effort that respondents are not always prepared to sustain (Krosnick, 1991; Lenzner, Kaczmirek and Lenzner, 2010). Several studies have found that respondents do not always pay attention when answering survey questions. Often they do not choose the option that best represents their position; rather, they adopt strategies that reduce the cognitive load required to answer accurately and sincerely. For instance, a respondent can select the first option that satisfies him/her, or even a random answer category (Krosnick, 1991). These strategies produce measurement errors that can be attributed to the respondent (Groves, 1989; Corbetta, 2015: 62). This type of error is likely to become more relevant as the researcher's control over the interview setting decreases, as in online surveys conducted in CAWI mode (Computer Assisted Web Interviewing).
In order to estimate the amount of these errors in online surveys, Instructional Manipulation Checks (IMCs; Oppenheimer, Meyvis and Davidenko, 2009), also known as screeners1 (Berinsky, Margolis and Sances, 2014), have been proposed. These survey tools, widespread in both psychology and political science2, can be seen as tests aimed at distinguishing "quality" respondents from respondents who do not pay enough attention to survey questions (Meade and Craig, 2012). In addition, the tool can activate respondents' attention during an online survey (Oppenheimer et al., 2009).
Despite the growing diffusion of online surveys in social research (Callegaro, Lozar Manfreda and Vehovar, 2015) and the increasing relevance of issues related to CAWI data quality, methodological work on the nature and employment of screeners is usually based on ad hoc experimental studies carried out on small convenience samples (Oppenheimer et al., 2009; Hauser and Schwarz, 2015).

1 Henceforth, we will use the label screener to refer to this kind of question.
2 Berinsky and colleagues (2014) identify about 40 studies employing screeners between January 2006 and July 2013.
The study presented here aims to enhance knowledge of the empirical working of screeners in realistic survey conditions, employing data from an online survey carried out on a large sample of Italian respondents (n = 3,000). The paper attempts to answer two main research questions. First, we examine the relation between the difficulty of a screener and its ability to identify "quality" respondents. Second, we test whether a screener activates respondents' attention. Both questions are tackled by means of an experimental design in which the position and the wording of the screener are manipulated.
The results should be of particular interest for researchers who rely on online surveys to collect data and employ non-conventional questions in CAWI surveys, such as vignettes (Atzmüller and Steiner, 2010; Mutz, 2011: 54-67). These questions often involve long and complex texts, which require careful and accurate reading. Can we expect respondents to read such questions so carefully in a CAWI survey? This is, basically, the question this paper addresses.
Screeners and respondents' quality in online surveys
A screener is a multiple-response question in an online or self-administered survey. Unlike in a common behavioral or attitudinal question, in a screener the respondent is not required to express an opinion or report a behavior; rather, (s)he is requested to complete a task, following specific instructions hidden in the wording of the question. Performing the task is made difficult by the introduction of misleading information into the text. Generally speaking, a screener can be split into four sections, as shown in figure 1. The first section is an introduction that provides information connected to the topic of the survey but not relevant to the accomplishment of the task. The second part provides the instructions that the respondent must follow to carry out the task and pass the screener. The task requires selecting a combination of inconsistent answer categories that would hardly be chosen by someone who has not read the instructions. The third part is the trap question, which is aimed at diverting the respondent from passing the screener.3 The question, indeed, is semantically connected to the available answer options, but the accomplishment of the screener is independent of the content of the question. Finally, the fourth part consists of the answer categories.
Figure 1: Base screener wording employed in the experiment (Source: CAWI ITANES - 2015)

Part 1. Introduction: Previous research shows that the large majority of people who gather information online prefer a site or portal that they perceive as more trustworthy than others.

Part 2. Task: In this case, however, we are interested in knowing whether people take the time they need to follow instructions in interviews carefully. To show that you've read this much, please ignore the question and select only the options "Local newspaper websites" and "None of these websites" as your two answers, regardless of the websites you actually visit.

Part 3. Trap question: When you hear some breaking news, which news website would you visit most frequently? (Maximum three answers)

Part 4. Answer categories:
□ La Repubblica □ Il Giornale □ La Stampa
□ Corriere □ Dagospia □ Press association websites
□ Huffington Post □ Il Fatto Quotidiano □ Other
□ Libero Quotidiano □ Local newspaper websites □ None of these websites
A respondent passes the screener if (s)he follows the instructions described in the second part; otherwise (s)he fails the screener, even if his/her answer is coherent with the trap question. In our case, in order to pass the screener the respondent must tick both "Local newspaper websites" and "None of these websites", a clearly contradictory combination from a semantic point of view. To pass the screener, the respondent must read the instructions of part 2 carefully. People who limit their reading to the introduction, to the trap question or even to the answer categories fail the test, deceived by the apparent semantic consistency of the answers. The cognitive strain required to pass the test varies according to the amount of misleading text that must be read to correctly identify the instructions, as well as to the presence or absence of the trap question.

3 For this reason, in the literature, the whole question is usually called a trap question (Hauser and Schwarz, 2015).
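To make the pass/fail rule just described concrete, the following minimal sketch (in Python; the option labels and the set-based data structure are illustrative assumptions, not the coding procedure actually used in the survey) scores a single response:

# Pass only by ticking exactly the two options named in the instructions
# (part 2 of figure 1), regardless of actual media habits.
PASSING_COMBINATION = {"Local newspaper websites", "None of these websites"}

def passes_screener(selected_options):
    return set(selected_options) == PASSING_COMBINATION

# A semantically plausible answer to the trap question still fails:
print(passes_screener({"Corriere", "La Repubblica"}))  # False
print(passes_screener({"Local newspaper websites", "None of these websites"}))  # True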
The main aim of the screener is to distinguish between workers, who read the questions carefully and answer the survey attentively, and shirkers (Berinsky et al., 2014), namely subjects who do not pay enough attention and answer survey questions superficially4. The assumption here is that people who pay the necessary attention to screeners are those who, in general, pay more attention to every question in a survey and answer consistently. According to this assumption, people who answer the screeners correctly can be defined as "quality respondents". For this reason, several authors suggest that screeners should be included in online surveys in order to allow researchers to exclude inattentive respondents ex post (Goodman, Cryder and Cheema, 2013; Oppenheimer et al., 2009) or, at least, to stratify results by levels of respondents' attentiveness (Berinsky et al., 2014).5
Despite the relevance of the topic, knowledge of the empirical working of screeners is still embryonic, and the potential of the instrument to improve the quality of survey responses is far from clear. Many of the screeners proposed in the literature present particularly complex wordings, with lengths that exceed the average length of a survey question several times over. This choice largely reflects the aim for which screeners were proposed, that is, distinguishing "quality" respondents from inattentive and shallow ones. In some situations, however, choosing screeners characterized by an excessive cognitive load can turn out to be a problem.

4 In the literature, this tendency is referred to as satisficing (Krosnick, 1991; Oppenheimer et al., 2009).
5 Stratification is a less onerous procedure than filtering out subjects who do not pass the screener. Excluding these subjects could indeed lead to external validity issues, unbalancing the composition of the sample with respect to several individual characteristics, such as socio-demographic ones, which are associated with passing the test.
If, on the one hand, a difficult screener can identify good (that is, more attentive) respondents with more certainty, on the other hand it risks classifying as shirkers respondents who are not actually so inattentive to the survey questions (and thus produce good-quality answers), but simply fail the screener because the task is too difficult6.
This aspect has received little consideration in the literature. To understand the problem, it is worth remembering that a screener, like every question in a survey, requires the respondent to apply a certain amount of cognitive work in order to be correctly understood. The workload varies according to the question's wording (Kahneman, 1973; Kool, McGuire, Rosen and Botvinick, 2010). Building on psycholinguistic findings (Tourangeau, Rips and Rasinski, 2000), Lenzner et al. (2010) show that the length and syntactic complexity of a question - as well as the overload of respondents' working memory - can hinder the understanding of the question and have a large impact on data quality (see also Christian, Dillman and Smyth, 2007). In the case of a screener, it is thus possible to hypothesize that an increase in the complexity and length of the question wording will lead to a higher cognitive load to complete the task.
Starting from these considerations, the first aim of the paper is to explore, by means of a survey experiment that manipulates the wording of the screener, the relation between the complexity of the question wording and the likelihood of completing the task correctly. The final aim is to calibrate the question wording, in order to identify an instrument whose cognitive load optimally discriminates between workers and shirkers, according to the rationale defined above.

6 A complementary risk is also present. A screener that is too simple will not make an adequate distinction between workers and shirkers, coding as attentive a number of people who are not (in other words, respondents who produce answers of poor quality). In light of the screener examples presented in the literature, which are usually characterized by a high level of complexity, this issue seems less relevant.
Besides the post-hoc discrimination of respondents according to their attentiveness, it has been suggested that screeners can be employed to increase answer quality. In a certain sense, screeners are supposed to "wake up" respondents by activating their attention. The idea is that once a respondent detects a tricky question, (s)he will be more attentive in order to avoid errors in subsequent questions. Oppenheimer et al. (2009) propose inserting screeners that do not allow the survey to continue until the respondent has successfully completed the test. In this way, it is guaranteed that the respondent has correctly understood the screener wording (Guess, 2015) and understands the need to answer subsequent questions with an increased level of attentiveness (a strategy that would be particularly useful for people who start out as shirkers). The idea that screeners can be employed as activation tools also guided a recent ad hoc experiment, carried out on Amazon Mechanical Turk, in which Hauser and Schwarz (2015) show that introducing a screener before a task enhances the likelihood of passing it. The effect of exposure to a screener on the quality of answers to subsequent questions has, however, shown a rather short latency: the instrument turned out to have little effect in maintaining respondents' attentiveness when answering more distant questions in the survey (Berinsky, Margolis and Sances, in press).
By means of an experiment that randomizes the screener's position in a survey, our study aims to verify whether screeners can act as an activation tool also in the realistic context of an online survey (Hauser and Schwarz, 2015), hypothesizing that respondents exposed to a screener will produce higher-quality answers than those who have not been exposed to it and, in particular, that this holds for those who have passed the screener successfully.
The experiment
A screener was included in a self-administered CAWI survey of 3,000 individuals. Following Berinsky and colleagues (2014, p. 740), the misleading information deals with the websites one consults after learning of breaking news. The complete version of the screener is presented in figure 1. In line with the suggestion of Oppenheimer and colleagues (2009), this topic is not at odds with the rest of the questionnaire, which is aimed at measuring Italians' political opinions and includes specific questions on media consumption. By means of a randomized procedure, the experiment manipulates the cognitive load, that is, the complexity of the task as a function of the complexity of its wording (hard, medium, easy), and its position (before or after a battery of items). The experimental design is shown in table 1.
Table 1. Factorial design and experimental group sizes

                 Position
Cognitive load   Pre (Treatment)   Post (Control)
Hard             515               510
Medium           502               492
Easy             479               502
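As an illustration, the assignment to the six cells of table 1 can be sketched as follows (a hypothetical per-respondent randomization in Python; the survey's actual randomization procedure is not documented here, so this is only one plausible implementation):

import random

LOADS = ["hard", "medium", "easy"]
POSITIONS = ["pre", "post"]  # screener before or after the democracy battery

def assign_conditions(n_respondents, seed=42):
    # Independent uniform draws over the 3 x 2 factorial cells, which yields
    # roughly (not exactly) equal cell sizes, as in table 1.
    rng = random.Random(seed)
    return [(rng.choice(LOADS), rng.choice(POSITIONS)) for _ in range(n_respondents)]

cells = assign_conditions(3000)
print(cells[0])  # e.g. ('medium', 'pre')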
As regards the cognitive load, the three experimental conditions are presented in figure 2. The instructions to pass the task (part 2) and the answer categories (part 4 of the screener) are the same for all three groups. In the medium and easy versions the introductory misleading information (part 1) has been removed; in the easy version, the trap question (part 3) has been removed as well.

As regards the position, the screener has been randomly located before or after a battery of questions on attitudes toward democracy7, which is the battery employed to evaluate the quality of respondents.
Figure 2: Different experimental conditions (Source: CAWI ITANES - 2015)

Part 1. Introduction (hard version only; removed in the medium and easy versions): Previous research shows that the large majority of people who gather information online prefer a site or portal that they perceive as more trustworthy than others.

Part 2. Task (common to every condition): (In this case, however) we are (now) interested in knowing whether people take the time they need to follow instructions in interviews carefully.(a) To show that you've read this much, please ignore the question and select only the options "Local newspaper websites" and "None of these websites" as your two answers, regardless of the websites you actually visit.

Part 3. Trap question (hard and medium versions; removed in the easy version): When you hear some breaking news, which news website would you visit most frequently? (Maximum three answers)

Part 4. Answer categories (common to every condition):
□ La Repubblica □ Il Giornale □ La Stampa
□ Corriere □ Dagospia □ Press association websites
□ Huffington Post □ Il Fatto Quotidiano □ Other
□ Libero Quotidiano □ Local newspaper websites □ None of these websites

(a) In the easy and medium versions, the first sentence of the task was slightly modified to keep it coherent: "In this case, however, we are interested" becomes "We are now interested".
7 The items are partly drawn from a battery on stealth democracy proposed by Hibbing and Theiss-Morse (2002). The wording of the items is presented in figure 3.
Hypotheses
The first hypothesis concerns the relation between cognitive load and the likelihood of correctly accomplishing the task in the screener. As underlined above, the complexity of a screener should influence the amount of cognitive load requested of the respondent and, thus, the likelihood of accomplishing the task. Starting from this consideration, it is possible to formulate the following hypothesis:

Hp1. The higher the cognitive load (in terms of complexity and length of the wording of the screener), the lower the passage rate8.

8 Following Berinsky et al. (2014), we refer to the 'passage rate' as the percentage of respondents who answer the screener question correctly.
The main aim of a screener, however, is to identify respondents who pay attention to the questions of the survey and answer with greater accuracy. The second hypothesis thus focuses on the relation between the outcome of the screener and the quality of answers, and can be formulated as follows:

Hp2. Respondents who pass the screener produce higher-quality answers to the other questions of the survey than respondents who do not pass it.

It has also been underlined that a screener can be considered an instrument for activating respondents' attention, hypothesizing that the instrument enhances the focus of the respondent, who then answers the following questions more accurately. Therefore, we can formulate the third hypothesis as follows:
Hp3. Exposure to a screener enhances the quality of answers to the questions that follow.
Finally, once it is shown that a screener is actually able to distinguish respondents according to the quality of their answers, it is possible to ask what the right compromise is between the cognitive load to which the respondent must be subjected and the capacity to discriminate between workers and shirkers. Very complex screeners have been proposed in the literature so far, on the assumption that this solution guarantees the clearest identification of quality respondents. We can thus formulate the last hypothesis as follows:

Hp4. The higher the cognitive load of a screener, the higher the quality of the answers of the respondents who pass it.
Data and Methods
Data come from the ITANES (Italian National Election Study) online panel 2013-15. The data analyzed here concern the sixth wave of the survey, which took place about a month after the Italian regional elections of 2015, on a quota sample of 3,000 respondents. These respondents were randomly selected from a starting sample of panel participants (n = 8,723), originally drawn by means of quota sampling (with quotas on gender, age, and educational level) from a list of members of an opt-in online community of a private research institute (SWG).
The quality of respondents has been evaluated considering their answers to a battery of items placed immediately before or after the screener. The battery concerns attitudes toward democracy and is made up of 6 items, as shown in figure 3. The respondent is asked to express his/her degree of agreement with every item on a scale from 0 (totally disagree) to 10 (totally agree).

Reading the items, it is clear that their semantic polarity varies: three items (1, 3 and 6) express a negative attitude toward democracy and the other three (items 2, 4 and 5) express a positive attitude. This choice is aimed at minimizing possible response-set effects.
Figure 3 – Items on attitudes toward democracy
1. Compromises in politics are really just selling out on one’s principles
2. Parties are necessary to defend special interests of groups and social classes
3. Parties criticize one another, but actually they are all the same
4. Parties guarantee that people can participate in politics in Italy
5. There cannot be democracy without parties
6. Politicians would help the country more if they would stop talking and just take action on important problems
The first measure of answer quality is defined at the individual level, considering the so-called straight-line response set, that is, the tendency to answer every item of the battery with the same answer category. In a battery whose items have inverted semantic polarity, a set of identical answers clearly shows that the respondent does not adequately consider the meaning of the questions. The measure is dichotomous: the variable is equal to 1 if the respondent answers all questions9 the same way and 0 otherwise.
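As a concrete illustration, the indicator can be computed as in this minimal sketch (Python; the list-based storage of the six answers and the "DK" label for "Don't know" are illustrative assumptions):

def straight_line(answers):
    # 1 if all six answers are identical (including all "Don't know",
    # see footnote 9), 0 otherwise.
    return int(len(set(answers)) == 1)

print(straight_line([7, 7, 7, 7, 7, 7]))  # 1: same category everywhere
print(straight_line(["DK"] * 6))          # 1: all "Don't know"
print(straight_line([2, 8, 3, 7, 6, 1]))  # 0: differentiated answers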
The second measure of quality is defined at the aggregate level by means of the analysis of the internal consistency (Cronbach's alpha) of the 6-item scale, suitably recoded so that all items have the same semantic polarity.10 Higher values of the coefficient indicate higher coherence among the answers to the battery; thus, the higher the value of the coefficient, the higher the quality of the group of respondents on which the coefficient has been calculated. Results report alpha's 95% confidence intervals, obtained by means of a bootstrap procedure on the original data (Padilla, Divers and Newton, 2012).11

9 The value 1 is also attributed to respondents who always answered "Don't know".
10 The measures have been calculated with listwise deletion, which led to the deletion of 11% of the original sample.
11 Synthetically, the bootstrap is a technique that allows inferring standard errors and the distribution of an estimate when the underlying distribution is unknown. The technique is based on a set of random resamples that lead to a distribution of estimated parameters, from which the quantities of interest can be inferred. In our case, the bootstrapped confidence intervals are calculated by means of 400 resamples.
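A minimal sketch of this computation (Python/NumPy; the simulated data and the percentile form of the confidence interval are illustrative assumptions, not the authors' exact procedure) is the following:

import numpy as np

def cronbach_alpha(items):
    # items: n_respondents x k_items matrix of listwise-complete answers.
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def bootstrap_alpha_ci(items, n_boot=400, seed=0):
    # Percentile 95% CI from n_boot resamples of respondents (footnote 11).
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    alphas = [cronbach_alpha(items[rng.integers(0, n, n)]) for _ in range(n_boot)]
    return np.percentile(alphas, [2.5, 97.5])

# Illustrative use on simulated 0-10 answers to the six items; items 1, 3
# and 6 are reversed so that all items share the same semantic polarity.
rng = np.random.default_rng(1)
data = rng.integers(0, 11, size=(500, 6)).astype(float)
data[:, [0, 2, 5]] = 10 - data[:, [0, 2, 5]]
print(cronbach_alpha(data), bootstrap_alpha_ci(data))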
Results
The first analysis focuses on the passage rates of the screeners. Table 2 shows the percentage of respondents who pass the screener in every experimental group. The data clearly show that the task is harder to accomplish as the screener becomes more difficult (and, thus, the cognitive load of the question is higher). In the easy version the screener is passed by 50% of subjects, while only 22% pass the screener in its most difficult form. The first hypothesis is thus confirmed. This latter result is particularly relevant for our aims, since the most complicated screener is the one closest to those proposed in the literature. If the aim of the instrument is to distinguish between workers and shirkers, it is clear that such a level of complexity makes the screener extremely selective, and only a small share of the sample (1 out of 5) is able to pass the test. In addition, we can underline two other elements of particular interest:
- Overall, respondents passing the screener are few, about 1 out of 3 (35%);
- Passage rates are not influenced by the position of the screener. This assures us that the stealth democracy battery does not influence the task described in the screener.
Table 2. Passage rate (%) by cognitive load and position

                          Position
Cognitive load            Pre (N=1496)   Post (N=1504)   By cognitive load   N
Hard                      21             22              22                  1025
Medium                    36             33              35                  994
Easy                      48             51              50                  981
Passage rate by position  35             35              35                  3000
As regards the ability of the instrument to distinguish between workers and shirkers, table 3 clearly shows that respondents who passed the screener produce higher-quality answers. In this group, the straight-line response set is practically absent, while it involves 13% of the respondents who did not pass the screener (the difference is significant, p < .001). The same picture emerges from the values of Cronbach's alpha in the two groups. For people who passed the screener, the lower bound of the confidence interval is above .70, a value considered the minimum for research employing non-validated scales (Peterson, 1994). For the other respondents, identified by the screener as not attentive in answering questions, the confidence interval lies entirely below this threshold. We can thus confirm our second hypothesis.
Table 3: Quality of the answers to the battery on attitudes toward democracy, by screener outcome

Outcome    % Response set(a)   N      Alpha 95% C.I.   N(b)
Positive   1.3                 1054   .72 - .77        978
Negative   13.2                1946   .57 - .64        1683

Notes: (a) Chi2(1) = 116.8; p < .01. (b) Listwise deletion.
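The test reported in note (a) can be reproduced approximately from the published figures alone, as in this sketch (Python/SciPy; the response-set counts are obtained by rounding 1.3% of 1,054 and 13.2% of 1,946, so the statistic only approximates the tabled value):

from scipy.stats import chi2_contingency

#          response set   no response set
counts = [[14, 1040],   # positive outcome (~1.3% of 1054)
          [257, 1689]]  # negative outcome (~13.2% of 1946)
chi2, p, dof, expected = chi2_contingency(counts)
print(f"Chi2({dof}) = {chi2:.1f}, p = {p:.1e}")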
The results of the experimental manipulation of the screener's position lead us to reject our third hypothesis, since no activation effect seems to be detected. Table 4 clearly shows that the quality of the answers on democracy attitudes, both in terms of response set and alpha, does not vary between the two groups according to the position of the screener. This outcome is confirmed also when analyzing only those respondents who passed the screener and consequently realized the presence of the trap question.
Table 4: Quality of the answers to the battery on attitudes toward democracy, by screener position (whole sample and positive-outcome respondents)

           % Response set                       Alpha 95% C.I.
Position   Whole sample(a)   Positive only(b)   Whole sample   Positive only
Pre        9.2               1.7                .63 - .69      .69 - .77
Post       8.8               0.9                .63 - .69      .72 - .79
N          3000              1054               2661(c)        978(c)

Notes: (a) Chi2(1) = 0.2; p = .67. (b) Chi2(1) = 1.2; p = .27. (c) Listwise deletion.
Finally, we have to evaluate the relation between cognitive load and the quality of the respondents who pass the test. Our expectation - inferred from the research practice that privileges screeners several times longer than usual survey questions - is that the harder the trial, the higher the quality of the respondents who pass it. The data obtained by manipulating the cognitive load do not point in this direction and lead us to reject this hypothesis as well. In terms of response set, the quality of respondents who pass the screener is the same, independently of the cognitive load (table 5, first column). Therefore, by employing overly complex screeners we obtain a sub-optimal result: "good" respondents fail the test, and the number of people identified as quality respondents drops considerably. This confirms the result shown in table 2, with passage rates dropping from 50% to 22% when considering the easy and the difficult screener respectively.12
Table 5: Response set percentage in the battery on attitudes toward democracy, by screener cognitive load and outcome

                 Positive outcome           Negative outcome
Cognitive load   % Response set(a)   N      % Response set(b)   N
Hard             1.4                 221    11.3                804
Medium           0.6                 344    11.7                650
Easy             1.8                 489    18.1                492

Notes: (a) Chi2(2) = 2.4; p = .30. (b) Chi2(2) = 14.1; p < .01.
With respect to the fourth hypothesis, similar results come from the evaluation of Cronbach's alpha. The analysis of the confidence intervals, presented graphically in figure 4, confirms the absence of a relation between cognitive load and quality of the answers, once respondents are distinguished by the outcome of the task. This last result is particularly relevant when we have to define and calibrate our instrument, since increasing the cognitive load does not seem to offer advantages in terms of the capacity to discriminate the quality of respondents. On the contrary, it seems to excessively reduce the number of respondents identified as attentive enough to pass the test.
12 As regards the response set, relatively easy screeners have the appealing property of better selecting shirkers, as can be seen in table 5: 18% of those who failed the easy screener answered the battery questions with a straight-line response set, while this happens for only 12% of those who did not pass the medium and hard tasks. This relation, however, is not confirmed by Cronbach's alpha.
Figure 4. 95% confidence intervals (bootstrap) of Cronbach's alpha for the battery on attitudes toward democracy, by screener cognitive load and outcome.
Conclusion and discussion

Nowadays, online surveys represent one of the main instruments for collecting data in social and political research (Callegaro et al., 2015: 4), allowing researchers to conduct a large number of interviews quickly and cheaply (Loosveldt and Sonck, 2008: 96). This advantage has a cost: the absence of an interviewer can lead to reduced quality of the data collected. For this reason, the introduction of survey instruments aimed at monitoring respondents' attention while answering a survey has been suggested. These instruments, known as screeners, are tests in which the respondent must complete a task by following instructions hidden inside the wording of a question, which contains a variable amount of misleading information. The screener is also known as a "trap question" because the misleading information is aimed at diverting the respondent from his/her task.
The employment of screeners is steadily growing (Berinsky et al., 2014), but knowledge of their empirical working is still narrow and mainly limited to ad hoc experimental studies. The experiment proposed in this article aims to investigate, in the context of a real survey, how a screener actually works, and to calibrate the cognitive load of the task according to its capacity to identify quality respondents. The experimental design thus randomizes two factors: the cognitive load and the position of the screener.
The first result of the study is that only a limited number of respondents seem to read the wording of the question carefully. Generally speaking, less than half of the sample passes the screener successfully. The rate is equal to 50% for the easiest wording and drops to 22% when the wording is more complex and the trap more insidious. This result prompts reflection on the quality of the answers that people produce in online surveys, particularly when they face questions more complex than the rest of the survey. The result clearly represents an alarm bell for the quality of non-conventional questions - such as, for instance, vignette studies, which present on average long and complex wordings that vary in a systematic way. Answers to these questions can be subject to large errors due to the low attention of the respondent and could be prone to large biases.
Our findings are even more significant if we consider that passing the task is associated with the quality of the answers that respondents give to other survey questions. This means that the screener is actually able to distinguish between workers and shirkers, and that it can thus be employed if we want to keep only the better respondents in the analysis. Nevertheless, this choice has a huge cost, since excluding the people who do not pass the screener erases large portions of the sample and produces a considerable drop in the statistical power of the study (in addition to discarding a large amount of information). In employing the screener, a good calibration of the cognitive load is thus important, in order to avoid excessively selective choices with respect to respondents' quality.
Concerning this aspect, the experiment presented in this article showed that the quality of the answers of the people who correctly passed the easy screener is substantially identical to that of those who passed the medium one, as well as the most complex one. In order to differentiate among respondents, it seems sufficient to introduce tasks characterized by a limited cognitive load, with the advantage of identifying a higher number of quality respondents. This result is inconsistent with general research practice, where the suggested screeners are usually very complex and require more cognitive strain than the other questions in the survey. Our article thus recommends rethinking the calibration and empirical working of screeners, which, in their common format, are not able to do their job efficiently. The employment of brief and relatively simple screeners, less invasive and more easily employable in different contexts, turns out to be advisable. Starting from this introductory work, further research on this topic will allow further testing of the calibration of the instrument, in order to arrive at an optimal wording, able to distinguish efficiently among respondents.
Finally, the results of the manipulation of the screener's position show that, in the context of a real survey, the instrument does not act as an activator of respondents. Exposure to the screener does not, indeed, affect the quality of the answers to the questions that immediately follow. This result is not in line with previous studies, which argued that screeners can behave as activation tools. This could be due to the different conditions in which the experiments were carried out. In particular, Hauser and Schwarz (2015) refer to an ad hoc experimental study, with a relatively small sample (n < 400), paid respondents and a brief questionnaire, while our study was carried out on a large sample in real survey conditions. The question, however, remains open, and further work is necessary to confirm the findings of our experiment.
References
ATZMÜLLER, C., STEINER, P. M. (2010). Experimental vignette studies in survey research. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 6(3), 128-138.
BERINSKY, A. J., MARGOLIS, M. F., SANCES, M. W. (2014). Separating the shirkers from
the workers? Making sure respondents pay attention on self-administered surveys. American Journal of
Political Science, 58(3), 739-753.
BERINSKY, A. J., MARGOLIS, M. F., SANCES, M. W. (In press). Can we turn shirkers into workers? Journal of Experimental Social Psychology.
CALLEGARO, M., LOZAR MANFREDA, K., VEHOVAR, V. (2015). Web survey
methodology. London: Sage.
CHRISTIAN, L. M., DILLMAN, D. A., SMYTH, J. D. (2007). Helping respondents get it right
the first time: the influence of words, symbols, and graphics in web surveys. Public Opinion Quarterly,
71(1), 113-125.
CORBETTA, P. (2015). La ricerca sociale: metodologia e tecniche. Vol. 2: Le tecniche quantitative (2nd ed.). Bologna: il Mulino.
GOODMAN, J. K., CRYDER, C. E., CHEEMA, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26(3), 213-224.

GROVES, R. M. (1989). Survey errors and survey costs. New York: Wiley-Interscience.

GUESS, A. M. (2015). Measure for measure: An experimental test of online political media exposure. Political Analysis, 23(1), 59-75.
HAUSER, D. J., SCHWARZ, N. (2015). It's a trap! Instructional manipulation checks prompt systematic thinking on "tricky" tasks. SAGE Open, 5, 1-6.
HIBBING, J. R., THEISS-MORSE, E. (2002). Stealth democracy: Americans' beliefs about
how government should work. Cambridge: Cambridge University Press.
KAHNEMAN, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
KOOL, W., MCGUIRE, J. T., ROSEN, Z. B., BOTVINICK, M. M. (2010). Decision making and the avoidance of cognitive demand. Journal of Experimental Psychology: General, 139(4), 665-682.
KROSNICK, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213-236.
LENZNER, T., KACZMIREK, L., LENZNER, A. (2010). Cognitive burden of survey
questions and response times: A psycholinguistic experiment. Applied Cognitive Psychology, 24(7),
1003-1020.
LOOSVELDT, G., SONCK, N. (2008). An evaluation of the weighting procedures for an
online access panel survey. Survey Research Methods, 2(2), 93-105.
MEADE, A. W., CRAIG, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437-455.
MUTZ, D.C. (2011). Population-based survey experiments. Princeton: Princeton University
Press.
OPPENHEIMER, D. M., MEYVIS, T., DAVIDENKO, N. (2009). Instructional manipulation
checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology,
45(4), 867-872.
PADILLA, M. A., DIVERS, J., NEWTON, M. (2012). Coefficient alpha bootstrap confidence interval under nonnormality. Applied Psychological Measurement, 36(5), 331-348.
PETERSON, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21(2), 381-391.
TOURANGEAU, R., RIPS, L. J., RASINSKI, K. (2000). The psychology of survey response.
Cambridge: Cambridge University Press.