Stuck in the Wisdom of Crowds:
Information, Knowledge, and Heuristics
Yunwen He† Jaimie W. Lien‡ Jie Zheng±
Initial Version: July 1st, 2020
Current Draft: July 13th, 2021
Abstract:1
Collective knowledge is significantly affected by information about others’ viewpoints. However,
under what conditions does the “wisdom of crowds” help versus harm knowledge of factual
information? In this experiment, we present subjects with the task of answering 50 factual true or false
trivia questions, with the potential opportunity to revise their answers after receiving different levels
of information about other subjects’ answers and self-assessed confidence levels from an independent
session. We find that information about others’ answers improves performance on easy questions, but
tends to harm performance on difficult questions. In addition, information about answers provided by
other subjects mainly improves performance for those with lower initial knowledge levels. Subjects in
our Moderate-Information condition outperform those in either the Low or High-Information
conditions, implying an optimal level of social information provision, in which the Majority Rule and
Maximum Confidence rule complement one another. Although the Maximum Confidence rule can
improve performance, yielding the lowest overall error rate out of the heuristics considered, subjects
generally underutilize the information on other subjects’ confidence levels in favor of the Majority
Rule heuristic. These findings shed light on possible directions for policies that can cultivate factual
knowledge on online opinion platforms.
JEL Codes: C91, D81, D83
Keywords: wisdom of crowds, information provision, decision heuristics, majority rule, surprising
popularity, maximum confidence
1 † He: Department of Economics, School of Economics and Management, Tsinghua University, Beijing, China,
[email protected]; ‡ Lien: Department of Decision Sciences and Managerial Economics, The Chinese
University of Hong Kong, Shatin, Hong Kong, China, [email protected]; ± Zheng: Department of Economics,
School of Economics and Management, Tsinghua University, Beijing, China, [email protected]. Guangying Chen’s
excellent research assistance is greatly appreciated. We are thankful for comments from Shouying Liu, Xianghong Wang,
Chun-lei Yang, Maoliang Ye, Boyu Zhang, and session participants at the China Behavioral and Experimental Economics
Forum (2020) in Beijing. We gratefully acknowledge funding from the National Natural Science Foundation of China,
Tsinghua University, The Chinese University of Hong Kong, and Research Grants Council of Hong Kong. All errors are
our own.
1. Introduction
Many statements can be challenging to verify or refute on the spot. Such uncertainties about facts
are the origin of many controversies and disagreements. In such situations, the perspectives,
assessments, and comments of other people may be heavily used by decision-makers in order to
formulate their own judgments. Particularly in the digital era, online news and discussion forums have
considerably expanded decision-makers’ access to the opinions and suggestions of others.
Previous studies have demonstrated that groups can outperform the average in arriving at correct
answers, even outperforming the best individual response, which is often referred to as the “wisdom
of the crowd” effect. Utilizing collective intelligence has the potential to solve many important
decision-making problems regarding predictions or establishing facts, such as predicting stock markets
(Chen et al., 2014; Gottschlich & Hinz, 2014), political changes (Wolfers & Zitzewitz, 2004; Mellers
et al., 2014), or weather conditions (Baars & Mass, 2005), alleviating microfinance problems (Yum et
al., 2012), and boosting medical diagnostics (Wolf et al., 2015; Kurvers et al., 2016).
In most studies on the wisdom of crowds, the commonly used aggregation method is majority
voting (or averaging, for continuous estimation problems). However, decisions based solely on the
majority opinion are often criticized for ignoring diverse views and discarding potentially valuable
information content held by a minority of individuals. As a consequence, the effectiveness of a
majority rule approach may be limited to simpler tasks, while failing in many complicated settings
(Morton et al., 2019). For similar reasons, the simple averaging of judgments bears the risk of
neglecting experts by assigning equal weight to each person, which reduces the full potential of
benefitting from the wisdom of crowds (Lee et al., 2011).
Algorithms which have the potential to better extract the wisdom of crowds have been proposed;
for example, weighting judgments by individuals’ confidence levels (Cooke, 1991; Koriat, 2012) or
knowledge levels (Raykar et al., 2010; Aspinall, 2010; Wang et al., 2011; Budescu & Chen, 2014),
selecting the answer which is more popular than predicted (Prelec et al., 2017), pivoting and
recombining private and shared information optimally (Palley & Soll, 2019), and others. However,
these approaches are not guaranteed to succeed in practice, since the heuristics that
individuals actually use to gain access to the wisdom of crowds have not been identified and
tested empirically.
In this paper, we establish several facts about how individuals form their beliefs about statements,
when provided with information about other individuals’ beliefs and decisions. We also study if and
when individuals change their minds when presented with differing degrees of information. We
conduct comparisons between three treatments with different levels of information, to identify the
determinants of subjects’ answer changes. The comparison between the baseline treatment without
information and the three successively more informative treatments allows us to study how access
to other respondents’ information influences subjects’ final performance.
A key research area relevant to our work is the impact of social information on individuals’
decision making. Lorenz et al. (2011) find that knowledge of other respondents’ estimates is
detrimental to the “wisdom of crowd” effect by narrowing the diversity of opinions over periods in a
quantity estimation task. Bazazi et al. (2019) provide further evidence for a negative effect, while
showing that diminished crowd wisdom and increased social conformity are only present in the
individual incentive structure and not in the collective incentive structure. In the online experiment of
artificial investment games conducted by Chen et al. (2020), forecasts about the financial market are
more concentrated within groups with forecast sharing, failing to reduce the prediction error, but
generating higher variability across groups. However, social information does not necessarily erode
collective intelligence when individual decisions are made sequentially, due to the tendency to imitate
successful individuals (King et al., 2012); or when information is provided in a complete,
disaggregated format (Silva & Correia, 2016; Jayles & Kurvers, 2020). The size of the group and the
size of the majority are also potentially important determinants of social influence (Brown, 2000).
While most of these prior studies focus on the conditions that increase or decrease the
performance of collective knowledge, our study seeks to establish, at a more detailed level, when and
how “crowd wisdom” helps or harms decisions individually. Prior experimental studies primarily focus
on quantitative estimation tasks while varying the form of shared information in different treatments.
However, the literature largely lacks studies focusing on the effect of different levels of information
along multiple dimensions, and their impact on successful persuasion of individuals through changes
to prior answers.
We address this important issue by implementing simple binary choice tasks, in which subjects
are asked to give their incentivized answers to 50 factual and uncontroversial true/false trivia questions.
Following Prelec et al. (2017), we further incentivize the subjects to report their confidence in their
own answer, their estimate of the proportion of others giving the same answer, and their estimate of the average
confidence in their own answers reported by all participants. In the information provision treatments, we
additionally provide subjects with the opportunity to revise their answers, after having received
different levels of aggregated information from low to high, generated from a baseline treatment.
Firstly, we show that information provision leads to improvement in accuracy on easy questions,
and undermines it among difficult questions, as measured by the baseline correct rates of each question.
This result follows the spirit of the insights in Lorenz et al. (2011) and Tump et al. (2018), which
propose that social information has an asymmetric impact, with better average individual performance
on easy questions and poorer average individual performance on difficult questions.
We also find that subjects with lower knowledge levels generally benefit the most from social
information in terms of improvement in the accuracy of their answers. Furthermore, confident subjects
lose the potential opportunity to benefit from information since they are less influenced by it, which is
consistent with the notion that confidence and accuracy are related to sensitivity levels to social
influence (Yaniv & Milyavsky, 2007; Chacoma & Zanette, 2015; Jayles et al., 2017).
We find that there exists an optimal level of information provision in our setting, reminiscent of
studies that find an optimal group size for exchanging opinions in quantity estimation tasks (Jayles
& Kurvers, 2020). The moderate-level information condition, which reveals other participants’ choice
frequencies and confidence levels, generates the best performance, avoiding information overload
while enabling comparisons between options along multiple dimensions.
To examine how subjects use information in revising their opinions, we focus on three specific
heuristics which can be tested using the information provided to subjects in the experiment: Majority
Rule, the Maximum Confidence rule and the Surprising Popularity rule. Our experimental results show
that Majority Rule is the most attractive heuristic for decision-makers. Furthermore, subjects rely more
heavily on Majority Rule over the Maximum Confidence rule in our Full-Information treatment
compared to the Moderate-Information treatment. This indicates that the greater complexity of
information processing in the Full-Information treatment leads individuals to a reduced processing of
the information available. We find little evidence in our experiment for the use of either the standard Surprising Popularity rule
proposed by Prelec et al. (2017) (that is, comparing the average estimated percentage of “True”
answers with the actual percentage) or the egocentric Surprising Popularity rule (that is,
comparing a subject’s own estimated percentage of “True” answers with the actual percentage).
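The three heuristics named above can be sketched for a single true/false statement as follows. This is only an illustrative sketch: the `QuestionInfo` layout and the function names are hypothetical, the Maximum Confidence rule is assumed here to mean "follow the side whose supporters report the higher average confidence", and the tie-breaking choices are assumptions rather than anything specified in the paper.

```python
from dataclasses import dataclass


@dataclass
class QuestionInfo:
    """Aggregated baseline information for one statement (hypothetical layout)."""
    share_true: float            # actual fraction of baseline subjects answering "True"
    conf_true: float             # average confidence of those answering "True"
    conf_false: float            # average confidence of those answering "False"
    predicted_share_true: float  # average estimate of the fraction answering "True"


def majority_rule(q: QuestionInfo) -> bool:
    # Follow the more frequent answer; ties go to "True" (assumption).
    return q.share_true >= 0.5


def maximum_confidence(q: QuestionInfo) -> bool:
    # Follow the side with the higher average confidence (assumed reading of the rule).
    return q.conf_true >= q.conf_false


def surprising_popularity(q: QuestionInfo) -> bool:
    # "True" is surprisingly popular if its actual share exceeds the predicted share.
    return q.share_true > q.predicted_share_true


# Made-up numbers for illustration only.
example = QuestionInfo(share_true=0.35, conf_true=0.85,
                       conf_false=0.65, predicted_share_true=0.25)
answers = {
    "majority": majority_rule(example),
    "max_confidence": maximum_confidence(example),
    "surprising_popularity": surprising_popularity(example),
}
print(answers)
```

In this made-up case the heuristics disagree: the majority favors "False", while the other two rules point to "True", which is the kind of conflict the treatments are designed to expose.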
Despite its prevalent usage, Majority Rule has serious limitations and tends to harm knowledge
on difficult questions. Our regression analyses of correct rates demonstrate that although information
provision generally plays a positive role, a higher proportion of participants giving the same answer
as the subject’s own is associated with inferior final performance, especially when the relevant information is
revealed. A joint analysis of performance across the two stages (initial answer and revision) also indicates that
the relative risk ratio of always giving the wrong answer is significantly higher when subjects’ original
view is widely accepted.
Our study fills a gap in the literature on the wisdom of crowds by examining the effects of social
conditions, in terms of the amount of information provided along multiple dimensions, on individual
factual knowledge and opinion changes. Furthermore, we focus on the context of binary choice
questions, which provides a convenient framework for analyzing heuristics utilized on factual
problems. Our experimental approach reflects many realistic situations, such as voting, making
investments, and fact-finding in online discussion forums, where decision-makers need to take a
position, after being potentially exposed to social information. Thus, our work complements previous
studies by showing that more social information is not always better for factual knowledge, and
furthermore, that there tends to be a strong individual bias in favor of following the majority.
The rest of the paper is organized as follows: Section 2 describes the experimental design. Section
3 evaluates the final performance in our task at the question level and the subject level, as well as
comparing the accuracies of three heuristic rules. Section 4 investigates the determinants of subjects’
answer changes and the associated rules. Section 5 discusses the overall impact of information
provision and initial answers on final performance. Section 6 concludes and discusses.
2. Experimental Design
2.1 Experiment treatments
We implement an experiment with 50 factual true/false trivia questions of varying expected
difficulty that test participants’ real-world knowledge. Participants are incentivized to answer the 50
questions correctly to the best of their ability. Subjects are also asked to report their confidence in their
own answer (ranging from 50% to 100%), their estimate of the proportion of others giving the same
answer (ranging from 1% to 100%), and their estimate of the average confidence of all participants in
their own answers (ranging from 50% to 100%).
Quiz questions and follow-up questions are incentivized in the following way: Subjects receive 1
RMB for each quiz question that is correctly answered. For their self-reported confidence, when it is
smaller than a random number Ri% generated by the computer from the interval [0%, 100%], they will
additionally receive 2×Ri% RMB; otherwise they will receive 0.2 RMB if the corresponding quiz
question is correctly answered, or 0 if the corresponding quiz question is incorrectly answered. This is
exactly the Becker-DeGroot-Marschak (1964) mechanism utilized to incentivize subjects to report
their real confidence levels. Finally, 0.2 RMB is added to their payment if their
estimate of the proportion of others giving the same answer, or of the average confidence of all
participants in their own answers, is within 5 percentage points of the actual value. Detailed descriptions of the experimental
instructions and the true/false questions can be found in the Appendix.
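The per-question payment rule described above can be written out as a short sketch. It follows the description as printed: the multiplier 2 on the random number and the 0.2 RMB amounts are taken from the text and exposed as parameters, the function and argument names are hypothetical, and paying the 0.2 RMB bonus separately for each accurate estimate is an assumption.

```python
def confidence_payment(confidence: float, r: float, correct: bool,
                       lottery_multiplier: float = 2.0,
                       bonus: float = 0.2) -> float:
    """Confidence-related payment in RMB for one question.

    confidence and r are fractions in [0, 1]; r is the computer-drawn random number.
    """
    if confidence < r:
        # Reported confidence is below the random draw: payment depends on r only.
        return lottery_multiplier * r
    # Otherwise the payment depends on whether the quiz answer is correct.
    return bonus if correct else 0.0


def question_payment(correct: bool, confidence: float, r: float,
                     estimate_errors: list[float]) -> float:
    """Total payment in RMB for one question and its follow-up questions."""
    total = 1.0 if correct else 0.0  # 1 RMB per correct quiz answer
    total += confidence_payment(confidence, r, correct)
    # Assumption: each estimate within 5 percentage points earns 0.2 RMB.
    total += sum(0.2 for err in estimate_errors if abs(err) <= 0.05)
    return total


# Made-up inputs: correct answer, 90% confidence, random draw of 60%,
# one estimate off by 3 points and one off by 10 points.
print(round(question_payment(True, 0.9, 0.6, [0.03, 0.10]), 2))
```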
In the baseline treatment (NI; No-Information), the only experimental task for subjects is to submit
their answers and beliefs as described above. They receive monetary payments based on their
performance on all quiz questions and corresponding follow-up questions on beliefs.
In the three information provision treatments, subjects first complete the task with the same rules
as in the NI treatment, without being informed that there will be a second stage. After the first stage
has ended, they are informed that they are provided with an opportunity to revise their answers after
having received some information generated from a session conducted earlier. The information
presented to subjects in this second stage is from the baseline NI treatment. The monetary payments
are then based on their finalized answers to the quiz questions as well as their self-reported estimates
from the first stage.
In the Low-Information treatment (LI), subjects are only informed of the actual choice distribution
of answers given by the respondents in the baseline treatment. In the Moderate-Information treatment
(MI), information about the average confidence levels of those who agree and disagree with the
statement in the baseline treatment is additionally provided. In the Full-Information treatment (FI),
in addition to the actual choice distribution and the average self-reported confidence levels, the
baseline subjects’ estimates of others’ answers and confidence levels are also provided. Table 1 summarizes the
treatments.
Table 1: Treatment Overview
Treatment | First stage: question answering and belief elicitation | Second stage: information and answer revision opportunity | Information provided in the second stage (generated from NI treatment)
NI (No Information) | Yes | No | -
LI (Low Information) | Yes | Yes | 1) Actual choice frequencies of answers.
MI (Moderate Information) | Yes | Yes | 1) Actual choice frequencies of answers; 2) Average confidence level of those who agree and disagree with the statement.
FI (Full Information) | Yes | Yes | 1) Actual choice frequencies of answers; 2) Average confidence level of those who agree and disagree with the statement; 3) Average estimate about percent agreement with own answer, by those who agree and disagree with the statement; 4) Average estimate about average confidence level in own answer, by those who agree and disagree with the statement.
2.2 Experiment procedure
Four experimental sessions with 128 subjects in total (32 subjects per treatment) were conducted
on December 22nd, 2019 at the experimental laboratory of Beijing Foreign Studies University, with
university undergraduate students as the subject pool. The experiment was programmed and conducted
using the software z-Tree (Fischbacher, 2007).
At the beginning of each session, subjects were seated in front of their terminal and received a
copy of the experiment instructions. The terminals were isolated via partitions, and verbal
communication between subjects was not allowed. Subjects were instructed not to access their mobile
devices while participating in the experiment. The experiment began once everyone understood the experiment rules and
payoff functions.
Each session lasted approximately 60 minutes, and the average payment received was 43.43 RMB
per subject (including a 10 RMB show-up fee), which is within the standard range of payment for
experiment participation in mainland China.
3. Analysis of Performance
3.1 Overall evaluation
Figure 1 (left panel) presents the mean of correct rates of each stage separated by treatment. It is
clear that there is no significant difference across treatments in the original (first-stage) mean correct rate, which
is close to 0.5 (see footnote 2). In fact, the correct answer for 24 out of the 50 questions is “True”, so a participant
can achieve roughly the average performance by always choosing “True” (or “False”).
However, the gaps in the correct rates across treatments are widened in the second stage, as seen
in the right panel. With the answer revision possibility, the final correct rates in the three information
provision treatments are all strictly higher than that in the control group (No-Information treatment),
which shows that information generally plays a positive role. On the other hand, the correct rate is not
monotonically increasing in the amount of information provision, showing an inverse-U shape as a
function of information provided.
The information provided in the Low-Information treatment only leads to a subtle improvement
in subjects’ performance, suggesting that merely knowing the majority opinion renders little help. The
correct rate in the Moderate-Information treatment is the highest, indicating that knowing the average
confidence of those who agree and disagree with the statement is effective for answer accuracy.
However, the additional information about other respondents’ estimates and beliefs as provided in the
Full-Information treatment, worsens the outcome compared with the Moderate-Information treatment.
The figures illustrate the aggregate results; detailed statistical tests are provided in
the next section.
2 Specifically, for two-sided t-tests at the question level: No-Information vs Low-Information: p=0.524; No-Information vs Moderate-
Information: p=0.235; No-Information vs Full-Information: p=0.594; Low-Information vs Moderate-Information: p=0.403; Moderate-
Information vs Full-Information: p=0.436; Low-Information vs Full-Information: p=0.915. Two-sided Welch t-test at the subject level:
No-Information vs Low-Information: p=0.698; No-Information vs Moderate-Information: p=0.372; No-Information vs Full-Information:
p=0.762; Low-Information vs Moderate-Information: p=0.542; Moderate-Information vs Full-Information: p=0.499; Low-Information
vs Full-Information: p=0.929.
Figure 1: Mean of Correct Rates by Treatment
Note: Confidence intervals are shown at the 95 percent level.
Figure 2 presents the standard deviation of correct rates at the question level (top panels) and at
the subject level (bottom panels) respectively. Compared with the first stage (left panels), the variation
in the question-based correct rates becomes larger in the second stage (right panels), while the variation
in the subject-based correct rates declines moderately. Later, we show that being provided
with the chance to revise answers makes performance even more dispersed across individual questions.
In particular, the correct rates of difficult questions are further lowered, because the minority who
originally give the right answer are misled by the majority opinion in the second stage. On the other
hand, the correct rates of easy questions are further raised, because the minority who originally give
the incorrect answer benefit from the wisdom of crowds in the second stage. Furthermore, the
performance at the subject level becomes more uniform with information disclosure. Subjects who
initially perform outstandingly become worse off, while those who initially perform poorly then
improve in the second stage.
Figure 2: Standard Deviation of Correct Rates by Treatment
3.2 Question-based Analysis
We use the ranking of question difficulties based on performance in the No-Information treatment
throughout the analysis below. For example, questions such as “The last name of Confucius is Zi.”
(Question 50) and “The Italian scientist and astronomer Copernicus was burnt to die for his adherence
to the heliocentric theory.” (Question 22) are difficult for subjects, with correct rates below 20%.
Examples of easy questions are “The Pyrenees is a natural border between Spain and France.”
(Question 20) and “The father of Dayu, who is the hero of controlling flood in the history of China,
was called Gu.” (Question 23), for which the correct rates are above 80%.
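As a rough illustration of the difficulty ranking used here, the sketch below computes per-question correct rates from a baseline answer matrix and orders questions from hardest to easiest. The data layout, the function names, and the toy answers are hypothetical and are not taken from the experiment.

```python
def correct_rates(answers: list[list[bool]], key: list[bool]) -> list[float]:
    """Share of subjects answering each question correctly.

    answers[s][q] is subject s's answer to question q; key[q] is the correct answer.
    """
    n_subjects = len(answers)
    return [
        sum(subject[q] == key[q] for subject in answers) / n_subjects
        for q in range(len(key))
    ]


def rank_by_difficulty(rates: list[float]) -> list[int]:
    """Question indices ordered from lowest to highest baseline correct rate."""
    return sorted(range(len(rates)), key=lambda q: rates[q])


# Toy baseline session: 4 subjects, 3 questions (made-up data).
key = [True, False, True]
baseline = [
    [True, True, False],
    [True, True, True],
    [True, False, False],
    [False, True, False],
]
rates = correct_rates(baseline, key)
ranking = rank_by_difficulty(rates)
print(rates, ranking)
```

Ranking on the baseline session only, as the text does, keeps the difficulty measure independent of the information treatments being compared.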
Given that there are no significant differences in the correct rates across treatments in the first
stage, we focus our comparisons on the second stage, and can attribute the outcome differences to the
information disclosure process. In the analyses below, we present results from the raw data and use
both distributional and means tests to examine the statistical differences.
3.2.1 Treatment Effects of Information
We first compare the three information provision treatments with the baseline treatment to test
for the impact of information provision.
Recall that in the Low-Information treatment, for each question, subjects have access to the
information about the percentage of subjects in the No Information treatment who agree or disagree
with the statement. Figure 3 (upper left panels) depicts the gap in correct rates for each question between the
Low-Information treatment and the No-Information treatment (upper sub-panel). Subjects in the
Low-Information treatment perform better on about half of the questions and worse on the other half. Since we can
observe subjects’ original answers in the first stage, before information disclosure, we also plot
the change in correct rates within the Low-Information treatment (lower sub-panel). It turns out to
share a similar pattern with the gap, which reinforces that there are no systematic differences among
participants in the two treatments.
Figure 3: Comparisons between the Information Treatments and the No-Information
Treatment
In the Moderate-Information treatment, for each question, subjects obtain the information not
only about the actual choice distribution of answers in the No-Information treatment, but also about
the average confidence levels of those who agree and disagree with the statement, respectively. It is
clear from Figure 3 (upper right panel) that subjects in the Moderate-Information treatment perform
better on most of the easy questions and partially better on difficult questions compared with those in
the No-Information treatment. Also, they improve their second-stage performance within the
Moderate-Information treatment (lower sub-panel), mainly on the relatively straightforward questions,
which indicates that information about other participants’ confidence levels helps subjects make more
frequent correct judgements.
In the Full-Information treatment, the information revealed in the second stage includes the actual
choice distribution of answers, the average confidence of those who agree or disagree with the
statement, as well as the average estimate of the percent agreement with own answer, and average
estimate of the mean confidence from the No-Information treatment. Once again, subjects in the Full-
Information treatment perform worse on difficult questions and better on easy questions compared
with those in the No-Information treatment as well as compared to the first stage within treatment
(Figure 3, bottom panel).
As for the general comparison, the statistical tests indicate significant differences in final correct
rates between the No-Information treatment and the Moderate-Information treatment (Wilcoxon matched-pairs signed-ranks test, p=0.002;
one-sided t-test, p=0.001) and between the No-Information treatment and the Full-Information treatment (Wilcoxon matched-pairs signed-ranks test,
p=0.009; one-sided t-test, p=0.004), yet no significant difference is found between the Low-Information
treatment and the No-Information treatment (Wilcoxon matched-pairs signed-ranks test,
p=0.333; two-sided t-test, p=0.359).
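The question-level comparisons above pair each question's correct rate in one treatment with the same question's rate in another. The sketch below shows how the two test statistics could be computed from such pairs using only the standard library. It stops at the statistics, since p-values would normally come from a statistics package. The function names are hypothetical, and the example rates (in percentage points) are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_differences(rates_a, rates_b):
    """Per-question difference (a minus b) between two paired lists of correct rates."""
    return [a - b for a, b in zip(rates_a, rates_b)]


def paired_t_statistic(diffs):
    """Paired t statistic: mean difference divided by its standard error."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def signed_rank_sums(diffs):
    """Wilcoxon signed-rank sums (W+, W-): zero differences dropped, ties get average ranks."""
    ordered = sorted((d for d in diffs if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the block of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Made-up correct rates (percentage points) for five questions in two treatments.
treatment_rates = [55, 40, 72, 30, 60]
baseline_rates = [52, 41, 70, 30, 62]
diffs = paired_differences(treatment_rates, baseline_rates)
print(diffs, signed_rank_sums(diffs), paired_t_statistic(diffs))
```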
To summarize the observed treatment effects of information:
Result 1: Information at all levels (LI, MI, FI) improves performance on easy questions but does not
necessarily improve performance on difficult questions. Consequently, information provided about
others’ choices and beliefs creates a more dispersed distribution of correct rates across questions.
3.2.2 Comparisons Across Information Conditions
The previous subsection discussed the general treatment effects of information compared to the
No-Information control. Here, we conduct comparisons across the three different information
treatments to test the marginal impacts of increasing information content.
When comparing the second-stage performance of subjects between the Low-Information
treatment and the Moderate-Information treatment, Figure 4 (upper left panel) shows that providing
information about subjects’ confidence levels improves the correct rates of difficult questions. The
distribution test demonstrates a significant difference between the two treatments (Wilcoxon matched-
pairs signed-ranks test, p=0.025). Meanwhile, the means test also verifies that subjects in the
Moderate-Information treatment perform better than those in the Low-Information treatment on
average (one-sided t-test, p=0.005).
As also illustrated in Figure 4 (upper right panel), subjects in the Full-Information treatment
generally perform worse than those in the Moderate-Information treatment (Wilcoxon matched-pairs
signed-ranks test, p=0.105; one-sided t-test, p=0.044). One possibility is that the additional information
about former participants’ estimates and beliefs confuses the subjects in the Full-Information treatment.
The correct rates of difficult questions are especially negatively affected. Overall, more information
seems to be counterproductive at this level, in terms of improving correct answers.
Although the positive influence of the information about subjects’ confidence levels is partially
offset by the influence of the information about subjects’ estimates, subjects in the Full-Information treatment outperform those in the
Low-Information treatment overall (Figure 4, bottom panel). There is a significant distributional
difference between the two treatments at the 10% level (Wilcoxon matched-pairs signed-ranks test,
p=0.073). Meanwhile, the means test shows that subjects in the Full-Information treatment perform
significantly better than those in the Low-Information treatment (one-sided t-test, p=0.021). The final
correct rates of difficult questions are relatively higher in the Full-Information treatment, while there
is little difference in terms of the easy questions (Figure 4, bottom panel).
Thus, we further observe that in terms of marginal treatment effects of information:
Result 2: The Moderate-Information condition yields the best performance out of the three information
treatments, mainly due to improvement on difficult questions. Though the Full-Information treatment yields
lower overall performance than the Moderate-Information treatment, it still generates higher
performance on difficult questions than the Low-Information treatment, compensating for its negative
impact on some of the easy questions.
Figure 4: Comparisons between Information Treatments
3.3 Subject-based Analysis
While the previous subsections focused on question-based effects of information provided,
another relevant angle to examine is the effects on individual subjects’ performance. In this section,
we analyze the performances across the treatments in terms of individual subject initial performance,
rather than by question difficulty. In particular, such analysis can inform us about the degree to which
originally well-performing or poorly-performing individuals are assisted by the various information
treatments.
3.3.1 Treatment effects
Firstly, the Wilcoxon rank-sum test shows no significant difference in subjects’ performance
between the No-Information treatment and the Low-Information treatment (p=0.427). The two-sided
Welch t-test for unpaired data also demonstrates that the mean of subjects’ correct rates in the No-
Information treatment is not significantly different from that in the Low-Information treatment
(p=0.404).
Compared with those in the No-Information treatment, subjects in the Moderate-Information
treatment appear to perform better on the whole. The distribution test shows that there exists a
significant difference in correct rates between the two treatments (Wilcoxon rank-sum test, p=0.001).
In addition, the mean correct rate in the No-Information treatment is significantly lower than that in
the Moderate-Information treatment (one-sided Welch t-test, p<0.001), which is consistent with the
question-based results. We also observe that information mainly helps improve the performance for
the lower end of the performance distribution, i.e., the subjects whose original correct rates are
relatively low, rather than for the higher end. This is intuitive, since subjects at the higher end of the
distribution are more knowledgeable, and the choices of others therefore offer them limited
help. By contrast, subjects at the lower end of the initial performance distribution are
more likely to benefit from the wisdom of crowds, by inferring more correct answers from other
respondents’ choices.
Similar to the results from the question-based comparison between the No-Information treatment
and the Full-Information treatment, subjects in the Full-Information treatment perform significantly
better than those in the No-Information treatment (Wilcoxon rank-sum test, p=0.038; one-sided Welch
t-test, p=0.016). Meanwhile, the additional information still mainly improves the performance of
subjects from the lower end of the performance distribution, as we will see further in the next section.
Further evidence comes from the Pearson correlation coefficients between subjects’ first-stage
correct rates and the changes in their individual correct rates (second stage - first stage) within each
treatment (Low-Information: corr = -0.288, p = 0.109; Moderate-Information: corr = -0.562, p = 0.001;
Full-Information: corr = -0.483, p = 0.005). Thus, Moderate or Full-Information significantly improves
the performance of subjects with relatively low knowledge levels.
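A minimal sketch of this correlation, assuming it is taken between each subject's first-stage correct
rate and the change in correct rate from the first to the second stage. The helper pearson and the five
data points are hypothetical and do not reproduce the reported coefficients.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical correct rates for five subjects (not the experimental data).
stage1 = [0.40, 0.46, 0.50, 0.58, 0.66]
stage2 = [0.52, 0.54, 0.54, 0.60, 0.66]
improvement = [after - before for before, after in zip(stage1, stage2)]
r = pearson(stage1, improvement)  # negative: weaker subjects gain more
```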
We also examine the Pearson correlation coefficients between the changes in correct rates and
subjects’ own average confidence, finding that performance improvements concentrate among less
confident subjects in the Low- and Moderate-Information treatments (Low-Information: corr = -0.368,
p = 0.038; Moderate-Information: corr = -0.367, p = 0.039), but not in the Full-Information treatment
(corr = 0.158, p = 0.387).3
Our main results in this section can be summarized as follows:
Result 3: Performance improves from first to second stage in the Moderate and Full-Information
treatments, at an individual subject level. The individual performance improvements are negatively
correlated with initial performance and initial confidence, indicating that the performance gains are
primarily concentrated among subjects with low initial knowledge and confidence levels.
3.3.2 Comparisons Across Information Conditions
In Figure 5, subjects are ordered by their correct rates in the first stage from low to high (left to
right) in each information treatment. Their performance is normalized by subtracting the mean
correct rate of the baseline No-Information (NI) treatment (0.482).
Information about the average confidence level lends a substantial advantage to the less
knowledgeable subjects in the Moderate-Information treatment, whereas subjects in the Low-
Information treatment did not seem to benefit from the additional information in their treatment.
Between the two treatments, there are significantly different distributions (Wilcoxon rank-sum test,
p=0.001) and mean values (one-sided Welch t-test, p<0.001).
Although the additional information in the Full-Information treatment about other respondents’
estimates and beliefs does not raise the correct rates at the question-level compared with the Moderate-
Information treatment based on the previous analyses, it does contribute to a significant change in the
3 Note that throughout the paper, we define difficult questions and knowledgeable individuals based on the correct rates
rather than the self-reported confidence level. We could also take confidence level as an ex-ante measure of question
difficulty and knowledge. The choice between the two measures depends on our focus: questions are objectively difficult
(and subjects are objectively knowledgeable), or the subjects think the questions are difficult (subjects regard themselves
as knowledgeable). Since we are interested in the wisdom of crowds as an information mechanism, we focus on the former.
distribution of individual subjects’ correct rates (Wilcoxon rank-sum test, p=0.086). Specifically, the
average correct rate is lower (one-sided Welch t-test, p=0.045) and the variance of correct rates is larger
in the Full-Information treatment, as shown in Figure 2 (bottom panels).
As for the subject-level comparison between the Low-Information treatment and the Full-
Information treatment, the distributional test (Wilcoxon rank-sum test, p=0.161) does not indicate a
significant difference, while the latter treatment has a higher average correct rate (one-sided Welch
t-test, p=0.051). As Figure 5 shows, subjects with moderate knowledge levels improve noticeably
in the Full-Information treatment, a pattern not found in the Low-Information treatment. This indicates
that the additional information about others’ confidence and estimates in the Full-Information
treatment is still beneficial compared to the basic information provided in the Low-Information
treatment.
Figure 5: Relative Correct Rate for Each Subject in Three Information Treatments
Note: Nonparametric kernel estimation with Epanechnikov function used to obtain smooth curve.
These observations lead us to the following result regarding the marginal effects of information
levels on subjects with heterogeneous initial knowledge levels:
Result 4: Subjects with relatively low initial knowledge levels benefit most from the moderate amount
(MI) of information.
To more directly examine the relationship between the effects of different levels of information
across different initial knowledge levels, we subdivide the subjects of each information treatment into
three knowledge groups of approximately equal size, based on their original correct rates as follows:
11 individuals in the low knowledge group, 10 individuals in the moderate knowledge group, and 11
individuals in the high knowledge group. The average improvements in correct rates (second stage -
first stage) for the three groups are 0.022, 0.012, and -0.002 respectively in the Low-Information
treatment, 0.116, 0.054, and 0.022 respectively in the Moderate-Information treatment, and 0.075,
0.058 and 0.005 respectively in the Full-Information treatment. Thus, in each information treatment
there is a negative monotonic relationship between initial knowledge and information-driven
improvement, with statistical test results as follows.
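The grouping described above can be sketched as follows. The function names, the tie-breaking by
subject index, and the six-subject example are assumptions made for illustration; the default group
sizes (11, 10, 11) follow the text.

```python
def knowledge_groups(stage1, sizes=(11, 10, 11)):
    """Split subject indices into low/moderate/high groups by first-stage correct rate."""
    if sum(sizes) != len(stage1):
        raise ValueError("group sizes must sum to the number of subjects")
    # Sort subject indices from lowest to highest first-stage correct rate.
    # Python's sort is stable, so ties keep their original index order (an assumption).
    order = sorted(range(len(stage1)), key=lambda i: stage1[i])
    groups, start = [], 0
    for size in sizes:
        groups.append(order[start:start + size])
        start += size
    return groups


def mean_improvement(group, stage1, stage2):
    """Average change in correct rate (second stage - first stage) within one group."""
    return sum(stage2[i] - stage1[i] for i in group) / len(group)


# Hypothetical data for six subjects, split into three groups of two.
s1 = [0.60, 0.40, 0.50, 0.70, 0.30, 0.55]
s2 = [0.60, 0.50, 0.54, 0.70, 0.42, 0.57]
low, moderate, high = knowledge_groups(s1, sizes=(2, 2, 2))
gains = [mean_improvement(g, s1, s2) for g in (low, moderate, high)]
```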
For the cross-group comparisons between the Moderate-Information treatment and the Low-
Information treatment, both the distribution test and the means test demonstrate that the additional
information about the average confidence level helps the less knowledgeable subjects perform
significantly better in the second stage relative to the first stage (low knowledge group:
p_WCX<0.001, p_T1<0.001; moderate knowledge group: p_WCX=0.083, p_T1=0.047; high knowledge
group: p_WCX=0.208, p_T2=0.225), where p_WCX is the p-value of the Wilcoxon rank-sum test
(two-sided), p_T1 is the p-value of the one-sided Welch t-test, and p_T2 is the p-value of the
two-sided Welch t-test. Yet, no significant treatment effects are detected across the three groups
between the Moderate-Information treatment and the Full-Information treatment (low knowledge
group: p_WCX=0.155, p_T2=0.133; moderate knowledge group: p_WCX=0.879, p_T2=0.905; high
knowledge group: p_WCX=0.320, p_T2=0.463).
Finally, the subjects with a lower or middle level of knowledge in the Full-Information treatment
experience greater improvement in the second stage on average, compared with their counterparts in
the Low-Information treatment (low knowledge group: p_WCX=0.014, p_T1=0.014; moderate knowledge
group: p_WCX=0.268, p_T1=0.063; high knowledge group: p_WCX=0.921, p_T2=0.753).
3.4 Regression Analysis, Performance
We find further support for the basic results above through an econometric analysis of the link
between the information attributes and the final performance. Logit regressions of the binary outcome
of an individual giving the correct answer in the second stage are provided in Table 2.
Note that in the regressions, the independent variables are transformed from the objectively
displayed information in the experiment to information framed from the perspective of the subject’s
own answer: consistent with the subject’s answer, and inconsistent with the subject’s answer,
respectively. We do so because, in the answer revision decision, subjects presumably evaluate the
information relative to their own original answer. For example, in the Low-Information treatment,
subjects likely pay attention to how many participants have the same answer as their own, rather than
the raw information about the proportion of participants agreeing with the original statement, as
presented on the screen. For each subject, we define his SameGroup as being composed of those who
agree with the subject’s answer to the corresponding question in the No-Information treatment, and a
subject’s OppoGroup as being composed of those who disagree with the subject’s answer in the
corresponding question from the No-Information treatment.
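A minimal sketch of this reframing, assuming that each displayed statistic comes as a pair (one value
for respondents answering "True", one for those answering "False"). The function names are
hypothetical; real_same corresponds to the RealSame variable used in the regressions.

```python
def reframe(stat_true, stat_false, own_answer_is_true):
    """Reorder a (True-side, False-side) statistic into (SameGroup, OppoGroup)."""
    if own_answer_is_true:
        return stat_true, stat_false
    return stat_false, stat_true


def real_same(agree_share, own_answer_is_true):
    """Share of No-Information respondents whose answer matches the subject's own."""
    if not 0.0 <= agree_share <= 1.0:
        raise ValueError("agree_share must lie in [0, 1]")
    same, _ = reframe(agree_share, 1.0 - agree_share, own_answer_is_true)
    return same


# Hypothetical example: 70% of No-Information respondents answered "True".
same_if_true = real_same(0.70, own_answer_is_true=True)
same_if_false = real_same(0.70, own_answer_is_true=False)
```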
Table 2 documents the regression results of information treatments and other relevant variables
on final performance. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame
denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average
confidence in “own answer” of all participants, RealSame denotes the proportion of subjects having
the same choice as their own. Subjects’ confidence levels, knowledge levels and the ease of questions
are consistently positively associated with final performance.
Among the information treatments, the regressions help confirm the prior observation that
performance is the best in the Moderate-Information treatment (note Columns 2 and 5, differing by
inclusion of fixed effects). The negative coefficients on the interaction terms between the actual choice
distribution (RealSame) and the information treatment dummies indicate that information provision
strengthens the negative relationship between the percent agreement with own answer (RealSame)
and the accuracy of answers (Columns 3 and 6, differing by inclusion of fixed effects). The rationale
is that, other factors equal, a larger proportion of participants sharing the decision-maker’s view
boosts their confidence in the original answer. However, judging by the coefficient magnitudes and
significance levels, this relationship appears to weaken with greater levels of information provided.
Table 2: Logit Regression: Dependent Variable: Correct Answer in 2nd Stage
Dependent variable: I(NewTorF). Marginal effects.
                          (1)        (2)        (3)        (4)        (5)        (6)
Cfdc                      0.237***   0.234***   0.263***   0.300***   0.300***   0.347***
                          (0.044)    (0.044)    (0.083)    (0.050)    (0.050)    (0.098)
EstiSame                  -0.215***  -0.226***  -0.176*    -0.238***  -0.238***  -0.246**
                          (0.047)    (0.047)    (0.095)    (0.051)    (0.051)    (0.116)
EstiCfdc                  -0.051     -0.047     -0.043     -0.117     -0.117     -0.087
                          (0.066)    (0.066)    (0.126)    (0.075)    (0.075)    (0.160)
RealSame                  -0.147***  -0.142***  0.001      -0.181***  -0.181***  -0.060
                          (0.035)    (0.035)    (0.086)    (0.043)    (0.043)    (0.094)
LI                                   0.016      0.297***              0.037      0.267**
                                     (0.014)    (0.096)               (0.084)    (0.131)
MI                                   0.071***   0.247**               0.164**    0.310**
                                     (0.015)    (0.099)               (0.075)    (0.128)
FI                                   0.044***   0.135                 0.131*     0.238*
                                     (0.015)    (0.096)               (0.077)    (0.128)
Cfdc×LI                                         0.120                            0.086
                                                (0.121)                          (0.138)
Cfdc×MI                                         -0.175                           -0.206
                                                (0.125)                          (0.140)
Cfdc×FI                                         -0.043                           -0.077
                                                (0.120)                          (0.135)
EstiSame×LI                                     0.009                            0.086
                                                (0.129)                          (0.149)
EstiSame×MI                                     -0.199                           -0.108
                                                (0.142)                          (0.157)
EstiSame×FI                                     -0.043                           0.028
                                                (0.128)                          (0.147)
EstiCfdc×LI                                     -0.383**                         -0.373*
                                                (0.179)                          (0.212)
EstiCfdc×MI                                     0.269                            0.231
                                                (0.192)                          (0.218)
EstiCfdc×FI                                     0.101                            0.040
                                                (0.182)                          (0.211)
RealSame×LI                                     -0.191*                          -0.174*
                                                (0.106)                          (0.103)
RealSame×MI                                     -0.187*                          -0.135
                                                (0.109)                          (0.106)
RealSame×FI                                     -0.183*                          -0.161
                                                (0.107)                          (0.104)
Knowledge                 0.761***   0.738***   0.757***
                          (0.055)    (0.056)    (0.056)
Easiness                  1.127***   1.128***   1.129***
                          (0.015)    (0.015)    (0.015)
Individual fixed effects  N          N          N          Y          Y          Y
Question fixed effects    N          N          N          Y          Y          Y
Observations              6,400      6,400      6,400      6,400      6,400      6,400
Pseudo R2                 0.269      0.272      0.275      0.311      0.311      0.313
Notes: Logit estimation of the effect of information provision on the final answer accuracy. Dependent variable is whether a subject
correctly answered a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame
denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all
participants, RealSame denotes the proportion of subjects having the same choice as their own, LI, MI and FI are dummy variables for
the Low-Information, Moderate-Information and Full-Information treatment respectively, Knowledge and Easiness refer to the aggregate
correct rate of each subject in the first stage and the aggregate correct rate of each question in the No-Information treatment respectively.
Coefficients displayed are marginal effects. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.
3.5 Summary of Information Content
Before investigating the underlying mechanisms for the observed results via subjects’ decision
heuristics, we first provide a comprehensive understanding of some features of the information
generated from the baseline (No-Information) treatment, which is the information that subjects in the
information treatments base their second round decisions upon.
Subjects’ real confidence levels are significantly higher than their estimates of the mean
confidence level (Wilcoxon matched-pairs signed-ranks test, p<0.001), which implies that individuals
are typically more confident than they believe others are. In addition, the average actual confidence
for some difficult questions is even much higher than for the easy questions. For example, all
subjects in the No-Information treatment choose “False” for the statement “The last name of Confucius
is Zi.”, which is actually true, with an average confidence level as high as 90.13%.4 Such
questions are not as simple as they seem, and subjects are often unaware of their underlying
difficulty level.
Since our design provides separate information for the respondents agreeing and disagreeing with
the statement, we analyze the Right group composed of the subjects who answer the question correctly,
and the Wrong group composed of the subjects who answer the question incorrectly.
For all questions, the average confidence level of the Right group is 74.01%, which is 2.07
percentage points higher than that of the Wrong group (71.94%) (Wilcoxon matched-pairs signed-ranks
test, p=0.013).
However, for some difficult questions such as (translated from Chinese) “The first woman to proclaim
herself emperor in the history of China is Zetian Wu.” (Question 47), “Severe winter refers to the
twelfth month of the lunar calendar.” (Question 46) and “The Italian scientist and astronomer
Copernicus was burned to death for his adherence to heliocentric theory.” (Question 22), the Wrong
group is more confident than the Right group.
The pattern for the average estimated mean confidence is similar to that for average real
confidence, however the difference between the two groups’ estimates is not statistically significant
(Wilcoxon matched-pairs signed-ranks test, p=0.201). As a consequence, subjects in the Right group
are more likely to exhibit overconfidence in their answers (i.e., their average real confidence level is
higher than their average estimated confidence level; Wilcoxon matched-pairs signed-ranks test,
p<0.001), yet the phenomenon is not significant in the Wrong group (Wilcoxon matched-pairs
signed-ranks test, p=0.141).
We also examine the overconfidence phenomenon at the individual level. We provide an overall
measure for each subject by defining the overconfidence degree as the percentage deviation of his
average self-reported confidence from his original correct rate: (average confidence - original correct
rate) / original correct rate. The overconfidence degree ranges from 0.02 to 1.35 in the No-Information
treatment, and those who exhibit severe overconfidence are typically less knowledgeable than the
average. Under this measure, the overconfidence degree of the Right group is lower than
that of the Wrong group (Wilcoxon matched-pairs signed-ranks test, p<0.001).5 Therefore, the subjects
in the Right group appear more overconfident than those in the Wrong group if we compare their self-
reported confidences with their beliefs on the average confidence level of others, while they appear
less overconfident if we compare their self-reported confidences with their actual performance.
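The individual-level overconfidence degree defined above can be written as a one-line function. The
function name and the example values are hypothetical; confidence and correct rate are assumed to be
expressed on the same scale (here, fractions between 0 and 1).

```python
def overconfidence_degree(avg_confidence, correct_rate):
    """Percentage deviation of average self-reported confidence from the correct rate."""
    if correct_rate <= 0:
        raise ValueError("correct_rate must be positive")
    return (avg_confidence - correct_rate) / correct_rate


# Hypothetical subject: 75% average confidence, 50% of answers correct.
degree = overconfidence_degree(0.75, 0.50)  # 0.5, i.e. confidence is 50% above accuracy
```

Because the deviation is divided by the correct rate, a given gap between confidence and accuracy
translates into a larger degree for a subject with a lower correct rate.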
In addition to the confidence level, we also collect each subject’s estimate of the proportion of
participants who have the same choice as their own. Subjects tend to overestimate the proportion of
those having the same choice as their own (Wilcoxon matched-pairs signed-ranks test, p<0.001 in both
the Right and Wrong group). The average for each question is above 50%, and there are very few cases
4 We do not have the relevant information about the average confidence, as well as the average estimates and beliefs of
those who agree with the statement from the No-Information treatment for this particular question.
5 This result still holds if we define the overconfidence degree as the difference between a subject’s average self-reported
confidence and correct rate.
in the raw data in which a subject predicts that the percent agreement with his own answer among
other subjects for a given question is below 50%. In other words, nearly every subject believes that the
majority of other respondents choose similarly to their own answer, and such consensus bias (Marks
and Miller, 1987; Brosig et al., 2003) also prevails in our information treatments.
Result 5: Subjects tend to be overconfident about their self-reported correct rate compared to their
actual performance. Furthermore, they tend to overestimate the proportion of participants having the
same choice as their own.
3.6 Investigation of Decision Heuristics
We now evaluate three simple heuristics in the context of our experiment: the Majority Rule (MR),
the Maximum Confidence rule (MC), and the Surprising Popularity rule (SP) (Prelec et al., 2017).
We consider an answer as in the “majority” (MR) if at least 50% of respondents in the No-
Information treatment selected it as their response. We categorize an answer as consistent with
Maximum Confidence (MC) if the average confidence level of subjects agreeing with it is not lower
than that of subjects disagreeing with it. Finally, an answer satisfies the Surprising Popularity (SP) rule
if the actual proportion of subjects choosing it is not lower than the average predicted proportion.6
The Surprising Popularity rule has the potential to produce the best answer under certain
behavioral assumptions, as found in Prelec et al. (2017). As an illustration, in the classic Philadelphia
question (Prelec et al., 2017), most respondents mistakenly regard Philadelphia to be the capital of
Pennsylvania. The confidences associated with ‘yes’ and ‘no’ votes are roughly similar, which results
in the failure of both the Majority Rule and the Maximum Confidence rule. However, respondents
voting ‘yes’ believe that most others will agree with them, while respondents voting the correct answer
‘no’ expect that few people possess such specialized knowledge as they do, so that the average predicted
percentage of ‘no’ votes falls short of the actual percentage of ‘no’ votes. Therefore, the answer ‘no’
turns out to be surprisingly popular, and it is in fact the right answer.
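The three rules, as defined above, can be sketched for a single true-or-false statement as follows.
The class and field names are hypothetical, ties are resolved in favor of "True" as a simplifying
assumption, and the numbers in the example are invented to mimic the Philadelphia illustration rather
than taken from Prelec et al. (2017) or from our data.

```python
from dataclasses import dataclass


@dataclass
class QuestionStats:
    share_true: float      # actual share of respondents answering "True"
    conf_true: float       # average confidence among "True" respondents
    conf_false: float      # average confidence among "False" respondents
    predicted_true: float  # average predicted share of "True" answers


def majority_rule(q):
    """True if the rule selects "True": at least 50% of respondents chose it."""
    return q.share_true >= 0.5


def maximum_confidence(q):
    """True if "True" respondents are on average at least as confident."""
    return q.conf_true >= q.conf_false


def surprising_popularity(q):
    """True if "True" is at least as popular as respondents predicted it to be."""
    return q.share_true >= q.predicted_true


# Invented numbers: "Philadelphia is the capital of Pennsylvania" (correct answer: False).
philadelphia = QuestionStats(share_true=0.65, conf_true=0.70,
                             conf_false=0.69, predicted_true=0.80)
answers = {
    "MR": majority_rule(philadelphia),          # selects "True" (incorrect)
    "MC": maximum_confidence(philadelphia),     # selects "True" (incorrect)
    "SP": surprising_popularity(philadelphia),  # selects "False" (correct)
}
```

In this invented case, "True" is chosen by 65% of respondents but was predicted to be chosen by 80%,
so "False" is more popular than predicted and is the answer selected by the Surprising Popularity rule.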
Ideally, as shown in Prelec et al. (2017), as long as the subjects have enough evidence to determine
the correct answer, the Surprising Popularity rule is superior to the Maximum Confidence rule and
Majority Rule, since it can extract more information from the available evidence. In practice,
however, this heuristic is not as successful in our experiment. Empirically, in our No-Information
treatment, subjects choosing either “True” or “False” always expect to be in the majority among the
respondents, a pattern also encountered in the experiments of Baillon et al. (2020). This makes the
Surprising Popularity rule incapable of overriding an incorrect majority answer.
Figure 6 compares the accuracies of these three decision rules (MR, MC, SP) in our experiment.
On the vertical axis, a value of 1 (solid bar) indicates that the rule gives the correct answer, while a value of
0 indicates that the rule gives the incorrect answer. Majority Rule yields an overall correct rate of 52%,
which is by definition, fully concentrated on the easier questions, demonstrating the serious limitations
of this heuristic. The distribution of accuracies for the SP rule is quite similar in our experiment, with
a correct rate of 50%, and mostly concentrated on the easier questions. This does not necessarily
6 Note that the criterion “not lower than” is never binding when we define the Maximum Confidence answers and
Surprisingly Popular answers. There is only one statement (Question 1) for which the percentage of respondents agreeing
is exactly 50%, thus considering equal outcomes as satisfying majority rule does not affect the results.
invalidate the heuristic, but indicates that it requires the proper combination of the correct answer being
popular in practice but not in terms of beliefs over others’ answers, which apparently is not met among
our subjects and questions. Compared to the two other rules, the Maximum Confidence rule is more
uniformly distributed across the question difficulty levels. It yields an overall correct rate of 68%,
which exceeds the original stage 1 performance of 98.44% (126/128) of subjects in the experiment.
Figure 6: Accuracies of Three Decision Rules
Note: Questions ordered from difficult to easy (left to right).
The overlap of the answers provided by the three rules can also be inferred from Figure 6. The
number of questions which yield different answers between the Maximum Confidence rule and the
Majority Rule (Surprising Popularity rule) amounts to 20 (23), but the Majority Rule is difficult to
distinguish from the Surprising Popularity rule since their predictions coincide for 45 out of the 50
questions. To investigate which of the decision rules is favored most by subjects, we limit our attention
to the questions on which the pair of rules under consideration yields different answers, and compare
how often subjects’ final answers coincide with the answer implied by each rule.
We begin with the comparison between the Maximum Confidence rule and the Majority Rule,
which are potential competing approaches in the Moderate-Information and Full-Information
treatments. In the Moderate-Information treatment, 16 out of 32 subjects give answers that coincide
with the Majority Rule more often than with the Maximum Confidence rule, while 12 show the reverse
pattern; in the Full-Information treatment, the corresponding counts are 24 and 6 out of 32. This
indicates that the Majority Rule is generally more favored. The pattern is robust to a greater threshold
gap between the rules, for example, defining a “favored” rule as one used over 20% more often by a
subject. Using this criterion, 8 subjects in the Moderate-Information treatment (22 in the
Full-Information treatment) give answers that coincide with the Majority Rule over 20% more often,
and only 4 subjects (4 in the Full-Information treatment) show the opposite pattern.
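The pairwise comparison can be sketched as below for a single subject. The function name is
hypothetical, and the threshold is interpreted as a gap in percentage points between the two
coincidence rates, which is an assumption about how "over 20% more often" is measured.

```python
def favored_rule(final_answers, rule_a, rule_b, gap=0.0):
    """Return "A", "B", or None for one subject.

    Only questions on which the two rules imply different answers are used.
    A rule is favored if the subject's final answers coincide with it more
    often than with the other rule by strictly more than `gap`.
    """
    differing = [k for k in range(len(rule_a)) if rule_a[k] != rule_b[k]]
    if not differing:
        return None
    rate_a = sum(final_answers[k] == rule_a[k] for k in differing) / len(differing)
    rate_b = sum(final_answers[k] == rule_b[k] for k in differing) / len(differing)
    if rate_a - rate_b > gap:
        return "A"
    if rate_b - rate_a > gap:
        return "B"
    return None


# Hypothetical answers to four questions; the rules differ on the last three.
final = [True, True, False, True]
rule_a = [True, True, True, False]   # e.g. Majority Rule answers
rule_b = [True, False, False, True]  # e.g. Maximum Confidence answers
choice = favored_rule(final, rule_a, rule_b)
```

Counting the subjects for whom the function returns "A" or "B" within a treatment gives tallies of the
kind reported above.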
Comparing the Maximum Confidence rule to the Surprising Popularity rule, which are possible
approaches in the Full-Information treatment, we find that more subjects favor the Surprising
Popularity rule (22 subjects) over the Maximum Confidence rule for the 23 eligible questions studied,
with 10 subjects having the reverse pattern. This is consistent with the earlier mentioned finding
comparing Majority Rule to Maximum Confidence, given the high similarity between Majority Rule
and Surprising Popularity in our setting.
The opportunity to distinguish between the Majority Rule and Surprising Popularity rule is limited,
due to the high coincidence of predictions between the two rules in our setting. As for the particular
five questions where the Majority Rule and the Surprising Popularity rule do imply different answers,
81.25% of the subjects choose answers consistent with Majority Rule more often. Thus, Majority
Rule appears to be the most favored heuristic behaviorally.
Table 3: Decision rules and question difficulty, as measured by confidence
Columns report the % of subjects favoring the former rule and the % favoring the latter rule, for easy
questions (self-reported confidence≥75%) and difficult questions (self-reported confidence<75%).
                                                   Easy questions          Difficult questions
Decision rules                         Treatment   Former     Latter       Former     Latter
Majority rule vs Max. confidence rule  MI          53.125%    31.25%       43.75%     43.75%
Majority rule vs Max. confidence rule  FI          84.375%    15.625%      64.516%    25%
SP rule vs Max. confidence rule        FI          68.75%     21.875%      62.5%      28.125%
Majority rule vs SP rule               FI          70%        10%          60%        20%
To further explore whether the subjects’ reliance on a certain rule is mediated by the perceived
difficulty of questions, we examine subsamples and use self-reported confidence levels as a measure
of the question’s perceived difficulty. We set the benchmark that a question is thought to be relatively
difficult for the subject if his self-reported confidence is strictly below 75%, which is the middle value
between 50% and 100%, and also close to the mean confidence level of all the subjects (74%), while
a question is defined to be easy otherwise. The results show that no matter whether questions are
perceived as difficult or easy, when the suggested answers by any two of the three rules differ from
one another, the Majority Rule is always attractive. However, subjects are comparatively more likely
to favor the Maximum Confidence rule over the other rules for difficult questions than for easy
questions, as shown in Table 3.
To summarize the results of this section on subjects’ use of decision heuristics:
Result 6: Decision-makers’ answers are most frequently consistent with Majority Rule, where
discrepancies between the three rules exist, even though following the Maximum Confidence rule can
yield better performance.
As for the intuition behind the above result, one possible explanation is the cognitive burden
associated with the different heuristics. Determining the answer through Majority Rule is relatively
simple, while considering other subjects’ confidence levels and utilizing Maximum Confidence could
require more cognitive processing. In particular, since many questions have high mean confidences
reported, subjects may doubt others’ reported confidence levels or suspect overconfidence of others,
leading to discounting of this potentially valuable information.
4. Analysis of Answer Changes
The previous section analyzed how performance varies with different amounts of information
provision. To gain further insight on the effects of different information levels, we further analyze
subjects’ answer changes in stage 2 at the individual level. We first present results from the raw data
and then construct regression models that control for variables that could potentially affect the answer
change decisions.
We define a subject’s revision rate as the frequency with which they revise their answers across
the 50 questions. The overall likelihoods of revision are 18.81% in the Moderate-Information treatment
and 18.19% in the Full-Information treatment, both significantly higher than the 12.94% in the
Low-Information treatment (Wilcoxon rank-sum test, LI vs MI: p=0.004;
LI vs FI: p=0.005). We cannot reject the null hypothesis that the likelihoods of revision are the same
in the Moderate-Information treatment and the Full-Information treatment (Wilcoxon rank-sum test,
p=0.968). The relatively low revision rate in our data is in line with the egocentric judgment widely
found in the literature (e.g., Chambers & Windschitl, 2004; Yaniv & Milyavsky, 2007; Tump et al.,
2018; Niu et al., 2019), implying that subjects tend to discount public information in favor of their own
original perspective.
In addition, we define the question-based revision rate as the proportion of subjects who choose
to change their answers to a given question from their initial choice, and explore the interaction effect
between question difficulty and information attributes.
4.1 Role of Self-reported Confidence Levels
Each subject’s self-reported confidence level is a reasonable indicator of how certain they are of
the answers they chose in the first stage. It is natural that subjects who are less confident in their answer
are more likely to be influenced by the information provided, whereas more confident subjects are
likely to require stronger evidence before revising their original answer.
We calculate the revision rates over low confidence answers (self-reported confidence < 75%)
and high confidence answers (self-reported confidence ≥ 75%) separately for each subject and each
question.7 As seen from Figure 7, revision rates over low confidence answers exceed those over high
confidence answers, whether considering the subject level or the question level. In addition, there are
higher revision rates in the Moderate-Information and Full-Information treatments than in the Low-
Information treatment for both high and low confidence answers, as seen by comparing the rows of
Figure 7.
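A minimal sketch of this split for one subject (or, equivalently, for one question across subjects),
using the 75% cutoff from the text. The function name and the example answers are hypothetical.

```python
def revision_rates_by_confidence(first, second, confidence, threshold=0.75):
    """Return (low_rate, high_rate) of answer revisions.

    An answer counts as revised when the second-stage answer differs from the
    first-stage answer. A rate is None when no answers fall in that bin.
    """
    low, high = [], []
    for a, b, c in zip(first, second, confidence):
        (low if c < threshold else high).append(a != b)

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(low), rate(high)


# Hypothetical answers to four questions.
first_stage = [True, False, True, True]
second_stage = [False, False, True, False]
confidence = [0.60, 0.70, 0.90, 0.60]
low_rate, high_rate = revision_rates_by_confidence(first_stage, second_stage, confidence)
```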
In Figure 8, we distinguish between high and low knowledge subjects. Around half of subjects’
knowledge is fairly well-matched with their confidence. The average revision rate is generally higher
among subjects with less knowledge. When the information amount is low or moderate, the high
7 Note that among the three information treatments, there are only two subjects who never modify their answers. Besides, all the subjects
encounter both the questions they have high confidence in and the questions they have low confidence in, except that one subject in the
Low-Information treatment always reports a confidence level below 75%.
confidence subjects (average self-reported confidence ≥ 75%) revise answers less frequently than the
low confidence ones, conditional on having low knowledge levels (original correct rate < 50%). By
contrast, high confidence subjects are more likely to revise answers conditional on having high
knowledge levels (original correct rate ≥ 50%).
Figure 7: Revision Rate and Self-reported Confidence
Notes: Subject number is ordered by subjects’ correct rates in the first stage in each treatment from low to high; Question number
is ordered by question difficulty levels in the No-Information treatment from difficult to easy.
Figure 8: Revision Rate and Consistency between Confidences and Performance
4.2 Influence of Other Respondents’ Choices
Comparing the percentage of agreement for each question across treatments, for the statements
for which over half of subjects in the No-Information treatment agree, the average rate of agreement
is 84.00% in the Low-Information treatment in the second stage (71.40% in the first stage), 72.92% in
the Moderate-Information treatment in the second stage (67.42% in the first stage), and 76.99% in the
Full-Information treatment in the second stage (67.05% in the first stage). Furthermore, the average
rates of agreement in the second stage of the three information treatments are always higher than that
in the No-Information treatment (70.83%), and that in the corresponding first stage, demonstrating a
consensus building effect of information.
Similarly, for the statements for which over half of subjects in the No-Information treatment
disagree, the average rate of agreement falls to only 28.13% in the Low-Information treatment in the
second stage (39.06% in the first stage), 25.39% in the Moderate-Information treatment in the second
stage (38.09% in the first stage), and 22.85% in the Full-Information treatment in the second stage
(37.70% in the first stage), which is lower than that in the No-Information treatment (35.74%). Thus,
when the majority in the No-Information treatment agree or disagree with some statements, subjects
in the information treatments tend to gravitate towards the majority’s judgement, further strengthening
it.
Examining subjects’ responses to information regarding the actual choice distribution of answers,
it is rare for subjects to change answers that are consistent with the majority, especially in the Low-
Information treatment. Revisions primarily concentrate on the answers that are not widely consistent
with others’ (see Figure 9). Given that subjects also tend to expect themselves to be in the majority,
many of them revise their answers in the second stage upon learning that they are in the minority,
consistent with the Majority Rule. Conditional on having an initial answer against the majority, the
revision rates on the easiest and most difficult questions far exceed the revisions on moderate-level
questions, indicating that the strength of a majority answer also matters.
Figure 9: Revision Rates and Majority Answer
Notes: Subject number is ordered by subjects’ correct rates in the first stage in each treatment from low to high; Question number
is ordered by question difficulty levels in the No-Information treatment from difficult to easy.
4.3 Influence of Other Respondents’ Confidence Levels
For subjects in the Moderate-Information treatment, a straightforward way to utilize the
information provided about average confidence levels is to compare the average confidence of those
who agree and disagree with the statement, and favor the answer with higher average confidence, i.e.,
the Maximum Confidence rule.
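The comparison behind the Maximum Confidence rule can be sketched as follows. This is a minimal illustration rather than the procedure used in the experiment: the function names are hypothetical, and treating a tie in average confidence as "no recommendation" is an assumption, since the text does not specify tie handling.

```python
from typing import Optional


def maximum_confidence_answer(conf_agree: float, conf_disagree: float) -> Optional[bool]:
    """Return True if those who agree are more confident on average,
    False if those who disagree are, and None on a tie (assumption)."""
    if conf_agree > conf_disagree:
        return True
    if conf_disagree > conf_agree:
        return False
    return None


def revise_under_max_confidence(own_answer: bool,
                                conf_agree: float,
                                conf_disagree: float) -> bool:
    """A subject following the rule revises only when the rule
    recommends the answer opposite to his own."""
    recommended = maximum_confidence_answer(conf_agree, conf_disagree)
    return recommended is not None and recommended != own_answer
```

For example, a subject who answered "False" when those agreeing with the statement report an average confidence of 0.8 against 0.6 for those disagreeing would revise under this rule.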
As shown in Figure 10, while having an answer consistent with the majority is clearly influential,
the revision rate also depends on the average confidence level. Subjects seldom revise their answers
which are simultaneously in line with both a majority answer and maximum confidence, while being
most likely to revise answers that are in the minority and minimum confidence category.8
Figure 10: Revision Rate in the Moderate-Information and Full-Information Treatment
Notes: Subject number is ordered by subjects’ correct rates in the first stage in each treatment from low to high; Question number
is ordered by question difficulty levels in the No-Information treatment from difficult to easy.
The patterns of the Full-Information treatment differ slightly, in that subjects tend to put relatively
more weight on the majority answer. The revision rates of subjects whose answers are in the minority
but with maximum confidence are much higher than those of subjects whose answers are in the
majority but with minimum confidence, when compared to the Moderate-Information treatment. One
possible explanation is that some subjects in the Full-Information treatment might have difficulty
processing all the information provided, leading them to focus on the information about the majority
answer.
From the above results on determinants of answer revision in the second stage, we can summarize:
Result 7: Although subjects’ own confidence level matters, subjects are significantly influenced by
the answers and confidence levels of others in their answer revision choices.
8 In fact, 60% of the majority answers are associated with higher average confidence.
4.4 Regression Analysis, Revision Decisions
Having established some basic findings on the conditions under which individuals change their
judgments in response to differing degrees of persuasive information, we next test whether the
patterns in the raw data are robust to conditioning on control variables using regression analyses.
Specifically, we regress the binary indicator of whether a subject revises his answer to a question in
the second stage on the provided information and treatment variables.
The regression results reported in Table 4 show that subjects’ own confidence levels (Cfdc) and
the actual percentage of participants having the same answer as their own (RealSame) have
consistently negative effects on the revision decision at the 1% level in the information treatments. If we
measure the difficulty levels of questions (Easiness) by the corresponding correct rates in the
No-Information treatment, and measure the knowledge levels of subjects (Knowledge) by their initial
correct rates in the first stage, we find that knowledgeable subjects are less affected by the information
provision, yet we do not find strong evidence that the tendency to revise answers differs by the
difficulty level of questions. The revision decisions in the Moderate-Information and Full-Information
treatments are less affected by the actual choice distribution of answers than those in the
Low-Information treatment, regardless of whether we control for individual and question fixed effects.
These results are largely consistent with our basic findings described in the previous section.
Table 4: Logit Regression of Revision Decisions
Dependent variable:
I(Revision)
Marginal effects
(1) (2) (3) (4) (5) (6)
Cfdc -0.319*** -0.324*** -0.485*** -0.333*** -0.333*** -0.448***
(0.043) (0.043) (0.088) (0.046) (0.046) (0.103)
EstiSame 0.043 0.039 -0.130* 0.075 0.075 0.041
(0.041) (0.041) (0.076) (0.046) (0.046) (0.093)
EstiCfdc 0.115* 0.102 0.406*** -0.050 -0.050 0.017
(0.063) (0.064) (0.123) (0.070) (0.070) (0.147)
RealSame -0.759*** -0.753*** -1.176*** -0.719*** -0.719*** -1.146***
(0.022) (0.023) (0.078) (0.024) (0.024) (0.081)
MI 0.066*** -0.228*** -0.041 -0.420***
(0.012) (0.081) (0.061) (0.115)
FI 0.050*** -0.138 -0.067 -0.278**
(0.011) (0.084) (0.052) (0.113)
Cfdc×MI 0.192* 0.125
(0.115) (0.129)
Cfdc×FI 0.183 0.133
(0.112) (0.125)
EstiSame×MI 0.229** 0.048
(0.102) (0.120)
EstiSame×FI 0.217** 0.036
(0.103) (0.119)
EstiCfdc×MI -0.393** -0.031
(0.160) (0.186)
EstiCfdc×FI -0.364** -0.099
(0.164) (0.183)
RealSame×MI 0.657*** 0.649***
(0.093) (0.094)
RealSame×FI 0.417*** 0.411***
(0.099) (0.097)
Knowledge -0.228*** -0.246*** -0.275***
(0.054) (0.053) (0.055)
Easiness 0.045 0.043 0.041
(0.029) (0.029) (0.028)
Individual fixed effects N N N Y Y Y
Question fixed effects N N N Y Y Y
Observations 4,800 4,800 4,800 4,700 4,700 4,700
Pseudo R2 0.259 0.267 0.286 0.355 0.355 0.372
Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject
revised his answer to a particular question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer,
EstiSame denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own
answer of all participants, RealSame denotes the proportion of subjects having the same choice as their own, MI and FI are dummy
variables for the Moderate-Information treatment and the Full-Information treatment respectively, Knowledge and Easiness refer to the
aggregate correct rate of each subject in the first stage and the aggregate correct rate of each question in the No-Information treatment
respectively. Coefficients displayed are marginal effects. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05,
***p<0.01.
The regressions also demonstrate that subjects’ own beliefs are important, and these beliefs
influence the way subjects use the Surprising Popularity rule. Recall that the Surprising Popularity
algorithm compares the average predicted percentage of “True” answers with the actual percentage of
“True” answers. However, subjects have also established their own predictions in the first stage. They
might be surprised at the “False (True)” answer if the actual percentage of the “True (False)” answers
is lower than their own predictions. In other words, it is likely that they put more weight on their own
judgement than the group’s consensus, and would like to revise their answers if the opposite answer is
subjectively surprising to them.
Such a Surprising Popularity rule (which we refer to as an ‘egocentric SP rule’), based on the
private prediction and the actual choice distribution of answers, can be applied in any of the
Low-Information, Moderate-Information or Full-Information treatments. Since the standard SP rule
produces the same answer as the Majority Rule in most cases, and we find little evidence for its
application in the Full-Information treatment, we examine the egocentric SP rule in this section as a
variation of the Surprising Popularity rule.
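The difference between the two variants can be illustrated with a short sketch. Both apply the same comparison; the standard rule uses the group's average prediction, while the egocentric rule substitutes the subject's own prediction. The function name, the numbers, and the treatment of ties are hypothetical and serve only as an illustration.

```python
from typing import Optional


def surprisingly_popular_answer(actual_true_share: float,
                                predicted_true_share: float) -> Optional[bool]:
    """Return the answer that is more popular than predicted, or None on a tie."""
    if actual_true_share > predicted_true_share:
        return True    # "True" is surprisingly popular
    if actual_true_share < predicted_true_share:
        return False   # "False" is surprisingly popular
    return None        # tie: not specified in the text (assumption)


# Hypothetical numbers for one statement.
actual_true = 0.40        # actual share answering "True"
group_prediction = 0.35   # average predicted share of "True" (standard SP rule)
own_prediction = 0.60     # one subject's own prediction (egocentric SP rule)

standard_sp = surprisingly_popular_answer(actual_true, group_prediction)
egocentric_sp = surprisingly_popular_answer(actual_true, own_prediction)
```

With these illustrative numbers the two variants disagree: the group under-predicted "True", so the standard rule selects "True", whereas this subject over-predicted "True" and is therefore surprised by the popularity of "False".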
To test whether the three heuristic rules are utilized and which one is the most commonly used in
subjects’ revision decisions, we define three dummy variables at the individual level: IsMajority,
GroupCfdc_IsLower, IsSurprising. To be specific, IsMajority=1 if the subject’s answer in the first
stage turns out to be in the majority, i.e., RealSame ≥ 50%, otherwise IsMajority=0;
GroupCfdc_IsLower=1 if the average confidence level of his SameGroup is strictly lower than that of
his OppoGroup, i.e., SameGroup_Cfdc< OppoGroup_Cfdc, otherwise GroupCfdc_IsLower=0;
IsSurprising=1 if the actual percentage of participants having the same answer as his is strictly lower
than the subject’s own prediction (the opposite answer emerges to be unexpectedly popular), i.e.,
RealSame< EstiSame, otherwise IsSurprising=0.
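The three indicator definitions above can be written compactly as follows. This is a sketch for illustration only: the container and function names are hypothetical, and shares and confidence levels are assumed to be expressed as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class RevisionIndicators:
    is_majority: int           # 1 if RealSame >= 50%
    group_cfdc_is_lower: int   # 1 if SameGroup_Cfdc < OppoGroup_Cfdc
    is_surprising: int         # 1 if RealSame < EstiSame


def build_indicators(real_same: float, esti_same: float,
                     same_group_cfdc: float, oppo_group_cfdc: float) -> RevisionIndicators:
    """Construct the three dummy variables from the definitions in the text."""
    return RevisionIndicators(
        is_majority=int(real_same >= 0.5),
        group_cfdc_is_lower=int(same_group_cfdc < oppo_group_cfdc),
        is_surprising=int(real_same < esti_same),
    )
```

Note that a subject whose answer is shared by exactly 50% of participants counts as being in the majority, while the other two indicators use strict inequalities.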
The Logit regressions for the Low-Information treatment are reported in detail in Table 5. The
coefficients of the confidence variable are again always significant at conventional levels (p<0.01).
Adding the variable of the actual choice distribution of answers (Column 2) or the majority indicator
(Column 3) absorbs more variation and substantially improves the Pseudo R2. However, further adding the
IsSurprising variable and its interaction term yields statistically insignificant coefficients and leaves
the Pseudo R2 nearly unchanged (Columns 4-5). Thus, there is little evidence for the use of the egocentric
SP rule.
Table 5: Logit Regression of Revision Decisions, Low-Information Treatment
Dependent variable:
I(Revision)
Marginal effects
(1) (2) (3) (4) (5)
Cfdc -0.279*** -0.284*** -0.266*** -0.264*** -0.264***
(0.092) (0.073) (0.072) (0.072) (0.072)
EstiSame -0.180* -0.026 -0.052 -0.085 -0.083
(0.094) (0.070) (0.068) (0.073) (0.076)
EstiCfdc -0.019 0.040 0.036 0.047 0.046
(0.138) (0.114) (0.113) (0.110) (0.111)
RealSame -0.932***
(0.111)
IsMajority -0.321*** -0.279*** -0.286***
(0.016) (0.033) (0.087)
IsSurprising 0.057 0.050
(0.042) (0.083)
IsMajority×IsSurprising 0.009
(0.090)
Individual fixed effects Y Y Y Y Y
Question fixed effects Y Y Y Y Y
Observations 1,488 1,488 1,488 1,488 1,488
Pseudo R2 0.142 0.577 0.569 0.571 0.571
Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject
revised his answer to a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame
denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all
participants, RealSame denotes the proportion of subjects having the same choice as their own, IsMajority and IsSurprising are indicator
variables. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.
In the Moderate-Information treatment, we find that the average confidence of the participants
who share the same answer with the subject (SameGroup_Cfdc) and the average confidence of those
who do not (OppoGroup_Cfdc) are also influential (Column 2 of Table 6). Intuitively, they function in
opposite directions with similar magnitude. A one-standard-deviation (0.099) increase in the average
confidence of those giving the same answer is associated with a 0.125 drop in the probability of
revision, and a one-standard-deviation (0.098) increase in the average confidence of those giving the
opposite answer raises the probability of answer revision by 0.123. Again, the variables IsMajority
and GroupCfdc_IsLower can explain more of the observed variation in revisions, and the coefficients of
variables related to IsSurprising are all statistically insignificant (Columns 3-6 of Table 6).
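The one-standard-deviation effects quoted above follow from multiplying each marginal effect in Column 2 of Table 6 by the corresponding standard deviation. The short check below reproduces that arithmetic; treating the marginal effect as locally linear is a first-order approximation, and the function name is hypothetical.

```python
def one_sd_effect(marginal_effect: float, std_dev: float) -> float:
    """Approximate change in revision probability for a one-SD increase."""
    return marginal_effect * std_dev


# Marginal effects and standard deviations as reported in the text and Table 6.
same_group = one_sd_effect(-1.260, 0.099)   # SameGroup_Cfdc
oppo_group = one_sd_effect(1.253, 0.098)    # OppoGroup_Cfdc

print(round(same_group, 3), round(oppo_group, 3))
```

Rounded to three decimals, the two products are -0.125 and 0.123, matching the magnitudes reported in the text.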
In addition, as we have seen in Figure 10, subjects in the Moderate-Information treatment
generally view the Majority answer and Maximum Confidence answer as complements. The related
two approaches are also well-reflected in the marginal impact on revision decisions (Column 5 of Table
6). The odds ratio of revising from the minority answer to the majority answer (or revising from the
minimum confidence answer to maximum confidence answer) is around 12. The estimates of the
interaction terms in Column 6 suggest that the marginal impact of having an answer with lower average
confidence on the rate of revision, if their answer is in the majority, is also significantly larger than if
their answer is in the minority.
Table 6: Logit Regression of Revision Decisions, Moderate-Information Treatment
Dependent variable:
I(Revision)
Marginal effects
(1) (2) (3) (4) (5) (6)
Cfdc -0.479*** -0.301*** -0.445*** -0.444*** -0.348*** -0.361***
(0.107) (0.093) (0.097) (0.097) (0.091) (0.091)
EstiSame -0.069 0.022 0.054 0.043 -0.069 -0.054
(0.104) (0.084) (0.094) (0.099) (0.091) (0.094)
EstiCfdc 0.138 -0.012 0.030 0.030 0.071 0.062
(0.151) (0.134) (0.138) (0.138) (0.129) (0.129)
RealSame -0.656***
(0.054)
SameGroup_Cfdc -1.260***
(0.221)
OppoGroup_Cfdc 1.253***
(0.231)
IsMajority -0.251*** -0.246*** -0.223*** -1.509***
(0.016) (0.022) (0.022) (0.112)
IsSurprising 0.009 0.030 -0.036
(0.030) (0.028) (0.075)
GroupCfdc_IsLower 0.229*** -1.027***
(0.018) (0.111)
IsMajority×IsSurprising 0.054
(0.066)
IsMajority×GroupCfdc_IsLower 2.466***
(0.185)
IsSurprising×GroupCfdc_IsLower 0.025
(0.061)
Individual fixed effects Y Y Y Y Y Y
Question fixed effects Y Y Y Y Y Y
Observations 1,600 1,568 1,600 1,600 1,568 1,568
Pseudo R2 0.139 0.405 0.273 0.273 0.392 0.394
Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject
revised his answer to a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame
denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all
participants, RealSame denotes the proportion of subjects having the same choice as their own, SameGroup_Cfdc denotes the average
confidence of those giving the same answer, OppoGroup_Cfdc denotes the average confidence of those giving the opposite answer,
IsMajority, IsSurprising and GroupCfdc_IsLower are indicator variables. Coefficients displayed are marginal effects. Robust standard
errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.
In the Full-Information treatment, the information about the answer distributions and the average
confidence of those who hold different opinions (OppoGroup_Cfdc) still has a significant impact on
the revision decisions, while the information about other respondents’ estimates is not significantly
influential (Column 2 of Table 7). One possibility is that the information complexity and limited
attention lead subjects to use simpler heuristics, thus reducing the amount of information considered.
Finally, as we can see from Column 5 of Table 7, subjects mainly rely on their own confidence
levels and the Majority Rule, followed by the Maximum Confidence rule, while the impact of the
egocentric SP rule is economically insignificant. The odds of revising from the minority answer to the
majority answer are 40 times those of revising from the majority answer to the minority answer, while the
odds of revising from the minimum confidence answer to the maximum confidence answer (from the
surprisingly unpopular answer to the surprisingly popular answer) are only 14.508 (3.212) times those of
revising from the maximum confidence answer to the minimum confidence answer (from the surprisingly
popular answer to the surprisingly unpopular answer). Once we add the interaction terms into the model,
the coefficients of variables related to IsSurprising are no longer statistically significant. Therefore,
subjects in the Full-Information treatment seldom employ the egocentric SP rule. When they happen
to be in the majority, they are more likely to rely on confidence information (Column 6 of Table 7).
Table 7: Logit Regression of Revision Decisions, Full-Information Treatment
Dependent variable:
I(Revision)
Marginal effects
(1) (2) (3) (4) (5) (6)
Cfdc -0.432*** -0.303*** -0.313*** -0.312*** -0.310*** -0.311***
(0.093) (0.077) (0.076) (0.077) (0.079) (0.080)
EstiSame -0.003 0.087 0.084 -0.012 -0.051 -0.046
(0.092) (0.078) (0.087) (0.091) (0.082) (0.091)
EstiCfdc -0.172 -0.130 -0.162 -0.146 -0.102 -0.103
(0.147) (0.113) (0.128) (0.128) (0.114) (0.117)
RealSame -0.836***
(0.056)
SameGroup_Cfdc 1.479
(1.108)
OppoGroup_Cfdc 2.535**
(1.114)
SameGroup_EstiSame -0.708
(1.655)
OppoGroup_EstiSame -1.428
(1.632)
SameGroup_EstiCfdc -1.745
(1.699)
OppoGroup_EstiCfdc -0.083
(1.633)
IsMajority -0.313*** -0.262*** -0.294*** -1.268***
(0.013) (0.021) (0.033) (0.134)
IsSurprising 0.089*** 0.093*** 0.116
(0.030) (0.027) (0.139)
GroupCfdc_IsLower 0.213*** -0.694***
(0.031) (0.109)
IsMajority×IsSurprising 0.035
(0.102)
IsMajority×GroupCfdc_IsLower 1.905***
(0.172)
IsSurprising×GroupCfdc_IsLower -0.063
(0.085)
Individual fixed effects Y Y Y Y Y Y
Question fixed effects Y Y Y Y Y Y
Observations 1,550 1,519 1,550 1,550 1,519 1,519
Pseudo R2 0.147 0.469 0.405 0.410 0.477 0.478
Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject
revised his answer to a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame
denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all
participants, RealSame denotes the proportion of subjects having the same choice as their own, SameGroup_Cfdc denotes the average
confidence of those giving the same answer, OppoGroup_Cfdc denotes the average confidence of those giving the opposite answer,
SameGroup_EstiSame denotes the average estimate of others’ giving the same answer from those giving the same answer,
OppoGroup_EstiSame denotes the average estimate of others’ giving the same answer from those giving the opposite answer,
SameGroup_EstiCfdc denotes the average estimate of the mean confidence in own answer from those giving the same answer,
OppoGroup_EstiCfdc denotes the average estimate of the mean confidence in own answer from those giving the opposite answer,
IsMajority, IsSurprising and GroupCfdc_IsLower are indicator variables. Coefficients displayed are marginal effects. Robust standard
errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.
Our individual-level analysis finds that the Majority Rule is commonly adopted by participants
in their decision regarding whether to change their previous answer or not. When information about
other respondents’ confidence levels is provided as well, individuals take both the Majority Rule and
the Maximum Confidence rule into consideration. However, the additional information about other
respondents’ estimates and beliefs may serve as a distraction, leading them to rely relatively more
heavily on the Majority Rule.
5. Initial Performance and Subsequent Revision
Finally, we want to evaluate the more general impact of social information on the revision
decisions and the subsequent changes to performance. Measures such as Type I and Type II errors are often
used to describe possible errors made in a statistical decision process. A Type I error refers to rejecting
a true null hypothesis, while a Type II error refers to failing to reject a false null hypothesis. In our
experiment’s context, a Type I error refers to revising a correct answer, and a Type II error corresponds
to not revising an incorrect answer.
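Under these definitions, the two error rates can be computed from observed decisions as in the sketch below. The data layout (one pair per subject-question observation) and the function name are assumptions made for illustration, and the example observations are invented.

```python
from typing import Iterable, Optional, Tuple


def revision_error_rates(
    observations: Iterable[Tuple[bool, bool]]
) -> Tuple[Optional[float], Optional[float]]:
    """Each observation is (initially_correct, revised).

    Type I error rate: share of initially correct answers that were revised.
    Type II error rate: share of initially incorrect answers that were not revised.
    """
    right = []  # revision flags for initially correct answers
    wrong = []  # revision flags for initially incorrect answers
    for initially_correct, revised in observations:
        (right if initially_correct else wrong).append(revised)
    type_i = sum(right) / len(right) if right else None
    type_ii = sum(not r for r in wrong) / len(wrong) if wrong else None
    return type_i, type_ii


# Invented example: 4 initially correct answers (1 revised),
# 4 initially incorrect answers (3 left unrevised).
example = [(True, True), (True, False), (True, False), (True, False),
           (False, False), (False, False), (False, False), (False, True)]
type_i_rate, type_ii_rate = revision_error_rates(example)
```

In this invented example the Type I error rate is 0.25 and the Type II error rate is 0.75.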
Table 8 displays the two types of errors in each information treatment. Although Type I errors in
the Low-Information treatment are on average the lowest, they are accompanied by the highest level of
Type II error. Subjects in the Moderate-Information treatment make Type II errors least frequently
and Type I errors at a moderate rate, comparatively.
Table 8: Average Type I and Type II Errors
Treatment   First-stage answer   Revise    Not Revise
LI          R                    12.09%    87.91%
LI          W                    13.76%    86.24%
MI          R                    12.28%    87.72%
MI          W                    25.44%    74.56%
FI          R                    13.92%    86.08%
FI          W                    22.28%    77.72%
Notes: R (W) denotes that the submitted answer in the first stage is right (wrong). The average Type I error made by subjects is the
Revise rate in row R, and the average Type II error is the Not Revise rate in row W.
To further understand how the heuristic rules considered can potentially improve decisions, we
assume in turn that subjects adhere strictly to a specific rule (Majority Rule, Maximum Confidence rule,
standard SP rule or egocentric SP rule). We calculate the errors for each rule accordingly and compare them
with the actual values.
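The counterfactual calculation can be sketched as follows, assuming that a subject who adheres to a rule revises exactly when the rule's recommended answer differs from his first-stage answer. The data layout, the function name, and the example values are hypothetical and do not reproduce the experimental data.

```python
from typing import Iterable, Optional, Tuple


def rule_error_rates(
    items: Iterable[Tuple[bool, bool, bool]]
) -> Tuple[Optional[float], Optional[float]]:
    """Each item is (initial_answer, rule_answer, correct_answer).

    Returns the Type I and Type II error rates that would arise if every
    subject followed the rule's recommendation.
    """
    n_right = n_wrong = type_i = type_ii = 0
    for initial, rule, truth in items:
        would_revise = rule != initial
        if initial == truth:
            n_right += 1
            type_i += would_revise       # revising a correct answer
        else:
            n_wrong += 1
            type_ii += not would_revise  # keeping an incorrect answer
    return (type_i / n_right if n_right else None,
            type_ii / n_wrong if n_wrong else None)


# Invented example with two initially correct and two initially incorrect answers.
items = [(True, True, True),    # correct, rule agrees: kept
         (True, False, True),   # correct, rule disagrees: Type I error
         (False, False, True),  # incorrect, rule agrees: Type II error
         (False, True, True)]   # incorrect, rule disagrees: corrected
rule_type_i, rule_type_ii = rule_error_rates(items)
```

Here both counterfactual error rates equal 0.5. Applying the same function with each rule's recommendations in turn would give the rule-specific errors that are compared with the actual values.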
As shown in Figure 11, if subjects strictly followed the Majority Rule, they would on average revise
about 30% of the correct answers and decline to revise about 60% of the incorrect answers. Errors of
the standard SP rule are close to those of the Majority Rule since the two rules suggest the same answer
in most cases. The egocentric SP rule has a lower Type II error at the expense of doubling the chance
of rejecting a correct answer, which is due to the overestimation of the percentage agreement with own
answer. By contrast, the Maximum Confidence rule is relatively reliable, such that mistaken decisions
would occur only moderately often if it were employed.
Generally speaking, subjects do not fully exploit the wisdom of crowds via the heuristics examined,
since the actual Type II error they make is substantially higher than that of any of the specific heuristics
we have discussed here. In our setting, Type I errors are generally rarer than Type II errors. The
Maximum Confidence rule nonetheless performs best among the considered heuristics, as it
maintains low Type I errors while also minimizing Type II errors.
Figure 11: Type I and Type II Errors of Different Heuristics
Notes: Actual denotes the actual value. MR, EgoSP, MC and SP denote the error for the majority rule, egocentric SP rule, maximum
confidence rule and standard SP rule respectively.
Finally, we consider multinomial logit regressions in which subjects are classified by the states
of their answers to a given question in both stages. Table 9 reports the relative risk ratios for the
subjects giving the wrong answer in the first stage but giving the right answer in the second stage
(W&NewR), the subjects giving the right answer in the first stage but giving the wrong answer in the
second stage (R&NewW), the subjects giving the right answer in both stages (R&NewR), with those
who give the wrong answer in both stages (W&NewW) as the comparison group.
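The four outcome categories used as the dependent variable can be constructed as in the sketch below. The function name and the example observations are hypothetical; the category labels follow the text.

```python
from collections import Counter


def two_stage_outcome(first_correct: bool, second_correct: bool) -> str:
    """Classify one subject-question observation by correctness in both stages."""
    if first_correct:
        return "R&NewR" if second_correct else "R&NewW"
    return "W&NewR" if second_correct else "W&NewW"


# Invented observations: (correct in first stage, correct in second stage).
pairs = [(False, True), (True, True), (True, False), (False, False), (True, True)]
counts = Counter(two_stage_outcome(first, second) for first, second in pairs)
```

In the regressions, W&NewW serves as the comparison group, so each reported relative risk ratio describes how a covariate shifts the probability of one of the other three categories relative to remaining wrong in both stages.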
The results show that the relative risk of always giving the right answer over always giving the
wrong answer is significantly higher for knowledgeable subjects and easy questions. In addition,
revising an initially incorrect answer in the second stage (W&NewR) is significantly more likely than
not revising (W&NewW) for easy questions. Subjects who submit an answer consistent with a larger
proportion of respondents in the first stage are more likely to have poor overall performance.
Examining the Moderate-Information and Full-Information treatments, the relative risk of always
giving the right answer over always giving the wrong answer is much higher for subjects providing an
answer endorsed with higher average confidence. We also find a pronounced increase in the relative
risk of correcting a wrong answer (W&NewR) if it is refuted by those with high average confidence in
the first stage. Mistakenly revising a right answer (R&NewW) is more likely to occur when the answer
is agreed upon by the group with higher average confidence, or refuted by the group with a higher
average estimate of the average confidence for all participants.
Table 9: Multinomial Logit Regression of Two-stage Performance
Panel A. Models for the Low-Information treatment
LI Relative Risk Ratio
(1a) (2a) (3a)
W&NewR R&NewW R&NewR
Cfdc 0.205 0.007*** 17.062***
(0.284) (0.010) (11.597)
EstiSame 0.283 0.043*** 0.193**
(0.353) (0.046) (0.130)
EstiCfdc 0.766 62.481** 0.050***
(1.546) (103.659) (0.048)
RealSame 2.42e-06*** 1.55e-05*** 0.158**
(2.90e-06) (1.29e-05) (0.114)
Knowledge 88.644** 0.259 756.683***
(171.836) (0.401) (714.509)
Easiness 5.060e+05*** 0.379 8,412.899***
(6.163e+05) (0.395) (4,266.940)
Constant 0.084** 838.613*** 0.004***
(0.100) (1,049.001) (0.003)
Panel B. Models for the Moderate-Information treatment
MI Relative Risk Ratio
(1b) (2b) (3b)
W&NewR R&NewW R&NewR
Cfdc 0.057*** 0.286 3.297*
(0.060) (0.345) (2.171)
EstiSame 0.471 1.420 0.104***
(0.447) (1.560) (0.074)
EstiCfdc 2.876 0.100 1.114
(4.202) (0.159) (1.134)
RealSame 0.007*** 0.0005*** 0.231***
(0.004) (0.0003) (0.106)
SameGroup_Cfdc 3.30e-05*** 0.009** 18,476.853***
(5.20e-05) (0.016) (19,407.700)
OppoGroup_Cfdc 4.957e+05*** 62.187** 1.89e-05***
(7.258e+05) (111.054) (2.13e-05)
Knowledge 0.135* 0.642 78.815***
(0.161) (0.881) (70.493)
Easiness 69.704*** 0.537 756.798***
(41.028) (0.423) (275.620)
Constant 0.962 199.268*** 0.039***
(1.156) (299.243) (0.034)
Panel C. Models for the Full-Information treatment
FI Relative Risk Ratio
(1c) (2c) (3c)
W&NewR R&NewW R&NewR
Cfdc 0.276 0.003*** 2.058
(0.289) (0.004) (1.222)
EstiSame 0.714 1.175 0.234**
(0.766) (1.260) (0.139)
EstiCfdc 0.867 13.399 1.614
(1.411) (21.642) (1.455)
RealSame 0.001*** 0.0003*** 0.449
(0.001) (0.0002) (0.232)
SameGroup_Cfdc 0.024 1.327e+06*** 2.882e+06***
(0.068) (4.126e+06) (5.691e+06)
OppoGroup_Cfdc 8,284.578*** 0.0001** 1.46e-06***
(23,507.025) (0.001) (2.27e-06)
SameGroup_EstiSame 0.008 1.323 0.052
(0.024) (4.326) (0.112)
OppoGroup_EstiSame 1.65e-06*** 0.001* 0.129
(5.20e-06) (0.003) (0.204)
SameGroup_EstiCfdc 0.027 6.04e-10*** 0.001**
(0.130) (3.53e-09) (0.003)
OppoGroup_EstiCfdc 3.65e+07*** 7.24e+08*** 1.06e+05***
(1.99e+08) (4.09e+09) (2.97e+05)
Knowledge 0.384 4.620 173.623***
(0.527) (6.969) (147.432)
Easiness 620.973*** 0.179* 678.707***
(421.727) (0.161) (252.711)
Constant 0.434 233.316** 0.003***
(0.750) (499.690) (0.003)
Notes: Multinomial Logit estimation of the effect of information provision on the two-stage performance. Dependent variable is whether
a subject correctly answered a certain question in the first stage and the second stage. W&NewR denotes that the subject gives the wrong
answer in the first stage but gives the right answer in the second stage, R&NewW denotes that the subject gives the right answer in the
first stage but gives the wrong answer in the second stage, R&NewR denotes that the subject gives the right answer in both stages (without
revision). The comparison group is those who give the wrong answer in both stages. Cfdc denotes subjects’ self-reported confidence in
their own answer, EstiSame denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average
confidence in own answer of all participants, RealSame denotes the proportion of subjects having the same choice as their own,
SameGroup_Cfdc denotes the average confidence of those giving the same answer, OppoGroup_Cfdc denotes the average confidence
of those giving the opposite answer, SameGroup_EstiSame denotes the average estimate of others’ giving the same answer from those
giving the same answer, OppoGroup_EstiSame denotes the average estimate of others’ giving the same answer from those giving the
opposite answer, SameGroup_EstiCfdc denotes the average estimate of the mean confidence in own answer from those giving the same
answer, OppoGroup_EstiCfdc denotes the average estimate of the mean confidence in own answer from those giving the opposite answer,
Knowledge and Easiness refer to the aggregate correct rate of each subject in the first stage and the aggregate correct rate of each question
in the No-Information treatment respectively. Coefficients displayed are relative risk ratios. Robust standard errors are displayed in
parentheses. *p<0.1, **p<0.05, ***p<0.01.
6. Conclusion and Discussion
There can be wisdom and intelligence in a group; however, the ‘wisdom of crowds’ can also lead
individual decision-makers astray. In particular, collective decisions can become more extreme and
eventually less correct than choices made by individuals alone. The key to understanding the validity
of the wisdom of crowds is to understand how the relevant information about a crowd’s viewpoints
is processed by individuals in their decision-making. Using a laboratory experiment approach, our
study tests whether more information content is necessarily better for collective knowledge, and how
individual heuristics adopted depend on the amount of information about others’ views provided.
Our experiment shows that information creates a more dispersed distribution of correct rates at the
question level, as a result of the accuracy improvement on easy questions but deterioration on difficult
questions. Yet, we find that the extent to which “crowd wisdom” helps or harms knowledge is crucially
mediated by the amount of information. Subjects receiving our Moderate-Information treatment
perform better on most easy questions and partially better on difficult questions. On the other hand,
individuals’ correct response rates become more evenly distributed with information provision than
those in the baseline treatment, mainly because the information significantly improves the overall
performance of those with less initial knowledge.
The raw data patterns are reinforced by our regression analysis results: participants’ answer
revision rates are negatively correlated with their self-reported confidence levels. It is also relatively rare
for subjects to update their answers if their own answer is in the majority. Furthermore, the proportion
of respondents giving the same answer as the subject’s original answer generally has a negative
association with the subject’s final performance.
Our experiment also allows us to explore how information affects answer revisions. When Full-
Information is provided, participants tend to put more weight on the Majority answer than on the
Maximum Confidence answer compared with those in the Moderate-Information treatment. To
summarize, the actual Type Ⅱ error of subjects’ revision decisions is much higher than any of the
heuristic rules discussed, indicating subjects’ tendencies to adhere to their original choices.
Furthermore, subjects are overall more likely to give the wrong answer in both stages without revision
if the original response is in line with the majority opinion. On the other hand, the relative risk of
giving the right answer in stage two over maintaining a wrong answer, is significantly higher when the
answer is endorsed with higher average confidence by others sharing the same view.
The accuracies of different revision heuristics are determined by the features of the generated
information in our setting. In our data, individuals tend to overestimate the proportion of respondents
having the same choice as their own, expecting themselves to be in the majority. This contributes to
the popularity of the Majority Rule and the failure of the standard Surprising Popularity rule. The
Maximum Confidence rule yields the best performance due to the confidence exhibited by those who
answer the questions correctly compared with their beliefs on the average confidence level of others.
Our analysis shows that despite subjects’ overall favoritism towards the Majority Rule, Maximum
Confidence performs best out of the rules in terms of jointly minimizing Type I and Type II errors.
Hence, overall performance could be enhanced if subjects were to adopt this heuristic.
Our findings have some potentially important policy implications with regard to current social
issues, in which the views of other members of society, and the intensity of those views, are readily
available to individuals through social media and online platforms.
Firstly, our study shows that even when a multitude of information on answers and incentivized
confidence levels of others is available, decision-makers tend to rely heavily on Majority Rule as a
favored heuristic, perhaps due to its simplicity. Providing information about other individuals’
confidence levels in their answers does help to an extent, as exhibited by the best performances arising
from the Moderate-Information treatment, although the marginal information provided in that
treatment (confidence levels) remains under-utilized in subjects’ decisions. The fact that the Majority Rule is
heavily utilized even when other available social information is helpful to fact-finding
indicates that the composition of individuals participating in social media or online platforms can be
highly influential, since the proportion of proponents and opponents of a statement tends to sway
public opinion the most.
In addition, our study points to the potential for information about other individuals’ incentivized
confidence reports to improve collective knowledge. While the information on confidence levels in
our experiment was not utilized to its fullest extent, our analysis shows that it could have helped
performance further. This suggests a possible role for incentivized confidence elicitation as a
mechanism in online discussion platforms and social media. Most current mechanisms for evaluating
online statements are still of a majority rule nature, such as users’ ability to rate one another’s
comments. A first policy step could be to introduce a simple method for online users to weight their
self-endorsements of their own statements according to their true confidence levels, while a follow-up
policy might help bring users’ attention to these confidence statements, deflecting users’ tendency to
focus on majority opinions within a platform. Such policies may help to realize the potential of
information about others’ confidence levels in promoting better performance outcomes.
Finally, our study indicates that there is a limit to the amount of information about others’ views
that decision-makers can effectively process. This is exemplified by the reduction in overall
performance in the Full-Information treatment compared to the Moderate-Information treatment. In
particular, decision-makers may not know how to utilize the higher-order information provided in the
Full-Information treatment, and could even be confused by it such that on net, it cancels some of the
beneficial effects obtained in the Moderate-Information treatment. Thus, a policy implication is that a
mechanism which seeks to improve collective knowledge need not collect or present users with higher-
order information, since it may be unlikely to be utilized in a beneficial manner.
We believe there are promising directions for future research based on the findings in this study.
One of the interesting findings in our setting is that the Surprising Popularity rule does not deliver the
improved performance hypothesized, mainly due to the lack of Surprisingly Popular answers in our
data. Future research can explore the question-based conditions needed for this and other heuristics to
serve as effective decision-making tools. For example, it may be possible that particular phrasing or
framing of statements may lead to better conditions for realizing the potential of the heuristics
examined.
Another possible direction for future work is to consider other types of social knowledge
information in a High-Information treatment besides the variables we have considered here. Although
Full-Information did not help our subjects on the margin, it is possible that subjects might find other
types of information about others’ viewpoints useful in making better decisions. In addition, while our
current study examined a range of question difficulty levels, and tested for the marginal effects of
information provision across the spectrum of difficulty, we found relatively limited scope for
improvement on very difficult questions, based on the information provided to subjects in this study.
An important future direction could be to study more specifically the types of information that can
help individuals obtain more accurate answers to questions of high difficulty levels.
References:
Aspinall, W., 2010. A route to more tractable expert advice. Nature, 463(7279), 294-295.
Baars, J. A., and Mass, C. F., 2005. Performance of national weather service forecasts compared
to operational, consensus, and weighted model output statistics. Weather and Forecasting, 20, 1034-
1047.
Baillon, A., Tereick, B., and Wang, T.V., 2020. Follow the money, not the majority: Incentivizing
and aggregating expert opinions with Bayesian markets. Working paper.
Bazazi, S., von Zimmermann, J., Bahrami, B., and Richardson, D., 2019. Self-serving incentives
impair collective decisions by increasing conformity. PLoS One, 14(11), e0224725.
Becker, G. M., DeGroot, M. H., and Marschak, J., 1964. Measuring utility by a single-response
sequential method. Behavioral Science, 9, 226-232.
Brosig, J., Weimann, J., and Yang, C-L., 2003. The hot versus cold effect in a simple bargaining
experiment. Experimental Economics, 6, 75-90.
Brown, R., 2000. Group processes. Malden, MA: Blackwell.
Budescu, D. V., and Chen, E., 2014. Identifying expertise to extract the wisdom of crowds.
Management Science, 61(2), 267-280.
Chacoma, A., and Zanette, D. H., 2015. Opinion formation by social influence: From experiments
to modelling. PLoS One, 10(10): e0140406.
Chambers, J. R., and Windschitl, P. D., 2004. Biases in social comparative judgments: The role
of nonmotivated factors in above-average and comparative-optimism effects. Psychological Bulletin,
130, 813-838.
Chen, G., Lien, J. W., and Zheng, J., 2017. The value of the knowledge of the others. Technical
report.
Chen, H., De, P., Hu, Y., and Hwang, B., 2014. Wisdom of crowds: The value of stock opinions
transmitted through social media. Review of Financial Studies, 27(5), 1367-1403.
Chen, X., Hong, F., and Zhao, X., 2020. Concentration and variability of forecasts in artificial
investment games: An online experiment on WeChat. Experimental Economics, 1-33.
Cooke, R., 1991. Experts in uncertainty: Opinion and subjective probability in science. Oxford
University Press, USA.
Fischbacher, U., 2007. Z-tree: Zurich toolbox for ready-made economic experiments.
Experimental Economics, 10(2), 171-178.
Gottschlich, J., and Hinz, O. A., 2014. A decision support system for stock investment
recommendations using collective wisdom. Decision Support Systems, 59, 52-62.
Jayles, B., Kim, H., Escobedo, R., Cezera, S., Blanchet, A., Kameda, T., Sire, C., and Theraulaz,
G., 2017. How social information can improve estimation accuracy in human groups. Proceedings of
the National Academy of Sciences, 114(47), 12620-12625.
Jayles, B., and Kurvers, R. H. J. M., 2020. Exchanging small amounts of opinions outperforms
sharing aggregated opinions of large crowds. Working paper.
King, A. J., Cheng, L., Starke, S. D., and Myatt, J. P., 2012. Is the true ‘wisdom of the crowd’ to
copy successful individuals? Biology Letters, 8(2), 197-200.
Koriat, A., 2012. When are two heads better than one and why? Science, 336, 360-362.
Kurvers, R. H. J. M., Herzog, S. M., Hertwig, R., Krause, J., Carney, P. A., Bogart, A.,
Argenziano, G., Zalaudek, I., and Wolf, M., 2016. Boosting medical diagnostics by pooling
independent judgments. Proceedings of the National Academy of Sciences, 113 (31), 8777-8782.
Lee, M. D., Zhang, S., and Shi, J., 2011. The wisdom of the crowd playing the price is right.
Memory and Cognition, 39(5), 914-923.
Lorenz, J., Rauhut, H., Schweitzer, F., and Helbing, D., 2011. How social influence can
undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences, 108(22),
9020-9025.
Marks, G., and Miller, N., 1987. Ten years of research on the false-consensus effect: An empirical
and theoretical review. Psychological Bulletin, 102, 72-90.
Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., Scott, S.E., Moore, D.,
Atanasov, P., Swift, S. A., Murray, T., Stone, E., and Tetlock, P. E., 2014. Psychological strategies for
winning a geopolitical forecasting tournament. Psychological Science, 25, 1106-1115.
Morton, R. B., Piovesan, M., and Tyran, J-R., 2019. The dark side of the vote: Biased voters,
social information, and information aggregation through majority voting. Games and Economic
Behavior, 113, 461-481.
Niu, X., Li, J., Browne, G. J., Li, D., Cao, Q., Liu, X., Wang, G., and Wang, P., 2019. Transcranial
stimulation over right inferior frontal gyrus increases the weight given to private information during
sequential decision-making. Social Cognitive and Affective Neuroscience, 14 (1), 59-71.
Palley, A. B., and Soll, J. B., 2019. Extracting the wisdom of crowds when information is shared.
Management Science, 65, 2291-2309.
Prelec, D., Seung, H. S., and McCoy, J., 2017. A solution to the single-question crowd wisdom
problem. Nature, 541, 532-535.
Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., and Moy, L., 2010.
Learning from crowds. Journal of Machine Learning Research, 11, 1297-1322.
Silva, S., and Correia, L., 2016. An experiment about the impact of social influence on the wisdom
of the crowd effect. Working paper.
Tump, A. N., Wolf, M., Krause, J., and Kurvers, R. H. J. M., 2018. Individuals fail to reap the
collective benefits of diversity because of over-reliance on personal information. Journal of the Royal
Society Interface, 15, 20180155.
Wang, G., Kulkarni, S. R., Poor, H. V., and Osherson, D. N., 2011. Aggregating large sets of
probabilistic forecasts by weighted coherent adjustment. Decision Analysis, 8(2), 128-144.
Wolf, M., Krause, J., Carney, P. A., Bogart, A., and Kurvers, R. H. J. M., 2015. Collective
intelligence meets medical decision-making: The collective outperforms the best radiologist. PLoS
One, 10 (8), e0134269.
Wolfers, J., and Zitzewitz, E., 2004. Prediction markets. Journal of Economic Perspectives, 18,
107-126.
Yaniv, I., and Milyavsky, M., 2007. Using advice from multiple sources to revise and improve
judgments. Organizational Behavior and Human Decision Processes, 103, 104-120.
Yum, H., Lee, B., and Chae, M., 2012. From the wisdom of crowds to my own judgment in
microfinance through online peer-to-peer lending platforms. Electronic Commerce Research and
Applications, 11(5), 469-483.
Appendix A. Question-based Comparisons between Treatments (Raw Data Graphs)
Figure A1: Comparisons between the Information Treatments and the No-Information Treatment
Notes: Question number ordered by question difficulty levels in the No-Information treatment from difficult to easy. Second-stage
correct rate on the y-axis. Nonparametric kernel estimation with Epanechnikov function used to obtain smooth curve.
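The smoothing described in the notes can be illustrated with a Nadaraya-Watson kernel regression using the Epanechnikov kernel, K(u) = 0.75(1 - u^2) for |u| <= 1 and zero otherwise. This is a minimal sketch and not the estimation code behind the figures: the choice of estimator, the bandwidth, the function names, and the data points are all assumptions made for illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each grid point."""
    smoothed = []
    for g in grid:
        weights = [epanechnikov((x - g) / bandwidth) for x in xs]
        total = sum(weights)
        if total > 0:
            smoothed.append(sum(w * y for w, y in zip(weights, ys)) / total)
        else:
            # No observation falls inside the kernel window at this point.
            smoothed.append(None)
    return smoothed


# Hypothetical data: question rank (difficult to easy) and second-stage correct rate.
ranks = [1, 2, 3, 4, 5, 6]
correct_rate = [0.20, 0.35, 0.30, 0.55, 0.70, 0.90]
curve = kernel_smooth(ranks, correct_rate, grid=ranks, bandwidth=2.0)
print([round(v, 3) for v in curve])
```

A larger bandwidth averages over more neighboring questions and gives a flatter curve; a smaller one tracks the raw question-level rates more closely.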
Figure A2: Comparisons between the Information Treatments
Notes: Question number ordered by question difficulty levels in the No-Information treatment from difficult to easy. Second-stage
correct rate on the y-axis. Nonparametric kernel estimation with Epanechnikov function used to obtain smooth curve.
Appendix B. Question-based Confidences and Answer Distributions in the No-Information
Treatment
Figure B1: Average Real and Estimated Confidence
Notes: Question number ordered by question difficulty levels in the No-Information treatment from difficult to easy. Average
confidence level of subjects on the y-axis.
Figure B2: Average Real and Estimated Percentage of Agreement with Own Answer
Notes: Question number ordered by question difficulty levels in the No-Information treatment from difficult to easy. Average
percentage of proponents (i.e., participants choosing the same answer as the subject’s own) aggregated by subjects on the y-axis. No
significant difference in the estimates of others’ giving the same answer between the Right group and Wrong group (Wilcoxon
matched-pairs signed-ranks test, p=0.958).
Appendix C. Experimental Instructions (translated from original Chinese version)
C.1 Experimental instructions for the first-stage tasks (common to all the treatments)
Thank you for participating in this experiment! Please read the following instructions carefully. If you have any
questions, feel free to ask us. Please note that you cannot communicate with other participants during the experiment.
Experimental task
You will see 50 trivia questions, each of which contains 4 sub-questions.
Question (a) includes a statement. Please choose between the options “True” and “False” based on your
knowledge.
Question (b) requires you to estimate the probability of giving the correct answer to question (a).
Question (c) requires you to estimate the proportion of participants (including yourself) in the experiment who
give the same answer to question (a) as you.
Question (d) requires you to estimate the average value of the confidence in own answer reported by all the
participants (including yourself) to question (b) in the experiment.
Experimental payoff
For each question, you will get 10 points if you answer question (a) correctly, and 0 points otherwise.
For question (b), we will randomly generate a number from [0%, 100%] (the number will be generated again
randomly for different questions). If your answer (i.e., the percentage entered in the question) is smaller than the
number, your score is: the random number * 2 points. If your answer (i.e., the percentage entered in the question) is
greater than or equal to the number, your score depends on your answer to the corresponding question (a). You will
get 2 points if question (a) is correctly answered, and 0 points otherwise.
For questions (c) and (d), you will get 2 points if the difference between your answer and the actual result is
within the range [-5%, 5%]; you will get 0 points otherwise.
Each 1 point can be exchanged for 0.1 RMB. Therefore, your total payoff in the experiment equals your total
score multiplied by 0.1 RMB.
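The payoff rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the software used in the sessions: the function and argument names are hypothetical, all percentages are treated as numbers on a 0-100 scale, and the random number for question (b) is assumed to be drawn uniformly (the instructions state only that it is generated randomly).

```python
import random

POINT_VALUE_RMB = 0.1  # each point is worth 0.1 RMB
TOLERANCE = 5          # questions (c) and (d): within 5 percentage points


def score_question(answer, truth, confidence, share_guess, conf_guess,
                   actual_share, actual_mean_conf, draw=None):
    """Points earned on one trivia question (percentages on a 0-100 scale)."""
    if draw is None:
        draw = random.uniform(0, 100)  # assumed uniform; redrawn for every question
    correct = (answer == truth)
    points = 10 if correct else 0                    # question (a)
    if confidence < draw:                            # question (b)
        points += draw / 100 * 2                     # paid the random number * 2
    else:
        points += 2 if correct else 0                # paid according to own answer
    if abs(share_guess - actual_share) <= TOLERANCE:     # question (c)
        points += 2
    if abs(conf_guess - actual_mean_conf) <= TOLERANCE:  # question (d)
        points += 2
    return points


# Hypothetical subject: correct answer, 80% confident, draw fixed for reproducibility.
pts = score_question(answer=False, truth=False, confidence=80, share_guess=45,
                     conf_guess=70, actual_share=40, actual_mean_conf=78, draw=30)
print(pts, round(pts * POINT_VALUE_RMB, 2))  # 14 1.4
```

Here the subject earns 10 points for (a), 2 points for (b) because the reported confidence is at least the draw and the answer is correct, 2 points for (c), and nothing for (d) because the guess misses by 8 percentage points.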
Example
For example, you will see the following 4 questions:
(a) The Russians celebrated the October Revolution in October. True/ False
(b) What is your estimate of the probability that you are correct? (an integer between 50 and 100, %)
(c) Among all the participants in this experiment (including yourself), what do you think is the proportion of
participants who have the same answer to question (a) as you? (an integer between 1 and 100, %)
(d) Among all the participants in this experiment (including yourself), what do you think is the average value of
answers given to question (b)? (an integer between 50 and 100, %)
Suppose there are 5 participants in an experiment. Their answers are shown in the following table.
Subject   (a)     (b) Xi%   (c) Yi%   (d) Zi%
N1        True    X1        Y1        Z1
N2        False   X2        Y2        Z2
N3        False   X3        Y3        Z3
N4        True    X4        Y4        Z4
N5        True    X5        Y5        Z5
Suppose the correct answer to question (a) is "False". Then the participants N1, N4 and N5 who answer “True”
will get 0 points for question (a); the participants N2 and N3 who answer “False” will get 10 points for question (a).
Question (b) requires you to estimate the probability of giving the correct answer to question (a). For each subject
$i$, the computer will generate a random number $R_i\%$ in [0%, 100%]. If $R_i > X_i$, he will get $R_i\% \times 2$ points for
question (b); if $R_i \le X_i$, he will get 2 points for question (b) when he answers question (a) correctly, and 0 points
otherwise.
Let us analyze whether the subject has the incentive to report his true estimate. Suppose the true estimate of
subject $i$ is $X_i^*$, but he reports $X_i$.
If $X_i > X_i^*$, he gets $R_i\% \times 2$ points when $R_i > X_i$; he gets 2 points (answering question (a) correctly) or 0
points (answering question (a) incorrectly) when $R_i \le X_i^*$ and $X_i^* < R_i \le X_i$. However, when $X_i^* < R_i \le X_i$, his
expected score is $X_i^*\% \times 2 + (1 - X_i^*\%) \times 0 = X_i^*\% \times 2$, which is smaller than $R_i\% \times 2$. Therefore, it is not beneficial
for the subject to report $X_i > X_i^*$.
If $X_i < X_i^*$, he gets $R_i\% \times 2$ points when $R_i > X_i^*$ and $X_i < R_i \le X_i^*$; he gets 2 points (answering question
(a) correctly) or 0 points (answering question (a) incorrectly) when $R_i \le X_i$. However, when $X_i < R_i \le X_i^*$, his
expected score $R_i\% \times 2$ is no larger than $X_i^*\% \times 2 + (1 - X_i^*\%) \times 0 = X_i^*\% \times 2$. Therefore, it is also not beneficial for
the subject to report $X_i < X_i^*$.
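The argument above can also be checked numerically. Assume the random number is drawn uniformly from [0%, 100%] (the instructions state only that it is generated randomly). A subject whose true probability of being correct is p and who reports x, both written as fractions, then has an expected score of 2px + (1 - x^2): whenever R <= x he is paid on his own answer, worth 2p on average, and the payment 2R integrated over R > x equals 1 - x^2. The sketch below, with hypothetical function names, confirms on a grid that this expression peaks at the truthful report x = p.

```python
def expected_score(report, true_prob):
    """Expected points on question (b) when R is uniform on [0, 1].

    report and true_prob are probabilities written as fractions in [0, 1].
    """
    paid_on_answer = 2 * true_prob * report  # R <= report: 2 points with prob. true_prob
    paid_on_draw = 1 - report ** 2           # R > report: integral of 2R over (report, 1]
    return paid_on_answer + paid_on_draw


def best_report(true_prob, step=0.01):
    """Grid search for the report that maximizes the expected score."""
    n_steps = int(round(1 / step))
    grid = [round(i * step, 10) for i in range(n_steps + 1)]
    return max(grid, key=lambda x: expected_score(x, true_prob))


for p in (0.5, 0.7, 0.9):
    print(p, best_report(p))
```

Under this assumption, any report other than p lowers the expected score by (x - p)^2, so small misreports cost little while large ones are penalized more heavily.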
Question (c) requires you to estimate the proportion of participants (including yourself) in the experiment
who give the same answer to question (a) as you.
There are 3 participants answering “True” and 2 participants answering “False” in the example. That is, 60% of
the participants answer “True” and 40% of the participants answer “False”.
For subject N1, the proportion of participants giving the same answer is 60%. He will get 2 points for question (c) if
$|Y_1 - 60| \le 5$, and 0 points otherwise. For subject N2, the proportion of participants giving the same answer is 40%. He will
get 2 points for question (c) if $|Y_2 - 40| \le 5$, and 0 points otherwise. We can calculate the scores for subjects N3, N4
and N5 in a similar way.
Question (d) requires you to estimate the average value of the confidence in own answer reported by all the
participants (including yourself) to question (b) in the experiment. A subject $i$ will get 2 points if
$\left| Z_i - \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} \right| \le 5$, and 0 points otherwise.
If you have any questions, please raise your hand. The experiment will start if all subjects have no questions in
understanding the experimental procedures.
C.2 Experimental instructions for the second-stage tasks
Note. Subjects were informed of the second stage only when all the participants had finished the experimental task
in the first stage.
Thank you for answering the above 50 questions. Next, we will provide you with the answers submitted by the
participants in an earlier session to each of the 50 questions, and also provide you with an opportunity to revise your
answers. For each question (a), you can feel free to revise your previous answers or simply choose not to revise them.
If you decide to revise the previous answer, your answer will be automatically updated according to the
following rule: the modified answer becomes "False" if your previous answer is "True"; and the modified answer
becomes "True" if your previous answer is "False".
If you decide not to revise the previous answer, your answer remains the same as your previous answer.
Your final payoff for question (a) will depend on your answer in this stage.
Ⅰ. Example [specific to the treatment LI]
If you have answered the question "The Russians celebrated the October Revolution in October" before, then you
will see the question again as follows:
The Russians celebrated the October Revolution in October.
Your previous answer is: True / False
Do you want to modify your answer: Yes / No
At the same time, you will see the answer submitted by the participants in an earlier session on the screen. The
details are as follows:
For the participants who agreed with the statement in an earlier session, the number of them accounted for AW%
of the total number of participants.
For the participants who disagreed with the statement in an earlier session, the number of them accounted for
DW% of the total number of participants.
Suppose there were 5 participants in the earlier session. Their answers are shown in the following table.
Subject   (a)     (b) Xi%   (c) Yi%   (d) Zi%
N1        True    X1        Y1        Z1
N2        False   X2        Y2        Z2
N3        False   X3        Y3        Z3
N4        True    X4        Y4        Z4
N5        True    X5        Y5        Z5
Therefore, we have AW%= 60%, DW%= 40%.
Ⅱ. Example [specific to the treatment MI]
If you have answered the question "The Russians celebrated the October Revolution in October" before, then you
will see the question again as follows:
The Russians celebrated the October Revolution in October.
Your previous answer is: True / False
Do you want to modify your answer: Yes / No
At the same time, you will see the answer submitted by the participants in an earlier session on the screen. The
details are as follows:
For the participants who agreed with the statement in an earlier session, the number of them accounted for AW%
of the total number of participants; the average probability that they estimated they were correct was AX%.
For the participants who disagreed with the statement in an earlier session, the number of them accounted for
DW% of the total number of participants; the average probability that they estimated they were correct was DX%.
Suppose there were 5 participants in the earlier session. Their answers are shown in the following table.
Subject   (a)     (b) Xi%   (c) Yi%   (d) Zi%
N1        True    X1        Y1        Z1
N2        False   X2        Y2        Z2
N3        False   X3        Y3        Z3
N4        True    X4        Y4        Z4
N5        True    X5        Y5        Z5
Therefore, we have AW% = 60%, AX% = $\frac{X_1 + X_4 + X_5}{3} \times 100\%$; DW% = 40%, DX% = $\frac{X_2 + X_3}{2} \times 100\%$.
Ⅲ. Example [specific to the treatment FI]
If you have answered the question "The Russians celebrated the October Revolution in October" before, then you
will see the question again as follows:
The Russians celebrated the October Revolution in October.
Your previous answer is: True / False
Do you want to modify your answer: Yes / No
At the same time, you will see the answer submitted by the participants in an earlier session on the screen. The
details are as follows:
For the participants who agreed with the statement in an earlier session, the number of them accounted for AW%
of the total number of participants; the average probability that they estimated they were correct was AX%; the
average proportion of participants they estimated agreeing with the statement was AY%; the average value they
estimated of the average probability that each subject believed himself to be correct was AZ%.
For the participants who disagreed with the statement in an earlier session, the number of them accounted for
DW% of the total number of participants; the average probability that they estimated they were correct was DX%;
the average proportion of participants they estimated disagreeing with the statement was DY%; the average value
they estimated of the average probability that each subject believed himself to be correct was DZ%.
Suppose there were 5 participants in the earlier session. Their answers are shown in the following table.
Subject   (a)     (b) Xi%   (c) Yi%   (d) Zi%
N1        True    X1        Y1        Z1
N2        False   X2        Y2        Z2
N3        False   X3        Y3        Z3
N4        True    X4        Y4        Z4
N5        True    X5        Y5        Z5
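Therefore, we have AW% = 60%, AX% = $\frac{X_1 + X_4 + X_5}{3} \times 100\%$, AY% = $\frac{Y_1 + Y_4 + Y_5}{3} \times 100\%$, AZ% = $\frac{Z_1 + Z_4 + Z_5}{3} \times 100\%$; DW% = 40%, DX% = $\frac{X_2 + X_3}{2} \times 100\%$, DY% = $\frac{Y_2 + Y_3}{2} \times 100\%$, DZ% = $\frac{Z_2 + Z_3}{2} \times 100\%$.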
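The summary statistics shown to subjects in the three examples can be computed from an earlier session's raw answers as in the sketch below. The numeric values are hypothetical (the instructions leave Xi, Yi and Zi symbolic), the function and key names are illustrative, and each Xi, Yi, Zi is treated as a number already on a 0-100 percentage scale, which is an assumption about the notation. Treatment LI displays only AW and DW, treatment MI adds AX and DX, and treatment FI adds the Y and Z averages.

```python
def mean(values):
    return sum(values) / len(values)


def summarize(responses):
    """Aggregate one question's answers by side (agree = answered True).

    Assumes at least one respondent on each side; otherwise the group
    averages are undefined and mean() would divide by zero.
    """
    agree = [r for r in responses if r["answer"]]
    disagree = [r for r in responses if not r["answer"]]
    n = len(responses)
    stats = {"AW": 100 * len(agree) / n, "DW": 100 * len(disagree) / n}
    for key in ("X", "Y", "Z"):
        stats["A" + key] = mean([r[key] for r in agree])
        stats["D" + key] = mean([r[key] for r in disagree])
    return stats


# Hypothetical values for subjects N1..N5; the True/False pattern follows the example table.
session = [
    {"answer": True,  "X": 80, "Y": 70, "Z": 75},  # N1
    {"answer": False, "X": 90, "Y": 40, "Z": 70},  # N2
    {"answer": False, "X": 60, "Y": 50, "Z": 80},  # N3
    {"answer": True,  "X": 70, "Y": 60, "Z": 65},  # N4
    {"answer": True,  "X": 60, "Y": 80, "Z": 70},  # N5
]
stats = summarize(session)
print(stats["AW"], stats["DW"])  # 60.0 40.0
print(stats["AX"], stats["DX"])  # 70.0 75.0
```

With these hypothetical numbers, the majority (60%) agrees with the statement while the minority reports the higher average confidence (75 versus 70), which is the kind of case in which confidence information and answer shares point in opposite directions.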
Appendix D. 50 Quiz Questions Used in the Experiment (translated from original Chinese
version)
1. The national anthem of Spain has no lyrics.
2. Bangladesh has a smaller population than Russia.
3. Pluto has not completed one orbit around the sun since its discovery.
4. Russia has the most time zones of any country.
5. Nata de coco is made from coconut meat.
6. Liu Bang was three years younger than Qin Shihuang.
7. Uranus has a moon named after a character in King Lear written by Shakespeare.
8. The farthest place from the center of the earth is the Himalayas.
9. We can send WeChat messages on Mount Qomolangma.
10. “亖” is pronounced [sì].
11. Sharks are the animals that have the most teeth in the world.
12. The gas produced by melting gold is of green color.
13. “Sworn brothers” includes friends between generations.
14. The name of the Polish special forces is “Giant Palm”.
15. The Arabia in Arabian Nights refers to India.
16. The Chinese saying “This is just between you and me” was first said by Yang Zhen in the Han Dynasty, who
meant to refuse to accept gifts.
17. Among the four major basins in China, the Tarim Basin in Xinjiang Province has a special kind of soil called
“purple soil”.
18. The Chinese saying “桃李年华” refers to a woman’s age of 24.
19. The dying wish of Goethe was to be buried beside the poet Schiller.
20. The Pyrenees is a natural border between Spain and France.
21. The candidates for the 2012 FIFA Ballon d ’Or Award include Lionel Messi, Cristiano Ronaldo and Neymar.
22. The Italian scientist and astronomer Copernicus was burned to death for his adherence to the heliocentric theory.
23. The father of Dayu, the hero who controlled the floods in the history of China, was called Gu.
24. Duan Jingzhu is ranked 108 and has a nickname of “Golden Retriever” in the Water Margin.
25. In the classic FC game Super Mario, when Mario can fire bullets, he wears blue clothing.
26. Zishi in ancient China refers to the present time of 23:00-00:59 in a day.
27. After Amsterdam, the second most populous city in the Netherlands is The Hague.
28. The top left corner of the flag of Australia is the flag of the United Kingdom.
29. Czechoslovakia, a central European country, had split from the Czech Republic.
30. In the Chinese story “No 300 taels of silver buried here”, Wang Er stole the silver.
31. On the reverse side of the 5th edition of 5-yuan RMB bills is Huangshan Mountain in China.
32. The Indus River is located in India.
33. Among the “six top ancient water-towns in the southern Yangtze River area”, the three ancient towns in Zhejiang
Province are Wuzhen, Zhouzhuang and Xitang.
34. The capital of the South American country of Panama is Panama City.
35. The first “9” in the C919 medium-size airliner means everlasting.
36. The Hundred Years’ War lasted 116 years.
37. The “Black box” on an airliner is purple.
38. The Canary Islands in the Atlantic Ocean are named after the Canary Dog.
39. The label number of diesel oil, such as 0 and -10, is classified by its solidifying point.
40. In the history of the Golden Horse Awards in Taiwan, only two actors, Jackie Chan and Tony Leung Chiu-wai,
have won two Best Actor awards in a row.
41. “东风” in the Chinese poetry “东风不与周郎便,铜雀春深锁二乔” refers to Borrowing Arrows with
Thatched Boats.
42. Zhang Qian brought corn back from the Western Regions in the Tang Dynasty.
43. “Beat the snake seven inches” means beating its heart.
44. The Bentley logo has 10 large feathers on each side of the wing.
45. Sea cucumbers go dormant in winter.
46. Severe winter refers to the twelfth month of the lunar calendar.
47. The first woman to proclaim herself emperor in the history of China is Wu Zetian.
48. The name “84 Disinfectant” is derived from the successful development in 1984.
49. “Grand finale” was originally a Chinese opera term, referring to the last part of a drama performance.
50. The last name of Confucius is Zi.