
Scandinavian Journal of Psychology, 2007, 48, 13–21 DOI: 10.1111/j.1467-9450.2006.00565.x

© 2007 The Authors. Journal compilation © 2007 The Scandinavian Psychological Associations. Published by Blackwell Publishing Ltd., 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. ISSN 0036-5564.


Cognition and Neurosciences

Own-other differences in the realism of some metacognitive judgments

MARCUS JOHANSSON and CARL MARTIN ALLWOOD

Department of Psychology, Lund University, Sweden

Johansson, M. & Allwood, C. M. (2007). Own-other differences in the realism of some metacognitive judgments. Scandinavian Journal of Psychology, 48, 13–21.

The present study investigated differences in judgments of one’s own and others’ knowledge (the own-other difference). Consistent with the below-average effect (e.g., Kruger, 1999), our main results showed that the participants gave lower knowledge ratings of their own extent of knowledge than of another person’s extent of knowledge (Experiment 1). Furthermore, lower and more realistic judgments were found when the participants judged their own as compared with when judging another person’s overall accuracy (frequency judgments) of answering knowledge questions correctly (Experiments 1 and 2). On the basis of these results it is argued that judgmental anchoring may be important also in the context of indirect comparisons, and that previous conclusions of cross-cultural psychology regarding the above-average effect may be oversimplified.

Key words: Realism, confidence judgments, frequency judgments, social influence, metacognition.

Carl Martin Allwood, Department of Psychology, Lund University, Box 213, SE-221 00 Lund, Sweden; e-mail: [email protected]

INTRODUCTION

In everyday life, people make various judgments of their own and others’ knowledge. In relation to long-held beliefs in psychology about people being victims of biased thinking, such as viewing themselves as better than the average person and making systematic overestimations of their knowledge, it is of interest to analyze whether and to what extent such judgments tend to be more favorable towards one’s own knowledge than that of others. The present study focuses on the contrast between judgments of one’s own and other people’s knowledge, and, in this context, examines two types of metacognitive judgments, namely knowledge ratings and frequency judgments (explained below). We were also interested in examining the stability of judgments of others’ knowledge, meaning the extent to which judgments about a person’s knowledge would be affected by prior judgments of one’s own or others’ knowledge.

A knowledge rating refers to a rating of the extent of a person’s knowledge in a given knowledge domain (Allwood & Granhag, 1996; Granhag et al., 1999). For example, someone may claim to know practically nothing about television soap operas. Although it is complicated to estimate the level of realism in people’s knowledge ratings, it is of interest that Allwood and Granhag (1996) found seemingly high levels of such ratings. Across several knowledge domains (e.g., contemporary rock music and historical battles), and assuming that the participants interpreted the assessment scale as commensurate with a proportion, the participants in that study claimed to possess slightly more than a quarter of all existing knowledge (M = 2.76) as rated on a scale ranging from zero (“no knowledge”) to ten (“all existing knowledge”).

A frequency judgment is made in contexts where many answers have been given, and states how many of those answers the person believes to be correct. Frequency judgments have also been called aggregated-item judgments (Treadwell & Nelson, 1996) and global judgments (e.g., Liberman, 2004; Sniezek & Buckley, 1991). When a frequency judgment provides an accurate assessment of how many questions have been answered correctly, it is said to show perfect realism. When too many questions are believed to have been correctly answered, the frequency judgment shows overconfidence; when too few questions are believed to have been correctly answered, it shows underconfidence.

Frequency judgments have typically been examined in the context of an individual judging his or her own answers, and typically show fairly good realism or even underestimation. Different explanations for these results have been proposed (Allwood & Granhag, 1996, 1999; Gigerenzer, Hoffrage & Kleinbölting, 1991; Griffin & Tversky, 1992; Liberman, 2004; Sniezek, Paese & Switzer, 1990; Sniezek & Buckley, 1991, 1993; Treadwell & Nelson, 1996).

Below, we first discuss knowledge ratings and frequency judgments in the context of how judgments concerning one’s own knowledge compare with judgments concerning the knowledge of others (such a comparison is referred to below as the own-other difference). Next, we discuss how judgments about one’s own and others’ knowledge may be influenced by previous such judgments. In this latter context we also discuss confidence judgments (ratings of the probability that the answer to a specific question is correct).

Own-other differences in metacognitive judgments

Previous research on own-other differences shows mixed results. In the area of cross-cultural differences, researchers have observed that Westerners (e.g., in Northern Europe and North America) tend to exhibit an optimistic bias by, for example, showing evidence of an above-average effect, meaning that, on average, participants believe themselves to be better performers than the average person (see e.g., Heine & Lehman, 2004; Markus & Kitayama, 1991). For example, a study from Sweden showed that the participants on average thought themselves to be better drivers than the average car driver (Svenson, 1981). In contrast, Kruger (1999) argued for, and reported empirical support for, the notion that the above-average effect should be differentiated depending on the judged difficulty of the task: for easy tasks, an above-average effect is expected; and for difficult tasks, a below-average effect is expected. Both the above- and the below-average effect have received support in the literature (e.g., Alicke, Klotz, Breitenbecher, Yurak & Vredenburg, 1995; Klar, Medding & Sarel, 1996; Kruger, 1999; Kruger & Dunning, 1999; Svenson, 1981; Van Yperen, 1992). In this context, it is worth noting that optimistic bias can be measured both by a direct and by an indirect method. In the direct comparative method (direct comparison), the respondents directly compare themselves with another person. In the indirect method (indirect comparison), the respondents make separate absolute judgments about themselves and about others, whereafter the researcher compares the level of the two judgments.

As a framework for the present study, models of how people form conceptions about others’ knowledge are relevant (Kruger, 1999; Nickerson, 1999). Both Kruger’s and Nickerson’s models propose that people use their own knowledge as a starting point, that is, as an anchor, in order to infer what others know, and then adjust this conception by considering how others appear to differ from themselves. Nickerson also reviewed results indicating that people make insufficient adjustments, which can result in biased estimates of what others know. These models nicely illustrate how social differences may be explained by phenomena studied in cognitive psychology. A difference between the two models is that Nickerson’s model is formulated on a general level whereas Kruger’s model (1999) only addresses direct comparative judgments.

In contrast to Nickerson’s (1999) model, Chambers and Windschitl (2004, p. 831) argued in a literature review that “the anchoring and insufficient adjustment account is irrelevant” in indirect comparisons “because there is no reason for a respondent to actively adjust his or her absolute assessment of one entity as a function of his or her absolute assessment of another entity”. However, as further discussed below, Chapman and Johnson (2002), also on the basis of a literature review, concluded that anchoring effects can (and usually do) take place without adjustment. If so, this supports the view that anchoring effects are also relevant in the context of indirect comparisons.

Own-other differences in the knowledge ratings. Based on the models described above, we assumed that when judging the extent of another’s knowledge, people use their own knowledge as an anchor. However, as implied by Kruger’s (1999) research, the own-other difference in knowledge ratings would also be expected to depend on the task’s threshold for successful performance. The present research tests whether or not Kruger’s results concerning direct comparative judgments hold for indirect comparisons as well.

Since the performance levels on the general knowledge questions (GKQs) used in the present study tend to be quite poor (see, for example, Allwood & Johansson, 2004, as well as the accuracy levels in the Results sections of this study), the threshold for successful performance is likely to be high. Accordingly, we expected a pessimistic bias in the own-other difference in the knowledge ratings. That is, those participants who rated the extent of their own knowledge were expected to show lower ratings than those who rated the extent of another’s knowledge.

Own-other differences in realism in frequency judgments. Little research has examined the own-other difference in the context of frequency judgments (Allwood & Johansson, 2004). Allwood and Johansson found that participants gave lower frequency judgments of their own accuracy as compared with their frequency judgments of another person’s accuracy. The frequency judgments of one’s own accuracy were associated with underestimation in both experiments, while the frequency judgments of the other’s accuracy were either realistic (Experiment 1) or showed overestimation (Experiment 2).

Effect of metacognitive judgments on following metacognitive judgments

In the present study we also examined whether previous judgments of one’s own or others’ knowledge would affect the following judgments about others’ knowledge (Experiments 1 and 2) or one’s own knowledge (Experiment 2).

The effect of knowledge ratings on following confidence judgments. Previous research has found that confidence judgments show overconfidence (i.e., a higher level of confidence than correctness in answers) in contexts similar to the present study. Furthermore, Allwood and Granhag (1996; Granhag et al., 1999) found that giving prior knowledge ratings was associated with improved realism in the following confidence judgments of one’s own general knowledge assertions. By “improved realism” we mean that there was a better correspondence between the proportion correct and the level of the confidence judgments. This knowledge-rating effect on the realism in confidence judgments was consistent with these authors’ reasoning that performing the knowledge ratings would leave an impression of the limitation of one’s own knowledge that would restrain the level of one’s confidence judgments, and (given initial overconfidence) consequently lead to improved realism. In the present study, we asked whether a similar effect would hold when the participants rated either their own or another person’s knowledge before making a confidence judgment of the other person’s knowledge claims.

As described above, we expected higher knowledge ratings for others’ knowledge as compared with the ratings for one’s own knowledge. As a consequence of this, we also thought that those participants who had rated others’ knowledge would, on average, later give higher confidence ratings than those who had rated their own knowledge. Assuming prevailing overconfidence, we expected that the higher confidence judgments of the participants who had rated the other’s knowledge would be associated with poorer realism as compared with the participants who had rated their own knowledge. Furthermore, in line with the results presented by Allwood and Granhag (1996; Granhag et al., 1999), we predicted that the knowledge ratings, either of one’s own or of the other’s knowledge, would attenuate the level of the ensuing confidence judgments relative to a control condition with no knowledge ratings.

The effect of frequency judgments on the following frequency judgments. Experiment 2 examined the effect of having given prior frequency judgments on ensuing frequency judgments. In order to explain the effect of the prior frequency judgments on following frequency judgments, the anchoring effect may be important. Chapman and Johnson (2002) argued on the basis of a literature review that the anchoring and adjustment effect may be explained to the greatest extent by the anchoring part. The anchoring effect was argued to be most plausibly explained by an anchor (i.e., previously activated relevant information) having an effect by making some information more readily available (by priming) when forming the answer to the target question. Thus, the adjustment part of the anchoring and adjustment heuristic was not seen as having received support. However, Epley and Gilovich (2002) argued, and presented evidence, that when the anchor is self-generated the adjustment part does occur. The present research is not designed to resolve this debate but only assumes the effect of anchoring in some form, although we note that it seems likely that in some situations both mechanisms may be active at the same time. More generally, we propose that a prior frequency judgment of one’s own or another person’s performance may give rise to anchoring effects on a following frequency judgment.

Moreover, we note that previous research has shown that assimilation (or anchoring) and contrast effects (i.e., contrasting away from the implications of primes) appear to be contingent on whether or not the source of a previous prime or anchor can be judged as a possible source of contamination (see Mussweiler & Neumann, 2000; Mussweiler & Strack, 1999). Thus, for example, if a person finds that a previous frequency judgment may contaminate an ensuing frequency judgment, a contrast effect could occur.

Conversely, on occasions where it is more difficult for the person to be aware that the ensuing frequency judgment may be influenced by the prior frequency judgment, an assimilation effect is more probable. In line with this reasoning and the purpose of eliminating the own-other difference within subjects, in the present study, our participants’ frequency judgments of their own and another pair member’s overall accuracy were separated by a filled delay. Here it is also relevant to note that anchoring effects have been found to be durable over time (Mussweiler, 2001).

Hypotheses

To summarize, our first hypothesis was that participants would give lower knowledge ratings of their own than of another person’s knowledge. The second hypothesis was that the participants would give lower frequency judgments of their own than of the others’ answers, and, due to this, also less realistic frequency judgments. This prediction pertained to Experiment 1 and the first (of two) frequency judgments made in Experiment 2.

Our third and fourth hypotheses concerned the stability of both the confidence judgments and the frequency judgments. The third hypothesis was that the knowledge ratings of one’s own knowledge would result in better realism in the following confidence judgments than the knowledge ratings of the other’s knowledge, and that these two conditions would result in better realism in the confidence judgments than the Control condition where no knowledge ratings took place. The fourth and last hypothesis concerned the second frequency judgment of one’s own or another’s accuracy in Experiment 2. This judgment was hypothesized to show an effect of anchoring in the level of the preceding frequency judgment (irrespective of whether the preceding frequency judgment concerned one’s own or the other’s answers). Specifically, when a frequency judgment of one’s own answers is followed by a frequency judgment of the other’s answers (or vice versa), anchoring in the preceding frequency judgment was expected to eliminate the own-other difference within subjects.

EXPERIMENT 1

Method

Participants. One hundred and twenty university students (54 women, 66 men; age M = 23, ranging from 19 to 30) from Lund University, Sweden, participated in the experiment. Each participant received a reward equivalent to approximately $7 for participating in the experiment.

Design. The experiment had two parts (see Fig. 1). Part 1 had three between-subjects conditions called the Own Knowledge condition, the Other’s Knowledge condition and the Control condition, with 20 pairs randomly assigned to each condition.

In Part 2, the participants were reorganized into three new between-subjects conditions. Six or seven of the pairs (i.e., as close as possible to one-third) in each of the conditions of Part 1 were randomly allocated to each of the three new conditions called the Own Accuracy condition, the Other’s Accuracy condition and the Pair condition¹ respectively, resulting in 20 pairs in each condition. This was done in order to equalize the influence of the conditions in Part 1 of the experiment on the conditions in Part 2.

Materials. (1) General knowledge questions (GKQs): 105 two-alternative forced-choice (2AFC) GKQs, covering subject areas such as geography, arts and history, were prepared. For each question, one of the two answer alternatives was correct.

The 105 GKQs were divided into three question sets, each containing 35 GKQs. These sets were randomized to the participants with the following constraints: each participant received two of the three sets of GKQs, that is, a total of 70 GKQs. For the two experimental conditions in Part 1 of the experiment (i.e., not the control condition), the first question set (henceforth, New set) differed between the two pair members, while the second set (henceforth, Old set) was always the same within each pair. Each of the three question sets occurred as equally often as possible in each of these New and Old positions, within and between conditions. The New/Old manipulation controlled for the effect of making knowledge ratings and confidence judgments of another’s answers to GKQs that the participants had also answered themselves. In the Control condition, these constraints were true for 20 of the pairs (henceforth, New-Old), while for the other 20 pairs both sets were the same within the pairs (henceforth, Old-Old). This was done in order to control for the effect of making confidence judgments of answers to questions never seen before, after having made the preceding knowledge judgments (cf. the two experimental conditions).

(2) Knowledge ratings. As outlined above, the participants in the two experimental conditions in Part 1 of the experiment rated the extent of their own or the other pair member’s knowledge within a specified knowledge domain that always covered the following GKQ.² That is, a knowledge rating was made in connection to each GKQ. An example of a knowledge domain is “contemporary rock music”. The knowledge ratings were made on a scale ranging from zero (defined as “No knowledge”) to ten (defined as “All knowledge”) of, as the participants were told: “all existing knowledge” within the specified knowledge domain.

Procedure. Sixty pairs were created from the 120 participants. The members in each pair were of the same gender and acquainted with one another. The mean age difference within the pairs was 1.15 years (SD = 1.34). The reason for using the matching criteria of same gender and similar age was to restrict possible effects of stereotypes of gender and age category. When recruiting the participants, we defined “similar age” as an age difference of less than 10 years. The reason for requiring that the members in each pair were acquainted was that they had to be able to give ratings of the other’s knowledge in a variety of knowledge domains. The participants had known each other for 3–6 months and had kept private company during that period. The number of male and female pairs was approximately equal between conditions.

While the members in a pair participated in the experiment at the same time, they did so independently, separated by a non-transparent screen and with no possibility to communicate with one another during the experiment.

Part 1: Knowledge ratings and confidence judgments. First, all participants answered 70 GKQs by selecting one of two answer alternatives; the participants were told that one answer alternative for each question was correct. Next, the participants exchanged response sheets (including questions and answers). Then, the participants in the Own Knowledge condition first rated, for each GKQ, the extent of their own knowledge within a specified knowledge domain that covered the GKQ. Secondly, immediately after the knowledge rating, each participant made a confidence judgment of the other pair member’s chosen answer alternative to the GKQ. The confidence judgment was made on a scale ranging from 50% (defined as “Guessing”) to 100% (defined as “Absolutely certain”) that the chosen answer alternative was correct.

The only difference between the Own Knowledge condition and the Other’s Knowledge condition was that the participants in the latter condition, for each GKQ, rated the extent of the other pair member’s knowledge, rather than their own knowledge.

In the Control condition no knowledge ratings were made. For each GKQ the participants only made confidence judgments of the other pair member’s answers.

In all three conditions, the participants were told that if they regarded the answer alternative selected by the other pair member as erroneous, they should choose (draw a circle around) the other alternative, and confidence-rate that alternative by utilizing the scale ranging from 50% to 100%. In order to separate two pair members’ differently chosen answer alternatives to the same GKQ, each pair member was provided with a pencil of a unique color.

Part 2: Frequency judgments. In Part 2 of the experiment, the 40 participants in the Own Accuracy condition individually made a frequency judgment of their own overall accuracy by writing their judgment on paper. Similarly, in the Other’s Accuracy condition, 40 participants individually frequency-judged the other pair member’s overall accuracy.

Calibration measures. We used three measures of realism in the confidence judgments: calibration, over/underconfidence (henceforth, overconfidence), and resolution (see Lichtenstein, Fischhoff and Phillips, 1982). Calibration shows the goodness of fit between confidence and accuracy in terms of the (squared) difference between the level of the confidence judgments and the accuracy in each of a number of confidence classes (e.g., 50–59% . . . 90–99%, and 100%). The difference between calibration and overconfidence is that the latter provides a directional measure of realism in confidence judgments, because the differences between the level of the confidence judgments and the accuracy in the confidence classes are not squared. Loosely speaking, resolution reflects the judge’s ability to differentiate between correct and incorrect answers. For calibration and overconfidence a lower score (for overconfidence, a score closer to zero) reflects better realism than a higher score, while for resolution a higher score denotes better realism than a lower score (the equations for calibration and resolution are given in the Appendix).
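As an illustration only, the three measures can be sketched in Python. The sketch assumes the commonly used formulation associated with Lichtenstein et al. (1982): calibration as the weighted mean squared difference between mean confidence and proportion correct per confidence class, overconfidence as the same difference without squaring, and resolution as the weighted mean squared difference between each class's proportion correct and the overall proportion correct. The Appendix equations are not reproduced in this section, so the exact form used by the authors may differ; the function names `confidence_class` and `realism_measures` are hypothetical.

```python
from collections import defaultdict


def confidence_class(conf_pct):
    """Map a confidence rating in percent (50-100) to one of six classes:
    50-59, 60-69, 70-79, 80-89, 90-99, and 100 (assumed class limits)."""
    if not 50 <= conf_pct <= 100:
        raise ValueError("confidence must lie between 50 and 100")
    return min(int(conf_pct - 50) // 10, 5)


def realism_measures(confidences_pct, correct):
    """Return (calibration, overconfidence, resolution) for one judge.

    confidences_pct: confidence ratings in percent, one per answer.
    correct: booleans, True where the judged answer was correct.
    """
    n = len(confidences_pct)
    if n == 0 or n != len(correct):
        raise ValueError("need equally long, non-empty inputs")

    # Group (confidence as proportion, correctness as 0/1) by class.
    classes = defaultdict(list)
    for conf, ok in zip(confidences_pct, correct):
        classes[confidence_class(conf)].append((conf / 100.0, 1.0 if ok else 0.0))

    overall_accuracy = sum(1.0 for ok in correct if ok) / n
    calibration = overconfidence = resolution = 0.0
    for items in classes.values():
        n_t = len(items)
        mean_conf = sum(c for c, _ in items) / n_t   # mean confidence in class
        prop_corr = sum(a for _, a in items) / n_t   # proportion correct in class
        calibration += n_t * (mean_conf - prop_corr) ** 2
        overconfidence += n_t * (mean_conf - prop_corr)
        resolution += n_t * (prop_corr - overall_accuracy) ** 2
    return calibration / n, overconfidence / n, resolution / n


if __name__ == "__main__":
    # Made-up ratings for illustration; not data from the study.
    print(realism_measures([50, 60, 80, 100, 100], [False, True, True, True, False]))
```

Under this formulation the overconfidence score equals mean confidence minus mean accuracy, which is consistent with its description as the directional counterpart of calibration.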

In order to obtain a measure of the realism in the frequency judgments, we used a difference score that was defined as the difference between the level of the frequency judgment and the target’s actual accuracy. This measure indicates the degree of deviation (under- or overestimation) from perfect realism (i.e., zero).
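A minimal sketch of the difference score follows, assuming that both the frequency judgment and the actual accuracy are expressed as proportions of the questions answered. The function name `difference_score` and the example numbers are hypothetical and are not taken from the study.

```python
def difference_score(judged_correct, actual_correct, n_items):
    """Frequency judgment minus actual accuracy, both as proportions.

    Positive values indicate overestimation, negative values
    underestimation, and zero indicates perfect realism.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return judged_correct / n_items - actual_correct / n_items


if __name__ == "__main__":
    # Hypothetical judge: believes 50 of 70 answers are correct, 46 actually are.
    print(difference_score(50, 46, 70))
```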

Fig. 1. The design of Experiment 1. Notes: Arrows indicate random allocation of pairs of participants to the conditions in each part of the experiment. Dotted lines indicate that the pairs allocated to the “Pair” condition in Part 2 were not analyzed.


Results

Own-other difference in the knowledge ratings. We first report on the knowledge ratings. The average knowledge rating of one’s own knowledge was M = 3.91 (SD = 1.51) and the average rating of the other’s knowledge was M = 5.42 (SD = 1.54). A 2 (Condition: Own Knowledge vs. Other’s Knowledge) × 2 (Set: New set vs. Old set) repeated measures ANOVA was carried out in order to analyze the own-other difference in knowledge ratings. Two significant main effects resulted, while there was no interaction effect. First, the effect of the Condition factor, F(1, 78) = 19.37, p < 0.001, showed that the Own Knowledge condition (New set: M = 4.06, SD = 1.53; Old set: M = 3.77, SD = 1.57) resulted in markedly lower knowledge ratings than did the Other’s Knowledge condition (New set: M = 5.44, SD = 1.48; Old set: M = 5.39, SD = 1.65). That is, the participants rating their own knowledge gave lower ratings than did the participants rating the other pair member’s knowledge. Unexpectedly, the effect of the Set factor, F(1, 78) = 6.80, p = 0.011, indicated that the New set was associated with higher knowledge ratings than the Old set.

Own-other difference in the level of, and realism in, the frequency judgments. The average frequency judgment of one’s own accuracy was M = 0.670 (SD = 0.124), and of the other’s accuracy, M = 0.732 (SD = 0.166), which was found to be a marginally significant difference (p = 0.06). No significant difference in actual accuracy was found between the Own Accuracy condition (M = 0.693; SD = 0.062) and the Other’s Accuracy condition (M = 0.668; SD = 0.077).

A measure of realism in the frequency judgments was computed for each participant as frequency judgment minus actual accuracy. An independent samples t-test showed that the Own Accuracy condition (M = −0.023; SD = 0.114) and the Other’s Accuracy condition (M = 0.065; SD = 0.135) differed significantly in the degree of realism in frequency judgments, t(78) = 3.15, p = 0.002. Moreover, separate one-sample t-tests (test-value = 0, i.e., perfect realism) showed that although the Own Accuracy condition tended to show underestimation, this was far from significantly so, while the Other’s Accuracy condition resulted in significant overestimation, t(39) = 3.04, p = 0.004.
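The reported test statistics can be checked against the reported summary statistics. The sketch below assumes 40 participants per condition, a pooled-variance independent samples t-test, and a negative sign for the Own Accuracy mean (frequency judgment 0.670 minus accuracy 0.693 gives −0.023, in line with the described underestimation); these are assumptions about the analysis, and the function names are hypothetical.

```python
import math


def one_sample_t(mean, sd, n, test_value=0.0):
    """t statistic for a one-sample t-test computed from summary statistics."""
    return (mean - test_value) / (sd / math.sqrt(n))


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


if __name__ == "__main__":
    # Difference scores as reported; the sign of the Own Accuracy mean is assumed.
    own_mean, own_sd = -0.023, 0.114
    other_mean, other_sd = 0.065, 0.135
    n = 40  # assumed participants per condition

    t_between, df_between = independent_t(other_mean, other_sd, n, own_mean, own_sd, n)
    t_other = one_sample_t(other_mean, other_sd, n)
    t_own = one_sample_t(own_mean, own_sd, n)
    print(f"between conditions: t({df_between}) = {t_between:.2f}")
    print(f"Other's Accuracy vs. zero: t({n - 1}) = {t_other:.2f}")
    print(f"Own Accuracy vs. zero: t({n - 1}) = {t_own:.2f}")
```

With these inputs the between-conditions statistic comes out at roughly 3.15 and the one-sample statistic for the Other's Accuracy condition at roughly 3.04, matching the reported values; the match for the between-conditions test depends on the Own Accuracy mean being negative.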

Realism in the confidence judgments of the other’s knowledge. In order to analyze the impact of the knowledge ratings on the confidence judgments we conducted a 3 (Condition: Own Knowledge vs. Other’s Knowledge vs. Control) × 2 (Set: Old set vs. New set) repeated measures ANOVA for each of the dependent measures calibration, overconfidence and resolution. No significant effects were found. It might be noted that for the Control condition, independent samples t-tests showed no significant differences between the New set in the New-Old combination and the first Old set in the Old-Old combination. For easy comparison, the results of each dependent measure given in Table 1 are the collapsed means of the New set and the Old set. In Table 1 the results for calibration, overconfidence and resolution are reported, along with mean accuracy and mean confidence.

As described in the Method section, the participants also had the possibility to disagree with the other person’s answers. On average, the participants disagreed on M = 17.1% (SD = 8.7) of the items in the New set and M = 21.8% (SD = 10.8) of those in the Old set. When the items about which the participants disagreed were removed, still no significant effects were found in the ANOVA described just above. The result values for the agree items are given in the Agree rows of Table 1.

Table 1. Experiment 1: Means (and standard deviations) of the dependent measures for the three conditions

Measure          Items   Observer’s knowledge   Actor’s knowledge   Control
Calibration      All     0.046 (0.023)          0.043 (0.022)       0.044 (0.023)
Calibration      Agree   0.051 (0.025)          0.049 (0.024)       0.048 (0.025)
Overconfidence   All     0.059 (0.074)          0.052 (0.088)       0.060 (0.082)
Overconfidence   Agree   0.039 (0.085)          0.047 (0.094)       0.045 (0.088)
Resolution       All     0.043 (0.018)          0.036 (0.016)       0.045 (0.020)
Resolution       Agree   0.045 (0.020)          0.041 (0.022)       0.045 (0.021)
Accuracy         All     0.699 (0.075)          0.696 (0.090)       0.719 (0.070)
Accuracy         Agree   0.737 (0.088)          0.712 (0.095)       0.758 (0.074)
Confidence       All     0.758 (0.049)          0.749 (0.068)       0.779 (0.066)
Confidence       Agree   0.776 (0.054)          0.759 (0.072)       0.803 (0.075)

Notes: n = 40 per condition. “All” in the Items column refers to all items (those on which the pair members agreed and those on which they disagreed), while “Agree” refers to the items on which the pair members agreed.

Discussion

The results of Experiment 1 supported our first hypothesis that the participants who rated their own knowledge would give lower knowledge ratings than the participants that rated another person’s knowledge. These results provide further support for Kruger’s (1999) theory about the below-average effect. This is elaborated on in the General discussion. Parenthetically, our results for the mean knowledge ratings of the Own Knowledge condition were higher than those reported by Allwood and Granhag (1996). However, it is not clear how this difference may be explained.

Our third hypothesis predicted that the Own Knowledge condition would result in better realism in the confidence judgments than the Other’s Knowledge condition, which in turn was expected to result in better realism than the Control condition, where no knowledge ratings were made. This hypothesis was not supported. The results showed that the realism in the confidence judgments was not influenced by the knowledge ratings. Possible reasons for this are discussed in the General discussion.

Experiment 1 also examined the difference between frequency-judging one’s own and another person’s overall accuracy. The second hypothesis was that the participants would give lower frequency judgments of their own than of the other pair member’s overall accuracy. The hypothesis was supported in that the Own Accuracy condition was associated with lower level frequency judgments (p = 0.06) and more realistic frequency judgments than the Other’s Accuracy condition. In addition, whereas the Own Accuracy condition was associated with fairly realistic frequency judgments, the Other’s Accuracy condition showed overestimation.

EXPERIMENT 2

In Experiment 2, we further investigated the own-other dif-ference in the realism in frequency judgments by using adesign in which all participants made a frequency judgmentof their own and another person’s accuracy. The order ofthese two frequency judgments was counterbalanced. Thepurpose of this experiment was to analyze the robustness of theown-other difference by studying its possible susceptibilityto an anchoring effect of the first frequency judgment on thesecond.

The participants’ first frequency judgment was a test of our second hypothesis that frequency judgments of one’s own accuracy would be lower and more realistic than frequency judgments of the other pair member’s accuracy. The making of a second frequency judgment tested the fourth hypothesis that the level of the second frequency judgment would be anchored in the first, and consequently that the own-other difference would be eliminated within subjects.

Method

Participants. Sixty-eight students (46 women, 22 men) participated in the experiment. Information about the age of two participants was missing. The mean age was 23 years (n = 66), ranging from 20 to 34 (the average age difference within the pairs was 2.6 years, SD = 2.1). The participants were rewarded approximately $7 for their participation.

Design and materials. The experiment had two conditions, referred to as the Own-Other’s and the Other’s-Own condition respectively. Seventy 2AFC GKQs were used in the experiment. The majority of these questions were also used in Experiment 1.

Procedure. The participants were first matched into pairs with the constraints that the members in a pair were of the same gender, and that the age difference between the members of a pair was lower than 10 years. In the context of this experiment, it was not judged essential that the pair members were acquainted with one another. The pairs were randomized to the two conditions. In the Own-Other’s condition (n = 36) there were 24 women and 12 men, and in the Other’s-Own condition (n = 32) there were 22 women and 10 men. The experiment had two phases. In both phases the pair members performed their tasks at the same time, but independently, without communicating with each other and separated by a non-transparent screen.

(1) Own-Other’s condition. In Phase 1 of the Own-Other’s condition, each participant first answered the 70 GKQs individually, by selecting an answer-alternative for each question. Next, on a separate response sheet, each pair member indicated for each question whether s/he regarded the answer given in the previous answer sheet to be correct or incorrect. This part of the task was similar between the two conditions and was intended to enhance the probability that the participants, in both conditions, actually considered the answers provided, rather than only reading what the answers were. Finally, in this phase, each pair member made a frequency judgment of the total number of questions answered correctly in the answer sheet.

In Phase 2, the pair members first exchanged answer sheets. Next, each participant indicated on a new separate response sheet, for each question, whether s/he regarded the answer chosen by the other pair member to be correct or incorrect. Finally, in this phase, each pair member made a frequency judgment of the accuracy obtained by the other pair member in his or her answer sheet.

(2) Other’s-Own condition. The only difference between the Other’s-Own condition and the Own-Other’s condition was that the participants in the Other’s-Own condition exchanged answer sheets directly after having answered the 70 GKQs, and made a frequency judgment of the accuracy of the other’s answers in Phase 1 and of their own in Phase 2.

Results

No significant difference in actual accuracy was found between the Own-Other’s condition (M = 0.630, SD = 0.081) and the Other’s-Own condition (M = 0.634, SD = 0.068). A 2 (Condition) × 2 (Phase) repeated measures ANOVA was carried out in order to compare the level of realism (i.e., frequency judgment minus actual accuracy) between conditions and phases. The results only showed a significant main effect of the Condition factor, F(1, 66) = 26.79, p < 0.001. As can be seen in Table 2, this effect shows that the Own-Other’s condition resulted in markedly more realistic frequency judgments than did the Other’s-Own condition, which was associated with overestimation.

To further explore the difference in realism between the conditions, a separate one-sample t-test (test-value = 0, i.e., perfect realism) was carried out for each of the two phases in each condition. The results showed that none of the frequency judgments in the Own-Other’s condition deviated significantly from perfect realism. In contrast, both frequency judgments in the Other’s-Own condition differed significantly from perfect realism (t-values > 6.8, p-values < 0.001) by showing overestimation.
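As a rough illustration of the realism score and the one-sample t-tests described above, the following sketch recomputes approximate t-values from the rounded means and standard deviations reported in Table 2. The function names are hypothetical and are not taken from the original analysis, and because the inputs are rounded the output only approximates the reported statistics.

```python
import math


def realism(frequency_judgment, actual_accuracy):
    """Realism score: 0 is perfect realism, positive values are overestimation."""
    return frequency_judgment - actual_accuracy


def one_sample_t_from_summary(mean, sd, n, test_value=0.0):
    """Return (t, df) for a one-sample t-test computed from summary statistics.

    Hypothetical helper: it assumes `sd` is the sample standard deviation.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    standard_error = sd / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1


# Realism means, SDs and group sizes as reported (rounded) in Table 2.
table2_realism = {
    ("Own-Other's", "own"): (0.007, 0.116, 36),
    ("Own-Other's", "other"): (0.037, 0.134, 36),
    ("Other's-Own", "own"): (0.153, 0.127, 32),
    ("Other's-Own", "other"): (0.140, 0.098, 32),
}

for (condition, target), (m, sd, n) in table2_realism.items():
    t, df = one_sample_t_from_summary(m, sd, n)
    print(f"{condition:12s} {target:6s} t({df}) = {t:.2f}")
```

With these rounded inputs, the two Other’s-Own judgments give t-values of roughly 6.8 and 8.1, whereas the two Own-Other’s judgments stay below 2, which agrees with the pattern reported in the text.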

Discussion

The results of Experiment 2 confirmed our hypothesis that the first frequency judgment of one’s own accuracy would be lower than that of the other’s. We also found support for the hypothesis that the own-other difference in the second frequency judgment would be eliminated within subjects. In the Own-Other’s condition, the participants’ frequency judgments of both their own and of the other’s accuracy showed good realism, whereas the Other’s-Own condition showed overestimation for both judgments. These results suggest clear signs of anchoring effects, in that, comparing between the two conditions, the own-other difference was in fact reversed in the second frequency judgment.

GENERAL DISCUSSION

The present study investigated differences in judgments of one’s own and others’ knowledge (the own-other difference). These judgments concerned the extent of knowledge in a given domain (knowledge ratings) and judgments of the number of questions correctly answered (frequency judgments). We also investigated the stability of confidence judgments (judgments of the correctness of an answer to a specific knowledge item) and frequency judgments by studying how they were affected by prior judgments. The results will be discussed in this order.

Experiment 1 showed that the participants gave lower level judgments of the extent of their own knowledge as compared to how they rated the extent of the knowledge of the other person. Likewise, the participants, in both Experiment 1 and 2, in their first frequency judgments rated their own number of correctly answered questions as lower than the number of correct answers by the other person. These results supported our first and second hypotheses, and were also consistent with Kruger’s (1999) conception of the below-average effect.

On a general level, the present results indicate that the below-average effect holds not only for direct comparative judgments (studied by Kruger, 1999), but also for indirect comparative judgments (studied in the present research). This can be seen as a strong test of Kruger’s theory, since it has been argued that indirect comparisons may be more conservative than direct ones (Otten & van der Pligt, 1996). Interpreted in terms of Kruger’s (1999) theory about the below-average effect, the participants may have rated their own knowledge as lower than that of their pair member due to the perceived high threshold for successful performance in the task. It remains for future research to corroborate this interpretation of our result.

However, since our results show that our participants on average rated themselves as less knowledgeable than another person, the results are problematic for conclusions from cross-cultural psychology stating that Westerners tend to show optimistic bias in the form of an above-average effect, that is, that people on average think they have better abilities than the average person (e.g., Heine & Lehman, 2004). In future research it may be useful to more thoroughly evaluate the validity of the above-average effect in cross-cultural research.

Our results also relate to the realism of the frequency judgments, that is, the extent to which they coincide with the participant’s actual level of accuracy in their answers. Here, the results showed that the frequency judgments of one’s own answers exhibited good realism, and markedly better realism than the frequency judgments of the other’s answers, which resulted in overestimation. The result showing overestimation of the other’s accuracy is intriguing in that previous research, which has always investigated the frequency judgments of one’s own accuracy, has often reported realistic or underestimated frequency judgments (e.g., Allwood & Granhag, 1996; Gigerenzer et al., 1991; Griffin & Tversky, 1992).

Stability in confidence and frequency judgments

The present study also tested the stability of confidence and frequency judgments about others’ knowledge. The results for the two judgments differed and will be discussed in turn.

Table 2. Experiment 2: Means (and standard deviations) of frequency judgments and realism in frequency judgments for each target (i.e., Own and Other) for each condition

                    Own-Other’s condition             Other’s-Own condition
Target              Frequency j.     Realism          Frequency j.     Realism
Own accuracy        0.637 (0.153)    0.007 (0.116)    0.787 (0.125)    0.153 (0.127)
Other’s accuracy    0.667 (0.121)    0.037 (0.134)    0.774 (0.094)    0.140 (0.098)

Notes: n = 36 in the Own-Other’s condition, and n = 32 in the Other’s-Own condition. Own accuracy = Phase 1 in the Own-Other’s condition and Phase 2 in the Other’s-Own condition. Other’s accuracy = Phase 2 in the Own-Other’s condition and Phase 1 in the Other’s-Own condition. Frequency j. = frequency judgments, i.e., estimated accuracy. Realism = difference between frequency judgment and actual accuracy.

The results showed that rating the extent of one’s own or the other’s knowledge did not influence the realism in one’s confidence judgments of the other’s knowledge assertions. That is, the own-other difference found in the knowledge ratings was not associated with differences in the realism in the following confidence judgments of the other person’s knowledge assertions. Nor did the two knowledge rating conditions differ from the control condition. This means that our third hypothesis was not supported.

It can be noted that Allwood and Granhag (1996; Granhag et al., 1999) found that knowledge ratings improved the realism in confidence judgments when the knowledge ratings concerned one’s own knowledge and the following confidence judgments concerned one’s own knowledge assertions. In contrast to these studies, the confidence judgments in the present study concerned another person’s knowledge assertions. We speculate that compared with the previous research the null result in the present study might reflect that it was the knowledge assertions by another person and not one’s own that were confidence-judged. One possibility is that the assumed “sobering effect” of the knowledge ratings in the present study was offset by the participants’ greater faith in the extent of the other’s knowledge, compared with that of their own. Another possibility is that the participants simply failed to perceive the task as involving judgments of another person’s knowledge.

We next turn our attention to the stability of the frequency judgments. Although we found a clear own-other difference in the realism in the frequency judgments, these judgments showed a relatively low level of robustness, as indicated by the additional results of Experiment 2. The results clearly showed that the second frequency judgment was anchored in the level of the one made first. Moreover, the results showed that first making a frequency judgment of one’s own accuracy, and thereafter of the other pair member’s accuracy, resulted in markedly better realism in both of these judgments than did the inverse order. These results are in line with our fourth hypothesis that the own-other difference in the frequency judgments would be eliminated for the second frequency judgment when a frequency judgment of one’s own answers followed a frequency judgment of the other’s answers (or vice versa). In fact, our results show that the own-other difference was actually reversed in the second frequency judgment. These results also have relevance for the interpretation of cross-cultural findings about the above-average effect, and it would be of interest to test whether other research results in this research area are equally fragile.

In brief, the present study suggests that the own-other difference in knowledge ratings and frequency judgments appears to be contingent on the below-average effect. Our study provides further support for Kruger’s (1999) theory about the below-average effect, and it contributes theoretically by suggesting that judgmental anchoring may be important also in the context of indirect comparisons. In addition, the fact that the own-other difference in the frequency judgments was successfully eliminated (and reversed) within subjects further suggests that below- and above-average effects are fragile, and not as stable as appears to be assumed in cross-cultural psychological research.

This research was supported by a grant from the Swedish Research Council.

NOTES

1 However, no further report will be given on the Pair condition since it falls outside the focus of the present study.

2 Three items out of a total of 5,600 were treated as missing for participants who had omitted making a knowledge rating of those items. Likewise, six items out of a total of 8,400 were treated as missing for participants who had omitted making a confidence judgment of the answer.

REFERENCES

Alicke, M. D., Klotz, M. L., Breitenbecher, D. L., Yurak, T. J. & Vredenburg, D. S. (1995). Personal contact, individuation, and the better-than-average effect. Journal of Personality and Social Psychology, 68, 804–825.

Allwood, C. M. & Granhag, P. A. (1999). Feelings of confidence and the realism of confidence judgments in everyday life. In P. Juslin & H. Montgomery (Eds.), Judgment and decision making: Neo-Brunswikian and process tracing approaches (pp. 123–146). Hillsdale, NJ: Lawrence Erlbaum.

Allwood, C. M. & Granhag, P. A. (1996). Considering the knowledge you have: Effects on realism in confidence judgements. The European Journal of Cognitive Psychology, 8, 235–256.

Allwood, C. M. & Johansson, M. (2004). Actor-observer differences in realism in confidence and frequency judgments. Acta Psychologica, 117, 251–274.

Chambers, J. R. & Windschitl, P. D. (2004). Bias in social comparative judgments: The role of non-motivated factors in above-average and comparative-optimism effects. Psychological Bulletin, 130, 813–838.

Chapman, G. B. & Johnson, E. J. (2002). Incorporating the irrelevant: Anchors in judgments of belief and value. In T. Gilovich, D. Griffin & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 120–138). Cambridge, UK: Cambridge University Press.

Epley, N. & Gilovich, T. (2002). Putting adjustment back into the anchoring and adjustment heuristic. In T. Gilovich, D. Griffin & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 139–149). Cambridge, UK: Cambridge University Press.

Gigerenzer, G., Hoffrage, U. & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506–528.

Granhag, P. A., Strömwall, L. A. & Allwood, C. M. (1999). Confidence judgments processes: Selective but possible to prime. In Metacognition: Process, Function and Use (pp. 46–52), France: Clermont Ferrand.

Griffin, D. & Tversky, A. (1992). The weighing of evidence and the determinants of confidence. Cognitive Psychology, 24, 411–435.

Heine, S. J. & Lehman, D. R. (2004). Move the body, change the self: Acculturative effects on the self-concept. In M. Schaller & C. S. Crandall (Eds.), The psychological foundations of culture. Mahwah, NJ: Lawrence Erlbaum.

Klar, Y., Medding, A. & Sarel, D. (1996). Nonunique invulnerability: Singular versus distributional probabilities and unrealistic optimism in comparative risk judgments. Organizational Behavior and Human Decision Processes, 67, 229–245.

Kruger, J. (1999). Lake Wobegon Be Gone! The “below-average effect” and the egocentric nature of comparative ability judgments. Journal of Personality and Social Psychology, 77, 221–232.

Kruger, J. & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134.

Liberman, V. (2004). Local and global judgments of confidence. Journal of Experimental Psychology: Learning, Memory & Cognition, 30, 729–732.

Lichtenstein, S., Fischhoff, B. & Phillips, L. D. (1982). Calibration of probabilities: The state of the art in 1980. In D. Kahneman, P. Slovic & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 306–334). Cambridge: Cambridge University Press.

Markus, H. R. & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98, 224–253.

Mussweiler, T. (2001). The durability of anchoring effects. European Journal of Social Psychology, 31, 431–442.

Mussweiler, T. & Neuman, R. (2000). Source of mental contamination: Comparing the effects of self-generated versus externally provided primes. Journal of Experimental Social Psychology, 36, 194–206.

Mussweiler, T. & Strack, F. (1999). Hypothesis-consistent testing and semantic priming in the anchoring paradigm: A selective accessibility model. Journal of Experimental Social Psychology, 35, 136–164.

Nickerson, R. S. (1999). How we know – and sometimes misjudge – what others know: Imputing one’s own knowledge to others. Psychological Bulletin, 125, 737–759.

Otten, W. & Van der Pligt, J. (1996). Context effects in the measurement of comparative optimism in probability judgments. Journal of Social and Clinical Psychology, 15, 80–101.

Sniezek, J. A. & Buckley, T. (1991). Confidence depends on level of aggregation. Journal of Behavioral Decision Making, 4, 263–272.

Sniezek, J. A. & Buckley, T. (1993). Becoming more or less uncertain. In J. Castellan (Ed.), Current Issues in Individual and Group Research. Hillsdale, NJ: Lawrence Erlbaum.

Sniezek, J. A., Paese, P. W. & Switzer, F. S. (1990). The effect of choosing on confidence in choice. Organizational Behavior and Human Decision Processes, 46, 264–282.

Svenson, O. (1981). Are we all less risky and more skillful than our fellow drivers? Acta Psychologica, 47, 143–148.

Treadwell, J. R. & Nelson, T. O. (1996). Availability of information and the aggregation of confidence in prior decisions. Organizational Behavior and Human Decision Processes, 68, 13–27.

Van Yperen, N. W. (1992). Self-enhancement among major-league soccer players: The role of importance and ambiguity on social comparison behavior. Journal of Applied Social Psychology, 22, 1186–1198.

Received 16 February 2005, accepted 26 October 2006

APPENDIX

\[
\text{Calibration} = \frac{1}{n} \sum_{t=1}^{T} n_t \, (r_{tm} - c_t)^2 \tag{1}
\]

In (1), $n$ is the total number of questions answered, $T$ is the number of confidence classes used, $c_t$ is the proportion correct for all items in the confidence class $r_t$, $n_t$ is the number of times the confidence class $r_t$ was used and $r_{tm}$ is the mean of the confidence ratings in confidence class $r_t$.

\[
\text{Resolution} = \frac{1}{n} \sum_{t=1}^{T} n_t \, (c_t - c)^2 \tag{2}
\]

In (2), $c$ is the proportion of all items for which the correct alternative was selected. A higher value reflects better resolution than a lower value.
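The calibration and resolution measures defined in this appendix can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the function name, the `class_of` parameter and the data are hypothetical, and by default each distinct confidence value is treated as its own confidence class, which may differ from the class boundaries used in the study.

```python
from collections import defaultdict


def calibration_and_resolution(confidences, correct, class_of=lambda r: r):
    """Compute (calibration, resolution) following equations (1) and (2).

    confidences: confidence ratings as proportions (e.g. 0.5 to 1.0).
    correct:     booleans (or 0/1), one per rating, True if the answer was right.
    class_of:    hypothetical helper mapping a rating to its confidence class.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally long, non-empty inputs")

    n = len(confidences)
    classes = defaultdict(list)
    for r, hit in zip(confidences, correct):
        classes[class_of(r)].append((r, 1 if hit else 0))

    # c: overall proportion of items answered correctly.
    c = sum(1 if hit else 0 for hit in correct) / n

    calibration = 0.0
    resolution = 0.0
    for items in classes.values():
        n_t = len(items)                       # times the class was used
        r_tm = sum(r for r, _ in items) / n_t  # mean confidence in the class
        c_t = sum(h for _, h in items) / n_t   # proportion correct in the class
        calibration += n_t * (r_tm - c_t) ** 2
        resolution += n_t * (c_t - c) ** 2
    return calibration / n, resolution / n


# Hypothetical ratings, not data from the study.
ratings = [0.5, 0.5, 1.0, 1.0]
outcomes = [True, False, True, True]
cal, res = calibration_and_resolution(ratings, outcomes)
print(f"calibration = {cal:.4f}, resolution = {res:.4f}")
```

In this made-up example the mean confidence in each class equals the proportion correct in that class, so the calibration score is zero, while the two classes differ in proportion correct, which gives a non-zero resolution score.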