
Information Projection: Model and Applications.

Abstract

Evidence from both psychology and economics shows that people underestimate informational

differences. I model such information projection by assuming that after processing a signal, a person

overestimates the probability with which this signal is available to others. I apply the model to

agency and communication settings. When learning about an expert’s skill using ex-post information,

a biased evaluator exaggerates how much a skilled expert could have known ex ante, and hence

underestimates the expert on average. To minimize such underestimation, ex-ante experts will be too

reluctant to produce useful information that will be seen more clearly by the evaluator ex post, and

too eager to gather information that the evaluator will independently learn ex post. I also show that

information projection introduces noise into evaluations, decreasing an expert’s incentive to exert

effort and lowering the optimal use of monitoring relative to the Bayesian case. Evidence from, and

applications to, medical malpractice, liability regulation, and effective communication are discussed.

Keywords: Hindsight Bias, Curse of Knowledge, Internal Labor Markets, Medical Malpractice,

Communication, Retrospectroscope.


1 Introduction

The study of how asymmetric information affects economic activity typically builds on the assumption that people perceive informational differences correctly. Evidence shows, however, that people systematically mispredict informational differences and exaggerate the similarity between the information they have and the information available to others. For example, having learned novel information about a patient, one may well exaggerate the extent to which an attentive physician should have diagnosed cancer earlier.

In this paper, I model such information projection by assuming that having processed a signal, a biased person exaggerates the probability with which this signal is also available to others. I show that, as a result, a biased examiner will be too pessimistic about the skill of the agents she evaluates. In turn, agents will have incentives to change the type of information they produce to mitigate the adverse effects of hindsight bias on their reputation. I also investigate how this bias affects the optimal use of incentives, communication, and social inference.

In the context of financial markets, Camerer, Loewenstein, and Weber (1989) provide laboratory evidence that better informed traders overestimate how much uninformed traders know, and that such a curse of knowledge affects market outcomes. In Section 2, I review both controlled laboratory evidence and more stylized field evidence to support my claim that information projection is a widespread phenomenon. In Section 3, I develop a formal model of information projection building on CLW (1989).

In Sections 4 and 5, I turn to the main application of the paper: the influence of information projection

on performance evaluation. To illustrate the results, consider a radiologist who diagnoses a patient based

on an ambiguous X-ray. After the diagnosis is made, the patient returns with novel symptoms and an

evaluator is asked to assess the radiologist’s original diagnosis. A biased evaluator projects ex-post

information, and acts as if all radiologists should have guessed the symptoms earlier. This leads to two

types of inferential mistakes: underestimation, and over- and under-inference.

Assume that radiologists differ in skill: skilled ones understand the X-ray and unskilled ones do not. If more ex-ante information increases the chances of an ex-post successful treatment, a biased evaluator exaggerates the success rate for both types of radiologists. In hindsight, a successful treatment becomes the norm and a failed one becomes a surprise. If the probability of failure decreases with skill, the evaluator thus underestimates the radiologist's skill on average. The 'surprisingly' high failure-to-success ratio is perceived to be the result of a lack of skill, rather than of a lack of sufficient ex-ante information.

While the evaluator underestimates the agent on average, information projection will typically affect her conditional beliefs as well. Whenever knowing the symptoms ex ante would have increased the chances of a successful diagnosis more for a skilled type than for an unskilled type, the evaluator over-infers skill from performance. For example, if the symptoms alone are uninformative, but combined with the X-ray they are perfectly indicative of cancer, a biased evaluator perceives differences in luck to be differences in skill. If, however, knowing the symptoms alone is almost perfectly informative, and hence the probability of a failed treatment depends very little on understanding the X-ray, the evaluator perceives differences in performance to be due to differences in luck. Here, she under-infers skill from performance.

Given these results, a natural question to ask is how the radiologist might change his behavior to minimize the adverse effects of information projection on his reputation. Evidence suggests that experts are often aware that those evaluating them are biased. For example, it is argued that "defensive medicine", medical procedures designed to minimize false liability rather than maximize cost-effective health care, is due partly to the fear of experts that those evaluating them will suffer from hindsight bias.

To study such behavior, assume that the radiologist can decide what radiographs to order ex ante. I show that if a radiograph is a substitute for the ex-post information, i.e., it provides information ex ante that the evaluator will independently learn ex post, then the radiologist has an incentive to over-produce this radiograph. Overly costly MRIs might be ordered for all patients if such MRIs produce the same information that the evaluator inevitably learns ex post. At the same time, the radiologist is too reluctant to produce complement information, i.e., radiographs that help him make a good diagnosis but can be interpreted better in light of ex-post information. He might avoid ordering a mammography that helps detect breast cancer if he fears it can be interpreted much better in hindsight than in foresight. Thus, as a result of information projection, increasing the likelihood of ex-post evaluations could increase production costs, lower productivity, and exacerbate the types of over- and underproduction of information that observers have attributed to medical malpractice regulation.[1]

[1] See Studdert et al. (2005), Kessler and McClellan (2002), and Jackson and Righi (2006).

If the management of a hospital is aware of evaluators' propensity for hindsight bias, it can correct this mistake to some extent. In important situations, however, even the perfect anticipation of the evaluator's bias cannot eliminate inefficiencies. To show this, in Section 5, I turn to a context where the amount of information that the radiologist learns from an X-ray is a function of effort rather than of skill.

To motivate the radiologist, a hospital may provide incentives to encourage a careful reading of the

X-ray. When the radiologist’s effort is not observed however, he might be rewarded and punished based

solely on whether the patient’s condition improved or deteriorated. In cases of limited liability or risk-

averse radiologists, no such reward scheme can be first-best optimal. A second-best scheme may instead

involve monitoring whether the radiologist’s diagnosis accorded with the information that was available

to him ex-ante.

A biased evaluator, however, is prone to judge the correctness of the diagnosis not on the basis of the

ex-ante available information, but on the basis of both the ex-ante and the ex-post information. Thus the

radiologist is punished too often for bad luck and rewarded too rarely for good decisions. As a result, the

radiologist’s incentive to carefully understand the X-ray is lower than under a Bayesian evaluator. More

generally, an agent who is de jure facing a negligence rule is de facto punished and rewarded according

to strict liability if he is assessed by a biased judge or jury. I show that the report of a biased evaluator

contains too much noise and hence even if the hospital anticipates the bias, it has a reason to monitor less

often than in the rational case. I also show that if the hospital does rely on biased reports, it nevertheless

decides to induce lower levels of effort to save on incentives that are appropriate in the rational case, but

too strong in the biased case.

In Section 6, I turn to the influence of information projection on communication. I show that a listener who projects his private information will be too credulous of a speaker's advice because he overestimates how much the speaker knows. I also show that a speaker who projects her non-communicable background knowledge will mistakenly send messages that are too ambiguous for her audience to interpret. I identify conditions for over- and under-communication.

Finally, in Section 7, I conclude with a brief discussion of some further implications and extensions of my model. I discuss how information projection might affect social inferences in networks, causing hostility between groups, as well as the possibility of extending my model to capture the related phenomenon of ignorance projection, where a person who does not observe a signal underestimates the probability that this signal is available to others.


2 Evidence and Related Literature

Folk wisdom has long recognized the existence of what I call information projection, as noted by the common refrain, "hindsight is 20/20". I begin this section by discussing both laboratory and more stylized evidence on two closely related phenomena: hindsight bias (the phenomenon that people form biased judgements in hindsight relative to foresight) and the curse of knowledge (the phenomenon that informed people overestimate the information of those uninformed).[2] I then turn to a brief summary of some evidence on related biases lending support to the existence of the projection of various forms of private information. Although individual studies are often subject to alternative interpretations, the sum total of the studies provides a compelling case for the widespread existence of this phenomenon.

[2] Hindsight bias studies involve both between-subject and within-subject designs. In the latter, participants have to recall their own prior estimates after being presented with new evidence. Since my focus in this paper is on interpersonal information projection, I concentrate on the between-subject designs.

The presence of information projection in experimental financial markets was demonstrated by Camerer, Loewenstein, and Weber (1989). A group of Wharton and Chicago MBA students traded assets of eight different companies in a double-oral auction. Traders were divided into two groups. In the first group, traders were presented with the past earnings history of the companies (not including 1980) and traded assets that yielded returns in proportion to the actual 1980 earnings of these companies. In the second group, traders received the same information, and in addition they also learned the actual 1980 earnings of the companies. By design, returns for traders in the second group depended on the market price established by those in the first group. Therefore, to maximize earnings, better-informed traders had to guess as correctly as possible the market price at which less-informed traders traded these assets. If traders in the second group project their information, then their guesses, and hence the prices at which they trade, will differ significantly from the market price established by the first group. CLW (1989) find that the guesses of better-informed traders were biased by 60% towards the actual 1980 earnings and that market prices were biased by 30%.[3] The reason why the bias in the market was lower than in judgements is that traders with a smaller bias traded more aggressively. Less biased traders behaved as if they had anticipated that others would project information.

[3] CLW do not report the numbers explicitly, only graphically, so these are approximate. See CLW (1989), p. 1241.

Further evidence comes from the experimental study of Loewenstein, Moore, and Weber (2006), who build on CLW (1989). They study the curse of knowledge using a set of visual recognition tasks.

In these tasks, subjects are presented with two pictures that differ in one crucial detail. LMW (2006)

divide subjects into three groups: uninformed, informed, and choice. In the uninformed condition, no

additional information is available besides the two pictures. In the informed condition, the difference

between the pictures is highlighted for the subjects. In the choice condition, subjects could decide

whether to obtain additional information for a small fee, or remain uninformed. After looking at the

pictures, the subjects in each group are asked to guess what fraction of people in the uninformed group

could tell the difference between the two pictures. Subjects are compensated based on how well they

predicted this fraction.

As Figure 1 indicates, the informed subjects' mean estimate was significantly higher than the uninformed subjects' mean estimate. Importantly, a significant portion of the people in the choice condition paid for additional information. In this group, the mean estimate was 55.4%, while the mean estimate of subjects who chose to remain uninformed was 34.6%. Hence people not only projected their additional information, but also paid for information that biased their judgements in a way that lowered their earnings.

The work of Fischhoff (1975) initiated research on hindsight bias. He showed that reporting an outcome of an uncertain historical event increases the perceived ex-ante likelihood of the reported outcome occurring. Fischhoff's findings were replicated by a plethora of studies, and most of these find a strong presence of such hindsight bias, often larger than the one found in this initial study. These studies and the meta-analyses building on them also show that the presence of hindsight bias is robust to a great number of debiasing techniques. A robust comparative static result is that the more informative the outcome, the greater is the bias (Harley et al. 2004). As I demonstrate in Section 3, my model of information projection exhibits the same monotonicity.


Less controlled evidence comes from more explicit field studies. In the context of liability judgements, there is a wealth of evidence that juries and experienced judges fail to ignore superior information and instead form judgements as if the defendant had information that was unavailable at the time he acted. Experiments have demonstrated the existence of information projection in the evaluation of the ex-ante judgements of various experts. Anderson et al. (1997) documented the existence of the bias in judges deciding on cases of auditors' liability where auditors failed to predict the financial problems of their audit clients. Caplan, Posner, and Cheney (1991) conducted a study with 112 practicing anesthesiologists. Here physicians saw identical case histories but were either told that the case ended in minor damages or that it ended in severe damages. Those who were told that a severe damage occurred were more likely to judge the ex-ante care to be negligent. In certain cases, the difference in the frequency of ruling negligence was as great as 51%. Bukszar and Terry (1988) demonstrate hindsight bias in the solution of business case studies, and Hastie, Schkade, and Payne (1999) document very serious biases in jurors' judgements of punitive liability. Strong effects were found, among others, in experiments on the assessment of railroad accidents, the legality of searches, and the evaluation of military officers. For survey articles on the evidence, see, e.g., Harley (2007).[4]

[4] The legal profession has long recognized this bias and developed certain procedures to mitigate its effects. One such procedure is the bifurcation of trials, where ex-post evidence is suppressed at the initial phases of the trial. For more on this, see Rachlinski (1998).

A large set of other psychological findings further indicate that people project various types of private information. For example, a study by Gilovich, Medvec, and Savitsky (1998) shows that people greatly overestimate the probability that their lies, once made, are detected by others.[5] Such overestimation was also documented in the context of communication. In a set of experiments, Kruger et al. (2005) found that when people communicate through email, they overestimate how well their intent is transmitted through their messages.[6] Here, senders had to make serious and sarcastic statements either through email or voice recording, and then guess the probability that receivers would be able to understand their intent. As Figure 2 shows, the mean estimate for both those sending an email and those sending a voice recording was 78%, while the actual probabilities were 73% in the voice condition and 58% in the email condition. Kruger et al. (2005) also conduct an experiment where they ask subjects in the email condition to vocalize their messages before sending them. Senders are again randomly divided into two groups; some are asked to vocalize the message in the same tone as the tone of their email, and others are asked to vocalize it in the opposite tone. Senders in both groups overestimate how easy it would be to understand their messages, yet such overestimation decreases significantly when senders vocalize in the opposite tone. While some of these results may be due to general overconfidence about one's ability to communicate, the evidence is more consistent with the interpretation of information projection.

[5] The illusion of transparency was also studied in the context of negotiations (Van Boven, Gilovich, and Medvec 2003). Here the results are harder to interpret.
[6] See also Newton (1990) on tappers and listeners.

My paper builds closely on the experimental results of Camerer, Loewenstein, and Weber (1989). CLW offer a preliminary model of this bias by assuming that a better informed trader's estimate of a less informed trader's expectation of the value of an asset is a linear combination of the better informed trader's own estimate of this value and the less informed trader's actual expectation of it. Biais and Weber (2007) build on this formalization of CLW and assume that after observing a realization of a random variable, a person misperceives the mean of her prior on this variable to be the mean of her own posterior. Biais and Weber then study whether this formulation of within-person hindsight bias can explain trading behavior consistent with underreaction to news. They also test their hypothesis using psychometric and investment data from a sample of investment bankers in Frankfurt and London.

In the context of predicting future changes in one's taste, the phenomenon of projection has also been studied by Loewenstein, O'Donoghue, and Rabin (2003) and Conlin, O'Donoghue, and Vogelsang (2007). In contrast to the projection of taste, the projection of information is most relevant in the interpersonal domain, where people think about what others might or might not know, and hence it is primarily a social bias.

Several other papers, with no explicitly developed model, have argued that information projection, under the rubric of hindsight bias or the curse of knowledge, might have important economic consequences. Among others, Viscusi and Zeckhauser (2005), Camerer and Malmendier (2006), Heath and Heath (2007), and Rachlinski (1998) argue that information projection might be an important factor in economic settings, affecting both judgements and the functioning of organizations. The model also belongs to the small but growing literature on quasi-Bayesian models of individual biases (e.g., Rabin 2002; Mullainathan 2002) and the literature on social biases (e.g., DeMarzo, Vayanos, and Zwiebel 2003).

The evidence I summarized in this section is indicative of the fact that people project various forms

of information. Although this evidence comes from a diverse set of experimental paradigms that use

different methods of identification and classify information projection under a variety of rubrics, the

model that I present in the next section provides a framework to study this phenomenon in a more

unified manner. It also provides a setup to make more precise predictions about the implications of

information projection in organizations and labor markets and to test such predictions.

3 Model

Consider an environment where people observe signals about the underlying physical state $\omega \in \Omega$, where $\Omega$ is bounded. An example of $\omega$ could be the fundamental value of a company's stock, the medical condition of a patient, or the geophysical conditions of a place where an engineer is commissioned to build a bridge. Let there be $M$ people and $N$ different signals $\{s_j\}_{j=1}^{N}$. A signal is a function $s_j(\omega): \Omega \rightarrow \Delta(Z)$ from the set of states to the set of lotteries over a realization space $Z$. These signals provide information about the state through the correlation between the observed outcome from $Z$ and the state $\omega \in \Omega$. Information is interpreted given a common prior $\pi(\omega)$ over $\Omega$, where this prior also determines people's shared view about $\omega$ absent any signals.

Let $p^j_m$ denote the probability with which signal $s_j$ is available to person $m$. If $p^j_m = 0$, then $s_j$ is surely not available to her, and if $p^j_m = 1$, it surely is. The collection of these parameters for all $m$ and all $j$ is given by $p = \{\{p^j_m\}_{j=1}^{N}\}_{m=1}^{M}$. The elements of this vector describe the correct Bayesian estimates of the distribution of information. The informational environment is then summarized by the tuple $\{\Omega, \{s_j\}_{j=1}^{N}, \pi, p\}$.

In what follows, I distinguish between the availability and the processing of a signal. Availability

refers to the fact that this signal is ’present’, while processing refers to the fact that its information

content is actually understood. As an illustration, note that only someone who has training in medicine

knows what to infer from the radiograph. In cases where this distinction applies, I assume that $p_m$ concerns the availability of a signal.

We can now define information projection in the following way: a person who projects information exaggerates the probability that the signals she processed are available to others. To measure the extent of this mistake, I introduce a parameter $\alpha \in [0, 1]$ which denotes the degree of information projection.

Definition 1. Person $i$ exhibits interpersonal information projection of universal degree $\alpha$ if, after processing signal $s_j$, her perception of $p^j_k$ is given by $p^{j,\alpha}_k \equiv p^j_k(1-\alpha) + \alpha$ for all $k \in M$, $k \neq i$.

Information projection by person $i$ corresponds to the overestimation of the probability that a signal person $i$ processed is available to others. Such overestimation is increasing in $\alpha_i$. If $\alpha_i = 0$, then the person has correct Bayesian perceptions and does not exaggerate the availability of signals. If $\alpha_i = 1$, then she exhibits full information projection and her perception of the probability that the signals she processed are available to others equals 1. In cases where $0 < \alpha_i < 1$, the person exhibits partial information projection.[7] The above definition captures the key feature of the evidence: people underestimate informational differences to an extent not warranted by Bayesian reasoning.

[7] In certain contexts, for greater psychological realism or for issues of measurability, the following transformation of the true probabilities into perceived probabilities might be more appropriate: $p^{j,\alpha}_k = p^j_k/[(1-\alpha) + \alpha p^j_k]$. This functional form preserves the same properties as the previous one for all $p^j_k > 0$, but assumes that if $p^j_k = 0$, then $p^{j,\alpha}_k = 0$ for all $\alpha$.

Intuition suggests that certain pieces of information are projected more than others, and that the extent to which a particular piece of information is projected depends on a number of factors. In the above definition, I allow for heterogeneity in projection by allowing $\alpha$ to vary across signals and across individuals. If $\alpha^j_i$ denotes the degree to which person $i$ projects signal $s_j$, then such heterogeneity exists whenever $\alpha^j_i \neq \alpha^l_i$ for some $j, l \in N$ or $\alpha^j_i \neq \alpha^j_k$ for $i, k \in M$. Here, I do not attempt to pin down the factors determining the value of $\alpha$. My claim, though, is that the evidence suggests that in a number of economically important domains $\alpha > 0$. More research is needed to get a better understanding of why certain signals are projected more than others.


While full information projection is not, partial information projection is sensitive to the re-description of signals. For example, if two signals are collapsed into one, then partial projection of the combined signal induces a different probability distribution on the information of player $m'$ than if the two signals were projected individually. In most relevant applications, though, there is a natural way to break down information into distinct signals or groups of signals. For example, in the case of hindsight bias in performance evaluation, where information projection happens over time, the timing of information already suggests a way to break down information into distinct signals. Importantly, however, almost all results in this paper are qualitative and do not depend on the use of partial projection. I indicate it in the text when a result holds under partial information projection but not under full information projection.
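A two-signal illustration of this sensitivity, assuming for concreteness that the other player holds each of two independent signals with probability 0.5 (the numbers are illustrative only): projecting the two signals separately and projecting the collapsed joint signal yield different perceived probabilities for $\alpha < 1$, but coincide under full projection.

def project(p, alpha):
    # Partial projection of a single signal (Definition 1).
    return p * (1 - alpha) + alpha

p, alpha = 0.5, 0.5

# Perceived probability that the other player has BOTH signals:
separately = project(p, alpha) ** 2      # project each signal, then combine
collapsed = project(p * p, alpha)        # collapse into one signal, then project

print(separately, collapsed)                          # 0.5625 vs 0.625
print(project(p, 1.0) ** 2, project(p * p, 1.0))      # both 1.0 under full projection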

There is another sense in which the exact separation of signals matters in my setup. This concerns

the distinction between availability and processing. If one signal requires skill to be processed and the

other does not, then my model has different implications when these two signals are collapsed into one,

or considered to be separate. Here, I always assume that the degree to which a signal requires skill to be

processed is fixed.

As mentioned in Section 2, evidence suggests that in important contexts, people anticipate that others are biased. Since I build on this fact in the applications, I define such anticipation formally. Let the probability density function $f_{i,k}(\alpha)$ describe the beliefs of person $i$ concerning the extent to which person $k \neq i$ projects her information. If $f_{i,k}$ is not concentrated on 0, person $i$ believes that there is a non-zero probability that person $k$ is biased. Two types of anticipation are of special interest. First, if person $i$ believes that person $k$ is not biased, then the cdf generated by $f_{i,k}$ is such that $F_{i,k}(0) = 1$. Second, if person $i$ has a perfect guess of person $k$'s bias $\alpha_k$, then $F_{i,k}(\alpha_k) = 1$ and $F_{i,k}(\alpha) = 0$ for all $\alpha < \alpha_k$.

3.1 A Dinner

Many of the paper's results follow from two ways information projection biases inference. To demonstrate these, consider a dinner invitation from Mrs. Robinson (the host) to Mr. Keynes (the guest). At the dinner, Robinson offers either fish or meat to Keynes; let her choice be denoted by $y \in \{M, F\}$. Assume that Keynes either prefers fish, $\omega_F$, or meat, $\omega_M$, and that Robinson observes a noisy signal $s_r$ about his taste, where $\Pr(s_r = \omega \mid \omega) = h \geq 0.5$. Keynes also observes a private signal $s_k$ about his preference for the evening. Keynes is better informed about his own taste, and thus $s_k$ is such that $\Pr(s_k = \omega \mid \omega) = z > h$.

The point of this example is to see how Keynes' views about Robinson change after the dinner. Consider a case where Robinson has four possible types $\theta$. She is either kind, and follows her signal, or she is mean, and follows only her own taste. In addition, she either prefers meat or fish.

Assume for a moment that Keynes knows his taste, and set $z = 1$. Assume also that Robinson observes only a noisy signal with $h = 2/3$. Let the prior belief of Keynes be $\pi_0(\theta)$, and assume that he initially believes that each type is equally likely. The following table summarizes a Bayesian versus a fully biased guest's beliefs about the kindness of the host after being served the meal he likes and after being served the meal he dislikes:

Posterior | Bayesian, $\alpha = 0$ | Biased, $\alpha = 1$
$\pi_1(\text{kind} \mid y = s_k)$ | $(2/3 + 2/3)/(2/3 + 2/3 + 1) = 4/7$ | $(1 + 1)/(1 + 1 + 1) = 2/3$
$\pi_1(\text{kind} \mid y \neq s_k)$ | $(1/3 + 1/3)/(1/3 + 1/3 + 1) = 2/5$ | $0/1 = 0$
$E[\pi_1(\text{kind})]$ | $(7/12)(4/7) + (5/12)(2/5) = 1/2$ | $(7/12)(2/3) = 7/18$

Note that a biased guest overestimates kindness if he is served the meal he likes, and underestimates it if he is served the meal he dislikes. In the former case, he believes that a kind host serves the right meal with probability one. In the latter case, he believes that a kind host serves the wrong meal with probability zero. Hence in both situations a biased Keynes reads too much into Robinson's choice.
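The entries of the table can be reproduced by enumerating the four host types. The Python sketch below is only a numerical check of the example, with $h = 2/3$, $z = 1$, and the uniform prior taken from the text; the expectation in the last step is computed under the true distribution of outcomes.

from fractions import Fraction

h = Fraction(2, 3)        # precision of the host's signal
prior = Fraction(1, 4)    # uniform prior over the four host types

# Probability that the guest is served his preferred meal, by host type,
# under the true model and under full projection (the guest acts as if
# the host had observed his own, perfectly informative, signal).
types = {
    "kind, same taste": {"true": h, "biased": Fraction(1)},
    "kind, diff taste": {"true": h, "biased": Fraction(1)},
    "mean, same taste": {"true": Fraction(1), "biased": Fraction(1)},
    "mean, diff taste": {"true": Fraction(0), "biased": Fraction(0)},
}

def posterior_kind(model, liked):
    # Posterior probability that the host is kind, given the meal outcome.
    num = den = Fraction(0)
    for name, probs in types.items():
        p = probs[model] if liked else 1 - probs[model]
        den += prior * p
        if name.startswith("kind"):
            num += prior * p
    return num / den

p_like_true = sum(prior * probs["true"] for probs in types.values())  # 7/12
for model in ("true", "biased"):
    like, dislike = posterior_kind(model, True), posterior_kind(model, False)
    expected = p_like_true * like + (1 - p_like_true) * dislike
    print(model, like, dislike, expected)
# Prints: true 4/7 2/5 1/2  and  biased 2/3 0 7/18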

More generally, whenever the guest's signal about his taste is more precise than the host's, a biased guest overestimates how well different types separate. Let $\pi^\alpha_1(\theta)$ be the posterior of an $\alpha$-biased guest. The following proposition shows that the guest overinfers from the host's choice.

Proposition 1. For all $\pi_0$, $\pi^\alpha_1(\theta_{\text{kind}} \mid y = s_k)$ is increasing and $\pi^\alpha_1(\theta_{\text{kind}} \mid y \neq s_k)$ is decreasing in $\alpha$.

The above proposition holds independently of whether Keynes actually observes the realization of Robinson's signal $s_r$, or just knows that Robinson observed $s_r$. In both cases, he exaggerates how much she knew, and in expected terms infers too much from her choice.

As the third row of the above table shows, these two over-attributions do not cancel out; rather, on average Keynes comes to believe that Robinson is mean. Note that relative to his prior, a biased guest overestimates the probability that he will be served the meal he prefers: a biased guest's estimate is 3/4, while the true probability is 7/12. This means that Keynes will be disappointed in Robinson on average. More generally, a biased guest who knows more about his own taste than the host does overestimates the probability with which the host can serve his preferred meal if she wants to. This implies that on average he underestimates the probability that the host cares about his taste. As the following proposition shows, such underestimation holds for all $z > h$.

Proposition 2. For all $\pi_0$, $E[\pi^\alpha_1(\theta_{\text{kind}})]$ is decreasing in $\alpha$, where expectations are taken with respect to the true distribution of signals.
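For intermediate degrees of projection the same example can be computed directly: under Definition 1, an $\alpha$-biased Keynes thinks Robinson has his signal with probability $\alpha$, so he perceives a kind host as serving his preferred meal with probability $\alpha + (1-\alpha)h$. The sketch below sweeps $\alpha$ and confirms numerically, for $h = 2/3$ and $z = 1$, that the expected posterior on kindness (under the true signal distribution) falls monotonically from 1/2 to 7/18; it is an illustrative check rather than a proof of Proposition 2.

from fractions import Fraction

h = Fraction(2, 3)
prior_kind = Fraction(1, 2)    # kind vs. mean, each with prior 1/2
half = Fraction(1, 2)

def expected_posterior_kind(alpha):
    q = alpha + (1 - alpha) * h            # perceived Pr(preferred meal | kind)
    # Mean hosts serve the preferred meal with probability 1/2 overall.
    post_like = (prior_kind * q) / (prior_kind * q + (1 - prior_kind) * half)
    post_dislike = (prior_kind * (1 - q)) / (prior_kind * (1 - q) + (1 - prior_kind) * half)
    p_like_true = prior_kind * h + (1 - prior_kind) * half
    return p_like_true * post_like + (1 - p_like_true) * post_dislike

for k in range(5):
    alpha = Fraction(k, 4)
    print(alpha, expected_posterior_kind(alpha))
# alpha = 0 reproduces the prior 1/2; alpha = 1 gives 7/18.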

To further illustrate this point, consider a case where the guest and the host receive i.i.d. noisy signals about the state. Assume that the guest mistakenly believes that the host observed the exact same signal realization as he did. As long as the two signals are i.i.d., the expected posterior of a biased observer and that of a Bayesian one are the same. Underestimation happens only if the biased guest has more information than the host. Even in this case, however, over-inference has an interesting implication. Assume that the host happens to have the same taste as the guest. Here a fully biased guest on average infers that the host is kind. In contrast, if the host happens to have a different taste, then a fully biased guest on average infers that the host is hostile. Thus a biased guest misattributes differences in taste to differences in intentions.

4 Skill Assessment

Let’s now turn to the main application of the paper and consider the impact of information projection

on performance evaluation. This application is motivated both by the key role performance evaluation

plays in labor markets, organizations, medicine, and law and by the evidence which indicates that this

bias is common in such contexts.[8] In this section, I focus on a problem of skill assessment where a

supervisor wants to learn about the competence of her agent. In the next section, I focus on the problem

of incentives and monitoring where a principal wants to motivate the agent to exert effort.

Consider an agent who is hired by a principal to process and act upon information available to him. Since agents differ in their ability to understand information, a supervisor is asked to review the agent's performance on a task and assess the agent's competence. Such assessment then forms the basis of compensation, firing, or job-allocation decisions. Assume, as is typically the case, that when evaluating his performance the supervisor has access to some ex-post information that was not available to the agent.[9]

[8] On the former, see, e.g., Alchian and Demsetz (1972) or Lazear (2000).
[9] Berlin (2000) argues that in the case of lung cancer, the generally accepted error rate for radiological detection is between 20% and 50%. For radiographs that were previously evaluated as normal but where the patient later developed lung carcinoma, however, the carcinoma is seen in retrospect in as many as 90% of the cases.

Consider for example a social worker who is assigned a case of foster care. After the injury of the child, the state commissions a supervisor to investigate whether the social worker was effective at preventing this outcome. All the home visits and the phone calls of the social worker are reviewed to establish whether the social worker acted appropriately given his information. In doing so, a biased supervisor projects information that becomes available only through learning that the child was injured. A similar evaluation might happen when a CEO is assessed by a Board that knows the market conditions that were uncertain at the time when the CEO had to decide how to allocate funds among various projects.

Agent | Supervisor | Agent's information | Supervisor's additional information
Radiologist | Medical Examiner | Patient's X-ray | Subsequent case history
Social Worker | Government Official | Child's family history | Child's reaction to treatment
CEO | Board | Firm's investment projects | Market conditions

I first show that a biased supervisor underestimates the skill of the agent on average. Since both higher skill and more information lead to a higher probability of success in my setup, exaggerating how much information the social worker had leads to underestimation. The second result identifies conditions under which the supervisor will infer too little or too much from performance. I conclude this section by showing how increasing the frequency of monitoring changes the behavior of an agent who anticipates the bias. I derive predictions on the types of information that will be over-produced and the types that will be under-produced.

4.1 Setup

The radiologist's (agent's) task is to offer a treatment recommendation $y$ which maximizes the probability of a successful outcome. Before taking $y$, he receives a set of signals $s_0$ about the patient's condition $\omega$, where $s_0$ consists both of signals that all radiologists understand and of signals that require skill to be processed. The probability that a radiologist understands skill-intensive signals depends on his type $\theta \in [0, 1]$. A radiologist of type $\theta$ understands such signals a fraction $\theta$ of the time; the most skilled radiologist ($\theta = 1$) always understands the X-ray, while the incompetent one ($\theta = 0$) never does. If he does not understand these signals, he infers nothing from them.[10]

[10] In some specifications, a more natural interpretation of $\theta$ would refer to the fraction of signals the agent understands/receives or to his ability to distinguish between important and unimportant signals. Importantly, the results in this section require only that the probability of success is increasing in type for any given set of signals.

along with a set of novel signals s1 about !. In most cases, observing success or failure alone has

information about the state !, and it is key to our analysis that the supervisor does learn something novel

about the patient’s medical condition that was not available to the radiologist ex-ante.

Assume that understanding more signals in s0 increases the probability that the radiologist’s ex-ante

optimal choice leads to a success ex-post. Furthermore, assume that if he could use the signals in s1

in addition, this probability would even be higher for all types. As long as the principal cares about

success, she prefers to employ a high type radiologist over low type. Assume finally that neither the

supervisor nor the agent observes $ but they share a common prior %0 with full support over [0, 1].11 The

uncertainty about the radiologist skill motivates the skill assessment of the supervisor because what the

supervisor learns about $ can then form the basis of employment, compensation and allocation decisions

by the principal.

4.2 Example

To illustrate the formal setup, consider first a specific information and task structure. Let $\Omega = \{1, 2, 3, 4\}$ and let an ex-ante signal $s_0$ provide noisy information on whether the state is an even or an odd number. Formally, let $\Pr(s_0 = z \mid z) = h$ where $z \in \{\text{even}, \text{odd}\}$ and $h \in (0.5, 1)$. Let a second ex-ante signal $s'_0$ give precise information on whether the state is low ($\omega \leq 2$) or high ($\omega > 2$). Assume that $s'_0$ requires skill to be processed but $s_0$ does not. Assume that a success occurs if $y = \omega$ and a failure occurs otherwise. Since $s'_0$ is processed with probability $\theta$ by an agent of type $\theta$, the true probability of success for a type $\theta$ agent is:

$\Pr(S \mid \theta, h) = \frac{h}{2}(1 + \theta)$   (1)

Assume that the supervisor observes an ex-post signal $s_1$, which tells her precisely whether the state is an even or an odd number. If the supervisor projects $s_1$, then her perceived probability of a success given a projection of degree $\alpha$ is:

$\Pr(S \mid \theta, h)_\alpha = \frac{1}{2}\big(\alpha + (1-\alpha)h\big)(1 + \theta)$   (2)

where the subscript $\alpha$ refers to the degree of the bias. This equation shows that the supervisor's expectation of the probability with which a type $\theta$ agent should succeed is increasing in $\alpha$. Assume that the supervisor observes success or failure, but not $y$. Let $\pi^\alpha_1(S)$ and $\pi^\alpha_1(F)$ denote an $\alpha$-biased supervisor's updated beliefs after observing success and failure, respectively. The following claim shows that biased inference after a success is the same as the unbiased inference, but a biased supervisor is more pessimistic after a failure than a Bayesian one.

Claim 1. For all $\pi_0$, $\pi^\alpha_1(S)$ does not change in $\alpha$, and $\pi^\alpha_1(F)$ is decreasing in $\alpha$ in the sense of first-order stochastic dominance (FOSD).

If recruitment and firing decisions are based on the supervisor’s assessment, then this claim implies

that the agent will be fired too often after a failure but not after a success.
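Claim 1 can be checked numerically with the success probabilities (1) and (2). The sketch below uses a discrete grid of types with a uniform prior; the grid and the value of $h$ are my illustrative choices, not the paper's.

# Posterior means over theta after success (S) and failure (F),
# using Pr(S | theta) = (h/2)(1 + theta) and its biased counterpart
# Pr_alpha(S | theta) = 0.5 * (alpha + (1 - alpha) * h) * (1 + theta).
thetas = [i / 10 for i in range(11)]
prior = [1 / len(thetas)] * len(thetas)
h = 0.7

def posterior_mean(alpha, outcome):
    c = 0.5 * (alpha + (1 - alpha) * h)
    like = [c * (1 + t) if outcome == "S" else 1 - c * (1 + t) for t in thetas]
    weights = [p * l for p, l in zip(prior, like)]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)

for alpha in (0.0, 0.5, 1.0):
    print(alpha,
          round(posterior_mean(alpha, "S"), 3),   # unchanged in alpha
          round(posterior_mean(alpha, "F"), 3))   # decreasing in alpha

The posterior after a success is unaffected because the $\alpha$-dependent factor cancels in Bayes' rule, while the posterior after a failure shifts toward low types as $\alpha$ grows.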

4.3 Underestimation

Although the above example was illustrative of the setup, it did not explicate how information projection

changes the supervisor’s assessment more generally. To identify this mechanism, let me now turn to the

more general case. The first result is also the main result of this section. It claims that if the supervisor

projects productive information, she underestimates the radiologist’s skill level on average.

Proposition 3. Taking expectations based on the true distribution of signals, $E[\pi^\alpha_1]$ FOSD $E[\pi^{\alpha'}_1]$ iff $\alpha' \geq \alpha$, for all $\pi_0$.

Information projection leads to the systematic underestimation of the agent's skill. Since the supervisor projects productive information, she overestimates the overall probability of a success and underestimates the overall probability of a failure. Hence she is more surprised observing a failure and less surprised observing a success than in the unbiased case. As a result, a biased supervisor puts more weight on the information revealed by a failure and less weight on the information revealed by a success than a Bayesian supervisor. Since lower types are more likely to fail than higher types, this leads to underestimation on average. Note that in the Bayesian case the expected posterior always equals the prior, and hence the above proposition also implies that the expected biased posterior is lower than the prior, and that this is true for any prior.
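The average effect in Proposition 3 can be illustrated with the same discrete grid: weighting the two conditional posteriors by the true outcome probabilities from Eq. (1) gives an expected posterior mean that equals the prior mean at $\alpha = 0$ and falls below it for any $\alpha > 0$ (again with illustrative parameter values).

thetas = [i / 10 for i in range(11)]
prior = [1 / len(thetas)] * len(thetas)
h = 0.7
prior_mean = sum(p * t for p, t in zip(prior, thetas))                     # 0.5
p_success_true = sum(p * 0.5 * h * (1 + t) for p, t in zip(prior, thetas))

def posterior_mean(alpha, outcome):
    c = 0.5 * (alpha + (1 - alpha) * h)
    like = [c * (1 + t) if outcome == "S" else 1 - c * (1 + t) for t in thetas]
    weights = [p * l for p, l in zip(prior, like)]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)

print("prior mean:", prior_mean)
for alpha in (0.0, 0.3, 0.6, 1.0):
    expected = (p_success_true * posterior_mean(alpha, "S")
                + (1 - p_success_true) * posterior_mean(alpha, "F"))
    print(alpha, round(expected, 4))   # equals the prior mean only at alpha = 0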

Proposition 3 shows that if the supervisor has access to information that the agent did not have, the

supervisor is negatively biased in her assessment. Let’s call an increase in s1, the ex-post information,

a change that increases the extent to which knowing and acting upon s1 increases the probability of a

successful outcome after an optimal decision. The next corollary shows that an increase in the projected

signal leads to further underestimation.

Corollary 1. For all $\alpha > 0$, $E[\pi^\alpha_1]$ is decreasing in $s_1$ in the sense of FOSD.

In the analysis above, I assumed that the supervisor’s inference is based on a performance measure

which consists of either a success or a failure and the knowledge of what the set of ex-ante signals is. In

many situations, the supervisor has more detailed information, such as observing the exact realizations

of the signals in $s_0$ or the agent's action $y$. In a Bayesian setting, such information leads to more precise estimates of $\theta$. In a world of biased evaluators, it might well increase underestimation. Thus in-depth investigations might be welfare reducing in the biased case, even if they are welfare improving in the Bayesian one.

4.4 Over- and Under-inference

Proposition 3 is consistent with the general wisdom that hindsight bias leads to too much ex-post blame. Although this result is true on average, it does not follow that the supervisor assigns too much blame both after a success and after a failure. Conditional beliefs will depend on the exact nature of the information projected. If the bias leads to a perception in which the marginal return to skill is higher than in the Bayesian case, the supervisor overinfers skill from performance. This happens, for example, in a case where in the absence of ex-ante information the outcome is determined only by chance, but in hindsight an able radiologist should have detected cancer had he not only understood the X-ray but also known what symptoms the patient would develop later. In contrast, if information projection leads to a perception in which the marginal return to skill is lower than in the Bayesian case, the supervisor underinfers skill from performance. This is the case where ex-post information completely substitutes for the skill-intensive information and hence, in retrospect, all differences in performance are due to differences in luck. The following two examples show how conditional beliefs depend on the nature of the projected information.

Example 1

Let $\omega = \varepsilon_1\varepsilon_2$ where $\varepsilon_i \in \{1, -1\}$ for $i = 1, 2$, and let there be a symmetric prior $\pi$ on both $\varepsilon_1$ and $\varepsilon_2$. Let $s_0$ be a signal about $\varepsilon_1$ that is true with probability $h$. Let $s'_0$ be a signal about $\varepsilon_2$ that is always true. Assume that skill is necessary for the understanding of $s'_0$ but not of $s_0$. In this case, the true probability of success for a type $\theta$ agent is:

$\Pr(S \mid \theta) = \theta h + \frac{1}{2}(1 - \theta)$   (3)

Let the ex-post information be given by $s_1$, where $s_1$ reveals $\varepsilon_1$. Hence the perceived probability of success of type $\theta$ with information projection of degree $\alpha$ is:

$\Pr(S \mid \theta)_\alpha = \alpha\theta + (1 - \alpha)\theta h + \frac{1}{2}(1 - \theta)$.   (4)

It is easy to see that for all $\alpha > 0$ the marginal return on skill in the biased case is higher than in the Bayesian case: $h - \frac{1}{2} + \alpha(1 - h)$ versus $h - \frac{1}{2}$. As a result, the supervisor exaggerates the extent to which performance is influenced by skill. In the limit where $h = \frac{1}{2}$, performance is not informative about skill, yet a biased supervisor infers skill from performance. Here, full information projection leads to the complete illusion of talent.[12]

[12] For other mechanisms leading to the illusion of talent based on false beliefs, see Rabin (2002) or Spiegler (2006).

Example 2

Let everything be as before, except the ex-post information. Let $s_1$ now tell the true value of $\varepsilon_2$ with probability $z$. Here, in contrast to the previous example, the productivity of $s_1$ depends on whether the agent processed $s'_0$ or not. If $s'_0$ was processed, $s_1$ adds no information. If it was not, $s_1$ increases the agent's chances of producing a successful outcome. The true probability of success for a type $\theta$ agent is

$\Pr(S \mid \theta) = \theta h + \frac{1}{2}(1 - \theta)$   (5)

and the perceived probability with full information projection is

$\Pr(S \mid \theta)_1 = \theta h + (1 - \theta)\big[hz + (1 - h)(1 - z)\big]$.   (6)

In contrast to the previous case, here the marginal return on skill is lower in the biased case. In the limit where $z = 1$, the perceived probability of success equals $h$ for all types and hence the marginal return is perceived to be zero. This means that a fully biased supervisor does not update her beliefs after observing performance, because she believes that differences in performance are due entirely to differences in luck.
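To see the two examples side by side, the short sketch below compares the true and perceived marginal returns to skill implied by Eqs. (3)-(6); the parameter values are arbitrary, and full projection is assumed for the biased perceptions.

h, z, alpha = 0.7, 0.9, 1.0     # illustrative values; alpha = 1 is full projection

def true_success(theta):        # Eqs. (3) and (5)
    return theta * h + 0.5 * (1 - theta)

def biased_example_1(theta):    # Eq. (4): s1 reveals epsilon_1
    return alpha * theta + (1 - alpha) * theta * h + 0.5 * (1 - theta)

def biased_example_2(theta):    # Eq. (6): s1 reveals epsilon_2 with probability z
    return theta * h + (1 - theta) * (h * z + (1 - h) * (1 - z))

def marginal_return(f):         # all three are linear in theta
    return f(1.0) - f(0.0)

print("true                       ", round(marginal_return(true_success), 3))      # h - 1/2 = 0.2
print("Example 1 (over-inference) ", round(marginal_return(biased_example_1), 3))  # 0.5
print("Example 2 (under-inference)", round(marginal_return(biased_example_2), 3))  # 0.04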

Given a bias of degree $\alpha$, let $\lambda_\alpha(\theta) = \Pr(S \mid \theta)/\Pr(S \mid \theta)_\alpha$ be a measure of the exaggeration of the probability of success for a type $\theta$ agent. We can now state the following more general result:

Proposition 4. For all $\pi_0$ and $\alpha > 0$: if $\lambda_\alpha(\theta)$ is decreasing in $\theta$, then $\pi^\alpha_1(S)$ FOSD $\pi_1(S)$; if $\lambda_\alpha(\theta)$ is increasing in $\theta$, then $\pi_1(S)$ FOSD $\pi^\alpha_1(S)$.

The above proposition specifies the effects of information projection on the supervisor's assessment after a success as a function of the projected information.[13] The impact on the assessment after a failure is the outcome of the net effect of two forces: underestimation, and over- and under-inference. In the case of over-inference, these two point in the same direction and the supervisor is always too pessimistic after a failure. In the case of under-inference, they point in different directions and the net effect is ambiguous. As Example 2 shows, it is possible that the under-inference effect dominates and a biased supervisor is too optimistic after a failure.

[13] In the case where the marginal return to skill is the same in both the true and the biased perception, only underestimation has an effect. See the example of Section 4.2.


4.5 Production of Information

As the evidence suggests, many professionals do anticipate the presence of hindsight bias. In our model,

this suggests that the agent might respond strategically to the supervisor’s bias. It follows from the above

analysis that if the agent prefers a higher assessment to a lower one, information projection decreases his

welfare on average. To avoid such underestimation, the agent has incentives to reduce the information

gap that exists between the ex-ante and the ex-post environment. This incentive might motivate the agent

to change the set of signals he has, and also to avoid certain tasks if possible.

Consider a specification where the radiologist has access to a set of signals $s_0$ and can decide to produce an additional ex-ante radiograph $\hat{s}_0$.[14] The cost of producing this radiograph is $a$ and the benefit is the increased probability of a successful treatment. Assume that the radiologist bears the full cost of $a$ and also that his compensation $w_0$ depends on the outcome. This compensation is $w_{0,S} > 0$ after a success, and 0 after a failure. Assume that the radiologist's utility depends on the supervisor's evaluation as well. In particular, assume that the radiologist's future wage $w_1$ equals the mean of the evaluator's ex-post assessment, i.e., $w^\alpha_1 = E[\theta \mid \pi^\alpha_1]$. Future wages could be interpreted as a reduced form for the radiologist's future employment opportunities.[15] Formally, consider the following von Neumann-Morgenstern utility function for the agent:

$U(w, a) = w_0 + w^\alpha_1 - a \cdot \mathbf{1}\{\hat{s}_0 \text{ is produced}\}$   (7)

The above specification assumes risk neutrality over assessments, i.e., that ex ante the agent cares only about the expected beliefs of the supervisor, and does not care about the difference between the conditional beliefs after a success or a failure. This assumption is mainly for expository purposes, and later in this section I discuss the case where this assumption is relaxed.

[14] I do not require that $\hat{s}_0$ needs skill to be processed, only that there are some skill-intensive signals in $s_0$. For ease of notation, I suppress $s_0$ in what follows.
[15] The specific assumption that $w^\alpha_1 = E[\theta \mid \pi^\alpha_1]$ is without loss of generality in the sense that the results hold for all utility functions that are increasing in $\pi^\alpha_1$ in the sense of FOSD.

Let $m$ denote the frequency of monitoring. This frequency corresponds to the ex-ante probability with which the agent is evaluated. Since the supervisor's assessment of $\theta$ changes only conditional on an assessment, it follows that for a fixed $m$ and $\alpha$, the radiologist's optimal choice of whether to produce the additional radiograph is determined by the following inequality:

$\big[\Pr(S \mid \hat{s}_0) - \Pr(S)\big]\, w_{0,S} - a \;\geq\; m\, E\big[w^\alpha_1 - w^\alpha_1(\hat{s}_0)\big]$   (8)

The left-hand side is the direct benefit minus the direct cost of producing signal $\hat{s}_0$. The right-hand side is the loss (or gain) in expected future wages from producing this additional radiograph.

In the Bayesian case, the expectation of the right-hand side of Eq. (8) is always zero. This is true

because the supervisor’s expected posterior always equals her prior under Bayesian learning, and this

holds independently of what signals the radiologist did or did not see. As a result, given the assumption

of risk-neutrality, the choice of producing the additional radiograph is independent of the frequency of

monitoring. Furthermore, even if we relax the assumption of risk-neutrality, the radiologist’s choice

should be independent of the ex-post information. Since such information does not influence the agent’s

ex-ante productivity, it should not affect the supervisor’s assessment in the Bayesian case.

In the biased case, at the same time, the posterior is decreasing in the information gap between the ex-ante and the ex-post stage. Here the choice of whether to produce the additional radiograph depends crucially on the relationship between this signal and the ex-post information. To see this, let me distinguish between two ways the productivity of these two signals could be linked. I call signals $\hat{s}_0$ and $s_1$ substitutes if processing $\hat{s}_0$ decreases the productivity gain from having $s_1$. I call these two signals complements if processing $\hat{s}_0$ increases the productivity gain from having $s_1$. The following definition introduces these two properties formally:

Definition 2. Let $\lambda_\alpha = \Pr(S)/\Pr_\alpha(S)$ and $\hat{\lambda}_\alpha = \Pr(S \mid \hat{s}_0)/\Pr_\alpha(S \mid \hat{s}_0)$. Signals $\hat{s}_0$ and $s_1$ are substitutes if $\lambda_\alpha < \hat{\lambda}_\alpha$ for all $\alpha > 0$. Signals $\hat{s}_0$ and $s_1$ are complements if $\lambda_\alpha > \hat{\lambda}_\alpha$ for all $\alpha > 0$.

When signals are substitutes, the radiologist can reduce the information gap between ex ante and ex post by ordering an additional radiograph. When signals are complements, ordering a new radiograph increases this information gap. For a given $m$ and $\alpha$, let $a(m, \alpha)$ denote the cost at which Eq. (8) holds with equality. According to the next proposition, an increase in the probability of monitoring leads to an increase in the production of substitute information and to a decrease in the production of complement information.
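Spelling out the threshold defined above makes the comparative static transparent. Solving Eq. (8) with equality gives

$a(m, \alpha) = \big[\Pr(S \mid \hat{s}_0) - \Pr(S)\big]\, w_{0,S} - m\, E\big[w^\alpha_1 - w^\alpha_1(\hat{s}_0)\big]$.

For substitutes, producing $\hat{s}_0$ shrinks the projected information gap and so, by the logic of Proposition 3, raises the expected biased assessment; the term $E[w^\alpha_1 - w^\alpha_1(\hat{s}_0)]$ is then negative and the threshold rises with $m$. For complements the sign is reversed and the threshold falls with $m$. At $\alpha = 0$ the wage term is zero, so the production choice is independent of $m$.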


Proposition 5. If $\hat{s}_0$ and $s_1$ are substitutes, then $a(m, \alpha)$ is increasing in $m$ iff $\alpha > 0$. If $\hat{s}_0$ and $s_1$ are complements, then $a(m, \alpha)$ is decreasing in $m$ iff $\alpha > 0$.

A radiologist has additional incentives to undertake diagnostic procedures that substitute for ex-post information. The reason is that such diagnostic procedures reduce the probability of unwarranted ex-post blame. Even when such procedures are socially inefficient, because they are too costly or simply undesirable (for example, because they expose the patient to too much radiation), a radiologist will undertake them to maintain a good reputation. As a result, the more he is monitored, the more expensive and potentially more harmful his activities will be on such tasks. At the same time, a radiologist has additional incentives to avoid information that can be interpreted much better in hindsight than in foresight. Even if the production of such information increases productivity more than it increases costs, the radiologist is better off not producing such information because this way he can avoid developing a bad reputation. Both effects are increasing in the frequency of monitoring.[16]

[16] A corollary of the underproduction result is that the radiologist is also averse to ordering tests that deliver results after his recommendation is made. These tests increase the information gap between ex ante and ex post, and hence increase the extent of underestimation.

Proposition 5 provides distinct predictions on the over- and under-production of information as a function of the environment. Stylized evidence provides support for the existence of both over- and under-production of information. For example, Studdert et al. (2005) survey physicians (in the areas of surgery, radiology, and emergency medicine) in Pennsylvania. In their sample, 43% of the physicians report using imaging technology in clinically unnecessary circumstances, and 42% of them claim they took steps to restrict their practices, which included eliminating procedures prone to future complications. Kessler and McClellan (2002) show that changes in defensive medicine that result from medical liability reforms are primarily in diagnostic rather than therapeutic practices. Intuition suggests that there is typically more room for information projection in the former than in the latter context. While many argue for a direct link between defensive medicine and hindsight bias, further evidence is needed to test for the mechanism described.

Note that Proposition 5 rests crucially on the assumption that the supervisor conditions her inference of the agent's competence on all signals produced by the radiologist. She does not need to observe the realizations of these ex-ante signals, but the results depend on the fact that she does observe whether $\hat{s}_0$ was or was not produced.


Alternatively, one could imagine a situation where the radiologist can produce $\hat{s}_0$ secretly, i.e., in a way that the supervisor does not learn about $\hat{s}_0$. In this case, the radiologist's expectation of his future wages is independent of his production choice. Even here, the anticipation of the supervisor's bias might lead to distortions in production choices. To see these effects, let's return for a moment to Proposition 4. In environments where the supervisor overinfers skill from performance, wages are too high after a success and too low after a failure. It follows that here the radiologist wants to over-produce ex-ante information secretly. In environments where the supervisor underinfers skill from performance, wages are too low after a success, and hence the radiologist might want to under-produce ex-ante information secretly. These deviations from the Bayesian incentives disappear if such under- and over-production is detected, unlike in the case of Proposition 5.

Given the focus on skill assessment, I assumed that skill-intensive signals are always present, as is true in most situations. In situations where the agent cannot eliminate, or sufficiently reduce, the informativeness of novel ex-post signals, ceteris paribus he would like to avoid procedures that involve inference about his skill, to protect himself from underestimation. A similar incentive might well be present in a Bayesian setting where the radiologist is risk-averse over future wages. Absent skill-intensive signals, the agent is not exposed to wage fluctuations that result from the supervisor's updating.[17] Information projection amplifies this aversion, so that even a risk-neutral agent will exhibit such preferences. Importantly, though, the two mechanisms are distinct and the incentives that can alter them are also different.

[17] On this logic in the Bayesian setting, see Hermalin (1993).

5 Reward and Punishment

In the previous section, I showed that a biased supervisor underestimates the agent's skill on average. A principal responsible for employment decisions can to some extent correct the supervisor's mistake if she anticipates that the supervisor's reports are too negative on average. In most situations, however, a principal does not have information about the agent's task as detailed as the supervisor's. Hence such corrections might introduce other forms of inefficiency, and might not eliminate the agent's incentives to act against underestimation.

In this section, I turn from a context where the amount of information that the agent learns from a signal is a function of his skill to situations where it is a function of how much effort he exerts.

17 On this logic in the Bayesian setting, see Hermalin (1993).

How often the radiologist understands X-rays depends on how carefully he evaluates them. A careful evaluation is costly because it requires the radiologist to exert effort. To provide incentives for the radiologist, the principal offers him a contract that rewards the agent for a good health outcome and punishes him for a bad one. If the health outcome is only a noisy measure of the correctness of the radiologist's diagnosis, and effort is unobservable, better incentives can be provided if the principal hires a supervisor to monitor the radiologist. This way, the principal can tie reward and punishment more closely to whether the radiologist made the correct diagnosis given the information available to him ex ante.18

The main result of this section shows that if the supervisor projects ex-post information, the effi-

ciency gains from monitoring are decreased. I show that if the supervisor believes that the agent could

have learned the true state, the radiologist is punished too often and exerts less effort than in the Bayesian

case. I also show that when the principal designing incentives anticipates the supervisor’s bias, she wants

to monitor less often. Even if she decides to monitor, she induces less effort on the part of the agent

than in the Bayesian case. The reason is that information projection, even if anticipated by the principal,

introduces noise in the supervisor’s reports, and hence decreases the efficiency of monitoring.

5.1 Effort

Assume that the agent’s level of the effort determines the probability with which he understands signal

s0. Let p(a) be the probability that s0 is understands s0 when the agent exerts effort a. If he does not

understand it, he infers nothing from s0. I assume decreasing returns to effort in terms of the processing

probability. Formally, p!(a) > 0 and p!!(a) < 0. I also assume that lima"0 p!(a) = ) and lima"#

p!(a) = 0.

Let s0 be such that Pr(s0 = ! | !) = "h. Assume that the probability of a success conditional on

the fact that the agent’s action equals the state, y = !, is k. Assume that the probability of success for

actions different from the state, y %= !, is z where k > z. Finally, assume that if the agent does not

process s0, he is equally likely to take any action y ! ! and the probability that such a random action

matches the state is b where b < "h.

For simplicity, assume that both the agent and the principal are risk neutral. Let the agent's utility function again be U(w, a) = w - a and the principal's utility function be V(r, w) = r - w, where r is the revenue to the principal from the task. Let the revenue of the principal be 1 after a success and 0 after a failure.

18 For this classic insight, that increasing observability reduces inefficiency in the context of moral hazard, see Holmstrom (1979) and Shapiro and Stiglitz (1984).

5.2 Performance Contract

As the benchmark, I characterize the first-best effort level, where the marginal social benefit from exerting effort equals the marginal social cost. The first-best effort level, a*_f, is then defined implicitly by the following equality:

q p'(a*_f) = 1

where q = (h - b)(k - z), and q measures the productivity gain from processing signal s0.19 This productivity gain increases in h, the precision of the agent's signal, and in k, the probability of success conditional on an optimal action. It decreases in b, the probability of making the right choice by chance, and in z, the probability of success conditional on a non-optimal choice. With a slight abuse of notation, let the vector q denote the collection of the parameters h, b, k, z.
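For concreteness, the decomposition of the success probability and the first-best condition can be checked numerically. The sketch below is purely illustrative: the effort technology p(a) = sqrt(a) and all parameter values are hypothetical choices made here for the example, not part of the model.

    # Minimal numerical sketch (illustrative only; p(a) and all parameter
    # values are hypothetical, not taken from the paper).
    import math

    h, b, k, z = 0.85, 0.25, 0.9, 0.4    # hypothetical: h > b, k > z
    q = (h - b) * (k - z)                # productivity gain from processing s0

    def p(a):                            # hypothetical effort technology
        return math.sqrt(a)

    def success_prob(a):
        # Pr(success) from primitives: the action matches the state with
        # probability p(a)h + (1 - p(a))b.
        match = p(a) * h + (1.0 - p(a)) * b
        return match * k + (1.0 - match) * z

    # The decomposition used below in the principal's objective, Eq. (9):
    a = 0.3
    assert abs(success_prob(a) - (p(a) * q + b * k + (1 - b) * z)) < 1e-12

    # First best: q p'(a) = 1 with p'(a) = 1/(2 sqrt(a))  =>  a*_f = (q/2)^2.
    a_f = (q / 2.0) ** 2
    print("first-best effort:", a_f)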

Let’s now turn to the case, where the agent’s effort is unobservable. Assume that the agent is pro-

tected by limited liability, i.e., w0 & 0 has to be true in all contingencies. Let the agent’s outside option

be 0. Given the assumption of risk-neutrality, the principal’s optimal contract is one that offers the low-

est compensation possible after a failure. This implies that the compensation after a failure is wF = 0.20

Let wS denote the compensation offered to the agent upon a success.

In light of these considerations, the principal's problem is to maximize her expected utility:

max_{a, w} V(r(a, q), w) = [p(a)q + bk + (1 - b)z](1 - w_S)    (9)

subject to the agent's incentive compatibility constraint:

a_n(q, w) = argmax_a [p(a)q + bk + (1 - b)z] w_S - a.    (10)

19 I assume that the solution is always interior. Furthermore, h = h̄ + (1 - h̄)(|Θ|/(|Θ| - 1))b, where |Θ| is the cardinality of the action space.
20 On the use of limited-liability contracts, see, e.g., Innes (1990) and Dewatripont and Bolton (2005). I believe that the results of this section hold a fortiori for a risk-averse radiologist.


Given the agent’s utility function, we can replace this incentive compatibility constraint with its

first-order condition. To guarantee that there is a unique stable equilibrium, I assume that p!!!(a) ( 0 for

all a. The optimal effort level, a$n(q), which solves this constrained maximization problem is defined

implicitly by following equation:

qp! = 1" p!!(p + (bk + (1" b)z)/q)

(p!)2. (11)

Let w$n(q) denote the corresponding optimal wage.

Note that a*_n(q) is always smaller than a*_f(q). The reason is that the principal faces a trade-off: implementing a higher level of effort is feasible only at the cost of leaving a higher rent to the agent. Thus effort is lower, and the agent's rent is higher, than in the first best. A simple comparative-static result follows from Eq. (11). Increasing h or k increases the productivity of processing information and thus generates higher utility for the principal under a given contract. Since p' > 1 always holds in equilibrium, a higher h or k allows for cheaper incentives, and thus the principal wants to induce more effort, implying that effort is increasing in h and k.

Lemma 1 An increase in h or k increases the equilibrium effort level a*_n(q) and the payoff to the principal.
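Because Eq. (11) only defines a*_n(q) implicitly, it is convenient to solve it numerically. The following sketch does so by bisection under the same hypothetical assumptions as in the previous sketch (p(a) = sqrt(a) and illustrative parameter values); re-solving at a higher h illustrates the comparative static in Lemma 1.

    # Illustrative sketch: solve Eq. (11) by bisection for a hypothetical
    # effort technology p(a) = sqrt(a) and hypothetical parameters.
    import math

    def solve_effort(h, b, k, z):
        q = (h - b) * (k - z)
        B = b * k + (1 - b) * z                    # baseline success probability
        p   = lambda a: math.sqrt(a)
        dp  = lambda a: 0.5 / math.sqrt(a)
        d2p = lambda a: -0.25 * a ** -1.5
        # Eq. (11):  q p'(a) = 1 - p''(a)(p(a) + B/q) / p'(a)^2
        g = lambda a: q * dp(a) - (1 - d2p(a) * (p(a) + B / q) / dp(a) ** 2)
        lo, hi = 1e-9, 1.0                         # g(lo) > 0 > g(hi) for these parameters
        for _ in range(80):                        # plain bisection
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
        return 0.5 * (lo + hi)

    a_low  = solve_effort(h=0.70, b=0.05, k=1.0, z=0.02)
    a_high = solve_effort(h=0.80, b=0.05, k=1.0, z=0.02)
    print(a_low, a_high)    # equilibrium effort rises with h, as in Lemma 1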

5.3 Bayesian Monitoring

The effort level characterized by Eq. (11) is optimal given that the supervisor observes a performance measure consisting only of success and failure, but obtaining more precise reports about the agent's action allows the principal to induce the same level of effort at a lower cost.

Suppose that the principal can monitor the agent by learning the agent's action and the information that was available to him. Under such monitoring, the optimal contract rewards the agent if his action is the one suggested by the information available to him and punishes the agent otherwise. Since whether a success happens or not contains no additional information, it is easy to see that such a compensation scheme is optimal. Given such a reward scheme, the agent's incentive compatibility

constraint can now be expressed by the following first-order condition:

a_m(q, w) = argmax_a p(a)(1 - b) w_S + b w_S - a    (12)

and the optimal contract induces an equilibrium effort level, a*_m(q), defined implicitly by the following condition:

q p' = 1 - p''(p + b/(1 - b)) / (p')²    (13)

Let w*_m(q) denote the corresponding optimal wage.

The equilibrium effort under monitoring, a*_m(q), is always greater than the equilibrium effort without monitoring, a*_n(q). The reason is that monitoring improves the trade-off between providing incentives and leaving a positive rent to the agent: it rewards good decisions rather than good luck. As a result, if the principal monitors the agent, she can induce the same level of effort at a lower cost, and hence for any given level of effort she realizes a greater expected profit. The fact that it becomes cheaper to induce effort means that the principal is willing to pay for monitoring.

Lemma 2 The equilibrium under monitoring induces a higher effort, a*_n(q) < a*_m(q), and the principal is better off with the option of monitoring.
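Lemma 2 can likewise be illustrated numerically. For the hypothetical technology p(a) = sqrt(a), both Eq. (11) and Eq. (13) take the form q p'(a) = 1 - p''(a)(p(a) + R)/p'(a)^2 for a rent term R, and reduce to sqrt(a) = (q/2 - R)/2; the sketch below, with hypothetical parameter values, compares effort and the principal's payoff in closed form.

    # Effort and principal payoff with and without monitoring, using the
    # hypothetical p(a) = sqrt(a); each condition reduces to sqrt(a) = (q/2 - R)/2.
    import math

    h, b, k, z = 0.70, 0.05, 1.0, 0.02            # hypothetical parameters
    q, B = (h - b) * (k - z), b * k + (1 - b) * z

    def effort(R):                                # R is the rent term on the RHS
        return ((q / 2 - R) / 2) ** 2

    def payoff(a, R):
        # Principal's expected payoff: success probability minus expected wage bill,
        # which for these contracts equals p q + B - (p(a) + R)/p'(a).
        p, dp = math.sqrt(a), 0.5 / math.sqrt(a)
        return p * q + B - (p + R) / dp

    a_n, a_m = effort(B / q), effort(b / (1 - b))          # Eq. (11) vs Eq. (13)
    print(a_n < a_m)                                       # True: more effort under monitoring
    print(payoff(a_n, B / q) < payoff(a_m, b / (1 - b)))   # True: the principal gains (Lemma 2)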

5.4 Biased Monitoring

Let the supervisor’s ex-post signal be s1 and assume that the projected information is such that along

with s0 it perfectly reveals the state but alone its uninformative. This means that a biased supervisor

perceives the true problem as if h = 1 for all h ( 1. Furthermore, it also implies that upon not

processing s0 the supervisor still believes that the probability that the agent can take the right action

is b .The consequence of such information projection is that the supervisor makes wrong attributions

from the agent’s choice. Whenever y %= !, the supervisor concludes that the agent did not successfully

read the information available to her. Hence, if the agent did read and follow s0, but this information

turned out to be ’incorrect’ ex-post, the supervisor mistakenly infers that the agent did not read s0.21 The

probability of this mistake is p(a)(1 " h), i.e., the probability that s0 is processed times the probability21Note that the ’inference’ of the supervisor is only about whether the agent’s effort was successful or not. In a moral

hazard context, there is no inference about the agent’s effort a.

26

that s0 did not suggest the right action.

Assume that the agent correctly predicts the bias of the supervisor. In this case, the agent's effort is given by the solution of the following maximization problem:

a^1_m(q, w) = argmax_a p(a)h(1 - b) w_S + b w_S - a.    (14)

Comparing this condition with that of Eq. (12), it is clear that the return to effort is lower in the biased case. The reason is that an unbiased supervisor can distinguish – up to probability (1 - b) – between a bad decision that is due to wrong ex-ante information and a bad decision that results from not processing a signal. In contrast, a biased supervisor mistakes a bad decision due to wrong ex-ante information for a bad decision due to not having processed the available information. This implies that for any given compensation w_S, the agent exerts less effort in the biased case.

Proposition 6 Suppose h < 1. Then a^1_m(q, w) < a_m(q, w), and a^1_m(q, w) is increasing in h, with a^1_m(q, w) = a_m(q, w) if h = 1.
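For a fixed success wage, the comparison behind Proposition 6 is immediate. The sketch below is again only illustrative: it uses the hypothetical p(a) = sqrt(a), for which the first-order conditions of Eq. (12) and Eq. (14) have closed-form solutions, together with hypothetical values of h, b and w_S.

    # Agent's effort for a fixed wage w_S: Bayesian monitoring (Eq. 12) versus
    # biased monitoring (Eq. 14), under the hypothetical p(a) = sqrt(a).
    h, b, w_S = 0.7, 0.05, 0.5          # hypothetical values, h < 1

    # Eq. (12) FOC: p'(a)(1-b) w_S = 1   =>  a_m   = ((1-b) w_S / 2)^2
    # Eq. (14) FOC: p'(a)h(1-b) w_S = 1  =>  a_m^1 = (h(1-b) w_S / 2)^2 = h^2 a_m
    a_m  = ((1 - b) * w_S / 2) ** 2
    a_m1 = (h * (1 - b) * w_S / 2) ** 2

    print(a_m1 < a_m)                   # True whenever h < 1 (Proposition 6)
    print(a_m1 / a_m)                   # equals h^2: the shortfall vanishes as h -> 1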

The above proposition shows that if a negligence-based reward scheme is enforced by a biased evaluator, then it becomes closer to strict liability, and in our setup this reduces care. A possible corollary to the above proposition is that a negligence rule might actually backfire. The reason is that under monitoring the radiologist is offered a lower compensation in equilibrium, which in the Bayesian case is outweighed by the increased probability of reward. Since the probability of a reward is reduced in the biased case, care might be lower under biased monitoring than under the simple performance contract.

As a final scenario, consider the case where the bias of the supervisor is common knowledge between the principal and the agent. If the principal is aware of the supervisor's bias, she knows that at times the supervisor comes to the wrong conclusion. Since the principal can only determine the probability of this mistake, and not whether a given report is actually wrong or right, information projection adds noise to the supervisor's reports. Thus the data obtained by monitoring contain more noise than in the Bayesian case, which decreases the efficiency of monitoring. As a result, the principal decides to induce less effort than she would had she believed that the supervisor had perfect Bayesian perception.

Let the optimal effort level induced be denoted by a^{1*}_m(q), implicitly defined by:

q p' = 1 - p''(p + b/(h(1 - b))) / (p')².

Proposition 7 If the principal anticipates the bias, she induces effort a^{1*}_m(q) < a*_m(q), and a^{1*}_m(q) is increasing in h.
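The gap described in Proposition 7 can be traced out numerically. Under the same hypothetical p(a) = sqrt(a), the monitoring condition Eq. (13) and the biased-monitoring condition displayed above Proposition 7 both reduce to sqrt(a) = (q/2 - R)/2, with rent terms R = b/(1-b) and R = b/(h(1-b)) respectively; the parameter values below are hypothetical.

    # Effort induced under Bayesian monitoring versus under biased monitoring
    # when the principal anticipates the bias; p(a) = sqrt(a) is hypothetical.
    b, k, z = 0.05, 1.0, 0.02                          # hypothetical parameters
    for h in (0.6, 0.7, 0.8, 0.9):
        q = (h - b) * (k - z)
        a_m  = ((q / 2 - b / (1 - b)) / 2) ** 2        # a*_m from Eq. (13)
        a_m1 = ((q / 2 - b / (h * (1 - b))) / 2) ** 2  # a^{1*}_m, bias anticipated
        print(h, round(a_m1, 4), round(a_m, 4), a_m1 < a_m)
    # a^{1*}_m < a*_m whenever h < 1, and a^{1*}_m rises toward a*_m as h -> 1.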

The analysis above has implications for the effect of hindsight bias on tort liability. It implies that whenever unobservable effort is involved, information projection reduces rather than increases an injurer's incentive to exercise due care. This observation contrasts with the common conjecture, e.g., Rachlinski (1998), that an agent anticipating hindsight bias will take too much precaution to avoid ex-post blame.22

6 Communication

In the previous sections, I focused on the problem of performance evaluation, but information projection might affect other aspects of organizational life as well. One such domain is communication. Both intuition and the evidence presented in Section 2 indicate that when giving or taking advice, people assume too much about what the other party knows. In this section I demonstrate two ways in which information projection affects efficient information transmission between a speaker and a listener. These two themes are credulity and unintended ambiguity.

Credulity refers to a case where a listener follows the recommendation of a speaker too closely because he assumes that the recommendation already incorporates his private information. As a result, he will fail to combine his private information with the speaker's recommendation and will fail to deviate sufficiently from this recommendation even when he should. Unintended ambiguity refers to the case where a speaker sends a message that is too ambiguous for the listener. A biased speaker exaggerates the probability with which her background knowledge is shared with the listener, and hence overestimates how likely it is that the listener will be able to interpret her message. I show that, depending on the messages available to the speaker, the speaker might communicate too often or too rarely.

22 A key difference between my setup and the standard setup for the study of optimal liability, e.g., Shavell (1980), is that the level of precaution (effort) is unobservable, and it is not the action that increases in precaution; rather, it is the probability of taking the right action that increases in effort.

6.1 Credulity

Consider a situation where an advisee has to take an action y_e that is as close as possible to an unknown state ω, on which the shared prior is N(0, σ_0²). This state could describe the optimal solution of a research problem, the best managerial decision on the organization of production, or the diagnosis of a patient. The advisee has some private information about ω given by s_e = ω + ε_e, where ε_e is a mean-zero Gaussian noise term such that the posterior on ω, given the prior and s_e, is N(ŝ_e, σ̂_e²). The advisor also has some private information about ω given by s_r = ω + ε_r, where ε_r is a mean-zero Gaussian noise term such that the posterior on ω, given the prior and s_r, is N(ŝ_r, σ̂_r²).23 The advisor makes a recommendation y_r equal to her posterior mean. The advisor cannot communicate the full distribution or the true signal directly; such limits on communication might arise due to complexity considerations, or because it is prohibitively costly to explain this private information. Instead, she can give a recommendation regarding the best action she would follow.

Let the advisee’s and the advisor’s objective be

maxye

"E"(ye " !)2 (15)

thus the advisee’s goal is to take an action that minimizes the distance between his action and the state.

Given the advisor’s recommendation yr, and the advisee’s private information se, a rational advisee takes

action y0e such that:

y0e = E[N(!, c0, v0)] (16)

where c0 = yrb#2e

b#2e+b#2

r+ bseb#2

rb#2

r+b#2e

and N(! ; c, v) is a short form for a normally distributed random variable

with mean c and variance v. This action is based on the correct perception of how information is

distributed between the advisor and the advisee. This action efficiently aggregates the information in the

recommendation yr and the advisee’s private information se.

23 Formally, if ε_e ∼ N(0, σ_e²) then ŝ_e = σ_0²/(σ_0² + σ_e²) s_e and σ̂_e² = σ_0² σ_e²/(σ_0² + σ_e²). Similarly, if ε_r ∼ N(0, σ_r²) then ŝ_r = σ_0²/(σ_0² + σ_r²) s_r and σ̂_r² = σ_0² σ_r²/(σ_0² + σ_r²).
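The updating steps in Eq. (16) and footnote 23 are standard normal-normal calculations; the short sketch below simply spells them out for hypothetical signal realizations and variances.

    # Normal-normal updating and the rational aggregation in Eq. (16); all
    # numerical values are hypothetical illustrations.
    def shrink(s, var0, var_noise):
        """Posterior mean and variance of omega after observing s = omega + noise."""
        return var0 / (var0 + var_noise) * s, var0 * var_noise / (var0 + var_noise)

    var0 = 1.0                      # prior variance sigma_0^2
    s_e, var_e = 0.8, 0.5           # advisee's signal and noise variance (hypothetical)
    s_r, var_r = -0.2, 1.5          # advisor's signal and noise variance (hypothetical)

    s_e_hat, v_e_hat = shrink(s_e, var0, var_e)
    y_r,     v_r_hat = shrink(s_r, var0, var_r)   # the advisor recommends her posterior mean

    # c_0 from Eq. (16): each mean is weighted by the other posterior's variance.
    c0 = (y_r * v_e_hat + s_e_hat * v_r_hat) / (v_e_hat + v_r_hat)
    print(c0)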

Consider now the case where the advisee exhibits full information projection. Here, he believes that the advisor's recommendation is based not only on the realization of s_r but also on s_e, and thus that it already incorporates all information available to the parties. As a result, he reacts to the advice y_r by taking action y^1_e such that:

y^1_e = E[N(ω; c_1, v_1)]    (17)

where c_1 = y_r and v_1 = v_0. It follows that if the advisee exhibits full information projection, he puts all the weight on what the advisor says and no weight on his private information. This way, his private information is lost. The following proposition shows that a biased advisee follows the recommendation of his advisor too closely.

Proposition 8 E|y_r - y^ρ_e| is decreasing in ρ, and E|y_r - y^1_e| = 0, where expectations are taken with respect to the true distribution of signals.

This proposition follows from the discussion above. Note that the more precise the advisee's private information, the greater is the loss relative to the unbiased case. In the biased case, information aggregation fails because the advisee does not sufficiently adjust the advisor's recommendation in light of his private information.24 One way to eliminate this information loss is to invest in a technology that allows the advisor to communicate her posterior distribution. Another option is to block communication between the advisor and the advisee. Assuming full information projection, the advisee is ex ante better off without a recommendation if and only if his signal is more precise than the advisor's signal. More generally, the following corollary is true:

Corollary 2 There exists an indicator function k(ρ, σ_e, σ_r) ∈ {0, 1} such that the advisee is better off with a recommendation if k(ρ, σ_e, σ_r) = 0, and better off without a recommendation if k(ρ, σ_e, σ_r) = 1. The function k(ρ, σ_e, σ_r) is increasing in ρ and σ_r and decreasing in σ_e.

24 The logic of why a biased advisee will be too credulous is also indicative of why information projection can result in irrational herding behavior. In the context of Banerjee (1992), for example, while rational information updating results in herding-type behavior only when the action space is not as fine as the signal space, information projection leads to herding even if the action space is as fine as the signal space, and hence where no rational herding should occur.
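A small simulation illustrates Proposition 8 and Corollary 2. It is a sketch under stated assumptions: the variances are hypothetical, and the partially biased action is modeled, as a simplification, as the ρ-weighted average of the credulous action y_r and the rational action y^0_e rather than as the exact mixture posterior.

    # Monte Carlo sketch of credulity (Proposition 8 / Corollary 2). Variances are
    # hypothetical, and the partially biased action is approximated by the
    # rho-weighted average of the credulous and the rational action.
    import random
    random.seed(0)

    def mean_sq_errors(var0, var_e, var_r, rho, n=100_000):
        shrink = lambda s, v: (var0 / (var0 + v) * s, var0 * v / (var0 + v))
        rational = biased = own_only = 0.0
        for _ in range(n):
            omega = random.gauss(0, var0 ** 0.5)
            s_e = omega + random.gauss(0, var_e ** 0.5)
            s_r = omega + random.gauss(0, var_r ** 0.5)
            se_hat, ve_hat = shrink(s_e, var_e)
            y_r,    vr_hat = shrink(s_r, var_r)      # advisor reports her posterior mean
            y0 = (y_r * ve_hat + se_hat * vr_hat) / (ve_hat + vr_hat)   # Eq. (16)
            y_rho = rho * y_r + (1 - rho) * y0                          # partially credulous
            rational += (y0 - omega) ** 2
            biased   += (y_rho - omega) ** 2
            own_only += (se_hat - omega) ** 2        # action if communication is blocked
        return rational / n, biased / n, own_only / n

    # A precise advisee facing an imprecise advisor: rational aggregation helps a
    # little, but a credulous advisee does far worse than with no recommendation
    # at all (the k = 1 region of Corollary 2).
    print(mean_sq_errors(var0=25.0, var_e=0.2, var_r=2.0, rho=0.9))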

6.2 Ambiguity, Over- and Under-Communication

In the above context, information projection leads to credulity because the advisee projects his private information. Let's now turn to a context where the advisor projects her private information about the state ω. Consider an information structure analogous to the examples in Section 4.4. Let ω = ω_1 ω_2 with ω_1, ω_2 ∈ {-1, 1}. Assume that s0 = ω_1 is the advisor's background knowledge, which cannot be communicated to the advisee, and let s1 = ω_2 be the signal that can be communicated. As an example, consider a radiologist who speaks to a patient about the patient's medical condition ω. Signal s0 incorporates the radiologist's knowledge of medicine, such as the meaning of a complex medical term. Signal s1 is a medical term that describes the condition of the patient. If the patient does not know the meaning of the medical term, then s1 conveys no information to him; if he knows the meaning of the term, he can interpret s1 in light of s0.

Let there be a third signal s2 with Pr(s2 = ω | ω) = h, where 0.5 < h < 1. This signal provides noisy information about ω but does not require the patient to know s0, the medical language. For simplicity, let the true probability with which signals (s0, s1, s2) are available to the advisee be p_e = (0, 0, 0). Assume that the patient has a symmetric prior on ω and that the advisor can send only one signal, because sending two is prohibitively costly. Sending one message costs c. Let the payoff to the advisor be 1 if the advisee guesses ω correctly, and 0 otherwise.

The advisor has three distinct options: remain silent, send signal s1, or send signal s2. The table below summarizes the advisor's perceived payoffs as a function of ρ:

Payoff / action:   Silence                              Send s1                             Send s2
EV^0:              1/2                                  1/2 - c                             h - c
EV^ρ:              ρ² + (1 - ρ²)(ρh + (1 - ρ)/2)        (1 - ρ)ρh + (1 + ρ²)/2 - c          ρ² + (1 - ρ²)h - c

Since an unbiased advisor knows that s1 does not convey any valuable information to the patient, she never sends s1. Furthermore, she decides to spend time describing the state to the patient in lay terms whenever h - c > 1/2, that is, when the expected benefit of talking exceeds the cost of not remaining silent. In contrast, a biased medical advisor exaggerates the probability with which the medical term conveys valuable information to the patient, because she projects knowledge of the medical language s0. Hence, if she is sufficiently biased, she prefers to send s1 over s2. Formally, this happens when ρ ≥ 2h - 1.


At the same time, a biased advisor also exaggerates the probability that the patient already knows both the medical and the lay description. As a result, she underestimates the return to sending a costly message in general. The net effect of these two forces depends on the degree to which the advisor is biased. If the advisor is fully biased, she always decides to remain silent, because she assumes that the advisee already knows ω anyway. If the advisor is only moderately biased, however, she might communicate even when a rational advisor would remain silent.

Proposition 9 If ρ < 2h - 1, the advisor sends s2 iff c ≤ k_2(ρ, h) and is silent otherwise. The function k_2(ρ, h) is increasing in h and decreasing in ρ. If ρ > 2h - 1, the advisor sends s1 iff c ≤ k_1(ρ, h) and is silent otherwise. Furthermore, if h = 0.5, the advisor sends s1 iff c ≤ 0.5ρ(1 - ρ).

The above proposition shows not only that a biased advisor might send a dominated message; it also offers some comparative statics on whether there will be too much or too little communication. If the advisor is only moderately biased and the lay description is sufficiently informative, then she communicates too rarely: her underestimation of the return to communication dominates her overestimation of how informative the medical description is. If the advisor is sufficiently biased, then depending on how informative the lay description is, she might communicate too often. Since she overestimates the probability that, following the medical description, the advisee will take the right action, she engages in costly communication even though it conveys no information to the advisee. Hence adding dominated communication options might decrease efficiency in the presence of information projection.

The results in the above proposition are consistent with the intuition that the curse of knowledge leads to too much ambiguity. For example, many argue that this is true of computer manuals written by experts but targeted at lay people. While in the case of computer manuals, hiring a lay person rather than an expert to proof-read the manuscript could reduce the curse, in many other situations more explicit communication protocols might do more to improve welfare.

7 Conclusion

In this paper, I developed a model of information projection applicable to problems of asymmetric information. The applications in this paper are motivated by problems and evidence from labor markets, organizations, medicine, and law, but they are not exhaustive in any sense. I conclude by considering some possible further applications and extensions.

The results in Section 4 and Section 5 suggest that if debiasing is ineffective, special kinds of incentives might be necessary to mitigate the adverse effects of information projection on the choice, production, and processing of information. Novel insights might be gained in contexts where the radiologist can decide both what information to produce and how much effort to exert in understanding the information he produced.

Another possible extension of the over-inference and underestimation results of Section 2 is to the

analysis of group formation in social networks. Recall that a biased guest will be too optimistic about

the kindness of the host if the host and the guest have similar tastes, and will be too pessimistic about the

host’s kindness if their tastes differ. This implies that if friendships are formed partly on the perception

of social intentions, then members of a group might be too similar in taste. More importantly, such

cliques will misperceive each other as hostile because they mistakenly attribute taste differences to

hostile intentions. As a corollary of the underestimation result of Proposition 2, it might also be true that

a social network will have too few links.

The underestimation result can also be extended to the settings of Section 6. A biased advisee might underestimate how attentive his advisor is to him, because he exaggerates the precision of the advice an attentive advisor could give if she wanted. Here attentiveness is defined as the probability that the advisor bases her recommendation on information rather than on noise. A biased advisor might underestimate how perceptive her advisee is, because she does not recognize how ambiguous her messages are. Here perceptiveness is defined as the probability that the advisee listens to the advisor's message. Such inferences can result in the breakdown of communication between parties who have a lot to share with each other but suffer from projection bias.

Another direction in which to extend the ideas presented in this paper is the related phenomenon of ignorance projection. Ignorance projection occurs when someone who does not observe a signal underestimates the probability with which this signal is available to others. Though the evidence on ignorance projection is not as strong as the evidence on information projection, it might still be a phenomenon worth studying, both empirically and theoretically. Finally, one could study information and ignorance projection in intrapersonal domains, where people project their current information and their current ignorance onto their future selves, leading to distortions in prospective memory.

8 Appendix

Proof of Proposition 1. Note first that since z > h, the host follows s_k if she observes s_k. Without loss of generality, assume that s_k = meat. The biased conditional likelihoods are given by

π^ρ_1(θ_kind | y = meat) = (ρ + (1 - ρ)h) π_0(θ_kind) / [(ρ + (1 - ρ)h) π_0(θ_kind) + π_0(θ_mean,meat)]    (18)

and

π^ρ_1(θ_kind | y ≠ meat) = (1 - ρ)(1 - h) π_0(θ_kind) / [(1 - ρ)(1 - h) π_0(θ_kind) + π_0(θ_mean,fish)]    (19)

Since h ≥ 0.5, π^ρ_1(θ_kind | y = s_k) is increasing in ρ and π^ρ_1(θ_kind | y ≠ s_k) is decreasing in ρ.

Proof of Proposition 2. The guest's perception of the ex-ante likelihood of the event y = s_k is increasing in ρ. By virtue of the properties of Bayes' rule, the following relation holds for all ρ:

π_0(θ_kind) = π^ρ_1(θ_kind | y = s_k) Pr^ρ(y = s_k | π_0) + π^ρ_1(θ_kind | y ≠ s_k) Pr^ρ(y ≠ s_k | π_0).

The expected posterior π^ρ_1(θ_kind), at the same time, is given by:

E[π^ρ_1(θ_kind)] = [Pr^ρ(y = s_k | θ_kind) π_0(θ_kind) / Pr^ρ(y = s_k)] Pr(y = s_k) + [Pr^ρ(y ≠ s_k | θ_kind) π_0(θ_kind) / Pr^ρ(y ≠ s_k)] Pr(y ≠ s_k).    (20)

Since Pr^ρ(y = s_k | π_0) is increasing and Pr^ρ(y ≠ s_k | π_0) is decreasing in ρ, it follows, given Proposition 1, that E[π^ρ_1(θ_kind)] is decreasing in ρ.

Proof of Claim 1. Note that

[(1/2)(ρ + h(1 - ρ))(1 + θ) π_0(θ)] / [∫_0^1 (1/2)(ρ + h(1 - ρ))(1 + θ) π_0(θ) dθ] = [(1 + θ) π_0(θ)] / [∫_0^1 (1 + θ) π_0(θ) dθ] = [(h/2)(1 + θ) π_0(θ)] / [∫_0^1 (h/2)(1 + θ) π_0(θ) dθ]    (21)

hence it follows that π^ρ_1(θ | S) = π_1(θ | S) for all ρ and π_0. The result on π^ρ_1(θ | F) follows from the proof of Proposition 3 below.

Proof of Proposition 3. The expected posterior is the probability-weighted average of the posterior after a success and the posterior after a failure: E[π^ρ_1 | π_0] = Pr_0(S) π^ρ_1(S) + (1 - Pr_0(S)) π^ρ_1(F). For a given type θ this is equal to

E[π^ρ_1(θ) | π_0(θ)] = Pr_0(S) · Pr^ρ(S | θ) π_0(θ) / Pr^ρ(S) + (1 - Pr_0(S)) · Pr^ρ(F | θ) π_0(θ) / (1 - Pr^ρ(S)).    (22)

Note that E[π_1 | π_0] = π_0.

Let's introduce two variables: λ^ρ_S = Pr(S)/Pr^ρ(S) and λ^ρ_F = (1 - Pr_0(S))/(1 - Pr^ρ(S)), where these probabilities are taken with respect to the expectations in π_0. Note that λ^ρ_S < 1 and λ^ρ_F > 1, and that λ^ρ_S is decreasing and λ^ρ_F is increasing in ρ, given the assumption that Pr^ρ(S | θ) = ρ Pr(S | s1, θ) + (1 - ρ) Pr(S | θ).

Since Pr^ρ(S | θ) is increasing in θ for all ρ, it follows that the expected weight on higher types is decreasing in ρ. Formally,

λ^ρ_S Pr^ρ(S | θ) π_0(θ) + λ^ρ_F Pr^ρ(F | θ) π_0(θ) = λ^ρ_S π_0(θ) + (λ^ρ_F - λ^ρ_S) Pr^ρ(F | θ) π_0(θ)

where the equality follows from the fact that Pr^ρ(S | θ) + Pr^ρ(F | θ) = 1 for all ρ. Hence, lower types are overweighted relative to higher types. Since Pr^ρ(F | θ) is decreasing in θ for all ρ, it follows that for any θ' < 1

∫_0^{θ'} E[π^ρ_1(θ) | π_0] dθ > ∫_0^{θ'} E[π_1(θ) | π_0] dθ.    (23)

Furthermore, since λ^ρ_F - λ^ρ_S is increasing in ρ, it follows that

∫_0^{θ'} E[π^ρ_1(θ) | π_0] dθ > ∫_0^{θ'} E[π^{ρ'}_1(θ) | π_0] dθ    (24)

whenever ρ > ρ'.

Proof of Corollary 1. If, for a given s0, Pr(S | θ, s1) > Pr(S | θ, s1') for all θ, then for all ρ,

λ^ρ_S{s1} = Pr(S)/Pr^ρ(S, s1) < Pr(S)/Pr^ρ(S, s1') = λ^ρ_S{s1'}.

Since Pr^ρ(S | θ) is increasing in θ for both s1 and s1', the corollary follows from the above proof of Proposition 3.

Proof of Proposition 4. To show that π^ρ_1(S) FOSD π_1(S), we have to show that for all θ' < 1,

∫_0^{θ'} π_1(θ | S) dθ ≥ ∫_0^{θ'} π^ρ_1(θ | S) dθ.

One can rewrite this inequality in the following way:

Pr^ρ(S)/Pr(S) ≥ [∫_0^{θ'} Pr^ρ(S | θ) π_0(θ) dθ] / [∫_0^{θ'} Pr(S | θ) π_0(θ) dθ].    (25)

If λ^ρ(θ) = Pr(S | θ)/Pr^ρ(S | θ) is decreasing in θ, with λ^ρ(0) ≥ λ^ρ(θ) ≥ λ^ρ(1), then this inequality holds, since ∫_0^1 π^ρ_1(θ | S) dθ = ∫_0^1 π_1(θ | S) dθ = 1. If λ^ρ(θ) is increasing in θ, then the reverse inequality holds, and π_1(S) FOSD π^ρ_1(S).

Proof of Proposition 5. Note first that in the Bayesian case the RHS of Eq. (9) is zero and does not depend on s1. In the biased case, w^ρ_1 = E[θ | π^ρ_1] depends on s1 and is decreasing in ρ. Whether s̃0 is produced more often or less often than in the Bayesian case depends on whether the following expression is positive or negative:

E[w^ρ_1 - (w^ρ_1 | s̃0)]    (26)

It follows from the proof of Proposition 3 that if λ^ρ_S > λ̃^ρ_S, then E[w^ρ_1] > E[w^ρ_1 | s̃0] for all ρ > 0. This is true because underestimation is decreasing in Pr(S)/Pr^ρ(S) = λ^ρ_S. Similarly, if λ^ρ_S < λ̃^ρ_S, then E[w^ρ_1] < E[w^ρ_1 | s̃0] for all ρ > 0. It follows that if λ^ρ_S > λ̃^ρ_S, then a(m, ρ) is decreasing in m, and if λ^ρ_S < λ̃^ρ_S, then a(m, ρ) is increasing in m.

Proof of Lemma 1. First, let's derive the optimal contract as given by Eq. (11). The principal's maximization problem yields the following Lagrangian:

L(w_S, a, μ) = (p(a)q + bk + (1 - b)z)(1 - w_S) + μ(p'(a) q w_S - 1)    (27)

The FOC with respect to a is given by p'q(1 - w_S) + μ p'' q w_S = 0, and with respect to w_S it is -(p(a)q + bk + (1 - b)z) + μ p'(a)q = 0. Solving for μ and substituting w_S = 1/(p'(a)q), the equilibrium effort level is given by

q p' = 1 - p''(p + (bk + (1 - b)z)/q) / (p')² = 1 - p''(p + b/(h - b) + z/q) / (p')²    (28)

Let the solution of this equation be denoted by a*_n(q). Note that the second-order conditions are satisfied as long as p'''(a) ≤ 0.

An increase in k or h increases q and hence increases the LHS of Eq. (28), and it decreases the RHS of Eq. (28). Since p is increasing and concave and p''' ≤ 0, it follows that this leads to a higher equilibrium effort level. To see the effects of an increase in k or h on the principal's welfare, note that for a given w_S, (p(a)q + bk + (1 - b)z)(1 - w_S) is increasing in a, since w_S < 1. Furthermore, the optimal w_S after such an increase cannot be larger than the original w_S, because h - b < 1 < p' and k - z < 1 < p'.

Proof of Lemma 2. Let's first derive the optimal contract under monitoring, as given by Eq. (13). The principal's maximization problem yields the following Lagrangian:

L(w_S, a, μ) = (p(a)q + bk + (1 - b)z) - (p(a)(1 - b) + b) w_S + μ(p'(a)(1 - b) w_S - 1)

The first-order condition with respect to a is given by p'q - p'(1 - b)w_S + μ p''(1 - b)w_S = 0, and the first-order condition with respect to w_S is given by -(p(1 - b) + b) + μ p'(1 - b) = 0. Solving for μ and substituting w_S = 1/(p'(1 - b)), we get that the equilibrium effort level a*_m is determined by

q p' = 1 - p''(p + b/(1 - b)) / (p')²    (29)

To see the inequality in the lemma, note first that (bk + (1 - b)z)/((h - b)(k - z)) > b/(1 - b) if and only if b(k - z) + z(1 - b) > bh(k - z), which is always true if h < 1.

Compare now Eq. (29) with Eq. (28). The LHSs of these two equations are the same, and the RHS of Eq. (29) is smaller than the RHS of Eq. (28). Given the assumption that p''' ≤ 0, it follows that a is greater under monitoring.

To show the increase in the principal's welfare, note that

EV_n = p(a*_n)q + bk + (1 - b)z - (p(a*_n) + b/(h - b) + z/q)/p'(a*_n)

and

EV_m = p(a*_m)q + bk + (1 - b)z - (p(a*_m) + b/(1 - b))/p'(a*_m).

Since 1 - 1/p'(a*_n) and 1 - 1/p'(a*_m) are both positive, because p'(a*_n), p'(a*_m) > 1, and because b/(h - b) + z/q > b/(1 - b) if h < 1, it follows that EV_m > EV_n.

Proof of Proposition 6. Let's fix a wage w_S. The agent's effort choice a^1_m(q, w) is given by the solution of the maximization problem:

a^1_m(q, w) = argmax_a p(a)h(1 - b) w_S + b w_S - a    (30)

The FOC is given by p'h(1 - b)w_S = 1. In contrast, a_m(q, w) is defined by the FOC p'(1 - b)w_S = 1. Hence, for any given w_S, a^1_m(q, w) < a_m(q, w) as long as h < 1. Also, a^1_m(q, w) is increasing in h.

Proof of Proposition 7. To prove this proposition, consider the principal's problem when she knows that the agent's effort choice is given by a^1_m(q, w). Here the principal's Lagrangian is given by

L(w_S, a, μ) = p(a)q + (bk + (1 - b)z) - p(a)h(1 - b) w_S - b w_S + μ(p'(a)h(1 - b) w_S - 1)

The first-order condition with respect to a is given by p'q - p'h(1 - b)w_S + μ p''h(1 - b)w_S = 0, and the first-order condition with respect to w_S is given by -p h(1 - b) - b + μ p'h(1 - b) = 0. Solving for μ and substituting w_S = 1/(p'h(1 - b)), we get that a^{1*}_m(q) is given by:

q p' = 1 - p''(p + b/(h(1 - b))) / (p')²    (31)

Comparing Eq. (31) with Eq. (29), it follows that a^{1*}_m < a*_m as long as h < 1, because the RHS of Eq. (31) is always greater than the RHS of Eq. (29). Furthermore, since an increase in h raises the LHS of Eq. (31) (through q) and lowers its RHS, a^{1*}_m is increasing in h.

Proof of Proposition 8. Since the noise terms ε_e and ε_r are independent, the joint distribution of ω, s_e and s_r is multivariate normal with mean vector (0, 0, 0) and covariance matrix C. Given the assumptions on C, it follows that E[ω | s_r] = σ_0²/(σ_0² + σ_r²) s_r, and E[ω | s_e, s_r] is given by

E[ω | s_e, s_r] = (σ_0², σ_0²) [ σ_0² + σ_e²   σ_0² ; σ_0²   σ_0² + σ_r² ]^{-1} (s_e, s_r)'    (32)

Straightforward calculation shows that

E[ω | s_e, y_r] = y_r σ̂_e² / (σ̂_e² + σ̂_r²) + ŝ_e σ̂_r² / (σ̂_r² + σ̂_e²)

where ŝ_e = σ_0²/(σ_0² + σ_e²) s_e, σ̂_e² = σ_0² σ_e²/(σ_0² + σ_e²), and σ̂_r² = σ_0² σ_r²/(σ_0² + σ_r²).

Consider now the biased case where ρ = 1. Here the advisee believes that y_r = E[ω | s_e, s_r] and hence takes the action y^1_e = y_r. For ρ < 1, the advisee believes that with probability ρ, y_r = E[ω | s_e, s_r], and with probability 1 - ρ, y_r = E[ω | s_r]. Hence it is always true that y^ρ_e ∈ [min{y^0_e, y_r}, max{y^0_e, y_r}]. Furthermore, as the probability ρ increases, |y^ρ_e - y_r| decreases.

Proof of Corollary 2. Note first that -E(y^ρ_e - ω)² is decreasing in ρ, by virtue of Proposition 8, since the estimate of ω has the lowest variance given s_e and y_r if y_e = E[ω | s_e, y_r]. Also, for a fixed ρ, the loss from credulity is increasing in σ_r and decreasing in σ_e. Hence, if we fix σ_e < M < ∞, there always exists a sufficiently large σ_r such that -E(ŝ_e - ω)² > -E(y^ρ_e - ω)². Similarly, for a fixed σ_r > 0, there always exists a sufficiently small σ_e such that -E(ŝ_e - ω)² > -E(y^ρ_e - ω)². It follows that k(ρ, σ_e, σ_r) is decreasing in σ_e, increasing in σ_r, and increasing in ρ.

Proof of Proposition 9. Simple calculations show that s2 dominates s1 iff 2h - 1 > ρ. In this case the advisor sends s2 iff c ≤ (1 - ρ²)(1 - ρ)(h - 0.5) = k_2(ρ, h). It follows that k_2(ρ, h) is increasing in h and decreasing in ρ. If 2h - 1 < ρ, the advisor sends s1 iff c ≤ ρ³(h - 0.5) + ρ(0.5 - ρh) = k_1(ρ, h).

References

[1] Alchian, Armen, and Harold Demsetz. 1972. ”Production, Information Costs and Economic Orga-

nization.” American Economic Review, 62(5): 777-95.

[2] Anderson, John, Marianne Jennings, Jordan Lowe, and Philip Reckers. 1997. ”The Mitigation of

Hindsight Bias in Judges’ Evaluation of Auditor Decisions.” Auditing: A Journal of Practice and

Theory, 16(2): 20–39.

[3] Banerjee, Abhijit. 1992. ”A Simple Model of Herd Behavior.” Quarterly Journal of Economics,

107(3): 797-817.

[4] Berlin, Leonard. 2002. ”Malpractice Issues in Radiology. Hindsight Bias.” American Journal of

Roentgenology, 175(3): 597-601.

[5] Biais, Bruno, and Martin Weber. 2007. "Hindsight Bias and Investment Performance." Mimeo, IDEI Toulouse.

[6] Bukszar, Ed, and Terry Connolly. 1988. "Hindsight Bias and Strategic Choice: Some Problems in Learning From Experience." Academy of Management Journal, 31(3): 628-641.

[7] Camerer, Colin, George Loewenstein, and Martin Weber. 1989. ”The Curse of Knowledge in Eco-

nomic Settings: An Experimental Analysis.” Journal of Political Economy, 97(5): 1234-1254.

[8] Camerer, Colin, and Ulrike Malmendier. 2007. ”Behavioral Economics of Organizations.” in: P.

Diamond and H. Vartiainen (eds.), Behavioral Economics and Its Applications. Princeton: Prince-

ton University Press.

[9] Caplan, Robert, Karen Posner, and Frederick Cheney. 1991. "Effect of Outcome on Physicians' Judgments of Appropriateness of Care." Journal of the American Medical Association, 265(15): 1957-1960.

[10] Conlin, Mike, Ted O’Donoghue, and Timothy Vogelsang. 2007. ”Projection Bias in Catalog Or-

ders.” American Economic Review, 97(4): 1217-1249.


[11] DeMarzo, Peter, Dimitri Vayanos, and Jeffrey Zwiebel. 2003. "Persuasion Bias, Social Influence, and Uni-dimensional Opinions." Quarterly Journal of Economics, 118(3): 909-968.

[12] Dewatripont, Mathias, and Patrick Bolton. 2005. Contract Theory. Cambridge: The MIT Press.

[13] Fischhoff, Baruch. 1975. "Hindsight ≠ Foresight: The Effect of Outcome Knowledge on Judgment Under Uncertainty." Journal of Experimental Psychology: Human Perception and Performance, 1(3): 288-299.

[14] Gilovich, Thomas, Kenneth Savitsky, and Victoria Medvec. 1998. "The Illusion of Transparency: Biased Assessments of Others' Ability to Read Our Emotional States." Journal of Personality and Social Psychology, 75(2): 743-753.

[15] Kessler, Daniel, and Mark McClellan. 2002. ”How Liability Law Affects Medical Productivity.”

Journal of Health Economics, 21(6): 931-955.

[16] Kruger, Justin, Nicholas Epley, Jason Parker, and Zhi-Wen Ng. 2005. "Egocentrism over E-mail: Can People Communicate as Well as They Think?" Journal of Personality and Social Psychology, 89(6): 925-936.

[17] Harley, Erin, Keri Carlsen, and Geoffrey Loftus. 2004. ”The “Saw-It-All-Along” Effect: Demon-

strations of Visual Hindsight Bias.” Journal of Experimental Psychology: Learning, Memory, and

Cognition, 30(5): 960-968.

[18] Harley, Erin. 2007. ”Hindsight Bias in Legal Decision Making.” Social Cognition, 25(1): 48-63.

[19] Hastie, Reid, David Schkade, and John Payne. 1999. "Juror Judgments in Civil Cases: Hindsight Effects on Judgments of Liability for Punitive Damages." Law and Human Behavior, 23(5): 597-614.

[20] Heath, Chip, and Dan Heath. 2007. Made to Stick: Why Some Ideas Survive and Others Die.

Random House.

[21] Hermalin, Benjamin. 1993. ”Managerial Preferences Concerning Risky Projects.” Journal of Law,

Economics, & Organization, 9(1): 127-35.


[22] Holmstrom, Bengt. 1979. ”Moral Hazard and Observability.” Bell Journal of Economics, 10(1):

74-91.

[23] Holmstrom, Bengt. 1999. ”Managerial Incentive Problems - A Dynamic Perspective.” Review of

Economic Studies, 66(1): 169-182.

[24] Innes, Robert. 1990. ”Limited Liability and Incentive Contracting with Ex-ante Action Choices.”

Journal of Economic Theory, 52(1): 45-67.

[25] Jackson, Rene, and Alberto Righi. 2006. Death of Mammography: How Our Best Defense Against

Cancer is Being Driven to Extinction. Caveat Press.

[26] Lazear, Edward. 2000. ”Performance Pay and Productivity.” American Economic Review, 90(5):

1346-61.

[27] Loewenstein, George, Ted O’Donoghue, and Matthew Rabin. 2003. ”Projection Bias in Predicting

Future Utility.” Quarterly Journal of Economics, 118(4): 1209-1248.

[28] Loewenstein, George, Don Moore, and Roberto Weber. 2006. ”Misperceiving the Value of Infor-

mation in Predicting the Performance of Others.” Experimental Economics, 9(3): 281-295.

[29] Mullainathan, Sendhil. 2002. ”A Memory-Based Model of Bounded Rationality.” Quarterly Jour-

nal of Economics, 117(3): 735-774.

[30] Newton, Elizabeth. 1990. ”Overconfidence in the Communication of Intent: Heard and Unheard

Melodies.” Unpublished Doctoral Dissertation, Stanford University, Stanford, CA.

[31] Rabin, Matthew. 2002. "Inference by Believers in the Law of Small Numbers." Quarterly Journal of Economics, 117(3): 775-816.

[32] Rachlinski, Jeffrey. 1998. ”A Positive Psychological Theory of Judging in Hindsight.” The Univer-

sity of Chicago Law Review, 65(2): 571-625.

[33] Shapiro, Carl, and Joseph Stiglitz. 1984. ”Equilibrium Unemployment as a Worker Discipline

Device.” American Economic Review, 74(3): 433-444.

[34] Shavell, Steven. 1980. ”Strict Liability Versus Negligence.” Journal of Legal Studies, 9(1): 1-25.


[35] Spiegler, Roni. 2006. ”The Market for Quacks.” Review of Economic Studies, 73(4): 1113-1131.

[36] Studdert, David, Michelle Mello, William Sage, Catherine DesRoches, Jordon Peugh, Kinga Zapert, and Troyen Brennan. 2005. "Defensive Medicine Among High-Risk Specialist Physicians in a Volatile Malpractice Environment." Journal of the American Medical Association, 293(21): 2609-2617.

[37] Van Boven, Leaf, Thomas Gilovich, and Victoria Medvec. 2003. "The Illusion of Transparency in Negotiations." Negotiation Journal, 19(2): 117-131.
