

arXiv:2109.11536v1 [econ.TH] 23 Sep 2021

PERSUASION WITH AMBIGUOUS RECEIVER PREFERENCES

Eitan Sapiro-Gheiler

MIT

September 24, 2021

Abstract. I describe a Bayesian persuasion problem where Receiver has a private

type representing a cutoff for choosing Sender’s preferred action, and Sender has

maxmin preferences over all Receiver type distributions with a known mean. Sender’s

utility from any distribution of posterior means is a function of its concavification; this

result leads Sender to linearize the prior distribution by inducing a truncated uniform

distribution of posterior means. When the prior belief about the state of the world

is binary, Sender’s unique optimal distribution is an upper-truncated uniform with an

atom at 0. When the prior belief about the state of the world is continuous and uni-

modal, one optimal distribution for Sender is a double-truncated uniform with an atom

at the end of the lower truncation. In both cases, the shape and support of the optimal

distribution differ qualitatively from the corresponding solution when Sender holds a

prior belief over Receiver types.

JEL Classification: D81, D82, D83

Keywords: Bayesian persuasion, private information, maxmin utility

Email: [email protected]

I thank Drew Fudenberg, Sylvia Klosin, Stephen Morris, Victor Orestes, Frank Schillbach, Dmitry

Taubinsky, Rafael Veil, Jaume Vives, and especially Alexander Wolitzky for helpful discussions and

comments. This material is based upon work supported by the National Science Foundation Graduate

Research Fellowship under Grant No. 1745302.

1. Introduction

Consider a politician who is proposing a new welfare program. She must decide how to

disclose information about the expected cost of this proposal, but does not know how

much spending voters will support. All voters have the same ex-ante beliefs about the

program’s potential cost, but some will only approve if, after hearing the politician’s

message, they expect the cost to be low, while others are willing to support even a large

government outlay. Rather than imposing a prior distribution over people’s preferences,

the politician wishes to be robust to the worst-case distribution she may face. In this

setting, what disclosure rule maximizes the share of voters who, after hearing the politi-

cian’s message, approve of the welfare program? How does this optimal rule differ from

the case where the politician faces a known distribution of citizen preferences?

I address and generalize those questions through a model of Bayesian persuasion (Kamenica and Gentzkow,

2011), where a Sender commits to a message distribution in each state of the world and

a Receiver uses Bayesian updating to form a posterior belief about the state based on

the message structure. To represent different preferences among voters, I use private

Receiver types denoting the cutoff above which Receiver chooses Sender’s preferred ac-

tion. Sender knows the mean and range of Receiver types, and has maxmin preferences

(Gilboa and Schmeidler, 1989) over all Receiver type distributions satisfying those con-

straints. Regardless of the true state of the world, Sender maximizes the probability of

inducing the favorable action. This model captures situations where all Receiver types

process information in the same way, but may have different preferences over outcomes.

In addition to the political spending example described above, a model of this style also

applies to a variety of other situations, such as disclosing information about product

quality (if potential customers respond to product descriptions in the same way, but

may be more or less picky about quality) or screening job candidates (if all firms have

a common prior about candidate quality and see the same resume, but have different

thresholds for hiring).

In the standard Bayesian persuasion setting with a binary state of the world and a

log-concave prior distribution over Receiver cutoffs,1 Sender communicates either “good

news,” which makes Receiver more confident, but not certain, that the state is good; or

“worst news,” which makes Receiver certain that the state is bad. Doing so allows Sender

to generate credible good news as often as possible. However, when I replace that prior

with my chosen form of ambiguity preferences, I find that Sender may also communicate

1The case of a log-concave prior includes common specifications such as a normally-distributed prior

or a uniform prior over a (possibly degenerate) sub-interval of [0, 1].


“bad news,” which (analogously to good news) makes Receiver less confident that the

state is good, but not certain that it is bad.2 Intuitively, when facing ambiguity Sender

is less concerned about targeting a particular degree of good news (i.e., good news that

raises Receiver’s belief to a certain cutoff) and more interested in covering a broad range

of possible cutoffs. In fact, I show that all degrees of bad news, and all degrees of good

news below a known upper bound, are equally likely to arise.

In a richer setting where the state of the world is continuous and Sender’s prior belief

about it is unimodal, then this solution is no longer feasible, since it places strictly

positive probability on worst news while the prior does not. However, I can restore

feasibility and preserve optimality by adding a lower-truncation region, so that Sender

conveys an interval of intermediate beliefs with equal probability, but never induces

Receiver to believe that the state of the world is especially good or bad.3 This double

truncation again contrasts sharply with the case of a log-concave prior distribution over

Receiver types, where Kolotilin et al. (2017) show that Sender’s optimal strategy is to

fully reveal low states and pool high ones, leading to an atom and upper truncation

but no distortion of the prior distribution for low states of the world. As in the binary-

state case, the potential for an adversarial Receiver type distribution encourages Sender

to linearize the prior distribution and make different posterior beliefs about the mean

state of the world equally likely; however, the additional constraints imposed by the

continuous state space force Sender to generate a narrower interval of posterior means

in order to remain credible while doing so.

To derive my results, I solve the model by backwards induction, reframing Sender’s

maxmin preferences as a zero-sum game in which Sender chooses a distribution of pos-

terior means G for Receiver and then Nature chooses a Receiver type distribution T to

minimize Sender’s payoff. In this formulation, the known mean Receiver type r∗ makes

Nature’s problem analogous to Bayesian persuasion. Specifically, Nature behaves as if

facing a “prior distribution” of Receiver types with support {0, 1} and mean r∗, and

chooses a Bayes-plausible “posterior distribution” of Receiver types T to maximize the

opposite of Sender’s payoff. Thus the result of Kamenica and Gentzkow (2011) means

that for each G, Nature's utility is given by the concavification of G and Sender's utility is

correspondingly given by the convexification of 1−G. This result suggests that a concave

2In a binary-state, binary-action model, for any prior distribution over Receiver types, there is a

binary-support posterior distribution that achieves Sender’s best payoff. Thus Sender may generate

good news and bad news, or good news and worst news, but never all three.
3To preserve the appropriate average posterior mean belief, lower truncation changes the exact posterior mean beliefs that arise; however, those beliefs still form a closed interval.


cdf of posterior means is best for Sender, and indeed in the binary-state case Sender’s

unique optimal posterior cdf is piecewise linear and concave. In the continuous-state

case, the need for the distribution of posterior means to be a mean-preserving contrac-

tion of the prior F prevents Sender from choosing that same solution, but I show that

adding a lower truncation region to that same piecewise linear shape restores feasibility

while preserving optimality. In both cases, distributions in the optimal class but with

different parameters can be used to approximate other candidate optimal distributions

near the mean Receiver type r∗; this result allows me to prove uniqueness of the opti-

mal distribution in the binary-state case, and provide some conditions on other possible

optimal distributions in the continuous-state case.

2. Related Literature

This work builds directly on the Bayesian persuasion problem originally described in

Kamenica and Gentzkow (2011), and adopts a similar approach to existing work in ro-

bust mechanism design. In addition, my model resembles a particular type of Colonel

Blotto game. I describe each of those areas in turn.

In the baseline Bayesian persuasion model of Kamenica and Gentzkow (2011), Receiver

has no private information. Subsequent literature in this area is surveyed in detail

by Kamenica (2019) and Bergemann and Morris (2019); in this section I focus on the

two works most directly related to the model I propose, Kolotilin et al. (2017) and

Hu and Weng (2020).4 The former has an interval state space, Receiver types that enter

payoffs linearly, and a binary action, as in my model; however, it endows Sender with

a prior distribution over Receiver types. If that prior distribution is log-concave, then

the optimal distribution for Sender can be generated by upper censorship; the resulting

distribution of posterior means is essentially a truncated version of the prior where states

in some interval [α, 1] are replaced with an atom at β ∈ (α, 1). In the continuous-state

case of my model, Sender seeks to linearize the posterior distribution to avoid facing

a tailored Receiver type distribution in response, and must use a double truncation to

make sure this strategy respects Bayes-plausibility.

The model of Hu and Weng (2020) is most similar to the one considered here: it is

a binary-action model where Sender has maxmin preferences over Receiver types and

4Other works use maxmin preferences in Bayesian persuasion settings, but are much more dis-

tinct. In Kosterina (2020), possible Receiver type distributions are distortions of a “reference dis-

tribution;” in Dworczak and Pavan (2020), there is full ambiguity about Receiver’s posterior belief; and

in Laclau and Renou (2017) and Beauchene et al. (2019), Receiver has maxmin preferences.


maximizes the probability of inducing the favorable action. However, Receiver types

represent an ambiguous posterior about a binary state of the world rather than a payoff-

relevant characteristic which does not directly interact with beliefs about the state.

This model captures substantively different applications—e.g., voters with common ide-

ology who privately read outside news sources before listening to a politician’s speech,

rather than the equally-informed voters with different ideological positions in my model.

Working with belief-independent Receiver types also means that I am able to charac-

terize Receiver’s posterior distribution and thus provide a sharp testable prediction—all

posteriors in a known interior interval are equally likely. Methodologically, because my

formulation features a simpler interaction between Receiver’s type and Sender’s signal,

I am able to extend my approach and characterization of Sender’s optimal policy to a

continuous-state case.

This work also relates to a literature in robust mechanism design, and in particular

works in which maxmin preferences are paired with moment conditions.5 For example,

Wolitzky (2016) considers a bilateral trade model where each agent has a valuation in

[0, 1] and knows only the mean of the other agent’s type distribution. In that model,

agents’ worst-case beliefs have binary support. Here, the worst-case Receiver type dis-

tribution is binary-support as well, but Sender’s desire to induce indifference between

many such distributions means the optimal posterior distribution has interval support.

Two other works consider distinct variations of the moment-restricted mechanism de-

sign problem. In Carrasco et al. (2019), a principal with maxmin preferences offers a

surplus-maximizing contract to a privately informed agent. Similar to my model, the

agent’s type distribution has known mean and support [0, 1]. As in Hu and Weng (2020)

and my work, the optimal mechanism for the principal induces a payoff that is piecewise

linear in the agent’s type. The other work, Carrasco et al. (2018), considers a setting

where a seller with maxmin preferences faces an unknown distribution of buyer valua-

tions in R+. The seller knows the first N − 1 moments of the buyer type distribution

and an upper bound on the Nth moment. Similar to the convexification argument I use,

transfers for the optimal mechanism are given by the non-negative monotonic hull of a

degree-N polynomial.

5Maxmin preferences more broadly have been used to model robustness in settings such as monopoly

pricing (Bergemann and Schlag, 2011), auctions (Bose et al., 2006), and screening contracts (Auster,

2018). Other work has focused on general results such as payoff equivalence (Bodoh-Creed, 2012) or

implementability (de Castro et al. 2017 and Ollar and Penta 2017).


Another antecedent of my model is the Colonel Blotto game, first proposed by Borel

(1921), where opposing players A and B simultaneously choose how to allocate their

troops across finitely many battlefields. Bell and Cover (1980) introduces a “continuous”

version called the General Lotto game, where A and B choose distributions FA and

FB of troops over a unit interval of battlefields. Mean constraints E[FA] = a and

E[FB] = b represent the amount of troops each player commands. In both versions,

each player maximizes the probability that, at a uniformly drawn battlefield, they have

allocated more troops than their opponent; in my setting Sender does not care about a

uniformly drawn Receiver type but rather a draw from the worst-case distribution for

each disclosure policy. Despite this difference in objective functions, the solution to the

asymmetric General Lotto game with a ≥ b > 0, due to Sahuguet and Persico (2006), is

almost identical to the binary-state case of my model. In the General Lotto game, player

A uniquely selects FA = U [0, 2a] while player B uniquely selects a uniform distribution

over (0, 2a] with an atom at 0 to meet their stricter mean constraint. In the binary-state

version of my model, if the prior belief that the state is high, π, exceeds the mean Receiver

type, r∗, then Sender uniquely chooses the posterior distribution U [0, 2π], analogously to

player A. If instead π < r∗, then Sender uniquely chooses an upper-truncated uniform

distribution with an atom at 0, similar to player B, but its upper bound need not equal

2π. The intuition is that Sender makes Nature indifferent between any distribution

of Receiver types whose support is a subset of Sender’s chosen posterior distribution.

Higher Receiver types are no more unfavorable, but are more costly given the mean

constraint, so Nature does not generate them. Thus Sender endogenously faces the

same uniform draw as in the General Lotto game, but can choose the upper bound on

that draw to trade off their own mean constraint with Nature’s. However, in the General

Lotto game battlefields in [2a, 1] may still be drawn regardless of either player’s choice,

so the upper bound on the support is driven only by the less-constrained player.
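The indifference claim above can be checked numerically. The sketch below is illustrative (the parameter values are not from the paper): it verifies that when Sender's posterior distribution is U[0, 2π] with π ≥ r∗, every binary Receiver type distribution {0, t} with mean r∗ yields Sender the same payoff 1 − r∗/(2π), so Nature gains nothing from any particular choice of t.

```python
# Numeric check of Nature's indifference in the binary-state case with
# pi >= r_star (values below are illustrative): when Sender's posterior cdf
# is U[0, 2*pi], any binary type distribution {0, t} with mean r_star gives
# Sender the same payoff.
pi, r_star = 0.4, 0.3

def G(q):  # cdf of U[0, 2*pi]
    return min(q / (2 * pi), 1.0)

def sender_payoff(t):
    # weight r_star/t on type t and the rest on type 0 => mean r_star
    w = r_star / t
    return (1 - w) * (1 - G(0.0)) + w * (1 - G(t))

payoffs = [sender_payoff(t) for t in (0.35, 0.5, 0.65, 0.8)]
print(payoffs)  # each equals 1 - r_star/(2*pi) = 0.625
```

Because 1 − G is linear on Sender's support, Nature's payoff depends on T only through its mean, which is pinned down at r∗.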

3. Model

There is one Sender (she) and one Receiver (he).6 Sender knows the state of the world

ω ∈ [0, 1], and both players share a common prior belief F ∈ ∆([0, 1]) about the state

with E[ω] = π ∈ (0, 1). Only Receiver knows his private type r ∈ [0, 1], but the

mean Receiver type r∗ ∈ (0, 1) is common knowledge. Sender holds maxmin preferences

6Alternatively, the presence of one Receiver with multiple unknown types may be interpreted as a

population of Receivers, each with a private type, with which Sender communicates publicly; this is the

interpretation I use for the political spending example.


over the set of potential Receiver type distributions T in the set

T = { cdf T over [0, 1] : ∫ r dT(r) = r∗ }.

After Sender communicates, Receiver chooses a binary action a ∈ {0, 1}. Specifically,

Receiver chooses the high action if and only if his posterior expectation of the state,

q = E[ω | Sender’s behavior], strictly exceeds r:

uR(a, ω, r) = a (ω − r).

The explicit functional form used here is for ease of exposition only. Whenever Re-

ceiver’s utility is a linear function of the state, his action depends only on the mean

of his posterior belief about the state, and my results still hold (under an appropriate

renormalization of the interval of Receiver types). Substantively, the decision to break

ties against Sender is equivalent to assuming that there is some Receiver type who is

not persuaded even by knowing with certainty that ω = 1, and is discussed in greater

detail in Appendix A.

Sender’s goal is to maximize the probability of inducing the high action a = 1 indepen-

dent of the true state ω and true Receiver type r:

uS(a, ω, r) = a.

I restrict Sender to the standard Bayesian persuasion tool of committing ex-ante to a

(Blackwell) experiment, i.e., a state-dependent signal distribution, and in particular do

not allow her to elicit Receiver’s type in order to capture the public-communication

interpretation of this model. Since Receiver’s choice of action depends only on the mean

q of his posterior belief distribution, I can, as in Kolotilin et al. (2017) and other similar

works, directly consider Sender choosing a distribution of posterior means G such that

G is a mean-preserving contraction of the prior distribution F . The set of feasible

distributions of posterior means is therefore

G = { cdf G over [0, 1] : ∫ q dG(q) = π and ∫_0^x G(q) dq ≤ ∫_0^x F(q) dq ∀ x ∈ [0, 1] }.

I follow the literature in referring to these two constraints jointly as Bayes-plausibility,

and reserve the expression “mean restriction” to refer to the condition on the set T of

Receiver type distributions.
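The two conditions defining G can be checked on a grid. The sketch below is illustrative rather than part of the formal analysis: it takes F = U[0, 1] (so π = 1/2) and confirms that pooling all mass at the mean is Bayes-plausible while a mean-preserving spread of F is not.

```python
# Grid check of Bayes-plausibility (mean-preserving contraction): G must have
# the same mean as F, and the integrated cdf of G must never exceed that of F.
# F and G are illustrative: F = U[0, 1], G pools all mass at the mean 0.5.
N = 10000
qs = [i / N for i in range(N + 1)]
F = [q for q in qs]                          # cdf of U[0, 1]
G = [1.0 if q >= 0.5 else 0.0 for q in qs]   # cdf of a point mass at 0.5

def integrated_cdf(H):
    # running trapezoid integral of the cdf H over the grid
    out, total = [0.0], 0.0
    for i in range(N):
        total += (H[i] + H[i + 1]) / 2 / N
        out.append(total)
    return out

def bayes_plausible(G, F, tol=1e-3):
    # coarse tolerance absorbs the grid-discretization error
    cG, cF = integrated_cdf(G), integrated_cdf(F)
    # equal total integrals <=> equal means, since E[q] = 1 - integral of the cdf
    return abs(cG[-1] - cF[-1]) < tol and all(g <= f + tol for g, f in zip(cG, cF))

print(bayes_plausible(G, F))   # True: pooling everything at the mean is feasible
G_spread = [0.5 if q < 1 else 1.0 for q in qs]  # half mass at 0, half at 1
print(bayes_plausible(G_spread, F))  # False: a mean-preserving spread of F
```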

Using this formulation and Receiver’s decision rule, I rewrite Sender’s utility as

uS(q, r) = 1(q > r),


so that Sender’s full optimization problem is

max_{G∈G} min_{T∈T} ∫∫ 1(q > r) dG(q) dT(r).    (1)

The main difference from standard Bayesian persuasion with private information is the

presence of an endogenously-determined Receiver type distribution T. This specification

leads me to characterize the solution by reframing Sender’s maxmin preferences as a zero-

sum game, in which Sender designs a posterior distribution and then Nature adversarially

designs a type distribution; that game can then be solved by backwards induction.7

When supp(F ) = {0, 1}, so that the state ω is binary, the second part of the Bayes-

plausibility constraint is redundant and any distribution G satisfying E[G] = π is in

G. This simplification permits a clearer characterization of the optimal distribution of

posterior means, which I present in Section 4. As in the case of Bayesian persuasion with

a prior belief about Receiver’s type, the binary-state solution generically places an atom

at q = 0. Thus when F lacks an atom at 0, as in the case of unimodal F that I study in

Section 5, that solution will not be Bayes-plausible. However, I will show that a simple

lower truncation of the binary-state solution—setting the cdf to 0 in a neighborhood of

q = 0, then adjusting the rest of the distribution to leave its mean unchanged—is often

sufficient to restore Bayes-plausibility while preserving optimality.

4. The Binary-State Case

In this section, I fully characterize Sender’s optimal distribution of posterior means when

the prior distribution F has binary support, so that a distribution of posterior means

is the same as a posterior distribution (I use the latter expression for simplicity) and

the only restriction imposed on Sender is that E[G] = π for any posterior distribution

G. I focus on the case where supp(F ) = {0, 1}, but under appropriate assumptions, all

results extend to the case where supp(F ) = {α, β} for 0 ≤ α < β ≤ 1; I discuss that

generalization in Appendix A.

I approach the problem in three steps. First, I show, by analogy to a Bayesian persua-

sion problem over Receiver types, that Sender’s utility from a posterior distribution G is

given by the convexification of 1− G. This step allows me to immediately characterize

7The maxmin specification means that Sender moves first, but when F has binary support, the

maxmin and minmax formulations are essentially identical since Sender and Nature’s constraints are of

the same type. In Appendix C, I discuss the relationship between the maxmin and minmax solutions

and provide conditions for a saddle-point solution where Sender’s maxmin-optimal posterior distribution

is also minmax-optimal.


Sender’s multiple optimal posterior distributions when π > 1/2; I do so in Proposition 1.

Next, I describe upper-truncated uniform distributions, whose cdfs are piecewise linear

and concave, and use the convexification result to solve explicitly for Sender's unique

optimal upper-truncated uniform distribution as a function of the mean Receiver type.

Finally, when π ≤ 1/2 I use these distributions to approximate any candidate opti-

mal distribution, and in doing so show that Sender’s optimal upper-truncated uniform

distribution is in fact uniquely optimal overall.

4.1. Characterizing the Distribution of Receiver Types. To characterize Sender’s

utility as a function of the mean Receiver type r∗ without using a specific Receiver type

distribution, Lemma 1 draws an analogy between the objective function of Equation (1)

and Bayesian persuasion over Receiver types.

Lemma 1. For a given posterior distribution G, let Ḡ : [0, 1] → [0, 1] be the concavification of G, i.e., the infimum over the set of concave functions H : [0, 1] → [0, 1] satisfying

H(q) ≥ G(q) ∀ q ∈ [0, 1].

Then the following equality holds:

min_{T∈T} ∫∫ 1(q > r) dG(q) dT(r) = 1 − Ḡ(r∗).

The result can equivalently be expressed through the convexification of 1 − G, i.e., the

supremum among convex functions that lower-bound 1 − G; this framing is used in

Hu and Weng (2020), which proves a similar result for a finite state space. Both their

proof and mine invoke the result of Kamenica and Gentzkow (2011) to characterize the

solution of the Bayesian persuasion problem without private information.

Proof. Manipulating the bounds of integration to rewrite Sender's objective function from Equation (1) gives

∫_[0,1] ( ∫_[0,1] 1(q > r) dG(q) ) dT(r) = ∫_[0,1] ( ∫_(r,1] 1 dG(q) ) dT(r) = ∫_[0,1] (1 − G(r)) dT(r).

Then the minimization portion of the problem can be written as

max_{T∈∆([0,1])} ∫ G(r) dT(r) s.t. ∫ r dT(r) = r∗,


where I have dropped the constant, rewritten the min as a max, and explicitly included

the mean restriction to highlight the similarity to a Bayesian persuasion problem. In

this case, the Receiver type r fills the role of “posterior belief,” Nature’s utility from a

realized Receiver type is G(r), and the “prior” is the distribution with support {0, 1}

and mean r∗. This final point follows from the observation in Section 3 that when the

prior distribution has binary support, the Bayes-plausibility constraint is the same as

a mean restriction. Thus by Corollary 2 of Kamenica and Gentzkow (2011), Nature's utility is given by Ḡ(r∗), the concavification of G over the interval [0, 1] evaluated at the prior mean r∗. Flipping the sign again, Sender's utility is 1 − Ḡ(r∗), or equivalently the convexification of 1 − G over the interval [0, 1] evaluated at r∗. ∎
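The concavification in Lemma 1 can be computed on a grid as an upper concave envelope. The sketch below is illustrative (the cdf and parameter values are not from the paper): for a point-mass posterior at 0.5 and r∗ = 0.3, Nature's best response splits types between 0 and 0.5, giving Sender worst-case payoff 1 − Ḡ(0.3) = 0.4.

```python
# Grid sketch of Lemma 1: Nature's value is the concavification of G at
# r_star, so Sender's worst-case payoff is 1 - conc(G)(r_star). The cdf G
# here is an illustrative non-concave one: a point mass at 0.5.
N = 2000
qs = [i / N for i in range(N + 1)]
G = [1.0 if q >= 0.5 else 0.0 for q in qs]

def concavification(xs, ys):
    # upper concave envelope via an upper-convex-hull sweep (monotone chain)
    hull = []
    for p in zip(xs, ys):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the middle point if it lies on or below the chord
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    def value(x):  # piecewise-linear interpolation along the hull
        for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
            if x1 <= x <= x2:
                return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        return hull[-1][1]
    return value

r_star = 0.3
worst_case = 1 - concavification(qs, G)(r_star)
print(round(worst_case, 3))  # 0.4: Nature splits types between 0 and 0.5
```

Here the concavification is the chord from (0, 0) to (0.5, 1), so Nature puts mass 0.4 on type 0 and mass 0.6 on type 0.5; only the type-0 Receivers act.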

This lemma does not require that F have binary support, and will apply unchanged in

Section 5. However, in this case it is immediately useful in solving for Sender’s optimal

posterior distribution when π > 1/2:

Proposition 1. Let π > 1/2. Then any feasible posterior distribution G ∈ G is optimal

for Sender if and only if it satisfies

G(q) ≤ q ∀ q ∈ [0, 1],

and more than one distribution satisfying this condition exists.

The details of the proof are in Appendix B, but the idea is straightforward. Letting

U denote the uniform distribution U [0, 1], Sender’s utility is always upper-bounded by

1 − U , which is both the convexification of 1 − U and the largest convex function on

[0, 1] passing through the point (1, 0). When π < 1/2, U is not Bayes-plausible, and

Sender must induce relatively more low posteriors than she would under U . However,

when π > 1/2, Sender can achieve the upper bound with any posterior distribution that

first-order stochastically dominates U . One distribution satisfying that condition is

G(q) =
  0                        q ∈ [0, 2π − 1),
  (q + 1 − 2π)/(2 − 2π)    q ∈ [2π − 1, 1].

G is a lower-truncated uniform distribution with no mass on posteriors q ∈ [0, 2π − 1)

and equal mass on all posteriors q ∈ [2π − 1, 1]. In Proposition 2, I show that when

π ≤ 1/2, Sender’s optimal posterior distribution is unique, but in this case there are

many other Bayes-plausible distributions whose concavifications equal U . For example,

solving π = n/(n + 1) for n gives a cdf G(q) = q^n that is Bayes-plausible

and satisfies G(q) ≤ q. Thus Sender’s optimal posterior distribution when π > 1/2 is


not unique. However, her maxmin utility as a function of the mean Receiver type r∗ is

uniquely given by 1 − U(r∗) = 1 − r∗ regardless of which posterior distribution is chosen.
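The algebra behind the G(q) = q^n example is easy to confirm numerically; the sketch below uses an illustrative value of π > 1/2.

```python
# Quick check of the example in Proposition 1 (pi > 1/2): with n chosen so
# that pi = n/(n + 1), the cdf G(q) = q**n has mean pi and lies below the
# identity, i.e., G(q) <= q. The value of pi is illustrative.
pi = 0.75
n = pi / (1 - pi)        # here n = 3

# E[q] = integral of (1 - q**n) over [0, 1] = 1 - 1/(n + 1) = n/(n + 1) = pi
mean = n / (n + 1)
below_identity = all((i / 1000) ** n <= i / 1000 for i in range(1001))
print(mean, below_identity)  # 0.75 True
```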

4.2. Upper-Truncated Uniform Distributions. When π ≤ 1/2, a distribution G

for which Ḡ = U would violate Bayes-plausibility. It is intuitive for Sender to respond

to this limitation by choosing a posterior distribution G with a concave cdf, since by

Lemma 1 any non-convexities in 1 − G tighten Bayes-plausibility without raising her

utility. Motivated by that idea, in this section I introduce and describe upper-truncated

uniform distributions (henceforth UTUs), a class of posterior distributions which are

piecewise linear with concave cdf. A UTU places mass x ≥ 0 on posterior q = 0, equal

mass on all posteriors q ∈ (0, rh] for some rh ≤ 1, and no mass on posteriors q ∈ (rh, 1].

The cdf of a UTU is therefore composed of an upward-sloping line from (0, x) to (rh, 1)

and a horizontal line from (rh, 1) to (1, 1), and is thus concave. Given this structure, I

can use Bayes-plausibility to solve for the unique value of rh corresponding to a given

x, so that a UTU is fully characterized by x. Thus I denote a UTU by Gx, and write

rh(x) for the upper bound on its support.

To solve for rh(x), I use Bayes-plausibility to write

π = ∫ q dGx(q) = ∫ (1 − Gx(q)) dq = (1 − x) rh(x) / 2 ⟺ rh(x) = 2π/(1 − x),

where I integrate by parts to obtain the second integral, then use the shape of G to write

the area under 1−G(q) as a triangle with base rh(x) and height 1−x. Since I have already

characterized Sender’s optimal posterior distribution when π > 1/2 in Proposition 1, I

assume π ≤ 1/2; then this expression implies an upper bound of 1− 2π ≥ 0 on x, since

otherwise rh(x) would exceed 1, but allows x to take any value in [0, 1− 2π]. The UTU

G1−2π for which x meets the upper bound places equal mass on all posteriors q ∈ (0, 1];

it will play a key role in characterizing Sender’s optimal posterior distribution.
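This accounting can be confirmed numerically; the parameter values in the sketch below are illustrative.

```python
# Numeric check of the UTU mean condition: a UTU with atom x at q = 0 and
# uniform mass on (0, r_h] has mean pi exactly when r_h(x) = 2*pi/(1 - x).
# The parameter values are illustrative.
pi, x = 0.35, 0.2
r_h = 2 * pi / (1 - x)    # 0.875

def G_x(q):  # cdf: atom x at 0, then linear up to 1 at r_h
    return min(x + q * (1 - x) / r_h, 1.0)

# E[q] = integral of (1 - G_x(q)) over [0, 1]:
# the area of a triangle with base r_h and height 1 - x
dq = 1e-4
mean = sum((1 - G_x(i * dq)) * dq for i in range(10000))
print(round(mean, 3))  # 0.35
```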

Figure 1 shows an example of three UTUs, including G1−2π. The figure suggests that no

UTU lies below another for all q ∈ [0, 1], and therefore that Sender’s optimal UTU may

vary with the model parameters. Lemma 2 formalizes this intuition by characterizing

Sender’s optimal UTU as a function of the mean Receiver type r∗:

Lemma 2. Let π ≤ 1/2. Then if r∗ ≥ 1/2, Sender’s unique optimal UTU is G1−2π, and

if r∗ < 1/2, Sender's unique optimal UTU is Gx∗ where x∗ = {1 − π/r∗}+.


Figure 1. The cdfs of three upper-truncated uniform distributions when π = 0.35. Note the differing size of the atoms at q = 0; G0 has no atom.

Proof. By construction, any UTU Gx is concave. Thus 1 − Gx is convex, and is equal to its convexification 1 − Ḡx. By Lemma 1, the utility from a UTU Gx is

1 − Ḡx(r∗) = 1 − Gx(r∗) = { (1 − x) − r∗(1 − x)²/(2π) }+ .

The first-order condition in x for the expression in brackets is

−1 + r∗(1 − x)/π = 0 ⟺ xFOC = 1 − π/r∗.

The bracketed expression is increasing in x when x < xFOC and decreasing in x when x > xFOC. Since x ∈ [0, 1 − 2π], if r∗ < π the constrained optimal solution is x∗ = 0 and if r∗ > 1/2 the constrained optimal solution is x∗ = 1 − 2π; otherwise the optimum is the interior solution x∗ = xFOC = 1 − π/r∗. Moving forward, I use Gx∗ with x∗ = {1 − π/r∗}+ to denote the optimal UTU when r∗ < 1/2 and refer explicitly to G1−2π as the solution to this maximization problem when r∗ ≥ 1/2. ∎
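The constrained maximization in the proof can be confirmed on a grid; the sketch below uses illustrative parameters with π ≤ 1/2 and r∗ < 1/2, so the interior solution applies.

```python
# Grid check of Lemma 2: the UTU payoff (1 - x) - r_star*(1 - x)**2/(2*pi),
# maximized over x in [0, 1 - 2*pi], peaks at x* = max(0, 1 - pi/r_star) when
# r_star < 1/2 (and at 1 - 2*pi when r_star >= 1/2). Parameters illustrative.
pi, r_star = 0.3, 0.4

def utility(x):
    return max(0.0, (1 - x) - r_star * (1 - x) ** 2 / (2 * pi))

xs = [i * (1 - 2 * pi) / 10000 for i in range(10001)]
x_best = max(xs, key=utility)
x_star = max(0.0, 1 - pi / r_star)   # interior solution: 1 - 0.3/0.4 = 0.25
print(round(x_best, 3), round(x_star, 3))  # 0.25 0.25
```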

While Lemma 2 shows that a well-defined unique optimal UTU exists, it does not es-

tablish the optimality of UTUs among all posterior distributions with concave cdfs, or

indeed verify the intuition that Sender’s optimal posterior distribution has a concave

cdf; I do both in the next section.


4.3. Optimality of Upper-Truncated Uniform Distributions. The main result of

this section is that when π ≤ 1/2, Sender’s optimal posterior distribution is the optimal

UTU from Lemma 2.

Proposition 2. Let π ≤ 1/2. Then if r∗ ≥ 1/2, Sender's unique solution G∗ to the persuasion problem of Equation (1) is

G∗(q) = G1−2π(q) =
  1 − 2π          q = 0,
  1 − 2π + 2πq    q ∈ (0, 1].

If r∗ < 1/2, it is

G∗(q) = Gx∗(q) =
  x∗                          q = 0,
  x∗ + q (1 − x∗)²/(2π)       q ∈ (0, 2π/(1 − x∗)],
  1                           q ∈ (2π/(1 − x∗), 1],

where x∗ = {1 − π/r∗}+.

The full proof is in Appendix B; I provide an intuition here. Consider a posterior

distribution H that weakly improves on Sender’s utility from G∗; by Lemma 1, it must

be that 1 − H̄(r∗) ≥ 1 − G∗(r∗). Because 1 − H̄ is convex, it can be lower-bounded by the line L tangent to it at r∗, and more specifically by L+ since 1 − H̄ is weakly positive. Let H′ be a posterior distribution such that 1 − H′ = L+; the key step of the proof is to show that H′ violates Bayes-plausibility because its mean is too large. Then, because

∫ q dH(q) = ∫ (1 − H(q)) dq ≥ ∫ (1 − H̄(q)) dq ≥ ∫ L+(q) dq = ∫ (1 − H′(q)) dq = ∫ q dH′(q),

it must be that H also violates Bayes-plausibility and is thus an invalid choice for Sender.

To show the violation of Bayes-plausibility, I exploit properties of the convexification of 1 − H and of UTUs. Because G∗ is optimal among UTUs, it is the case that

    1 − H(r∗) ≥ 1 − G∗(r∗) ≥ 1 − G1−2π(r∗).

Therefore, the convexity of 1 − H ensures that 1 − H must lie above 1 − G1−2π on the interval [0, r∗). To obey Bayes-plausibility, 1 − H must eventually lie below 1 − G1−2π, so the slope of H at r∗ (and therefore of its tangent line L) must be less than that of 1 − G1−2π. The lesser slope of L+ compared to 1 − G1−2π ensures that L+(0) >

1 − G1−2π(0). Thus there exists a UTU, GxH, satisfying L+(0) = 1 − GxH(0). But since L+(r∗) = 1 − H(r∗) and 1 − H(r∗) represents an improvement on Sender’s utility from the best UTU, it must be that L+(r∗) > 1 − GxH(r∗), and thus the slope of L+ is greater than that of 1 − GxH.8 This relationship between 1 − Gx∗, L+, and 1 − GxH is shown in Figure 2. Since UTUs satisfy Bayes-plausibility by construction, the area under L+ is too large, producing the desired violation of Bayes-plausibility.

[Figure 2. Sender’s utility from the proposed optimal posterior distribution G∗ compared to the tangent line L+ and the distribution GxH.]

Proposition 2 shows that supp(G∗) is a closed interval with lower endpoint at 0 that contains π in its interior, resulting in the “good news, bad news, worst news” interpretation described in the introduction. When r∗ > π, so that Sender’s Bayes-plausibility constraint is stronger than Nature’s mean restriction, the distribution has an atom at 0 and worst news is realized with strictly positive probability. Regardless of whether that occurs, the distribution G∗ is uniform over supp(G∗) \ {0}: all degrees of good and bad news are equally likely. This result also verifies the intuition that, when Bayes-plausibility forces Sender to generate more low posteriors than under a uniform distribution, she chooses a concave posterior distribution to avoid “wasting” posterior mass. Linearity of the optimal posterior distribution is in turn a consequence of the specific form of Sender

and Receiver’s utility functions. Because Sender cares equally about all Receiver types, and the share of Receiver types persuaded to choose action a = 1 increases linearly in the posterior, Sender’s gain from a higher posterior belief is constant. Also as a result of this linearity, Nature is indifferent between any choice of Receiver types in supp(G∗). This result will play a key role in the minmax problem of Appendix C; similar indifference results for either Nature or the designer appear commonly in maxmin mechanism design, e.g., in Bose et al. (2006), Carrasco et al. (2019), and Brooks and Du (2021).

8 There is a slight subtlety here, since when r∗ < 1/2, it may be that GxH = G∗ and (if H matches Sender’s utility from G∗) that L+(r∗) = 1 − GxH(r∗). However, 1 − H must lie above 1 − G∗ somewhere in [0, 1] in order for H to be distinct, and I can show the violation of Bayes-plausibility using this fact.

4.4. Comparative Statics of Sender’s Utility. Having characterized Sender’s optimal posterior distribution, her utility follows from applying Lemma 1:

Corollary 1. If π ≤ 1/2, then Sender’s maxmin utility is given by

    uS(π, r∗) = 1 − r∗/(2π)   for r∗ ∈ (0, π),
                π/(2r∗)       for r∗ ∈ [π, 1/2),
                2π − 2πr∗     for r∗ ∈ [1/2, 1).

If π > 1/2, then it is given by

    uS(π, r∗) = 1 − r∗.
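The closed-form utility is straightforward to encode, which also makes the substitutability discussed later in this section easy to exhibit. The Python sketch below is my own illustration (the function name u_sender is mine); for instance, uS(0.192, 0.3) = 0.192/0.6 = 0.32 = 2(0.4)(1 − 0.6) = uS(0.4, 0.6).

```python
def u_sender(pi, r):
    """Sender's maxmin utility u_S(pi, r*) from Corollary 1."""
    if pi > 0.5:
        return 1 - r
    if r < pi:
        return 1 - r / (2 * pi)
    if r < 0.5:
        return pi / (2 * r)
    return 2 * pi * (1 - r)

# Spot checks of each branch with pi = 0.2 (and one with pi > 1/2).
assert abs(u_sender(0.2, 0.1) - 0.75) < 1e-12   # 1 - r/(2*pi)
assert abs(u_sender(0.2, 0.3) - 1 / 3) < 1e-12  # pi/(2*r)
assert abs(u_sender(0.2, 0.7) - 0.12) < 1e-12   # 2*pi - 2*pi*r
assert abs(u_sender(0.7, 0.4) - 0.6) < 1e-12    # 1 - r

# A low prior and a low mean Receiver type are substitutes:
assert abs(u_sender(0.192, 0.3) - u_sender(0.4, 0.6)) < 1e-12  # both 0.32
```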

A dramatic feature of this result is that, no matter how weak the Bayes-plausibility

constraint, Sender does no better than if Receiver acted on a uniformly drawn belief with

no further information. Regardless of the parameters, replacing the maxmin criterion

with any prior belief that does not first-order stochastically dominate U [0, 1] would result

in strictly higher utility for Sender; no prior belief would result in strictly lower utility.9

Comparative statics in the prior π and mean Receiver type r∗ follow naturally from

the closed-form expression in Corollary 1. Sender’s utility is continuous and strictly

decreasing in r∗ for fixed π, and continuous and weakly increasing in π for fixed r∗.10

Both comparative statics match the economic intuitions of the problem: if Receiver

has a higher prior belief that Sender’s welfare program will be under-budget, or Sender

knows average Receiver is more fiscally liberal, then Sender expects to convince more

Receivers to purchase her product. In fact, a low prior and low mean Receiver type

are “substitutes,” in that a high prior can compensate Sender for the harms of a high

9 This result is a straightforward application of the usual concavification argument. In the proof of Proposition 1, the uniform distribution generates the largest possible convexification; here it generates the smallest possible concavification.
10 Specifically, her utility for fixed r∗ is strictly increasing when π ∈ (0, 1/2] and weakly increasing when π ∈ (1/2, 1).


mean Receiver type, and a low mean Receiver type can compensate Sender for the harms of a high prior. Given any two mean Receiver types r∗1 ≤ r∗2, it is clear from the functional form of uS that there are two priors π1 ≤ π2 such that uS(π1, r∗1) = uS(π2, r∗2). Similarly, given two priors, I can choose mean Receiver types to get the same result.11 This substitutability suggests that, in practice, it may be challenging to distinguish the effect of a changing state of the world from that of changing Receiver attitudes.

5. The Continuous-State Case

In this section, I extend the insights of the binary-state model to the case where F

is a continuously differentiable and unimodal distribution over [0, 1] with F (0) = 0.

Specifically, I assume that, for some mode m ∈ (0, 1), the derivative f of F is strictly

increasing on [0, m) and strictly decreasing on (m, 1]. I also require that F does not

first-order stochastically dominate U [0, 1]. Otherwise, a direct analogue of Proposition

1 shows that, since the concavification of F is the cdf of U [0, 1], Sender cannot improve

on her utility from full disclosure.

The main difference between this setting and the binary-state case of Section 4 is that

Bayes-plausibility now constrains not only the expectation of any feasible distribution

of posterior means G, but also the integral of the cdf of G:

    ∫_0^x G(q) dq ≤ ∫_0^x F(q) dq   for all x ∈ [0, 1].

I refer to this additional restriction as the “integral constraint” on G. Because of this

constraint, UTUs are no longer Bayes-plausible, since they have an atom at 0 while F

does not. To ensure that the UTU G0, which has no atom at q = 0, is not Bayes-

plausible, I assume that f(0) < 1/(2π) whenever the latter quantity is strictly positive.

One natural way to adapt UTUs is to use double-truncated uniform distributions, or DTUs, whose cdfs are of the form

    G(q) = 0        for q ∈ [0, ℓ),
           βq + y   for q ∈ [ℓ, (1 − y)/β),
           1        for q ∈ [(1 − y)/β, 1],

so that the distribution places no mass in the intervals [0, ℓ) (the lower truncation) and [(1 − y)/β, 1] (the upper truncation); Figure 3 shows an example of several DTUs.

11 It is not true that given π1 and r∗1 ≤ r∗2, I can find π2 such that uS(π1, r∗1) = uS(π2, r∗2). To see why, note that uS(π2, r∗2) ≤ 1 − r∗2 for any π2. Then there is π1 large enough that uS(π1, r∗1) > 1 − r∗2, so that no π2 will compensate Sender for the difference in utility.


[Figure 3. The cdfs of three double-truncated uniform distributions when π = 1/3. The blue and green DTUs (ℓ = 1/4, y = 0 and ℓ = 1/10, y = 0) have the same y-intercept but different lower truncations; the blue and red DTUs (ℓ = 1/4, y = 0 and ℓ = 1/4, y = 1/5) have different y-intercepts but the same lower truncation.]
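To make the parameterization concrete: for a DTU with lower truncation ℓ > 0 and intercept y, the identity E[q] = ∫₀¹ (1 − G(q)) dq gives the mean condition π = ℓ + (1 − y − βℓ)²/(2β), which pins down the slope β(ℓ, y). Both this formula and the Python sketch below are my own illustration; the parameters π = 1/3, ℓ = 1/4, y = 0 match one of the DTUs in Figure 3.

```python
import numpy as np

def dtu_cdf(q, ell, y, beta):
    """cdf of the double-truncated uniform with lower truncation ell, intercept y."""
    if q < ell:
        return 0.0
    return min(beta * q + y, 1.0)

def solve_beta(ell, y, pi):
    """Bisect for the slope beta(ell, y) satisfying the mean condition
    pi = ell + (1 - y - beta*ell)^2 / (2*beta); assumes ell > 0."""
    lo, hi = 1e-9, (1 - y) / ell       # at beta = (1 - y)/ell the uniform part vanishes
    while hi - lo > 1e-12:
        mid = (lo + hi) / 2
        mean = ell + (1 - y - mid * ell) ** 2 / (2 * mid)
        if mean > pi:                  # mean decreases in beta on this bracket
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

pi = 1 / 3
beta = solve_beta(0.25, 0.0, pi)

# Sanity check: the resulting DTU has mean posterior pi (trapezoid rule).
qs = np.linspace(0.0, 1.0, 200001)
vals = np.array([1 - dtu_cdf(q, 0.25, 0.0, beta) for q in qs])
mean = float(np.sum((vals[:-1] + vals[1:]) / 2) * (qs[1] - qs[0]))
assert abs(mean - pi) < 1e-4
```

With these parameters the bisection gives β ≈ 1.806, so the uniform portion runs from ℓ = 1/4 up to 1/β ≈ 0.554, with an atom of size βℓ ≈ 0.451 at ℓ.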

A DTU clearly obeys the integral constraint for x ∈ [0, ℓ), but may violate it if the

atom at ℓ and the slope thereafter are too large; determining when this violation occurs

will be at the core of my characterization of optimal posterior distributions. Since the

additional parameter ℓ allows for a continuum of DTUs at each fixed intercept y (with

the slope β given as a function of y and ℓ), I first characterize the y-optimal DTU—the

utility-maximizing DTU among all Bayes-plausible DTUs with intercept y. When r∗ is

sufficiently small or sufficiently large, the y-optimal DTU has minimal slope among all

Bayes-plausible DTUs with intercept y. This result allows me to prove Proposition 3,

which establishes optimality of DTUs:

Proposition 3. Let r∗ ∈ (0, q1(ℓ∗0, 0)] ∪ [π, 1). Then no distribution of posterior means

gives Sender strictly higher utility than all double-truncated uniform distributions.

The value q1(ℓ∗0, 0) is a function of the lower-truncation length of the 0-optimal DTU, and

is defined in Corollary 2. As with Proposition 2 in the binary-state case, the core of the

result is a local approximation of any candidate optimal distribution by a piecewise linear

one. However, the presence of the lower truncation means I can no longer use a Bayes-

plausible DTU as a global bound on that approximating distribution. In the absence


of this bound, and since Sender’s utility depends only on the value of the concavified

posterior mean distribution at r∗, she is free to construct a non-DTU optimal distribution

by varying the mass placed in and around the lower truncation interval. In Section 5.3,

I show an example of one such distribution, and highlight its connection to the challenge

of characterizing optimal posterior mean distributions for intermediate values of r∗.

Towards proving Proposition 3, I characterize y-optimal DTUs in the following section.

I then prove the proposition, after which I return to the question of how the need for a

lower truncation region allows Sender greater flexibility than in the binary-state case.

5.1. Double-Truncated Uniform Distributions. As with the upper truncation and

atom size of the UTUs in Section 4.2, the requirement that any Bayes-plausible DTU

have mean π allows me to define β as a function of ℓ and y; I thus denote a generic DTU

by Gℓy.12 When y ∈ [0, 1 − 2π], any ℓ ∈ [0, π] generates a valid DTU, with the DTU G0y being the same as the UTU with atom y at posterior mean q = 0. When y ∈ (1 − 2π, 1) there is a value ℓminy > 0 which generates a DTU passing through the point (1, 1); this value lower-bounds the valid choices of ℓ.

For any given mean Receiver type r∗ and intercept y, standard continuity arguments

applied to the cdf of Gℓy and its integral show that there exists some DTU G∗y which

maximizes Sender’s utility among all DTUs with intercept y; this result is Lemma 7

in Appendix D. I call any such DTU y-optimal, to distinguish from a potential overall-

optimal DTU which maximizes Sender’s utility among all DTUs. Unlike the binary-state

case, it is not possible to solve analytically for G∗y without specifying a functional form

for F . However, appropriate sufficient conditions can ensure that G∗y is both unique and

slope-minimizing among Bayes-plausible DTUs with intercept y:

Lemma 3. Fix y ∈ [0, 1) and r∗ ∈ (0, 1). There is a unique and well-defined DTU Gsmy that has minimal slope among all Bayes-plausible DTUs with intercept y. If y = 0 or r∗ ∈ [π, 1), then the y-optimal DTU G∗y equals Gsmy.

The full proof is in Appendix D; I sketch it here. In the first case, setting y = 0 ensures

that the concavification of Gsm0 does not have a kink at q = ℓ; then, regardless of the

value of r∗, slope-minimization subject to Bayes-plausibility is best for Sender. In the

second case, setting r∗ ≥ π ensures that r∗ exceeds any valid choice of ℓ; then the kinked

concavification does not affect Sender’s utility.

12The explicit relationship, as well as other technical properties of DTUs, are presented alongside

relevant proofs in Appendix D.


This result allows me to simplify the integral constraint by identifying if and where it

binds. Let Qℓy be the set of interior points q ∈ (0, 1) where the line given by the uniform

portion of the DTU Gℓy intersects the prior F :

    Qℓy = { q ∈ (0, 1) | β(ℓ, y)q + y = F(q) }.

Under the sufficient condition on r∗ above, I can show that either the integral constraint

binds only at the minimum value of q in Qℓy, or it does not bind anywhere in (0, 1).

Lemma 4. Let Gsmy be the DTU with the minimal slope among all Bayes-plausible DTUs with intercept y, and let ℓsmy be its lower truncation. If y ∈ [0, 1 − 2π], then q1(ℓ, y) = min Qℓy is well-defined and ℓsmy satisfies

    ∫_0^{q1(ℓsmy, y)} F(q) dq = ∫_0^{q1(ℓsmy, y)} Gsmy(q) dq

and

    ∫_0^x F(q) dq > ∫_0^x Gsmy(q) dq   for all x ∈ (0, q1(ℓsmy, y)) ∪ (q1(ℓsmy, y), 1).

If instead y ∈ (1 − 2π, 1), then either the condition above is satisfied, or ℓsmy equals the minimum lower truncation length ℓminy.

The proof, which is in Appendix D, relies crucially on assumptions about the shape of the prior F. In particular, since F is strictly convex on [0, m) and strictly concave

on (m, 1], extending the linear portion of a DTU produces a line that intersects F at

most twice on (0, 1]. If that line lies weakly above F , then the corresponding DTU is

Bayes-plausible—in the interval [0, ℓ) the DTU lies below F , and in the interval [ℓ, 1]

the difference between the integral of F and that of the DTU decreases monotonically

and reaches 0 at q = 1. If instead the line intersects F twice, then the DTU may

increase too quickly in the interval (ℓ, 1] to be Bayes-plausible. Examining the geometric

relationship between F and such a DTU, Gℓy, shows that the difference between their

integrals is minimized at q1(ℓ, y). Thus a two-intersection DTU is Bayes-plausible if

and only if its integral is weakly less than that of F at q1(ℓ, y). When the relationship

holds with equality, the DTU is slope-minimizing among Bayes-plausible DTUs. When

y ∈ [0, 1− 2π], the smallest value of ℓ produces a UTU, which is not Bayes-plausible, so

there is always a DTU satisfying the given property. When instead y ∈ (1 − 2π, 1), it

may be that the minimal value of ℓ produces a DTU that lies weakly above F on [ℓ, 1]

and is therefore Bayes-plausible.


Lemma 4 can also be used as a key step in deriving the continuity of ℓ∗y in y, and thus in

providing sufficient conditions for the existence of an overall-optimal DTU (e.g., Lemma

8 in Appendix D). As an immediate corollary, it also allows a characterization of the

overall-optimal DTU when r∗ is small:

Corollary 2. Let ℓ∗0 be the length of the lower truncation for the 0-optimal double-truncated uniform distribution G∗0, and let q1(ℓ∗0, 0) be the smallest q ∈ (0, 1) that satisfies β(ℓ∗0, 0) q = F(q).13 If r∗ ≤ q1(ℓ∗0, 0), then G∗0 is uniquely optimal among all double-truncated uniform distributions.

Proof. The proof is by contradiction, and resembles the proof of Proposition 2 in the binary-state case. Fix r∗ and assume some other DTU G does weakly better than G∗0 for Sender. It must therefore have a smaller slope than G∗0: the intercept of G is larger than that of G∗0, and G must intersect the horizontal line y = 1 at a larger value of q than G∗0, or the concavification of G would be everywhere above that of G∗0. Because of its larger slope, G∗0 upper-bounds G after r∗ (where G lies weakly below G∗0), and thus

    ∫_{r∗}^1 G(q) dq < ∫_{r∗}^1 G∗0(q) dq,
    ∫_{q1(ℓ∗0, 0)}^1 G(q) dq < ∫_{q1(ℓ∗0, 0)}^1 G∗0(q) dq,
    ∫_0^{q1(ℓ∗0, 0)} G(q) dq > ∫_0^{q1(ℓ∗0, 0)} G∗0(q) dq = ∫_0^{q1(ℓ∗0, 0)} F(q) dq.

The inequality in the first line is strict because F(q1(ℓ∗0, 0)) < 1, so r∗ is not in the upper-truncated region of G and there is some strict difference between G∗0 and G captured in the integral. The first implication follows from the bound on r∗. The inequality in the third line is because all DTUs have equal means, so

    1 − π = ∫_0^1 G(q) dq = ∫_0^{q1(ℓ∗0, 0)} G(q) dq + ∫_{q1(ℓ∗0, 0)}^1 G(q) dq
          = ∫_0^1 G∗0(q) dq = ∫_0^{q1(ℓ∗0, 0)} G∗0(q) dq + ∫_{q1(ℓ∗0, 0)}^1 G∗0(q) dq.

The equality in the third line is by Lemma 4, since by Lemma 3 the DTU G∗0 has minimal slope among Bayes-plausible DTUs with intercept 0. The third line thus contradicts the integral constraint, so G is not Bayes-plausible. □

This result is similar to the optimal posterior distribution when π > r∗ in the binary-state case (and indeed that optimal distribution is the DTU G00 with ℓ = 0 and y = 0, which is only ruled out here by Bayes-plausibility).

13 The existence of q1(ℓ∗0, 0) is guaranteed by the proof of Lemma 4.

When r∗ is small relative to π, it is more costly for Nature

to generate high Receiver types than for Sender. Thus Sender focuses on the shape of

her optimal distribution near q = 0 at the cost of inducing fewer high posterior beliefs.

When the state is binary, this approach allows Sender to eliminate the atom at q = 0; in

this continuous-state case, Sender can reduce the size of the atom at q = ℓ. While in the

binary-state case the size of the atom increases with the value of r∗, the ability to vary

both ℓ and y in the continuous-state case means that such a result is more challenging

to establish, and may only hold after further restricting the prior distribution F .

5.2. Optimality of Double-Truncated Uniform Distributions. While a clearer

characterization of the optimal DTU outside the small-r∗ case is impeded by the greater

flexibility of the continuous-state problem, the simplified integral constraint in Lemma 4

can be directly combined with the bounding argument in Corollary 2 to prove Proposition

3. Recall that the proposition states that, for r∗ ∈ (0, q1(ℓ∗0)]∪ [π, 1)—i.e., small enough

for G∗0 to be the overall-optimal DTU or large enough to lie beyond the lower truncation

region—no distribution of posterior means gives Sender strictly higher utility than all

double-truncated uniform distributions. Before presenting the proof, I note that in some

cases q1(ℓ∗0) ≥ π, so that the restriction on r∗ is moot. A sharp characterization of when

this inequality is satisfied is not possible, as it depends greatly on the prior distribution

F , but it does arise in important cases like that of a truncated-normal F .

Proof. Let H be a candidate optimal distribution of posterior means. Writing H̄ for the concavification of H, I approximate H̄ by a tangent at r∗, which I call L(q); let L(0) = yL ∈ [0, 1) be its intercept. Consider the yL-optimal DTU G∗yL. In order for H to do at least as well for Sender as G∗yL, by Lemma 1 it must be that

    1 − H(r∗) ≥ 1 − H̄(r∗) ≥ 1 − G∗yL(r∗).

Thus L must have a weakly smaller slope than G∗yL, since otherwise L(r∗) > G∗yL(r∗) and the above inequality is violated.

Recall that if yL ∈ [0, 1 − 2π], there are DTUs with any slope β ∈ ((1 − yL)²/(2π), β(ℓ∗yL)], and if yL ∈ (1 − 2π, 1), there are DTUs with any slope β ∈ [1 − yL, β(ℓ∗yL)]. In the first case, the slope of L cannot lie below that interval, or it would have a weakly smaller slope than the UTU with intercept yL; then the argument of Proposition 2 applies and H is not Bayes-plausible. In the second case, L must have a slope that is weakly greater than the lowest-slope DTU with intercept yL, or it would fail to pass through (1, 1), and therefore so would H and its concavification. Thus there is a DTU, GL, with the same slope as L.


Let r∗ ∈ (0, q1(ℓ∗0)]. By Corollary 2, if GL ≠ G∗0, then because GL(r∗) ≤ G∗0(r∗), GL is not Bayes-plausible. If instead r∗ ∈ [π, 1) and GL has a strictly smaller slope than G∗yL, then by Lemma 3, GL is not Bayes-plausible.

In either case, given that GL violates Bayes-plausibility, H must violate it as well. Because GL upper-bounds the concavification of H beyond ℓ, and H lies weakly below its concavification, it must be that

    ∫_q^1 H(t) dt ≤ ∫_q^1 GL(t) dt

for any q ∈ [ℓ, 1]. Since GL violates Bayes-plausibility, there is some qv ∈ [0, 1] where

    ∫_0^{qv} GL(t) dt > ∫_0^{qv} F(t) dt,

and since the left-hand side equals 0 for any qv ∈ [0, ℓ), it must be that qv ∈ [ℓ, 1].

because H and GL have the same mean,∫ 1

0

H(t)dt =

∫ 1

0

GL(t)dt = 1− π

∫ qv

0

H(t)dt+

∫ 1

qv

H(t)dt =

∫ qv

0

H(t)dt+

∫ 1

qv

H(t)dt

∫ qv

0

H(t)dt ≥

∫ qv

0

GL(t)dt >

∫ qv

0

F (t)dt,

where the third line follows from the earlier upper bound on the integral of H . Therefore

H violates Bayes-plausibility and is not a valid distribution.

If GL has the same slope as G∗yL, then by construction H gives Sender the same utility as G∗yL. Thus if there is a DTU that delivers Sender a strictly higher utility than G∗yL, then clearly H is not optimal overall. If there is no such DTU, then G∗yL is optimal among all DTUs and H also attains Sender’s maxmin utility. □

As in the binary-state case, Sender benefits from creating a large interval where poste-

rior means are uniformly distributed, since any binary-support Receiver type distribution

supported in this interval is equally bad. One way to generate a large interval that com-

plies with the integral constraint is by adding a lower truncation region to the UTUs of

Section 4. As in the binary-state case, the resulting DTU upper-bounds any alternative

distribution of posterior means in the interval [r∗, 1]. However, the lower truncation

region means that the concavification of a DTU lies above the DTU itself, unlike a UTU

which equals its concavification. Thus the DTU is no longer a global upper bound on

all candidate optimal distributions, and in fact Sender can even choose a distribution


whose concavification differs from that of a DTU around the lower-truncation region.

Such distributions are the focus of the following section.

5.3. Beyond Double-Truncated Uniform Distributions. Proposition 3 provides

only a mild condition on non-DTU optimal distributions: their slope at r∗ must equal

that of the overall-optimal DTU, if one exists. Using the tighter characterization of the

overall-optimal DTU in Corollary 2, I can strengthen the result in Proposition 3 for low

values of r∗ by giving a more explicit description of the concavification of H :

Corollary 3. Let r∗ ∈ (0, q1(ℓ∗0)]. Then, for any optimal distribution of posterior means H, the concavification of H is the same as the concavification of the overall-optimal double-truncated uniform distribution G∗0.

Proof. By the proof of Proposition 3, the slope of the concavification of H at r∗ equals that of the concavification of G∗0. In fact, because the concavification of G∗0 does not have a kink at ℓ∗0, it upper-bounds the concavification of H on the whole interval [0, 1]. If the concavification of H lay below that of G∗0 on any measurable subset of [r∗, 1], the proof of Corollary 2 would show that H violates the Bayes-plausibility integral constraint at q1(ℓ∗0, 0).

If ℓ∗0 ≤ r∗, it is therefore true that the concavification of H equals the concavification of G∗0, which equals G∗0 itself, on [r∗, 1]. Furthermore, the concavification of G∗0 upper-bounds that of H on [0, r∗], and both equal 0 at q = 0. Because the concavification of G∗0 is linear on [0, r∗] (i.e., it has no kink at ℓ∗0), there is no smaller concave function that takes the same values at q = 0 and q = r∗; thus the concavifications agree on [0, r∗] as well.

If instead ℓ∗0 > r∗, then the concavifications agree with G∗0 on [ℓ∗0, 1], since that is the range where the concavification of G∗0 equals G∗0. However, the upper-bounding relationship still holds on [0, ℓ∗0], and thus the argument above still applies and the concavifications agree on [0, ℓ∗0]. □

While Corollary 3 provides an appealing reason for focusing on DTUs as opposed to other

maxmin-optimal posterior distributions, it unfortunately does not hold when the optimal

DTU is other than G∗0. To see why, consider some overall-optimal DTU with intercept

y > 0. Its concavification has a kink at q = ℓ, so Sender can slightly alter the shape

of the concavification without violating the slope constraint imposed by Proposition 3.

In particular, consider a distribution that places slightly positive mass in the interval

[ℓ − ε, ℓ), has a smaller atom at q = ℓ, and places slightly less mass than the DTU in

the interval (ℓ, ℓ + ε]. This distribution has a double kink, with changes in slope at ℓ

and ℓ+ ε, but is equal to the DTU for q ∈ (ℓ+ ε, 1]; Figure 4 shows an example of this

deviation. Whenever r∗ is above ℓ+ ε, the deviation delivers the same utility for Sender


[Figure 4. A potential deviation (green) from the DTU with ℓ = 1/3 and y = 1/5 (blue); where the concavification of either distribution differs from the cdf, it is shown by dashed lines. The second panel focuses on the region where the concavified deviation has a double kink (at both guidelines) as opposed to the concavified DTU’s single kink (only at the first guideline).]

despite having a different concavification. This alternative distribution highlights how

Proposition 3 only allows local approximation of optimal distributions by DTUs.

This deviation also sheds light on the difficulty of characterizing the optimal distribution

of posterior means when r∗ ∈ (q1(ℓ∗0, 0), π). In the binary-state case, a UTU can be used

to bound any alternative posterior distribution, and its concavification equals the original

cdf. However, the lower truncation of a DTU means that neither of those statements is true. Even if the Bayes-plausibility constraint binds as in Lemma 4, the double-kink

modification in Figure 4 gives Sender a greater utility than the corresponding DTU for

r∗ ∈ (0, ℓ + ε). Without the slope-minimality result of Lemma 3, for some y-optimal

DTUs the Bayes-plausibility constraint may not bind at any x ∈ (0, 1), and a different

approach is needed to characterize the optimal distribution of posterior means.

6. Conclusion

In a binary-state, binary-action model of Bayesian persuasion, if Sender holds a prior

belief over Receiver’s possible cutoffs for choosing her preferred action, she is able to

design the optimal posterior to take advantage of the shape of that prior. In particular,

for most frequently-used posteriors, Receiver either gets good news or worst news, and is never less confident that the state of the world is good without being certain that it is bad.

Even when the state is continuous, the optimal distribution of posterior means leaves


Receiver well-informed about low states of the world, revealing all states below a certain

threshold. However, when Sender instead has mean-constrained maxmin preferences, I

show that she chooses not to be so transparent about low states. In the binary-state

case, while the bad state is generically revealed with strictly positive probability, Sender

sometimes gives Receiver bad news instead of only worst news. In the continuous-state

case, one class of optimal distributions truncates the prior distribution at the bottom

as well as the top, and linearizes it by making all intermediate posterior means equally

likely. Both of these choices are induced by the potential for an adversarial Receiver

type distribution chosen in the style of an information design problem—thus any non-concavities in the cdf of posterior means can be punished by Nature, and Sender does her

best to make Nature indifferent between many possible worst-case Receiver types. This

maxmin setting shows how some of the starker results in Bayesian persuasion—namely,

the binary support of the optimal distribution in the binary-state case, and full revelation

of low states in the continuous-state case—are not robust to ambiguity, even if they

persist across many different forms of uncertainty.

References

Auster, S. (2018): “Robust contracting under common value uncertainty,” Theoretical Economics, 13, 175–204.

Beauchene, D., J. Li, and M. Li (2019): “Ambiguous Persuasion,” Journal of Economic Theory, 179, 312–365.

Bell, R. M. and T. M. Cover (1980): “Competitive Optimality of Logarithmic Investment,” Mathematics of Operations Research, 5, 161–166.

Bergemann, D. and S. Morris (2019): “Information Design: A Unified Perspective,” Journal of Economic Literature, 57, 44–95.

Bergemann, D. and K. Schlag (2011): “Robust monopoly pricing,” Journal of Economic Theory, 146, 2527–2543.

Bodoh-Creed, A. L. (2012): “Ambiguous beliefs and mechanism design,” Games and Economic Behavior, 75, 518–537.

Borel, E. (1921): “La théorie du jeu et les équations intégrales à noyau symétrique,” in Comptes Rendus de l’Académie des Sciences, vol. 173, 1304–1308; translation by L. J. Savage (1953): “The theory of play and integral equations with skew symmetric kernels,” Econometrica, 21, 97–100.

Bose, S., E. Ozdenoren, and A. Pape (2006): “Optimal auctions with ambiguity,” Theoretical Economics, 1, 411–438.

Brooks, B. and S. Du (2021): “Optimal Auction Design With Common Values: An Informationally Robust Approach,” Econometrica, 89, 1313–1360.

Carrasco, V., V. F. Luz, N. Kos, M. Messner, P. Monteiro, and H. Moreira (2018): “Optimal selling mechanisms under moment conditions,” Journal of Economic Theory, 177, 245–279.

Carrasco, V., V. F. Luz, P. Monteiro, and H. Moreira (2019): “Robust mechanisms: the curvature case,” Economic Theory, 68, 203–222.

de Castro, L. I., Z. Liu, and N. C. Yannelis (2017): “Ambiguous implementation: the partition model,” Economic Theory, 63, 233–261.

Dworczak, P. and A. Pavan (2020): “Preparing for the Worst But Hoping for the Best: Robust (Bayesian) Persuasion,” Working Paper.

Gilboa, I. and D. Schmeidler (1989): “Maxmin expected utility with non-unique prior,” Journal of Mathematical Economics, 18, 141–153.

Hu, J. and X. Weng (2020): “Robust Persuasion of a Privately Informed Receiver,” Economic Theory, forthcoming.

Kamenica, E. (2019): “Bayesian Persuasion and Information Design,” Annual Review of Economics, 11, 249–272.

Kamenica, E. and M. Gentzkow (2011): “Bayesian Persuasion,” American Economic Review, 101, 2590–2615.

Kolotilin, A., T. Mylovanov, A. Zapechelnyuk, and M. Li (2017): “Persuasion of a Privately Informed Receiver,” Econometrica, 85, 1949–1964.

Kosterina, S. (2020): “Persuasion with Unknown Beliefs,” Working Paper.

Laclau, M. and L. Renou (2017): “Public Persuasion,” Working Paper.

Ollar, M. and A. Penta (2017): “Full Implementation and Belief Restrictions,” American Economic Review, 107, 2243–2277.

Sahuguet, N. and N. Persico (2006): “Campaign spending regulation in a model of redistributive politics,” Economic Theory, 28, 95–124.

Wolitzky, A. (2016): “Mechanism design with maxmin agents: Theory and an application to bilateral trade,” Theoretical Economics, 11, 971–1004.


Appendix A: General Support for Receiver Types

In this appendix, I generalize the model of Section 4 to allow Receiver types to lie in

an arbitrary interval [α, β] ⊆ [0, 1], and prove analogues of all results presented in that

section. The minmax problem presented in Appendix C is not discussed here, but it

also generalizes by following the same approach.

A1: Adapting the Modeling Assumptions. For this more general case, I redefine the set of possible Receiver type distributions as

    T = { cdf T over [α, β] : ∫ r dT(r) = r∗ }.

I also assume that r∗ ∈ (α, β) and π ∈ [α, β]. The restriction on r∗ naturally extends the

assumption r∗ ∈ (0, 1) from the baseline model in Section 3. With regard to π, the case

π > β is trivial; since the prior is high enough to convince any Receiver type, Sender

provides no information. When π < α, no Receivers are convinced by the prior, and

Bayes-plausibility is so restrictive that UTUs may not be well-defined. I choose to forgo

discussion of that case.

I also redefine the set of feasible posterior distributions as

    G = { cdf G over [0, β] : ∫ q dG(q) = π }.

This choice is less straightforward than the redefinition of T and merits some discussion.

The mapping from Blackwell experiments to Bayes-plausible posterior distributions is

unchanged, but I impose the further restriction that those distributions have support

in [0, β] rather than [0, 1]. This choice ensures that there always exists a Receiver type

r = β unconvinced by any feasible posterior, analogous to the Receiver type r = 1 in the

baseline model. The presence of such a type is key and is closely related to the choice

to break ties against Sender.

In general, if tie-breaking is in Sender’s favor, the minimum utility over distributions in

T may not be well-defined. To see why, consider the simple case where α = 0, β = 1,

r∗ = π < 1/2, and Sender chooses a binary-support posterior distribution that generates

posterior q = 0 with probability 1/2 and posterior q = 2π with probability 1/2. With tie-breaking against Sender, the worst-case Receiver type distribution places equal mass on types r = 0 and r = 2π.14 With tie-breaking in Sender’s favor, type r = 0 is convinced by posterior q = 0, while type r = 2π is convinced by posterior q = 2π, so this distribution no longer minimizes Sender’s utility. However, the sequence of distributions Tε with support {ε, 2π + ε} approaches the utility in the tie-breaking-against-Sender case as ε → 0. Such a construction can be used to ensure that there is no substantive distinction between the two cases except when Sender places nonzero mass at q = 1, generating a posterior equal to the upper bound on Receiver types. Then if tie-breaking is against Sender, type r = 1 is unconvinced by even that posterior, but if tie-breaking is in Sender’s favor, there is no way to generate a Receiver type r = 1 + ε who is unconvinced. There is thus a discontinuous increase in Sender’s infimum utility from placing posterior mass on q = 1 compared to q = 1 − ε with ε → 0.

14Posterior q = 0 persuades no Receiver types, so Nature’s goal is to generate a Receiver type unconvinced by q = 2π as frequently as possible. The lowest (and thus most frequently possible given

If β < 1 and Sender is permitted to choose any posterior in [0, 1], then there is a similar

discontinuity even when tie-breaking is against Sender: inducing posterior q = β + ε

convinces all Receiver types, but inducing posterior q = β does not. To rule out this

discontinuity, I require that the support of Sender’s chosen posterior distribution lie

in [0, β]. Otherwise, for many reasonable parameter values Sender chooses a posterior

distribution with support {0, β + ε} to exploit the discontinuity. While this choice is

a natural outcome of a model without the upper bound on Sender’s feasible posterior

distributions, it represents an extreme response to ambiguity by Sender, and limits the

amount of insight gained into the robustness of various properties of Bayesian persuasion.

A2: Upper-Truncated Uniform Distributions in the General Setting. With the

choice sets for Nature and Sender properly defined, I can characterize Sender’s optimal

posterior distribution for the general case [α, β] ⊆ [0, 1]. The argument is precisely

analogous to the one used in Section 4; an analogue of Lemma 1 now describes Sender’s

maxmin utility from posterior distribution G as the convexification of 1 − G over the

interval [α, β], and all subsequent results follow by restricting attention to that interval.15

The appropriate analogues of the UTUs defined in Section 4.2 now place mass x ≥ 0

on posterior q = 0, equal and strictly positive mass on all posteriors q ∈ [α, α + rh] for

some rh ≤ β − α, and no mass on posteriors in (α+ rh, 1]. As in the baseline model, rh

is fully determined by x and Bayes-plausibility, so I can again write Gx for a UTU with

the mean constraint) such type is r = 2π; to make it occur as frequently as possible, Nature chooses r = 0 as the other Receiver type.

15Since posteriors q ∈ (0, α) surely convince no Receivers and tighten Bayes-plausibility compared to placing the corresponding mass at q = 0, there is no reason for Sender to choose any points for the support of G outside of {0} ∪ [α, β].

mass x on posterior q = 0. The expression for a generic Gx is

Gx(q) =
x, q ∈ [0, α),
x + (q − α)(1 − x)²/(2π − 2α(1 − x)), q ∈ [α, β],
1, q ∈ (β, 1],

and for the Gx that has full support on [α, β], it is

G1−2π/(α+β)(q) =
1 − 2π/(α + β), q ∈ [0, α),
1 − 2π(β − q)/(β² − α²), q ∈ [α, β],
1, q ∈ (β, 1].

Both are solved for in the same manner as in Section 4.2. In particular, to derive the

expression for Gx, I again use Bayes-plausibility to solve for rh(x):

π = ∫ (1 − Gx(q)) dq = (1 − x)α + (1 − x) rh(x)/2  ⇔  rh(x) = 2(π/(1 − x) − α).

The decreasing portion of 1 − Gx is a line through (α, 1− x) and (α + rh(x), 0), which

gives the decomposition of the area under 1 − Gx into a rectangle and triangle in the

expression above. The maximum value of rh(x) is β − α, attained when Gx has full

support on [α, β]; therefore x may take any value in [0, 1 − 2π/(α + β)], rather than

[0, 1− 2π] as in Section 4.2.
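The Bayes-plausibility algebra behind rh(x) is easy to check numerically. The sketch below uses illustrative parameter values (α = 0.2, β = 0.9, π = 0.4, x = 0.1 — assumptions chosen to satisfy 2π ≤ α + β, not values from the text), and relies on the integration-by-parts identity E[q] = ∫ (1 − G(q)) dq:

```python
# Sanity check of the Bayes-plausibility algebra behind r_h(x).
# Parameter values are illustrative, not from the text.
alpha, beta, pi_ = 0.2, 0.9, 0.4

def r_h(x):
    # From pi = (1 - x)*alpha + (1 - x)*r_h(x)/2.
    return 2 * (pi_ / (1 - x) - alpha)

def survival(q, x):
    # 1 - G_x(q): constant 1 - x on [0, alpha), then linearly decreasing to 0.
    if q < alpha:
        return 1 - x
    top = alpha + r_h(x)
    return (1 - x) * (top - q) / r_h(x) if q < top else 0.0

x = 0.1
n = 100_000
# E[q] equals the integral of the survival function (integration by parts).
mean = sum(survival((i + 0.5) / n, x) for i in range(n)) / n
assert abs(mean - pi_) < 1e-4          # Bayes-plausibility holds

# The largest feasible atom gives full support on [alpha, beta].
x_full = 1 - 2 * pi_ / (alpha + beta)
assert abs(r_h(x_full) - (beta - alpha)) < 1e-9
print("Bayes-plausibility check passed")
```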

The condition 2π ≤ α + β is thus analogous to the condition π ≤ 1/2 in Proposition 2;

when it is satisfied, the uniform distribution over the interval of Receiver types with an

atom at q = 0 is Bayes-plausible. If it is violated, then analogously to Proposition 1,

any distribution G ∈ G satisfying

G(q) ≤ 1 − (β − q)/(β − α)

is optimal for Sender, because the function

U(q) =
1, q ∈ [0, α),
(β − q)/(β − α), q ∈ [α, β],
0, q ∈ (β, 1],

upper-bounds Sender’s utility and is a feasible convexification on [α, β] for a Bayes-plausible distribution G satisfying the condition above.


Assuming that 2π ≤ α + β, I can prove the analogue of Lemma 2 and solve for the

optimal choice of x, which I again label x∗. To do so I compute the partial derivative in

x of 1−Gx(r∗), which is

∂x(1 − Gx(r∗)) = (1/(2α)) (π²(r∗ − α)/(α(x − 1) + π)² − (r∗ + α)).

Its roots are

x = 1 − (π/α)(1 ∓ √((r∗ − α)/(r∗ + α))),

both of which are well-defined since 0 < r∗ − α < r∗ + α. The derivative is positive in

between the roots, the smaller of which is surely negative since π ≥ α, and therefore it

is the case that

x∗ = {1 − (π/α)(1 − √((r∗ − α)/(r∗ + α)))}+ .

When r∗ ≤ ((π − α)² + π²)/(2π − α), then x∗ = 0.
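As a sketch of this first-order condition, the closed form for x∗ can be compared with a brute-force grid search over the feasible atoms x ∈ [0, 1 − 2π/(α + β)]; the parameter values below are illustrative assumptions, not from the text:

```python
import math

# Compare the closed-form x* with a grid search over feasible atom sizes.
# Illustrative parameters (2*pi <= alpha + beta, r* interior), not from the text.
alpha, beta, pi_, r_star = 0.2, 0.9, 0.4, 0.4

def utility(x):
    # 1 - G_x(r*) for a UTU with atom x at q = 0.
    return 1 - x - (r_star - alpha) * (1 - x) ** 2 / (2 * (pi_ - alpha * (1 - x)))

x_max = 1 - 2 * pi_ / (alpha + beta)   # atom of the full-support UTU
root = 1 - (pi_ / alpha) * (1 - math.sqrt((r_star - alpha) / (r_star + alpha)))
x_star = min(max(root, 0.0), x_max)    # cap at the feasible range

best = max((i * x_max / 10_000 for i in range(10_001)), key=utility)
assert abs(best - x_star) < 1e-3
print("closed-form x* matches grid search:", round(x_star, 4))
```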

Given the expression for G1−2π/(α+β), I can rearrange the inequality

1 − (π/α)(1 − √((r∗ − α)/(r∗ + α))) < 1 − 2π/(α + β)

to get the condition r∗ < (α² + β²)/(2β), which describes where Gx∗ is optimal among

UTUs. On the complementary range for r∗, G1−2π/(α+β) is optimal among UTUs.

A3: Optimality and Sender’s Maxmin Utility. The proof that, when 2π ≤ α+β,

the optimal UTU is uniquely optimal among all feasible posterior distributions G ∈ G is

the same as that of Proposition 2, making use of the fact that the convexification needed

to compute Sender’s utility is only over the interval [α, β].

Sender’s maxmin utility in this general case is therefore given by

1 − G0(r∗) = 1 − (r∗ − α)/(2(π − α))

when r∗ ≤ ((π − α)² + π²)/(2π − α), so that Gx∗ is optimal and x∗ = 0; by

1 − Gx∗(r∗) = π(α² + r∗(√((r∗ − α)(r∗ + α)) − r∗)) / (α²√((r∗ − α)(r∗ + α)))

when r∗ ∈ [((π − α)² + π²)/(2π − α), (α² + β²)/(2β)), so that Gx∗ is still optimal but x∗ > 0; and by

1 − G1−2π/(α+β)(r∗) = 2π(β − r∗)/(β² − α²)

otherwise.
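The three branches can be cross-checked numerically: the middle branch should coincide with 1 − Gx∗(r∗) computed directly from the definition of Gx, and the piecewise expression should be continuous at both thresholds. The parameter values below are illustrative assumptions, not from the text:

```python
import math

# Consistency check for the three closed-form branches of Sender's maxmin
# utility. Parameters are illustrative, not from the text.
alpha, beta, pi_ = 0.2, 0.9, 0.4
t1 = ((pi_ - alpha) ** 2 + pi_ ** 2) / (2 * pi_ - alpha)   # x* becomes positive
t2 = (alpha ** 2 + beta ** 2) / (2 * beta)                 # full support optimal

def utility_at(x, r):
    # 1 - G_x(r) computed directly from the definition of G_x.
    return 1 - x - (r - alpha) * (1 - x) ** 2 / (2 * (pi_ - alpha * (1 - x)))

def maxmin_utility(r):
    if r <= t1:
        return 1 - (r - alpha) / (2 * (pi_ - alpha))
    if r < t2:
        s = math.sqrt((r - alpha) * (r + alpha))
        return pi_ * (alpha ** 2 + r * (s - r)) / (alpha ** 2 * s)
    return 2 * pi_ * (beta - r) / (beta ** 2 - alpha ** 2)

r = 0.4   # lies strictly between the two thresholds for these parameters
x_star = 1 - (pi_ / alpha) * (1 - math.sqrt((r - alpha) / (r + alpha)))
assert abs(maxmin_utility(r) - utility_at(x_star, r)) < 1e-9

for t in (t1, t2):   # continuity at the two thresholds
    assert abs(maxmin_utility(t - 1e-9) - maxmin_utility(t + 1e-9)) < 1e-6
print("maxmin utility branches consistent")
```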


It follows from the appropriate partial derivatives that Sender’s maxmin utility is once

again monotonically decreasing in r∗ for fixed π and monotonically increasing in π for

fixed r∗. The two new comparative statics are with respect to the interval bounds

α and β. Examining the partial derivatives, where Gx∗ is optimal, Sender’s utility

is strictly increasing in α and constant in β. Where G1−2π/(α+β) is optimal, Sender’s

utility is strictly increasing in α and strictly increasing in β regardless of the value of

x∗. The threshold for optimality of G1−2π/(α+β) is strictly increasing in both α and β,

but since Sender’s utility is continuous, the change in threshold has no effect on the

overall comparative statics. Thus Sender’s utility is strictly increasing in α and weakly

increasing in β. The former is clear from noting that increasing α restricts the set of

Receiver type distributions without altering Sender’s choice set, but the latter is a more

substantive result since, given the assumption that Sender generates no posterior above

β, increasing β expands both Nature’s and Sender’s choice sets.


Appendix B: Omitted Proofs for Section 4

In this appendix, I prove Propositions 1 and 2, which fully characterize Sender’s optimal

posterior distributions. I begin with the former:

Proof. For an arbitrary cdf G, it must be that G(1) = 1; therefore 1 − Ḡ(1) = 0, because 1 − Ḡ is weakly positive and bounded above by 1 − G. Let U be the uniform distribution over [0, 1] (i.e., the distribution with cdf U(q) = q). Then because U is linear,

1 − Ū(q) = 1 − U(q) = 1 − q.

Furthermore, adapting Lemma 5 below shows that for any concave function H : [0, 1] → [0, 1] satisfying H(1) = 1,

1 − Ū(q) ≥ 1 − H(q) ∀ q ∈ [0, 1].

Therefore by Lemma 1, Sender’s utility is at most 1 − Ū(r∗).

Sender can attain this bound, since

∫ q dU(q) = ∫ (1 − q) dq = 1/2,

so Bayes-plausibility does not prevent Sender from choosing a posterior distribution G ∈ G satisfying

G(q) ≤ q ∀ q ∈ [0, 1].

For such a posterior distribution, it must be that Ḡ = U, because U is concave, lies weakly above G, and lies weakly below any other candidate concavification pointwise. Thus G attains Sender’s best possible maxmin utility, and is optimal. To confirm existence and non-uniqueness, I describe examples in Section 4.1.

Of course, if G(qℓ) > qℓ for some qℓ ∈ [0, 1], then clearly Ḡ(qℓ) > U(qℓ) = Ū(qℓ). But then, again adapting Lemma 5 below, it must also be that 1 − Ḡ(q) < 1 − Ū(q) ∀ q ∈ (0, 1), so because the utility given by U can be attained, G is strictly suboptimal for Sender. □

Next, to prove Proposition 2, I first prove two lemmas that describe the relationship between the UTU G1−2π and the function 1 − H̄ derived from an arbitrary posterior distribution H. The first establishes that if, for some posterior distribution H, the function 1 − H falls below 1 − G1−2π at some mean Receiver type q, Sender’s utility from H remains below her utility from G1−2π for all higher Receiver types:


Lemma 5. Let H be a cdf on [0, 1]. If there is q ∈ [0, 1) such that

1 − H(q) < 1 − G1−2π(q),

then it is also the case that

1 − H̄(q′) < 1 − G1−2π(q′) ∀ q′ ∈ [q, 1).

Proof. The proof is by contradiction. Assume there is q such that

1 − H̄(q) ≤ 1 − H(q) < 1 − G1−2π(q),

but that there is q′ ∈ [q, 1) such that

1 − H̄(q′) ≥ 1 − G1−2π(q′).

Since 1 − H̄(q) < 1 − G1−2π(q) but 1 − H̄(q′) ≥ 1 − G1−2π(q′), it must be that there is q1 ∈ [q, q′] where the slope of 1 − H̄ is strictly greater than that of 1 − G1−2π. But because H and G1−2π are cdfs and 1 − H̄ is weakly positive,

1 − H̄(1) = 0 = 1 − H(1) = 1 − G1−2π(1),

so there must be q2 ∈ [q′, 1] where the slope of 1 − H̄ is weakly less than that of 1 − G1−2π. Then q1 ≤ q2 but the slope of 1 − H̄ at q1 is strictly greater than at q2, violating convexity of 1 − H̄, and thus concavity of H̄. □

In the proof of Proposition 1, I reference this result as showing that, for any concave function H : [0, 1] → [0, 1] with H(1) = 1, the function 1 − U(q) = 1 − q upper-bounds 1 − H. Any such H is a valid cdf with H̄ = H, so I can set π = 1/2 so that 1 − G1−2π = 1 − U and apply the lemma to conclude that 1 − H(0) ≥ 1 − U(0) = 1, so H(0) = 0. In order not to be upper-bounded, there must also be q ∈ (0, 1) where 1 − H(q) > 1 − U(q). But then, as in the proof of the lemma, the slope of 1 − H must be strictly greater than that of 1 − U somewhere in [0, q] and strictly less somewhere in [q, 1], violating convexity of 1 − H and thus concavity of H.

I can use this result again for the “only if” portion of Proposition 1: if G(qℓ) > U(qℓ) for some qℓ ∈ [0, 1], then 1 − Ḡ(q) < 1 − U(q) ∀ q ∈ (0, 1), so G delivers lower utility for Sender than distributions that first-order stochastically dominate U. Given the assumption, Lemma 5 immediately shows the result for q ∈ [qℓ, 1), and it follows for q ∈ (0, qℓ] by applying the previous upper-bounding argument to convex functions on [0, qℓ] with endpoint (qℓ, 1 − qℓ) (instead of range [0, 1] and endpoint (1, 0)).


The next lemma describes features of 1 − H̄ when the posterior distribution H weakly improves on Sender’s utility from G1−2π:

Lemma 6. If H ≠ G1−2π is a cdf such that

1 − H̄(r∗) ≥ 1 − G1−2π(r∗)  and  ∫ q dH(q) = π,

then the slope16 of 1 − H̄ at r∗ is strictly less than the slope of 1 − G1−2π at r∗.

Proof. I first show that there is qd ∈ (r∗, 1] such that

1 − H̄(qd) ≤ 1 − H(qd) < 1 − G1−2π(qd).

Note that for H to be distinct from G1−2π, there must be some posterior qd ∈ [0, 1] where 1 − H(qd) ≠ 1 − G1−2π(qd). It cannot be the case that

1 − H(q) ≥ 1 − G1−2π(q) ∀ q ∈ [0, 1] and 1 − H(qd) > 1 − G1−2π(qd).

If that is the case, then because H is a cdf, it is right-continuous, and therefore fixing ε > 0 there is δ(ε) > 0 such that

1 − H(q′) > 1 − H(qd) − ε ∀ q′ ∈ [qd, qd + δ(ε)).

Since the slope of 1 − G1−2π is no greater than 0, setting ε ∈ (0, G1−2π(qd) − H(qd)) ensures that

1 − H(q′) > 1 − G1−2π(qd) ≥ 1 − G1−2π(q′) ∀ q′ ∈ [qd, qd + δ(ε)).

Therefore there is a non-degenerate interval where 1 − H > 1 − G1−2π, and by assumption 1 − H ≥ 1 − G1−2π everywhere on [0, 1], so integrating the inequality gives a violation of Bayes-plausibility:

∫ q dH(q) = ∫ (1 − H(q)) dq > ∫ (1 − G1−2π(q)) dq = ∫ q dG1−2π(q) = π.

Thus by contradiction there must be qd ∈ [0, 1] such that

1 − H̄(qd) ≤ 1 − H(qd) < 1 − G1−2π(qd).

By Lemma 5, since 1 − H̄(r∗) ≥ 1 − G1−2π(r∗), there is no q ∈ [0, r∗) where 1 − H(q) < 1 − G1−2π(q). Thus it must be that

1 − H(q) ≥ 1 − G1−2π(q) ∀ q ∈ [0, r∗],

and therefore qd ∈ (r∗, 1].

The claim now follows by the argument in Lemma 5. Since 1 − H̄(r∗) ≥ 1 − G1−2π(r∗) and 1 − H̄(qd) < 1 − G1−2π(qd), there is q′ ∈ [r∗, qd] where the slope of 1 − H̄ is strictly less than that of 1 − G1−2π. But since H̄ is concave, 1 − H̄ is convex and its slope cannot increase as q decreases; the slope of 1 − H̄ at r∗ must therefore be strictly less than that of 1 − G1−2π at r∗. □

16Because 1 − H̄ is convex, it is continuous on (0, 1) and its left and right derivatives are always well-defined. The function 1 − G for any UTU G is also continuous with well-defined left and right derivatives. When referring to the slope or to a tangent line I consider the right derivative.

The implication is vacuous for r∗ ≤ 1/2, where there are no posterior distributions

that meet the conditions; however, even in that case the result is central to a proof by

contradiction. With these two lemmas in hand, I now prove Proposition 2.

Proof. The proof is by contradiction. Let H be a proposed alternative posterior distribution that delivers weakly greater maxmin utility for Sender than G∗. By Lemma 1 (to define the utility from each posterior distribution) and Lemma 2 (optimality of G∗ among UTUs), it is the case that

1 − H̄(r∗) ≥ 1 − Ḡ∗(r∗) = 1 − G∗(r∗) ≥ 1 − G1−2π(r∗).

Consider the line L that is tangent to 1 − H̄ at r∗.17 Because 1 − H̄ is convex and weakly positive (recall that the line ℓ(q) = 0 is convex and lower-bounds 1 − H̄), it is lower-bounded by L+(q) = max{L(q), 0}. Furthermore, by Lemma 6, the slope of L is less than that of 1 − G1−2π, so it must be that

1 ≥ 1 − H(0) ≥ 1 − H̄(0) ≥ L+(0) > 1 − G1−2π(0) = 2π.

In Section 4.2, I showed that for any x ∈ [0, 1 − 2π], there is a corresponding UTU Gx with Gx(0) = x. Thus since 1 − L+(0) ∈ [0, 1 − 2π], there is a UTU—call it GxH—such that 1 − GxH(0) = 1 − xH = L+(0). If GxH ≠ G∗, then because G∗ is uniquely optimal among UTUs, it must be that

L+(r∗) = 1 − H̄(r∗) ≥ 1 − Ḡ∗(r∗) > 1 − ḠxH(r∗) = 1 − GxH(r∗).

Then, because L and 1 − GxH intersect at q = 0 but L is greater than 1 − GxH at q = r∗, it must be that the slope of L is strictly greater than the slope of the strictly downward-sloping portion of 1 − GxH; therefore in fact

L+(q) ≥ 1 − GxH(q) ∀ q ∈ [0, 1]  and  L+(q′) > 1 − GxH(q′) ∀ q′ ∈ (0, r∗].

17Recall that if r∗ is a kink point of 1 − H̄, I use the right derivative of 1 − H̄ to define the slope.


This relationship is shown in Figure 2. Integrating the expression and using the fact that L+ lower-bounds 1 − H̄, which in turn lower-bounds 1 − H, it is the case that

∫ q dH(q) = ∫ (1 − H(q)) dq ≥ ∫ (1 − H̄(q)) dq ≥ ∫ L+(q) dq > ∫ (1 − GxH(q)) dq = ∫ q dGxH(q) = π.

The first and penultimate equalities are both from integration by parts, and the final equality is because all UTUs (including GxH) are Bayes-plausible by construction. Therefore H violates Bayes-plausibility and is not a valid alternative distribution.

Even when GxH = G∗, it is still the case that, whenever

L+(r∗) = 1 − H̄(r∗) > 1 − Ḡ∗(r∗) = 1 − G∗(r∗),

the slope of L is greater than the slope of the strictly downward-sloping portion of 1 − G∗. In this case, L+(q) > 1 − G∗(q) ∀ q ∈ (0, r∗] and

∫ q dH(q) = ∫ (1 − H(q)) dq ≥ ∫ (1 − H̄(q)) dq ≥ ∫ L+(q) dq > ∫ (1 − G∗(q)) dq = ∫ q dG∗(q) = π,

just as before. Thus H again violates Bayes-plausibility.

If instead GxH = G∗ but now L+(r∗) = 1 − G∗(r∗), it must be the case that L and the strictly downward-sloping portion of 1 − G∗ have the same slope, so in fact

L+(q) = 1 − G∗(q) ∀ q ∈ [0, 1].

Then there are two possible cases. The first is trivial:

1 − H(q) = L+(q) = 1 − G∗(q) ∀ q ∈ [0, 1],

so that H is not a deviation at all. In the second, there must be some q ∈ [0, 1] such that 1 − H(q) > L+(q); recall that L+ lower-bounds 1 − H̄, and thus the direction of the inequality is known. Because H is a cdf, it is right-continuous, and therefore fixing ε > 0 there is δ(ε) > 0 such that

1 − H(q′) > 1 − H(q) − ε ∀ q′ ∈ [q, q + δ(ε)).

Since the slope of L+ is no greater than 0, setting ε ∈ (0, 1 − H(q) − L+(q)) ensures that

1 − H(q′) > L+(q) ≥ L+(q′) ∀ q′ ∈ [q, q + δ(ε)).


Therefore there is a non-degenerate interval where 1 − H > L+, and 1 − H ≥ L+ everywhere on [0, 1], so integrating the inequality gives

∫ q dH(q) = ∫ (1 − H(q)) dq > ∫ L+(q) dq = ∫ (1 − G∗(q)) dq = ∫ q dG∗(q) = π,

as desired. Having covered both the case GxH ≠ G∗ and the case GxH = G∗, I have shown that in all cases H violates Bayes-plausibility and therefore, by contradiction, G∗ is uniquely optimal. □


Appendix C: Sender’s Minmax Utility

In Section 3, I formulated Sender’s problem using maxmin preferences, so that the

worst-case Receiver type distribution is chosen in response to Sender’s selected posterior

distribution. In this appendix, I solve the minmax analogue of Equation (1) for the

binary-state case of Section 4. Just as the condition π ≤ 1/2 is key in determining

whether Bayes-plausibility is strict enough to produce a unique solution for Sender, the

condition r∗ ≤ 1/2 ensures the same for Nature, and leads to the following result relating

the maxmin and minmax problems:

Corollary 4. Let T and G be the sets of feasible distributions for Nature and Sender respectively, as defined in Section 3. Also let

maxG∈G { minT∈T ∫∫ 1(q > r) dG(q) dT(r) }  and  minT∈T { maxG∈G ∫∫ 1(q ≥ r) dG(q) dT(r) }

be Sender’s maxmin and minmax problems, respectively. If π ≤ 1/2 and r∗ ≤ 1/2, then Sender’s maxmin-optimal posterior distribution is also minmax-optimal.

The full proof is in Appendix C4, but I provide some intuition here. The conditions on

π and r∗ ensure that both Sender and Nature are heavily restricted in their choice of optimal distribution and uniquely choose upper-truncated uniform distributions with an atom at 0, which I call G∗ and T∗, respectively. When two distributions of this form have

the same upper and lower bounds on their support, they are best responses to each other,

so Sender’s maxmin-optimal UTU also maximizes her utility from Nature’s minmax-

optimal Receiver type distribution. Some algebra shows that in fact the conditions in

the corollary directly imply equality of both bounds. When the conditions are violated,

T ∗ and G∗ no longer need to have coinciding bounds on their support. Then Sender

may strictly improve her minmax utility by either compressing all posterior mass below

supp(T ∗) to q = 0, or moving all posterior mass above supp(T ∗) to max {supp(T ∗)};

both choices allow her to generate higher posteriors more frequently.

In Appendix C4, I also provide a partial converse in Corollary 5; when the given condi-

tion is violated, Sender’s maxmin-optimal posterior distribution is not minmax-optimal

unless π = r∗. When π > 1/2, Proposition 1 shows that many posterior distributions

are maxmin-optimal for Sender. If π = r∗, then the maxmin-optimal lower-truncated

uniform posterior distribution is also minmax-optimal.


C1: The Minmax Problem. Let the sets of feasible Receiver type and posterior belief distributions be the same as in Section 4,

T = { cdf T over [0, 1] : ∫ r dT(r) = r∗ }  and  G = { cdf G over [0, 1] : ∫ q dG(q) = π }.

Then the minmax analogue of Equation (1) is

minT∈T { maxG∈G ∫∫ 1(q ≥ r) dG(q) dT(r) }.

Given the use of a weak inequality rather than a strict one as in Equation (1), a brief note

on tie-breaking is in order. As discussed in Appendix A1, tie-breaking for and against

Sender are generally equivalent when the posterior q = 1 and Receiver type r = 1 do not

arise with strictly positive probability, since with unfavorable tie-breaking Sender can

replicate her utility with favorable tie-breaking by moving mass from some posterior q′

to q′+ ε. However, when Sender is attempting to persuade Receiver type r = 1, it is not

possible to induce posterior q = 1 + ε, so this approach fails. To accommodate the fact

that Nature “moves first” in the minmax setting, I break ties in Sender’s favor, so that

Nature cannot gain from putting strictly positive probability on r = 1 and thus altering

the structure of Sender’s best response.18

C2: The Minmax Receiver Type Distribution. I begin with an approach analogous to Lemma 1 that characterizes the max portion of the minmax problem without reference to a particular posterior distribution G. First, I rearrange the interior max problem to get a Bayesian persuasion problem:

maxG∈G ∫∫ 1(q ≥ r) dG(q) dT(r) = maxG∈G ∫[0,1] ∫[0,1] 1(q ≥ r) dT(r) dG(q)
= maxG∈G ∫[0,1] ∫[0,q] 1 dT(r) dG(q)
= maxG∈G ∫[0,1] T(q) dG(q)
= maxG∈∆([0,1]) ∫ T(q) dG(q) s.t. ∫ q dG(q) = π.

The integral in the third equality represents the mass of Receiver types weakly below q; thus it evaluates to T(q) rather than T(q) − T(0). Furthermore, the tie-breaking choice means

that there is no need to subtract the mass placed exactly on posterior q = q′ from the expression. This choice is key in obtaining a result that mirrors Lemma 1; specifically, the expression contains only the cdf T evaluated at each posterior q ∈ supp(G).

18Analogously, in the maxmin setting I break ties against Sender to avoid solutions where Sender places strictly positive mass on q = 1.

The max portion of Sender’s minmax preferences now resembles the expression in Lemma

1, so by the same argument I can write

maxG∈G ∫∫ 1(q ≥ r) dG(q) dT(r) = T̄(π).

Flipping the sign, Nature maximizes the value of the convexification of −T at π. It is

clear that this problem is exactly analogous to Sender’s maxmin problem, which was

to maximize the value of the convexification of 1 − G at r∗. Then, as in the proof

of Proposition 1, if r∗ > 1/2 then Nature can choose any distribution T that first-order stochastically dominates the uniform distribution U(r) = r. If r∗ ≤ 1/2, then as

in Proposition 2 Nature will choose an upper-truncated uniform distribution, defined

analogously to Sender’s choice but with r∗ and π interchanged. Specifically, for π ≥ 1/2,

Nature chooses

T∗(r) =
1 − 2r∗, r = 0,
1 − 2r∗ + 2r∗r, r ∈ (0, 1],

and if π < 1/2, then Nature chooses

T∗(r) =
y∗, r = 0,
y∗ + r(1 − y∗)²/(2r∗), r ∈ (0, 2r∗/(1 − y∗)],
1, r ∈ (2r∗/(1 − y∗), 1],

where y∗ = {1 − r∗/π}+.
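A quick numerical sketch (with illustrative r∗ and π, not taken from the text) confirms that both branches of T∗ satisfy the mean restriction ∫ r dT∗(r) = r∗:

```python
# Verify that both branches of Nature's minmax distribution T* have mean r*,
# using E[r] = integral of the survival function 1 - T*(r).
# Parameter values are illustrative, not from the text.
r_star = 0.3

def survival_high(r):
    # pi >= 1/2 branch: atom 1 - 2r* at 0, then T*(r) = 1 - 2r* + 2r*r.
    return 2 * r_star * (1 - r)

def survival_low(r, pi_):
    # pi < 1/2 branch: atom y* = {1 - r*/pi}+ and support top 2r*/(1 - y*).
    y = max(0.0, 1 - r_star / pi_)
    top = 2 * r_star / (1 - y)
    return (1 - y) * (top - r) / top if r < top else 0.0

n = 100_000
mean_high = sum(survival_high((i + 0.5) / n) for i in range(n)) / n
mean_low = sum(survival_low((i + 0.5) / n, 0.4) for i in range(n)) / n
assert abs(mean_high - r_star) < 1e-4
assert abs(mean_low - r_star) < 1e-4
print("both branches of T* have mean r* =", r_star)
```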

C3: Indifference between Binary-Support Posterior Distributions. It is clear

that, by the earlier concavification argument, Sender can always attain her minmax-

optimal utility from choosing a particular binary-support distribution G ∈ G with

supp(G) ⊂ supp(T ∗). However, given the shape of T ∗, Sender may in fact be indifferent

between all such distributions. To show this result, consider an arbitrary binary-support

posterior distribution G satisfying supp(G) = {q1, q2} ⊂ supp(T∗). Let G place mass m on q = q1 and mass 1 − m on q = q2; by Bayes-plausibility,

m q1 + (1 − m) q2 = π  ⇔  m = (q2 − π)/(q2 − q1).

Assuming that r∗ ≤ 1/2, so that T ∗ is unique, I can explicitly solve for Sender’s utility

when the Receiver type distribution is T ∗ and the posterior distribution is G. When


π ≥ 1/2, her utility is

uS = (2r∗ − 2r∗q1) · (q2 − π)/(q2 − q1) + (2r∗ − 2r∗q2) · (1 − (q2 − π)/(q2 − q1)) = 2r∗(1 − π),

which is independent of both q1 and q2. Likewise, if π < 1/2, her utility is

uS = (1 − y∗ − q1(1 − y∗)²/(2r∗)) · (q2 − π)/(q2 − q1) + (1 − y∗ − q2(1 − y∗)²/(2r∗)) · (1 − (q2 − π)/(q2 − q1)) = (1 − y∗)(2r∗ − π(1 − y∗))/(2r∗).

Again, the expression is independent of q1 and q2. Thus when r∗ ≤ 1/2 Sender is

indifferent between all binary-support distributions G with supp(G) ⊂ supp(T ∗). Since

some distribution of this type is minmax-optimal for Sender, any such distribution must

be minmax-optimal. By mixing uniformly over these distributions, Sender can in fact

choose a distribution in the same style as Nature’s, with the mass of the atom at q = 0

determined by Bayes-plausibility (i.e., a UTU, as defined in Section 4.2); the indifference

property above means that such a distribution is minmax-optimal.
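The indifference computation can be replicated numerically for the π ≥ 1/2 branch, using the same weights as in the expression for uS above; the values of r∗, π, and the binary supports are illustrative assumptions, not from the text:

```python
# Numerical version of the indifference computation: Sender's payoff against
# T*, computed with the 2r*(1 - q) weights used in the text, is the same for
# every Bayes-plausible binary support {q1, q2}. Illustrative values only.
r_star, pi_ = 0.3, 0.6   # pi >= 1/2 branch

def u_S(q1, q2):
    m = (q2 - pi_) / (q2 - q1)            # mass on q1 from Bayes-plausibility
    w = lambda q: 2 * r_star * (1 - q)    # weight attached to each posterior
    return w(q1) * m + w(q2) * (1 - m)

target = 2 * r_star * (1 - pi_)
for q1, q2 in [(0.0, 1.0), (0.2, 0.9), (0.5, 0.7)]:
    assert abs(u_S(q1, q2) - target) < 1e-12
print("indifferent across binary supports; u_S =", target)
```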

C4: Minmax-Optimal Posterior Distributions. Having established properties of

the minmax Receiver type distribution and minmax-optimal posterior distribution, I

now prove Corollary 4, as well as its partial converse Corollary 5. To do so, I consider

four possible cases based on how each of π and r∗ compares to 1/2. These cases cover

all possible values of those model parameters. Throughout I let T ∗ denote the minmax

Receiver type distribution and G∗ denote the maxmin-optimal posterior distribution.

(1) If π > 1/2 and r∗ > 1/2, then (by Proposition 1) Sender’s maxmin-optimal

posterior distribution is any G∗ ∈ G that first-order stochastically dominates

the uniform distribution U [0, 1], and the analogous result is true for T ∗ as well.

It need not be the case that Sender’s maxmin-optimal posterior distribution is

minmax-optimal. Even if both G∗ and T ∗ are lower-truncated uniform distribu-

tions, if π < r∗ then min {supp(G∗)} < min {supp(T ∗)} since Bayes-plausibility

is stricter than the mean restriction. But then any q ∈ (0,min {supp(T ∗)})

convinces no Receivers, so G∗ is not minmax-optimal for Sender.

(2) If π > 1/2 and r∗ ≤ 1/2, then Sender’s maxmin-optimal posterior distribution

is unchanged from case (1), but Nature’s optimal Receiver type distribution is

an upper-truncated uniform with an atom at r = 0. If G∗ is a lower-truncated

uniform distribution, any posterior q ∈ (max {supp(T ∗)} , 1) convinces all Re-

ceivers, so Sender can improve her minmax utility by moving all of that mass to

max {supp(T ∗)} and generating high posteriors more frequently. Since Sender


cannot improve her maxmin utility by choosing another shape for G∗, no other

maxmin-optimal distribution G∗ can be minmax-optimal.

(3) If π ≤ 1/2 and r∗ > 1/2, then Sender’s maxmin-optimal posterior distribution

is a UTU, while Nature’s optimal Receiver type distribution is any T∗ ∈ T

that first-order stochastically dominates U [0, 1]. Therefore all choices of T ∗ have

min {supp(T ∗)} > 0. Since any q ∈ (0,min {supp(T ∗)}) convinces no Receivers,

a minmax-optimizing Sender would be better off by moving mass from that

interval to q = 0; thus G∗ is not minmax-optimal for any T ∗.

(4) If π ≤ 1/2 and r∗ ≤ 1/2, then both Sender’s and Nature’s optimal distributions are upper-truncated uniforms with atoms at 0. The supports of both distributions are the same. When r∗ < π, 1 − π/r∗ < 0, so x∗ = 0 and

max{supp(G∗)} = 2π/(1 − 0) = 2π,

while 1 − r∗/π > 0, so y∗ = 1 − r∗/π and

max{supp(T∗)} = 2r∗/(1 − (1 − r∗/π)) = 2π.

Analogously, when r∗ > π, max {supp(G∗)} = max {supp(T ∗)} = 2r∗, and of

course when r∗ = π the Bayes-plausibility constraint and mean restriction are

exactly the same, so (abusing notation slightly) G∗ = T ∗. Therefore in all cases

a minmax-optimizing Sender can choose a uniform mixture over binary-support

posterior distributions G ∈ G satisfying supp(G) ⊂ supp(T ∗); that mixture

generates G∗. Thus regardless of the relationship between r∗ and π, Sender’s

maxmin-optimal posterior distribution is also minmax-optimal.
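The support-matching arithmetic in case (4) can be sketched directly, with illustrative (π, r∗) pairs satisfying π ≤ 1/2 and r∗ ≤ 1/2 (assumptions, not from the text):

```python
# Check the support-matching arithmetic of case (4): the top of supp(G*) from
# x* = {1 - pi/r*}+ equals the top of supp(T*) from y* = {1 - r*/pi}+.
# Parameter pairs are illustrative, not from the text.
def top_G(pi_, r_star):
    x = max(0.0, 1 - pi_ / r_star)       # Sender's atom x*
    return 2 * pi_ / (1 - x)

def top_T(pi_, r_star):
    y = max(0.0, 1 - r_star / pi_)       # Nature's atom y*
    return 2 * r_star / (1 - y)

for pi_, r_star in [(0.4, 0.3), (0.3, 0.4), (0.35, 0.35)]:
    assert abs(top_G(pi_, r_star) - top_T(pi_, r_star)) < 1e-12
    # Both tops equal 2 * max(pi, r*), as in the text's computation.
    assert abs(top_G(pi_, r_star) - 2 * max(pi_, r_star)) < 1e-12
print("supports coincide in case (4)")
```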

Case (4) proves Corollary 4; I now state a partial converse:

Corollary 5. If π > 1/2 or r∗ > 1/2, and π ≠ r∗, then Sender’s maxmin-optimal posterior distribution is not minmax-optimal.

This result completes the characterization of when the maxmin problem has a saddle-

point solution. It is not a “full” converse because, in the case π = r∗ > 1/2, the minmax-

optimality of Sender’s maxmin-optimal posterior distribution depends on which maxmin-

optimal posterior distribution is chosen (recall that, by Proposition 1, the maxmin-

optimal posterior distribution is not unique in this case).


Proof. Cases (2) and (3) show that if only one of π or r∗ exceeds 1/2, then no maxmin-

optimal posterior distribution is minmax-optimal. Case (1) provides the rest of the

result. If π = r∗ > 1/2, it is possible for G∗ and T ∗ to be lower-truncated uniform

distributions with coinciding supports. Then, by the argument of case (4), Sender’s

maxmin-optimal lower-truncated uniform posterior distribution is also minmax-optimal.

However, clearly both Sender and Nature can choose other optimal distributions, and

it may no longer be that a maxmin-optimal G∗ is also minmax-optimal (e.g., if their

supports are no longer equal) even though Sender’s maxmin and minmax utilities for the

corresponding optimal posterior distributions remain equal.19 The equality of utilities

fails in case (1) if π ≠ r∗, since (by inspecting the explicit solution for T∗ in Appendix

C2) Sender’s minmax utility is strictly increasing in π for any r∗, while Sender’s maxmin

utility is constant in π when π > 1/2. Therefore outside of case (4), Sender’s maxmin

utility may equal her minmax utility only when π = r∗, and in that case the equality

only holds for particular choices of G∗ and T∗. □

As is the case with all other results, this argument can be extended to the case where Receiver types lie in a general interval [α, β] ⊆ [0, 1]. In that case, Sender’s maxmin-optimal posterior distribution is certainly minmax-optimal if 2π ≤ α + β and 2r∗ ≤ α + β,

it may be minmax-optimal (depending on the exact forms of G∗ and T ∗) if π = r∗, and

it is not minmax-optimal in any other case.

19An exact characterization of which pairs (G∗, T ∗) imply that G∗ is minmax-optimal is challenging,

since it depends on which binary-support posterior distributions produce equal and optimal utility for

Sender under T ∗ and whether those distributions can be mixed to form G∗. Having the same support

is a necessary but insufficient condition. For example, if T ∗ is log-concave with π small enough, then

there is a unique minmax-optimal binary-support posterior distribution for Sender while G∗ need not

have binary support.


Appendix D: Proofs for Section 5

Here I include omitted proofs for the continuous-state case described in Section 5.

D1: Properties of DTUs. To begin, I describe DTUs in more detail. The uniform

portion of the DTU (between the lower and upper truncations) has slope β, which I

refer to as the slope of the DTU. The line L(q) = βq + y, which forms that uniform

portion, intersects the vertical axis at y; I refer to this value as the intercept of the

DTU. To derive a relationship between β, y, and ℓ, I use the fact that Bayes-plausibility

requires that ∫ q dG(q) = π. This condition immediately imposes the restriction that ℓ ∈ [0, π]; using simple geometry to compute the integral of a DTU’s cdf and set it equal to 1 − π (as in Section 4.2) shows that

β(ℓ, y) = ((π − ℓy) − √((π − ℓy)² − ℓ²(1 − y)²)) / ℓ².

This expression is continuously differentiable for ℓ ∈ (0, π] and y ∈ [0, 1). Fixing ℓ,

β(ℓ, y) is injective and decreasing in y. Fixing y, β(ℓ, y) is injective and increasing in

ℓ, attaining a maximum of β(π, y) = (1 − y)/π. While β(0, y) is not defined using the

expression above, the limit from the right exists:

lim_{ℓ→0⁺} β(ℓ, y) = lim_{ℓ→0⁺} (1 − y)² / [(π − yℓ) + √((π − yℓ)² − ℓ²(1 − y)²)] = (1 − y)²/(2π).
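The boundary values of the slope can also be checked numerically; this small sketch (with illustrative π and y, and a helper name of my choosing) evaluates the closed form near ℓ = 0 and at ℓ = π.

```python
import math

def beta(l, y, pi_):
    # Closed form for the DTU slope; the max() guards fp rounding at the boundary.
    disc = max((pi_ - y * l) ** 2 - l ** 2 * (1 - y) ** 2, 0.0)
    return ((pi_ - y * l) - math.sqrt(disc)) / l ** 2

pi_, y = 0.4, 0.1
# Near l = 0 the slope approaches (1 - y)^2 / (2 pi).
print(beta(1e-4, y, pi_), (1 - y) ** 2 / (2 * pi_))
# At l = pi the slope attains its maximum (1 - y) / pi.
print(beta(pi_, y, pi_), (1 - y) / pi_)
```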

I thus define β(0, y) = (1 − y)²/(2π) explicitly. For y ∈ [0, 1 − 2π], β(0, y) is the slope of the UTU with intercept y. When y > 1 − 2π, there is no corresponding UTU; instead, the lower bound of interest is β(ℓ, y) = 1 − y, the slope that satisfies G(1) = 1.[20] The assumption y > 1 − 2π implies 1 − y ∈ ((1 − y)²/(2π), (1 − y)/π), so the lower bound is attained at an interior ℓ ∈ (0, π); I call this value ℓ^min_y. Because the function β(ℓ, y) − (1 − y) is continuously differentiable, the Implicit Function Theorem ensures that I can write ℓ^min_y as a continuously differentiable function of y.

The concavification of a DTU is easy to compute: so long as β(ℓ, y) is weakly less than the slope of the line through (0, 0) and (ℓ, β(ℓ, y)ℓ + y), the concavification will be

Ḡ^ℓ_y(q) = { ((β(ℓ, y)ℓ + y)/ℓ) q   if q ∈ [0, ℓ),
             G^ℓ_y(q)               if q ∈ [ℓ, 1].

That condition is simply

(β(ℓ, y)ℓ + y)/ℓ = β(ℓ, y) + y/ℓ ≥ β(ℓ, y),

[20] This is the desired lower bound because any cdf H over [0, 1] must satisfy H(1) = 1, and I wish to use DTUs to upper-bound other feasible probability distributions.


which always holds since y ≥ 0 and ℓ ≥ 0. Thus the concavification of a DTU is

composed of two upward-sloping line segments with a kink at ℓ and a constant line

segment in the region of the upper truncation.
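This two-segment shape can be confirmed numerically. The sketch below (illustrative parameters; the helper names are mine) builds the concavified cdf and checks concavity via second central differences on a grid.

```python
import math

def beta(l, y, pi_):
    # Closed form for the DTU slope; the max() guards fp rounding at the boundary.
    disc = max((pi_ - y * l) ** 2 - l ** 2 * (1 - y) ** 2, 0.0)
    return ((pi_ - y * l) - math.sqrt(disc)) / l ** 2

def conc(q, l, y, pi_):
    """Concavification: chord of slope (beta*l + y)/l on [0, l), then min(beta*q + y, 1)."""
    b = beta(l, y, pi_)
    return ((b * l + y) / l) * q if q < l else min(b * q + y, 1.0)

pi_, l, y = 0.5, 0.2, 0.05
grid = [i / 500 for i in range(501)]
vals = [conc(q, l, y, pi_) for q in grid]
# Concavity: second central differences are (numerically) non-positive everywhere.
max_second_diff = max(vals[i - 1] - 2 * vals[i] + vals[i + 1] for i in range(1, 500))
print(max_second_diff)  # should be <= ~1e-12
```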

D2: y-Optimal DTUs. Given a value of the mean Receiver type r∗ and a fixed intercept y, I show the existence of a well-defined and unique DTU that provides Sender's highest utility among all DTUs with an intercept of y. Since y is fixed, for this section I drop the dependence on y from all functions.

Lemma 7. Given r∗ ∈ (0, 1) and y ∈ [0, 1), there is a well-defined DTU G∗_y with lower truncation length ℓ∗_y that maximizes Sender's utility among all Bayes-plausible DTUs with intercept y.

Proof. Let V_y ⊂ [0, π] be the set of ℓ such that a DTU with lower truncation ℓ and intercept y is Bayes-plausible. I first show that V_y is closed; since it is clearly also bounded, V_y is therefore compact. To do so, I define the function

v(x, ℓ) = ∫_0^x F(q) dq − ∫_0^x G^ℓ_y(q) dq

for some DTU G^ℓ_y with intercept y and lower truncation ℓ. This function captures the value of the Bayes-plausibility integral constraint for G^ℓ_y at x ∈ [0, 1]. Clearly v(0, ℓ) = 0, and v(1, ℓ) = 0 because E[F] = E[G^ℓ_y] = π.
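To make the constraint concrete, the sketch below pairs a DTU with a hypothetical Beta(2, 2) prior (cdf 3q² − 2q³, mean π = 1/2). The prior, the parameter values, and the helper names are illustrative assumptions, not from the paper; the endpoint facts v(1, ℓ) ≈ 0 and the sign test for plausibility are as stated in the text.

```python
import math

def beta(l, y, pi_):
    # Closed form for the DTU slope; the max() guards fp rounding at the boundary.
    disc = max((pi_ - y * l) ** 2 - l ** 2 * (1 - y) ** 2, 0.0)
    return ((pi_ - y * l) - math.sqrt(disc)) / l ** 2

def F(q):
    return 3 * q ** 2 - 2 * q ** 3  # hypothetical Beta(2,2) prior cdf, mean 1/2

def G(q, l, y, pi_):
    return 0.0 if q < l else min(beta(l, y, pi_) * q + y, 1.0)

def v(x, l, y, pi_, n=20_000):
    """Midpoint-rule value of the Bayes-plausibility integral constraint at x."""
    h = x / n
    return sum((F((i + 0.5) * h) - G((i + 0.5) * h, l, y, pi_)) * h for i in range(n))

pi_, l, y = 0.5, 0.2, 0.05
print(v(1.0, l, y, pi_))  # ~0: both cdfs integrate to 1 - pi_
worst = min(v(x / 100, l, y, pi_) for x in range(1, 100))
print(worst)  # negative here, so this particular DTU is not Bayes-plausible for this F
```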

At any x, the integral of G^ℓ_y on [0, x] is continuous in ℓ. This result is obvious for x ≠ ℓ (since G^ℓ_y(q) is continuous in ℓ at those points) and holds for x = ℓ because the left and right limits of the integral as ℓ → x are both 0. Therefore v(x, ℓ) is also continuous in ℓ for fixed x, since it depends on ℓ only through that integral. If G^ℓ_y is not Bayes-plausible, then (since it satisfies E[G^ℓ_y] = π by construction) there must be some x_neg ∈ (0, 1) for which v(x_neg, ℓ) < 0. Because v(x_neg, ℓ) is continuous in ℓ, there is ε > 0 such that for any ℓ′ in an ε-neighborhood of ℓ, v(x_neg, ℓ′) < 0. Therefore no such G^{ℓ′}_y is Bayes-plausible, so U ⊂ [0, π], the set of ℓ where Bayes-plausibility fails, is open. Since V_y = [0, π] \ U, it must be that V_y is closed.

By Lemma 1, Sender's utility from a DTU is given by

u_S(r∗, ℓ) = 1 − { β(ℓ) r∗ + y        if ℓ < r∗,
                   (β(ℓ) + y/ℓ) r∗    if ℓ ≥ r∗.

This function is continuous in ℓ on [0, π]. Since β(ℓ) is continuous in ℓ on [0, π], each of the two piecewise portions of u_S is clearly continuous in ℓ; it remains only to check the case ℓ = r∗. But because the left and right limits as ℓ → r∗ exist (by continuity of each piecewise portion) and are equal (by construction of u_S), u_S is continuous at ℓ = r∗ as well. Therefore the image of V_y under u_S must be compact, and thus contains a well-defined maximum, which is attained by some (possibly multiple) ℓ ∈ V_y. □
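The continuity claim at ℓ = r∗ can be checked numerically. The sketch below evaluates the two piecewise branches of u_S just below and just above ℓ = r∗; the parameter values and helper names are my illustrative assumptions, and the piecewise form is the one displayed above.

```python
import math

def beta(l, y, pi_):
    # Closed form for the DTU slope; the max() guards fp rounding at the boundary.
    disc = max((pi_ - y * l) ** 2 - l ** 2 * (1 - y) ** 2, 0.0)
    return ((pi_ - y * l) - math.sqrt(disc)) / l ** 2

def u_S(r, l, y, pi_):
    """Sender's utility from the DTU: 1 - (beta*r + y) below r, 1 - (beta + y/l)*r above."""
    b = beta(l, y, pi_)
    return 1 - (b * r + y) if l < r else 1 - (b + y / l) * r

pi_, y, r = 0.4, 0.1, 0.25
left = u_S(r, r - 1e-7, y, pi_)
right = u_S(r, r + 1e-7, y, pi_)
print(left, right)  # the two one-sided values agree (~0.6 at these parameters)
```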

I now provide the formal proof of Lemma 3, which gives sufficient conditions for when G∗_y is slope-minimizing among all DTUs with intercept y:

Proof. Fix y ∈ [0, 1). By the proof of Lemma 7, the set V_y of values of ℓ such that G^ℓ_y is Bayes-plausible is closed, and the function β(ℓ, y) is continuous and strictly monotonic in ℓ for fixed y, so there is a unique ℓ^sm ∈ V_y such that β(ℓ^sm, y) = inf_{ℓ∈V_y} β(ℓ, y).

Now I show that either of the conditions provided in the lemma is sufficient for the slope-minimizing DTU to be optimal. First fix y = 0. Then β(ℓ) + y/ℓ = β(ℓ), so u_S(r∗, ℓ) = 1 − G^ℓ_0(r∗); that is, there is no kink at ℓ in Sender's utility from DTUs with intercept 0. Thus Sender's utility from G^ℓ_0 is strictly greater than her utility from G^{ℓ′}_0 if and only if β(ℓ) < β(ℓ′). By Lemma 7, there exists a DTU G∗_0 with lower truncation length ℓ∗_0 that maximizes Sender's utility among all Bayes-plausible DTUs with intercept 0. No other Bayes-plausible DTU can have a strictly smaller slope, since then it would deliver a strictly higher utility. But no other Bayes-plausible DTU can have the same slope, β(ℓ∗_0), since there can be no ℓ′ ≠ ℓ∗_0 with β(ℓ′) = β(ℓ∗_0). Therefore all other Bayes-plausible DTUs have strictly larger slope, and so G∗_0 satisfies both (1) and (2).

If instead r∗ ∈ [π, 1), then similarly u_S(r∗, ℓ) = 1 − G^ℓ_y(r∗), since ℓ ∈ [0, π] ensures that r∗ lies weakly above ℓ. The argument is then the same: a DTU is utility-maximizing if and only if it is slope-minimizing, Lemma 7 guarantees the existence of a utility-maximizing DTU, and the injectivity of the map from ℓ to β(ℓ) guarantees uniqueness. □

D3: Overall-Optimal DTUs. Let U_r∗ be the set of utilities attained by any y-optimal DTU:

U_r∗ = { u_S(r∗, ℓ, y) | G^ℓ_y = G∗_y for some y ∈ [0, 1) },

where I restore the dependence on y in u_S, since y is no longer fixed. That set is a subset of [0, 1], and is therefore bounded, so sup U_r∗, Sender's supremum utility over all y-optimal DTUs (and thus over all DTUs), is well-defined and contained in the closure of U_r∗. Further restrictions on F and r∗ provide sufficient conditions for U_r∗ to be closed, and thus for the maximum to exist. In order to state these sufficient conditions, I first prove Lemma 4. In this proof, I again drop the dependence on y from all functions since y is fixed, but note important changes in the argument for different values of y.


Proof. By Lemma 3, there exists a unique minimal-slope Bayes-plausible DTU with intercept y.

Because of the shape of F, the equation L(q) = F(q) has at most two solutions with q ∈ (0, 1]. In particular, if the slope of L is such that it lies completely above F in (0, 1], then there are no solutions in that interval; if the slope of L is such that it is tangent to F, then there is one;[21] and if the slope of L is less than that of the tangent to F through y, there are two.

Consider a DTU G^ℓ_y with slope β(ℓ). If L(q) ≥ F(q) ∀ q ∈ (0, 1] (that is, L is either tangent to F at a point q_t or lies entirely above F), then this DTU satisfies Bayes-plausibility. The function v(x, ℓ), which gives the value of the Bayes-plausibility integral constraint for G^ℓ_y at some x ∈ [0, 1], is weakly decreasing whenever L(x) ≥ F(x).[22] Thus v(x, ℓ) is weakly decreasing for all x ∈ (ℓ, 1). Since v(1, ℓ) = 0, it must therefore be that v(x, ℓ) ≥ 0 ∀ x ∈ (ℓ, 1); of course v(x, ℓ) ≥ 0 ∀ x ∈ [0, ℓ], so in fact v(x, ℓ) ≥ 0 everywhere in [0, 1] and Bayes-plausibility is satisfied.

If instead L intersects F twice in (0, 1], then the argument is more subtle. In particular, let q_1 be the smallest q ∈ (0, 1] such that β(ℓ) q + y = F(q), and let q_2 be the largest.[23] By the Implicit Function Theorem, since the function β(ℓ) q + y − F(q) is continuously differentiable in all variables, I can write q_1 and q_2 as continuous functions of ℓ. Note that because of this definition, q_1 and q_2 are both well-defined (and satisfy q_1 = q_2) if β(ℓ) q + y is tangent to F, as well as for all smaller values of ℓ.

If q_1(ℓ) > ℓ, then G^ℓ_y(q) < F(q) for q ∈ (0, ℓ) ∪ (q_1, q_2), but G^ℓ_y(q) > F(q) for q ∈ [ℓ, q_1) ∪ (q_2, 1) (there is equality at q ∈ {0, q_1, q_2, 1}). Therefore if

v(q_1(ℓ), ℓ) = ∫_0^{q_1(ℓ)} F(q) dq − ∫_0^{q_1(ℓ)} G^ℓ_y(q) dq ≥ 0    (2)

then v(x, ℓ) ≥ 0 ∀ x ∈ [0, 1] and Bayes-plausibility is satisfied. Given the relationship between G^ℓ_y and F, and what it implies about the increasing and decreasing behavior of v(x, ℓ), it is clear that

v(q_1(ℓ), ℓ) = min_{q∈(0,1)} v(q, ℓ).

Therefore if a DTU violates Bayes-plausibility, it must be because v(x, ℓ) < 0 for some x ∈ (0, 1), which in turn implies that v(q_1(ℓ), ℓ) < 0. Thus Equation (2) is a necessary and sufficient condition for a DTU to be Bayes-plausible so long as q_1(ℓ) is well-defined. Furthermore, if the inequality is strict for some ℓ, then because v(q_1(ℓ), ℓ) is continuous in ℓ, it is also strict for ℓ − ε. Finally, if q_2(ℓ) = 1, then either y = 1 − 2π and ℓ = 0, or y ∈ (1 − 2π, 1) and ℓ = ℓ^min_y. In the former case, I will show that any UTU intersects F twice, so it must be that q_1(ℓ) < 1. Thus v(x, ℓ) is strictly increasing in (q_1, q_2) and is negative at x = q_1, so G^ℓ_y is not Bayes-plausible. In the latter case, if β(ℓ^min_y) q + y intersects F twice, then the same argument applies and G^ℓ_y is not Bayes-plausible. Thus if q_1 ≠ q_2 for some Bayes-plausible DTU, it must be that q_1 < q_2 < 1.

[21] There is at most one value of ℓ such that β(ℓ) q + y is tangent to F in (0, 1].
[22] When L(x) > 1, G^ℓ_y(x) = 1 rather than following L(x), but since the horizontal line at height 1 is an upper bound on F as well, the upper truncation does not affect the behavior of v(x, ℓ).
[23] Clearly, given the shape of F, F(q) > L(q) in the interval (q_1, q_2).

If q_1(ℓ) ≤ ℓ, then G^ℓ_y(q) < F(q) ∀ q ∈ (0, q_1) ∪ (q_1, q_2), with equality at q_1 only if q_1(ℓ) = ℓ. It is then clear that v(x, ℓ) > 0 on (0, q_2), and since v(x, ℓ) is strictly decreasing on (q_2, 1) with v(1, ℓ) = 0, it must be that v(x, ℓ) > 0 ∀ x ∈ (0, 1); thus G^ℓ_y satisfies Bayes-plausibility.

However, as ℓ → 0, it cannot be that q_1(ℓ) ≤ ℓ. To see why, assume that for some valid ℓ_i, L intersects F twice (so that q_1 and q_2 are distinct and well-defined). Then, for ℓ ∈ [0, ℓ_i], the function β(ℓ) q + y will intersect F twice. If y > 0, then because F(0) = 0 there is ε > 0 so that β(ℓ) q + y lies strictly above F in [0, ε) for any valid choice of ℓ; thus q_1(ℓ) > ε. If instead y = 0, then because f(0) < 1/(2π), it must be that for any ℓ, there is ε > 0 small enough that F(ε) < ε/(2π) ≤ β(ℓ) ε by convexity of F. Thus it is again true that q_1(ℓ) > ε. In either case, taking ℓ < ε[24] ensures that ℓ < q_1(ℓ).

This result helps show that the cases where q_1(ℓ) ≤ ℓ tend not to generate the lowest Bayes-plausible slope. Note that whenever q_1(ℓ) ≤ ℓ, it is true that β(ℓ) ℓ + y ≤ F(ℓ); thus the latter is a necessary condition for the former. Since β(ℓ) ℓ + y is continuous in ℓ, if there is some point where β(ℓ) ℓ + y ≤ F(ℓ), then I can use the result above about small ℓ to apply the Intermediate Value Theorem and find a value of ℓ ∈ (0, π] where β(ℓ) ℓ + y = F(ℓ) but ℓ − ε < q_1(ℓ − ε) for any ε > 0 sufficiently small. Furthermore, I can show that G^{ℓ−ε}_y is Bayes-plausible for ε sufficiently small. When q_1(ℓ) = ℓ, it must be that v(q_1(ℓ), ℓ) > 0 since G^ℓ_y(q) < F(q) ∀ q ∈ (0, ℓ). By continuity of v(q_1(ℓ), ℓ) in ℓ, it must be that v(q_1(ℓ − ε), ℓ − ε) > 0 if ε is sufficiently small. Since ℓ − ε < q_1(ℓ − ε), Equation (2) is a necessary and sufficient condition for Bayes-plausibility of G^{ℓ−ε}_y, and therefore G^{ℓ−ε}_y is Bayes-plausible and has a smaller slope than G^ℓ_y.

Having established sufficient conditions for when Bayes-plausibility is satisfied, I can now use them to obtain a more precise characterization of ℓ∗_y. I begin with the case y ∈ [0, 1 − 2π] and show that ℓ∗_y satisfies v(q_1(ℓ∗_y), ℓ∗_y) = 0. When y ∈ [0, 1 − 2π], the lowest permissible slope for a DTU is (1 − y)²/(2π), the slope of the UTU with

[24] Of course, this choice may not be valid for y > 1 − 2π, since the lower bound on the set of valid ℓ is strictly above ℓ = 0; if so, I cannot rule out that q_1(ℓ) ≤ ℓ for the minimum permissible ℓ.


intercept y. Therefore the line L(q) = ((1 − y)²/(2π)) q + y must intersect F twice in (0, 1]. Otherwise the UTU given by G(q) = min{L(q), 1} would lie weakly above F on the interval [0, 1] and strictly above F on some measurable subset of [0, 1]; thus G could not have the same mean as F, contradicting the construction of all UTUs. Furthermore, the line L(q) = ((1 − y)/π) q + y lies everywhere above F (by the same reasoning: this line corresponds to the maximum permissible slope for a DTU, and thus must lie above F for the mean of that DTU to equal the mean of F). Therefore by continuity of β(ℓ) in ℓ and continuity of f, there exists a value ℓ_t ∈ (0, π) where the line L(q) = β(ℓ_t) q + y is tangent to F. The point of tangency must be interior, as β(ℓ_t) · 1 + y = 1 only if ℓ_t = 0, in which case the line β(ℓ_t) q + y forms part of a UTU and (as argued above) cannot be tangent to F. Therefore, for ε > 0 sufficiently small, ℓ_t − ε > 0, the line β(ℓ_t − ε) q + y intersects F twice, and both intersections are bounded strictly below 1. As shown when discussing the case q_1(ℓ) ≤ ℓ above, the constraint in Equation (2) does not bind for G^{ℓ_t}_y, so it does not bind for G^{ℓ_t−ε}_y, and the latter DTU is therefore Bayes-plausible. Thus it must be that for the y-optimal DTU G∗_y, the line β(ℓ∗_y) q + y intersects F twice in (0, 1). As shown earlier, given this double-intersection property and that y ∈ [0, 1 − 2π], it cannot be that q_1(ℓ∗_y) ≤ ℓ∗_y. Therefore the necessary and sufficient condition for Bayes-plausibility in Equation (2) applies, and implies that either v(q_1(ℓ∗_y), ℓ∗_y) = 0 or v(q_1(ℓ∗_y), ℓ∗_y) > 0. To show that the first property holds, consider the UTU corresponding to ℓ = 0. It is not Bayes-plausible[25] and intersects F twice, so it must be that v(q_1(0), 0) < 0. Because v(q_1(ℓ), ℓ) is a continuous function of ℓ that takes both positive and negative values for ℓ ∈ [0, π], the Intermediate Value Theorem implies that there is a well-defined minimum value of ℓ, which I call ℓ_m, for which v(q_1(ℓ_m), ℓ_m) = 0. Since v(q_1(ℓ), ℓ) < 0 for any ℓ < ℓ_m, and I have shown that v(q_1(ℓ∗_y), ℓ∗_y) ≥ 0, it must therefore be that ℓ∗_y = ℓ_m, and therefore that v(q_1(ℓ∗_y), ℓ∗_y) = 0.
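The characterization suggests a simple numerical search: scan ℓ upward and take the first value whose DTU passes the Bayes-plausibility check, which approximates the minimal ℓ at which v(q_1(ℓ), ℓ) turns nonnegative. The sketch below does this for a hypothetical Beta(2, 2) prior; all names, grids, and parameter values are my illustrative assumptions.

```python
import math

def beta(l, y, pi_):
    # Closed form for the DTU slope; the max() guards fp rounding at the boundary.
    disc = max((pi_ - y * l) ** 2 - l ** 2 * (1 - y) ** 2, 0.0)
    return ((pi_ - y * l) - math.sqrt(disc)) / l ** 2

def F(q):
    return 3 * q ** 2 - 2 * q ** 3  # hypothetical Beta(2,2) prior cdf, mean 1/2

def G(q, l, y, pi_):
    return 0.0 if q < l else min(beta(l, y, pi_) * q + y, 1.0)

def min_v(l, y, pi_, n=400):
    """Minimum over x of v(x, l) on a grid, via a running midpoint sum."""
    h = 1.0 / n
    acc, worst = 0.0, 0.0
    for i in range(n):
        q = (i + 0.5) * h
        acc += (F(q) - G(q, l, y, pi_)) * h
        worst = min(worst, acc)
    return worst

pi_, y = 0.5, 0.05
# Smallest l (on a coarse grid, with a loose numerical tolerance) whose DTU
# passes the check v(x) >= 0 for all x.
l_star = next(l / 1000 for l in range(1, 1001) if min_v(l / 1000, y, pi_) >= -1e-4)
print(l_star)
```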

Next, I show that if y ∈ (1 − 2π, 1), then either ℓ∗_y = ℓ^min_y or v(q_1(ℓ∗_y), ℓ∗_y) = 0. Assume that β(ℓ^min_y) q + y intersects F twice; otherwise clearly G^{ℓ^min_y}_y is Bayes-plausible and ℓ∗_y = ℓ^min_y. Assume also that the smallest ℓ for which v(q_1(ℓ), ℓ) = 0, which I label ℓ^0_y,[26] satisfies ℓ^0_y > ℓ^min_y; otherwise clearly G^{ℓ^0_y}_y is both Bayes-plausible and slope-minimizing, so again ℓ∗_y = ℓ^min_y. If ℓ∗_y ∈ (ℓ^min_y, ℓ^0_y), then it must be that β(ℓ∗_y) q + y intersects F twice, because β(ℓ^0_y) q + y does. By the definition of ℓ^0_y, v(q_1(ℓ∗_y), ℓ∗_y) ≠ 0. Clearly that expression

[25] Any UTU with y > 0 has an atom at 0 while F does not. If y = 0, the restriction that f(0) < 1/(2π) ensures that the UTU is not Bayes-plausible, since there is ε > 0 such that the UTU places more mass in the interval [0, ε] than does F.
[26] If no such ℓ exists, I let ℓ^0_y = π.


cannot be strictly positive, or by continuity there would be ε > 0 small enough so that ℓ∗_y − ε is both a valid choice of ℓ (i.e., greater than ℓ^min_y) and generates a Bayes-plausible DTU. It must therefore be strictly negative, which means that q_1(ℓ∗_y) ≤ ℓ∗_y; otherwise G∗_y would not be Bayes-plausible. But then the proof that q_1(ℓ) ≤ ℓ cannot occur for small ℓ implies that there is ε > 0 small enough so that ℓ∗_y − ε > ℓ^min_y and G^{ℓ∗_y−ε}_y is Bayes-plausible, which contradicts the slope-minimizing property of ℓ∗_y. Thus it cannot be true that ℓ∗_y ∈ (ℓ^min_y, ℓ^0_y), so it must be that either ℓ∗_y = ℓ^min_y or ℓ∗_y = ℓ^0_y; the latter implies that v(q_1(ℓ∗_y), ℓ∗_y) = 0. □

Using this characterization, I can prove a sufficient condition on F for U_r∗, the set of utilities attained by y-optimal DTUs, to be compact, and thus for Sender to have a well-defined overall-optimal DTU:

Lemma 8. Let r∗ ∈ [π, 1] and f(1) > 0. Then Sender’s maximum utility over all double-

truncated uniform distributions is well-defined, and is attained by a double-truncated

uniform distribution G∗.

Proof. I first show that ℓ∗_y is continuous in y at any y ∈ [0, 1). Given the restriction on r∗, Sender's utility from a y-optimal DTU G∗_y is given by 1 − G∗_y(r∗) = 1 − (β(ℓ∗_y, y) r∗ + y). Thus continuity of ℓ∗_y in y ensures that Sender's maximum utility over DTUs with intercept y is continuous in y. I can then provide sufficient conditions for the intercept of a potential overall-optimal DTU to lie in a compact set. Together, these results imply that the relevant portion of U_r∗ is compact, and in particular closed, so it contains its supremum. Therefore there is some DTU that attains Sender's supremum utility over all DTUs.

To show continuity, I first work with y ∈ [0, 1 − 2π), where the argument is most straightforward. Since in that range v(q_1(ℓ∗_y, y), ℓ∗_y) = 0 by Lemma 4, and the proof of that lemma shows that ℓ∗_y is the minimal ℓ where the property holds, I can apply the Implicit Function Theorem to write ℓ∗_y as a continuous function of y.

When y ∈ (1 − 2π, 1), Lemma 4 implies that either v(q_1(ℓ∗_y, y), ℓ∗_y) = 0 or ℓ∗_y = ℓ^min_y. In particular, ℓ∗_y is either the minimum permissible ℓ or, if that choice does not deliver a Bayes-plausible DTU, the minimum ℓ satisfying v(q_1(ℓ, y), ℓ) = 0. Because both ℓ^min_y and the minimal ℓ satisfying v(q_1(ℓ, y), ℓ) = 0 are continuous in y, the minimum over those two choices is also continuous in y. Thus ℓ∗_y is continuous in y for y ∈ (1 − 2π, 1).


All that remains is to show that ℓ∗_y is continuous in y at y = 1 − 2π. The continuity of ℓ^min_y in y ensures that the function

u(y) = β(ℓ^min_y, y) ℓ^min_y + y − F(ℓ^min_y)

is also continuous in y. Because u(y) > 0 for any y ∈ [0, 1 − 2π], as shown in the proof of why q_1(ℓ) > ℓ for small enough ℓ, it must be that for δ > 0 sufficiently small, u(y′) > 0 for any y′ ∈ (1 − 2π, 1 − 2π + δ). Since the line β(0, 1 − 2π) q + (1 − 2π) intersects F twice in (0, 1], it must therefore be that for δ > 0 sufficiently small and y′ ∈ (1 − 2π, 1 − 2π + δ), so do the lines β(ℓ^min_{y′}, 1 − 2π) q + (1 − 2π), β(ℓ^min_{y′}, y′) q + (1 − 2π), and β(ℓ^min_{y′}, y′) q + y′. Because the last intersects F twice in (0, 1], and both intersections occur at values q > ℓ^min_{y′}, the proof of Lemma 4 shows that v(q_1(ℓ∗_{y′}, y′), ℓ∗_{y′}) = 0 and ℓ∗_{y′} is the minimal value of ℓ such that this property holds. Therefore, by the continuity of the minimal value of ℓ satisfying this equation, ℓ∗_y is continuous in y at y = 1 − 2π.

Having shown continuity of ℓ∗_y in y, I use the second part of the lemma statement to show that the set of possibly overall-optimal DTU intercepts is compact. Given that f(1) > 0, there must be ȳ ∈ (0, 1) such that 1 − ȳ < f(1). Then for any intercept y ≥ ȳ, the line corresponding to the DTU with minimal permissible slope lies above F on (0, 1), and that DTU is therefore Bayes-plausible. Since any DTU with intercept y > ȳ surely lies above the slope-minimal DTU with intercept ȳ for all q ∈ [π, 1], no DTU with intercept in (ȳ, 1) can be optimal among all DTUs. Thus the intercept of the overall-optimal DTU lies in the compact set [0, ȳ]. □
