
STIGMA, OPTIMAL INCOME TAXATION, AND THE OPTIMAL WELFARE PROGRAM∗

ZHIYONG AN

Department of Economics

University of California at Berkeley

549 Evans Hall - #3880

Berkeley, CA 94720-3880

Phone: 510-846-3153

Email: [email protected]

ABSTRACT

In this research I integrate the theory of welfare stigma (Moffitt, 1983) and the theory of optimal income taxation. I study optimal income taxation and an optimal welfare program within a unified framework, taking welfare stigma into account. In the framework, I assume that the government has two policy instruments: general income taxation and a welfare program. I assume that individuals are heterogeneous along two dimensions: wages (skill) and welfare stigma. Each individual is assumed to take the income tax schedule and the parameters of the welfare program as given and make labor supply decisions to maximize his utility. The government is assumed to choose both an optimal income tax schedule and an optimal welfare program so as to maximize a social welfare function subject to its own revenue budget constraint and individuals’ behavioral response to the income tax schedule and the parameters of the welfare program. Within the unified framework, I use both theoretical analysis and numerical simulation to answer some questions that are important for public policy. Theoretical analysis shows that: (1) it can be optimal for the government to offer both general income taxation and a welfare program; and (2) the more intensely people suffer from welfare stigma, the higher the welfare benefit should be. Numerical simulations show that: (1) it is optimal for the government to offer both a negative income tax schedule and a welfare program; and (2) the actual welfare program might be less generous than the optimal welfare program.

∗ I thank George A. Akerlof, Robert M. Anderson, Alan J. Auerbach, John M. Quigley, and Brian D. Wright for valuable discussions and advice. Comments from Kim M. Bloomquist are highly appreciated.

I. INTRODUCTION

In this research, I integrate the theory of welfare stigma (Moffitt, 1983) and the theory of

optimal income taxation. In order to do so, I study optimal income taxation and an optimal

welfare program within a unified framework, taking welfare stigma into account. Within the

unified framework, I answer the following questions: If we introduce welfare stigma into optimal

income taxation, how will it change the optimal income tax schedule? Given reasonable

benchmark parameters, what do the optimal income tax schedule and the optimal welfare

program look like? Should there be a negative income tax? How negative should it be? Should

there be a generous welfare program? How generous should it be? Does US tax policy

actually differ from the optimum with benchmark parameters? The answers to these questions

are important for public policy.

In order to answer the above questions, I first build a general model. The framework of

the general model is as follows. Unlike the traditional literature on optimal income taxation,

which assumes the government has only one instrument, i.e., general income taxation, I assume

the government has two instruments, namely, general income taxation and a welfare program. In

addition, unlike the traditional literature on optimal income taxation, which assumes individuals

are heterogeneous only along one dimension, i.e., wages (skill), I assume individuals are

heterogeneous along two dimensions, namely, wages and welfare stigma. Each individual is

assumed to take the income tax schedule and the parameters of the welfare program as given and

make two decisions to maximize his utility: (1) whether to self-select into the general income

taxation or the welfare program; and (2) conditional on self-selecting into the general income

taxation, how much labor to supply. If an individual self-selects into the welfare program, then in

addition to the utility derived from income and leisure, he will also incur a utility loss due to


welfare stigma (Moffitt, 1983). However, if an individual self-selects into the general income

taxation, there is no extra utility loss, because it is generally believed that people do not attach

stigma to general income taxation, whether it is positive or negative¹. Intuitively, other things

being equal, individuals with high welfare stigma tend to self-select into the general income

taxation, while individuals with low welfare stigma tend to self-select into the welfare program.

The government is assumed to choose both an optimal income tax schedule and an optimal

welfare program so as to maximize a social welfare function subject to its own budget constraint

and individuals’ behavioral response to the income tax schedule and the parameters of the

welfare program. My objective is to answer the questions raised above within this framework. As

far as I know, this is the first attempt to study optimal income taxation and an optimal welfare

program within a unified framework while taking welfare stigma into account.

I use two approaches to tackle this problem. First, I simplify the general model to do

theoretical analysis. This simpler model expands the model in Akerlof’s “tagging” paper (1978).

Although this model is simple, the key insight and the framework of the general model are

carried over so that the conclusions should hold in a richer economic context. Second, I specify

the general model with benchmark parameters and then do numerical simulations, following the

approach of Fair (1971). The benchmark parameters are chosen according to their normal values.

Theoretical analysis shows that it can be optimal for the government to offer both general

income taxation and a welfare program. The more intensely people suffer from welfare stigma,

the higher the welfare benefit should be.

Numerical simulations show that it is optimal for the government to offer both a negative

income tax schedule and a welfare program, which verifies the theoretical conclusion that it can

1 This is a common assumption in the literature on optimal income taxation. It is implicit in the models of Mirrlees (1971) and Fair (1971).


be optimal for the government to provide both general income taxation and a welfare program. In

addition, my simulation results imply that the actual welfare program might be less generous

than the optimal welfare program.

The organization of this paper is as follows. Section II reviews the relevant literature. The

literature review serves as the background for my research. Section III presents the general

model. Section IV builds a simpler version of the general model and presents the theoretical

analysis. In Section V, I first describe the algorithm to solve the general model numerically.

Then I specify the parameters of the general model according to their normal values. Finally, I

report my simulation results. Section VI concludes.

II. LITERATURE REVIEW

Mirrlees (1971) and Fair (1971) broke ground with their research on optimal income

taxation. In their models, the government is assumed to have only one instrument at hand,

namely, general income taxation. Individuals are assumed to be heterogeneous only along one

dimension, i.e., wages (skill). Each individual is assumed to take the income tax schedule as

given and choose his labor supply to maximize his own utility. The government is assumed to

choose the optimal income tax schedule by maximizing a social welfare function subject to its

own revenue budget constraint and individuals’ behavioral response to the chosen tax schedule.

The basic trade-off captured in their models is between equity and efficiency. Intuitively, due to

progressive income taxation, the government can redistribute income from rich people to poor

people. The redistribution of income can improve distributional equity because poor people have

higher marginal utility than rich people. However, labor supply is elastic. The progressive


income taxation will change people’s labor supply and thus result in deadweight loss (efficiency

cost). Because Mirrlees (1971) assumed a continuum of unbounded skills in his model, it is very

difficult to reach general conclusions. In contrast, Fair (1971) assumed a limited discrete number

of individuals in his model, thus laying a good foundation for his numerical simulations. His

simulations show that the average tax rate should increase with income (skills). In other words,

the optimal tax schedule should be progressive. Therefore, Fair (1971) goes beyond Mirrlees

(1971). In terms of methodology, this paper is indebted to Fair.

Since then, the theory of optimal income taxation based on the original Mirrlees-Fair

framework has been considerably developed. For example, Sadka (1976) and Seade (1977)

showed that if the skill distribution is bounded above, the marginal tax rate should be zero at the

top. This conclusion intuitively makes sense: If the marginal tax rate that applies to the top

earner is changed from a positive number to zero, the top earner will work more so that his

welfare will increase. Meanwhile, tax revenue is not reduced. Thus, the change will result in a

Pareto improvement. Symmetrically, Seade (1977) showed that if everybody works and labor

supply is bounded above zero, then the bottom marginal tax rate should also be zero. Diamond

(1998) investigated a special case of the Mirrlees-Fair general optimal income taxation model

with quasi-linear utility preferences. Quasi-linear preferences imply that there is no income

effect. Diamond showed that in this case the optimal marginal tax rates follow a U-shaped

pattern. Akerlof (1978) showed that redistribution should not only be based on income, but also

on other observable characteristics such as age, female head of household, etc., that are

correlated with skills. The intuition is that by “tagging” the government can focus limited

resources on the “tagged” poor so that the total social welfare can be increased. Saez (2002)

focused his attention on optimal income transfers for low-income people. Unlike Mirrlees who


assumed that labor supply is only along the intensive margin, Saez modeled labor supply

responses of low-income people along two dimensions: (1) the extensive margin (whether to

supply labor or not); and (2) the intensive margin (intensity of work after deciding to supply

labor). His analysis shows that the nature of labor supply response is very important for the

optimal tax schedule: (1) if the extensive margin dominates, then the optimal transfer program is

characterized by a low guaranteed income and subsidies for wage income, similar to the Earned

Income Tax Credit (EITC); and (2) if the intensive margin dominates, then the optimal transfer

program is characterized by a high guaranteed income and high phase-out tax rate, similar to a

classical Negative Income Tax (NIT).

Especially interesting to me, several recent papers attempt to introduce psychological

concepts into the study of optimal income taxation. For example, Ireland (1998, 2001)

introduced status-seeking into the study of optimal income taxation. Status-seeking means that

an individual cares about the spectators’ view of his utility. If an individual cares about the

spectators’ view of his utility, he needs to send out signals to reveal his status. In order to send

out signals, people are driven to over-consume some positional or visible goods. An extreme

case is to “burn money” in public. In order to meet these extra expenditure needs due to status-seeking,

people’s labor supply decisions are distorted, i.e., they have to over-supply labor and

under-consume leisure compared with the no-status-seeking equilibrium. As income taxes act as

a disincentive to supply labor and thus can be used to offset the distortion to labor supply caused

by status-seeking, status-seeking adds one more reason to tax income and implies that the

optimal tax schedule should be steeper and thus redistribution will be increased. Corneo (2002)

studied optimal income taxation when people care about their relative rank in the distribution of

income. If people care about their relative rank in the distribution of income, they have an extra


incentive to climb the “income ladder” and thus contribute to an over-supply of labor. In other

words, people’s caring about their relative rank in the distribution of income distorts their labor

supply decision. Meanwhile, when an individual climbs the “income ladder”, he exerts a

negative externality on some other people because those people’s relative ranks have worsened

and thus those people’s utilities are reduced due to his climbing. Therefore, an income tax may

improve efficiency for the same reasons that a Pigouvian tax does. In other words, if people

care about their relative rank in the distribution of income, introducing a progressive income tax

can result in a Pareto improvement: On the one hand, the negative externality can be corrected. On

the other hand, the distorted labor supply can be offset.

But so far no research on optimal income taxation has taken into account the stigma

factor (Moffitt, 1983). In order to account for the low take-up rate of welfare programs², Moffitt

proposed the idea of stigma, “disutility arising from participation in a welfare program per se.”³

He said that there might be both a “flat” stigma that arises from the participation in welfare

programs itself and a “variable” stigma that varies with the size of the benefit. The “flat” stigma

and the “variable” stigma have different implications for people’s participation decisions. If only

a “flat” stigma exists, then an individual will participate in welfare programs if the utility gain

from the welfare benefit is greater than the utility loss due to the “flat” stigma. If only a

“variable” stigma exists, then the individual will only participate if his utility increases from the

welfare benefit. However, Moffitt’s empirical estimation shows that stigma appears to arise

mainly from the flat component, i.e., a fixed cost associated with participation in the program.

2 According to Moffitt (1983), the participation rate in the Aid to Families with Dependent Children (AFDC) was estimated to be only about 69 percent in 1970 and the participation rate in the Food Stamp Program was only 38 percent.

3 It is interesting to note that the concept of “stigma” can also be described using the terminology of “identity” (Akerlof and Kranton, 2000, 2002, and 2005): (1) people are not identified with welfare programs; (2) the ideal behavior associated with this nonidentification is nonparticipation in welfare programs; and (3) if people participate in welfare programs, they will lose an extra utility because their actual behavior deviates from the ideal behavior.


On the other hand, it is generally believed that people do not attach stigma to general

income taxation, whether it is positive or negative.

Clearly, Moffitt’s work shows that stigma plays an important role in people’s utility

functions and thus distorts people’s labor supply decisions and welfare. Because stigma plays an

important role in people’s utility functions and their labor supply decisions, it must also have

important implications for the optimal income tax schedule.

The above literature review motivates me to integrate the theory of welfare stigma and

the theory of optimal income taxation. Optimal income taxation and an optimal welfare program

are thus studied within a unified framework. The following questions are answered within this

unified framework: If we introduce stigma into optimal income taxation, how will it change the

optimal income tax schedule? Given reasonable benchmark parameters, what do the optimal

income tax schedule and the optimal welfare program look like? Should there be a negative

income tax? How negative should it be? Should there be a generous welfare program? How

generous should it be? Does US tax policy actually differ from the optimum with benchmark

parameters? Clearly, the answers to these questions have important implications for public

policy.

III. THE GENERAL MODEL

In my model, the government is assumed to have two policy instruments at hand: (1)

general income taxation and (2) a welfare program. This assumption is different from that of the

traditional literature on optimal income taxation, which assumes that the government has only

general income taxation as its policy instrument. The general income taxation is assumed to be a


linear income tax schedule and is characterized by two parameters $(k, t)$, where the parameter $k$ is a lump sum tax and the parameter $t$ is a constant marginal tax rate. The welfare program is characterized by a single parameter: the welfare benefit $b$. If an individual enrolls in the welfare program, he will get welfare benefit $b$. Thus, the government chooses three parameters: $k$, $t$, and $b$.

An individual is characterized by two parameters: (1) his wage $w_i$ and (2) his welfare stigma $\phi_i$. His welfare stigma is characterized by a single parameter $\phi_i$. This is consistent with Moffitt’s empirical estimation, which shows that welfare stigma appears to arise mainly from the flat component, i.e., a fixed cost associated with participation in the welfare program. $\{w_i\}$ and $\{\phi_i\}$ are generated from two random distributions. In other words, I assume that individuals are heterogeneous along two dimensions: (1) wages and (2) welfare stigma. This assumption is also different from that of the traditional literature on optimal income taxation, which assumes that individuals are heterogeneous only along one dimension, i.e., wages. I assume that the size of the population is $N$.

An individual who is characterized by $(w_i, \phi_i)$ is assumed to take the linear income tax schedule $(k, t)$ and the welfare program parameter $b$ as given and make two decisions to maximize his utility. First, he must decide whether to self-select into the welfare program or the general income taxation. If he decides to enroll in the welfare program, he does not supply any labor and his only income is the welfare benefit $b$. Thus, his utility is given by

\[ U_i = f(b, L) - \phi_i, \]

where $L$ is his endowment of time. Note that here I take Moffitt’s empirical estimation into account. His estimation shows that welfare stigma is a fixed cost associated with participation in the welfare program. Thus I subtract $\phi_i$ from the individual’s utility $f(b, L)$ generated from income $b$ and leisure $L$. His leisure is his endowment of time $L$ because his labor supply is zero if he enrolls in the welfare program. On the other hand, if the individual decides to self-select into the general income taxation, he must make a second decision: how much labor to supply. Conditional on self-selecting into the general income taxation, he chooses his optimal labor supply by solving the following optimization problem:

\[ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big), \]

where $L_i$ is his labor supply, $(1-t) w_i L_i - k$ is his after-tax income, and $L - L_i$ is his leisure.

Following the tradition, I assume individuals do not attach stigma to general income taxation, whether it is negative or positive. Taken together, an individual with parameters $(w_i, \phi_i)$ is assumed to take $(k, t, b)$ as given and solve the following optimization problem to maximize his utility:

\[ \max\Big\{ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),\; f(b, L) - \phi_i \Big\}. \]

For convenience, I denote

\[ U_i^{\max} = \max\Big\{ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),\; f(b, L) - \phi_i \Big\}. \]
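The individual's two-step decision described above can be sketched numerically. This is a minimal illustration, not the paper's specification: the functional form of `f` (a sum of square roots), the grid search over labor, the tie-breaking rule in favor of general income taxation, and all parameter values are assumptions made for the example.

```python
import math

def f(income, leisure):
    """Assumed utility over income and leisure (illustrative, not from the paper)."""
    if income < 0 or leisure < 0:
        return -math.inf
    return math.sqrt(income) + math.sqrt(leisure)

def best_response(w, phi, k, t, b, time_endowment=1.0, steps=1000):
    """Return (program, labor, utility) for an individual with wage w and stigma phi."""
    # Option 1: general income taxation, with labor chosen on a grid over [0, time_endowment].
    best_labor, u_tax = 0.0, -math.inf
    for j in range(steps + 1):
        labor = time_endowment * j / steps
        util = f((1 - t) * w * labor - k, time_endowment - labor)
        if util > u_tax:
            best_labor, u_tax = labor, util
    # Option 2: welfare, with zero labor, income b, and the flat stigma cost phi.
    u_welfare = f(b, time_endowment) - phi
    # Assumed tie-breaking: ties go to general income taxation.
    if u_welfare > u_tax:
        return "welfare", 0.0, u_welfare
    return "general income taxation", best_labor, u_tax

# Two individuals with the same wage but different stigma (hypothetical values).
low_stigma = best_response(w=1.0, phi=0.0, k=-0.1, t=0.3, b=0.5)
high_stigma = best_response(w=1.0, phi=1.0, k=-0.1, t=0.3, b=0.5)
```

With these hypothetical numbers the low-stigma individual enrolls in welfare while the high-stigma individual stays in general income taxation, which matches the intuition that, other things equal, stigma pushes people away from the welfare program.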

The government is assumed to choose both an optimal income tax schedule and an optimal welfare program, i.e., an optimal set of parameters $(k, t, b)$, so as to maximize a social welfare function $SW(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max})$ subject to its own budget constraint and individuals’ behavioral response to the income tax schedule and the welfare program parameters. I assume that the government must balance its budget. The government’s budget constraint can thus be written as

\[ \sum_{i=1}^{N} 1(i: \text{GeneralIncomeTaxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} 1(i: \text{Welfare}) \cdot b, \]

where $1(i: \text{GeneralIncomeTaxation})$ is an indicator function that is equal to 1 if individual $i$ chooses general income taxation and to 0 if individual $i$ chooses welfare, and $1(i: \text{Welfare})$ is also an indicator function that is equal to 1 if individual $i$ chooses welfare and to 0 if individual $i$ chooses general income taxation. $\sum_{i=1}^{N} 1(i: \text{GeneralIncomeTaxation}) \cdot (t w_i L_i + k)$ is the government’s net tax income, and $\sum_{i=1}^{N} 1(i: \text{Welfare}) \cdot b$ is the government’s total welfare expense.

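In summary, the government needs to find the optimal set of parameters $(k, t, b)$ by solving the following optimization problem:

\[
\begin{aligned}
& \max_{\{k, t, b\}} \; SW\big(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\big) \\
\text{s.t.}\quad & (A)\;\; U_i^{\max} = \max\Big\{ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),\; f(b, L) - \phi_i \Big\}, \quad i = 1, 2, \ldots, N, \\
& (B)\;\; \sum_{i=1}^{N} 1(i: \text{GeneralIncomeTaxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} 1(i: \text{Welfare}) \cdot b,
\end{aligned}
\]

where constraint (A) represents individuals’ behavioral response to the income tax schedule and the welfare program parameters, and constraint (B) is the government’s budget constraint. $\{w_i\}$ and $\{\phi_i\}$ are generated from two random distributions.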
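To make the structure of the government's problem concrete, the following is a rough sketch of a brute-force search over $(k, t, b)$. Everything specific in it is an assumption for illustration: the utility function `f`, the utilitarian (sum) social welfare function, the wage and stigma distributions, the parameter grids, and the relaxation of the balanced-budget equality in (B) to a non-negative surplus so that a coarse grid has feasible points. It is not the algorithm or calibration used in this paper.

```python
import math
import random

TIME_ENDOWMENT = 1.0  # assumed normalisation of the time endowment L

def f(income, leisure):
    """Assumed utility over income and leisure (illustrative only)."""
    if income < 0 or leisure < 0:
        return -math.inf
    return math.sqrt(income) + math.sqrt(leisure)

def respond(w, phi, k, t, b, steps=50):
    """Constraint (A): return (on_welfare, labor, utility) for one individual."""
    best_labor, best_u = 0.0, -math.inf
    for j in range(steps + 1):
        labor = TIME_ENDOWMENT * j / steps
        util = f((1 - t) * w * labor - k, TIME_ENDOWMENT - labor)
        if util > best_u:
            best_labor, best_u = labor, util
    u_welfare = f(b, TIME_ENDOWMENT) - phi
    if u_welfare > best_u:
        return True, 0.0, u_welfare
    return False, best_labor, best_u

def evaluate(population, k, t, b):
    """Return (social welfare, budget surplus) for a policy (k, t, b)."""
    welfare_sum, surplus = 0.0, 0.0
    for w, phi in population:
        on_welfare, labor, util = respond(w, phi, k, t, b)
        welfare_sum += util  # assumed utilitarian SW
        surplus += -b if on_welfare else t * w * labor + k
    return welfare_sum, surplus

def search(population, ks, ts, bs):
    """Grid search; keeps the best policy whose budget surplus is non-negative."""
    best = None
    for k in ks:
        for t in ts:
            for b in bs:
                sw, surplus = evaluate(population, k, t, b)
                if surplus >= 0 and (best is None or sw > best[0]):
                    best = (sw, k, t, b)
    return best

# Hypothetical population: lognormal wages and uniform stigma.
random.seed(0)
population = [(random.lognormvariate(0.0, 0.5), random.uniform(0.0, 1.0))
              for _ in range(20)]
result = search(population,
                ks=[-0.2, -0.1, 0.0, 0.1, 0.2],
                ts=[0.0, 0.2, 0.4, 0.6],
                bs=[0.0, 0.1, 0.2, 0.3])
```

The policy $(0, 0, 0)$ always balances the budget in this sketch, so the search returns a result whenever the grids include zero. A finer grid, or solving for $k$ from the budget constraint given $(t, b)$, would be needed for results of any quantitative interest.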

IV. THEORETICAL ANALYSIS

In Section III, I presented the general model. However, Stern (1976) suggests that it

is very difficult to explicitly solve an optimal income taxation model even if one assumes a

linear tax schedule⁴. Like Stern, I also assume a linear tax schedule in the general model. In

principle, my model is more complicated than Stern’s model because I assume that the

government has two instruments instead of one and I assume that individuals are heterogeneous

4 Stern assumed a linear tax schedule in his optimal income taxation model. However, he did not do any theoretical analysis. Instead, he resorted to a numerical method to solve his model.


along two dimensions instead of one. Therefore, in order to make the problem tractable and

solvable so that theoretical analysis is feasible, I simplify the general model in this section. This

simplified model is a variant of the model in Akerlof’s “tagging” paper (1978). In his model,

Akerlof assumed that there are two types of labor: skilled labor and unskilled labor. Skilled labor

can take both difficult and easy jobs, whereas unskilled labor can only take easy jobs. With only

two types of labor, Akerlof could derive explicit solutions. In addition, although there are only

two types of labor, the key insight and the general framework of Mirrlees-Fair are carried over so

that his conclusions hold in a richer economic context. This model therefore is a useful starting

point.

I expand Akerlof’s model as follows. First, I assume one-half of the population is made

up of skilled people and the other one-half of the population is made up of unskilled people. I

assume all the skilled people have welfare stigma $\phi$. However, I assume $(1-\alpha)$ of the unskilled people have welfare stigma $\phi$ while $\alpha$ of the unskilled people have no welfare stigma at all. In summary, I assume there are three types of people: (1) skilled people with stigma ($\frac{1}{2}$ of the whole population); (2) unskilled people with stigma ($\frac{1-\alpha}{2}$ of the whole population); and (3) unskilled people without stigma ($\frac{\alpha}{2}$ of the whole population).

Skilled people have three options: (1) take a difficult job; (2) take an easy job; or (3) be on welfare. These people’s utility function is given by $\max\{u(q_D - t_D) - \delta_D,\; u(q_E + t_E) - \delta_E,\; u(b) - \phi\}$. I assume $u(0) = 0$, $u'(\cdot) > 0$, and $u''(\cdot) < 0$. If a skilled person takes a difficult job, his output is $q_D$ and his disutility of doing the job is $\delta_D$. As the economy is assumed to be competitive, his pre-tax income is also $q_D$. His utility depends on his after-tax income and his disutility of doing the difficult job and is given by $u(q_D - t_D) - \delta_D$. I assume $u(q_D) - \delta_D > 0$. I assume if an individual takes an easy job, whether he is a skilled person or an unskilled person, his output is $q_E$ and thus his pre-tax income is $q_E$. His disutility of doing the easy job is $\delta_E$ whether he is skilled or unskilled. Thus his pre-tax utility is given by $u(q_E) - \delta_E$ and his after-tax utility is given by $u(q_E + t_E) - \delta_E$. I assume $u(q_E) - \delta_E < 0$. Finally, if a skilled person chooses to be on welfare, his income is the welfare benefit $b$ and he does not supply any labor. However, because he attaches stigma $\phi$ to being on welfare by assumption, his utility is given by $u(b) - \phi$. I assume if a skilled person is indifferent between taking a difficult job, taking an easy job, and being on welfare, then he chooses to take the difficult job. I assume if a skilled person is indifferent between taking an easy job and being on welfare, then he chooses to take the easy job.

Unskilled people with stigma have two options: (1) take an easy job or (2) be on welfare. These people’s utility function is given by $\max\{u(q_E + t_E) - \delta_E,\; u(b) - \phi\}$. If an unskilled person with stigma takes an easy job, his utility is given by $u(q_E + t_E) - \delta_E$. If an unskilled person with stigma chooses to be on welfare, his utility is given by $u(b) - \phi$. I assume if an unskilled individual with stigma is indifferent between taking an easy job and being on welfare, then he chooses to take the easy job. In this model, I follow Akerlof (1978) and assume that unskilled people cannot work at difficult jobs. An alternative but equivalent assumption is that they can work at difficult jobs but that the disutility of doing so is so high that in equilibrium they will not choose to.

Unskilled people without stigma have two options: (1) take an easy job or (2) be on welfare. These people’s utility function is given by $\max\{u(q_E + t_E) - \delta_E,\; u(b)\}$. If an unskilled person without stigma takes an easy job, his utility is given by $u(q_E + t_E) - \delta_E$. If an unskilled person without stigma chooses to be on welfare, his utility is given by $u(b)$ because he does not attach stigma to being on welfare. I assume if an unskilled individual without stigma is indifferent between taking an easy job and being on welfare, then he chooses to be on welfare.

The social welfare function⁵ is given by:

\[
SW = \min\Big\{ \max\{u(q_D - t_D) - \delta_D,\; u(q_E + t_E) - \delta_E\},\;\; \max\{u(q_E + t_E) - \delta_E,\; u(b) - \phi\},\;\; \max\{u(q_E + t_E) - \delta_E,\; u(b)\} \Big\}.
\]
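As a small illustration of how the “min” social welfare function is evaluated for a given policy $(t_D, t_E, b)$, the sketch below computes each type's best option and then takes the minimum across the three types. The square-root form of `u` and the parameter values are assumptions for the example; the paper only requires $u(0) = 0$, $u' > 0$, and $u'' < 0$.

```python
import math

def u(x):
    """Assumed utility of income: u(0) = 0, increasing, concave (illustrative only)."""
    return math.sqrt(x) if x >= 0 else -math.inf

def social_welfare(t_D, t_E, b, q_D, q_E, delta_D, delta_E, phi):
    """Minimum over the three types of each type's best attainable utility."""
    easy_job = u(q_E + t_E) - delta_E
    skilled = max(u(q_D - t_D) - delta_D, easy_job)
    unskilled_with_stigma = max(easy_job, u(b) - phi)
    unskilled_without_stigma = max(easy_job, u(b))
    return min(skilled, unskilled_with_stigma, unskilled_without_stigma)

# Hypothetical parameters satisfying u(q_D) - delta_D > 0 > u(q_E) - delta_E > -phi.
no_policy = social_welfare(t_D=0.0, t_E=0.0, b=0.0,
                           q_D=16.0, q_E=1.0, delta_D=1.0, delta_E=1.5, phi=1.0)
```

With no taxes or transfers, the worst-off type in this example is the unskilled person with stigma, who takes the easy job at utility $u(q_E) - \delta_E = -0.5$.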

Each individual is assumed to take $(t_D, t_E, b)$ as given and make optimal labor supply decisions to maximize his utility. The government is assumed to choose $(t_D, t_E, b)$ to maximize $SW$ subject to its own revenue budget constraint and individuals’ behavioral response to $(t_D, t_E, b)$.

Throughout the theoretical analysis, I assume that under equilibrium it is optimal for an individual to stay in the “system”. In other words, I assume $u(q_D - t_D^*) - \delta_D > 0$, $u(q_E + t_E^*) - \delta_E > 0$, and $u(b^*) - \phi > 0$. Zero is the utility outside the “system”.

Because $u(q_D) - \delta_D > 0$ and $u(q_E) - \delta_E < 0$ by assumption, there are two possible cases for this problem: (1) $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$; and (2) $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$.

5 The standard social welfare function assumed in the optimal income taxation literature is a “sum” weighted by population. The “min” social welfare function is chosen so that it is feasible to derive the optimal solutions to the model. One justification for the “min” social welfare function is that it turns out that the intuition underlying the key conclusions does not depend on this specific assumption.

Proposition 1: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to one, then the equilibrium is a separating equilibrium, and the optimal conditions are given by:

(1) $u(q_E + t_E^*) - \delta_E = u(b^*)$,

(2) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and

(3) $t_D^* = (1-\alpha)\, t_E^* + \alpha\, b^*$.

Proof: See Appendix 1.
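For a concrete feel for conditions (1) to (3), the sketch below solves them numerically for $(t_D^*, t_E^*, b^*)$. It assumes $u(x) = \sqrt{x}$, which satisfies $u(0) = 0$, $u' > 0$, and $u'' < 0$ but is not taken from the paper, and it uses hypothetical parameter values chosen to satisfy the case-(1) ordering. It only solves the three stated conditions; it does not verify that the resulting allocation is an equilibrium.

```python
import math

def u(x):
    """Assumed utility of income (illustrative only); its inverse is y -> y**2."""
    return math.sqrt(x)

def separating_conditions(alpha, q_D, q_E, delta_D, delta_E, iters=200):
    """Solve conditions (1)-(3) for (t_D, t_E, b) by bisection on b."""
    def taxes_given_b(b):
        # (1): u(q_E + t_E) - delta_E = u(b)  =>  t_E = (u(b) + delta_E)**2 - q_E
        t_E = (u(b) + delta_E) ** 2 - q_E
        # (1) and (2): u(q_D - t_D) - delta_D = u(b)  =>  t_D = q_D - (u(b) + delta_D)**2
        t_D = q_D - (u(b) + delta_D) ** 2
        return t_D, t_E

    def budget_gap(b):
        # (3) rewritten as t_D - (1 - alpha) * t_E - alpha * b = 0; decreasing in b.
        t_D, t_E = taxes_given_b(b)
        return t_D - (1 - alpha) * t_E - alpha * b

    # Assumes budget_gap(0) > 0 > budget_gap(q_D), which holds for the values below.
    lo, hi = 0.0, q_D
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if budget_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    t_D, t_E = taxes_given_b(b)
    return t_D, t_E, b

# Hypothetical parameters: u(16) - 1 = 3 > 0 > u(1) - 1.5 = -0.5 > -phi for phi = 1.
t_D, t_E, b = separating_conditions(alpha=0.9, q_D=16.0, q_E=1.0,
                                    delta_D=1.0, delta_E=1.5)
```

Condition (3) can be read as a per-capita budget balance: the skilled half of the population pays $t_D^*$, while the unskilled half receives $t_E^*$ (a share $1-\alpha$) or $b^*$ (a share $\alpha$).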

Proposition 2: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to zero, then the equilibrium is either a separating equilibrium or a pooling equilibrium, and the optimal conditions are given either, for the separating equilibrium, by:

(4) $u(q_E + t_E^*) - \delta_E = u(b^*)$,

(5) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and

(6) $t_D^* = (1-\alpha)\, t_E^* + \alpha\, b^*$,

or, for the pooling equilibrium, by:

(7) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,

(8) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and

(9) $t_D^* = b^*$.


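Proposition 3: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then the equilibrium is a separating equilibrium, and the optimal conditions are given by:

(10) $u(q_E + t_E^*) - \delta_E = u(b^*)$,

(11) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and

(12) $t_D^* = (1-\alpha)\, t_E^* + \alpha\, b^*$.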



Proposition 4: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, then the equilibrium is a pooling equilibrium, and the optimal conditions are given by:

(13) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,

(14) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and

(15) $t_D^* = b^*$.


Proposition 5: For both cases, whether α is close to one or close to zero, it can be optimal for

the government to provide both general income taxation and a welfare program.

Proof: Proposition 5 is obvious from Propositions 1, 2, 3, and 4.

Proposition 5 intuitively makes sense. As the government has two concerns (namely,

income redistribution and welfare stigma) to address, the government should use two

instruments.

Proposition 6: For both cases, if $\alpha$ is close to zero, then the equilibrium is likely to be a pooling equilibrium so that all the unskilled people are on welfare, and the optimal conditions are given by:

(16) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,

(17) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and

(18) $t_D^* = b^*$.

Proof: Proposition 6 is obvious from Propositions 2 and 4.
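The pooling conditions (17) and (18) pin down the welfare benefit once $t_D^* = b^*$ is substituted into (17). The sketch below solves the resulting equation $u(q_D - b) - \delta_D = u(b) - \phi$ by bisection for a few stigma levels. The square-root utility and the parameter values are assumptions for illustration, and the inequality (16), which restricts $t_E^*$, is not checked.

```python
import math

def u(x):
    """Assumed utility of income: u(0) = 0, increasing, concave (illustrative only)."""
    return math.sqrt(x)

def pooling_benefit(phi, q_D, delta_D, iters=200):
    """Solve u(q_D - b) - delta_D = u(b) - phi for b on [0, q_D] by bisection."""
    def gap(b):
        # Increasing in b: welfare utility rises and the skilled worker's utility falls.
        return (u(b) - phi) - (u(q_D - b) - delta_D)

    # Assumes gap(0) < 0 < gap(q_D), which holds for the values used below.
    lo, hi = 0.0, q_D
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical stigma levels; q_D = 16 and delta_D = 1 as illustrative values.
benefits = [pooling_benefit(phi, q_D=16.0, delta_D=1.0) for phi in (0.5, 1.0, 1.5)]
```

In this example the solved benefit increases with the stigma parameter, which is the comparative-static pattern stated for the pooling equilibrium.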

Proposition 6 also intuitively makes sense. As α decreases, the unskilled people’s

capability of “producing” utility is decreasing on average, which implies that the burden on the

skilled people will increase. Therefore, conditional on a separating equilibrium, as $\alpha$ decreases, $t_D^*$ will increase. When $\alpha$ decreases enough, $t_D^*$ will increase enough so that skilled people will switch from difficult jobs to easy jobs. In order to prevent skilled people from switching from difficult jobs to easy jobs, when $\alpha$ is small enough, the optimal condition is changed from $u(q_D - t_D^*) - \delta_D = u(b^*)$ under a separating equilibrium to $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$ under a pooling equilibrium so that the government can extract more from the skilled people while still keeping the skilled people taking difficult jobs.

Proposition 7: Conditional on a pooling equilibrium so that all the unskilled people are on welfare, $\frac{db^*}{d\phi} > 0$.

Proof: If the equilibrium is a pooling equilibrium so that all the unskilled people are on welfare, then the optimal conditions are given by $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$, $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and $t_D^* = b^*$. The optimal conditions imply that $u(q_D - b^*) - \delta_D = u(b^*) - \phi$, which further implies that $\frac{db^*}{d\phi} > 0$.
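The final step of the proof can be spelled out by implicit differentiation. The following is a sketch that assumes $u$ is differentiable and that $b^*$ is an interior solution, so that both sides of the pooling condition can be differentiated with respect to $\phi$ while $q_D$ and $\delta_D$ are held fixed:

```latex
\[
\begin{aligned}
u(q_D - b^*) - \delta_D &= u(b^*) - \phi \\
-u'(q_D - b^*)\,\frac{db^*}{d\phi} &= u'(b^*)\,\frac{db^*}{d\phi} - 1 \\
\frac{db^*}{d\phi} &= \frac{1}{u'(b^*) + u'(q_D - b^*)} > 0
\end{aligned}
\]
```

The sign follows from $u'(\cdot) > 0$: both terms in the denominator are positive.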

In summary, Proposition 5 says that it can be optimal for the government to offer both

general income taxation and a welfare program. Proposition 6 says that if most of the unskilled

people suffer from welfare stigma (i.e., if α is close to zero), then it is likely to be optimal for

the government to put all the unskilled people on the welfare program (i.e., the equilibrium is

likely to be a pooling equilibrium so that all the unskilled people are on welfare). In addition,

conditional on a pooling equilibrium so that all the unskilled people are on welfare, Proposition 7

says that the more intensely people suffer from welfare stigma (i.e., the higher $\phi$), the higher the

welfare benefit should be (i.e., the higher $b^*$).


V. SPECIFICATION OF THE GENERAL MODEL AND NUMERICAL SIMULATIONS

The theoretical analysis of the simpler model in Section IV shows that it can be optimal

for the government to provide both general income taxation and a welfare program. I expect this

conclusion will carry over to the simulation analysis of the more complex general model.

However, the theoretical analysis cannot answer the following important questions: (1) given reasonable benchmark parameters, what do the optimal income tax schedule and the optimal welfare program look like?; (2) should there be a negative income tax?; (3) how negative should it be?; (4) should there be a generous welfare program?; (5) how generous should it be?; and (6) does US tax policy actually differ from the optimum with benchmark parameters? Answering these questions is important for public policy. In this section, I answer them with numerical simulations.

Recall that the government needs to solve the following optimization problem to find both the optimal income tax schedule and the optimal welfare program, i.e., the optimal set of parameters \((k, t, b)\):

\[
\begin{aligned}
\max_{(k,\,t,\,b)} \quad & SW\left(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\right) \\
\text{s.t.} \quad & \text{(A)} \quad U_i^{\max} = \max\left\{ \max_{L_i} f\big((1-t)\,w_i L_i - k,\ L - L_i\big),\ f(b, L) - \phi_i \right\}, \quad i = 1, 2, \ldots, N, \\
& \text{(B)} \quad \sum_{i=1}^{N} \mathbf{1}(i: \text{General Income Taxation}) \cdot (t\,w_i L_i + k) = \sum_{i=1}^{N} \mathbf{1}(i: \text{Welfare}) \cdot b,
\end{aligned}
\]

where constraint (A) represents individuals’ behavioral response to the income tax schedule and the welfare program parameters, and constraint (B) is the government’s budget constraint. \(\{w_i\}\) and \(\{\phi_i\}\) are generated from two random distributions.

In this section, I first describe the algorithm to solve the general model numerically. Then

I specify the general model with reasonable benchmark parameters according to their normal

values. Finally, I report my simulation results.

V.A. The Algorithm

The algorithm to solve the general model is as follows.

Step 1: Choose a set of parameters \((k, t, b)\).

Step 2: Taking the chosen set of parameters \((k, t, b)\) as given, I solve constraint (A) to get each individual’s optimal labor supply: (1) whether to self-select into the general income taxation or into the welfare program; and (2) conditional on self-selecting into the general income taxation, how much labor to supply. I also calculate each individual’s optimal utility taking \((k, t, b)\) as given.

Step 3: With the chosen set of parameters \((k, t, b)\) and each individual’s labor supply, I can check whether constraint (B) is satisfied or not. If constraint (B) is satisfied, then the chosen set of parameters \((k, t, b)\) is feasible and I keep it. I also keep each individual’s optimal labor supply and optimal utility. If constraint (B) is not satisfied, then the chosen set of parameters \((k, t, b)\) is infeasible and I discard it.

Step 4: By three-dimensional grid search, I iterate Steps 1 to 3 to find all the feasible sets of parameters.

Step 5: For each feasible set of parameters, I calculate the government’s objective value, using each individual’s optimal utility calculated in Step 2.

Step 6: By comparing the objective values, I can find the set of parameters \((k, t, b)\) that generates the largest objective value. This set of parameters \((k, t, b)\) is the solution to the government’s optimization problem.
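Steps 1 to 6 amount to an exhaustive search over a three-dimensional grid. A minimal sketch in Python follows. The names `grid_search`, `toy_feasible`, and `toy_objective` are hypothetical, and the two toy functions are stand-ins chosen only to make the example run; they are not the paper's constraint (B) or social welfare function.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Params = Tuple[float, float, float]  # (k, t, b)


def grid_search(
    k_grid: Iterable[float],
    t_grid: Iterable[float],
    b_grid: Iterable[float],
    is_feasible: Callable[[Params], bool],
    objective: Callable[[Params], float],
) -> Optional[Tuple[Params, float]]:
    """Return the feasible (k, t, b) with the largest objective, or None."""
    best: Optional[Tuple[Params, float]] = None
    for params in product(k_grid, t_grid, b_grid):  # Steps 1 and 4: visit every grid point
        if not is_feasible(params):                  # Steps 2 and 3: discard infeasible points
            continue
        value = objective(params)                    # Step 5: objective at a feasible point
        if best is None or value > best[1]:          # Step 6: keep the largest value seen
            best = (params, value)
    return best


# Toy stand-ins (assumptions for illustration only).
def toy_feasible(params: Params) -> bool:
    k, t, b = params
    return 1000.0 * t + k >= b  # pretend revenue must cover the benefit


def toy_objective(params: Params) -> float:
    k, t, b = params
    return b - 0.5 * k - 800.0 * t * t


best = grid_search(
    [-100.0, 0.0, 100.0], [0.0, 0.25, 0.5], [0.0, 100.0, 200.0],
    toy_feasible, toy_objective,
)
print(best)
```

In an actual run, `is_feasible` would wrap the individual labor supply solution and the budget check, and `objective` would evaluate the social welfare function at the stored optimal utilities.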

V.B. Specification of the General Model

I need to specify the general model from eight aspects: (1) the size of the population \(N\); (2) the endowment of time \(L\); (3) the distribution of wages \(\{w_i\}\); (4) the form of the individual utility function \(f(Y_i, L - L_i)\); (5) the form of the social welfare function \(SW(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max})\); (6) the distribution of welfare stigma \(\{\phi_i\}\); (7) the correlation coefficient between wages and welfare stigma; and (8) the government’s budget constraint.

The Size of the Population

Fair (1971), in his simulation, essentially used 50 individuals to approximate a random

distribution. Since then, computing power has increased greatly. Therefore, I assume that there

are 500 individuals in my numerical simulations. I think that it might be sufficient for me to use

500 individuals to approximate the lognormal distribution of wages \(\{w_i\}\) and the lognormal distribution of welfare stigma \(\{\phi_i\}\). Thus I assume the size of the population \(N = 500\) in my numerical simulations.

The Endowment of Time

I am considering weekly time endowment. Thus, the time endowment \(L\) is equal to 24*7 = 168 hours, and I take \(L = 168\) in my numerical simulations.

The Distribution of Wages

According to the 2003 Current Population Survey (CPS), mean weekly work time is

around 39 hours. The mean annual wages or salary earned is $33,612. The standard deviation of

annual wages or salary earned is $43,636. If there are 52 weeks in one year, then the mean weekly wages or salary is $33,612/52 ≈ $646 and the standard deviation of the weekly wages or salary is $43,636/52 ≈ $839. The mean hourly wages or salary is $33,612/(52*39) ≈ $16.6 and the standard deviation of the hourly wages or salary is $43,636/(52*39) ≈ $21.5.

Thus, I assume \(\{w_i\}\) follows a lognormal distribution with mean 16.6 and standard deviation 21.5 in my numerical simulations.

The Form of the Individual Utility Function

Stern (1976) specified a Constant Elasticity of Substitution (CES) utility function in his simulation. That is, he assumed \( f(Y_i, L - L_i) = \left[\gamma\, Y_i^{(\varepsilon-1)/\varepsilon} + (1-\gamma)\,(L - L_i)^{(\varepsilon-1)/\varepsilon}\right]^{\varepsilon/(\varepsilon-1)} \), where \(Y_i\) is individual \(i\)’s income and \(L - L_i\) is his leisure. Auerbach et al. (1983) also specified a CES utility function in their simulation. Although Fair (1971) specified a Cobb-Douglas utility function in his simulation, a Cobb-Douglas utility function is a special case of a CES utility function with \(\varepsilon = 1\). Adopting the convention, I also specify a CES utility function in my simulation. One advantage of specifying a CES utility function is that I can explicitly solve each individual’s optimal labor supply.
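To see why the CES form admits an explicit solution, consider an individual under general income taxation with after-tax income \(Y_i = (1-t)\,w_i L_i - k\). The sketch below assumes an interior optimum (\(0 < L_i < L\)) and introduces the shorthand \(c_i\), which does not appear in the original:

```latex
\begin{aligned}
&\text{First-order condition:}\quad
\gamma\,(1-t)\,w_i\,Y_i^{-1/\varepsilon} = (1-\gamma)\,(L - L_i)^{-1/\varepsilon} \\
&\Longrightarrow\quad \frac{L - L_i}{Y_i} = c_i \equiv \left(\frac{1-\gamma}{\gamma\,(1-t)\,w_i}\right)^{\varepsilon} \\
&\Longrightarrow\quad L - L_i = \frac{c_i\,\big[(1-t)\,w_i\,L - k\big]}{1 + c_i\,(1-t)\,w_i}.
\end{aligned}
```

Setting \(t = 0\) and \(k = 0\) gives labor supply in the absence of government intervention.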

I need to further specify ε and γ. The parameter ε is individuals’ elasticity of substitution between income and leisure. The calculations by Stern (1976) show that ε is around 0.5. His most favored specification is \(\varepsilon = 0.4\). Based on the work by Ghez and Becker (1975), Heckman (1974), Rosen (1976), MaCurdy (1981) and Hausman (1981), Auerbach et al. (1983) specified \(\varepsilon = 0.8\) in their simulation. Because Fair (1971) specified a Cobb-Douglas utility function, he essentially specified \(\varepsilon = 1\) in his simulation. As a compromise, I specify \(\varepsilon = 0.5\) in my benchmark simulation.

The parameter γ measures how much individuals value income relative to leisure. It depends on the choice of labor units. Because I take the time endowment \(L = 168\) hours per week in my simulations, and because the average labor supply is about 40 hours per week, this implies that people spend about 25 percent of their time endowment working (\(40/168 \approx 0.25\)). In my benchmark simulation, the parameter γ is set so that in the absence of government intervention the individual with the mean wages ($16.6 per hour) would work for 0.25 of the time endowment in the case of \(\varepsilon = 0.5\). This suggests a value of \(\gamma \approx 0.65\). Thus, I specify \(\gamma = 0.65\) in my benchmark simulation.
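The calibration of γ can be checked with a few lines of Python. This is a sketch under the assumption that labor supply with no taxes or transfers is the interior CES optimum, so that the share of time worked is \(1/(1 + ((1-\gamma)/\gamma)^{\varepsilon} w^{1-\varepsilon})\). The function names are hypothetical.

```python
def free_labor_share(w: float, gamma: float, eps: float) -> float:
    """Share of the time endowment worked when t = 0 and k = 0 (interior CES optimum)."""
    return 1.0 / (1.0 + ((1.0 - gamma) / gamma) ** eps * w ** (1.0 - eps))


def calibrate_gamma(w: float, eps: float, target_share: float) -> float:
    """Invert free_labor_share for gamma, given a target share of time worked."""
    # ratio is (1 - gamma) / gamma implied by the target share
    ratio = ((1.0 / target_share - 1.0) / w ** (1.0 - eps)) ** (1.0 / eps)
    return 1.0 / (1.0 + ratio)


share_at_065 = free_labor_share(16.6, gamma=0.65, eps=0.5)
gamma_for_quarter = calibrate_gamma(16.6, eps=0.5, target_share=0.25)
print(share_at_065, gamma_for_quarter)
```

With the mean wage of 16.6 and \(\varepsilon = 0.5\), the share at \(\gamma = 0.65\) comes out close to 0.25, and the γ that hits 0.25 exactly is close to 0.65, in line with the benchmark choice.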

Therefore, if an individual with wages \(w_i\) and welfare stigma \(\phi_i\) decides to enroll in the welfare program, his utility is given by \(U_i = \left[0.65\,b^{-1} + 0.35\,L^{-1}\right]^{-1} - \phi_i\) because in this case his income is the welfare benefit \(b\), his leisure is his endowment of time \(L\), and he incurs an extra utility loss \(\phi_i\) due to his welfare stigma (Moffitt, 1983). If this individual decides to self-select into the general income taxation and supply labor \(L_i\), then his utility is given by \(U_i = \left[0.65\,\big((1-t)\,w_i L_i - k\big)^{-1} + 0.35\,(L - L_i)^{-1}\right]^{-1}\) because in this case his after-tax income is \((1-t)\,w_i L_i - k\), his leisure is \(L - L_i\), and there is no stigma attached to general income taxation.

Thus, an individual with parameters \((w_i, \phi_i)\) is assumed to take \((k, t, b)\) as given and solve the following optimization problem to maximize his utility:

\[
\max\left\{ \max_{L_i} \left[0.65\,\big((1-t)\,w_i L_i - k\big)^{-1} + 0.35\,(L - L_i)^{-1}\right]^{-1},\ \left[0.65\,b^{-1} + 0.35\,L^{-1}\right]^{-1} - \phi_i \right\}.
\]

And I denote

\[
U_i^{\max} = \max\left\{ \max_{L_i} \left[0.65\,\big((1-t)\,w_i L_i - k\big)^{-1} + 0.35\,(L - L_i)^{-1}\right]^{-1},\ \left[0.65\,b^{-1} + 0.35\,L^{-1}\right]^{-1} - \phi_i \right\}.
\]
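The individual's problem can be sketched in Python for the benchmark case \(\varepsilon = 0.5\), \(\gamma = 0.65\). The function names are hypothetical. The closed-form hours come from the first-order condition of the CES form and are clipped to the interval from 0 to the time endowment. Treating utility as 0 when income or leisure is nonpositive (the limit of this CES form) and breaking ties in favor of general income taxation are assumptions of this sketch, not statements from the text. The sketch also requires \(t < 1\) and \(w > 0\).

```python
from typing import Tuple


def ces_utility(income: float, leisure: float, gamma: float = 0.65) -> float:
    """CES utility with epsilon = 0.5: [gamma/income + (1 - gamma)/leisure]^(-1)."""
    if income <= 0.0 or leisure <= 0.0:
        return 0.0  # assumption: use the limiting value of the CES form
    return 1.0 / (gamma / income + (1.0 - gamma) / leisure)


def best_labor(w: float, k: float, t: float, gamma: float = 0.65, L: float = 168.0) -> float:
    """Utility-maximizing hours under the tax schedule (epsilon = 0.5)."""
    net_wage = (1.0 - t) * w
    c = ((1.0 - gamma) / (gamma * net_wage)) ** 0.5  # leisure-to-income ratio at the optimum
    leisure = c * (net_wage * L - k) / (1.0 + c * net_wage)
    return min(max(L - leisure, 0.0), L)             # clip to feasible hours


def choose(w: float, phi: float, k: float, t: float, b: float,
           gamma: float = 0.65, L: float = 168.0) -> Tuple[str, float, float]:
    """Return (regime, hours, utility) for an individual with wage w and stigma phi."""
    hours = best_labor(w, k, t, gamma, L)
    u_tax = ces_utility((1.0 - t) * w * hours - k, L - hours, gamma)
    u_welfare = ces_utility(b, L, gamma) - phi
    if u_welfare > u_tax:
        return "welfare", 0.0, u_welfare
    return "tax", hours, u_tax


# Two illustrative individuals at the benchmark optimum reported in Section V.C.
print(choose(16.6, 64.87, k=-170.0, t=0.32, b=387.0))
print(choose(3.0, 50.0, k=-170.0, t=0.32, b=387.0))
```

In this illustration the individual with the mean wage and mean stigma stays under general income taxation, while the low-wage individual with somewhat lower stigma enrolls in welfare.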

The Form of the Social Welfare Function

Fair (1971) argued that if people were given an opportunity to choose the social welfare

function, it is likely that many, if not most, would prefer one that has equal weights for all.

Therefore, Fair proposed two social welfare functions:

(1) \( SW\left(U_1^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\right) = \sum_{i=1}^{N} U_i^{\max} \),

i.e., the social welfare is the sum of individuals’ utilities; and

(2) \( SW\left(U_1^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\right) = \prod_{i=1}^{N} U_i^{\max} \),

i.e., the social welfare is the product of individuals’ utilities.

Fair used the second social welfare function in his numerical simulations. I follow Fair and assume that \( SW\left(U_1^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\right) = \prod_{i=1}^{N} U_i^{\max} \) in my numerical simulations.
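The text does not say how the product is evaluated numerically. One possible precaution, sketched here as an assumption, is to compare candidates by the sum of log utilities: with 500 individuals whose utilities are in the hundreds, the raw product overflows double precision, while the log form preserves the ranking whenever every utility is strictly positive. The function name is hypothetical.

```python
import math
from typing import Sequence


def log_social_welfare(utilities: Sequence[float]) -> float:
    """Log of the product of utilities; defined only for strictly positive utilities."""
    if any(u <= 0.0 for u in utilities):
        raise ValueError("the log form needs strictly positive utilities")
    return sum(math.log(u) for u in utilities)


low = [200.0] * 500
high = [201.0] * 500

# The raw products both overflow to infinity and cannot be ranked.
print(math.prod(low), math.prod(high))
# The log form ranks the two candidates as expected.
print(log_social_welfare(high) > log_social_welfare(low))
```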


The Distribution of Welfare Stigma

Moffitt’s estimation (1983) shows that stigma appears to arise mainly from the flat

component, i.e., a fixed cost associated with participation in the program. His estimation shows

that the mean of the flat stigma is around 0.65. This implies that on average, if we multiply the

income by 1.72 and divide the labor supply by 1.72 simultaneously, we can compensate for the

utility cost of stigma. Moffitt’s estimation also shows that the ratio of the standard deviation to

the mean of flat stigma is around 0.14.

Because the mean weekly wages or salary is around $646 and the mean weekly work time is around 39 hours, if I apply Moffitt’s results in my CES utility function, then the mean of stigma employed in my simulation is around

\[
\left[0.65\,(1.72 \cdot 646)^{-1} + 0.35\,(168 - 39/1.72)^{-1}\right]^{-1} - \left[0.65 \cdot 646^{-1} + 0.35\,(168 - 39)^{-1}\right]^{-1} \approx 64.87.
\]

The standard deviation of stigma is around \(64.87 \times 0.14 \approx 9.08\).

In summary, I assume \(\{\phi_i\}\) follows a lognormal distribution with mean 64.87 and standard deviation 9.08 in my numerical simulations.
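The stigma calibration is a direct evaluation of the benchmark CES utility at two points, which can be reproduced as follows. The variable and function names are hypothetical. With the rounded inputs quoted in the text the difference lands in the mid-60s, close to the reported 64.87; any small gap presumably reflects rounding in the quoted inputs.

```python
def ces_utility(income: float, leisure: float, gamma: float = 0.65) -> float:
    """Benchmark CES utility with epsilon = 0.5."""
    return 1.0 / (gamma / income + (1.0 - gamma) / leisure)


weekly_income = 646.0   # mean weekly wages or salary
weekly_hours = 39.0     # mean weekly work time
endowment = 168.0       # weekly time endowment
factor = 1.72           # compensation factor implied by Moffitt's estimate

# Utility after multiplying income and dividing labor supply by the factor.
compensated = ces_utility(factor * weekly_income, endowment - weekly_hours / factor)
# Utility at the mean income and mean hours.
baseline = ces_utility(weekly_income, endowment - weekly_hours)

mean_stigma = compensated - baseline
sd_stigma = 0.14 * mean_stigma  # ratio of standard deviation to mean from Moffitt
print(mean_stigma, sd_stigma)
```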

The Correlation Coefficient between Wages and Welfare Stigma

There is little direct empirical evidence on the correlation coefficient between wages and

welfare stigma. But it seems that there is a positive correlation between them. Thus, I arbitrarily

take the correlation coefficient between wages and welfare stigma to be 0.5 in my numerical

simulations.
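The text does not describe how the correlated lognormal draws are generated. One way to do it, sketched here under stated assumptions, is to match the mean and standard deviation of each variable itself (not of its logarithm), draw correlated standard normals, and exponentiate. The correlation of the underlying normals is chosen so that the lognormal variables have the target correlation of 0.5. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def lognormal_params(mean: float, sd: float) -> Tuple[float, float]:
    """(mu, sigma) of the underlying normal for a lognormal with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def normal_correlation(rho: float, mean1: float, sd1: float,
                       mean2: float, sd2: float) -> float:
    """Correlation of the underlying normals that yields correlation rho in levels."""
    _, s1 = lognormal_params(mean1, sd1)
    _, s2 = lognormal_params(mean2, sd2)
    return math.log(1.0 + rho * (sd1 / mean1) * (sd2 / mean2)) / (s1 * s2)


def draw_population(n: int, rho: float = 0.5, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n (wage, stigma) pairs with the benchmark moments."""
    mu_w, s_w = lognormal_params(16.6, 21.5)
    mu_p, s_p = lognormal_params(64.87, 9.08)
    rho_n = normal_correlation(rho, 16.6, 21.5, 64.87, 9.08)
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_n * z1 + math.sqrt(1.0 - rho_n ** 2) * rng.gauss(0.0, 1.0)
        people.append((math.exp(mu_w + s_w * z1), math.exp(mu_p + s_p * z2)))
    return people


population = draw_population(500)
print(population[:3])
```

With only 500 draws from a heavy-tailed wage distribution, sample moments can differ noticeably from the targets, so results may be sensitive to the random seed.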

The Government’s Budget Constraint

The actual budget constraint is made slightly more complicated than the one specified with constraint (B). I assume that 15% of the total output under no government interference (i.e., no income tax and no welfare) is used to run the government. The number 15% is chosen by referring to government spending in the United States (see footnote 6). Let \(L_i^{free}\) be individual \(i\)’s optimal labor supply under no government interference. With the CES utility function, it can be shown that

\[
L_i^{free} = L \cdot \frac{\left(\frac{\gamma}{1-\gamma}\right)^{\varepsilon} w_i^{\varepsilon}}{w_i + \left(\frac{\gamma}{1-\gamma}\right)^{\varepsilon} w_i^{\varepsilon}},
\]

where \(\varepsilon = 0.5\), \(\gamma = 0.65\), and \(L = 168\). Then, the cost to run the government can be written as

\[
\text{gov\_expense} = 0.15 \cdot \sum_{i=1}^{N} \mathbf{1}\left(L_i^{free} > 0\right) \cdot w_i L_i^{free},
\]

where \(\mathbf{1}(L_i^{free} > 0)\) is an indicator function that is equal to 1 if \(L_i^{free} > 0\) and to 0 otherwise. Thus, the actual budget constraint used in my numerical simulation is

\[
\sum_{i=1}^{N} \mathbf{1}(i: \text{General Income Taxation}) \cdot (t\,w_i L_i + k) = \sum_{i=1}^{N} \mathbf{1}(i: \text{Welfare}) \cdot b + 0.15 \cdot \sum_{i=1}^{N} \mathbf{1}\left(L_i^{free} > 0\right) \cdot w_i L_i^{free}
\]

instead of

\[
\sum_{i=1}^{N} \mathbf{1}(i: \text{General Income Taxation}) \cdot (t\,w_i L_i + k) = \sum_{i=1}^{N} \mathbf{1}(i: \text{Welfare}) \cdot b.
\]
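The budget check can be sketched as follows. The function names and the representation of individuals as `(regime, wage, hours)` tuples are hypothetical. How closely the two sides must match on a discrete grid is not stated in the text, so the sketch only reports the gap between revenue and spending and leaves the tolerance to the caller.

```python
from typing import Iterable, List, Tuple


def free_labor(w: float, gamma: float = 0.65, eps: float = 0.5, L: float = 168.0) -> float:
    """Labor supply with no taxes or transfers (interior CES optimum)."""
    return L / (1.0 + ((1.0 - gamma) / gamma) ** eps * w ** (1.0 - eps))


def government_expense(wages: Iterable[float], share: float = 0.15) -> float:
    """Share of total output under no government interference."""
    total = 0.0
    for w in wages:
        hours = free_labor(w)
        if hours > 0.0:  # indicator 1(L_free > 0)
            total += w * hours
    return share * total


def budget_gap(people: List[Tuple[str, float, float]],
               k: float, t: float, b: float, expense: float) -> float:
    """Tax revenue minus welfare outlays and government expense.

    people holds (regime, wage, hours) with regime equal to "tax" or "welfare".
    """
    revenue = sum(t * w * h + k for regime, w, h in people if regime == "tax")
    outlays = sum(b for regime, _, _ in people if regime == "welfare")
    return revenue - outlays - expense


people = [("tax", 20.0, 40.0), ("welfare", 5.0, 0.0)]
print(free_labor(16.6), government_expense([16.6]))
print(budget_gap(people, k=-170.0, t=0.32, b=387.0, expense=0.0))
```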

V.C. Simulation Results

My simulation based on the benchmark parameters shows that the optimal \(k\), \(t\), and \(b\) are around -$170 per week, 0.32, and $387 per week, respectively. About 35 percent of the individuals choose to be on welfare. Annually, the welfare benefit is around $20K given there are 52 weeks in one year. Because \(k\) is negative, the benchmark simulation shows that the government should provide both a negative income tax schedule and a welfare program. Because the optimal weekly welfare benefit is around $387, the benchmark simulation implies that the actual welfare program might be less generous than the optimal welfare program (see footnote 7).

6 According to Rosen (2002), the federal government expenditure as a percentage of gross domestic product (GDP) was about 15 percent in 1999 and the government expenditure (federal, state, and local) as a percentage of GDP was about 25 percent.

In the benchmark simulation, I specify \(\varepsilon = 0.5\). In order to check the robustness of the results, I run one more simulation around \(\varepsilon = 0.5\). I find that the basic conclusions are qualitatively robust: (1) the government should provide both a negative income tax schedule and a welfare program; and (2) the actual welfare program might be less generous than the optimal welfare program. In more detail, when \(\varepsilon = 0.4\) (Stern’s most favored specification (1976)), my simulation shows that the optimal \(k\), \(t\), and \(b\) are around -$87 per week, 0.36, and $425 per week, respectively. About 45 percent of the individuals choose to be on welfare.

The results of the sensitivity analysis seem intuitive. When ε is increased from 0.4 to 0.5, the labor supply becomes more elastic. Therefore, the marginal tax rate \(t\) is decreased from 0.36 to 0.32 so as to encourage labor supply. In addition, \(k\) is decreased from -$87 to -$170 and \(b\) is reduced from $425 to $387 so as to encourage switching from the welfare program to general income taxation. The share of individuals choosing to be on welfare falls from about 45 percent to about 35 percent of the whole population.

In summary, the numerical simulations show that it is optimal for the government to provide both a negative income tax schedule and a welfare program, which verifies the theoretical conclusion that it can be optimal for the government to provide both general income taxation and a welfare program. In addition, the numerical simulations imply that the actual welfare program might be less generous than the optimal welfare program.

7 The actual welfare benefit varies by family status, income level, state, and program. A good description of the key programs (the Food Stamp Program, Temporary Assistance for Needy Families (TANF), Medicaid, etc.) is in the Green Book, available online at http://waysandmeans.house.gov/Documents.asp?section=813, which shows benefit levels.

VI. CONCLUSION

Moffitt’s work shows that stigma plays an important role in people’s utility functions and

thus distorts people’s labor supply decisions and welfare. Because stigma plays an important role

in people’s utility functions and their labor supply decisions, it must also have important

implications for the optimal income tax schedule. However, the traditional literature on optimal

income taxation does not take this important factor into account.

In this research, I integrate the theory of welfare stigma and the theory of optimal income taxation. To do so, I study optimal income taxation and an optimal welfare program within a unified framework while taking welfare stigma into account. In my

framework, the government has two instruments at hand: general income taxation and a welfare

program. Individuals are heterogeneous along two dimensions, namely, wages and welfare

stigma. Each individual is assumed to take the income tax schedule and the parameters of the

welfare program as given and make his optimal labor supply decisions to maximize his utility:

(1) whether to enroll in the welfare program or self-select into the general income taxation; and

(2) conditional on self-selecting into the general income taxation, how much labor to supply. The

government is assumed to choose both an optimal income tax schedule and an optimal welfare

program so as to maximize a social welfare function subject to its own budget constraint and

individuals’ behavioral response to the income tax schedule and the welfare program parameters.


Within the unified framework, I use both theoretical analysis and numerical simulation to

provide answers for some questions that are important for public policy. Theoretical analysis

shows that: (1) it can be optimal for the government to offer both general income taxation and a

welfare program; and (2) the more intensely people suffer from welfare stigma, the higher the

welfare benefit should be. Numerical simulations show that: (1) it is optimal for the government

to offer both a negative income tax schedule and a welfare program, which verifies the

theoretical conclusion that it can be optimal for the government to provide both general income

taxation and a welfare program; and (2) the actual welfare program might be less generous than

the optimal welfare program.


APPENDIX 1: PROOF OF PROPOSITIONS 1, 2, 3, AND 4

In order to prove Propositions 1, 2, 3, and 4, I need to prove eight lemmas first.

Lemma 1: Conditional on that under equilibrium skilled people choose to take difficult jobs and all unskilled people choose to take easy jobs, the optimal conditions are given by:

(1) \(u(q_E + t_{1E}^*) - \delta_E > u(b_1^*)\),

(2) \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1E}^*) - \delta_E\), and

(3) \(t_{1D}^* = t_{1E}^*\).

The maximum value of the social welfare function is \(SW_1 = u(q_D - t_{1D}^*) - \delta_D\), where \(t_{1D}^*\) satisfies \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1D}^*) - \delta_E\), i.e., \(t_{1D}^* = u^{-1}\big(u(q_D - t_{1D}^*) - \delta_D + \delta_E\big) - q_E\).

Proof: In order to induce skilled people to take difficult jobs, the government should set parameters such that \(u(q_D - t_{1D}) - \delta_D \ge u(q_E + t_{1E}) - \delta_E\) and \(u(q_D - t_{1D}) - \delta_D \ge u(b_1) - \phi\). In order to induce unskilled people with stigma to take easy jobs, the government should set parameters such that \(u(q_E + t_{1E}) - \delta_E \ge u(b_1) - \phi\). In order to induce unskilled people without stigma to take easy jobs, the government should set parameters such that \(u(q_E + t_{1E}) - \delta_E > u(b_1)\). Taken together, in order to induce skilled people to take difficult jobs and all unskilled people to take easy jobs, the government should set parameters such that \(u(q_E + t_{1E}) - \delta_E > u(b_1)\) and \(u(q_D - t_{1D}) - \delta_D \ge u(q_E + t_{1E}) - \delta_E\). The government’s budget constraint implies that \(t_{1D} = t_{1E}\).

Conditional on that \(u(q_E + t_{1E}) - \delta_E > u(b_1)\), \(u(q_D - t_{1D}) - \delta_D \ge u(q_E + t_{1E}) - \delta_E\), and \(t_{1D} = t_{1E}\), in order to maximize the social welfare function, the government should extract as much as possible from the skilled people without inducing them to switch from difficult jobs to easy jobs. Therefore, the government should set parameters such that \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1E}^*) - \delta_E\). We also have \(u(q_E + t_{1E}^*) - \delta_E > u(b_1^*)\) so that all unskilled people choose to take easy jobs under equilibrium. The budget constraint implies that \(t_{1D}^* = t_{1E}^*\). Thus, we have proved the optimal conditions.

With \(u(q_E + t_{1E}^*) - \delta_E > u(b_1^*)\), \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1E}^*) - \delta_E\), and \(t_{1D}^* = t_{1E}^*\), the utility of skilled people is \(u(q_D - t_{1D}^*) - \delta_D\), the utility of unskilled people with stigma is \(u(q_E + t_{1E}^*) - \delta_E\), and the utility of unskilled people without stigma is \(u(q_E + t_{1E}^*) - \delta_E\). Because \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1E}^*) - \delta_E\), we have \(SW_1 = u(q_D - t_{1D}^*) - \delta_D\), where \(t_{1D}^*\) satisfies \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1D}^*) - \delta_E\) by \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1E}^*) - \delta_E\) and \(t_{1D}^* = t_{1E}^*\). \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1D}^*) - \delta_E\) is equivalent to \(t_{1D}^* = u^{-1}\big(u(q_D - t_{1D}^*) - \delta_D + \delta_E\big) - q_E\).

Thus, we have proved Lemma 1.

Lemma 2: Conditional on that under equilibrium skilled people choose to take difficult jobs and all unskilled people choose to be on welfare, the optimal conditions are given by:

(4) \(u(q_E + t_{2E}^*) - \delta_E < u(b_2^*) - \phi\),

(5) \(u(q_D - t_{2D}^*) - \delta_D = u(b_2^*) - \phi\), and

(6) \(t_{2D}^* = b_2^*\).

The maximum value of the social welfare function is \(SW_2 = u(q_D - t_{2D}^*) - \delta_D\), where \(t_{2D}^*\) satisfies \(u(q_D - t_{2D}^*) - \delta_D = u(t_{2D}^*) - \phi\), i.e., \(t_{2D}^* = u^{-1}\big(u(q_D - t_{2D}^*) - \delta_D + \phi\big)\).

Proof: In order to induce skilled people to take difficult jobs, the government should set parameters such that \(u(q_D - t_{2D}) - \delta_D \ge u(q_E + t_{2E}) - \delta_E\) and \(u(q_D - t_{2D}) - \delta_D \ge u(b_2) - \phi\). In order to induce unskilled people with stigma to be on welfare, the government should set parameters such that \(u(q_E + t_{2E}) - \delta_E < u(b_2) - \phi\). In order to induce unskilled people without stigma to be on welfare, the government should set parameters such that \(u(q_E + t_{2E}) - \delta_E \le u(b_2)\). Taken together, in order to induce skilled people to take difficult jobs and all unskilled people to be on welfare, the government should set parameters such that \(u(q_E + t_{2E}) - \delta_E < u(b_2) - \phi\) and \(u(q_D - t_{2D}) - \delta_D \ge u(b_2) - \phi\). The government’s budget constraint implies that \(t_{2D} = b_2\).

Conditional on \(u(q_E + t_{2E}) - \delta_E < u(b_2) - \phi\), \(u(q_D - t_{2D}) - \delta_D \ge u(b_2) - \phi\), and \(t_{2D} = b_2\), in order to maximize the social welfare function, the government should extract as much as possible from the skilled people without inducing them to switch from difficult jobs to welfare. Therefore, the government should set parameters such that \(u(q_D - t_{2D}^*) - \delta_D = u(b_2^*) - \phi\). We also have \(u(q_E + t_{2E}^*) - \delta_E < u(b_2^*) - \phi\) so that all unskilled people choose to be on welfare under equilibrium. The budget constraint implies that \(t_{2D}^* = b_2^*\). Thus, we have proved the optimal conditions.

With \(u(q_E + t_{2E}^*) - \delta_E < u(b_2^*) - \phi\), \(u(q_D - t_{2D}^*) - \delta_D = u(b_2^*) - \phi\), and \(t_{2D}^* = b_2^*\), the utility of skilled people is \(u(q_D - t_{2D}^*) - \delta_D\), the utility of unskilled people with stigma is \(u(b_2^*) - \phi\), and the utility of unskilled people without stigma is \(u(b_2^*)\). Because \(u(q_D - t_{2D}^*) - \delta_D = u(b_2^*) - \phi < u(b_2^*)\), we have \(SW_2 = u(q_D - t_{2D}^*) - \delta_D\), where \(t_{2D}^*\) satisfies \(u(q_D - t_{2D}^*) - \delta_D = u(t_{2D}^*) - \phi\) by \(u(q_D - t_{2D}^*) - \delta_D = u(b_2^*) - \phi\) and \(t_{2D}^* = b_2^*\). \(u(q_D - t_{2D}^*) - \delta_D = u(t_{2D}^*) - \phi\) is equivalent to \(t_{2D}^* = u^{-1}\big(u(q_D - t_{2D}^*) - \delta_D + \phi\big)\).

Thus, we have proved Lemma 2.

Lemma 3: Conditional on that under equilibrium skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy jobs, and unskilled people without stigma choose to be on welfare, then the optimal conditions are given by:

(7) \(u(q_E + t_{3E}^*) - \delta_E = u(b_3^*)\),

(8) \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3E}^*) - \delta_E\), and

(9) \(t_{3D}^* = (1-\alpha)\,t_{3E}^* + \alpha\,b_3^*\).

The maximum value of the social welfare function is \(SW_3 = u(q_D - t_{3D}^*) - \delta_D\), where \(t_{3D}^*\) satisfies

\[
t_{3D}^* = (1-\alpha)\Big(u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\Big) + \alpha\,u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big).
\]

Proof: In order to induce skilled people to take difficult jobs, the government should set parameters such that \(u(q_D - t_{3D}) - \delta_D \ge u(q_E + t_{3E}) - \delta_E\) and \(u(q_D - t_{3D}) - \delta_D \ge u(b_3) - \phi\). In order to induce unskilled people with stigma to take easy jobs, the government should set parameters such that \(u(q_E + t_{3E}) - \delta_E \ge u(b_3) - \phi\). In order to induce unskilled people without stigma to be on welfare, the government should set parameters such that \(u(q_E + t_{3E}) - \delta_E \le u(b_3)\). Taken together, in order to induce skilled people to take difficult jobs, unskilled people with stigma to take easy jobs, and unskilled people without stigma to be on welfare, the government should set parameters such that \(u(b_3) - \phi \le u(q_E + t_{3E}) - \delta_E \le u(b_3)\) and \(u(q_D - t_{3D}) - \delta_D \ge u(q_E + t_{3E}) - \delta_E\). The government’s budget constraint implies that \(t_{3D} = (1-\alpha)\,t_{3E} + \alpha\,b_3\).

Conditional on that \(u(q_D - t_{3D}) - \delta_D \ge u(q_E + t_{3E}) - \delta_E\), \(u(b_3) - \phi \le u(q_E + t_{3E}) - \delta_E \le u(b_3)\), and \(t_{3D} = (1-\alpha)\,t_{3E} + \alpha\,b_3\), in order to maximize the social welfare function, the government should extract as much as possible from the skilled people without inducing them to switch from difficult jobs to easy jobs. Therefore, the government should set parameters such that \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3E}^*) - \delta_E\). It can be proved by contradiction that \(u(q_E + t_{3E}^*) - \delta_E = u(b_3^*)\). Otherwise, if the government reduces \(b_3^*\) by a certain amount, increases \(t_{3E}^*\) by a certain amount, and reduces \(t_{3D}^*\) by a certain amount, the social welfare can be improved. The budget constraint implies that \(t_{3D}^* = (1-\alpha)\,t_{3E}^* + \alpha\,b_3^*\). Thus, we have proved the optimal conditions.

With \(u(q_E + t_{3E}^*) - \delta_E = u(b_3^*)\), \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3E}^*) - \delta_E\), and \(t_{3D}^* = (1-\alpha)\,t_{3E}^* + \alpha\,b_3^*\), the utility of the skilled people is \(u(q_D - t_{3D}^*) - \delta_D\), the utility of unskilled people with stigma is \(u(q_E + t_{3E}^*) - \delta_E\), and the utility of unskilled people without stigma is \(u(b_3^*)\). Because \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3E}^*) - \delta_E = u(b_3^*)\), we have \(SW_3 = u(q_D - t_{3D}^*) - \delta_D\).

Because \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3E}^*) - \delta_E = u(b_3^*)\), we have \(b_3^* = u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)\) and \(t_{3E}^* = u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\). Substitute \(b_3^* = u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)\) and \(t_{3E}^* = u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\) into \(t_{3D}^* = (1-\alpha)\,t_{3E}^* + \alpha\,b_3^*\), and we obtain

\[
t_{3D}^* = (1-\alpha)\Big(u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\Big) + \alpha\,u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big).
\]

Thus, we have proved Lemma 3.

Lemma 4: It is impossible for the government to set parameters such that under equilibrium unskilled people with stigma choose to be on welfare and unskilled people without stigma choose to take easy jobs simultaneously.

Proof: In order to induce unskilled people with stigma to be on welfare, the government should set parameters such that \(u(q_E + t_{4E}) - \delta_E < u(b_4) - \phi\). In order to induce unskilled people without stigma to take easy jobs, the government should set parameters such that \(u(q_E + t_{4E}) - \delta_E > u(b_4)\). As \(\phi > 0\), \(u(q_E + t_{4E}) - \delta_E < u(b_4) - \phi\) and \(u(q_E + t_{4E}) - \delta_E > u(b_4)\) cannot be satisfied simultaneously, so it is impossible for the government to set parameters such that under equilibrium unskilled people with stigma choose to be on welfare and unskilled people without stigma choose to take easy jobs simultaneously.

Lemma 5: If \(u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi\), and if \(\alpha\) is close to one, then we have \(t_{3D}^* < t_{1D}^*\) and \(t_{3D}^* < t_{2D}^*\), which further imply that \(SW_3\) is the largest among \(SW_1\), \(SW_2\), and \(SW_3\).

Proof: We have \(t_{3D}^* = (1-\alpha)\Big(u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\Big) + \alpha\,u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)\) by Lemma 3. Because \(u(0) = 0\) and \(u(\cdot)\) is concave by assumption so that \(u^{-1}(0) = 0\) and \(u^{-1}(\cdot)\) is convex, we have

\[
u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E > u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big) + u^{-1}(\delta_E) - q_E > u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)
\]

because \(u(q_E) < \delta_E\) so that \(u^{-1}(\delta_E) - q_E > 0\). Therefore, we have \(t_{3D}^* < u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\). Because \(t_{1D}^* = u^{-1}\big(u(q_D - t_{1D}^*) - \delta_D + \delta_E\big) - q_E\) by Lemma 1, it can be proved by contradiction that \(t_{3D}^* < t_{1D}^*\).

If \(\alpha\) is close to one, we have \(t_{3D}^* \approx u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big) < u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \phi\big)\). Because \(t_{2D}^* = u^{-1}\big(u(q_D - t_{2D}^*) - \delta_D + \phi\big)\) by Lemma 2, it can be proved by contradiction that \(t_{3D}^* < t_{2D}^*\).

Thus, we have shown that if \(u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi\), and if \(\alpha\) is close to one, then we have \(t_{3D}^* < t_{1D}^*\) and \(t_{3D}^* < t_{2D}^*\).

Because \(SW_1 = u(q_D - t_{1D}^*) - \delta_D\) by Lemma 1, \(SW_2 = u(q_D - t_{2D}^*) - \delta_D\) by Lemma 2, and \(SW_3 = u(q_D - t_{3D}^*) - \delta_D\) by Lemma 3, \(t_{3D}^* < t_{1D}^*\) and \(t_{3D}^* < t_{2D}^*\) imply that \(SW_3\) is the largest among \(SW_1\), \(SW_2\), and \(SW_3\).

Thus, we have proved Lemma 5.


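Lemma 6: If \(u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi\), and if \(\alpha\) is close to zero, then we have \(t_{3D}^* < t_{1D}^*\), but we cannot compare \(t_{1D}^*\) and \(t_{2D}^*\), and we cannot compare \(t_{3D}^*\) and \(t_{2D}^*\), which further imply that either \(SW_3\) or \(SW_2\) is the largest among \(SW_1\), \(SW_2\), and \(SW_3\).

Proof: In the proof of Lemma 5, we have shown that \(t_{3D}^* < t_{1D}^*\).

By Lemma 1, we have \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1D}^*) - \delta_E\). By Lemma 2, we have \(u(q_D - t_{2D}^*) - \delta_D = u(t_{2D}^*) - \phi\). Although \(u(q_E) - \delta_E > -\phi\), the utility contribution of \(q_E\) decreases with \(t_{1D}^*\). Therefore, we cannot compare \(t_{1D}^*\) and \(t_{2D}^*\).

We have \(t_{3D}^* = (1-\alpha)\Big(u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\Big) + \alpha\,u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)\) by Lemma 3. If \(\alpha\) is close to zero, \(t_{3D}^* \approx u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\), which is equivalent to \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3D}^*) - \delta_E\). Therefore, when \(\alpha\) is close to zero, we cannot compare \(t_{3D}^*\) and \(t_{2D}^*\), either.

Because \(SW_1 = u(q_D - t_{1D}^*) - \delta_D\) by Lemma 1, \(SW_2 = u(q_D - t_{2D}^*) - \delta_D\) by Lemma 2, and \(SW_3 = u(q_D - t_{3D}^*) - \delta_D\) by Lemma 3, therefore \(t_{3D}^* < t_{1D}^*\) and the facts that we cannot compare \(t_{3D}^*\) and \(t_{2D}^*\) imply that either \(SW_3\) or \(SW_2\) is the largest among \(SW_1\), \(SW_2\), and \(SW_3\).

Thus, we have proved Lemma 6.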

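Lemma 7: If \(u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E\), and if \(\alpha\) is close to one, then we have \(t_{3D}^* < t_{2D}^* < t_{1D}^*\), which further implies that \(SW_3 > SW_2 > SW_1\).

Proof: We have \(u(q_D - t_{1D}^*) - \delta_D = u(q_E + t_{1D}^*) - \delta_E\) by Lemma 1. Because \(u(0) = 0\) and \(u(\cdot)\) is concave by assumption, and because \(-\phi > u(q_E) - \delta_E\) by assumption, we have \(u(q_E + t_{1D}^*) - \delta_E < u(q_E) + u(t_{1D}^*) - \delta_E < u(t_{1D}^*) - \phi\). Thus, we have \(u(q_D - t_{1D}^*) - \delta_D < u(t_{1D}^*) - \phi\). Because \(u(q_D - t_{2D}^*) - \delta_D = u(t_{2D}^*) - \phi\) by Lemma 2, it can be proved by contradiction that \(t_{2D}^* < t_{1D}^*\).

We have \(t_{3D}^* = (1-\alpha)\Big(u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\Big) + \alpha\,u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)\) by Lemma 3. If \(\alpha\) is close to one, then \(t_{3D}^* \approx u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big) < u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \phi\big)\). Because \(t_{2D}^* = u^{-1}\big(u(q_D - t_{2D}^*) - \delta_D + \phi\big)\), it can be proved by contradiction that \(t_{3D}^* < t_{2D}^*\).

Thus, we have shown that if \(u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E\), and if \(\alpha\) is close to one, then we have \(t_{3D}^* < t_{2D}^* < t_{1D}^*\).

Because \(SW_1 = u(q_D - t_{1D}^*) - \delta_D\) by Lemma 1, \(SW_2 = u(q_D - t_{2D}^*) - \delta_D\) by Lemma 2, and \(SW_3 = u(q_D - t_{3D}^*) - \delta_D\) by Lemma 3, \(t_{3D}^* < t_{2D}^* < t_{1D}^*\) implies that \(SW_3 > SW_2 > SW_1\).

Thus, we have proved Lemma 7.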


Lemma 8: If \(u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E\), and if \(\alpha\) is close to zero, then we have \(t_{2D}^* < t_{1D}^*\) and \(t_{2D}^* < t_{3D}^*\), which further imply that \(SW_2\) is the largest among \(SW_1\), \(SW_2\), and \(SW_3\).

Proof: In the proof of Lemma 7, we have shown that \(t_{2D}^* < t_{1D}^*\).

We have \(t_{3D}^* = (1-\alpha)\Big(u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\Big) + \alpha\,u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D\big)\) by Lemma 3. If \(\alpha\) is close to zero, we have \(t_{3D}^* \approx u^{-1}\big(u(q_D - t_{3D}^*) - \delta_D + \delta_E\big) - q_E\), which is equivalent to \(u(q_D - t_{3D}^*) - \delta_D = u(q_E + t_{3D}^*) - \delta_E\). Because \(u(q_D - t_{2D}^*) - \delta_D = u(t_{2D}^*) - \phi\) by Lemma 2, we can show that \(t_{2D}^* < t_{3D}^*\) following the same argument in the proof of Lemma 7.

Thus, we have shown that if \(u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E\), and if \(\alpha\) is close to zero, then we have \(t_{2D}^* < t_{1D}^*\) and \(t_{2D}^* < t_{3D}^*\).

Because \(SW_1 = u(q_D - t_{1D}^*) - \delta_D\) by Lemma 1, \(SW_2 = u(q_D - t_{2D}^*) - \delta_D\) by Lemma 2, and \(SW_3 = u(q_D - t_{3D}^*) - \delta_D\) by Lemma 3, \(t_{2D}^* < t_{1D}^*\) and \(t_{2D}^* < t_{3D}^*\) imply that \(SW_2\) is the largest among \(SW_1\), \(SW_2\), and \(SW_3\).

Thus, we have proved Lemma 8.

Proposition 1: If \(u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi\), and if \(\alpha\) is close to one, then the equilibrium is a separating equilibrium, and the optimal conditions are given by:

(10) \(u(q_E + t_E^*) - \delta_E = u(b^*)\),

(11) \(u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E\), and

(12) \(t_D^* = (1-\alpha)\,t_E^* + \alpha\,b^*\).

Proof: Because skilled people have three options (take a difficult job, take an easy job, or be on welfare), unskilled people with stigma have two options (take an easy job, or be on welfare), and unskilled people without stigma have two options (take an easy job, or be on welfare), there are \(C_3^1 \cdot C_2^1 \cdot C_2^1 = 12\) possible equilibrium outcomes in principle. However, an equilibrium in which skilled people choose to take easy jobs or be on welfare cannot be optimal, because the government would then have no resources to subsidize unskilled people. Thus, we only need to consider \(C_1^1 \cdot C_2^1 \cdot C_2^1 = 4\) possible equilibrium outcomes.

By Lemma 4, it is impossible for the government to set parameters such that under

equilibrium unskilled people with stigma choose to be on welfare and unskilled people without

stigma choose to take easy jobs simultaneously. Thus, we only need to consider three possible

equilibrium outcomes: (1) skilled people choose to take difficult jobs, and all unskilled people

choose to take easy jobs; (2) skilled people choose to take difficult jobs, and all unskilled people

choose to be on welfare; and (3) skilled people choose to take difficult jobs, unskilled people

with stigma choose to take easy jobs, and unskilled people without stigma choose to be on

welfare.
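The counting argument can be checked by brute-force enumeration. The sketch below uses hypothetical labels for the options and simply filters the Cartesian product, first by requiring skilled people to take difficult jobs and then by removing the combination ruled out by Lemma 4.

```python
from itertools import product

skilled_options = ["difficult", "easy", "welfare"]
stigma_options = ["easy", "welfare"]      # unskilled people with stigma
no_stigma_options = ["easy", "welfare"]   # unskilled people without stigma

all_outcomes = list(product(skilled_options, stigma_options, no_stigma_options))

# Skilled people must take difficult jobs for the government to have resources.
skilled_in_difficult = [o for o in all_outcomes if o[0] == "difficult"]

# Lemma 4: the stigma group on welfare while the no-stigma group takes easy jobs is impossible.
remaining = [
    o for o in skilled_in_difficult
    if not (o[1] == "welfare" and o[2] == "easy")
]

print(len(all_outcomes), len(skilled_in_difficult), len(remaining))
print(remaining)
```

The three remaining tuples correspond to the three equilibrium outcomes treated in Lemmas 1, 2, and 3.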

Conditional on the first possible equilibrium outcome, the maximum social welfare is

by Lemma 1. Conditional on the second possible equilibrium outcome, the maximum social

welfare is by Lemma 2. Conditional on the third possible equilibrium outcome, the

maximum social welfare is by Lemma 3.

1SW

2SW

3SW

If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to one, then by Lemma 5, $SW_3$ is the largest among $SW_1$, $SW_2$, and $SW_3$. The optimal conditions in Lemma 3 are therefore the overall optimal conditions. Thus, we have shown that if $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to one, the optimal conditions are given by $u(q_E + t_E^*) - \delta_E = u(b^*)$, $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$. Under the optimal parameters, skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy jobs, and unskilled people without stigma choose to be on welfare. The equilibrium is a separating equilibrium.

Thus, we have proved Proposition 1.
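
To make the separating conditions concrete, the following Python sketch solves them numerically. It takes the conditions to be $u(q_E + t_E^*) - \delta_E = u(b^*)$, $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$. The square-root utility function and all parameter values are assumptions made purely for illustration; they are not taken from the paper.

```python
import math

def u(c):
    # Assumed utility of consumption (illustrative choice, not from the paper).
    return math.sqrt(c)

# Hypothetical parameters chosen so that
# u(q_D) - delta_D > 0 > u(q_E) - delta_E > -phi holds.
q_D, delta_D = 9.0, 1.0
q_E, delta_E = 1.0, 1.5
phi, alpha = 1.0, 0.9
assert u(q_D) - delta_D > 0 > u(q_E) - delta_E > -phi

def transfers(b):
    """Return (t_E, t_D) implied by the two indifference conditions for benefit b."""
    target = u(b)  # utility of a welfare recipient without stigma
    # Squaring inverts the assumed sqrt utility.
    t_E = (target + delta_E) ** 2 - q_E   # u(q_E + t_E) - delta_E = u(b)
    t_D = q_D - (target + delta_D) ** 2   # u(q_D - t_D) - delta_D = u(b)
    return t_E, t_D

def budget_gap(b):
    """Tax revenue minus outlays: t_D - (1 - alpha) * t_E - alpha * b."""
    t_E, t_D = transfers(b)
    return t_D - (1 - alpha) * t_E - alpha * b

# budget_gap is decreasing in b, so bisection finds the balanced-budget benefit.
lo, hi = 0.0, q_D
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if budget_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
b_star = 0.5 * (lo + hi)
t_E_star, t_D_star = transfers(b_star)
print(b_star, t_E_star, t_D_star)
```

At the computed values, unskilled people without stigma are indifferent between an easy job and welfare, while unskilled people with stigma strictly prefer the easy job because welfare would give them only $u(b^*) - \phi$. This is the separation described in the proof.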

Proposition 2: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to zero, then the equilibrium is either a separating equilibrium or a pooling equilibrium. For the separating equilibrium, the optimal conditions are given by:

(13) $u(q_E + t_E^*) - \delta_E = u(b^*)$,

(14) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and

(15) $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$.

For the pooling equilibrium, the optimal conditions are given by:

(16) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,

(17) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and

(18) $t_D^* = b^*$.

Proof: Following the same argument as in the proof of Proposition 1, we only need to consider three possible equilibrium outcomes: (1) skilled people choose to take difficult jobs, and all unskilled people choose to take easy jobs; (2) skilled people choose to take difficult jobs, and all unskilled people choose to be on welfare; and (3) skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy jobs, and unskilled people without stigma choose to be on welfare.





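Conditional on the first possible equilibrium outcome, the maximum social welfare is $SW_1$ by Lemma 1. Conditional on the second possible equilibrium outcome, the maximum social welfare is $SW_2$ by Lemma 2. Conditional on the third possible equilibrium outcome, the maximum social welfare is $SW_3$ by Lemma 3.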

If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to zero, then by Lemma 6, either $SW_3$ or $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$. Therefore, either the optimal conditions in Lemma 3 or the optimal conditions in Lemma 2 are the overall optimal conditions. If $SW_3 > SW_2$, then the optimal conditions are given by $u(q_E + t_E^*) - \delta_E = u(b^*)$, $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$. If $SW_2 > SW_3$, then the optimal conditions are given by $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$, $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and $t_D^* = b^*$.

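Proposition 3: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then the equilibrium is a separating equilibrium, and the optimal conditions are given by:

(19) $u(q_E + t_E^*) - \delta_E = u(b^*)$,

(20) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and

(21) $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$.

Proof: Following the same argument as in the proof of Proposition 1, we only need to consider the same three possible equilibrium outcomes, with maximum social welfare $SW_1$, $SW_2$, and $SW_3$, respectively.

If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then we have $SW_3 > SW_2 > SW_1$ by Lemma 7. Therefore, the optimal conditions in Lemma 3 are the overall optimal conditions. Thus, we have shown that if $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then the optimal conditions are given by $u(q_E + t_E^*) - \delta_E = u(b^*)$, $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$. Under the optimal parameters, skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy jobs, and unskilled people without stigma choose to be on welfare. The equilibrium is a separating equilibrium.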

Proposition 4: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, then the equilibrium is a pooling equilibrium, and the optimal conditions are given by:

(22) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,

(23) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and

(24) $t_D^* = b^*$.

Proof: Following the same argument as in the proof of Proposition 1, we only need to consider the same three possible equilibrium outcomes, with maximum social welfare $SW_1$, $SW_2$, and $SW_3$, respectively.

If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, then $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$ by Lemma 8. Therefore, the optimal conditions in Lemma 2 are the overall optimal conditions. Thus, we have shown that if $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, the optimal conditions are given by $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$, $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and $t_D^* = b^*$. Under the optimal parameters, skilled people choose to take difficult jobs, and all unskilled people choose to be on welfare. The equilibrium is a pooling equilibrium.
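
The pooling conditions can be illustrated in the same way. The Python sketch below takes them to be $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$, $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and $t_D^* = b^*$. It solves the last two for the benefit and then reports the range of easy-job transfers that satisfies the inequality. The square-root utility function and the parameter values are hypothetical and are not taken from the paper.

```python
import math

def u(c):
    # Assumed utility of consumption (illustrative choice, not from the paper).
    return math.sqrt(c)

# Hypothetical parameters chosen so that
# u(q_D) - delta_D > 0 > -phi > u(q_E) - delta_E holds.
q_D, delta_D = 9.0, 1.2
q_E, delta_E = 1.0, 2.5
phi = 1.0
assert u(q_D) - delta_D > 0 > -phi > u(q_E) - delta_E

def skilled_gap(b):
    """(u(q_D - t_D) - delta_D) minus (u(b) - phi), with the budget t_D = b imposed."""
    return (u(q_D - b) - delta_D) - (u(b) - phi)

# skilled_gap is decreasing in b on [0, q_D], so bisect for its root.
lo, hi = 0.0, q_D
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if skilled_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
b_pool = 0.5 * (lo + hi)
t_D_pool = b_pool  # balanced budget in the pooling case

# Upper bound on the easy-job transfer implied by the strict inequality
# u(q_E + t_E) - delta_E < u(b) - phi; squaring inverts the assumed sqrt utility.
t_E_bound = (u(b_pool) - phi + delta_E) ** 2 - q_E
print(b_pool, t_D_pool, t_E_bound)
```

Any easy-job transfer strictly below `t_E_bound` leaves unskilled people with stigma preferring welfare, so every unskilled person is on welfare, as in the pooling equilibrium described above.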

