STIGMA, OPTIMAL INCOME TAXATION, AND THE
OPTIMAL WELFARE PROGRAM∗
ZHIYONG AN Department of Economics
University of California at Berkeley 549 Evans Hall - #3880
Berkeley, CA 94720-3880 Phone: 510-846-3153
Email: [email protected]
ABSTRACT
In this research I integrate the theory of welfare stigma (Moffitt, 1983) and the theory of optimal income taxation. I study optimal income taxation and an optimal welfare program within a unified framework, taking welfare stigma into account. In the framework, I assume that the government has two policy instruments: general income taxation and a welfare program. I assume that individuals are heterogeneous along two dimensions: wages (skill) and welfare stigma. Each individual is assumed to take the income tax schedule and the parameters of the welfare program as given and make labor supply decisions to maximize his utility. The government is assumed to choose both an optimal income tax schedule and an optimal welfare program so as to maximize a social welfare function subject to its own revenue budget constraint and individuals’ behavioral response to the income tax schedule and the parameters of the welfare program. Within the unified framework, I use both theoretical analysis and numerical simulation to answer some questions that are important for public policy. Theoretical analysis shows that: (1) it can be optimal for the government to offer both general income taxation and a welfare program; and (2) the more intensely people suffer from welfare stigma, the higher the welfare benefit should be. Numerical simulations show that: (1) it is optimal for the government to offer both a negative income tax schedule and a welfare program; and (2) the actual welfare program might be less generous than the optimal welfare program.
∗ I thank George A. Akerlof, Robert M. Anderson, Alan J. Auerbach, John M. Quigley, and Brian D. Wright for valuable discussions and advice. Comments from Kim M. Bloomquist are highly appreciated.
I. INTRODUCTION
In this research, I integrate the theory of welfare stigma (Moffitt, 1983) and the theory of
optimal income taxation. In order to do so, I study optimal income taxation and an optimal
welfare program within a unified framework, taking welfare stigma into account. Within the
unified framework, I answer the following questions: If we introduce welfare stigma into optimal
income taxation, how will it change the optimal income tax schedule? Given reasonable
benchmark parameters, what do the optimal income tax schedule and the optimal welfare
program look like? Should there be a negative income tax? How negative should it be? Should
there be a generous welfare program? How generous should it be? Does US tax policy
actually differ from the optimum with benchmark parameters? The answers to these questions
are important for public policy.
In order to answer the above questions, I first build a general model. The framework of
the general model is as follows. Unlike the traditional literature on optimal income taxation,
which assumes the government has only one instrument, i.e., general income taxation, I assume
the government has two instruments, namely, general income taxation and a welfare program. In
addition, unlike the traditional literature on optimal income taxation, which assumes individuals
are heterogeneous only along one dimension, i.e., wages (skill), I assume individuals are
heterogeneous along two dimensions, namely, wages and welfare stigma. Each individual is
assumed to take the income tax schedule and the parameters of the welfare program as given and
make two decisions to maximize his utility: (1) whether to self-select into the general income
taxation or the welfare program; and (2) conditional on self-selecting into the general income
taxation, how much labor to supply. If an individual self-selects into the welfare program, then in
addition to the utility derived from income and leisure, he will also incur a utility loss due to
welfare stigma (Moffitt, 1983). However, if an individual self-selects into the general income
taxation, there is no extra utility loss, because it is generally believed that people do not attach
stigma to general income taxation, whether it is positive or negative1. Intuitively, other things
being equal, individuals with high welfare stigma tend to self-select into the general income
taxation, while individuals with low welfare stigma tend to self-select into the welfare program.
The government is assumed to choose both an optimal income tax schedule and an optimal
welfare program so as to maximize a social welfare function subject to its own budget constraint
and individuals’ behavioral response to the income tax schedule and the parameters of the
welfare program. My objective is to answer the questions raised above within this framework. As
far as I know, this is the first attempt to study optimal income taxation and an optimal welfare
program within a unified framework while taking welfare stigma into account.
I use two approaches to tackle this problem. First, I simplify the general model to do
theoretical analysis. This simpler model expands the model in Akerlof’s “tagging” paper (1978).
Although this model is simple, the key insight and the framework of the general model are
carried over so that the conclusions should hold in a richer economic context. Second, I specify
the general model with benchmark parameters and then do numerical simulations, following the
approach of Fair (1971). The benchmark parameters are chosen according to their normal values.
Theoretical analysis shows that it can be optimal for the government to offer both general
income taxation and a welfare program. The more intensely people suffer from welfare stigma,
the higher the welfare benefit should be.
Numerical simulations show that it is optimal for the government to offer both a negative
income tax schedule and a welfare program, which verifies the theoretical conclusion that it can
be optimal for the government to provide both general income taxation and a welfare program. In
addition, my simulation results imply that the actual welfare program might be less generous
than the optimal welfare program.
1 This is a common assumption in the literature on optimal income taxation. It is implicit in Mirrlees and Fair’s modeling (Mirrlees, 1971; Fair, 1971).
The organization of this paper is as follows. Section II reviews the relevant literature. The
literature review serves as the background for my research. Section III presents the general
model. Section IV builds a simpler version of the general model and presents the theoretical
analysis. In Section V, I first describe the algorithm to solve the general model numerically.
Then I specify the parameters of the general model according to their normal values. Finally, I
report my simulation results. Section VI concludes.
II. LITERATURE REVIEW
Mirrlees (1971) and Fair (1971) broke ground with their research on optimal income
taxation. In their models, the government is assumed to have only one instrument at hand,
namely, general income taxation. Individuals are assumed to be heterogeneous only along one
dimension, i.e., wages (skill). Each individual is assumed to take the income tax schedule as
given and choose his labor supply to maximize his own utility. The government is assumed to
choose the optimal income tax schedule by maximizing a social welfare function subject to its
own revenue budget constraint and individuals’ behavioral response to the chosen tax schedule.
The basic trade-off captured in their models is between equity and efficiency. Intuitively, due to
progressive income taxation, the government can redistribute income from rich people to poor
people. The redistribution of income can improve distributional equity because poor people have
higher marginal utility than rich people. However, labor supply is elastic. The progressive
income taxation will change people’s labor supply and thus result in deadweight loss (efficiency
cost). Because Mirrlees (1971) assumed a continuum of unbounded skills in his model, it is very
difficult to reach general conclusions. In contrast, Fair (1971) assumed a limited discrete number
of individuals in his model, thus laying a good foundation for his numerical simulations. His
simulations show that the average tax rate should increase with income (skills). In other words,
the optimal tax schedule should be progressive. Therefore, Fair (1971) goes beyond Mirrlees
(1971). In terms of methodology, this paper is indebted to Fair.
Since then, the theory of optimal income taxation based on the original Mirrlees-Fair
framework has been considerably developed. For example, Sadka (1976) and Seade (1977)
showed that if the skill distribution is bounded above, the marginal tax rate should be zero at the
top. This conclusion intuitively makes sense: If the marginal tax rate that applies to the top
earner is changed from a positive number to zero, the top earner will work more so that his
welfare will increase. Meanwhile, tax revenue is not reduced. Thus, the change will result in a
Pareto improvement. Symmetrically, Seade (1977) showed that if everybody works and labor
supply is bounded away from zero, then the bottom marginal tax rate should also be zero. Diamond
(1998) investigated a special case of the Mirrlees-Fair general optimal income taxation model
with quasi-linear utility preferences. Quasi-linear preferences imply that there is no income
effect. Diamond showed that in this case the optimal marginal tax rates follow a U-shaped
pattern. Akerlof (1978) showed that redistribution should not only be based on income, but also
on other observable characteristics such as age, female head of household, etc., that are
correlated with skills. The intuition is that by “tagging” the government can focus limited
resources on the “tagged” poor so that the total social welfare can be increased. Saez (2002)
focused his attention on optimal income transfers for low-income people. Unlike Mirrlees who
assumed that labor supply is only along the intensive margin, Saez modeled labor supply
responses of low-income people along two dimensions: (1) the extensive margin (whether to
supply labor or not); and (2) the intensive margin (intensity of work after deciding to supply
labor). His analysis shows that the nature of labor supply response is very important for the
optimal tax schedule: (1) if the extensive margin dominates, then the optimal transfer program is
characterized by a low guaranteed income and subsidies for wage income, similar to the Earned
Income Tax Credit (EITC); and (2) if the intensive margin dominates, then the optimal transfer
program is characterized by a high guaranteed income and high phase-out tax rate, similar to a
classical Negative Income Tax (NIT).
Especially interesting to me, several recent papers attempt to introduce psychological
concepts into the study of optimal income taxation. For example, Ireland (1998, 2001)
introduced status-seeking into the study of optimal income taxation. Status-seeking means that
an individual cares about the spectators’ view of his utility. If an individual cares about the
spectators’ view of his utility, he needs to send out signals to reveal his status. In order to send
out signals, people are driven to over-consume some positional or visible goods. An extreme
case is to “burn money” in public. In order to meet these extra expenditure needs due to
status-seeking, people’s labor supply decisions are distorted, i.e., they have to over-supply labor and
under-consume leisure compared with the no-status-seeking equilibrium. As income taxes act as
a disincentive to supply labor and thus can be used to offset the distortion to labor supply caused
by status-seeking, status-seeking adds one more reason to tax income and implies that the
optimal tax schedule should be steeper and thus redistribution will be increased. Corneo (2002)
studied optimal income taxation when people care about their relative rank in the distribution of
income. If people care about their relative rank in the distribution of income, they have an extra
incentive to climb the “income ladder” and thus contribute to an over-supply of labor. In other
words, people’s caring about their relative rank in the distribution of income distorts their labor
supply decision. Meanwhile, when an individual climbs the “income ladder”, he exerts a
negative externality on some other people because those people’s relative ranks have worsened
and thus those people’s utilities are reduced due to his climbing. Therefore, an income tax may
improve efficiency for the same reasons that a Pigouvian tax does. In other words, if people
care about their relative rank in the distribution of income, introducing a progressive income tax
can result in a Pareto improvement: On the one hand, the negative externality can be corrected. On
the other hand, the distorted labor supply can be offset.
But so far no research on optimal income taxation has taken into account the stigma
factor (Moffitt, 1983). In order to account for the low take-up rate of welfare programs2, Moffitt
proposed the idea of stigma, “disutility arising from participation in a welfare program per se.”3
He said that there might be both a “flat” stigma that arises from the participation in welfare
programs itself and a “variable” stigma that varies with the size of the benefit. The “flat” stigma
and the “variable” stigma have different implications for people’s participation decisions. If only
a “flat” stigma exists, then an individual will participate in welfare programs if the utility gain
from the welfare benefit is greater than the utility loss due to the “flat” stigma. If only a
“variable” stigma exists, then the individual will only participate if his utility increases from the
welfare benefit. However, Moffitt’s empirical estimation shows that stigma appears to arise
mainly from the flat component, i.e., a fixed cost associated with participation in the program.
2 According to Moffitt (1983), the participation rate in the Aid to Families with Dependent Children (AFDC) was estimated to be only about 69 percent in 1970 and the participation rate in the Food Stamp Program was only 38 percent.
3 It is interesting to note that the concept of “stigma” can also be described using the terminology of “identity” (Akerlof and Kranton, 2000, 2002, and 2005): (1) people are not identified with welfare programs; (2) the ideal behavior associated with this nonidentification is nonparticipation in welfare programs; and (3) if people participate in welfare programs, they will lose an extra utility because their actual behavior deviates from the ideal behavior.
On the other hand, it is generally believed that people do not attach stigma to general
income taxation, whether it is positive or negative.
Clearly, Moffitt’s work shows that stigma plays an important role in people’s utility
functions and thus distorts people’s labor supply decisions and welfare. Because stigma plays an
important role in people’s utility functions and their labor supply decisions, it must also have
important implications for the optimal income tax schedule.
The above literature review motivates me to integrate the theory of welfare stigma and
the theory of optimal income taxation. Optimal income taxation and an optimal welfare program
are thus studied within a unified framework. The following questions are answered within this
unified framework: If we introduce stigma into optimal income taxation, how will it change the
optimal income tax schedule? Given reasonable benchmark parameters, what do the optimal
income tax schedule and the optimal welfare program look like? Should there be a negative
income tax? How negative should it be? Should there be a generous welfare program? How
generous should it be? Does US tax policy actually differ from the optimum with benchmark
parameters? Clearly, the answers to these questions have important implications for public
policy.
III. THE GENERAL MODEL
In my model, the government is assumed to have two policy instruments at hand: (1)
general income taxation and (2) a welfare program. This assumption is different from that of the
traditional literature on optimal income taxation, which assumes that the government has only
general income taxation as its policy instrument. The general income taxation is assumed to be a
linear income tax schedule and is characterized by two parameters $(k, t)$, where the parameter $k$
is a lump-sum tax and the parameter $t$ is a constant marginal tax rate. The welfare program is
characterized by a single parameter: the welfare benefit $b$. If an individual enrolls in the welfare
program, he will get welfare benefit $b$. Thus, the government chooses three parameters: $k$, $t$,
and $b$.
An individual $i$ is characterized by two parameters: (1) his wages $w_i$ and (2) his welfare
stigma $\phi_i$. His welfare stigma is characterized by a single parameter $\phi_i$. This is consistent with
Moffitt’s empirical estimation, which shows that welfare stigma appears to arise mainly from the
flat component, i.e., a fixed cost associated with participation in the welfare program. $\{w_i\}$ and
$\{\phi_i\}$ are generated from two random distributions. In other words, I assume that individuals are
heterogeneous along two dimensions: (1) wages and (2) welfare stigma. This assumption is also
different from that of the traditional literature on optimal income taxation, which assumes that
individuals are heterogeneous only along one dimension, i.e., wages. I assume that the size of the
population is $N$.
An individual who is characterized by $(w_i, \phi_i)$ is assumed to take the linear income tax
schedule $(k, t)$ and the welfare program parameter $b$ as given and make two decisions to
maximize his utility. First, he must decide whether to self-select into the welfare program or the
general income taxation. If he decides to enroll in the welfare program, he does not supply any
labor and his only income is the welfare benefit $b$. Thus, his utility is given by
$$U_i = f(b, L) - \phi_i,$$
where $L$ is his endowment of time. Note that here I take Moffitt’s empirical estimation into
account. His estimation shows that welfare stigma is a fixed cost associated with participation in
the welfare program. Thus I subtract $\phi_i$ from the individual’s utility $f(b, L)$ generated from
income $b$ and leisure $L$. His leisure is his endowment of time $L$ because his labor supply is
zero if he enrolls in the welfare program. On the other hand, if the individual decides to
self-select into the general income taxation, he must make a second decision: how much labor to
supply. Conditional on self-selecting into the general income taxation, he chooses his optimal
labor supply by solving the following optimization problem:
$$\max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),$$
where $L_i$ is his labor supply, $(1-t) w_i L_i - k$ is his after-tax income, and $L - L_i$ is his leisure.
Following the tradition, I assume individuals do not attach stigma to general income taxation,
whether it is negative or positive. Taken together, an individual with parameters $(w_i, \phi_i)$ is
assumed to take $(k, t, b)$ as given and solve the following optimization problem to maximize his
utility:
$$\max\Big\{ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),\; f(b, L) - \phi_i \Big\}.$$
For convenience, I denote
$$U_i^{\max} = \max\Big\{ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),\; f(b, L) - \phi_i \Big\}.$$
The government is assumed to choose both an optimal income tax schedule and an
optimal welfare program, i.e., an optimal set of parameters $(k, t, b)$, so as to maximize a social
welfare function $SW(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max})$ subject to its own budget constraint and
individuals’ behavioral response to the income tax schedule and the welfare program parameters.
I assume that the government must balance its budget. The government’s budget constraint can
thus be written as
$$\sum_{i=1}^{N} 1(i: \text{GeneralIncomeTaxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} 1(i: \text{Welfare}) \cdot b,$$
where $1(i: \text{GeneralIncomeTaxation})$ is an indicator function that is equal to 1 if individual $i$ chooses
general income taxation and to 0 if individual $i$ chooses welfare, and $1(i: \text{Welfare})$ is also an
indicator function that is equal to 1 if individual $i$ chooses welfare and to 0 if individual $i$
chooses general income taxation. $\sum_{i=1}^{N} 1(i: \text{GeneralIncomeTaxation}) \cdot (t w_i L_i + k)$ is the
government’s net tax income, and $\sum_{i=1}^{N} 1(i: \text{Welfare}) \cdot b$ is the government’s total welfare expense.
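In summary, the government needs to find the optimal set of parameters $(k, t, b)$ by
solving the following optimization problem:
$$\begin{aligned}
\max_{\{k, t, b\}} \quad & SW\big(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\big) \\
\text{s.t.} \quad (A) \quad & U_i^{\max} = \max\Big\{ \max_{\{L_i\}} f\big((1-t) w_i L_i - k,\; L - L_i\big),\; f(b, L) - \phi_i \Big\}, \quad i = 1, 2, \ldots, N, \\
(B) \quad & \sum_{i=1}^{N} 1(i: \text{GeneralIncomeTaxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} 1(i: \text{Welfare}) \cdot b,
\end{aligned}$$
where constraint (A) represents individuals’ behavioral response to the income tax schedule and
the welfare program parameters, and constraint (B) is the government’s budget constraint. $\{w_i\}$
and $\{\phi_i\}$ are generated from two random distributions.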
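The individual's choice in constraint (A) and the budget check in constraint (B) can be sketched in a few lines of code. This is only a minimal illustration under stated assumptions: the log utility form for $f$, the grid search over labor supply, the tie-breaking rule, and all parameter values are hypothetical choices made for the example and are not taken from the model specification.

```python
import math


def f(income, leisure):
    # Assumed functional form (not from the paper): log utility in income and leisure.
    if income <= 0 or leisure <= 0:
        return float("-inf")
    return math.log(income) + math.log(leisure)


def best_response(w, phi, k, t, b, L=1.0, grid=1000):
    """Return (choice, labor, utility) for an individual with wage w and stigma phi."""
    work_u, work_labor = float("-inf"), 0.0
    for j in range(grid + 1):  # grid search over labor supply in [0, L]
        labor = L * j / grid
        util = f((1 - t) * w * labor - k, L - labor)
        if util > work_u:
            work_u, work_labor = util, labor
    welfare_u = f(b, L) - phi  # flat stigma cost of being on welfare
    # Ties go to general income taxation (an assumption made for this sketch).
    if welfare_u > work_u:
        return "welfare", 0.0, welfare_u
    return "tax", work_labor, work_u


def budget_residual(people, k, t, b, L=1.0):
    """Net tax income minus welfare expense; zero means constraint (B) holds."""
    revenue, expense = 0.0, 0.0
    for w, phi in people:
        choice, labor, _ = best_response(w, phi, k, t, b, L)
        if choice == "tax":
            revenue += t * w * labor + k
        else:
            expense += b
    return revenue - expense


# Hypothetical two-person economy: same wage, different stigma.
people = [(1.0, 0.0), (1.0, 1.0)]
k, t, b = -0.1, 0.3, 0.3
print([best_response(w, phi, k, t, b)[0] for w, phi in people])
print(budget_residual(people, k, t, b))
```

In this example the individual without stigma selects the welfare program and the individual with high stigma selects general income taxation, which matches the self-selection pattern described above. A nonzero residual indicates that the chosen $(k, t, b)$ does not balance the budget.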
IV. THEORETICAL ANALYSIS
In Section III, I presented the general model. However, Stern (1976) suggests that it
is very difficult to solve an optimal income taxation model explicitly even under a
linear tax schedule4. Like Stern, I also assume a linear tax schedule in the general model. In
principle, my model is more complicated than Stern’s model because I assume that the
government has two instruments instead of one and that individuals are heterogeneous
along two dimensions instead of one. Therefore, in order to make the problem tractable
so that theoretical analysis is feasible, I simplify the general model in this section. This
simplified model is a variant of the model in Akerlof’s “tagging” paper (1978). In his model,
Akerlof assumed that there are two types of labor: skilled labor and unskilled labor. Skilled labor
can take both difficult and easy jobs, whereas unskilled labor can only take easy jobs. With only
two types of labor, Akerlof could derive explicit solutions. In addition, although there are only
two types of labor, the key insight and the general framework of Mirrlees-Fair are carried over so
that his conclusions hold in a richer economic context. This model therefore is a useful starting
point.
4 Stern assumed a linear tax schedule in his optimal income taxation model. However, he did not do any theoretical analysis. Instead, he resorted to a numerical method to solve his model.
I expand Akerlof’s model as follows. First, I assume one-half of the population is made
up of skilled people and the other one-half of the population is made up of unskilled people. I
assume all the skilled people have welfare stigma $\phi$. However, I assume a fraction $(1-\alpha)$ of the unskilled
people have welfare stigma $\phi$ while a fraction $\alpha$ of the unskilled people have no welfare stigma at all. In
summary, I assume there are three types of people: (1) skilled people with stigma ($\frac{1}{2}$ of the
whole population); (2) unskilled people with stigma ($\frac{1-\alpha}{2}$ of the whole population); and (3)
unskilled people without stigma ($\frac{\alpha}{2}$ of the whole population).
Skilled people have three options: (1) take a difficult job; (2) take an easy job; or (3) be
on welfare. These people’s utility function is given by
$\max\{u(q_D - t_D) - \delta_D,\; u(q_E + t_E) - \delta_E,\; u(b) - \phi\}$. I assume $u(0) = 0$, $u'(\cdot) > 0$, and $u''(\cdot) < 0$. If a
skilled person takes a difficult job, his output is $q_D$ and his disutility of doing the job is $\delta_D$. As
the economy is assumed to be competitive, his pre-tax income is also $q_D$. His utility depends on
his after-tax income and his disutility of doing the difficult job and is given by $u(q_D - t_D) - \delta_D$. I
assume $u(q_D) - \delta_D > 0$. I assume if an individual takes an easy job, whether he is a skilled
person or an unskilled person, his output is $q_E$ and thus his pre-tax income is $q_E$. His disutility
of doing the easy job is $\delta_E$ whether he is skilled or unskilled. Thus his pre-tax utility is given by
$u(q_E) - \delta_E$ and his after-tax utility is given by $u(q_E + t_E) - \delta_E$. I assume $u(q_E) - \delta_E < 0$. Finally,
if a skilled person chooses to be on welfare, his income is the welfare benefit $b$ and he does not
supply any labor. However, because he attaches stigma $\phi$ to being on welfare by assumption, his
utility is given by $u(b) - \phi$. I assume if a skilled person is indifferent between taking a difficult
job, taking an easy job, and being on welfare, then he chooses to take the difficult job. I assume
if a skilled person is indifferent between taking an easy job and being on welfare, then he
chooses to take the easy job.
Unskilled people with stigma have two options: (1) take an easy job or (2) be on welfare.
These people’s utility function is given by $\max\{u(q_E + t_E) - \delta_E,\; u(b) - \phi\}$. If an unskilled person
with stigma takes an easy job, his utility is given by $u(q_E + t_E) - \delta_E$. If an unskilled person with
stigma chooses to be on welfare, his utility is given by $u(b) - \phi$. I assume if an unskilled
individual with stigma is indifferent between taking an easy job and being on welfare, then he
chooses to take the easy job. In this model, I follow Akerlof (1978) and assume that unskilled
people cannot work at difficult jobs. An alternative but equivalent assumption is that they can
work at difficult jobs but that the disutility of doing so is so high that in equilibrium they will not
choose to.
Unskilled people without stigma have two options: (1) take an easy job or (2) be on
welfare. These people’s utility function is given by $\max\{u(q_E + t_E) - \delta_E,\; u(b)\}$. If an unskilled
person without stigma takes an easy job, his utility is given by $u(q_E + t_E) - \delta_E$. If an unskilled
person without stigma chooses to be on welfare, his utility is given by $u(b)$ because he does not
attach stigma to being on welfare. I assume if an unskilled individual without stigma is
indifferent between taking an easy job and being on welfare, then he chooses to be on welfare.
The social welfare function is given by5:
$$SW = \min\Big\{ \max\{u(q_D - t_D) - \delta_D,\; u(q_E + t_E) - \delta_E,\; u(b) - \phi\},\; \max\{u(q_E + t_E) - \delta_E,\; u(b) - \phi\},\; \max\{u(q_E + t_E) - \delta_E,\; u(b)\} \Big\}.$$
Each individual is assumed to take $(t_D, t_E, b)$ as given and make optimal labor supply
decisions to maximize his utility. The government is assumed to choose $(t_D, t_E, b)$ to maximize
$SW$ subject to its own revenue budget constraint and individuals’ behavioral response to
$(t_D, t_E, b)$.
Throughout the theoretical analysis, I assume that under equilibrium it is optimal for an
individual to stay in the “system”. In other words, I assume $u(q_D - t_D^*) - \delta_D > 0$,
$u(q_E + t_E^*) - \delta_E > 0$, and $u(b^*) - \phi > 0$. Zero is the utility outside the “system”.
Because $u(q_D) - \delta_D > 0$ and $u(q_E) - \delta_E < 0$ by assumption, there are two possible cases
for this problem: (1) $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$; and (2)
$u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$.
5 The standard social welfare function assumed in the optimal income taxation literature is a “sum” weighted by population. The “min” social welfare function is chosen so that it is feasible to derive the optimal solutions to the model. One justification for the “min” social welfare function is that the intuition underlying the key conclusions turns out not to depend on this specific assumption.
Proposition 1: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to one, then the
equilibrium is a separating equilibrium, and the optimal conditions are given by:
(1) $u(q_E + t_E^*) - \delta_E = u(b^*)$,
(2) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and
(3) $t_D^* = (1-\alpha) t_E^* + \alpha b^*$.
Proof: See Appendix 1.
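As a reading aid (not part of the proof in Appendix 1), condition (3) can be interpreted as the government's balanced budget under the separating equilibrium. The sketch below assumes that skilled people ($\frac{1}{2}$ of the population) take difficult jobs and pay $t_D^*$, unskilled people with stigma ($\frac{1-\alpha}{2}$) take easy jobs and receive $t_E^*$, and unskilled people without stigma ($\frac{\alpha}{2}$) are on welfare and receive $b^*$:
$$\frac{1}{2}\, t_D^* = \frac{1-\alpha}{2}\, t_E^* + \frac{\alpha}{2}\, b^* \quad\Longrightarrow\quad t_D^* = (1-\alpha)\, t_E^* + \alpha\, b^*.$$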
Proposition 2: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to zero, then the
equilibrium is either a separating equilibrium or a pooling equilibrium, and the optimal
conditions are given either, for the separating equilibrium, by:
(4) $u(q_E + t_E^*) - \delta_E = u(b^*)$,
(5) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and
(6) $t_D^* = (1-\alpha) t_E^* + \alpha b^*$,
or, for the pooling equilibrium, by:
(7) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,
(8) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and
(9) $t_D^* = b^*$.
Proof: See Appendix 1.
Proposition 3: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then the
equilibrium is a separating equilibrium, and the optimal conditions are given by:
(10) $u(q_E + t_E^*) - \delta_E = u(b^*)$,
(11) $u(q_D - t_D^*) - \delta_D = u(q_E + t_E^*) - \delta_E$, and
(12) $t_D^* = (1-\alpha) t_E^* + \alpha b^*$.
Proof: See Appendix 1.
Proposition 4: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, then the
equilibrium is a pooling equilibrium, and the optimal conditions are given by:
(13) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,
(14) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and
(15) $t_D^* = b^*$.
Proof: See Appendix 1.
Proposition 5: For both cases, whether α is close to one or close to zero, it can be optimal for
the government to provide both general income taxation and a welfare program.
Proof: Proposition 5 is obvious from Propositions 1, 2, 3, and 4.
Proposition 5 intuitively makes sense. As the government has two concerns (namely,
income redistribution and welfare stigma) to address, the government should use two
instruments.
Proposition 6: For both cases, if $\alpha$ is close to zero, then the equilibrium is likely to be a pooling
equilibrium so that all the unskilled people are on welfare, and the optimal conditions are given
by:
(16) $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$,
(17) $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$, and
(18) $t_D^* = b^*$.
Proof: Proposition 6 is obvious from Propositions 2 and 4.
Proposition 6 also intuitively makes sense. As α decreases, the unskilled people’s
capability of “producing” utility is decreasing on average, which implies that the burden on the
skilled people will increase. Therefore, conditional on a separating equilibrium, as α decreases,
$t_D^*$ will increase. When $\alpha$ decreases enough, $t_D^*$ will increase enough so that skilled people will
switch from difficult jobs to easy jobs. In order to prevent skilled people from switching from
difficult jobs to easy jobs, when $\alpha$ is small enough, the optimal condition is changed from
$u(q_D - t_D^*) - \delta_D = u(b^*)$ under a separating equilibrium to $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$ under a
pooling equilibrium so that the government can extract more from the skilled people while still
keeping the skilled people in difficult jobs.
Proposition 7: Conditional on a pooling equilibrium so that all the unskilled people are on
welfare, $\frac{db^*}{d\phi} > 0$.
Proof: If the equilibrium is a pooling equilibrium so that all the unskilled people are on welfare,
then the optimal conditions are given by $u(q_E + t_E^*) - \delta_E < u(b^*) - \phi$, $u(q_D - t_D^*) - \delta_D = u(b^*) - \phi$,
and $t_D^* = b^*$. The optimal conditions imply that $u(q_D - b^*) - \delta_D = u(b^*) - \phi$, which further
implies that $\frac{db^*}{d\phi} > 0$.
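The last step of the proof can be spelled out by implicit differentiation. The following is a sketch that uses only the assumption $u'(\cdot) > 0$ stated earlier. Differentiating $u(q_D - b^*) - \delta_D = u(b^*) - \phi$ with respect to $\phi$, holding $q_D$ and $\delta_D$ fixed, gives
$$-u'(q_D - b^*)\,\frac{db^*}{d\phi} = u'(b^*)\,\frac{db^*}{d\phi} - 1 \quad\Longrightarrow\quad \frac{db^*}{d\phi} = \frac{1}{u'(q_D - b^*) + u'(b^*)} > 0.$$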
In summary, Proposition 5 says that it can be optimal for the government to offer both
general income taxation and a welfare program. Proposition 6 says that if most of the unskilled
people suffer from welfare stigma (i.e., if α is close to zero), then it is likely to be optimal for
the government to put all the unskilled people on the welfare program (i.e., the equilibrium is
likely to be a pooling equilibrium so that all the unskilled people are on welfare). In addition,
conditional on a pooling equilibrium so that all the unskilled people are on welfare, Proposition 7
says that the more intensely people suffer from welfare stigma (i.e., the higher $\phi$), the higher the
welfare benefit should be (i.e., the higher $b^*$).
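To illustrate Proposition 7 numerically, the following minimal sketch solves the pooling condition $u(q_D - b^*) - \delta_D = u(b^*) - \phi$ for $b^*$ by bisection at two stigma levels. The utility function $u(c) = \sqrt{c}$ and the values of $q_D$, $\delta_D$, and $\phi$ are illustrative assumptions only; they are not the benchmark parameters used in the simulations.

```python
import math


def u(c):
    # Assumed utility with u(0) = 0, u' > 0, u'' < 0 (illustrative choice).
    return math.sqrt(c)


def pooling_benefit(q_d, delta_d, phi, tol=1e-10):
    """Solve u(q_d - b) - delta_d = u(b) - phi for b on [0, q_d] by bisection."""

    def gap(b):
        # Skilled worker's utility in the difficult job minus utility on welfare,
        # with the pooling budget condition t_D = b already substituted in.
        return (u(q_d - b) - delta_d) - (u(b) - phi)

    lo, hi = 0.0, q_d
    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("no solution on [0, q_d] for these parameters")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:  # gap is decreasing in b, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values chosen only for illustration.
Q_D, DELTA_D = 4.0, 0.5
for phi in (0.2, 0.4):
    b_star = pooling_benefit(Q_D, DELTA_D, phi)
    print(f"phi = {phi:.1f}  ->  b* = {b_star:.4f}")
```

Under these assumed values the computed benefit is larger at the higher stigma level, in line with the sign derived in Proposition 7.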
V. SPECIFICATION OF THE GENERAL MODEL AND NUMERICAL SIMULATIONS
The theoretical analysis of the simpler model in Section IV shows that it can be optimal
for the government to provide both general income taxation and a welfare program. I expect this
conclusion will carry over to the simulation analysis of the more complex general model.
However, the theoretical analysis cannot answer the following important questions: (1) Given
reasonable benchmark parameters, what do the optimal income tax schedule and the optimal
welfare program look like? (2) Should there be a negative income tax? (3) How negative should
it be? (4) Should there be a generous welfare program? (5) How generous should it be? (6)
Does US tax policy actually differ from the optimum with benchmark parameters?
Answering these questions is very important for public policy. In this section, I answer these
questions by doing numerical simulations.
Recall that the government needs to solve the following optimization problem to find both the optimal income tax schedule and the optimal welfare program, i.e., the optimal set of parameters $(k, t, b)$:

$$\max_{k,\,t,\,b} \; SW\left(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}\right)$$

s.t.

(A) $U_i^{\max} = \max\left\{ \max_{L_i} f\left((1-t) w_i L_i - k,\; \bar{L} - L_i\right),\; f\left(b, \bar{L}\right) - \phi_i \right\}$, $i = 1, 2, \ldots, N$

(B) $\sum_{i=1}^{N} \mathbf{1}(i: \text{General Income Taxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} \mathbf{1}(i: \text{Welfare}) \cdot b$

where constraint (A) represents individuals' behavioral response to the income tax schedule and the welfare program parameters, and constraint (B) is the government's budget constraint. $\{w_i\}$ and $\{\phi_i\}$ are generated from two random distributions.
In this section, I first describe the algorithm to solve the general model numerically. Then
I specify the general model with reasonable benchmark parameters according to their normal
values. Finally, I report my simulation results.
V.A. The Algorithm
The algorithm to solve the general model is as follows.
Step 1: Choose a set of parameters $(k, t, b)$.
Step 2: Taking the chosen set of parameters $(k, t, b)$ as given, I solve constraint (A) to get each individual's optimal labor supply: (1) whether to self-select into the general income taxation or into the welfare program; and (2) conditional on self-selecting into the general income taxation, how much labor to supply. I also calculate each individual's optimal utility taking $(k, t, b)$ as given.
Step 3: With the chosen set of parameters $(k, t, b)$ and each individual's labor supply, I can check whether constraint (B) is satisfied or not. If constraint (B) is satisfied, then the chosen set of parameters $(k, t, b)$ is feasible and I keep it. I also keep each individual's optimal labor supply and optimal utility. If constraint (B) is not satisfied, then the chosen set of parameters $(k, t, b)$ is infeasible and I discard it.
Step 4: By three-dimensional grid search, I iterate Step 1 through Step 3 to find all the feasible sets of parameters.
Step 5: For each feasible set of parameters, I calculate the government's objective value, using each individual's optimal utility calculated in Step 2.
Step 6: By comparing the objective values, I can find the set of parameters $(k, t, b)$ that generates the largest objective value. This set of parameters $(k, t, b)$ is the solution to the government's optimization problem.
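Steps 1 through 6 can be sketched in Python as follows. This is a minimal illustration, not the code used for the reported results. It borrows the benchmark CES utility of Section V.B (elasticity 0.5, gamma = 0.65, time endowment 168), and several choices are assumptions of the sketch: the closed-form labor supply is derived from the first-order condition, constraint (B) is treated as "tax revenue covers welfare outlays" because an exact equality rarely holds on a grid, the product-form social welfare is compared through its logarithm, wages and stigma are drawn independently with approximate lognormal parameters, and the population and grid are deliberately tiny.

```python
import math
import random

GAMMA, L_BAR = 0.65, 168.0   # benchmark values from Section V.B (elasticity fixed at 0.5)

def ces(income, leisure):
    """CES utility with elasticity 0.5; zero if either argument is non-positive."""
    if income <= 0 or leisure <= 0:
        return 0.0
    return 1.0 / (GAMMA / income + (1.0 - GAMMA) / leisure)

def best_response(w, phi, k, t, b):
    """Step 2: return (on_welfare, labor, utility) for one individual."""
    net_w = (1.0 - t) * w
    a = math.sqrt((1.0 - GAMMA) / (GAMMA * net_w))   # leisure/income ratio at an interior optimum
    labor = min(max((L_BAR + a * k) / (1.0 + a * net_w), 0.0), L_BAR)
    u_work = ces(net_w * labor - k, L_BAR - labor)
    u_welfare = ces(b, L_BAR) - phi                  # stigma is a utility loss
    if u_welfare > u_work:
        return True, 0.0, u_welfare
    return False, labor, u_work

def evaluate(pop, k, t, b):
    """Steps 2, 3 and 5: log social welfare, or None if (k, t, b) is discarded."""
    revenue = outlays = log_sw = 0.0
    for w, phi in pop:
        on_welfare, labor, u = best_response(w, phi, k, t, b)
        if u <= 0:
            return None                              # log of the product needs positive utilities
        if on_welfare:
            outlays += b
        else:
            revenue += t * w * labor + k
        log_sw += math.log(u)
    return log_sw if revenue >= outlays else None    # assumed reading of constraint (B)

def grid_search(pop, ks, ts, bs):
    """Steps 1, 4 and 6: best feasible (log_sw, k, t, b) on the grid."""
    best = None
    for k in ks:
        for t in ts:
            for b in bs:
                val = evaluate(pop, k, t, b)
                if val is not None and (best is None or val > best[0]):
                    best = (val, k, t, b)
    return best

rng = random.Random(0)                               # arbitrary seed
pop = [(rng.lognormvariate(2.32, 0.99), rng.lognormvariate(4.16, 0.14)) for _ in range(50)]
best = grid_search(pop, range(-200, 1, 50), [0.1, 0.2, 0.3, 0.4, 0.5], range(0, 401, 100))
```

A finer grid and the full population of 500 only change the arguments passed to `grid_search`; the cost grows with the product of the three grid sizes and the population size.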
V.B. Specification of the General Model
I need to specify the general model from eight aspects: (1) the size of the population $N$; (2) the endowment of time $\bar{L}$; (3) the distribution of wages $\{w_i\}$; (4) the form of the individual utility function $f(Y_i, \bar{L} - L_i)$; (5) the form of the social welfare function $SW(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max})$; (6) the distribution of welfare stigma $\{\phi_i\}$; (7) the correlation coefficient between wages and welfare stigma; and (8) the government's budget constraint.
The Size of the Population
Fair (1971), in his simulation, essentially used 50 individuals to approximate a random distribution. Since then, computing power has increased greatly. I therefore use 500 individuals, which should be sufficient to approximate the lognormal distribution of wages $\{w_i\}$ and the lognormal distribution of welfare stigma $\{\phi_i\}$. Thus I take $N = 500$ in my numerical simulations.
The Endowment of Time
I consider a weekly time endowment. The time endowment $\bar{L}$ is therefore equal to
24*7 = 168 hours, and I take $\bar{L} = 168$ in my numerical simulations.
The Distribution of Wages
According to the 2003 Current Population Survey (CPS), mean weekly work time is around 39 hours. The mean annual wages or salary earned is $33,612. The standard deviation of annual wages or salary earned is $43,636. If there are 52 weeks in one year, then the mean weekly wages or salary is $33,612 / 52 ≈ $646 and the standard deviation of the weekly wages or salary is $43,636 / 52 ≈ $839. The mean hourly wages or salary is $33,612 / (52 * 39) ≈ $16.6 and the standard deviation of the hourly wages or salary is $43,636 / (52 * 39) ≈ $21.5.
Thus, I assume {w_i} follows a lognormal distribution with mean 16.6 and standard deviation 21.5 in my numerical simulations.
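The wage arithmetic above, and one way to draw from a lognormal distribution with a given mean and standard deviation, can be sketched as follows. The text does not say how the lognormal is parameterized; the sketch assumes that 16.6 and 21.5 are the mean and standard deviation of wages in levels and converts them to the parameters of the underlying normal with the standard moment formulas. The function name and the seed are illustrative.

```python
import math
import random

def lognormal_params(mean, sd):
    """Return (mu, sigma) of ln(X) for a lognormal X with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)

mean_hourly = 33612 / (52 * 39)   # mean annual earnings spread over 52 weeks of 39 hours
sd_hourly = 43636 / (52 * 39)

mu_w, sigma_w = lognormal_params(16.6, 21.5)

rng = random.Random(0)            # fixed seed, chosen arbitrarily
wages = [rng.lognormvariate(mu_w, sigma_w) for _ in range(500)]
```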
The Form of the Individual Utility Function
Stern (1976) specified a Constant Elasticity of Substitution (CES) utility function in his simulation. That is, he assumed

$$f(Y_i, \bar{L} - L_i) = \left[ \gamma Y_i^{\frac{\varepsilon - 1}{\varepsilon}} + (1 - \gamma) (\bar{L} - L_i)^{\frac{\varepsilon - 1}{\varepsilon}} \right]^{\frac{\varepsilon}{\varepsilon - 1}},$$

where $Y_i$ is individual $i$'s income and $\bar{L} - L_i$ is his leisure. Auerbach et al. (1983) also specified a CES utility function in their simulation. Although Fair (1971) specified a Cobb-Douglas utility function in his simulation, a Cobb-Douglas utility function is a special case of a CES utility function with $\varepsilon = 1$. Adopting the convention, I also specify a CES utility function in my simulation. One advantage of specifying a CES utility function is that I can explicitly solve each individual's optimal labor supply.
I need to further specify $\varepsilon$ and $\gamma$. The parameter $\varepsilon$ is individuals' elasticity of substitution between income and leisure. The calculations by Stern (1976) show that $\varepsilon$ is around 0.5. His most favored specification is $\varepsilon = 0.4$. Based on the work by Ghez and Becker (1975), Heckman (1974), Rosen (1976), MaCurdy (1981) and Hausman (1981), Auerbach et al. (1983) specified $\varepsilon = 0.8$ in their simulation. Because Fair (1971) specified a Cobb-Douglas utility function, he essentially specified $\varepsilon = 1$ in his simulation. As a compromise, I specify $\varepsilon = 0.5$ in my benchmark simulation.
The parameter γ measures how much individuals value income relative to leisure. It depends on the choice of labor units. Because I take the time endowment to be 168 hours per week in my simulations, and because the average labor supply is about 40 hours per week, this implies that people spend about 25 percent of their time endowment working (40/168 ≈ 0.25). In my benchmark simulation, the parameter γ is set so that in the absence of government intervention the individual with the mean wages ($16.6 per hour) would work for 0.25 of the time endowment in the case of ε = 0.5. This suggests a value of γ ≈ 0.65. Thus, I specify γ = 0.65 in my benchmark simulation.
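The calibration of gamma can be checked with a short sketch. It assumes the CES form with income equal to wage times hours when there is no government. Under that assumption the first-order condition gives a leisure-to-labor ratio of ((1 - gamma) / gamma)^eps * wage^(1 - eps), which can be inverted for gamma. The function names are illustrative, and the brute-force search is only an independent check on the closed form.

```python
def calibrate_gamma(wage, share, eps):
    """Gamma at which an untaxed worker supplies `share` of the time endowment."""
    ratio = (1.0 - share) / share                                # leisure-to-labor ratio at the target
    odds = ratio ** (1.0 / eps) / wage ** ((1.0 - eps) / eps)    # equals (1 - gamma) / gamma
    return 1.0 / (1.0 + odds)

def ces(income, leisure, gamma, eps):
    """CES utility; eps must differ from 1 (the Cobb-Douglas limit)."""
    rho = (eps - 1.0) / eps
    return (gamma * income ** rho + (1.0 - gamma) * leisure ** rho) ** (1.0 / rho)

def best_hours(wage, gamma, eps, endowment=168.0, steps=16800):
    """Brute-force check: hours that maximize utility on a 0.01-hour grid."""
    grid = [endowment * j / steps for j in range(1, steps)]
    return max(grid, key=lambda h: ces(wage * h, endowment - h, gamma, eps))

gamma = calibrate_gamma(16.6, 0.25, 0.5)   # close to the benchmark value of 0.65
hours = best_hours(16.6, 0.65, 0.5)        # close to a quarter of 168 hours
```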
Therefore, if an individual with wages $w_i$ and welfare stigma $\phi_i$ decides to enroll in the welfare program, his utility is given by $U_i = \left[ 0.65\, b^{-1} + 0.35\, \bar{L}^{-1} \right]^{-1} - \phi_i$ because in this case his income is the welfare benefit $b$, his leisure is his endowment of time $\bar{L}$, and he incurs an extra utility loss $\phi_i$ due to his welfare stigma (Moffitt, 1983). If this individual decides to self-select into the general income taxation and supply labor $L_i$, then his utility is given by $U_i = \left[ 0.65 \left( (1-t) w_i L_i - k \right)^{-1} + 0.35 \left( \bar{L} - L_i \right)^{-1} \right]^{-1}$ because in this case his after-tax income is $(1-t) w_i L_i - k$, his leisure is $\bar{L} - L_i$, and there is no stigma attached to general income taxation.
Thus, an individual with parameters $(w_i, \phi_i)$ is assumed to take $(k, t, b)$ as given and solve the following optimization problem to maximize his utility:

$$\max\left\{ \max_{L_i} \left[ 0.65 \left( (1-t) w_i L_i - k \right)^{-1} + 0.35 \left( \bar{L} - L_i \right)^{-1} \right]^{-1},\; \left[ 0.65\, b^{-1} + 0.35\, \bar{L}^{-1} \right]^{-1} - \phi_i \right\}.$$

And I denote

$$U_i^{\max} = \max\left\{ \max_{L_i} \left[ 0.65 \left( (1-t) w_i L_i - k \right)^{-1} + 0.35 \left( \bar{L} - L_i \right)^{-1} \right]^{-1},\; \left[ 0.65\, b^{-1} + 0.35\, \bar{L}^{-1} \right]^{-1} - \phi_i \right\}.$$
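The explicit labor supply that the CES form allows can be sketched as follows. The derivation is not spelled out in the text; it assumes an interior solution with positive after-tax income, and the shorthand $A_i$ is introduced here only for convenience.

```latex
% First-order condition of \max_{L_i} f\big((1-t) w_i L_i - k,\ \bar{L} - L_i\big) under CES utility:
\gamma (1-t) w_i \, Y_i^{-1/\varepsilon} = (1-\gamma) \, (\bar{L} - L_i)^{-1/\varepsilon},
\qquad Y_i = (1-t) w_i L_i - k .
% Hence leisure is proportional to after-tax income:
\bar{L} - L_i = A_i Y_i ,
\qquad A_i = \left( \frac{1-\gamma}{\gamma (1-t) w_i} \right)^{\varepsilon} ,
% and solving for labor supply gives
L_i^{*} = \frac{\bar{L} + A_i k}{1 + A_i (1-t) w_i} .
```

If the expression for $L_i^{*}$ is negative, which can happen when $k$ is sufficiently negative, the individual is at the corner $L_i = 0$ and lives on the transfer $-k$.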
The Form of the Social Welfare Function
Fair (1971) argued that if people were given an opportunity to choose the social welfare
function, it is likely that many, if not most, would prefer one that has equal weights for all.
Therefore, Fair proposed two social welfare functions:
(1) $SW(U_1^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}) = \sum_{i=1}^{N} U_i^{\max}$,
i.e., the social welfare is the sum of individuals' utilities; and
(2) $SW(U_1^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}) = \prod_{i=1}^{N} U_i^{\max}$,
i.e., the social welfare is the product of individuals' utilities.
Fair used the second social welfare function in his numerical simulations. I follow Fair and assume that $SW(U_1^{\max}, U_2^{\max}, \ldots, U_i^{\max}, \ldots, U_N^{\max}) = \prod_{i=1}^{N} U_i^{\max}$ in my numerical simulations.
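A practical note on computing the product-form social welfare function: with hundreds of individuals whose utilities are in the hundreds, the raw product overflows double-precision arithmetic. A common workaround, sketched below, is to compare parameter sets by the sum of log utilities, which ranks them exactly as the product does whenever all utilities are positive. The text does not say how the product was computed, so this is only one possible implementation, and the utility values are hypothetical.

```python
import math

def log_social_welfare(utilities):
    """Sum of logs; ranks allocations as the product does when all utilities are positive."""
    if any(u <= 0 for u in utilities):
        raise ValueError("the log transform requires strictly positive utilities")
    return sum(math.log(u) for u in utilities)

utilities = [250.0] * 500                 # hypothetical values, order of magnitude only
raw_product = math.prod(utilities)        # overflows to floating-point infinity
log_sw = log_social_welfare(utilities)    # finite and comparable across parameter sets
```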
The Distribution of Welfare Stigma
Moffitt’s estimation (1983) shows that stigma appears to arise mainly from the flat
component, i.e., a fixed cost associated with participation in the program. His estimation shows
that the mean of the flat stigma is around 0.65. This implies that on average, if we multiply the
income by 1.72 and divide the labor supply by 1.72 simultaneously, we can compensate for the
utility cost of stigma. Moffitt’s estimation also shows that the ratio of the standard deviation to
the mean of flat stigma is around 0.14.
Because the mean weekly wages or salary is around $646 and the mean weekly work time is around 39 hours, if I apply Moffitt's results in my CES utility function, then the mean of stigma employed in my simulation is around

$$\left[ 0.65 (1.72 \cdot 646)^{-1} + 0.35 (168 - 39/1.72)^{-1} \right]^{-1} - \left[ 0.65 \cdot 646^{-1} + 0.35 (168 - 39)^{-1} \right]^{-1} \approx 64.87.$$

The standard deviation of stigma is around 64.87 * 0.14 ≈ 9.08.
In summary, I assume {φ_i} follows a lognormal distribution with mean 64.87 and standard deviation 9.08 in my numerical simulations.
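The stigma calibration can be retraced with a few lines of Python. The sketch uses the rounded inputs quoted in the text (weekly income of 646, 39 hours of work, a scaling factor of 1.72) and the benchmark CES utility with elasticity 0.5. With these rounded inputs the result should land near the reported 64.87, though it need not match to the second decimal.

```python
def ces(income, leisure, gamma=0.65):
    """Benchmark CES utility with elasticity 0.5."""
    return 1.0 / (gamma / income + (1.0 - gamma) / leisure)

income, hours, endowment, scale = 646.0, 39.0, 168.0, 1.72

# Utility gain from scaling income up and labor supply down by 1.72,
# interpreted as the utility equivalent of the mean flat stigma.
mean_stigma = ces(scale * income, endowment - hours / scale) - ces(income, endowment - hours)

# Moffitt's ratio of standard deviation to mean, applied to the computed mean.
sd_stigma = 0.14 * mean_stigma
```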
The Correlation Coefficient between Wages and Welfare Stigma
There is little direct empirical evidence on the correlation coefficient between wages and
welfare stigma. But it seems that there is a positive correlation between them. Thus, I arbitrarily
take the correlation coefficient between wages and welfare stigma to be 0.5 in my numerical
simulations.
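One way to generate wages and stigma with the assumed correlation is sketched below. The text does not specify whether the coefficient of 0.5 refers to levels or to logarithms; the sketch assumes it is the correlation between the underlying normal variables, which is the simplest case to simulate. The function name, the seed, and the approximate lognormal parameters in the usage line are illustrative.

```python
import math
import random

def correlated_lognormals(n, mu_w, sig_w, mu_phi, sig_phi, rho, seed=0):
    """Draw n (wage, stigma) pairs whose logs are jointly normal with correlation rho."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mix z1 with an independent shock so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        draws.append((math.exp(mu_w + sig_w * z1), math.exp(mu_phi + sig_phi * z2)))
    return draws

# Approximate log-scale parameters for mean/sd of (16.6, 21.5) and (64.87, 9.08).
population = correlated_lognormals(500, 2.32, 0.99, 4.16, 0.14, 0.5)
```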
The Government’s Budget Constraint
The actual budget constraint is made slightly more complicated than the one specified with constraint (B). I assume that 15% of the total output under no government interference (i.e., no income tax and no welfare) is used to run the government. The number 15% is chosen by referring to government spending in the United States (see footnote 6). Let $L_i^{free}$ be individual $i$'s optimal labor supply under no government interference. With the CES utility function, it can be shown that

$$L_i^{free} = \bar{L} \, \frac{\left( \frac{\gamma}{1-\gamma} w_i \right)^{\varepsilon}}{w_i + \left( \frac{\gamma}{1-\gamma} w_i \right)^{\varepsilon}},$$

where $\varepsilon = 0.5$, $\gamma = 0.65$, and $\bar{L} = 168$. Then, the cost to run the government can be written as

$$gov\_expense = 0.15 \sum_{i=1}^{N} \mathbf{1}(L_i^{free} > 0) \cdot w_i \cdot L_i^{free},$$

where $\mathbf{1}(L_i^{free} > 0)$ is an indicator function that is equal to 1 if $L_i^{free} > 0$ and to 0 otherwise. Thus, the actual budget constraint used in my numerical simulation is

$$\sum_{i=1}^{N} \mathbf{1}(i: \text{General Income Taxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} \mathbf{1}(i: \text{Welfare}) \cdot b + 0.15 \sum_{i=1}^{N} \mathbf{1}(L_i^{free} > 0) \cdot w_i \cdot L_i^{free}$$

instead of

$$\sum_{i=1}^{N} \mathbf{1}(i: \text{General Income Taxation}) \cdot (t w_i L_i + k) = \sum_{i=1}^{N} \mathbf{1}(i: \text{Welfare}) \cdot b.$$
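The cost of running the government can be computed as sketched below. The labor supply formula is the one implied by the first-order condition of the CES utility function with no taxes or transfers, L_free = L_bar * x / (w + x) with x = (gamma * w / (1 - gamma))^eps. The function names are illustrative, and treating the budget constraint as an inequality is an assumption made for numerical convenience.

```python
GAMMA, EPS, L_BAR = 0.65, 0.5, 168.0   # benchmark parameters

def free_labor(w):
    """Labor supply with no income tax and no welfare program."""
    x = (GAMMA * w / (1.0 - GAMMA)) ** EPS
    return L_BAR * x / (w + x)

def gov_expense(wages, share=0.15):
    """Share of output under no government interference used to run the government."""
    return share * sum(w * free_labor(w) for w in wages if free_labor(w) > 0)

def budget_ok(tax_revenue, welfare_outlays, wages):
    """Assumed feasibility check: revenue covers welfare outlays plus government expense."""
    return tax_revenue >= welfare_outlays + gov_expense(wages)
```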
V.C. Simulation Results
My simulation based on the benchmark parameters shows that the optimal k, t, and b are around -$170 per week, 0.32, and $387 per week, respectively. About 35 percent of the individuals choose to be on welfare. Annually, the welfare benefit is around $20K given there are 52 weeks in one year. Because k is negative, the benchmark simulation shows that the government should provide both a negative income tax schedule and a welfare program. Because the optimal weekly welfare benefit is around $387, the benchmark simulation implies that the actual welfare program might be less generous than the optimal welfare program (see footnote 7).
6 According to Rosen (2002), the federal government expenditure as a percentage of gross domestic product (GDP) was about 15 percent in 1999 and the government expenditure (federal, state, and local) as a percentage of GDP was about 25 percent.
In the benchmark simulation, I specify ε = 0.5. In order to check the robustness of the results, I run one additional simulation with a nearby value of ε. I find that the basic conclusions are qualitatively robust: (1) the government should provide both a negative income tax schedule and a welfare program; and (2) the actual welfare program might be less generous than the optimal welfare program. In more detail, when ε = 0.4 (Stern's most favored specification (1976)), my simulation shows that the optimal k, t, and b are around -$87 per week, 0.36, and $425 per week, respectively. About 45 percent of the individuals choose to be on welfare.
The results of the sensitivity analysis make intuitive sense. When ε is increased from 0.4 to 0.5, the labor supply becomes more elastic. Therefore, the marginal tax rate t is decreased from 0.36 to 0.32 so as to encourage labor supply. In addition, k is decreased from -$87 to -$170 and b is reduced from $425 to $387 so as to encourage switching from the welfare program to general income taxation. Individuals choosing to be on welfare as a percentage of the whole population is reduced from about 45 percent to about 35 percent.
In summary, the numerical simulations show that it is optimal for the government to provide both a negative income tax schedule and a welfare program, which verifies the theoretical conclusion that it can be optimal for the government to provide both general income taxation and a welfare program. In addition, the numerical simulations imply that the actual welfare program might be less generous than the optimal welfare program.
7 The actual welfare benefit varies by family status, income level, state, and program. A good description of the key programs (Food Stamp Program, Temporary Assistance for Needy Families (TANF), Medicaid, etc.), including benefit levels, is in the Green Book, available online at http://waysandmeans.house.gov/Documents.asp?section=813.
VI. CONCLUSION
Moffitt’s work shows that stigma plays an important role in people’s utility functions and
thus distorts people’s labor supply decisions and welfare. Because stigma plays an important role
in people’s utility functions and their labor supply decisions, it must also have important
implications for the optimal income tax schedule. However, the traditional literature on optimal
income taxation does not take this important factor into account.
In this research, I integrate the theory of welfare stigma and the theory of optimal income
taxation. To do so, I study optimal income taxation and an optimal
welfare program within a unified framework while taking welfare stigma into account. In my
framework, the government has two instruments at hand: general income taxation and a welfare
program. Individuals are heterogeneous along two dimensions, namely, wages and welfare
stigma. Each individual is assumed to take the income tax schedule and the parameters of the
welfare program as given and make his optimal labor supply decisions to maximize his utility:
(1) whether to enroll in the welfare program or self-select into the general income taxation; and
(2) conditional on self-selecting into the general income taxation, how much labor to supply. The
government is assumed to choose both an optimal income tax schedule and an optimal welfare
program so as to maximize a social welfare function subject to its own budget constraint and
individuals’ behavioral response to the income tax schedule and the welfare program parameters.
Within the unified framework, I use both theoretical analysis and numerical simulation to
provide answers for some questions that are important for public policy. Theoretical analysis
shows that: (1) it can be optimal for the government to offer both general income taxation and a
welfare program; and (2) the more intensely people suffer from welfare stigma, the higher the
welfare benefit should be. Numerical simulations show that: (1) it is optimal for the government
to offer both a negative income tax schedule and a welfare program, which verifies the
theoretical conclusion that it can be optimal for the government to provide both general income
taxation and a welfare program; and (2) the actual welfare program might be less generous than
the optimal welfare program.
APPENDIX 1: PROOF OF PROPOSITIONS 1, 2, 3, AND 4
In order to prove Propositions 1, 2, 3, and 4, I need to prove eight lemmas first.
Lemma 1: Conditional on an equilibrium in which skilled people choose to take difficult jobs and all unskilled people choose to take easy jobs, the optimal conditions are given by:
(1) $u(q_E + t_{E1}^*) - \delta_E > u(b_1^*)$,
(2) $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{E1}^*) - \delta_E$, and
(3) $t_{D1}^* = t_{E1}^*$.
The maximum value of the social welfare function is $SW_1 = u(q_D - t_{D1}^*) - \delta_D$, where $t_{D1}^*$ satisfies $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{D1}^*) - \delta_E$, i.e., $t_{D1}^* = u^{-1}\left( u(q_D - t_{D1}^*) - \delta_D + \delta_E \right) - q_E$.
Proof: In order to induce skilled people to take difficult jobs, the government should set parameters such that $u(q_D - t_{D1}) - \delta_D \ge u(q_E + t_{E1}) - \delta_E$ and $u(q_D - t_{D1}) - \delta_D \ge u(b_1) - \phi$. In order to induce unskilled people with stigma to take easy jobs, the government should set parameters such that $u(q_E + t_{E1}) - \delta_E \ge u(b_1) - \phi$. In order to induce unskilled people without stigma to take easy jobs, the government should set parameters such that $u(q_E + t_{E1}) - \delta_E > u(b_1)$. Taken together, in order to induce skilled people to take difficult jobs and all unskilled people to take easy jobs, the government should set parameters such that $u(q_E + t_{E1}) - \delta_E > u(b_1)$ and $u(q_D - t_{D1}) - \delta_D \ge u(q_E + t_{E1}) - \delta_E$. The government's budget constraint implies that $t_{D1} = t_{E1}$.
Conditional on $u(q_E + t_{E1}) - \delta_E > u(b_1)$, $u(q_D - t_{D1}) - \delta_D \ge u(q_E + t_{E1}) - \delta_E$, and $t_{D1} = t_{E1}$, in order to maximize the social welfare function, the government should extract as much as possible from the skilled people without inducing them to switch from difficult jobs to easy jobs. Therefore, the government should set parameters such that $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{E1}^*) - \delta_E$. We also have $u(q_E + t_{E1}^*) - \delta_E > u(b_1^*)$ so that all unskilled people choose to take easy jobs under equilibrium. The budget constraint implies that $t_{D1}^* = t_{E1}^*$. Thus, we have proved the optimal conditions.
With $u(q_E + t_{E1}^*) - \delta_E > u(b_1^*)$, $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{E1}^*) - \delta_E$, and $t_{D1}^* = t_{E1}^*$, the utility of skilled people is $u(q_D - t_{D1}^*) - \delta_D$, the utility of unskilled people with stigma is $u(q_E + t_{E1}^*) - \delta_E$, and the utility of unskilled people without stigma is $u(q_E + t_{E1}^*) - \delta_E$. Because $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{E1}^*) - \delta_E$, we have $SW_1 = u(q_D - t_{D1}^*) - \delta_D$, where $t_{D1}^*$ satisfies $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{D1}^*) - \delta_E$ by $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{E1}^*) - \delta_E$ and $t_{D1}^* = t_{E1}^*$. The condition $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{D1}^*) - \delta_E$ is equivalent to $t_{D1}^* = u^{-1}\left( u(q_D - t_{D1}^*) - \delta_D + \delta_E \right) - q_E$.
Thus, we have proved Lemma 1.
Lemma 2: Conditional on an equilibrium in which skilled people choose to take difficult jobs and all unskilled people choose to be on welfare, the optimal conditions are given by:
(4) $u(q_E + t_{E2}^*) - \delta_E < u(b_2^*) - \phi$,
(5) $u(q_D - t_{D2}^*) - \delta_D = u(b_2^*) - \phi$, and
(6) $t_{D2}^* = b_2^*$.
The maximum value of the social welfare function is $SW_2 = u(q_D - t_{D2}^*) - \delta_D$, where $t_{D2}^*$ satisfies $u(q_D - t_{D2}^*) - \delta_D = u(t_{D2}^*) - \phi$, i.e., $t_{D2}^* = u^{-1}\left( u(q_D - t_{D2}^*) - \delta_D + \phi \right)$.
Proof: In order to induce skilled people to take difficult jobs, the government should set parameters such that $u(q_D - t_{D2}) - \delta_D \ge u(q_E + t_{E2}) - \delta_E$ and $u(q_D - t_{D2}) - \delta_D \ge u(b_2) - \phi$. In order to induce unskilled people with stigma to be on welfare, the government should set parameters such that $u(q_E + t_{E2}) - \delta_E < u(b_2) - \phi$. In order to induce unskilled people without stigma to be on welfare, the government should set parameters such that $u(q_E + t_{E2}) - \delta_E \le u(b_2)$. Taken together, in order to induce skilled people to take difficult jobs and all unskilled people to be on welfare, the government should set parameters such that $u(q_E + t_{E2}) - \delta_E < u(b_2) - \phi$ and $u(q_D - t_{D2}) - \delta_D \ge u(b_2) - \phi$. The government's budget constraint implies that $t_{D2} = b_2$.
Conditional on $u(q_E + t_{E2}) - \delta_E < u(b_2) - \phi$, $u(q_D - t_{D2}) - \delta_D \ge u(b_2) - \phi$, and $t_{D2} = b_2$, in order to maximize the social welfare function, the government should extract as much as possible from the skilled people without inducing them to switch from difficult jobs to welfare. Therefore, the government should set parameters such that $u(q_D - t_{D2}^*) - \delta_D = u(b_2^*) - \phi$. We also have $u(q_E + t_{E2}^*) - \delta_E < u(b_2^*) - \phi$ so that all unskilled people choose to be on welfare under equilibrium. The budget constraint implies that $t_{D2}^* = b_2^*$. Thus, we have proved the optimal conditions.
With $u(q_E + t_{E2}^*) - \delta_E < u(b_2^*) - \phi$, $u(q_D - t_{D2}^*) - \delta_D = u(b_2^*) - \phi$, and $t_{D2}^* = b_2^*$, the utility of skilled people is $u(q_D - t_{D2}^*) - \delta_D$, the utility of unskilled people with stigma is $u(b_2^*) - \phi$, and the utility of unskilled people without stigma is $u(b_2^*)$. Because $u(q_D - t_{D2}^*) - \delta_D = u(b_2^*) - \phi < u(b_2^*)$, we have $SW_2 = u(q_D - t_{D2}^*) - \delta_D$, where $t_{D2}^*$ satisfies $u(q_D - t_{D2}^*) - \delta_D = u(t_{D2}^*) - \phi$ by $u(q_D - t_{D2}^*) - \delta_D = u(b_2^*) - \phi$ and $t_{D2}^* = b_2^*$. The condition $u(q_D - t_{D2}^*) - \delta_D = u(t_{D2}^*) - \phi$ is equivalent to $t_{D2}^* = u^{-1}\left( u(q_D - t_{D2}^*) - \delta_D + \phi \right)$.
Thus, we have proved Lemma 2.
Lemma 3: Conditional on an equilibrium in which skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy jobs, and unskilled people without stigma choose to be on welfare, the optimal conditions are given by:
(7) $u(q_E + t_{E3}^*) - \delta_E = u(b_3^*)$,
(8) $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{E3}^*) - \delta_E$, and
(9) $t_{D3}^* = (1 - \alpha) t_{E3}^* + \alpha b_3^*$.
The maximum value of the social welfare function is $SW_3 = u(q_D - t_{D3}^*) - \delta_D$, where $t_{D3}^*$ satisfies $t_{D3}^* = (1 - \alpha) \left( u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \right) + \alpha\, u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$.
Proof: In order to induce skilled people to take difficult jobs, the government should set parameters such that $u(q_D - t_{D3}) - \delta_D \ge u(q_E + t_{E3}) - \delta_E$ and $u(q_D - t_{D3}) - \delta_D \ge u(b_3) - \phi$. In order to induce unskilled people with stigma to take easy jobs, the government should set parameters such that $u(q_E + t_{E3}) - \delta_E \ge u(b_3) - \phi$. In order to induce unskilled people without stigma to be on welfare, the government should set parameters such that $u(q_E + t_{E3}) - \delta_E \le u(b_3)$. Taken together, in order to induce skilled people to take difficult jobs, unskilled people with stigma to take easy jobs, and unskilled people without stigma to be on welfare, the government should set parameters such that $u(b_3) - \phi \le u(q_E + t_{E3}) - \delta_E \le u(b_3)$ and $u(q_D - t_{D3}) - \delta_D \ge u(q_E + t_{E3}) - \delta_E$. The government's budget constraint implies that $t_{D3} = (1 - \alpha) t_{E3} + \alpha b_3$.
Conditional on $u(q_D - t_{D3}) - \delta_D \ge u(q_E + t_{E3}) - \delta_E$, $u(b_3) - \phi \le u(q_E + t_{E3}) - \delta_E \le u(b_3)$, and $t_{D3} = (1 - \alpha) t_{E3} + \alpha b_3$, in order to maximize the social welfare function, the government should extract as much as possible from the skilled people without inducing them to switch from difficult jobs to easy jobs. Therefore, the government should set parameters such that $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{E3}^*) - \delta_E$. It can be proved by contradiction that $u(q_E + t_{E3}^*) - \delta_E = u(b_3^*)$. Otherwise, if the government reduces $b_3^*$ by a certain amount, increases $t_{E3}^*$ by a certain amount, and reduces $t_{D3}^*$ by a certain amount, the social welfare can be improved. The budget constraint implies that $t_{D3}^* = (1 - \alpha) t_{E3}^* + \alpha b_3^*$. Thus, we have proved the optimal conditions.
With $u(q_E + t_{E3}^*) - \delta_E = u(b_3^*)$, $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{E3}^*) - \delta_E$, and $t_{D3}^* = (1 - \alpha) t_{E3}^* + \alpha b_3^*$, the utility of the skilled people is $u(q_D - t_{D3}^*) - \delta_D$, the utility of unskilled people with stigma is $u(q_E + t_{E3}^*) - \delta_E$, and the utility of unskilled people without stigma is $u(b_3^*)$. Because $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{E3}^*) - \delta_E = u(b_3^*)$, we have $SW_3 = u(q_D - t_{D3}^*) - \delta_D$. Because $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{E3}^*) - \delta_E = u(b_3^*)$, we have $b_3^* = u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$ and $t_{E3}^* = u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E$. Substitute these two expressions into $t_{D3}^* = (1 - \alpha) t_{E3}^* + \alpha b_3^*$, and we obtain $t_{D3}^* = (1 - \alpha) \left( u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \right) + \alpha\, u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$.
Thus, we have proved Lemma 3.
Lemma 4: It is impossible for the government to set parameters such that under equilibrium unskilled people with stigma choose to be on welfare and unskilled people without stigma choose to take easy jobs simultaneously.
Proof: In order to induce unskilled people with stigma to be on welfare, the government should set parameters such that $u(q_E + t_{E4}) - \delta_E < u(b_4) - \phi$. In order to induce unskilled people without stigma to take easy jobs, the government should set parameters such that $u(q_E + t_{E4}) - \delta_E > u(b_4)$. As $u(q_E + t_{E4}) - \delta_E < u(b_4) - \phi$ and $u(q_E + t_{E4}) - \delta_E > u(b_4)$ cannot be satisfied simultaneously, it is impossible for the government to set parameters such that under equilibrium unskilled people with stigma choose to be on welfare and unskilled people without stigma choose to take easy jobs simultaneously.
Thus, we have proved Lemma 4.
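Lemma 5: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to one, then we have $t_{D3}^* < t_{D1}^*$ and $t_{D3}^* < t_{D2}^*$, which further imply that $SW_3$ is the largest among $SW_1$, $SW_2$, and $SW_3$.
Proof: We have $t_{D3}^* = (1 - \alpha) \left( u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \right) + \alpha\, u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$ by Lemma 3. Because $u(0) = 0$ and $u(\cdot)$ is concave by assumption so that $u^{-1}(0) = 0$ and $u^{-1}(\cdot)$ is convex, we have $u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \ge u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right) + u^{-1}(\delta_E) - q_E > u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$ because $u(q_E) < \delta_E$ so that $u^{-1}(\delta_E) - q_E > 0$. Therefore, we have $t_{D3}^* < u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E$. Because $t_{D1}^* = u^{-1}\left( u(q_D - t_{D1}^*) - \delta_D + \delta_E \right) - q_E$ by Lemma 1, it can be proved by contradiction that $t_{D3}^* < t_{D1}^*$.
If $\alpha$ is close to one, we have $t_{D3}^* \approx u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right) < u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \phi \right)$. Because $t_{D2}^* = u^{-1}\left( u(q_D - t_{D2}^*) - \delta_D + \phi \right)$ by Lemma 2, it can be proved by contradiction that $t_{D3}^* < t_{D2}^*$.
Thus, we have shown that if $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to one, then we have $t_{D3}^* < t_{D1}^*$ and $t_{D3}^* < t_{D2}^*$.
Because $SW_1 = u(q_D - t_{D1}^*) - \delta_D$ by Lemma 1, $SW_2 = u(q_D - t_{D2}^*) - \delta_D$ by Lemma 2, and $SW_3 = u(q_D - t_{D3}^*) - \delta_D$ by Lemma 3, $t_{D3}^* < t_{D1}^*$ and $t_{D3}^* < t_{D2}^*$ imply that $SW_3$ is the largest among $SW_1$, $SW_2$, and $SW_3$.
Thus, we have proved Lemma 5.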
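The ordering of the three tax levels can be illustrated numerically. The sketch below solves the fixed-point conditions of Lemmas 1, 2, and 3, namely u(q_D - t) - delta_D = u(q_E + t) - delta_E, u(q_D - t) - delta_D = u(t) - phi, and t = (1 - alpha)(u^{-1}(u(q_D - t) - delta_D + delta_E) - q_E) + alpha u^{-1}(u(q_D - t) - delta_D), by bisection. The square-root utility and all parameter values are hypothetical; they are chosen only so that u(q_D) - delta_D > 0 > u(q_E) - delta_E > -phi holds.

```python
import math

def bisect(f, lo, hi, tol=1e-10):
    """Root of a strictly decreasing f with f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = math.sqrt                      # hypothetical concave utility with u(0) = 0

def u_inv(y):
    return y * y                   # inverse of sqrt on y >= 0

# Hypothetical parameters: u(q_D) - delta_D = 8 > 0 > u(q_E) - delta_E = -1 > -phi = -2.
Q_D, D_D, Q_E, D_E, PHI = 100.0, 2.0, 4.0, 3.0, 2.0

def taxes(alpha):
    """Return (t_D1, t_D2, t_D3) for a given share alpha of stigma-free unskilled."""
    hi = Q_D - u_inv(D_D)          # keeps u(q_D - t) - delta_D non-negative
    base = lambda t: u(Q_D - t) - D_D
    t1 = bisect(lambda t: base(t) - (u(Q_E + t) - D_E), 0.0, hi)
    t2 = bisect(lambda t: base(t) - (u(t) - PHI), 0.0, hi)
    t3 = bisect(lambda t: (1 - alpha) * (u_inv(base(t) + D_E) - Q_E)
                          + alpha * u_inv(base(t)) - t, 0.0, hi)
    return t1, t2, t3

t1, t2, t3 = taxes(0.9)            # alpha close to one, the case of Lemma 5
```

Because social welfare in each case equals u(q_D - t) - delta_D, the case with the smallest tax on skilled people has the highest social welfare.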
Lemma 6: If $u(q_D) - \delta_D > 0 > u(q_E) - \delta_E > -\phi$, and if $\alpha$ is close to zero, then we have $t_{D3}^* < t_{D1}^*$, but we cannot compare $t_{D1}^*$ and $t_{D2}^*$, and we cannot compare $t_{D3}^*$ and $t_{D2}^*$, which further imply that either $SW_3$ or $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$.
Proof: In the proof of Lemma 5, we have shown that $t_{D3}^* < t_{D1}^*$.
By Lemma 1, we have $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{D1}^*) - \delta_E$. By Lemma 2, we have $u(q_D - t_{D2}^*) - \delta_D = u(t_{D2}^*) - \phi$. Although $u(q_E) - \delta_E > -\phi$, the utility contribution of $q_E$ decreases with $t_{D1}^*$. Therefore, we cannot compare $t_{D1}^*$ and $t_{D2}^*$.
We have $t_{D3}^* = (1 - \alpha) \left( u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \right) + \alpha\, u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$ by Lemma 3. If $\alpha$ is close to zero, $t_{D3}^* \approx u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E$, which is equivalent to $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{D3}^*) - \delta_E$. Therefore, when $\alpha$ is close to zero, we cannot compare $t_{D3}^*$ and $t_{D2}^*$, either.
Because $SW_1 = u(q_D - t_{D1}^*) - \delta_D$ by Lemma 1, $SW_2 = u(q_D - t_{D2}^*) - \delta_D$ by Lemma 2, and $SW_3 = u(q_D - t_{D3}^*) - \delta_D$ by Lemma 3, $t_{D3}^* < t_{D1}^*$ and the fact that we cannot compare $t_{D3}^*$ and $t_{D2}^*$ imply that either $SW_3$ or $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$.
Thus, we have proved Lemma 6.
Lemma 7: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then we have $t_{D3}^* < t_{D2}^* < t_{D1}^*$, which further implies that $SW_3 > SW_2 > SW_1$.
Proof: We have $u(q_D - t_{D1}^*) - \delta_D = u(q_E + t_{D1}^*) - \delta_E$ by Lemma 1. Because $u(0) = 0$ and $u(\cdot)$ is concave by assumption, and because $-\phi > u(q_E) - \delta_E$ by assumption, we have $u(q_E + t_{D1}^*) - \delta_E \le u(q_E) + u(t_{D1}^*) - \delta_E < u(t_{D1}^*) - \phi$. Thus, we have $u(q_D - t_{D1}^*) - \delta_D < u(t_{D1}^*) - \phi$. Because $u(q_D - t_{D2}^*) - \delta_D = u(t_{D2}^*) - \phi$ by Lemma 2, it can be proved by contradiction that $t_{D2}^* < t_{D1}^*$.
We have $t_{D3}^* = (1 - \alpha) \left( u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \right) + \alpha\, u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$ by Lemma 3. If $\alpha$ is close to one, then $t_{D3}^* \approx u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right) < u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \phi \right)$. Because $t_{D2}^* = u^{-1}\left( u(q_D - t_{D2}^*) - \delta_D + \phi \right)$, it can be proved by contradiction that $t_{D3}^* < t_{D2}^*$.
Thus, we have shown that if $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to one, then we have $t_{D3}^* < t_{D2}^* < t_{D1}^*$.
Because $SW_1 = u(q_D - t_{D1}^*) - \delta_D$ by Lemma 1, $SW_2 = u(q_D - t_{D2}^*) - \delta_D$ by Lemma 2, and $SW_3 = u(q_D - t_{D3}^*) - \delta_D$ by Lemma 3, $t_{D3}^* < t_{D2}^* < t_{D1}^*$ implies that $SW_3 > SW_2 > SW_1$.
Thus, we have proved Lemma 7.
Lemma 8: If $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, then we have $t_{D2}^* < t_{D1}^*$ and $t_{D2}^* < t_{D3}^*$, which further imply that $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$.
Proof: In the proof of Lemma 7, we have shown that $t_{D2}^* < t_{D1}^*$.
We have $t_{D3}^* = (1 - \alpha) \left( u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E \right) + \alpha\, u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D \right)$ by Lemma 3. If $\alpha$ is close to zero, we have $t_{D3}^* \approx u^{-1}\left( u(q_D - t_{D3}^*) - \delta_D + \delta_E \right) - q_E$, which is equivalent to $u(q_D - t_{D3}^*) - \delta_D = u(q_E + t_{D3}^*) - \delta_E$. Because $u(q_D - t_{D2}^*) - \delta_D = u(t_{D2}^*) - \phi$ by Lemma 2, we can show that $t_{D2}^* < t_{D3}^*$ following the same argument as in the proof of Lemma 7.
Thus, we have shown that if $u(q_D) - \delta_D > 0 > -\phi > u(q_E) - \delta_E$, and if $\alpha$ is close to zero, then we have $t_{D2}^* < t_{D1}^*$ and $t_{D2}^* < t_{D3}^*$.
Because $SW_1 = u(q_D - t_{D1}^*) - \delta_D$ by Lemma 1, $SW_2 = u(q_D - t_{D2}^*) - \delta_D$ by Lemma 2, and $SW_3 = u(q_D - t_{D3}^*) - \delta_D$ by Lemma 3, $t_{D2}^* < t_{D1}^*$ and $t_{D2}^* < t_{D3}^*$ imply that $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$.
Thus, we have proved Lemma 8.
Proposition 1: If ( )−> Equu , and if is close to one, then the
equilibrium is a separating equilibrium, and the optimal conditions are given by:
(10) ( )*bu=u + ,
(11) ( Equ +=u − , and
(12) ( )*t = . *bα
Proof: Because skilled people have three options (take a difficult job, take an easy job, or be on
welfare), unskilled people with stigma have two options (take an easy job, or be on welfare), and
unskilled people without stigma have two options (take an easy job, or be on welfare), there are
possible equilibrium outcomes in principle. However, it is impossible to be
optimal so that under equilibrium skilled people choose to take easy jobs or be on welfare
because the government will then have no resource to subsidize unskilled people. Thus, we only
need to consider possible equilibrium outcomes.
36
By Lemma 4, it is impossible for the government to set parameters such that under
equilibrium unskilled people with stigma choose to be on welfare and unskilled people without
stigma choose to take easy jobs simultaneously. Thus, we only need to consider three possible
equilibrium outcomes: (1) skilled people choose to take difficult jobs, and all unskilled people
choose to take easy jobs; (2) skilled people choose to take difficult jobs, and all unskilled people
choose to be on welfare; and (3) skilled people choose to take difficult jobs, unskilled people
with stigma choose to take easy jobs, and unskilled people without stigma choose to be on
welfare.
Conditional on the first possible equilibrium outcome, the maximum social welfare is
by Lemma 1. Conditional on the second possible equilibrium outcome, the maximum social
welfare is by Lemma 2. Conditional on the third possible equilibrium outcome, the
maximum social welfare is by Lemma 3.
1SW
2SW
3SW
If ( ) ( ) φδδ −>−>>− EEDD ququ 0 , and if α is close to one, then by Lemma 5, is
the largest among , , and . The optimal conditions in Lemma 3 are therefore the
overall optimal conditions. Thus, we have shown that if
3SW
1SW 2SW 3SW
( ) ( ) φδδ −>−>>− EEDD quq 0u , and if
α is close to one, the optimal conditions are given by ( ) ( )*bu=*tq EEE −+ δu ,
( ) ( ) EEED tqu δδ −+=− *DD tq − *u , and ( ) *tE
*bα+*tD 1 α−= . Under the optimal parameters,
skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy
jobs, and unskilled people without stigma choose to be on welfare. The equilibrium is a
separating equilibrium.
Thus, we have proved Proposition 1.
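As a sanity check, the separating-equilibrium conditions can be solved in closed form. The sketch below reads them as u(q_E + t_E − δ_E) = u(b), u(q_D − t_D − δ_D) = u(q_E + t_E − δ_E), and t_D = (1 − α) t_E + α b, and assumes only that u is strictly increasing, so that each equality of utilities reduces to equality of the arguments. The function name and all numerical values are illustrative assumptions, not taken from the paper.

```python
def separating_optimum(q_d, delta_d, q_e, delta_e, alpha):
    """Return (t_E, t_D, b) solving the three separating-equilibrium conditions.

    Assumes u is strictly increasing, so u(x) = u(y) if and only if x = y.
    """
    net_d = q_d - delta_d  # net payoff of a difficult job before taxes
    net_e = q_e - delta_e  # net payoff of an easy job before subsidies
    # All three groups end up with the same net consumption c, where
    #   c = b = net_e + t_E = net_d - t_D.
    # Substituting into the budget constraint t_D = (1 - alpha) t_E + alpha b gives:
    c = (net_d + (1.0 - alpha) * net_e) / 2.0
    b = c
    t_e = c - net_e
    t_d = net_d - c
    return t_e, t_d, b


# Hypothetical parameter values chosen only for illustration.
t_e, t_d, b = separating_optimum(q_d=4.0, delta_d=1.0, q_e=1.0, delta_e=2.0, alpha=0.9)
print(t_e, t_d, b)
```

Under these assumptions the welfare benefit equals the common net consumption level, and the tax on difficult jobs exactly finances the easy-job subsidy and the welfare benefit.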
Proposition 2: If $u(q_D - \delta_D) > 0 > u(q_E - \delta_E) > -\phi$, and if $\alpha$ is close
to zero, then the equilibrium is either a separating equilibrium or a pooling equilibrium. For
the separating equilibrium, the optimal conditions are given by:
(13) $u(q_E + t_E^* - \delta_E) = u(b^*)$,
(14) $u(q_D - t_D^* - \delta_D) = u(q_E + t_E^* - \delta_E)$, and
(15) $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$.
For the pooling equilibrium, the optimal conditions are given by:
(16) $u(q_E + t_E^* - \delta_E) < u(b^*) - \phi$,
(17) $u(q_D - t_D^* - \delta_D) = u(b^*) - \phi$, and
(18) $t_D^* = b^*$.
Proof: Following the same argument as in the proof of Proposition 1, we only need to consider
three possible equilibrium outcomes: (1) skilled people choose to take difficult jobs, and all
unskilled people choose to take easy jobs; (2) skilled people choose to take difficult jobs, and all
unskilled people choose to be on welfare; and (3) skilled people choose to take difficult jobs,
unskilled people with stigma choose to take easy jobs, and unskilled people without stigma
choose to be on welfare.
Conditional on the first possible equilibrium outcome, the maximum social welfare is $SW_1$
by Lemma 1. Conditional on the second possible equilibrium outcome, the maximum social
welfare is $SW_2$ by Lemma 2. Conditional on the third possible equilibrium outcome, the
maximum social welfare is $SW_3$ by Lemma 3.
If $u(q_D - \delta_D) > 0 > u(q_E - \delta_E) > -\phi$, and if $\alpha$ is close to zero, then by
Lemma 6, either $SW_3$ or $SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$. Therefore,
either the optimal conditions in Lemma 3 or the optimal conditions in Lemma 2 are the overall
optimal conditions. If $SW_3 > SW_2$, then the optimal conditions are given by
$u(q_E + t_E^* - \delta_E) = u(b^*)$, $u(q_D - t_D^* - \delta_D) = u(q_E + t_E^* - \delta_E)$,
and $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$. If $SW_2 > SW_3$, then the optimal conditions are
given by $u(q_E + t_E^* - \delta_E) < u(b^*) - \phi$,
$u(q_D - t_D^* - \delta_D) = u(b^*) - \phi$, and $t_D^* = b^*$.
Thus, we have proved Proposition 2.
Proposition 3: If $u(q_D - \delta_D) > 0 > -\phi > u(q_E - \delta_E)$, and if $\alpha$ is close
to one, then the equilibrium is a separating equilibrium, and the optimal conditions are given by:
(19) $u(q_E + t_E^* - \delta_E) = u(b^*)$,
(20) $u(q_D - t_D^* - \delta_D) = u(q_E + t_E^* - \delta_E)$, and
(21) $t_D^* = (1 - \alpha) t_E^* + \alpha b^*$.
Proof: Following the same argument as in the proof of Proposition 1, we only need to consider
three possible equilibrium outcomes: (1) skilled people choose to take difficult jobs, and all
unskilled people choose to take easy jobs; (2) skilled people choose to take difficult jobs, and all
unskilled people choose to be on welfare; and (3) skilled people choose to take difficult jobs,
unskilled people with stigma choose to take easy jobs, and unskilled people without stigma
choose to be on welfare.
Conditional on the first possible equilibrium outcome, the maximum social welfare is $SW_1$
by Lemma 1. Conditional on the second possible equilibrium outcome, the maximum social
welfare is $SW_2$ by Lemma 2. Conditional on the third possible equilibrium outcome, the
maximum social welfare is $SW_3$ by Lemma 3.
If $u(q_D - \delta_D) > 0 > -\phi > u(q_E - \delta_E)$, and if $\alpha$ is close to one, then we
have $SW_3 > SW_2 > SW_1$ by Lemma 7. Therefore, the optimal conditions in Lemma 3 are the
overall optimal conditions. Thus, we have shown that if
$u(q_D - \delta_D) > 0 > -\phi > u(q_E - \delta_E)$, and if $\alpha$ is close to one, then the
optimal conditions are given by $u(q_E + t_E^* - \delta_E) = u(b^*)$,
$u(q_D - t_D^* - \delta_D) = u(q_E + t_E^* - \delta_E)$, and
$t_D^* = (1 - \alpha) t_E^* + \alpha b^*$. Under the optimal parameters,
skilled people choose to take difficult jobs, unskilled people with stigma choose to take easy
jobs, and unskilled people without stigma choose to be on welfare. The equilibrium is a
separating equilibrium.
Thus, we have proved Proposition 3.
Proposition 4: If $u(q_D - \delta_D) > 0 > -\phi > u(q_E - \delta_E)$, and if $\alpha$ is close
to zero, then the equilibrium is a pooling equilibrium, and the optimal conditions are given by:
(22) $u(q_E + t_E^* - \delta_E) < u(b^*) - \phi$,
(23) $u(q_D - t_D^* - \delta_D) = u(b^*) - \phi$, and
(24) $t_D^* = b^*$.
Proof: Following the same argument as in the proof of Proposition 1, we only need to consider
three possible equilibrium outcomes: (1) skilled people choose to take difficult jobs, and all
unskilled people choose to take easy jobs; (2) skilled people choose to take difficult jobs, and all
unskilled people choose to be on welfare; and (3) skilled people choose to take difficult jobs,
unskilled people with stigma choose to take easy jobs, and unskilled people without stigma
choose to be on welfare.
Conditional on the first possible equilibrium outcome, the maximum social welfare is $SW_1$
by Lemma 1. Conditional on the second possible equilibrium outcome, the maximum social
welfare is $SW_2$ by Lemma 2. Conditional on the third possible equilibrium outcome, the
maximum social welfare is $SW_3$ by Lemma 3.
If $u(q_D - \delta_D) > 0 > -\phi > u(q_E - \delta_E)$, and if $\alpha$ is close to zero, then
$SW_2$ is the largest among $SW_1$, $SW_2$, and $SW_3$ by Lemma 8. Therefore, the optimal
conditions in Lemma 2 are the overall optimal conditions. Thus, we have shown that if
$u(q_D - \delta_D) > 0 > -\phi > u(q_E - \delta_E)$, and if $\alpha$ is close to zero, the optimal
conditions are given by $u(q_E + t_E^* - \delta_E) < u(b^*) - \phi$,
$u(q_D - t_D^* - \delta_D) = u(b^*) - \phi$, and $t_D^* = b^*$. Under the optimal parameters,
skilled people choose to
take difficult jobs, and all unskilled people choose to be on welfare. The equilibrium is a pooling
equilibrium.
Thus, we have proved Proposition 4.
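The pooling conditions can be explored numerically. The sketch below reads them as u(q_D − t_D − δ_D) = u(b) − φ together with t_D = b, substitutes the second into the first, and solves for b by bisection. It is a sketch under stated assumptions: the utility function u(c) = 1 − exp(−c), the parameter values, and the name `pooling_benefit` are hypothetical and do not come from the paper. The strict inequality on the easy-job subsidy does not pin down t_E and is not checked here.

```python
import math


def pooling_benefit(net_d, phi, u, tol=1e-12):
    """Solve u(net_d - b) = u(b) - phi for b, where net_d = q_D - delta_D and t_D = b.

    Assumes u is continuous and strictly increasing, so the gap below is strictly
    decreasing in b and has at most one root.
    """
    def gap(b):
        return u(net_d - b) - (u(b) - phi)

    lo, hi = 0.0, max(net_d, 1.0)
    if gap(lo) <= 0.0:
        raise ValueError("no positive benefit satisfies the condition")
    while gap(hi) > 0.0:  # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def u_concave(c):
    """Hypothetical utility: increasing, concave, and normalized so that u(0) = 0."""
    return 1.0 - math.exp(-c)


# Hypothetical values: net_d = q_D - delta_D = 3, with two stigma intensities.
b_low = pooling_benefit(net_d=3.0, phi=0.5, u=u_concave)
b_high = pooling_benefit(net_d=3.0, phi=0.8, u=u_concave)
print(b_low, b_high)
```

In this sketch the solved benefit rises with φ, because a larger stigma cost lowers the right-hand side of the condition and a higher b is needed to restore equality. That is in line with the qualitative claim that more intense stigma calls for a higher welfare benefit, although the example does not reproduce the paper's simulations.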
REFERENCES
Akerlof, George A., “The Economics of ‘Tagging’ as Applied to the Optimal Income Tax,
Welfare Programs, and Manpower Planning,” American Economic Review, March 1978, 68,
8-19.
Akerlof, George A., and Rachel E. Kranton, “Economics and Identity,” Quarterly Journal of
Economics, August 2000, 115(3), 715-753.
Akerlof, George A., and Rachel E. Kranton, “Identity and Schooling: Some Lessons for the
Economics of Education,” Journal of Economic Literature, December 2002, 40, 1167-1201.
Akerlof, George A., and Rachel E. Kranton, “Identity and the Economics of Organizations,”
Journal of Economic Perspectives, 2005, 19(1), 9-32.
Auerbach, Alan J., Laurence J. Kotlikoff, and Jonathan Skinner, “The Efficiency Gains from
Dynamic Tax Reform,” International Economic Review, 1983, 24(1), 81-100.
Corneo, Giacomo, “The Efficient Side of Progressive Income Taxation,” European Economic
Review, 2002, 46, 1359-1368.
Diamond, Peter A., “Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal
Marginal Tax Rates,” American Economic Review, 1998, 88(1), 83-95.
Fair, Ray C., “The Optimal Distribution of Income,” Quarterly Journal of Economics, 1971,
85(4), 551-579.
Ghez, Gilbert R., and Gary S. Becker, The Allocation of Time and Goods over the Life Cycle,
New York: Columbia University Press, 1975.
Hausman, J., “Labor Supply,” in H. Aaron and J. Pechman, eds. How Taxes Affect Economic
Behavior, Washington: Brookings, 1981.
Heckman, James, “Shadow Prices, Market Wages and Labor Supply,” Econometrica, 1974,
42(4), 679-694.
Ireland, Norman J., “Status-seeking, Income Taxation and Efficiency,” Journal of Public
Economics, 1998, 70, 99-113.
Ireland, Norman J., “Optimal Income Tax in the Presence of Status Effects,” Journal of Public
Economics, 2001, 81, 193-212.
MaCurdy, Thomas E., “An Empirical Model of Labor Supply in a Life-Cycle Setting,” Journal
of Political Economy, 1981, 89, 1059-1085.
Mirrlees, James A., “An Exploration in the Theory of Optimum Income Taxation,” Review of
Economic Studies, April 1971, 38, 175-208.
Moffitt, Robert, “An Economic Model of Welfare Stigma,” American Economic Review,
December 1983, 73(5), 1023-1035.
Rosen, Harvey S., “Taxes in a Labor Supply Model with Joint Wage-Hours Determination,”
Econometrica, 1976, 44, 485-507.
Rosen, Harvey S., Public Finance, 6th edition, New York: McGraw-Hill, 2002.
Sadka, Efraim, “On Income Distribution, Incentive Effects and Optimal Income Taxation,”
Review of Economic Studies, 1976, 43(2), 261-267.
Saez, Emmanuel, “Optimal Income Transfer Programs: Intensive versus Extensive Labor Supply
Responses,” Quarterly Journal of Economics, August 2002, 117, 1039-1073.
Seade, Jesus K., “On the Shape of Optimal Tax Schedules,” Journal of Public Economics, 1977,
7(2), 203-235.
Stern, Nicholas H., “On the Specification of Models of Optimal Income Taxation,” Journal of
Public Economics, 1976, 6, 123-162.