
Using Elasticities to Derive Optimal Income Tax Rates

Emmanuel Saez

The Review of Economic Studies, Vol. 68, No. 1. (Jan., 2001), pp. 205-229.

Stable URL:

http://links.jstor.org/sici?sici=0034-6527%28200101%2968%3A1%3C205%3AUETDOI%3E2.0.CO%3B2-E

The Review of Economic Studies is currently published by The Review of Economic Studies Ltd.


Review of Economic Studies (2001) 68, 205-229. © 2001 The Review of Economic Studies Limited


EMMANUEL SAEZ Harvard University and NBER

First version received June 1999; final version accepted May 2000 (Eds.)

This paper derives optimal income tax formulas using compensated and uncompensated elasticities of earnings with respect to tax rates. A simple formula for the high income optimal tax rate is obtained as a function of these elasticities and the thickness of the top tail of the income distribution. In the general non-linear income tax problem, this method using elasticities shows precisely how the different economic effects come into play and which are the key relevant parameters in the optimal income tax formulas of Mirrlees. The optimal non-linear tax rate formulas are expressed in terms of elasticities and the shape of the income distribution. These formulas are implemented numerically using empirical earning distributions and a range of realistic elasticity parameters.

1. INTRODUCTION

There is a controversial debate about the degree of progressivity that the income tax should have. This debate is not limited to the economic research area but also attracts much attention in the political sphere and among the public in general. At the centre of the debate lies the equity-efficiency trade-off. Progressivity allows the government to redistribute from rich to poor, but progressive taxation and high marginal tax rates have efficiency costs. High rates may affect the incentives to work and may therefore reduce the tax base, producing large deadweight losses. The modern setup for analysing the equity-efficiency tradeoff using a general nonlinear income tax was built by Mirrlees (1971). Since then, the theory of optimal income taxation based on the original Mirrlees framework has been considerably developed. The implications for policy, however, are limited for two main reasons.

First, optimal income tax schedules have few general properties: we know that optimal rates must lie between 0 and 1 and that they equal zero at the top and the bottom. These properties are of little practical relevance for tax policy. In particular, the zero marginal rate at the top is a very local result. In addition, numerical simulations show that tax schedules are very sensitive to the utility functions chosen. Second, optimal income taxation has interested mostly theorists and has not changed the way applied public finance economists think about the equity-efficiency tradeoff. Though behavioural elasticities are the key concept in applied studies, there has been no systematic attempt to derive results in optimal taxation which could be easily used in applied studies. As a result, optimal income tax theory is often ignored and tax reform discussions are centred on the concept of deadweight burden. Thus, most discussions on tax reforms focus only on the efficiency aspect of taxation and do not incorporate the equity aspect in the analysis.

This paper argues that there is a simple link between optimal tax formulas and elasticities of earnings familiar to empirical studies. It shows that using elasticities directly to derive optimal income tax rates is a useful method to obtain new results in optimal income taxation. First, a simple formula for the optimal tax rate for high incomes is derived as a function of both substitution and income effects and the thickness of the top tail of the income distribution. Second, deriving the general Mirrlees formula for optimal non-linear tax rates in terms of elasticities provides a clear understanding of the key economic effects underlying the formula. It shows that the shape of the income distribution plays a critical role in the pattern of optimal tax rates. Third, the optimal tax formulas derived using elasticities do not explicitly require the strong homogeneity assumptions about preferences usually made in the optimal income tax literature. Therefore the elasticity method might be robust to the introduction of heterogeneity in preferences. Last, because the formulas derived are closely related to empirical magnitudes, they can be easily implemented numerically using the empirical income distribution and making realistic assumptions about the elasticity parameters.

The paper is organized as follows. Section 2 reviews the main results of the optimal income tax literature. Section 3 derives a simple formula for optimal high income tax rates and relates it to empirical magnitudes. Section 4 considers the general optimal non-linear income tax problem. The formula of Mirrlees (1971) is derived directly in terms of elasticities. Section 5 presents numerical simulations of optimal tax schedules and Section 6 concludes.

2. LITERATURE REVIEW

The Mirrlees (1971) model of optimal income taxation captures the key efficiency-equity tradeoff issue of redistribution: the government has to rely on a distortionary nonlinear income tax to meet both its revenue requirements and redistribute income. General results about optimal tax schedules are fairly limited. Tuomala (1990) presents most of the formal results.

Mirrlees (1971) showed that there is no gain from having marginal tax rates above 100% because nobody will choose to have such a rate at the margin. Mirrlees (1971) also showed that optimal marginal rates cannot be negative. Seade (1982) clarified the conditions under which this result holds. The most striking and well known result is that the marginal tax rate should be zero at the top income level when the income distribution is bounded (Sadka (1976) and Seade (1977)). Numerical simulations have shown, however, that this result is very local (see Tuomala (1990)). This result is therefore of little practical interest. Mirrlees (1971) did not derive this simple result because he considered unbounded distributions of skills. He nonetheless presented precise conjectures about asymptotic optimal rates in the case of utility functions separable in consumption and labour. Yet these conjectures have remained practically unnoticed in the subsequent optimal income tax literature. This can be explained by two reasons. First, Mirrlees' conjectures depend on the unobservable distribution of skills and on abstract properties of the utility function with no obvious intuitive meaning. Second, the zero top rate result was probably considered for a long time as the definitive result because the empirical income distribution is indeed bounded. The present paper argues that in fact unbounded distributions are of much more interest than bounded distributions to address the high income optimal tax rate problem.

A symmetrical zero rate result has been obtained at the bottom. Seade (1977) showed that if everybody works (and labour supply is bounded away from zero) then the bottom rate is zero. However, if there is an atom of non-workers then the bottom tax rate is positive and numerical simulations show that, in this case, the bottom rate can be substantial (Tuomala (1990)).

A number of studies have tried to relate optimal income tax formulas to the elasticity concepts used in applied work. Using the tools of optimal commodity tax theory, Dixit and Sandmo (1977) expressed the optimal linear income tax rate in terms of elasticities. However, in the case of the non-linear income tax problem, the attempts have been much less systematic. Roberts (2000) uses a perturbation method, similar in spirit to what is done in the present paper, and obtains optimal non-linear income tax formulas expressed in terms of elasticities.¹ He also derives asymptotic formulas that are similar to the ones I obtain.² Recently, Diamond (1998) analysed the case of utility functions with no income effects and noticed that the Mirrlees formula for optimal rates is considerably simpler in that case and could be expressed in terms of the labour supply elasticity. He also obtained simple results about the asymptotic pattern of the marginal rates.³ Piketty (1997) considered the same quasi-linear utility case and derived Diamond's optimal tax formulas for the Rawlsian criterion without setting a formal programme of maximization. He considered instead small local changes in marginal rates and used directly the elasticity of labour supply to derive the behavioural effects of this small reform. My paper clarifies and generalizes this alternative method of derivation of optimal taxes. Finally, the non-linear pricing literature, which considers models that are formally very close to optimal income tax models, has developed a methodology to obtain optimal price formulas based on demand profile elasticities that is also close to the method adopted here (see Wilson (1993)).

Another strand of the public economics literature has developed similar elasticity methods to calculate the marginal costs of public funds. The main purpose of this literature was to develop tools more sophisticated than simple deadweight burden computations to evaluate the efficiency costs of different kinds of tax reforms and the optimal provision of public goods (see e.g. Ballard and Fullerton (1992) and Dahlby (1998)). I will show that the methods of this literature can be useful to derive results in optimal taxation and that, in particular, Dahlby (1998) has come close to my results for high income rates.

Starting with Mirrlees (1971), considerable effort has gone into simulations of optimal tax schedules. Following Stern (1976), attention has been paid to a careful calibration of the elasticity of labour supply. Most simulation results are surveyed in Tuomala (1990). It has been noticed that the level of inequality of the distribution of skills and the elasticities of labour supply significantly affect optimal schedules. Most simulations use a log-normal distribution of skills which matches roughly the single-moded empirical distribution but also has an unrealistically thin top tail which leads to marginal rates converging to zero. Nobody has tried to use empirical distributions of income to perform simulations because the link between skills and realized incomes was never investigated in depth.⁴ The present study pays careful attention to this issue and presents simulations based on the empirical earnings distribution.

3. HIGH INCOME OPTIMAL TAX RATES

I show in this section that the classic method of the optimal linear income tax literature can be used to derive in a simple way an optimal tax rate formula for high income earners.

1. Revesz (1989) also attempted to express the Mirrlees formulas in terms of elasticities.

2. The link between Roberts (2000) and the present analysis is discussed in detail in Section 4.

3. Some of these results had been obtained by Atkinson (1990) in a more specialized situation.

4. However, Kanbur and Tuomala (1994) realized that it is important to distinguish between the skill distribution and the income distribution when calibrating the distribution of skills. Their work improved significantly upon previous simulations. I come back to their contribution in Section 5.


I will consider that the government sets a flat marginal rate \(\tau\) above a given (high) income level \(\bar z\) and then derive the welfare and tax revenue effects of a small increase in \(\tau\) using elasticities. The optimal tax rate \(\tau\) is obtained when a small change in the tax rate has no first-order effects on total social welfare.⁵

3.1. Elasticity concepts

I consider a standard two good model. Each taxpayer maximizes a well-behaved individual utility function \(u = u(c, z)\) which depends positively on consumption \(c\) and negatively on earnings \(z\). Individual skills or ability are embodied in the individual utility function. Assume that the individual faces a linear budget constraint \(c = z(1 - \tau) + R\), where \(\tau\) is the marginal tax rate and \(R\) is virtual (non-labour) income. The first-order condition of the individual maximization programme, \((1 - \tau)u_c + u_z = 0\), defines implicitly a Marshallian (uncompensated) earnings supply function \(z = z(1 - \tau, R)\). The uncompensated elasticity is defined such that

\[ \zeta^u = \frac{1-\tau}{z}\,\frac{\partial z}{\partial(1-\tau)}. \tag{1} \]

Income effects are captured by the parameter

\[ \eta = (1-\tau)\,\frac{\partial z}{\partial R}. \tag{2} \]

The Hicksian (compensated) earnings function is the earnings level which minimizes the cost \(c - z\) needed to reach a given utility level \(u\) for a given tax rate \(\tau\) and is denoted by \(z^c = z^c(1 - \tau, u)\). The compensated elasticity of earnings is defined by

\[ \zeta^c = \frac{1-\tau}{z}\,\frac{\partial z^c}{\partial(1-\tau)}\bigg|_{u}. \tag{3} \]

The two elasticity concepts and the income effects parameter are related by the Slutsky equation

\[ \zeta^c = \zeta^u - \eta. \tag{4} \]

The compensated elasticity is always non-negative and \(\eta\) is non-positive if leisure is not an inferior good, an assumption I make from now on.
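The three concepts can be checked numerically on a concrete example. The sketch below assumes the standard definitions \(\zeta^u = \frac{1-\tau}{z}\frac{\partial z}{\partial(1-\tau)}\), \(\eta = (1-\tau)\frac{\partial z}{\partial R}\), \(\zeta^c = \frac{1-\tau}{z}\frac{\partial z^c}{\partial(1-\tau)}\big|_u\) and the Slutsky relation \(\zeta^c = \zeta^u - \eta\). The utility function \(u = \log c - z^{1+K}/(1+K)\), the value of `K` and all function names are illustrative assumptions, not taken from the paper.

```python
import math

K = 2.0  # curvature of the disutility of earnings (illustrative assumption)

def bisect(f, lo=1e-9, hi=10.0, iters=200):
    """Root of a function that crosses zero once from below on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def marshallian(s, R):
    """Earnings solving s*u_c + u_z = 0 on the budget line c = s*z + R, with s = 1 - tau."""
    # Here u_c = 1/c and u_z = -z**K, so the condition reads z**K * (s*z + R) = s.
    return bisect(lambda z: z**K * (s * z + R) - s)

def hicksian(s, u_bar):
    """Earnings on the indifference curve u = u_bar where its slope equals s."""
    # On that curve c = exp(u_bar + z**(K+1)/(K+1)), and the slope is z**K * c.
    return bisect(lambda z: z**K * math.exp(u_bar + z**(K + 1) / (K + 1)) - s)

def elasticities(tau, R, h=1e-5):
    """Central finite differences for zeta_u, eta and zeta_c at (tau, R)."""
    s = 1.0 - tau
    z = marshallian(s, R)
    u_bar = math.log(s * z + R) - z**(K + 1) / (K + 1)
    dz_ds = (marshallian(s + h, R) - marshallian(s - h, R)) / (2 * h)
    dz_dR = (marshallian(s, R + h) - marshallian(s, R - h)) / (2 * h)
    dzc_ds = (hicksian(s + h, u_bar) - hicksian(s - h, u_bar)) / (2 * h)
    return s / z * dz_ds, s * dz_dR, s / z * dzc_ds

zeta_u, eta, zeta_c = elasticities(tau=0.3, R=0.0)
print("zeta_u =", zeta_u, "eta =", eta, "zeta_c =", zeta_c)
print("Slutsky gap:", zeta_c - (zeta_u - eta))
```

With log utility of consumption and no virtual income, substitution and income effects offset each other, so the uncompensated elasticity is zero while the compensated elasticity and the income effect are not. This is the kind of case where the split between the two matters for the optimal rate.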

3.2. Deriving the high income optimal tax rate

The government sets a constant linear rate \(\tau\) of taxation above a given (high) level of income \(\bar z\). I normalize without loss of generality the population with income above \(\bar z\) to one and I denote by \(h(z)\) the density of the earnings distribution at the optimum tax regime.⁶ To obtain the optimal \(\tau\), I consider a small increase \(d\tau\) in the top tax rate \(\tau\) for incomes above \(\bar z\) as depicted in Figure 1. This tax change has two effects on tax revenue. First, there is a mechanical effect, which is the change in tax revenue if there were no behavioural responses, and second, there is a reduction in tax revenue due to reduced earnings through behavioural responses. Let us examine these two effects successively.

FIGURE 1

High income tax rate perturbation (horizontal axis: before-tax income \(z\); the plot shows the before-reform and after-reform schedules)

5. Dahlby (1998) considered piecewise linear tax schedules and used the same kind of methodology to compute the effects of a general tax rate reform on taxes paid by a "representative" individual in each tax bracket. By specializing his results to a reform affecting only the tax rate of the top bracket, he derived a formula for the tax rate maximizing taxes paid by the "representative" individual of the top bracket that is a special case of the one obtained here.

6. Note that the density \(h(z)\) is endogenous to the tax schedule. I come back to this in detail later on.

Mechanical effect. The mechanical effect (denoted by \(M\)) represents the increase in tax receipts if there were no behavioural responses. A taxpayer with income \(z\) (above \(\bar z\)) would pay \((z - \bar z)\,d\tau\) additional taxes. Therefore, summing over the population above \(\bar z\) and denoting the mean of incomes above \(\bar z\) by \(z_m\), the total mechanical effect \(M\) is equal to

\[ M = [z_m - \bar z]\,d\tau. \tag{5} \]

Behavioural responses. As shown in Figure 1, the tax change can be decomposed into two parts: first, an overall uncompensated increase \(d\tau\) in marginal rates (starting from 0 and not just from \(\bar z\)); second, an overall increase in virtual income \(dR = \bar z\,d\tau\). Therefore, an individual with income \(z\) changes his or her earnings by

\[ dz = -\frac{\partial z}{\partial(1-\tau)}\,d\tau + \frac{\partial z}{\partial R}\,dR = -(\zeta^u z - \eta \bar z)\,\frac{d\tau}{1-\tau}, \tag{6} \]

where we have used definitions (1) and (2). The reduction in income \(dz\) displayed in equation (6) implies a reduction in tax receipts equal to \(\tau\,dz\). The total reduction in tax receipts due to the behavioural responses is simply the sum of the terms \(\tau\,dz\) over all individuals earning more than \(\bar z\),

\[ B = -(\bar\zeta^u z_m - \bar\eta\,\bar z)\,\frac{\tau\,d\tau}{1-\tau}, \tag{7} \]

where \(\bar\zeta^u = \int_{\bar z}^{\infty} \zeta^u_{(z)}\, z\, h(z)\,dz / z_m\) is a weighted average of the uncompensated elasticity. The elasticity term \(\zeta^u_{(z)}\) inside the integral is the average elasticity over individuals earning income \(z\). Similarly, \(\bar\eta = \int_{\bar z}^{\infty} \eta_{(z)}\, h(z)\,dz\) is the average income effect. Note that \(\bar\eta\) and \(\bar\zeta^u\) are not averaged with the same weights. It is not necessary to assume that people earning the same income have the same elasticity; the relevant parameters are simply the average elasticities at given income levels.⁷

In order to obtain the optimal tax rate, we must equalize the revenue effect obtained by summing (5) and (7) to the welfare effect due to the small tax reform. To obtain the welfare effect, let us consider \(g\) which is the ratio of social marginal utility for the top bracket taxpayers to the marginal value of public funds for the government. In other words, \(g\) is defined such that the government is indifferent between \(g\) more dollars of public funds and one more dollar consumed by the taxpayers with income above \(\bar z\). The smaller \(g\), the less the government values marginal consumption of high incomes. Thus \(g\) is a parameter reflecting the redistributive goals of the government.

To compute the welfare effects, let us denote by \(u((1 - \tau)z(1 - \tau, R) + R,\ z(1 - \tau, R))\) the individual utility at the optimum labour supply choice for a top bracket taxpayer. Using the envelope theorem, the effect of the small tax change on \(u\) is \(du = u_c(-z\,d\tau + dR) = -u_c(z - \bar z)\,d\tau\), where \((z - \bar z)\,d\tau\) is the mechanical increase in individual tax. As a result and by definition of \(g\), each additional dollar raised by the government because of the tax reform reduces on average social welfare of people in the top bracket by \(g\). Thus the total welfare loss due to the tax reform is equal to \(gM\). Consequently, the government sets the rate \(\tau\) such that \((1 - g)M + B = 0\). Thus, using (5) and (7), the optimal rate is such that

\[ \frac{\tau}{1-\tau} = \frac{(1-g)(z_m/\bar z - 1)}{\bar\zeta^u\, z_m/\bar z - \bar\eta}. \tag{8} \]

Equation (8) gives a strikingly simple answer to the problem of the optimal marginal rate for high income earners. Note that this formula does not require identical elasticities among taxpayers and thus applies to populations with heterogeneous preferences or elasticities. The only relevant behavioural parameters are the average elasticity \(\bar\zeta^u\) and average income effects \(\bar\eta\) for taxpayers with income above \(\bar z\). Unsurprisingly, the optimal rate \(\tau\) is a decreasing function of the social weight \(g\) put on high income taxpayers, the average elasticity \(\bar\zeta^u\), and the absolute size of income effects \(-\bar\eta\). Interestingly, the optimal rate is an increasing function of \(z_m/\bar z\). The ratio \(z_m/\bar z\) is a key parameter for the high income optimal tax problem. This parameter depends on the shape of the income distribution and has not been studied in the optimal tax literature.

If the distribution of income is bounded, then, when \(\bar z\) is close to the top, the ratio \(z_m/\bar z\) tends to one and thus, from (8), we deduce that the top rate must be equal to zero. This is the classical zero top rate result derived by Sadka (1976) and Seade (1977). The intuition for the result is straightforward. As can be seen comparing (5) and (7), close to the top, the mechanical increase in tax revenue \(M\) is negligible relative to the loss in tax revenue \(B\) due to the behavioural response, implying that the optimal rate must be close to zero.

7. Note that, in deriving (7), I have implicitly assumed that the set of taxpayers who might jump discontinuously because of the small tax reform is negligible. This is expected to be true almost surely but constructing particular counter-examples might nonetheless be possible.
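The balance between the mechanical, behavioural and welfare effects can be illustrated with a few lines of code. The sketch below assumes that the condition \((1-g)M + B = 0\), with \(M = (z_m - \bar z)\,d\tau\) and \(B = -(\bar\zeta^u z_m - \bar\eta\,\bar z)\,\tau\,d\tau/(1-\tau)\), solves to \(\tau/(1-\tau) = (1-g)(z_m/\bar z - 1)/(\bar\zeta^u z_m/\bar z - \bar\eta)\). The function names and all parameter values are hypothetical.

```python
def top_rate(g, ratio, zeta_u, eta):
    """Optimal top rate tau implied by (1 - g) M + B = 0; ratio = z_m / z_bar."""
    x = (1.0 - g) * (ratio - 1.0) / (zeta_u * ratio - eta)  # this is tau / (1 - tau)
    return x / (1.0 + x)

def mechanical(z_m, z_bar, dtau):
    """Mechanical revenue effect M of a small increase dtau, as in (5)."""
    return (z_m - z_bar) * dtau

def behavioural(tau, z_m, z_bar, zeta_u, eta, dtau):
    """Behavioural revenue effect B of a small increase dtau, as in (7)."""
    return -(zeta_u * z_m - eta * z_bar) * tau * dtau / (1.0 - tau)

# Illustrative (hypothetical) parameter values; eta <= 0 when leisure is not inferior.
g, z_bar, z_m, zeta_u, eta, dtau = 0.25, 1.0, 2.0, 0.2, -0.3, 0.01

tau = top_rate(g, z_m / z_bar, zeta_u, eta)
balance = (1.0 - g) * mechanical(z_m, z_bar, dtau) + behavioural(tau, z_m, z_bar, zeta_u, eta, dtau)
print("optimal tau:", round(tau, 4), " (1-g)M + B:", balance)

# Bounded distribution: as z_bar approaches the top income, z_m / z_bar tends to one.
for ratio in (2.0, 1.5, 1.1, 1.01, 1.0):
    print(ratio, round(top_rate(0.0, ratio, zeta_u, eta), 4))
```

The loop shows how quickly the rate falls once the ratio approaches one, which is the zero top rate result; for ratios well above one the rate stays far from zero.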

3.3. Empirical earnings distributions and optimal top rate

To assess whether the zero top result is actually relevant, it is useful to examine the ratio \(z_m/\bar z\) using empirical earnings distributions. Figure 2 plots the values of the ratios \(z_m/\bar z\) computed using annual wage income reported on tax return data for years 1992 and 1993 in the U.S.⁸ In Figure 2, the ratios \(z_m/\bar z\) are reported as a function of \(\bar z\) for incomes between $0 and $500,000 in the left panel and for incomes between $10,000 and $30 million in the right panel (using a semi-log scale). Figure 2 shows that the ratio is strikingly stable (and around 2) over the tail of the income distribution.⁹ As discussed above, the ratio must be equal to one at the level of the highest income. However, Figure 2 shows that even at income level $30 million, the ratio is still around 2. For example, if the second top income taxpayer earns half as much as the top taxpayer then the ratio is equal to 2 at the level of the second top earner. Thus the ratio might well come to one only in the vicinity of the top income earner. Consequently, the zero top result only applies to the very highest taxpayer and is therefore of no practical interest.

FIGURE 2

Ratio of mean income above \(\bar z\) divided by \(\bar z\), \(z_m/\bar z\), years 1992 and 1993 (horizontal axes: wage income \(z\); left panel from $0 to $500K, right panel from $10K upwards on a log scale)

From $150,000 to close to the very top, the ratio \(z_m/\bar z\) is roughly constant around 2. This means that formula (8) can be applied by replacing \(z_m/\bar z\) by 2 for any \(\bar z\) above $150,000. Distributions with constant ratio \(z_m/\bar z\) are exactly Pareto distributions. Therefore, the tails of empirical earnings distributions can be remarkably well approximated by Pareto distributions.¹⁰ More precisely, a Pareto distribution with parameter \(a > 1\) is such that \(\mathrm{Prob}(\mathrm{Income} > z) = C/z^a\) for some constant \(C\). For a Pareto distribution, \(z_m/\bar z\) is constant and equal to \(a/(a - 1)\). The higher \(a\), the thinner is the tail of the income distribution. For the U.S. wage income distribution, the ratio \(z_m/\bar z\) is around 2 and thus the parameter \(a\) is approximately equal to 2.

8. The public use tax files prepared yearly by the Internal Revenue Service have been used for this exercise. These data are particularly well suited to this type of computation because they oversample high income taxpayers. As many as one third of the highest income earners in the U.S. are included in the sample. The ratios have been computed using the amounts reported on the line Wages, Salaries and tips of Form 1040. The sample has been restricted to married taxpayers only.

9. The ratio becomes noisy above $10 million because the number of taxpayers above that level is very small and crossing only one taxpayer has a non trivial discrete effect on the curves.

10. Pareto discovered this empirical regularity more than a century ago (see Pareto (1965)).
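The constancy of the ratio under a Pareto tail can be checked without any data. The sketch below builds a deterministic grid of incomes from the inverse Pareto distribution function, so no random sampling is involved, and computes the mean income above several thresholds divided by the threshold. The grid size, thresholds and function names are illustrative assumptions; the tax return data used in the paper are not reproduced here.

```python
def pareto_quantiles(a, z_min=1.0, n=200_000):
    """Deterministic Pareto(a) incomes: inverse CDF evaluated at n midpoints of (0, 1)."""
    return [z_min * (1.0 - (i + 0.5) / n) ** (-1.0 / a) for i in range(n)]

def mean_above_ratio(incomes, z_bar):
    """z_m / z_bar, where z_m is the mean of the incomes at or above z_bar."""
    tail = [z for z in incomes if z >= z_bar]
    return sum(tail) / len(tail) / z_bar

def pareto_parameter(ratio):
    """Invert ratio = a / (a - 1) to recover the Pareto parameter a."""
    return ratio / (ratio - 1.0)

incomes = pareto_quantiles(a=2.0)
for z_bar in (1.0, 2.0, 3.0):
    print("threshold", z_bar, "ratio", round(mean_above_ratio(incomes, z_bar), 3))

print("implied a for a ratio of 2:", pareto_parameter(2.0))
```

The computed ratios sit slightly below the theoretical value \(a/(a-1) = 2\) because a finite grid truncates the very top of the tail, which is the same reason the empirical ratio must eventually fall to one at the highest income.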


Assuming that the elasticity \(\bar\zeta^u\) and income effects \(\bar\eta\) converge as \(\bar z\) increases, and assuming that the ratio \(z_m/\bar z\) converges (to a limit denoted by \(a/(a - 1)\)), the optimal tax rate (8) converges. Using the Slutsky equation (4), the limiting tax rate \(\bar\tau\) can be written in terms of the limiting values of the elasticities \(\bar\zeta^u\) and \(\bar\zeta^c\) and the Pareto parameter \(a\),

\[ \bar\tau = \frac{1-g}{1-g+\bar\zeta^u+\bar\zeta^c\,(a-1)}. \tag{9} \]

In that case, the government wants to set approximately the same linear rate \(\bar\tau\) above any large income level and thus \(\bar\tau\) is indeed the optimal non-linear asymptotic rate of the Mirrlees problem.¹¹
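The algebra behind the limiting rate is short enough to spell out. The steps below are a sketch under two assumptions: that the condition \((1-g)M + B = 0\) with (5) and (7) gives \(\tau/(1-\tau) = (1-g)(z_m/\bar z - 1)/(\bar\zeta^u z_m/\bar z - \bar\eta)\), and that the Slutsky relation \(\bar\eta = \bar\zeta^u - \bar\zeta^c\) carries over to the limiting average values.

```latex
\begin{align*}
\frac{\bar\tau}{1-\bar\tau}
  &= \frac{(1-g)\left(\frac{a}{a-1}-1\right)}{\bar\zeta^u\,\frac{a}{a-1}-\bar\eta}
   = \frac{1-g}{a\,\bar\zeta^u-(a-1)\,\bar\eta}
   && \text{(multiply through by } a-1\text{)} \\
  &= \frac{1-g}{a\,\bar\zeta^u-(a-1)\,(\bar\zeta^u-\bar\zeta^c)}
   = \frac{1-g}{\bar\zeta^u+(a-1)\,\bar\zeta^c}
   && \text{(Slutsky relation)} \\[4pt]
\bar\tau
  &= \frac{1-g}{1-g+\bar\zeta^u+(a-1)\,\bar\zeta^c}.
\end{align*}
```

Setting \(g = 0\) and \(\bar\zeta^c = \bar\zeta^u = \bar\zeta\) in the last line gives \(1/(1 + a\bar\zeta)\), the revenue-maximizing rate with no income effects.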

The top rate \(\bar\tau\) depends negatively on the thinness of the top tail distribution measured by the Pareto parameter \(a\). This is an intuitive result: if the distribution is thin then raising the top rate for high income earners will raise little additional revenue. Interestingly, for a given compensated elasticity \(\bar\zeta^c\), the precise division into income effects and uncompensated rate effects matters. The higher are absolute income effects (\(-\bar\eta\)) relative to uncompensated effects (\(\bar\zeta^u\)), the higher is the asymptotic tax rate \(\bar\tau\). Put in other words, what matters most for optimal taxation is whether taxpayers continue to work when tax rates increase (without utility compensation). In particular, though \(\bar\zeta^c\) is a sufficient statistic to approximate the deadweight loss of taxation, the same value of \(\bar\zeta^c\) can lead to very different optimal tax rates.

The case \(g = 0\) corresponds to the situation where the government does not value the marginal consumption of high income earners and sets the top rate so as to extract as much tax revenue as possible from high incomes (soak the rich). Formula (9) specialized to the case \(g = 0\) is the high income tax rate maximizing tax revenue. In the case with no income effects (\(\bar\zeta^c = \bar\zeta^u\)), this "Laffer" rate is equal to \(\bar\tau = 1/(1 + a\bar\zeta)\) with \(a\) around 2 for the U.S. This formula is a simple generalization of the well known formula for the flat tax rate maximizing tax revenue, \(1/(1 + \bar\zeta)\), where \(\bar\zeta\) is the average elasticity over all taxpayers.
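The limiting rate is easy to evaluate for any combination of parameters. The sketch below assumes that formula (9) takes the form \(\bar\tau = (1-g)/(1-g+\bar\zeta^u+\bar\zeta^c(a-1))\), which is consistent with the special case \(1/(1+a\bar\zeta)\) stated in the text; the function names and the parameter values are illustrative.

```python
def asymptotic_rate(a, zeta_u, zeta_c, g=0.0):
    """Limiting top rate, assuming tau = (1-g) / (1 - g + zeta_u + zeta_c*(a-1))."""
    return (1.0 - g) / (1.0 - g + zeta_u + zeta_c * (a - 1.0))

def laffer_rate(a, zeta):
    """Revenue-maximizing top rate with no income effects: 1 / (1 + a*zeta)."""
    return 1.0 / (1.0 + a * zeta)

# Pareto parameter a = 2 and an elasticity of 0.25 with no income effects.
print(round(laffer_rate(2.0, 0.25), 3))
print(round(asymptotic_rate(2.0, 0.25, 0.25), 3))  # same rate via the general formula

# Same compensated elasticity (0.5), different split into income effects.
print(round(asymptotic_rate(2.0, 0.0, 0.5), 3))    # large income effects
print(round(asymptotic_rate(2.0, 0.5, 0.5), 3))    # no income effects

# A thinner tail (higher a) or a positive social weight g lowers the rate.
print(round(asymptotic_rate(2.5, 0.2, 0.5), 3), round(asymptotic_rate(2.0, 0.2, 0.5, g=0.25), 3))
```

The third and fourth lines show the point made in the text: with the compensated elasticity held at 0.5, the rate moves from about two thirds to one half depending only on the size of income effects.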

As the income distribution is affected by taxation, \(a\) may depend on \(\bar\tau\). I show in Section 4 that, in the Mirrlees model, the parameter \(a\) is independent of \(\bar\tau\) as long as \(\bar\tau < 1\), which implies that formula (9) can be applied using directly the empirical value of \(a\). The intuition is the following. When elasticities are constant, changing the tax rate has the same multiplicative effect on the incomes of each high income taxpayer and therefore the ratio \(z_m/\bar z\) is unchanged. Empirically, in the U.S. \(a\) does not seem to vary systematically with the level of the top rate.¹²

There is little consensus in the empirical literature on behavioural responses to taxation about the size of high income elasticities. Some studies have found estimates in excess of 1 while others have found elasticities very close to zero. Gruber and Saez (2000) summarize the empirical literature based on U.S. tax reforms and discuss the reasons for discrepancies.¹³ They find elasticity estimates around 0.25 for gross income. It is unlikely, though not impossible, that the long-term compensated elasticity is bigger than 0.5. The uncompensated elasticity is probably even smaller.

Table 1 presents optimal asymptotic rates using formula (9) for a range of realistic values for the Pareto parameter of the income distribution, \(\bar\zeta^u\) and \(\bar\zeta^c\) (the asymptotic elasticities) and \(g\). Except in the cases of high elasticities, the optimal rates are fairly high. It is important to remember, though, that these optimal rates are the optimal tax rates on income assuming that there are no other taxes distorting the leisure-consumption choice. Therefore, an optimal income tax rate \(\tau\) derived from (9) should be reduced to \((1 - t)\cdot\tau\) in the presence of a consumption tax at rate \(t\). Comparing the rows in Table 1 shows that the Pareto parameter has a big impact on the optimal rate. Pareto parameters for income distributions vary across countries; the parameter is low in the U.S. compared to most European countries or Canada. A thorough investigation of Pareto parameters across countries would be relatively simple to carry out and would provide an important piece of information for tax policy discussions. Comparing columns (2), (5) and (7) (or columns (3), (6), (8)), we see also that, at fixed compensated elasticity, the optimal rate is very sensitive to the size of income effects.

TABLE 1

Optimal tax rates for high income earners

                     Uncompensated          Uncompensated          Uncompensated
                     elasticity = 0         elasticity = 0.2       elasticity = 0.5

Compensated
elasticity           0.2    0.5    0.8      0.2    0.5    0.8      0.5    0.8
                     (1)    (2)    (3)      (4)    (5)    (6)      (7)    (8)

Panel A: Social marginal utility with infinite income g = 0
Pareto parameter
  1.5                91     80     71       77     69     63       57     53
  2                  83     67     56       71     59     50       50     43
  2.5                77     57     45       67     51     42       44     37

Panel B: Social marginal utility with infinite income g = 0.25
Pareto parameter
  1.5                88     75     65       71     63     56       50     45
  2                  80     60     48       65     52     43       43     37
  2.5                71     50     38       60     44     32       38     31

g is the ratio of social marginal utility with infinite income over marginal value of public funds. The Pareto parameter of the income distribution takes values 1.5, 2, 2.5. Optimal rates are computed according to formula (9).

11. This point is confirmed in Section 4.

12. See Saez (1999~) for an empirical examination.

13. The recent volume of Slemrod (2000) also provides a number of elasticity estimates for high incomes.

4. OPTIMAL NON-LINEAR INCOME TAX RATES

The last section considered only the problem of optimal tax rates at the high income end. In this section, I investigate the issue of optimal rates at any income level using the same elasticity method. In order to contrast my approach to the original Mirrlees approach, I first present briefly the Mirrlees (1971) model.

4.1. The Mirrlees model

In the model, all individuals have the same utility function which depends positively on consumption \(c\) and negatively on labour supply \(l\) and is denoted by \(u(c, l)\). Individuals differ only in their skill level (denoted by \(n\)) which measures their marginal productivity. Earnings are equal to \(z = nl\). The population is normalized to one and the distribution of skills is written \(F(n)\), with density \(f(n)\) and support in \([0, \infty)\). \(c_n\), \(z_n = nl_n\) and \(u_n\) denote the consumption, earnings and utility level of an individual with skill \(n\). The government cannot observe skills and thus is restricted to setting taxes as a function only of earnings, \(c = z - T(z)\). The government maximizes a social welfare function

\[ W = \int_0^{\infty} G(u_n)\, f(n)\,dn, \tag{10} \]

where \(G(\cdot)\) is an increasing and concave function of utility. The government maximizes \(W\) subject to a resource constraint and incentive compatibility constraints. The resource constraint states that total consumption is less than total earnings minus government expenditures, \(E\),

\[ \int_0^{\infty} c_n\, f(n)\,dn \le \int_0^{\infty} z_n\, f(n)\,dn - E. \tag{11} \]

I denote by \(p\) the multiplier of the budget constraint (11), which represents the marginal value of public funds. The incentive compatibility constraints state that, for each \(n\), the selected labour supply \(l_n\) maximizes utility \(u(nl - T(nl), l)\), given the tax function. The derivation of the first-order condition for optimal rates is sketched in the Appendix. Note that in the model, redistribution takes place through a guaranteed income level (equal to \(-T(0)\)) that is taxed away as earnings increase.
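The ingredients of the model can be made concrete on a small discrete example. The sketch below is deliberately restricted to a linear tax \(T(z) = \tau z - D\) with guaranteed income \(D = -T(0)\), so it illustrates the welfare function and the resource constraint rather than solving the non-linear problem. The quasi-linear utility \(u = c - l^{1+K}/(1+K)\), the choice \(G = \log\), the skill grid and all names are assumptions made for the illustration.

```python
import math

K = 2.0                                        # disutility curvature; labour elasticity is 1/K
SKILLS = [0.5 + 0.05 * i for i in range(51)]   # hypothetical skill grid with equal weights
E = 0.0                                        # government expenditure per capita

def earnings(n, tau):
    """z = n*l, where l solves the individual first-order condition n*(1 - tau) = l**K."""
    return n * (n * (1.0 - tau)) ** (1.0 / K)

def outcome(tau):
    """Guaranteed income, social welfare W and budget slack under T(z) = tau*z - D."""
    z = [earnings(n, tau) for n in SKILLS]
    demogrant = tau * sum(z) / len(z) - E      # resource constraint holds with equality
    c = [demogrant + (1.0 - tau) * zi for zi in z]
    u = [ci - (zi / n) ** (1.0 + K) / (1.0 + K) for ci, zi, n in zip(c, z, SKILLS)]
    welfare = sum(math.log(ui) for ui in u) / len(u)   # G = log is increasing and concave
    slack = sum(z) / len(z) - E - sum(c) / len(c)
    return demogrant, welfare, slack

grid = [i / 100 for i in range(100)]           # candidate rates 0.00, 0.01, ..., 0.99
best_tau = max(grid, key=lambda t: outcome(t)[1])
demogrant, welfare, slack = outcome(best_tau)
print("best linear rate on the grid:", best_tau)
print("guaranteed income:", round(demogrant, 4), "welfare:", round(welfare, 4), "slack:", slack)
```

Even in this restricted setting the welfare-maximizing rate is interior: a small tax redistributes at only second-order efficiency cost, while a rate close to one destroys the tax base.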

4.2. Optimal marginal rates

The general first-order condition Mirrlees obtained depends in a complicated way on the derivatives of the utility function \(u(c, l)\) which are not related in any obvious way to empirical magnitudes (see equation (22) in Appendix). Moreover, it is derived using powerful but blind Hamiltonian optimization. Thus, the optimal taxation literature has not elucidated the key economic effects leading to the optimal formula. In this subsection, I derive a formula for optimal tax rates using elasticities of earnings and show precisely the key economic effects behind the optimal tax rate formula.

4.2.1. Results and derivation. I denote by \(H(z)\) the cumulated income distribution function (the total population is normalized to one) and by \(h(z)\) the density of the income distribution at the optimum. I denote by \(g(z)\) the social marginal value of consumption for taxpayers with income \(z\), at the optimum, expressed in terms of the value of public funds.¹⁴ It is again important to keep in mind that both \(h(z)\) and \(g(z)\) are endogenous to the tax schedule. I first present a simple preliminary result that is also useful to understand the relation between the income distribution and the distribution of skills in the Mirrlees economy.

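Lemma 1. For any regular tax schedule T, not necessarily optimal, the earnings function z_n is non-decreasing and satisfies the following differential equation,

$$\frac{\dot z_n}{z_n} = \frac{1+\zeta^u_{(n)}}{n} - \dot z_n\,\frac{T''(z_n)}{1-T'(z_n)}\,\zeta^c_{(n)}. \tag{12}$$

If equation (12) leads to ż_n < 0 then z_n is discontinuous and (12) does not hold.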

The proof, which is routine algebra, is presented in the Appendix. In the case of a linear tax (T'' = 0) the earnings equation (12) simplifies to dz/z = (1 + ζ^u)dn/n. In the general case, a correction term in T'' which represents the effect of the change in marginal rates is present. By definition, the income density and the skill density are related through the equation h(z)ż = f(n). Consequently, for a given skill distribution and using Lemma 1, we see that a non-linear tax schedule produces a local deformation of the income distribution density h(z).

14. This is G'(u)u_c/p using the notation of the Mirrlees model.

In order to simplify the presentation of optimal tax rate formulas, I introduce h*(z), which is the density of incomes that would take place at z if the tax schedule T(.) were replaced by the linear tax schedule tangent to T(.) at level z.15 I call the density h*(z) the virtual density. Applying Lemma 1 to the linearized schedule, we have ż*/z = (1 + ζ^u)/n where ż* is the derivative of earnings with respect to n when the linearized schedule is in place. By definition, we also have h*(z)ż* = f(n). Thus h and h* are related through the following equation

$$\frac{h^*(z)}{1-T'(z)} = \frac{h(z)}{1-T'(z)+\zeta^c z\,T''(z)}. \tag{13}$$

Of course, the virtual density h* is not identical to the actual density h. However, because the density h at the optimum tax schedule is endogenous (changes in the tax schedule affect the income distribution through behavioural responses), there is very little inconvenience in using h* rather than h. Using h* is a way to get rid of the deformation component induced by the non-linearity in the tax schedule. In that sense and as evidenced by Lemma 1, h* is more closely related than h to the underlying skill distribution which represents intrinsic inequality.

The following proposition presents the optimal tax formula expressed in terms of the behavioural elasticities (same notation as in the previous section) and the shape of the income distribution using the concept of virtual density h*.16

Proposition 1. The first-order condition for the optimal tax rate at income level z* can be written as follows,

$$\frac{T'(z^*)}{1-T'(z^*)} = \frac{1}{\zeta^c_{(z^*)}}\left(\frac{1-H(z^*)}{z^* h^*(z^*)}\right)\int_{z^*}^{\infty}(1-g(z))\exp\left[\int_{z^*}^{z}\left(1-\frac{\zeta^u_{(z')}}{\zeta^c_{(z')}}\right)\frac{dz'}{z'}\right]\frac{h(z)}{1-H(z^*)}\,dz. \tag{14}$$

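Alternatively, using the notation of the Mirrlees model, this equation can be rewritten as,

$$\frac{T'(z_n)}{1-T'(z_n)} = A(n)B(n), \tag{15}$$

where

$$A(n) = \left(\frac{1+\zeta^u_{(n)}}{\zeta^c_{(n)}}\right)\left(\frac{1-F(n)}{n f(n)}\right), \tag{16}$$

$$B(n) = \int_n^\infty \left[1-\frac{G'(u_m)u_c^{(m)}}{p}\right]\exp\left[\int_n^m\left(1-\frac{\zeta^u_{(s)}}{\zeta^c_{(s)}}\right)\frac{dz_s}{z_s}\right]\frac{f(m)}{1-F(n)}\,dm. \tag{17}$$

In equations (16) and (17), subscripts or superscripts (n) mean that the parameter is computed at the skill level n.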

15. This linearized tax schedule is characterized by rate τ = T'(z) and virtual income R = z - T(z) - z(1 - τ).

16. The proof of the proposition makes clear why introducing h* is a useful simplification.


Obtaining (15) in the context of the Mirrlees model is possible using the Mirrlees first-order condition. This derivation is presented in the Appendix.17 This rearrangement of terms of the Mirrlees formula is a generalization of the one developed by Diamond (1998) in the case of quasi-linear utility functions. This method, however, does not show the economic effects which lead to formula (14). Formula (14) can, however, be fruitfully derived directly in terms of elasticities using the same method as in Section 3. The formula is discussed in the light of this direct derivation just after the proof.

Direct proof of Proposition 1. I consider the effect of the following small tax reform perturbation around the optimal tax schedule. As depicted in Figure 3, marginal rates are increased by an amount dτ for incomes between z* and z* + dz*. I also assume that dτ is second order compared to dz* so that bunching (and inversely gaps in the income distribution) around z* or z* + dz* induced by the discontinuous change in marginal rates is negligible. This tax reform has three effects on tax receipts: a mechanical effect, an elasticity effect for taxpayers with income between z* and z* + dz*, and an income effect for taxpayers with income above z*.

FIGURE 3

Local marginal tax rate perturbation (horizontal axis: before-tax income z, with z* and z* + dz* marked; solid line: before-reform schedule; dashed line: after-reform schedule; substitution effect and income effect indicated)

Mechanical effect net of welfare loss. As shown in Figure 3, every taxpayer with income z above z* pays dτdz* additional taxes, which are valued (1 - g(z))dτdz* by the government. Therefore the overall mechanical effect M net of welfare loss is equal to18

$$M = d\tau\,dz^*\int_{z^*}^{\infty}(1-g(z))\,h(z)\,dz.$$

17. Revesz (1989) has also attempted to express the optimal non-linear tax formula of Mirrlees in terms of elasticities. His derivation is similar in spirit to the one presented in the Appendix.

Elasticity effect. The increase dτ for a taxpayer with income z between z* and z* + dz* has an elasticity effect which produces a small change in income (denoted by dz). This change is the consequence of two effects. First, there is a direct compensated effect due to the exogenous increase dτ. The compensated elasticity is the relevant one here because the change dτ takes place at level z* just below z. Second, there is an indirect effect due to the shift of the taxpayer on the tax schedule by dz which induces an endogenous additional change in marginal rates equal to dT' = T''dz. Therefore, the behavioural equation can be written as follows

$$dz = -\zeta^c z^*\,\frac{d\tau + dT'}{1-T'},$$

which implies

$$dz = -\zeta^c z^*\,\frac{d\tau}{1-T'+\zeta^c z^* T''}.$$

It is easy to see that 1 - T' + ζ^c z*T'' > 0 if and only if the curvature of the indifference curve at the individual optimum bundle is larger than the curvature of the schedule z - T(z), or equivalently, if and only if, the individual second-order condition is strictly satisfied. Mirrlees (1971) showed that bunching of types occurs when this condition fails. I assume here that 1 - T' + ζ^c z*T'' > 0. Note that this condition is always satisfied at points where T''(z*) ≥ 0.

Introducing the virtual density h*(z*) and using equation (13), the overall effect on tax receipts (denoted by E) can be simply written as

$$E = -\zeta^c_{(z^*)}\,z^*\,\frac{T'}{1-T'}\,h^*(z^*)\,d\tau\,dz^*,$$

where ζ^c_(z*) is the compensated elasticity at income level z*. The virtual density h* is useful because it removes the complication due to the endogenous change in marginal rate dT' = T''dz. In other words, one can derive the above expression for E without taking into account the endogenous change in marginal rates by just replacing h by h*.

Income effect. A taxpayer with income z above z* pays -dR = dτdz* additional taxes. So, taxpayers above the small band [z*, z* + dz*] are induced to work more through income effects which reinforce the mechanical effect. The income response dz is again due to two effects. First, there is the direct income effect (equal to η dR/(1 - T')). Second, there is an indirect elastic effect due to the change in marginal rates dT' = T''dz induced by the shift dz along the tax schedule. Therefore

$$dz = -\zeta^c z\,\frac{T''dz}{1-T'} - \eta\,\frac{d\tau\,dz^*}{1-T'},$$

which implies

$$dz = -\eta\,\frac{d\tau\,dz^*}{1-T'+z\zeta^c T''}. \tag{18}$$

18. The tax reform has also an effect on h(z) but this is a second order effect in the computation of M.

Introducing again the virtual density h*(z) to get rid of the endogenous rate change component and summing (18) over all taxpayers with income larger than z*, I obtain the total tax revenue effect due to income effect responses

$$I = d\tau\,dz^*\int_{z^*}^{\infty}-\eta_{(z)}\,\frac{T'}{1-T'}\,h^*(z)\,dz.$$

As in Section 3, in deriving E and I, I have implicitly assumed that the set of taxpayers who might jump discontinuously because of the small tax reform is negligible. This amounts to assuming that only local incentive constraints bind at the optimum. Mirrlees (1971) proved that, assuming the single-crossing property holds, this is always the case except at bunching points.

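Any small tax reform around the optimum schedule has no first-order effect on welfare. Thus the sum of the three effects M, E and I must be zero which implies

$$\frac{T'}{1-T'} = \frac{1}{\zeta^c}\left(\frac{1-H(z^*)}{z^*h^*(z^*)}\right)\left[\int_{z^*}^\infty (1-g(z))\,\frac{h(z)}{1-H(z^*)}\,dz + \int_{z^*}^\infty -\eta\,\frac{T'}{1-T'}\,\frac{h^*(z)}{1-H(z^*)}\,dz\right]. \tag{19}$$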

Equation (19) can be considered as a first-order linear differential equation and can be integrated (see Appendix) using the standard method to obtain equation (14) of the proposition. Changing variables from z* to n, and using the fact that, by Lemma 1, z*h*(z*)(1 + ζ^u) = nf(n), it is straightforward to obtain equation (15) of Proposition 1. When changing variables from z* to n, an additional term 1 + ζ^u appears on the right-hand side to form the term A(n) of equation (15). This counterintuitive term (higher uncompensated elasticity should not lead to higher marginal rates) should be incorporated into the skill distribution ratio (1 - F)/(nf) to lead to the income distribution ratio (1 - H)/(z*h*). Expressing optimal tax formulas in terms of the skill distribution instead of the income distribution can thus be misleading.
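The integration step can be sketched as follows. This is a reconstruction, not the Appendix argument itself, and it rests on two assumptions that are not spelled out in this subsection: the Slutsky relation ζ^u = ζ^c + η, and the boundary condition that S(z)E(z) vanishes as z tends to infinity. The symbols D, K, S, a and E are auxiliary names introduced only for this sketch.

```latex
\begin{align*}
D(z) &= \int_z^\infty (1-g(s))\,h(s)\,ds, \qquad
K(z) = \int_z^\infty -\eta_{(s)}\,\frac{T'(s)}{1-T'(s)}\,h^*(s)\,ds, \qquad
S(z) = D(z) + K(z). \\
\text{Equation (19) reads}\quad
\frac{T'(z)}{1-T'(z)}\,h^*(z) &= \frac{S(z)}{\zeta^c_{(z)}\,z}. \\
\text{Hence}\quad
S'(z) &= -(1-g(z))\,h(z) + \eta_{(z)}\,\frac{T'(z)}{1-T'(z)}\,h^*(z)
       = -(1-g(z))\,h(z) - a(z)\,S(z), \\
a(z) &= \left(1-\frac{\zeta^u_{(z)}}{\zeta^c_{(z)}}\right)\frac{1}{z}
\quad\text{(using } \eta = \zeta^u - \zeta^c\text{)}. \\
\text{With}\quad E(z) &= \exp\left[\int_{z^*}^{z} a(z')\,dz'\right]:
\qquad \frac{d}{dz}\big(S(z)E(z)\big) = -(1-g(z))\,h(z)\,E(z). \\
\text{Integrating from } z^* \text{ to } \infty:\quad
S(z^*) &= \int_{z^*}^{\infty}(1-g(z))\,
\exp\left[\int_{z^*}^{z}\left(1-\frac{\zeta^u_{(z')}}{\zeta^c_{(z')}}\right)\frac{dz'}{z'}\right]h(z)\,dz .
\end{align*}
```

Dividing S(z*) by ζ^c z*h*(z*) and factoring out 1 - H(z*) gives the form stated in equation (14).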

4.2.2. Interpretation and implications.

Interpretation of Proposition 1. In the light of this direct proof, let us analyse the decomposition of optimal tax rates presented in Proposition 1. Analysing equation (14), it appears that three elements determine optimal income tax rates: the shape of the income (or skill) distribution, elasticity (and income) effects, and social marginal weights.

Shape of income distribution. The shape of the income distribution affects the optimal rate at level z* mainly through the term (1 - H(z*))/(z*h(z*)). The elastic distortion at z* induced by a marginal rate increase at that level is proportional to income at that level times the number of people at that income level (z*h(z*)) while the gain in tax receipts is proportional to the number of people above z* (i.e. 1 - H(z*)). Therefore, the government should apply high marginal rates at levels where the density of taxpayers is low compared to the number of taxpayers with higher income. This is obviously the case at the bottom of the income distribution because z*h(z*) is close to zero while 1 - H(z*) is close to one. At the top, for a Pareto distribution with parameter a, the ratio (1 - H)/(z*h) is constant and equal to 1/a. From the evidence displayed in Section 3, we expect the ratio to converge to a constant close to 0.5 (remember that a is around 2) for large z*. Figure 4 presents the graphs of the ratio (1 - H(z))/(zh(z)) for years 1992 and 1993 as a function of z. These graphs are based on the same data and samples as the graphs of Figure 2. The ratios are U-shaped. The hazard ratio is very high for low incomes; it decreases until income level $80,000 and then increases until $200,000. Above $200,000, the ratio is indeed approximately constant, around 0.5, showing that the Pareto approximation is adequate. The fact that the ratio increases from $80,000 to $200,000 suggests that, with constant elasticities, optimal rates should be increasing in that range.

FIGURE 4

Hazard ratio (1 - H(z))/(zh(z)), years 1992 and 1993 (horizontal axis: wage income z)

Of course, the ratio (1 - H)/(z*h) is endogenous (because of behavioural responses, changing the tax schedule may change the income distribution). Nevertheless, directly using the income distribution allows a better understanding of the optimal tax rate formula. In the numerical simulations presented in the following section, the endogeneity issue is solved by estimating an exogenous skill distribution based on the actual income distribution.
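As a rough illustration of the ratio discussed above, the following Python sketch evaluates (1 - H(z))/(zh(z)) for a Pareto distribution and for a log-normal distribution. The function names, the Pareto scale z_min and the log-normal parameters mu and sigma are illustrative assumptions; none of them is taken from the tax return data.

```python
import math

def pareto_hazard_ratio(z, a, z_min=1.0):
    """(1 - H(z)) / (z h(z)) for a Pareto law with H(z) = 1 - (z_min / z)**a."""
    survival = (z_min / z) ** a
    density = a * z_min ** a / z ** (a + 1)
    return survival / (z * density)

def lognormal_hazard_ratio(z, mu=0.0, sigma=1.0):
    """(1 - H(z)) / (z h(z)) for a log-normal law (mu and sigma are illustrative)."""
    x = (math.log(z) - mu) / sigma
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-0.5 * x * x) / (z * sigma * math.sqrt(2.0 * math.pi))
    return survival / (z * density)

# Evaluate both ratios at a few income levels (arbitrary units).
income_levels = (2.0, 10.0, 100.0)
pareto_values = [pareto_hazard_ratio(z, a=2.0) for z in income_levels]
lognormal_values = [lognormal_hazard_ratio(z) for z in income_levels]
```

Under these assumptions the Pareto ratio stays at 1/a = 0.5 at every income level, while the log-normal ratio falls as income rises.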

Substitution and income effects. Behavioural effects enter the formula for optimal rates in two ways. First, increasing marginal rates at level z* induces a compensated response from taxpayers earning z*. Therefore, ζ^c_(z*) enters negatively the optimal tax rate at income level z*. Second, this marginal rate change increases the tax burden of all taxpayers with income above z*. This effect induces these taxpayers to work more through income effects which is good for tax receipts. Therefore, this income effect leads to higher marginal rates (everything else being equal) through the exponential term in (14) which is larger than one. Note that this term is identically equal to one when there are no income effects (this case was studied by Diamond (1998)).19
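The role of the exponential term can be checked numerically in the special case of constant elasticities, where the inner integral of (14) has the closed form (z/z*)^(1 - ζ^u/ζ^c). The sketch below is only an illustration: the function name, the income levels and the elasticity values are assumptions chosen for the example.

```python
def income_effect_term(z, z_star, zeta_u, zeta_c):
    """Exponential term of formula (14) when both elasticities are constant.

    exp( integral from z_star to z of (1 - zeta_u / zeta_c) dz'/z' )
    reduces to (z / z_star) ** (1 - zeta_u / zeta_c).
    """
    return (z / z_star) ** (1.0 - zeta_u / zeta_c)

# No income effects: uncompensated and compensated elasticities coincide.
no_income_effects = income_effect_term(200_000.0, 100_000.0, zeta_u=0.25, zeta_c=0.25)

# Income effects: the uncompensated elasticity is below the compensated one.
with_income_effects = income_effect_term(200_000.0, 100_000.0, zeta_u=0.0, zeta_c=0.25)
```

In the first case the term equals one; in the second it equals z/z* = 2, so incomes further above z* receive more weight in the integral.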

Social marginal welfare weights. The social marginal weights g(z) enter the optimal tax formula through the term (1 - g(z)) inside the integral. Social marginal weights represent the relative value for the government of an additional dollar of consumption at each income level. More precisely, the government is indifferent between giving 1/g(z_1) additional dollars to a taxpayer with income z_1 or giving 1/g(z_2) dollars to a taxpayer with income z_2. These weights summarize in a transparent way the distributive objectives of the government. If the government has redistributive tastes, then these weights are decreasing in income. In that case, expression (1 - g(z)) in equation (14) is increasing in z. Therefore, taste for redistribution is unsurprisingly an element tending to make the tax schedule progressive. If the government had no redistributive goals, then it would choose the same marginal welfare weights for everybody and thus equation (14) can also be applied in the case with no redistributive concerns. The shape of the income distribution and the size of both substitution and income effects would still matter for the optimal income tax.

The original Mirrlees (1971) derivation relies heavily on the fact that there is a unidimensional skill parameter which characterizes each taxpayer. Mirrlees (1986) tried to extend the model to heterogeneous populations where individuals are characterized by a multidimensional parameter instead of a single-dimensional skill parameter. He adopted the same approach as he used in his original 1971 study and derived first-order conditions for the optimal tax schedule. However, these conditions were even more complicated than in the unidimensional case and thus it proved impossible to obtain results or interpret the first-order conditions in that general case. The direct proof using elasticities shows that it is not necessary to introduce a unidimensional exogenous skill distribution to obtain formula (14). Therefore, formula (14) might, in principle, be valid for any heterogeneous population as long as ζ^u_(z) and ζ^c_(z) are considered as average elasticities at income level z.20 It is, in fact, possible to recover formula (14) using the first-order conditions of the general multi-dimensional case derived in Mirrlees (1986). Therefore, the elasticity method could be a useful step to take to extend in a fruitful way the Mirrlees (1971) model to heterogeneous populations. One important caveat should be mentioned: formula (14) is valid only at points where the first-order condition characterizes the optimal schedule. The small literature on multi-dimensional screening models has shown that assessing whether first-order conditions characterize the optimum schedule is much more complicated in the multi-dimensional case because non-local incentive constraints are likely to bind in these problems (see the analysis of Rochet and Chone (1998)). Therefore, in the multi-dimensional case, without additional restrictive conditions, formula (14) might not be valid. The difficult analysis of the singularities in the multi-dimensional case is beyond the scope of the present paper.

19. The heuristic proof shows clearly why negative tax rates are never optimal. If the tax rate were negative in some range then increasing it a little bit in that range would decrease earnings of taxpayers in that range (because of the substitution effect) but this behavioural response would increase tax receipts because the tax rate is negative in that range. Therefore, this small tax reform would unambiguously increase welfare.

20. Equation (13) linking the virtual density h* to the actual density h can be generalized to the case of heterogeneous populations.


In any case, Proposition 1 suggests that the unidimensional skill distribution in the Mirrlees model should not be considered as a real economic element (which could be measured empirically) but rather as a simplification device to perform computations and numerical simulations. The skill distribution should simply be chosen so that the resulting income distribution is close to the empirical income distribution. This route is followed in Section 5.

Formula (14) could also be used to pursue a positive analysis of actual tax schedules. Considering the actual tax schedule T(.) and the actual income distribution H(.), and making assumptions about the patterns of elasticities ζ^u_(z) and ζ^c_(z), it is also possible to use equation (14) to infer the marginal social weights g(z). Even if the government does not explicitly maximize welfare, it may be interesting to know what implicit weights the government is using. For example, if some of the weights appear to be negative then the tax schedule is not second-best Pareto efficient.21

Links with previous studies. As discussed in Section 2, Roberts (2000) has obtained a formula equivalent to (19), also using a perturbation approach. His perturbation induces all taxpayers in a small band of income to bunch at the upper end of the band. His derivation is perhaps less transparent than the present one because it is obtained using Taylor expansions and does not decompose the tax revenue changes into income and substitution effects. Moreover, his approach relies on the assumption that there is only one type of individual at each income level as in the Mirrlees (1971) model.

The derivation presented here is also close to the demand profile approach used in the literature on optimal nonlinear pricing for a regulated monopoly (see Wilson (1993)). The nonlinear price problem is formally equivalent to the optimal income tax problem with constant welfare weights g(z). Moreover, the non-linear pricing literature generally assumes away income effects. In that particular case, the non-linear pricing literature has been able to derive optimal pricing formulas directly in terms of demand profiles and express optimal pricing formulas as a simple inverse elasticity rule that is formally equivalent to formula (14) with no income effects and constant weights g(z). In the income tax case, the demand profile elasticity becomes the elasticity of the number of taxpayers above a given income level z (i.e. 1 - H(z)) with respect to (one minus) the marginal rate at z (i.e. 1 - T'(z)).22 In the case of the income tax problem, it is more convenient to express optimal tax formulas in terms of standard labour supply elasticities rather than the "demand profile" elasticity. Nonetheless, it is perhaps surprising that the optimal income tax literature before Diamond (1998) did not consider more seriously the case with no income effect which is standard in the nonlinear pricing literature because it is very convenient to solve and analyse.

Optimal asymptotic rates. It is possible to recover the high income optimal tax formula (9) from Section 3 using equation (14) for large z*.23 With large z, g(z) tends to ḡ and the ratio (1 - H)/(z*h*) tends to 1/a when the tail is Paretian. Assuming that elasticities converge, the exponential term in (14) is approximately equal to (z/z*)^(1 - ζ^u/ζ^c) and thus the fact that h(z) is Paretian implies that the integral term in (14) tends to (1 - ḡ)a/[a - (1 - ζ^u/ζ^c)]. Putting together these results, one can obtain (9).

21. This analysis has been used frequently in the commodity taxation literature where it is known as the inverse optimum problem (see e.g. Ahmad and Stern (1984)).

22. See Saez (199927) for more details.

23. Saez (1999a) discusses this point in detail.


Diamond (1998) obtained this formula in the case with no income effects but expressed the formula in terms of the Pareto parameter of the skill distribution instead of the income distribution.24 Using Lemma 1, it can be shown that the Pareto parameter of the income distribution is equal to the Pareto parameter of the skill distribution divided by 1 + ζ^u. This shows that, as discussed in Section 3, the Pareto parameter a is independent of the limiting tax rate in the Mirrlees model. Roberts (2000) also obtained an asymptotic formula that is close to equation (9). However, the basic methodology of Section 3 is a much easier way to obtain the same optimal tax rate result for high incomes than going through the asymptotics of the general formula.
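The limits described above can be combined in a few lines of Python. The sketch is only an illustration: the function name asymptotic_rate and its arguments are hypothetical, a = 2 is the round value quoted earlier rather than a calibrated parameter, ḡ = 0 is the value that applies in the simulations of Section 5, and the second case uses a limiting uncompensated elasticity of zero.

```python
def asymptotic_rate(zeta_u, zeta_c, a, g_bar=0.0):
    """Limit of T' implied by formula (14) for a Pareto tail with parameter a.

    Uses the limits stated in the text: (1 - H)/(z* h*) -> 1/a and
    integral term -> (1 - g_bar) * a / (a - (1 - zeta_u / zeta_c)).
    """
    integral_term = (1.0 - g_bar) * a / (a - (1.0 - zeta_u / zeta_c))
    ratio = (1.0 / zeta_c) * (1.0 / a) * integral_term  # this is T' / (1 - T')
    return ratio / (1.0 + ratio)

# No income effects: uncompensated elasticity equals the compensated one.
no_income_effect_rates = [asymptotic_rate(e, e, a=2.0) for e in (0.25, 0.5)]

# Income effects with the uncompensated elasticity tending to zero at the top.
income_effect_rates = [asymptotic_rate(0.0, e, a=2.0) for e in (0.25, 0.5)]
```

With a = 2 the four values are about 0.67, 0.50, 0.80 and 0.67, close to the asymptotic rates 0.68, 0.51, 0.81 and 0.69 reported in Table 2; the small gaps presumably reflect a Pareto parameter slightly different from 2.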

5. NUMERICAL SIMULATIONS

5.1. Methodology

As we saw in the previous section, there are three key elements that determine optimal tax rates: elasticities, the shape of the income distribution, and the redistributive tastes of the government. In the simulations, careful attention is paid to the calibration of each of these parameters.

Simulations are presented using utility functions with constant compensated elasticity ζ^c. This provides a useful benchmark because the compensated elasticity is the key parameter in empirical studies. Even though there is empirical evidence showing that elasticities may be higher at the low end and the high end of the income distribution (see e.g. Blundell (1992) and Gruber and Saez (2000)), it is useful to start with the case of constant elasticities in order to see how optimal tax rates should be set in that benchmark case. It is fairly simple to adapt the simulation methodology to the case of varying elasticities.25

As we saw, for a given compensated elasticity, varying income effects affects optimal rates. As most, though not all, empirical studies find small income effects relative to substitution effects (see e.g. Blundell and MaCurdy (1999)), it is useful to consider the case with no income effects. Therefore, in the simulations, I use two types of utility functions with constant elasticities. With utility functions of Type I

$$u = \log\left(c - \frac{l^{1+k}}{1+k}\right)$$

there are no income effects. The elasticity (uncompensated and compensated) is equal to 1/k. This case was examined theoretically by Atkinson (1990) and Diamond (1998).

Type II utility functions are such that

$$u = \log(c) - \log\left(1 + \frac{l^{1+k}}{1+k}\right).$$

The compensated elasticity is equal to 1/k but there are income effects. The uncompensated elasticity ζ^u can be shown to tend to zero when n tends to infinity. Comparing the results of Type I and Type II utility functions will allow us to assess the impact of income effects on optimal schedules keeping constant substitution effects. It is important to keep in mind that the utility functions should be chosen so as to replicate the empirical elasticities and that l does not necessarily represent hours of work. As a result, Type I utility function, where l tends to infinity for large n, is clearly not realistic when l represents hours of work but is nevertheless appropriate if, as evidenced empirically, income effects are much smaller than substitution effects. As discussed in Section 3, there is controversy in the empirical literature about the size of substitution effects. I choose two values for the compensated elasticity parameters ζ^c = 0.25 and ζ^c = 0.5. These values fall within the middle range of empirical estimates.

24. That is why his table of high income optimal tax rates is not directly comparable to the results presented in Table 1. He also confused a and 1 + a when selecting examples.

25. This is attempted by Gruber and Saez (2000) in a simpler four-bracket optimal income tax setting.

I use the earnings distribution of year 1992 from tax return data to perform numerical simulations. Formula (14) cannot be directly applied using the empirical income distribution because the income distribution is affected by taxation. Therefore, it is useful to come back to the Mirrlees formulation and use an exogenous skill distribution to perform numerical simulations. The main innovation is that the skill distribution is calibrated such that, given the utility function chosen and the actual tax schedule, the resulting income distribution replicates the empirical earnings distribution. Previous simulations almost always used log-normal skill distributions which match globally unimodal empirical distributions but approximate very poorly empirical distributions at the tails (both top and bottom tails). Moreover, changing the elasticity parameter without changing the skill distribution, as usually done in numerical simulations, might be misleading because changing the elasticities modifies the resulting income distribution and thus might affect optimal rates also through this indirect effect.
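The calibration idea can be illustrated with a deliberately simplified Python sketch. It assumes that the Type I utility takes the form u = log(c - l^(1+k)/(1+k)), which yields labour supply l = (n(1 - τ))^(1/k), and it replaces the actual tax schedule with a hypothetical flat marginal rate τ so that the inversion has a closed form. The function names, the value of τ and the sample of earnings are all made up for the example.

```python
def earnings_from_skill(n, tau, k):
    """Earnings z = n * l with labour supply l = (n * (1 - tau)) ** (1 / k)."""
    return n * (n * (1.0 - tau)) ** (1.0 / k)

def skill_from_earnings(z, tau, k):
    """Invert earnings_from_skill: the skill n that produces earnings z."""
    return (z / (1.0 - tau) ** (1.0 / k)) ** (k / (1.0 + k))

observed_earnings = [10_000.0, 40_000.0, 150_000.0]  # made-up sample, not the 1992 data
k = 4.0    # implies an elasticity 1/k = 0.25
tau = 0.3  # hypothetical flat marginal rate standing in for the actual schedule

# Back out the skills that reproduce the observed earnings under (tau, k).
skills = [skill_from_earnings(z, tau, k) for z in observed_earnings]

# Feeding the skills back through the earnings function recovers the data.
recovered = [earnings_from_skill(n, tau, k) for n in skills]
```

Changing k changes the inferred skills for the same observed earnings, which is why the skill distribution has to be recalibrated whenever the elasticity parameter is changed.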

Optimal rate simulations are performed using two different social welfare criteria, Utilitarian and Rawlsian. Because for both types of utility functions, u_c → 0 as n → ∞, ḡ is always equal to zero and thus the asymptotic rates are the same with both welfare criteria. In the case of the Utilitarian criterion, social marginal weights g(z) are proportional to u_c, which is approximately decreasing at the rate 1/c. Optimal rates are computed such that the ratio of government spending E to aggregate production is equal to 0.25. The original Mirrlees (1971) method of computation is used and the details are presented in the Appendix.

5.2. Results

Optimal marginal rates are plotted on Figure 5 for yearly wage incomes between $0 and $300,000. The curves represent the optimal non-linear marginal rates and the dashed horizontal lines represent the optimal linear rates (see below). As expected, the level of the optimal rates depends on the level of elasticities and on the type of the utility function. In all four cases, however, the optimal rates are clearly U-shaped.26 Optimal rates are decreasing from $0 to $75,000 and then increase until income level $200,000. Above $200,000, the optimal rates are close to their asymptotic level. This U-shape pattern is strikingly close to many actual tax schedules. The high rates at the bottom obtained in the simulations correspond to the phasing-out of the guaranteed income level. As in actual systems, the simulations suggest that the government should apply high rates at the bottom in order to target welfare only to low incomes. In most countries, rates drop significantly once welfare programmes are phased out and tax rates are in general increasing at high income levels because most income tax systems are progressive. In the simulations presented, tax rates increase at high income levels because of the shape of the income distribution (as discussed above) and because of the redistributive tastes of the government. Note that the increasing pattern of tax rates due to the U-shape pattern of the ratio (1 - H)/(zh) cannot be obtained with a log-normal skill distribution because in that case, the ratio (1 - H)/(zh) is always decreasing. The increasing pattern of marginal rates at the high end depends of course on the assumption of constant elasticities and might be reversed if elasticities are increasing with income (Gruber and Saez (2000)).

FIGURE 5

Optimal tax simulations (four panels: Utilitarian criterion, utility type I; Utilitarian criterion, utility type II; Rawlsian criterion, utility type I; Rawlsian criterion, utility type II; horizontal axis in each panel: wage income z, from $0 to $300,000)

26. The rate at the bottom is not zero because labour supply tends to zero as the skill n tends to zero, violating one of the assumptions of Seade (1977).

As expected, the Rawlsian criterion leads to higher marginal rates. The difference in rates between the two welfare criteria is larger at low incomes and decreases smoothly toward 0 (the asymptotic rates are the same).

I have also reported in dashed lines on Figure 5 the optimal linear rates computed for the same utility functions, welfare criteria and skill distribution (the upper one corresponding to ζ^c = 0.25 and the lower one to ζ^c = 0.5). The optimal linear rates are also computed so that government spending over total earnings is equal to 0.25. Table 2 reports the optimal average marginal rates weighted by income in the non-linear case along with the optimal linear rate.27 The guaranteed consumption levels of people with skill zero (who supply zero labour and thus earn zero income) in terms of average income are also reported. As average incomes differ in the linear and non-linear cases, I also report (in parentheses) the ratio of the guaranteed income for the linear case to the guaranteed income for the non-linear case: this ratio allows a simple comparison between the absolute levels of consumption of the poorest individuals in the linear and non-linear case.

TABLE 2

Numerical simulations of optimal taxes

                                  Utilitarian criterion                           Rawlsian criterion
Compensated elasticity            0.25                    0.5                     0.25                    0.5
                                  Non-linear  Linear      Non-linear  Linear      Non-linear  Linear      Non-linear  Linear
                                  (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)

Panel A: Utility Type I (no income effects)
Optimal Average Rate              0.51        0.61        0.38        0.51        0.68        0.80        0.52        0.67
(Asymptotic Rate)                 (0.68)                  (0.51)                  (0.68)                  (0.51)
Guaranteed Income Level           0.33        0.36        0.21        0.26        0.55        0.55        0.42        0.42
(linear over non-linear level)          (1.03)                  (1.09)                  (0.92)                  (0.87)

Panel B: Utility Type II (income effects)
Optimal Average Rate              0.59        0.67        0.48        0.60        0.77        0.88        0.65        0.82
(Asymptotic Rate)                 (0.81)                  (0.69)                  (0.81)                  (0.69)
Guaranteed Income Level           0.40        0.42        0.31        0.35        0.60        0.63        0.50        0.57
(linear over non-linear level)          (1.00)                  (1.01)                  (0.92)                  (0.92)

In the non-linear case, optimal rates are averaged with income weights; asymptotic rates are reported in parentheses below average rates. The guaranteed income level is expressed in percentage of average income. The ratio of the absolute guaranteed level in the linear case over the absolute guaranteed level in the non-linear case is reported in parentheses.

27. The asymptotic rate in the non-linear case is reported in parentheses.

The average marginal rates are substantially lower in the non-linear cases than in the linear cases. The guaranteed levels of consumption are slightly higher in relative terms in the linear cases (than in the non-linear cases) but as average earnings are lower in the linear cases, the absolute levels are similar. Therefore, non-linear taxation is significantly more efficient than linear taxation at redistributing income. In particular, it is better from an efficiency point of view to have high marginal rates at the bottom (which corresponds to the phasing out of the guaranteed income level).
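The linear columns of Table 2 can be cross-checked against the government budget. With a linear tax T(z) = τz - R and spending equal to 0.25 of average earnings, budget balance requires that the guaranteed income R, as a share of average earnings, equal τ - 0.25. The short Python sketch below applies this check; the dictionary layout and function name are illustrative choices, and the budget identity is my reading of the setup rather than a formula quoted from the paper.

```python
SPENDING_SHARE = 0.25  # government spending over total earnings, as stated in the text

# (optimal linear rate, guaranteed income level) from the Linear columns of Table 2
linear_cases = {
    "Utilitarian, Type I, elasticity 0.25": (0.61, 0.36),
    "Utilitarian, Type I, elasticity 0.5": (0.51, 0.26),
    "Rawlsian, Type I, elasticity 0.25": (0.80, 0.55),
    "Rawlsian, Type I, elasticity 0.5": (0.67, 0.42),
    "Utilitarian, Type II, elasticity 0.25": (0.67, 0.42),
    "Utilitarian, Type II, elasticity 0.5": (0.60, 0.35),
    "Rawlsian, Type II, elasticity 0.25": (0.88, 0.63),
    "Rawlsian, Type II, elasticity 0.5": (0.82, 0.57),
}

def implied_guaranteed_level(rate, spending_share=SPENDING_SHARE):
    """Guaranteed income over average income implied by the linear-tax budget."""
    return rate - spending_share

# Absolute gap between the implied level and the level reported in the table.
gaps = {
    name: abs(implied_guaranteed_level(rate) - level)
    for name, (rate, level) in linear_cases.items()
}
```

All eight reported levels agree with the implied ones to the two decimals shown in the table. The same identity does not apply to the non-linear columns, where the income-weighted average marginal rate is not the average tax rate.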

Mirrlees (1971) found much smaller optimal marginal rates in the simulations he presented. Rates were slightly decreasing along the income distribution and the levels were around 20% to 30%. The smaller rates he found were the consequence of two effects. First, the utility function he chose (u = log(c) + log(1 - l)) implies high elasticities. Income effects are constant with η = -0.5 and compensated elasticities are large with ζ^c decreasing from around 1 (at the bottom decile) to 0.5 (at the top decile). These high elasticities lead to low optimal tax rates. Second, the log-normal distribution for skills implies that the hazard ratio (1 - H(z))/(zh(z)) is decreasing over the income distribution and tends to zero as income tends to infinity. This implied a decreasing pattern of optimal rates.

Subsequently, Tuomala (1990) presented simulations of optimal rates using utility functions with smaller elasticities. As in Stern (1976) for the linear tax case, Tuomala (1990) used the concept of elasticity of substitution between consumption and leisure to calibrate utility functions. This concept does not map in any simple way into the concepts of income effects and elasticities used in the present paper. Tuomala's utility function implies that the compensated elasticity is around 0.5 but income effects are unrealistically large (η = -1) implying a negative uncompensated elasticity. Unsurprisingly, he found higher tax rates but the pattern of optimal rates was still regressive, from around 60% at the bottom to around 25% at the 99th percentile because of the shape of the skill distribution. Kanbur and Tuomala (1994) noticed that it is important to calibrate the log-normal skill distribution indirectly so that the income distribution inferred from the skill distribution matches the actual distribution. They obtained optimal tax rates substantially higher than previous simulations and closer to those presented here.

6. CONCLUSION

Using elasticities to derive optimal income tax rates is a fruitful method for a number of reasons. First, it is straightforward to obtain an optimal tax formula for high incomes. The literature following Mirrlees (1971) on optimal income taxation had not been able to obtain this simple formula. Using elasticity estimates from the empirical literature, the formula for asymptotic top rates suggests that marginal rates for labour income should not be lower than 50% and may be as high as 80%. Second, the elasticity method has the advantage of showing precisely how the different economic effects come into play and which are the relevant parameters for optimal taxation. The original maximization method of Mirrlees (1971) did not allow such a simple economic interpretation. Third, because optimal tax formulas are expressed in terms of parameters that can be observed or estimated, numerical simulations can be performed and calibrated using the empirical income distribution.
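For illustration, the asymptotic formula can be evaluated for a few parameter values. The sketch assumes that the top-rate formula takes the form (1 - g)/(1 - g + zeta_u + zeta_c (a - 1)), where g is the social marginal weight on top incomes and a is the Pareto parameter of the top tail. The function name and the parameter values are illustrative choices, not estimates taken from the paper.

```python
def top_rate(g, zeta_u, zeta_c, a):
    """Asymptotic marginal rate (1 - g) / (1 - g + zeta_u + zeta_c * (a - 1)).

    g       social marginal weight on top incomes (assumed input)
    zeta_u  uncompensated elasticity
    zeta_c  compensated elasticity
    a       Pareto parameter of the top tail
    """
    return (1.0 - g) / (1.0 - g + zeta_u + zeta_c * (a - 1.0))


if __name__ == "__main__":
    # Hypothetical parameter combinations with g = 0 and a = 2.
    for zeta_u, zeta_c in [(0.2, 0.2), (0.2, 0.5), (0.5, 0.5)]:
        rate = top_rate(0.0, zeta_u, zeta_c, 2.0)
        print(f"zeta_u={zeta_u} zeta_c={zeta_c} top rate={rate:.3f}")
```

With g = 0 and a = 2 the formula reduces to 1/(1 + zeta_u + zeta_c), so elasticities of 0.5 each give a rate of exactly 50%, and smaller elasticities give higher rates.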

The analysis can be extended in several ways. First, the ratios $z_m/\bar z$ and $(1 - H(z))/(zh)$ introduced in Sections 3 and 4 are closely linked to the optimal pattern of marginal rates and can be fruitfully examined using empirical income distributions. It would be interesting to compute these ratios for other years and countries to see whether the U-shape pattern is universal or specific to the U.S. Second, the general framework under which the approach used here to derive optimal tax rates is valid still needs to be worked out precisely. Last, it might be fruitful to apply the same methodology to other tax and redistribution problems. In particular, the issue of optimal tax rates at the bottom of the income distribution deserves more attention in order to cast light on the important problem of designing income maintenance programmes.
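A minimal sketch of how the two ratios might be computed from a list of incomes follows. It assumes that z_m denotes mean income above the threshold, and it estimates the density with a crude window count. The estimator names, the window width and the synthetic Pareto grid are assumptions made for illustration; no data from the paper are used.

```python
def mean_ratio(incomes, zbar):
    """Estimate z_m / zbar: mean income above zbar, divided by zbar."""
    above = [z for z in incomes if z >= zbar]
    return (sum(above) / len(above)) / zbar


def hazard_ratio(incomes, z, rel_width=0.05):
    """Estimate (1 - H(z)) / (z h(z)) with a simple window density estimate."""
    n = len(incomes)
    lo, hi = z * (1.0 - rel_width), z * (1.0 + rel_width)
    survival = sum(1 for x in incomes if x > z) / n
    density = sum(1 for x in incomes if lo <= x < hi) / (n * (hi - lo))
    return survival / (z * density)


def pareto_grid(a, size):
    """Deterministic Pareto(a) quantiles with minimum income 1 (synthetic data)."""
    return [(1.0 - (i + 0.5) / size) ** (-1.0 / a) for i in range(size)]


if __name__ == "__main__":
    incomes = pareto_grid(a=2.0, size=100_000)
    for zbar in (2.0, 3.0, 5.0):
        print(zbar, round(mean_ratio(incomes, zbar), 3),
              round(hazard_ratio(incomes, zbar), 3))
```

For an exact Pareto distribution with parameter a, the two ratios are constant and equal to a/(a - 1) and 1/a, so the synthetic grid with a = 2 should give values close to 2 and 0.5 at every threshold. Applied to actual income data, departures from constancy are what reveal the shape discussed in the text.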

APPENDIX

Deriving the Mirrlees optimal tax formula

Each individual chooses $l$ to maximize $u(nl - T(nl), l)$, which implies $n(1 - T'(z_n))u_c + u_l = 0$. Differentiating $u_n$ with respect to $n$, we have $du/dn = -l u_l/n$. Following Mirrlees (1971), in the maximization programme of the government, $u_n$ is regarded as the state variable and $l_n$ as the control variable, while $c_n$ is determined implicitly as a function of $u_n$ and $l_n$ from the equation $u_n = u(c_n, l_n)$. The government maximizes equation (10) by choosing $l_n$ and $u_n$ subject to equation (11) and $du/dn = -l u_l/n$. Denoting by $p$ and $\phi(n)$ the corresponding multipliers, we obtain (see Mirrlees (1971), equation (33))

where $T_{nm} = \exp\left[-\int_n^m \frac{l_s u_{cl}(c_s, l_s)}{s\, u_c(c_s, l_s)}\,ds\right]$. $\psi$ is defined such that $\psi(u, l) = -l u_l(c, l)$ where $c$ is a function of $(u, l)$ such that $u = u(c, l)$. A superscript $(n)$ means that the corresponding function is estimated at $(c_n, l_n, u_n)$.

Proof of Lemma 1.

$\dot z_n / z_n = (l_n + n \dot l_n)/(n l_n)$ and $l_n = l(w_n, R_n)$ where $w_n = n(1 - T')$ is the net-of-tax wage rate and $R_n = n l_n - T(n l_n) - n l_n (1 - T')$ is the virtual income of an individual with skill $n$. I note $l(w, R)$ the uncompensated labour supply function. Therefore

$$\dot l_n = \frac{\partial l}{\partial w}\left[1 - T' - n(n \dot l_n + l_n)T''\right] + \frac{\partial l}{\partial R}(n \dot l_n + l_n)(n l_n T''),$$

and rearranging

Using the definitions (1) and (2) along with the Slutsky equation (4)


and therefore

which is exactly (12). The second-order condition for individual maximization is $\dot z_n \geq 0$. Therefore, if (12) leads to $\dot z_n < 0$, this means that $T'$ decreases too fast, producing a discontinuity in the income distribution. ||
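The steps of this proof can be written out as a worked chain. The following is a reconstruction from the definitions in the text, offered as a check on the argument and not as the paper's exact displays. It assumes that (1) and (2) define $\zeta^u = (w/l)\,\partial l/\partial w$ and $\eta = w\,\partial l/\partial R$, and that the Slutsky equation (4) reads $\zeta^c = \zeta^u - \eta$.

```latex
% Write z_n = n l_n, so \dot z_n = l_n + n \dot l_n, and w_n = n (1 - T').
\begin{align*}
\dot l_n
  &= \frac{\partial l}{\partial w}\left[1 - T' - n \dot z_n T''\right]
   + \frac{\partial l}{\partial R}\, z_n T'' \dot z_n \\
  &= \frac{l_n}{n}\,\frac{w_n}{l_n}\frac{\partial l}{\partial w}
   - \dot z_n \frac{l_n T''}{1 - T'}
     \left[\frac{w_n}{l_n}\frac{\partial l}{\partial w}
           - w_n \frac{\partial l}{\partial R}\right] \\
  &= \frac{l_n}{n}\,\zeta^u - \dot z_n \frac{l_n T''}{1 - T'}\,\zeta^c .
\end{align*}
% Dividing by l_n and adding 1/n gives the growth rate of earnings:
\begin{equation*}
\frac{\dot z_n}{z_n} = \frac{1}{n} + \frac{\dot l_n}{l_n}
  = \frac{1 + \zeta^u}{n} - \dot z_n \frac{T''}{1 - T'}\,\zeta^c .
\end{equation*}
```

The second line uses $w_n = n(1 - T')$ and $z_n = n l_n$ to turn the derivatives of $l$ into elasticities; the bracket is then $\zeta^u - \eta = \zeta^c$.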

Proof of Proposition 1

In order to rewrite equation (22) in terms of elasticities, I first derive formulas for $\zeta^u$, $\zeta^c$ and $\eta$ as a function of the utility function $u$ and its derivatives. The uncompensated labour supply $l(w, R)$ is obtained implicitly from the first-order condition of the individual maximization programme, $w u_c + u_l = 0$. Differentiating this condition with respect to $l$, $w$ and $R$ leads to

Replacing $w$ by $-u_l/u_c$, the following formulas for $\zeta^u$ and $\eta$ are obtained

and using the Slutsky equation (4)

The first-order condition of the individual $n(1 - T')u_c + u_l = 0$ implies $n + u_l/u_c = nT' = -(u_l/u_c)T'/(1 - T')$. Therefore (22) can first be rewritten as follows

The first part of (25) is equal to $A(n)$ iff $-\psi_l/u_l = (1 + \zeta^u)/\zeta^c$. $\psi$ is defined such that $\psi(u, l) = -l u_l(c, l)$ where $c$ is a function of $(u, l)$ such that $u = u(c, l)$. Therefore, using (23) and (24), simple algebra shows that $-\psi_l/u_l = (1 + \zeta^u)/\zeta^c$.
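The algebra behind this identity can be spelled out. The sketch below re-derives the elasticity formulas from the first-order condition $w u_c + u_l = 0$ with $c = wl + R$, assuming the definitions $\zeta^u = (w/l)\,\partial l/\partial w$, $\eta = w\,\partial l/\partial R$ and $\zeta^c = \zeta^u - \eta$. The symbol $\Delta$ is shorthand introduced here, and the displays may differ in form from the paper's (23) and (24).

```latex
% Total differential of w u_c(wl + R, l) + u_l(wl + R, l) = 0:
\begin{equation*}
\left[u_c + l\,(w u_{cc} + u_{cl})\right] dw
 + \left[w u_{cc} + u_{cl}\right] dR
 + \left[w^2 u_{cc} + 2 w u_{cl} + u_{ll}\right] dl = 0 .
\end{equation*}
% Replace w by -u_l/u_c and let
% \Delta = u_{ll} + (u_l/u_c)^2 u_{cc} - 2 (u_l/u_c) u_{cl}.
\begin{align*}
\zeta^u &= \frac{w}{l}\frac{\partial l}{\partial w}
         = \frac{u_l/l - (u_l/u_c)^2 u_{cc} + (u_l/u_c)\, u_{cl}}{\Delta}, \\
\eta    &= w\,\frac{\partial l}{\partial R}
         = \frac{-(u_l/u_c)^2 u_{cc} + (u_l/u_c)\, u_{cl}}{\Delta}, \\
\zeta^c &= \zeta^u - \eta = \frac{u_l/l}{\Delta} .
\end{align*}
% \psi(u, l) = -l u_l(c, l) with c = c(u, l), so \partial c / \partial l = -u_l/u_c:
\begin{align*}
\psi_l &= -u_l - l\,u_{ll} + l\,\frac{u_l}{u_c}\, u_{cl},
\qquad
-\frac{\psi_l}{u_l} = 1 + \frac{l\,u_{ll}}{u_l} - \frac{l\,u_{cl}}{u_c}, \\
\frac{1 + \zeta^u}{\zeta^c}
  &= \frac{\Delta + u_l/l - (u_l/u_c)^2 u_{cc} + (u_l/u_c)\, u_{cl}}{u_l/l}
   = \frac{l}{u_l}\left[u_{ll} - \frac{u_l}{u_c}\, u_{cl} + \frac{u_l}{l}\right]
   = -\frac{\psi_l}{u_l} .
\end{align*}
```

As a quick sanity check, for $u = \log(c) + \log(1 - l)$ these expressions give $\Delta = -2/(1 - l)^2$ and $\zeta^c = (1 - l)/(2l)$.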

The integral term of (25) is equal to $B(n)$ if it is shown that

By definition of $T_{nm}$ and expressing $u_c^{(n)}/u_c^{(m)}$ as an integral

I note $J(s) = -(du_c^{(s)}/ds + l_s u_{cl}^{(s)}/s)/u_c^{(s)}$ the expression in (26) inside the integral. Now, $u_c^{(s)} = u_c(c_s, l_s)$, therefore $du_c^{(s)}/ds = u_{cc}^{(s)} \dot c_s + u_{cl}^{(s)} \dot l_s$. From $du/dn = -l u_l/n$, I obtain $u_c^{(s)} \dot c_s + u_l^{(s)} \dot l_s = \dot u_s = -l_s u_l^{(s)}/s$. Substituting $\dot c_s$ from the latter into the former, I obtain $du_c^{(s)}/ds = -[s \dot l_s + l_s]\, u_l u_{cc}/(s u_c) + u_{cl} \dot l_s$. Substituting this expression for $du_c^{(s)}/ds$ in $J(s)$ and using again the expressions (23) and (24), we have finally

which finishes the proof. Note that on bunching intervals included in $(n, m)$, $\dot z_s = \dot c_s = 0$, $J(s) = 0$, and all the preceding equations remain true, and thus the proof goes through. ||

Derivation of the formula for optimal rates (14) from formula (19)

I note

$$K(z^*) = \int_{z^*}^{\infty} -\eta\, \frac{T'}{1 - T'}\, h^*(z)\,dz.$$

Equation (19) can be considered as a first-order differential equation in $K(z)$, $K'(z^*) = D(z^*)[C(z^*) + K(z^*)]$, where $C(z^*) = \int_{z^*}^{\infty} [1 - g(z)]h(z)\,dz$ and $D(z^*) = \eta/(z^* \zeta^c)$. Routine integration using the method of the variation of the constant, and taking into account that $K(\infty) = 0$, leads to

$$K(z^*) = -\int_{z^*}^{\infty} D(z)C(z) \exp\left[-\int_{z^*}^{z} D(z')\,dz'\right]dz.$$

Integration by parts leads to

$$K(z^*) = -C(z^*) - \int_{z^*}^{\infty} C'(z) \exp\left[-\int_{z^*}^{z} D(z')\,dz'\right]dz.$$

Differentiation of (27) leads directly to (14). ||
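The integration steps can be verified by direct differentiation. In the sketch below, $E(z^*, z)$ is shorthand introduced here for the exponential term, and the boundary term is assumed to vanish, that is, $C(z)E(z^*, z) \to 0$ as $z \to \infty$.

```latex
% Shorthand: E(z^*, z) = \exp\left[-\int_{z^*}^{z} D(z')\,dz'\right], so that
% \partial E / \partial z = -D(z) E, \partial E / \partial z^* = D(z^*) E, E(z^*, z^*) = 1.
\begin{align*}
K(z^*)  &= -\int_{z^*}^{\infty} D(z)\, C(z)\, E(z^*, z)\,dz , \\
K'(z^*) &= D(z^*) C(z^*) - D(z^*) \int_{z^*}^{\infty} D(z)\, C(z)\, E(z^*, z)\,dz
         = D(z^*)\left[C(z^*) + K(z^*)\right] , \\
K(z^*)  &= \int_{z^*}^{\infty} C(z)\,\frac{\partial E}{\partial z}(z^*, z)\,dz
         = -C(z^*) - \int_{z^*}^{\infty} C'(z)\, E(z^*, z)\,dz , \\
C(z^*) + K(z^*) &= \int_{z^*}^{\infty} [1 - g(z)]\, h(z)\, E(z^*, z)\,dz .
\end{align*}
```

The second line confirms that the stated solution satisfies the differential equation, and the last line uses $C'(z) = -[1 - g(z)]h(z)$. Since $D = \eta/(z\zeta^c)$, the exponent equals $\int_{z^*}^{z} (1 - \zeta^u/\zeta^c)\,dz'/z'$ by the Slutsky equation.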

Numerical simulations

Separability of the utility function in labour and consumption simplifies the computations. Therefore, for Type I utility, I use $u = c - l^{k+1}/(k+1)$ and $G(u) = \log(u)$ (in the utilitarian case). For Type II utilities, $u = \log(c) - \log[1 + l^{k+1}/(k+1)]$ and $G(u) = u$ (in the utilitarian case). For both types of utility functions, optimal rates are computed by solving a system of two differential equations in $u(n)$ and $p(n) = (n + u_l/u_c)/\psi_l$. The system of differential equations can be written as follows

and $du/dn = -l u_l/n$.

The system of differential equations used to solve optimal rates depends on $f(n)$ through the expression $n f'(n)/f(n)$. $f(n)$ is derived from the empirical distribution of wage income in such a way that the distribution of income $z(n) = n l(n)$ inferred from $f(n)$ with flat taxes (reproducing approximately the real tax schedule) matches the empirical distribution. I check that the optimal solutions lead to increasing earnings $z_n$, which is a necessary and sufficient condition for individual second-order conditions (Mirrlees (1971)).
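The calibration idea can be illustrated for the Type I case. The sketch below is a simplified stand-in for the procedure described above, not the paper's code: it assumes a single flat rate tau, uses the closed-form labour supply implied by $u = c - l^{k+1}/(k+1)$, and maps each observed income to the skill level that would generate it. The function names, the rate, the value of k and the income figures are all hypothetical.

```python
def earnings(n, tau, k):
    """Earnings z = n * l for u = c - l**(k+1)/(k+1) under a flat tax rate tau.

    The first-order condition n * (1 - tau) = l**k gives
    l = (n * (1 - tau)) ** (1 / k).
    """
    labour = (n * (1.0 - tau)) ** (1.0 / k)
    return n * labour


def skill_from_earnings(z, tau, k):
    """Invert earnings(n, tau, k): the skill n consistent with income z."""
    return (z / (1.0 - tau) ** (1.0 / k)) ** (k / (k + 1.0))


if __name__ == "__main__":
    tau, k = 0.3, 4.0  # hypothetical flat rate; labour supply elasticity is 1/k
    observed = [10_000.0, 25_000.0, 60_000.0, 150_000.0]  # made-up incomes
    skills = [skill_from_earnings(z, tau, k) for z in observed]
    for z, n in zip(observed, skills):
        print(f"z={z:>9.0f}  n={n:10.2f}  check={earnings(n, tau, k):>9.0f}")
```

Because earnings are strictly increasing in skill under a flat tax, the mapping is one to one, so the skill distribution recovered this way reproduces the income distribution it was built from. A density for $f(n)$ would then have to be estimated from the recovered skills, a step this sketch leaves out.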

Acknowledgements. This paper is based on Chapter 1 of my Ph.D. thesis at MIT. I thank Mark Armstrong, Peter Diamond, Esther Duflo, Roger Guesnerie, Michael Kremer, James Mirrlees, Thomas Piketty, James Poterba, Kevin Roberts, David Spector, two anonymous referees, the RES 1999 Tour participants and numerous seminar participants for very helpful comments and discussions. Financial support from the Alfred P. Sloan Foundation is thankfully acknowledged.

REFERENCES

AHMAD, E. and STERN, N. H. (1984), "The Theory of Reform and Indian Indirect Taxes", Journal of Public Economics, 25, 259-298.

ATKINSON, A. B. (1990), "Public Economics and the Economic Public", European Economic Review, 34, 225-248.

BALLARD, C. L. and FULLERTON, D. (1992), "Distortionary Taxes and the Provision of Public Goods", Journal of Economic Perspectives, 6, 117-131.

BLUNDELL, R. (1992), "Labour Supply and Taxation: A Survey", Fiscal Studies, 13, 15-40.

BLUNDELL, R. and MaCURDY, T. (1999), "Labour Supply: A Review and Alternative Approaches", in Ashenfelter, O. and Card, D. (eds.), Handbook of Labor Economics (Amsterdam: North-Holland).

DAHLBY, B. (1998), "Progressive Taxation and the Social Marginal Cost of Public Funds", Journal of Public Economics, 67, 105-122.

DIAMOND, P. (1998), "Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal Marginal Tax Rates", American Economic Review, 88, 83-95.

DIXIT, A. K. and SANDMO, A. (1977), "Some Simplified Formulae for Optimal Income Taxation", Scandinavian Journal of Economics, 79, 417-423.

GRUBER, J. and SAEZ, E. (2000), "The Elasticity of Taxable Income: Evidence and Implications" (NBER Working Paper No. 7512).

KANBUR, R. and TUOMALA, M. (1994), "Inherent Inequality and the Optimal Graduation of Marginal Tax Rates", Scandinavian Journal of Economics, 96, 275-282.

MIRRLEES, J. A. (1971), "An Exploration in the Theory of Optimum Income Taxation", Review of Economic Studies, 38, 175-208.

MIRRLEES, J. A. (1986), "The Theory of Optimal Taxation", in Arrow, K. J. and Intriligator, M. D. (eds.), Handbook of Mathematical Economics (Amsterdam: North-Holland).

PARETO, V. (1965) Ecrits sur la Courbe de la Répartition de la Richesse (Genève: Librairie Droz).

PIKETTY, T. (1997), "La Redistribution Fiscale face au Chômage", Revue Française d'Economie, 12, 157-201.

REVESZ, J. T. (1989), "The Optimal Taxation of Labour Income", Public Finance, 44, 453-475.

ROBERTS, K. (2000), "A Reconsideration of the Optimal Income Tax" in Hammond, P. J. and Myles, G. D. (eds.), Incentives and Organization: Papers in Honour of Sir James Mirrlees (Oxford: Oxford University Press).

ROCHET, J.-C. and CHONÉ, P. (1998), "Ironing, Sweeping, and Multi-dimensional Screening", Econometrica, 66, 783-826.

SADKA, E. (1976), "On Income Distribution, Incentive Effects and Optimal Income Taxation", Review of Economic Studies, 43, 261-267.

SAEZ, E. (1999a), "Using Elasticities to Derive Optimal Income Tax Rates" (Chapter 1, MIT Ph.D. Thesis).

SAEZ, E. (1999b), "A Characterization of the Income Tax Schedule Minimizing Deadweight Burden" (Chapter 2, MIT Ph.D. Thesis).

SEADE, J. K. (1977), "On the Shape of Optimal Tax Schedules", Journal of Public Economics, 7, 203-236.

SEADE, J. K. (1982), "On the Sign of the Optimum Marginal Income Tax", Review of Economic Studies, 49, 637-643.

SLEMROD, J. (2000) Does Atlas Shrug? The Economic Consequences of Taxing the Rich (Cambridge University Press).

STERN, N. H. (1976), "On the Specification of Models of Optimal Taxation", Journal of Public Economics, 6, 123-162.

TUOMALA, M. (1990) Optimal Income Tax and Redistribution (Oxford: Clarendon Press).

WILSON, R. B. (1993) Nonlinear Pricing (Oxford: Oxford University Press).
