Sufficient Statistics for Top End Wealth Inequality∗

Dan Cao†

Georgetown University

Wenlan Luo

Georgetown University

July 2016

Abstract

Are there simple and observable aggregate statistics that might help determine and forecast the degree of wealth inequality? Do they explain the differences in wealth inequality across economies and over time? To answer these questions, we build a general equilibrium, neoclassical growth model in which the stationary wealth distribution has a heavy right tail with Pareto tail index $\theta$. In the simplest cases of the model, the tail index depends on the interest rate ($r$), the growth rate ($g$), the aggregate labor income share ($EY$), and the aggregate capital to output ratio ($KY$), summarized in a formula $\theta = \hat{\theta}(r, g, EY, KY)$ (rather than the simple gap $r - g$, put forth in Thomas Piketty’s “Capital in the 21st century”). In addition, financial development, production technology, and corporate and wealth taxes affect the tail index through their effects on these sufficient statistics. When we calibrate the model to the U.S. economy, we find that the model requires significant persistent heterogeneous returns to investment in order to generate the tail index of the U.S. wealth distribution. In this calibration, earnings inequality and, to a lesser extent, initial wealth distribution have negligible effects on top end wealth inequality. The transition of the model economy after a uniform wealth destruction shock or changes in corporate tax (but not after a financial deregulation shock) produces the joint dynamics of $EY$, $KY$, and wealth inequality experienced in major developed economies after World War II. Lastly, we find empirical evidence for persistent heterogeneous returns in the PSID surveys.

Keywords: Top End Wealth Inequality; Pareto Wealth Distribution; Sufficient Statistics; Heterogeneous Returns; r-g; Corporate Tax; Financial Development

∗ For useful comments and discussions we thank Daron Acemoglu, George Akerlof, Jinhui Bai, Alberto Bisin, Mary-Ann Bronson, Paco Buera, Martin Evans, Garance Genicot, Pedro Gete, Mark Huggett, Roger Lagunoff, Ben Moll, Thomas Piketty, Vincenzo Quadrini, Martin Ravallion, Richard Rogerson, John Rust, and participants at Georgetown macroeconomic seminar. We also thank Son Le for excellent research assistance and his initial involvement in the project.

† Corresponding author. Email address: [email protected]


1 Introduction

The recent empirical work by Piketty (2014), Saez and Zucman (2016), and others has found that the wealth distribution in major developed countries is highly unequal, especially in the U.S. The distribution has thick right tails, which are well approximated by the Pareto law. Are there simple and observable aggregate statistics that might help determine and forecast the degree of wealth inequality and, relatedly, the Pareto tail index? Do they explain the differences in wealth inequality across economies and over time?

Piketty (2014) proposes a theory, formalized in Piketty and Zucman (2015), that the Pareto tail index, $\theta$, is determined by a simple sufficient statistic: the gap between the interest rate, $r$, and the growth rate of the economy, $g$, i.e.,

\[ \theta = \theta_0(r - g), \]

for some function $\theta_0(\cdot)$. While this simple theory is very appealing and makes important predictions about the future of wealth inequality in major developed countries,¹ it has come under fierce criticism, including from Acemoglu and Robinson (2015), Jones (2015), Mankiw (2015), Krusell and Smith (2015), Ray (2015), and others. One criticism that stands out is that, by focusing on $r - g$, Piketty (2014) implicitly assumes that the saving rate out of capital income is equal to 1.² In this paper, we show that when we endogenize this saving rate in a fully optimizing, general equilibrium model, the tail index is a function of the model’s parameters and observable equilibrium variables other than, and in addition to, the simple gap $r - g$.³ In the simplest cases, we have the following formula:

\[ \theta = \theta_1(r, g, EY, KY), \]

where $EY$ and $KY$ are the aggregate labor income share and capital to output ratio, respectively. In general equilibrium, the observables $EY$ and $KY$ provide information about the endogenous saving rate, as well as the implied aggregate capital accumulation and equilibrium rate of return to capital (or rates of return to capital if the rates differ across agents).

¹ For example, recent low growth in OECD countries should imply high wealth inequality in the near future.

² Admittedly, the formula in Piketty and Zucman (2015) for top end inequality involves the (exogenous) average saving rate (their equation for $\omega$ between equations (15.3) and (15.5)), despite lack of emphasis. However, this exogenous saving rate might be unobservable, unlike $r$ and $g$.

³ Our results also apply under the assumption that the saving rate is exogenously given but is constant over time. What turns out to be more important is the general equilibrium implications of this saving rate.

Returning to the question asked at the beginning of this paper, our formula for $\theta$ suggests that, although there might be useful sufficient aggregate statistics, in general, one would need more information about the economy than $r - g$ to determine and forecast top end wealth inequality. Relatedly, in general equilibrium, observable aggregates provide information about unobservable parameters that are valuable for such an enterprise.

Moreover, in the class of models that we consider, under the restrictive assumption that the agents in the model face the same rate of return to capital, i.e., the interest rate $r$, as assumed by Piketty (2014) and Piketty and Zucman (2015), the model, when calibrated to match the historical values of $EY$, $KY$, $r$, and $g$ in the U.S., has a tail index of the stationary wealth distribution that is too high. That is, top end wealth inequality is too low compared to the estimates from U.S. data. Heterogeneous rates of return to capital are indispensable in helping the model match the historical tail index.

Our model builds on a general equilibrium framework in continuous time and features idiosyncratic investment risk as in Quadrini (2000) and Angeletos (2007). In the model, the agents share the same utility function but might differ in their investment returns, which are determined by their investment productivity.⁴ For tractability, we assume that at any instant, there are at most two levels of investment returns, high or low. In the special case of homogeneous returns that we also consider, the two returns are the same. The agents are subject to idiosyncratic shocks, as in Huggett (1993) and Aiyagari (1994), with Poisson arrival rates. If an agent is hit by a shock, her return switches from high to low, or low to high. In addition, to capture earnings inequality, we allow for permanent heterogeneity in labor productivity across agents. There is a bond market that allows the agents to save or borrow, subject to a borrowing constraint. Given the idiosyncratic investment returns and the financial market structure, the agents make consumption, saving, and investment decisions to maximize their inter-temporal expected utility, taking the interest rate and the wage rate as given. We study the properties of the (recursive) competitive equilibrium of the model, in which prices are determined such that the relevant markets, the bond and labor markets, clear at all times.

⁴ Heterogeneous investment returns in our model can have several interpretations. They might arise from different degrees of financial sophistication while investing in publicly traded financial instruments, as documented in Calvet, Campbell, and Sodini (2007), from differential access to high return investments in private businesses, or from entrepreneurship as documented and modeled in Quadrini (2000) and Cagetti and De Nardi (2006). While we do not take a stand on the precise channel leading to heterogeneous returns, our model has a direct entrepreneurship interpretation.

The model delivers several sets of results. First, in the stationary balanced growth path of the model, the stationary wealth distribution has a right Pareto tail, with the tail index determined by a simple quadratic equation. Following Piketty (2014), we define top end wealth inequality as the tail index, $\theta$ (a lower tail index corresponds to more wealth inequality). Piketty (2014) and Piketty and Zucman (2015) argue that top end inequality is a function of the difference between the interest rate and the growth rate of the economy. In our second set of results, we generalize this result (or rather intuition) to our model economy with heterogeneous investment returns. We show that

\[ \theta = \theta_2(g_H - g), \]

where $g_H$ is the (endogenous) growth rate of wealth of the agent with (weakly) higher investment returns, relative to the (endogenous) growth rate of the economy. In addition, $\theta_2$ is strictly decreasing, i.e., the higher the gap $g_H - g$, the higher top end wealth inequality.

The aforementioned formula contains endogenous and potentially unobservable variables. In two special cases, we obtain sharper characterizations and more user-friendly formulae. First, in the special case of homogeneous returns, we show that

\[ \theta = \theta_{1a}(r, g, n, EY, KY), \]

where $n$ is the population growth rate. Second, under strictly heterogeneous investment returns but log utility and without population growth, we have

\[ \theta = \theta_{1b}(r, g, EY, KY). \]

Therefore, at least in these cases, top end inequality is determined by factors other than the simple difference between the interest rate and the growth rate of the economy. In addition, we show that, everything else being equal, lowering the labor income share, or the aggregate growth rate, is associated with an increase in top end wealth inequality ($\partial\theta_{1b}/\partial EY > 0$ and $\partial\theta_{1b}/\partial g > 0$), as discussed in Piketty (2014). However, an increase in the capital to output ratio corresponds to a decrease in top end wealth inequality ($\partial\theta_{1b}/\partial KY > 0$), unlike the outcome suggested by Piketty (2014).

We then calibrate the model to the U.S. economy and show that the parameters can be chosen to match important moments in the data, including the aggregate capital to output ratio, labor income share, and top end wealth inequality. In the calibration, in order to generate a Pareto tail index of 2 for top end wealth inequality, we need strictly heterogeneous returns with a difference of around 4% in un-levered returns (6% in levered returns) that lasts on average 10 years. In this calibrated model, in which top end wealth inequality is determined by heterogeneous investment returns, we also look at the importance of earnings inequality and of the wealth distribution at the beginning of the agents’ lifetime (initial wealth inequality) for top end wealth inequality. We find that varying earnings inequality and initial wealth inequality, while keeping the aggregate variables constant, does not alter top end wealth inequality as measured by the Pareto tail index or, to a lesser extent, the top wealth share statistics (top 0.1%, 1%, or 5% wealth shares).
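To give a sense of how a tail index maps into top wealth shares, the following minimal sketch assumes, purely for illustration, that the whole wealth distribution is exactly Pareto with index $\theta > 1$ (the model itself only delivers a Pareto right tail). Under that assumption the wealth share of the richest fraction $p$ of agents is $p^{1 - 1/\theta}$. The function name `top_share` is hypothetical and not taken from the paper.

```python
def top_share(p: float, theta: float) -> float:
    """Wealth share of the richest fraction p under an exact Pareto law.

    Illustrative assumption: the entire distribution is Pareto with tail
    index theta > 1, in which case the share equals p ** (1 - 1 / theta).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if theta <= 1.0:
        raise ValueError("theta must exceed 1 so that mean wealth is finite")
    return p ** (1.0 - 1.0 / theta)


if __name__ == "__main__":
    # Compare the top 0.1%, 1%, and 5% shares for several tail indices.
    for theta in (1.5, 2.0, 3.0):
        shares = [top_share(p, theta) for p in (0.001, 0.01, 0.05)]
        print(theta, [round(s, 3) for s in shares])
```

Under this stylized assumption, a lower tail index yields a larger share for every top group, which is the sense in which a lower $\theta$ corresponds to more top end wealth inequality.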

Outside the stationary balanced growth path of the calibrated model, we examine the joint dynamics of top end wealth inequality and the aggregate variables, capital to output ratio and labor income share, over transitional paths after a significant destruction of physical capital (which was experienced in major European countries during WWI and WWII) and after changes in corporate tax (which were experienced in the U.S. during and after WWII and in the 1970s). We find that the dynamics of the capital to output ratio and labor income share in the model have patterns similar to those in the data (increasing capital to output ratio and decreasing earnings to output ratio over time). In addition, top end inequality follows a U-shape over the transitional paths, as also observed in the data by Piketty (2014), though in much smaller magnitudes and at lower speeds. We also consider the transitional path after a relaxation of borrowing constraints, which corresponds to financial deregulation in the 1980s in the U.S. Although relaxing borrowing constraints increases top end wealth inequality, it leads to a lower capital to output ratio and a higher labor income share, neither of which is consistent with the evidence in U.S. data documented in Piketty and Zucman (2014).

Lastly, as the model calls for persistent heterogeneous returns to match top end wealth inequality in the data, we find some evidence for persistent differences in returns to wealth across households from the PSID surveys. For each household in the survey, we define returns as the ratio between the sum of financial income (asset income from farm and business, rent, interest, dividends, income from royalty and trust funds, excluding capital gains because of data quality) and core asset (the sum of net worth, excluding other debts, home equity, and vehicles). The resulting returns to wealth display great heterogeneity and persistence. We also find that persistent heterogeneous returns can be inferred from wealth mobility. For example, households which have experienced episodes of high returns are also more likely to show upward wealth mobility in the future.

The current paper belongs to the theoretical literature on the Pareto distribution for wealth. Early work started with Stiglitz (1969); more recently, Benhabib, Bisin, and Zhu (2011, 2015, 2016), Moll (2012), Toda (2014), Achdou et al. (2015), Nirei and Aoki (2016), and others have provided interesting and important mechanisms to generate stationary wealth distributions with Pareto tails.⁵ The Pareto tail indices in these papers are given by formulae involving unobservable parameters and variables. For example, Benhabib, Bisin, and Zhu (2011, 2015, 2016) provide formulae for the tail index that require knowledge of the entire distribution of stochastic investment returns and the entire demographic structure of the economy. In this context, the “$r - g$” theory of Piketty (2014) and Piketty and Zucman (2015) is compelling since one needs to observe only simple aggregate statistics from an economy to gauge the degree of wealth inequality in that economy. In this paper, following the intuition in Piketty (2014) and Piketty and Zucman (2015), we not only show that the wealth distribution has a right Pareto tail as found in the earlier literature, but we also attempt to go a step further by characterizing how the tail parameters vary with the underlying parameters and observable aggregate variables in the economy. In general equilibrium, observable aggregates provide information about unobservable parameters that help determine and forecast wealth inequality. This mechanism is absent in partial equilibrium papers, such as Benhabib, Bisin, and Zhu (2011, 2015, 2016).

⁵ Benhabib and Bisin (2016) provide an excellent survey of the literature.

Our paper is also related to the growing literature studying idiosyncratic investment risks. In this literature, most papers, including Angeletos (2007), Toda (2014), Benhabib, Bisin, and Zhu (2015), Piketty and Zucman (2015), and Nirei and Aoki (2016), assume I.I.D. investment risks, or, more precisely, investment returns.⁶ In this paper, we extend their analysis to allow for persistent idiosyncratic investment risks. From this perspective, our paper is closely related to Benhabib, Bisin, and Zhu (2011, 2016), Buera and Shin (2011), Moll (2012), Moll (2014), Achdou et al. (2015), and Buera and Moll (2015), who consider persistent idiosyncratic investment risks. Our paper can be seen as a simplification of these papers. We use a simple two-level idiosyncratic investment risk Poisson process, which allows us to obtain sharp analytical characterizations of top end wealth inequality and transitional paths in general equilibrium with or without production.⁷

Heterogeneous returns in our paper can be interpreted as returns to entrepreneurs versus workers, as in the literature on entrepreneurship, including Quadrini (2000), Cagetti and De Nardi (2006), and Buera (2009). In these papers, entrepreneurs have access to more productive production technologies than non-entrepreneurs. This assumption can be mapped into heterogeneous returns to investment in our model. However, unlike these papers, we do not assume fixed costs of starting up a business or decreasing returns to scale. This assumption allows us to preserve the homogeneity of the optimization problem of the agents, which simplifies the characterization of the wealth distribution and transitional paths.

⁶ Piketty and Zucman (2015) assume I.I.D. saving rates, but this assumption is isomorphic to the I.I.D. investment risks assumption. The analytical results in Nirei and Aoki (2016) rely on I.I.D. investment shocks, but the authors relax this assumption in their quantitative investigation.

⁷ Benhabib et al. (2011) and Gabaix et al. (2015, Online Appendix) provide analytical characterizations of the Pareto tail index for the wealth distribution under persistent investment shocks as we do, but only in a partial equilibrium setting in which the rates of return are exogenous. General equilibrium might lead to interesting comparative statics, such as a non-monotone effect of financial development on the tail index, as shown in Moll (2012) and in our Proposition 2.

Another strand of literature starting with Huggett (1996) attempts to explain the high degree of wealth inequality using models with idiosyncratic earnings shocks over the lifecycle of households. However, this class of models cannot match wealth inequality at the very top unless one assumes very large temporary earnings shocks as in Castaneda et al. (2003). Using a similar model, Kaymak and Poschke (2016) show that increases in earnings inequality have accounted for most of the increase in wealth inequality in the U.S. since the 1970s. In our model, in which wealth inequality is driven by persistent heterogeneous investment returns, changing earnings inequality has negligible effects on wealth inequality. Benhabib, Bisin, and Zhu (2011) and Benhabib, Bisin, and Luo (2015) also emphasize this point in a partial equilibrium context.

Over transitional paths, Gabaix et al. (2015, Online Appendix) have recently argued that heterogeneous returns give rise to fast transitional dynamics of top end wealth inequality. However, in their analysis, the authors take the returns on wealth as exogenously given. In our paper, the returns are endogenously determined by the total supply of capital and labor along transitional paths. We find that, despite significant return heterogeneity, the transition of top end wealth inequality is rather slow.

Empirically, we find some evidence for heterogeneous and persistent returns to wealth in the data from PSID surveys. Fagereng et al. (2016) have found similar evidence in Norwegian household level data. Saez and Zucman (2016) have also found that, for foundations, total rates of return, including unrealized capital gains, rise sharply with foundation wealth. Using Swedish household level data, Calvet, Campbell, and Sodini (2007) documented that more financially sophisticated households earn higher mean returns from investments in publicly traded financial instruments.

The rest of this paper is organized as follows. Section 2 presents a general model with capital accumulation, production, and labor. Section 3 presents a special case, the AK model, and results on top end inequality and transitional dynamics. In Section 4, we calibrate the main model to the U.S. economy and investigate the quantitative implications of the model. Section 5 presents empirical evidence for persistent heterogeneity in idiosyncratic investment returns and their implications for wealth mobility. Section 6 concludes.

2 A Neoclassical Economy in Continuous Time

In this section, we develop a neoclassical growth model in continuous time as presented in Acemoglu (2009) but with heterogeneous agents facing idiosyncratic death shocks and investment shocks, which determine the productivity of their capital investment. The agents can finance their projects only by issuing risk-free debt, collateralized by their capital.

2.1 The Environment

Time, $t$, is continuous and runs from 0 to $\infty$.⁸ The economy is populated by a continuum of agents that are indexed by $h \in \mathcal{N}_t = [0, N_t]$, where $N_t$ is the population size at time $t$.

Investment Productivity. Let $i_t^h$ denote the individual state of agent $h$ at time $t$. We assume that $i_t^h$ follows a two-state Markov chain, $i_t^h \in I = \{L, H\}$, which captures low ($i_t^h = L$) and high ($i_t^h = H$) investment productivity, with Poisson switching rates $\lambda_{LH}$ and $\lambda_{HL}$ from one state to the other.

By the law of large numbers, the fractions of agents with high investment returns and low investment returns are respectively

\[ M_H = \frac{\lambda_{LH}}{\lambda_{HL} + \lambda_{LH}}, \qquad M_L = \frac{\lambda_{HL}}{\lambda_{HL} + \lambda_{LH}} = 1 - M_H. \]
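The stationary fractions above can be checked numerically. The sketch below is illustrative only: the function names and the switching rates used in the example are hypothetical choices, not values from the paper. It computes $M_H$ and $M_L$ from the two Poisson rates and compares $M_H$ with the share of time a single simulated agent spends in state $H$ over a long horizon.

```python
import random


def stationary_fractions(lam_lh: float, lam_hl: float) -> tuple[float, float]:
    """Return (M_H, M_L) for switching rates lambda_LH and lambda_HL."""
    total = lam_lh + lam_hl
    return lam_lh / total, lam_hl / total


def simulated_time_share_high(lam_lh: float, lam_hl: float,
                              horizon: float = 200_000.0, seed: int = 0) -> float:
    """Fraction of time one agent spends in state H over a long horizon."""
    rng = random.Random(seed)
    state, clock, time_high = "L", 0.0, 0.0
    while clock < horizon:
        # Time until the next switch is exponential with the current exit rate.
        rate = lam_lh if state == "L" else lam_hl
        spell = min(rng.expovariate(rate), horizon - clock)
        if state == "H":
            time_high += spell
        clock += spell
        state = "H" if state == "L" else "L"
    return time_high / horizon


if __name__ == "__main__":
    m_h, m_l = stationary_fractions(0.1, 0.3)  # hypothetical rates
    print(m_h, m_l, simulated_time_share_high(0.1, 0.3))
```

With these hypothetical rates the formula gives $M_H = 0.25$, and the simulated time share should be close to that value.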

In each instant, the agents can produce output using a constant-returns-to-scale technology, which depends on the idiosyncratic state, using capital and labor:⁹

\[ Y_t = F_{i_t^h}(k_t^h, l_t^h). \]

We assume that the high type is more productive than the low type.

Assumption 1. For any $k, l > 0$,

\[ F_H(k, l) > F_L(k, l). \]

⁸ We have also studied and solved the model in discrete time. However, the solution for the equilibrium wealth distribution is much simpler in continuous time.

⁹ The production function includes depreciation, i.e., $Y_t$ is net output. The CES production function in Section 4 is an example.

While in this paper we focus on the case in which there are only two levels of investment productivity, we can easily extend the model to allow for more heterogeneity in this dimension. In addition, when agents with lower productivity actively produce in equilibrium, this model is equivalent to one in which a competitive representative firm uses the less productive production function and entrepreneurs have access to the more productive one. The macroeconomic literature on entrepreneurship (such as Quadrini (2000) and Cagetti and De Nardi (2006)) often embraces this interpretation.

Labor Productivity. Each agent is also endowed with $x_t^h$ efficiency units of labor. We assume that $x_t^h$ grows at the rate $g_x$ common across households: $x_{t+t'}^h = x_t^h e^{g_x t'}$.¹⁰ We assume that the initial labor productivity of an agent is determined when she is born. Normalized by a constant trend $G_{x,t} = e^{g_x t}$, $x_t^h / G_{x,t}$ is initially drawn from a distribution defined over $\mathbb{R}_+$ with c.d.f. $\Psi(x)$ and mean 1.¹¹

Death Shocks, Population Growth, and Redistribution. Population $N_t$ grows at a constant rate $n \geq 0$ and the initial population is normalized to 1. We also assume that agents are hit by “death shocks,” arriving at Poisson rate $\lambda > 0$. Therefore, there are $(n + \lambda) N_t \Delta t$ newborns in each infinitesimal time interval $[t, t + \Delta t]$. The investment productivity of a newborn can be $H$ or $L$, with fractions $M_H$ and $M_L$ respectively. The labor productivities of the newborns, relative to the constant trend $G_{x,t}$, are drawn from the distribution $\Psi(\cdot)$.

The total wealth of the dying agents is redistributed to the newborns according to a redistribution function

\[ \Gamma_{i,t}(\psi). \]

That is, conditional on investment productivity type $i$, each agent receives a draw of fraction $\tilde{\psi}$ from the distribution $\Gamma_{i,t}$ ($\Pr(\tilde{\psi} \geq \psi) = \Gamma_{i,t}(\psi)$). The agent then obtains a fraction $\tilde{\psi}$ of the total wealth of the dying agents per newborn. We also assume that the support of $\Gamma_{i,t}$ belongs to $\mathbb{R}_+$.

Since in each instant there is a mass $\lambda \Delta t N_t$ of dying agents and a mass $(\lambda + n) \Delta t N_t$ of newborns, and the total wealth of the dying agents is redistributed fully, we must have

\[ M_H \int \psi \, d\Gamma_{H,t}(\psi) + M_L \int \psi \, d\Gamma_{L,t}(\psi) = 1. \]

In most cases, we assume that the redistribution function preserves the total wealth of agents with the same investment productivity.

¹⁰ In Online Appendix E, we extend the model to allow for heterogeneous but deterministic growth rates of labor productivity across agents. As shown in Gabaix et al. (2015), heterogeneous growth rates of labor productivity can potentially help explain high labor income inequality and a rapid rise in labor income inequality. However, we cannot allow for idiosyncratic productivity shocks over the agents’ lifetime since the optimization problem of the agents would no longer be homogeneous in total wealth.

¹¹ The model does not exhibit the scale effect, so normalizing the mean of $\Psi$ to 1 is without loss of generality.

Assumption 2. The distribution function is type preserving, i.e.,

\[ M_i \int_{\psi} \psi \, d\Gamma_{i,t}(\psi) \, W_t = W_{i,t} \]

for $i = H, L$, where $W_t$ is aggregate financial wealth and $W_{i,t}$ is the aggregate financial wealth of all agents with investment productivity $i$ (conditional aggregate).

Death shocks and redistribution are crucial in preventing the wealth distribution from ever expanding, and they give rise to a stationary distribution.¹² The redistribution functions $\Gamma_{i,t}$ also correspond to wealth inequality at the beginning of the agents’ lifetime (initial wealth inequality). In the calibration of the model in Section 4, we use the estimates by Huggett et al. (2011) of wealth inequality at the ages 20-25 in PSID for $\Gamma_i$.
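The role of death shocks in producing a stationary Pareto tail can be seen in a deliberately stripped-down example, which is not the model of this paper. Assume, for illustration only, that every agent is born with wealth 1 (rather than receiving redistributed wealth), that wealth grows at a single constant rate `gamma` relative to the economy, that death arrives at Poisson rate `delta`, and that there is no population growth. Ages are then exponentially distributed, wealth equals $e^{\gamma \cdot \text{age}}$, and the cross-section is Pareto with tail index $\delta / \gamma$. All names and parameter values below are hypothetical.

```python
import math
import random


def simulate_cross_section(gamma: float, delta: float,
                           n_agents: int = 100_000, seed: int = 0) -> list[float]:
    """Draw stationary wealth levels in the toy economy.

    Age is exponential with rate delta; wealth starts at 1 and grows at
    rate gamma, so wealth = exp(gamma * age).
    """
    rng = random.Random(seed)
    return [math.exp(gamma * rng.expovariate(delta)) for _ in range(n_agents)]


def pareto_index_mle(wealth: list[float]) -> float:
    """Maximum-likelihood tail index for a Pareto law with minimum 1."""
    return len(wealth) / sum(math.log(w) for w in wealth)


if __name__ == "__main__":
    gamma, delta = 0.02, 0.04  # hypothetical growth gap and death rate
    sample = simulate_cross_section(gamma, delta)
    print("theory:", delta / gamma, "estimate:", pareto_index_mle(sample))
```

In this toy setting a larger growth gap or a lower death rate lowers the tail index, and as `delta` approaches zero the distribution fans out without bound, which is the sense in which death shocks keep the distribution stationary.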

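Utility, Constraints, and Optimization. The agents share the same instantaneous utility function

\[ u(c) = \begin{cases} \dfrac{c^{1-\sigma}}{1-\sigma} & \text{if } \sigma \neq 1 \\ \log(c) & \text{if } \sigma = 1. \end{cases} \]

Let $\{r_t, e_t\}_{t=0}^{\infty}$ denote the sequence of interest rates and wages. Agent $h$ chooses a sequence of capital holding $k_t^h$, bond holding $b_t^h$, and consumption $c_t^h$ to maximize

\[ \max_{c_t^h, k_t^h, b_t^h} \mathbb{E}\left[ \int_0^{\infty} \exp\left(-(\rho + \lambda)t\right) u\left(c_t^h\right) dt \right] \tag{1} \]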

subject to

\[ \frac{dw_t^h}{dt} = \max_{l_t^h} \left\{ F_{i_t^h}\left(k_t^h, l_t^h\right) - e_t l_t^h \right\} + r_t b_t^h + e_t x_t^h - c_t^h \tag{2} \]

\[ w_t^h = k_t^h + b_t^h, \tag{3} \]

where $w_t^h$ denotes the financial wealth of agent $h$ at time $t$. The budget constraint, (2), and portfolio constraint, (3), are standard. At time $t$, the change in financial wealth, $\frac{dw_t^h}{dt}$, consists of the return from capital investment, interest from bond holding, $r_t b_t^h$, and labor earnings, $e_t x_t^h$, minus consumption, $c_t^h$. Financial wealth, $w_t^h$, is allocated to capital holding $k_t^h$ and bond holding $b_t^h$. The agent is also subject to the borrowing constraint (5) defined below.

¹² See Gabaix (2009) for other types of assumptions that guarantee the existence of a stationary wealth distribution with a Pareto tail, such as a reflecting barrier.

Let $Q_t^h$ denote the present discounted value of future labor income for each agent, i.e., human wealth,

\[ Q_t^h = \int_0^{\infty} \exp\left( -\int_0^{t'} r_{t+t_1'} \, dt_1' \right) e_{t+t'} \, x_{t+t'}^h \, dt'. \]

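Since $x_t^h$ grows at the rate $g_x$, we also have $Q_t^h = q_t x_t^h$, where

\[ q_t = \int_0^{\infty} \exp\left( -\int_0^{t'} \left( r_{t+t_1'} - g_x \right) dt_1' \right) e_{t+t'} \, dt'. \]

The dynamics of $Q_t^h$ and $q_t$ are then given by

\[ \frac{dQ_t^h}{dt} = r_t Q_t^h - e_t x_t^h \]

and

\[ \frac{dq_t}{dt} = (r_t - g_x) q_t - e_t. \tag{4} \]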
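As a quick consistency check on (4), consider the hypothetical special case, used here only for illustration, in which the interest rate and the wage are constant, $r_t = r$ and $e_t = e$, with $r > g_x$ (along a balanced growth path the wage would generally grow, so this is a simplification). The integral defining $q_t$ then has a closed form, and it satisfies (4) with a zero time derivative:

```latex
\[
  q = \int_0^{\infty} \exp\left(-(r - g_x)\,t'\right) e \, dt'
    = \frac{e}{r - g_x},
  \qquad
  \frac{dq}{dt} = (r - g_x)\,\frac{e}{r - g_x} - e = 0 .
\]
```

Human wealth per efficiency unit is thus the wage capitalized at the rate $r - g_x$, and $Q_t^h = q\,x_t^h$ grows at the rate $g_x$ together with $x_t^h$.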

Given $Q_t^h$, we assume that the borrowing constraint takes the form

\[ 0 \leq k_t^h \quad \text{and} \quad 0 \leq m k_t^h + \left( b_t^h + Q_t^h \right), \tag{5} \]

where $m \in (0, 1)$. It is important that the borrowing constraint is in terms of borrowing plus the present discounted value of labor income so that the optimization problem of the agent is homogeneous, as we will see below.¹³

The definition of an equilibrium is standard.

Definition 1. For any initial distribution of wealth $\{w_0^h\}_{h \in \mathcal{N}_0}$, a competitive equilibrium is described by stochastic processes: the processes for interest rates and wage rates $\{r_t, e_t\}_{t=0}^{\infty}$, wealth distribution $\{w_t^h\}_{t=0,\, h \in \mathcal{N}_t}^{\infty}$, capital and bond holdings $\{k_t^h, b_t^h\}_{t=0,\, h \in \mathcal{N}_t}^{\infty}$, and consumption $\{c_t^h\}_{t=0,\, h \in \mathcal{N}_t}^{\infty}$, such that

(i) The allocation $\{w_t^h, k_t^h, b_t^h, c_t^h\}$ solves agent $h$’s maximization problem (1) given the processes for interest rates and wage rates.

(ii) Bond and labor markets clear in each instant:

\[ \int_{h \in \mathcal{N}_t} b_t^h \, dh = 0 \]

and

\[ \int_{h \in \mathcal{N}_t} l_t^h \, dh = \int_{h \in \mathcal{N}_t} x_t^h \, dh. \]

¹³ Angeletos (2007) makes a similar assumption on the borrowing constraint, except for $m = 0$.

Notice that by Walras’ Law, the market clearing condition in the bond market implies that the market for consumption goods also clears, i.e., total output equals total consumption plus total investment:

\[ \int_{h \in \mathcal{N}_t} F_{i_t^h}\left(k_t^h, l_t^h\right) dh = \int_{h \in \mathcal{N}_t} c_t^h \, dh + \int_{h \in \mathcal{N}_t} \frac{dw_t^h}{dt} \, dh. \]

Together with the portfolio choice constraint (3), the market clearing condition in the bond market implies market clearing in the capital market:

\[ \int_{h \in \mathcal{N}_t} k_t^h \, dh = \int_{h \in \mathcal{N}_t} w_t^h \, dh. \]

Given the homogeneous structure of the model, in the following subsection, we show that a competitive equilibrium has a simpler representation. Indeed, the definition of competitive equilibrium suggests that we need to keep track of the whole wealth distribution to solve for the equilibrium. However, taking advantage of the homogeneity of the agents’ optimization problem, we show that a few aggregates are sufficient to determine the equilibrium in the economy. Having solved for the equilibrium prices and policy functions, we can then go back to solve for the equilibrium wealth distribution. In this sense, solving for an equilibrium is “decoupled” into solving for the equilibrium aggregates and solving for the equilibrium wealth distribution.

2.2 Markov Equilibrium

To obtain a sharper characterization of a competitive equilibrium, we first characterize the agents’ optimal dynamic strategies using the Hamilton-Jacobi-Bellman (HJB) equation for their value functions in a competitive equilibrium.

To simplify the notation, we use the following auxiliary functions, $R_i(e)$ and $S_i(e)$:

\[ R_i(e) k = \max_{l} \left\{ F_i(k, l) - e l \right\}, \]

and we let $S_i(e) k$ denote the maximizer. Because $F_i$ has constant returns to scale, we have

\[ F_i(k, S_i(e) k) = R_i(e) k + e S_i(e) k. \]

$R_i(e)$ and $S_i(e)$ are the payments to capital and the labor input, respectively, per unit of capital used in the production function $F_i$.
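To make $R_i(e)$ and $S_i(e)$ concrete, the sketch below works them out for a Cobb-Douglas technology $F(k, l) = A k^{\alpha} l^{1-\alpha}$. This functional form, the absence of depreciation, and all parameter values are assumptions made here for illustration only (the paper uses a CES production function net of depreciation in its calibration). The first-order condition $(1-\alpha) A (k/l)^{\alpha} = e$ gives labor per unit of capital, and the return per unit of capital follows from the definition.

```python
def production(k: float, l: float, A: float, alpha: float) -> float:
    """Hypothetical Cobb-Douglas technology F(k, l) = A * k**alpha * l**(1 - alpha)."""
    return A * k ** alpha * l ** (1.0 - alpha)


def labor_per_capital(e: float, A: float, alpha: float) -> float:
    """S(e): optimal labor per unit of capital, from (1 - alpha) * A * S**(-alpha) = e."""
    return ((1.0 - alpha) * A / e) ** (1.0 / alpha)


def return_per_capital(e: float, A: float, alpha: float) -> float:
    """R(e): output net of the wage bill per unit of capital, F(1, S) - e * S."""
    s = labor_per_capital(e, A, alpha)
    return production(1.0, s, A, alpha) - e * s


if __name__ == "__main__":
    e, alpha = 0.8, 0.33          # hypothetical wage and capital share
    for A in (1.0, 1.2):          # low and high productivity types
        s, r_cap = labor_per_capital(e, A, alpha), return_per_capital(e, A, alpha)
        k = 2.0
        # Constant returns to scale: F(k, S k) = R k + e S k.
        gap = production(k, s * k, A, alpha) - (r_cap * k + e * s * k)
        print(A, round(s, 4), round(r_cap, 4), abs(gap) < 1e-9)
```

In this example the more productive type hires more labor per unit of capital and earns a higher return per unit of capital at the same wage, in line with Assumption 1.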

Let $V(t, i_t^h, x_t^h, w_t^h)$ denote the value function from the maximization (1) of agent $h$. The following lemma provides the PDE that characterizes $V$.

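Lemma 1. The HJB equation for $V$ is

\[
\begin{aligned}
(\rho+\lambda)V - \frac{\partial V}{\partial t} = \max_{c,k,b}\; & u(c) + \frac{\partial V}{\partial x}\, g_x x_t + \frac{\partial V}{\partial w}\left(R_{i^h_t}(e_t)\,k + r_t b - c + e_t x_t\right) \\
& + \lambda_{i^h_t,-i^h_t}\left(V(t,-i^h_t,x^h_t,w^h_t) - V(t,i^h_t,x^h_t,w^h_t)\right)
\end{aligned} \tag{6}
\]

where the maximization problem is subject to the constraints $w^h_t = k + b$, $0 \le k$, and $0 \le mk + (b + Q^h_t)$ (we adopt the standard notation that $-H = L$ and $-L = H$ for $i^h_t$ and $-i^h_t$).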

Proof. Online Appendix C.

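We conjecture and verify that the value function $V$ has the form:

\[
V^h(t, i^h_t, x^h_t, w^h_t) =
\begin{cases}
v(t, i^h_t)\,\left(w^h_t + x^h_t q_t\right)^{1-\sigma} & \text{if } \sigma \neq 1, \\[4pt]
v(t, i^h_t) + \dfrac{1}{\rho+\lambda}\log\left(w^h_t + x^h_t q_t\right) & \text{if } \sigma = 1.
\end{cases}
\]

In addition, we work with the shifted bond position $\tilde{b}_t = b_t + Q^h_t$, written simply as $b$ below. Given the functional form for $V$, the HJB equation for the agents, (6), becomes

\[
(\rho+\lambda)\,v(t, i^h_t) - \frac{\partial v(t, i^h_t)}{\partial t} = \max_{c,k,b}\; H_{i^h_t}\big(c, k, b;\, v(t, i^h_t)\big) + \lambda_{i^h_t,-i^h_t}\big(v(t,-i^h_t) - v(t, i^h_t)\big) \tag{7}
\]

where

\[
H_i(c, k, b;\, t, v) =
\begin{cases}
u(c) + (1-\sigma)\,v\,\big(R_i(e_t)k + r_t b - c\big) & \text{when } \sigma \neq 1, \\[4pt]
\log(c) + \dfrac{1}{\rho+\lambda}\big(R_i(e_t)k + r_t b - c\big) & \text{when } \sigma = 1,
\end{cases} \tag{8}
\]

and the maximization problem is subject to

\[
1 = k + b, \qquad 0 \le k, \qquad 0 \le mk + b. \tag{9}
\]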

This implies that the policy functions are linear in total wealth, i.e. financial wealth plus human wealth:

\[
k^h_t = k^*_{i^h_t}(t)\left(w^h_t + Q^h_t\right), \qquad
b^h_t = b^*_{i^h_t}(t)\left(w^h_t + Q^h_t\right), \qquad
c^h_t = c^*_{i^h_t}(t)\left(w^h_t + Q^h_t\right).
\]

Let $W_{i,t} = \int_{h:\, i^h_t = i} w^h_t \, dh$ denote the total wealth conditional on investment productivity type $i$. The market clearing condition in the bond market becomes:

\[
b^*_H(t)\left(W_{H,t} + M_H G_t N_t q_t\right) + b^*_L(t)\left(W_{L,t} + M_L G_t N_t q_t\right) - G_t N_t q_t = 0, \tag{10}
\]

and in the labor market becomes:

\[
S_H(e_t)\,k^*_{H,t}\left(W_{H,t} + M_H G_t N_t q_t\right) + S_L(e_t)\,k^*_{L,t}\left(W_{L,t} + M_L G_t N_t q_t\right) = G_t N_t, \tag{11}
\]

which depend only on $W_{H,t}$, $W_{L,t}$, $q_t$. The following lemma provides the equations that determine the dynamics of these three variables.

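Lemma 2. In a competitive equilibrium, the dynamics of aggregate conditional wealth satisfy, for $i \in \{H, L\}$,

\[
\begin{aligned}
\frac{dW_{i,t}}{dt} = {} & \left(k^*_{i,t}R_i(e_t) + b^*_{i,t}r_t - c^*_{i,t}\right)\left(W_{i,t} + q_t M_i G_t N_t\right) - \lambda_{i,-i}W_{i,t} + \lambda_{-i,i}W_{-i,t} \\
& + \lambda\left(M_i \int_{\psi} \psi \, d\Gamma_{i,t}(\psi)\, W_{-i,t} - M_{-i}\int_{\psi} \psi \, d\Gamma_{-i,t}(\psi)\, W_{i,t}\right) - \frac{dq_t}{dt} M_i G_t N_t - g_x q_t M_i G_t N_t
\end{aligned} \tag{12}
\]

and $q_t$ satisfies (4).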

Proof. Online Appendix C.

The analysis above suggests that $(W_{H,t}, W_{L,t}, q_t)$ are sufficient endogenous state variables to determine the interest rate, wage rate and allocations at time $t$. The following definition formalizes this intuition.

Definition 2. A Markov equilibrium is a competitive equilibrium in which the equilibrium prices $r_t$, $e_t$ and policy functions $c^*_{i,t}$, $k^*_{i,t}$, $b^*_{i,t}$ depend only on the aggregate state variables $(W_{H,t}, W_{L,t}, q_t)$.

It is also important to characterize the dynamics of the aggregate growth rate of the economy and of the wealth shares of agents with the same investment productivity.

Indeed, let $g_t$ denote the growth rate of aggregate total wealth, which is given by

\[
g_t = \frac{1}{W_{H,t} + W_{L,t} + Q_t}\,\frac{d\left(W_{H,t} + W_{L,t} + Q_t\right)}{dt},
\]

and let $X_t$ denote the wealth share of agents with high investment productivity:

\[
X_t = \frac{W_{H,t} + M_H N_t q_t}{W_t + Q_t}.
\]

The following result characterizes the dynamics of $g_t$ and $X_t$ as functions of the agents’ policy functions.

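Lemma 3. In a competitive equilibrium, the growth rate of total wealth is given by

\[
g_t = \left(R_H(e_t)k^*_{H,t} + r^*_t b^*_{H,t} - c^*_{H,t}\right)X_t + \left(R_L(e_t)k^*_{L,t} + r^*_t b^*_{L,t} - c^*_{L,t}\right)(1 - X_t) + n\,\frac{Q_t}{W_t + Q_t}. \tag{13}
\]

In addition, $X_t$ satisfies

\[
\begin{aligned}
\frac{dX_t}{dt} = {} & \left(g^*_{H,t} - g^*_{L,t}\right)X_t(1 - X_t) - \lambda_{HL}X_t + \lambda_{LH}(1 - X_t) \\
& + \lambda\left(M_H\int_{\psi}\psi\, d\Gamma_{H,t}(\psi)\,(1 - X_t) - M_L\int_{\psi}\psi\, d\Gamma_{L,t}(\psi)\, X_t\right) \\
& - \lambda M_H M_L\left(\int_{\psi}\psi\, d\Gamma_{H,t}(\psi) - \int_{\psi}\psi\, d\Gamma_{L,t}(\psi)\right)\frac{Q_t}{W_t + Q_t} \\
& + n\,\frac{Q_t}{W_t + Q_t}\left(M_H - X_t\right),
\end{aligned} \tag{14}
\]

where $g^*_{i,t}$ denotes the relative growth rate of type $i$ agents:

\[
g^*_{i,t} \equiv R_i(e_t)k^*_{i,t} + r_t b^*_{i,t} - c^*_{i,t} - g_t.
\]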

Proof. Online Appendix C.

From the expression for $g_t$ in (13) and the definition of $g^*_{i,t}$ above, we obtain

\[
0 = g^*_{H,t}X_t + g^*_{L,t}(1 - X_t) + n\,\frac{Q_t}{W_t + Q_t}. \tag{15}
\]
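For completeness, the step from (13) to (15) can be spelled out as follows. This is only a rearrangement of the two expressions above, treating $r_t$ and $r^*_t$ as the same equilibrium interest rate (an assumption about the notation):

```latex
\begin{align*}
g_t &= \big(g^*_{H,t} + g_t\big)X_t + \big(g^*_{L,t} + g_t\big)(1 - X_t) + n\,\frac{Q_t}{W_t + Q_t} \\
    &= g_t + g^*_{H,t}X_t + g^*_{L,t}(1 - X_t) + n\,\frac{Q_t}{W_t + Q_t}.
\end{align*}
```

Subtracting $g_t$ from both sides leaves the identity (15): the wealth-weighted relative growth rates of the two types must offset the contribution of population growth to human wealth.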

When Assumption 2 holds, equation (14) simplifies to

\[
\frac{dX_t}{dt} = \left(g^*_{H,t} - g^*_{L,t}\right)X_t(1 - X_t) - \lambda_{HL}X_t + \lambda_{LH}(1 - X_t) + n\,\frac{Q_t}{W_t + Q_t}\left(M_H - X_t\right). \tag{16}
\]

The results above fully characterize the aggregate dynamics of the economy without knowledge of the entire wealth distribution among the agents in the economy. Given the aggregate dynamics, the following lemma provides the PDEs that characterize the evolution of the wealth distribution over time. Let $\omega^h_t$ denote the total wealth (financial plus human wealth) of agent $h$ relative to the average total wealth in the population at time $t$:

\[
\omega^h_t = \frac{w^h_t + q_t x^h_t}{\left(W_t + Q_t\right)/N_t},
\]

and let $p_i(t, \omega)$ denote:

\[
p_i(t, \omega) = \Pr\left(\omega^h_t \ge \omega,\; i^h_t = i\right).
\]

By the law of large numbers, $p_H(t, \omega)$ ($p_L(t, \omega)$) is also the fraction of agents with high investment return (low investment return) and with relative total wealth exceeding $\omega$. $p_H(t, \omega)$ and $p_L(t, \omega)$ satisfy the following Kolmogorov forward equation.

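Lemma 4. $p_H$ and $p_L$ satisfy the following PDEs:

\[
\begin{aligned}
\frac{\partial p_i(t, \omega)}{\partial t} = {} & -\frac{\partial p_i(t, \omega)}{\partial \omega}\,\omega\left(g^*_{i,t} + n\right) - \left(\lambda_{i,-i} + \lambda + n\right)p_i(t, \omega) + \lambda_{i,-i}\,p_{-i}(t, \omega) \\
& + (\lambda + n)\,J_{i,t}\!\left(\frac{\omega\left(W_t + Q_t\right)}{W_t\frac{\lambda}{\lambda+n}},\; \frac{q_t G_t N_t}{W_t\frac{\lambda}{\lambda+n}}\right)
\end{aligned} \tag{17}
\]

where

\[
J_{i,t}(y_1, y_2) = \int \Gamma_{i,t}\left(y_1 - y_2 x\right) d\Phi(x),
\]

and $p_i$ also satisfies the boundary conditions:

\[
\lim_{\omega \to 0} p_i(t, \omega) = 1 \quad \text{and} \quad \lim_{\omega \to \infty} p_i(t, \omega) = 0.
\]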

Proof. Online Appendix C.


2.3 Stationary Balanced Growth Path

In this subsection, we look for a stationary equilibrium as in Huggett (1993) and Aiyagari (1994), in which the interest rate, wage rate, rates of return on investment, wealth redistribution functions, and (relative) wealth distributions remain unchanged over time. However, the economy as a whole grows at a constant rate.

Definition 3. A stationary balanced growth path, or balanced growth path (BGP) for short, is a Markov equilibrium in which the interest rate, wage rate, rates of return on investment, growth rate, and relative wealth distribution are constant over time.

The concept of a stationary balanced growth path is standard (see for example Huggett et al. (2011) and Acemoglu and Cao (2015)). With strictly concave production functions, we can easily show that in a BGP the economy grows at the rate $n + g_x$ and $W_{i,t} = W_i e^{(g_x + n)t}$.

Since the interest rate and wage rate are constant over time, $q_t \equiv q$, and by (4), $q$ is determined by:

\[
0 = (r - g_x)\,q - e. \tag{18}
\]

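The value functions in (7) become $v(t, i) \equiv v_i$, where $v_i$ satisfies

\[
(\rho + \lambda)\,v_i = \max_{c,k,b}\; H_i(c, k, b;\, v_i) + \lambda_{i,-i}\left(v_{-i} - v_i\right) \tag{19}
\]

where

\[
H_i(c, k, b;\, v) =
\begin{cases}
u(c) + (1-\sigma)\,v\,\big(R_i(e)k + rb - c\big) & \text{when } \sigma \neq 1, \\[4pt]
\log(c) + \dfrac{1}{\rho+\lambda}\big(R_i(e)k + rb - c\big) & \text{when } \sigma = 1,
\end{cases} \tag{20}
\]

and the maximization is subject to the constraints (9).

The dynamics of aggregate wealth, (12), simplify to

\[
\begin{aligned}
(g_x + n)\,W_i = {} & \left(k^*_i R_i(e) + b^*_i r - c^*_i\right)\left(W_i + qM_i\right) - \lambda_{i,-i}W_i + \lambda_{-i,i}W_{-i} \\
& + \lambda\left(M_i\int_{\psi}\psi\, d\Gamma_i(\psi)\, W_{-i} - M_{-i}\int_{\psi}\psi\, d\Gamma_{-i}(\psi)\, W_i\right) - g_x\, q M_i.
\end{aligned} \tag{21}
\]

The market clearing conditions (10) and (11) become

\[
b^*_H X + b^*_L(1 - X) = \frac{q}{W + q}, \tag{22}
\]

and

\[
e\,S_H(e)\,k^*_H X + e\,S_L(e)\,k^*_L(1 - X) = \frac{e}{W + q}, \tag{23}
\]

where $X = \dfrac{W_H + M_H q}{W_H + W_L + q}$.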

The following theorem characterizes the BGP in the economy.

Theorem 1 (Stationary Balanced Growth Path). Assume Assumption 1 holds. Depending on the parameters of the model, a stationary BGP is characterized by six equations: the two HJB equations (19) for H- and L-agents, the two equations (21) determining $W_H$ and $W_L$, and the two market clearing conditions, (22) and (23).

However, the unknowns differ across three cases:

Case 1: $r = R_L(e)$. Then $(k^*_H, b^*_H) = \left(\frac{1}{1-m}, -\frac{m}{1-m}\right)$ and $b^*_L = 1 - k^*_L$. The six unknowns are $v_H, v_L, k^*_L, e, W_H, W_L$.

Case 2: $R_L(e) < r < R_H(e)$. Then $(k^*_H, b^*_H) = \left(\frac{1}{1-m}, -\frac{m}{1-m}\right)$ and $(k^*_L, b^*_L) = (0, 1)$. The six unknowns are $v_H, v_L, r, e, W_H, W_L$.

Case 3: $r = R_H(e)$. Then $(k^*_L, b^*_L) = (0, 1)$ and $b^*_H = 1 - k^*_H$. The six unknowns are $v_H, v_L, k^*_H, e, W_H, W_L$.

Lastly, in Case 1 and Case 2:

\[
R_H(e)k^*_H + rb^*_H - c^*_H > R_L(e)k^*_L + rb^*_L - c^*_L.
\]

Proof. Appendix A.

In Case 1, given that $r = R_L(e)$, L-agents are indifferent between producing using their production function and lending to the H-agents at interest rate $r$. In equilibrium, they do both, i.e. $0 \le k^*_L \le 1$, and $k^*_L$, $b^*_L$ are determined by the equilibrium conditions. The same logic applies to Case 3.
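The corner portfolio of the H-agents in Cases 1 and 2 can be read off the constraints (9). When their return on capital exceeds the interest rate, $R_H(e) > r$, they borrow as much as the collateral constraint allows. A short sketch of that step, under the assumption that the constraint $0 \le mk + b$ binds:

```latex
\begin{align*}
0 = m k^*_H + b^*_H \quad\text{and}\quad 1 = k^*_H + b^*_H
\;\;\Longrightarrow\;\;
(1 - m)\,k^*_H = 1
\;\;\Longrightarrow\;\;
k^*_H = \frac{1}{1-m}, \qquad b^*_H = -\frac{m}{1-m}.
\end{align*}
```

By the same logic, an L-agent facing $r > R_L(e)$ holds no capital and lends all of its total wealth, which gives $(k^*_L, b^*_L) = (0, 1)$.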

One important implication of this theorem is that in Case 1 and Case 2, the agents with the higher rate of return on investment (H-agents) save more. While consumption can be higher or lower due to the opposing income and substitution effects, the effect of higher rates of return on saving is unambiguous.14 This result is consistent with Saez and Zucman (2016), who find that saving rates tend to rise with wealth.15

14The proof of this property is not straightforward due to the switching probability between the two rates of return. When σ < 1, we can show that the substitution effect on consumption dominates the income effect, therefore $c^*_H < c^*_L$, so the result on the saving rate is immediate. It is more difficult to prove the result when σ > 1, in which case $c^*_H > c^*_L$.

15We also find some empirical evidence for higher saving rates conditional on higher returns to wealth in the PSID.

Theorem 1 does not tell us the conditions under which the equilibrium belongs to Case 1, 2, or 3. For a special case of this model, the AK model in Section 3 with log utility, Proposition 1 provides a complete characterization. We show that Case 1 happens when m is low, Case 2 when m is intermediate, and Case 3 when m is high. We have verified numerically that the same pattern holds for the current model with production.

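In the BGP characterized by Theorem 1, the PDEs for the stationary wealth distribution, (17), become

\[
\begin{aligned}
0 = {} & -\frac{\partial p_i(\omega)}{\partial \omega}\,\omega\left(g^*_i + n\right) - \left(\lambda_{i,-i} + \lambda + n\right)p_i(\omega) + \lambda_{i,-i}\,p_{-i}(\omega) \\
& + (\lambda + n)\,J_i\!\left(\frac{\omega\left(W_H + W_L + q\right)}{\left(W_H + W_L\right)\frac{\lambda}{\lambda+n}},\; \frac{q}{\left(W_H + W_L\right)\frac{\lambda}{\lambda+n}}\right),
\end{aligned} \tag{24}
\]

where $J_i$ is defined in Lemma 4 (with $\Gamma_{i,t} \equiv \Gamma_i$).

The following theorem characterizes the stationary wealth distribution.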

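Theorem 2 (Pareto Tail). Assume that the supports of $\Phi$ and $\Gamma_{i,t}$ are bounded above. Then in a stationary BGP, we obtain the following properties of the stationary distributions.

1) The stationary distribution of total wealth (financial plus human wealth) has a right Pareto tail. More specifically,

\[
p^*_H(\omega) = \Psi_H\,\omega^{-\theta}, \qquad p^*_L(\omega) = \Psi_L\,\omega^{-\theta}
\]

for all $\omega \ge \bar{\omega}$, for some $\Psi_H, \Psi_L > 0$ and $\bar{\omega} > 0$, and $-\theta$ is the negative root of the following quadratic equation in $\eta$:

\[
\left(\frac{\lambda_{HL} + \lambda + n}{g^*_H + n} + \eta\right)\left(\frac{\lambda_{LH} + \lambda + n}{g^*_L + n} + \eta\right) - \frac{\lambda_{HL}\lambda_{LH}}{\left(g^*_H + n\right)\left(g^*_L + n\right)} = 0. \tag{25}
\]

2) The distribution of financial wealth follows an approximate power law with the same tail index as the one for the distribution of total wealth, i.e. there exist $\bar{d} > \underline{d} > 0$ such that

\[
\bar{d}\,w^{-\theta} \;\ge\; \Pr\left(\frac{w^h_t}{W_t/N_t} \ge w\right) \;\ge\; \underline{d}\,w^{-\theta},
\]

when $w \to \infty$, where $\theta$ is determined in Part 1.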

Proof. Appendix A.
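To make the role of equation (25) concrete, the following minimal sketch solves the quadratic for $\eta$ and returns $\theta$ as minus its negative root. The function name and the numerical inputs are hypothetical. The sketch assumes the configuration $g^*_H + n > 0 > g^*_L + n$, in which the product of the two roots is negative so that exactly one root is negative, and it raises an error otherwise.

```python
import math


def pareto_tail_index(g_H, g_L, lam_HL, lam_LH, lam, n):
    """Return theta, where -theta is the unique negative root of equation (25).

    g_H and g_L stand for the relative growth rates g*_H and g*_L.
    Raises ValueError when the quadratic does not have exactly one negative root.
    """
    a = (lam_HL + lam + n) / (g_H + n)
    b = (lam_LH + lam + n) / (g_L + n)
    c = lam_HL * lam_LH / ((g_H + n) * (g_L + n))
    # (a + eta) * (b + eta) - c = 0  <=>  eta^2 + (a + b) * eta + (a * b - c) = 0
    disc = (a - b) ** 2 + 4.0 * c
    if disc < 0.0:
        raise ValueError("no real root")
    roots = [(-(a + b) + sign * math.sqrt(disc)) / 2.0 for sign in (1.0, -1.0)]
    negative = [eta for eta in roots if eta < 0.0]
    if len(negative) != 1:
        raise ValueError("expected exactly one negative root")
    return -negative[0]


if __name__ == "__main__":
    # Hypothetical inputs: H-wealth grows and L-wealth shrinks relative to the aggregate.
    theta = pareto_tail_index(g_H=0.05, g_L=-0.01, lam_HL=0.1, lam_LH=0.001,
                              lam=1 / 75, n=0.0)
    print(theta)
```

One useful limiting case: when $\lambda_{HL} = 0$ the quadratic factors, and under the sign assumption above the negative root gives $\theta = (\lambda + n)/(g^*_H + n)$.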

Similar results still hold if the supports of $\Phi$ and $\Gamma_i$ are unbounded but the distributions have thin tails, or Pareto tails with a tail index strictly higher than θ. However, we need to relax the definition of Pareto tail to “asymptotic Pareto tail” as in Acemoglu and Cao (2015).16

Following Piketty (2014), we measure top end wealth inequality by the upper tail index, θ, in Theorem 2. A lower θ corresponds to a fatter tail of the wealth distribution, i.e. a higher degree of wealth inequality.

The results in Theorem 2 deserve some discussion.

Homogeneous Rates of Returns. First of all, in the special case without heterogeneous investment risk, $g^*_H = g^*_L = -n\,\frac{q}{W+q}$ and $\lambda_{HL} = \lambda_{LH} = 0$, so by (25) the tail index is given by

\[
\theta = \left(1 + \frac{\lambda}{n}\right)\left(1 + \frac{q}{W}\right). \tag{26}
\]

In the AK model presented in Section 3, $q = 0$ and we recover the formula for the tail index given in Jones (2015):

\[
\theta = 1 + \frac{\lambda}{n}.
\]

As noted by Jones (2015), this formula contradicts the intuition put forth in Piketty (2014): in general equilibrium, lower population growth leads to lower top end wealth inequality.

With concave production functions, using (18) and the observation that $\frac{e}{W} = \frac{EY}{KY}$, (26) implies that17

\[
\theta = \theta_{1a}\left(EY, KY, r;\, g_x, \lambda, n\right) := \left(1 + \frac{\lambda}{n}\right)\left(1 + \frac{EY}{KY}\,\frac{1}{r - g_x}\right). \tag{27}
\]

In Online Appendix F, we present a direct derivation of (27) in the standard neoclassical growth model à la Ramsey-Cass-Koopmans with labor productivity growth.18 We show that

\[
\theta = \frac{\lambda + n}{r - c - g_x}.
\]

16A distribution density $\varphi$ has asymptotic Pareto tail $\chi$ if for any $\xi > 0$ there exist $\underline{B}$, $\bar{B}$ and $\bar{x}$ such that $\underline{B}\,x^{-(\chi - 1 + \xi)} < \varphi(x) < \bar{B}\,x^{-(\chi - 1 - \xi)}$ for all $x \ge \bar{x}$.

17Notice that the tail index depends on $r - g_x$ and $n$ separately, unlike what is conjectured by Piketty (2014), i.e. $\theta = \theta_0(r - g)$ where $g = g_x + n$.

18In a companion note (Online Appendix II), we also present an equivalent formulation with exogenous labor augmenting technological progress.

After solving out for the consumption rate $c$ using equilibrium conditions, we arrive at (27). Using the standard values for the parameters $\lambda$, $n$, $g_x$ and the historical values of EY and KY in the U.S. data (see Table 1), (27) implies a tail index of 22.33, which is too high compared to those observed in the U.S. data (which are between 1.4 and 2 over the years). We will show below that allowing for heterogeneous returns in the model can help match the empirical tail index.
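The figure of 22.33 can be reproduced directly from formula (27). The short sketch below plugs in the values reported in Table 1 (labor income share 0.6, capital to output ratio 3.5, interest rate 0.04, $g_x = 0.02$, $\lambda = 1/75$, $n = 0.01$); the function name is hypothetical.

```python
def theta_homogeneous(EY, KY, r, g_x, lam, n):
    """Tail index under homogeneous returns, equation (27)."""
    # q / W, using q = e / (r - g_x) from (18) and e / W = EY / KY
    human_to_financial = (EY / KY) / (r - g_x)
    return (1.0 + lam / n) * (1.0 + human_to_financial)


if __name__ == "__main__":
    theta = theta_homogeneous(EY=0.6, KY=3.5, r=0.04, g_x=0.02, lam=1 / 75, n=0.01)
    print(round(theta, 2))  # 22.33
```

Most of the magnitude comes from the second factor: with $r - g_x = 0.02$, human wealth is roughly 8.6 times financial wealth, which thins the tail of total wealth considerably.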

Heterogeneous Rates of Returns. Returning to the general model with heterogeneous returns, equation (25) also tells us how θ depends on $g^*_H$, $g^*_L$, the transition probabilities $\lambda_{HL}$, $\lambda_{LH}$, and population growth. We obtain sharper characterizations of how θ varies if we assume that there is no population growth and that wealth redistribution is type preserving.

Theorem 3 (Sufficient Statistics I). Assume n = 0 and Assumption 2 holds. In a stationary BGP with $k^*_L > 0$, i.e. Case 1 in Theorem 1, top end wealth inequality is a function of the relative growth rate of the high type and the persistence parameters:

\[
\theta = \theta_2\left(g^*_H;\, \lambda_{HL}, \lambda_{LH}, \lambda\right).
\]

In addition, top end wealth inequality increases in $g^*_H$, i.e. $\frac{\partial \theta_2}{\partial g^*_H} < 0$.

Proof. Appendix A.

This is a generalization of the now famous result from Piketty (2014) that top end wealth inequality depends on “r − g”, which corresponds to formula (27) above. With heterogeneous investment returns, top end inequality depends on the relative growth rate of wealth of the high type agents, which in turn depends on their investment returns, the equilibrium interest rate, and endogenous consumption and portfolio choice decisions:

\[
g^*_H = R_H(e)k^*_H + rb^*_H - c^*_H - g.
\]

The first two terms on the right hand side capture the rate of return on the optimal portfolio of the H-type. The third term captures the fact that a fraction of the returns is consumed. This term is implicitly ignored by Piketty (2014) when he focuses on the gap r − g as the sole determinant of wealth inequality. This implicit assumption is criticized by Mankiw (2015) and Ray (2015).

In a BGP, the aggregate labor income share and capital to output ratio are given by:

\[
EY \equiv \frac{e}{\left(F_H(1, S_H(e)) + \delta\right)k^*_H\left(W_H + M_H Q\right) + \left(F_L(1, S_L(e)) + \delta\right)k^*_L\left(W_L + M_L Q\right)},
\]

and

\[
KY \equiv \frac{W_H + W_L}{\left(F_H(1, S_H(e)) + \delta\right)k^*_H\left(W_H + M_H Q\right) + \left(F_L(1, S_L(e)) + \delta\right)k^*_L\left(W_L + M_L Q\right)}.
\]

How are KY and EY related to each other? By looking at the distribution of output between wages and returns to capital, we obtain

\[
\underbrace{\left(\frac{k^*_H X^*}{k^*_H X^* + k^*_L(1 - X^*)}\left(R_H(e) + \delta\right) + \frac{k^*_L(1 - X^*)}{k^*_H X^* + k^*_L(1 - X^*)}\left(R_L(e) + \delta\right)\right)}_{\text{average rate of return to capital}} KY = \underbrace{1 - EY}_{\text{capital income share}}. \tag{28}
\]

Notice that this identity is a generalization of Piketty (2014)’s First Fundamental Law of Capitalism, “rβ = α,” in his model with homogeneous returns. In our model with heterogeneous returns, the capital income share is equal to the product of the weighted average of the rates of return to capital of the two types of agents and the capital to output ratio.19

The following theorem links the top end inequality to the two aggregate statistics.

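Theorem 4 (Sufficient Statistics II). Assume n = 0, Assumption 2 holds, and $u(c) = \log c$. In a stationary BGP with $k^*_L > 0$, top end inequality θ is a function of the aggregate labor income share, capital to output ratio, and interest rate, together with the primitive parameters $g_x, \delta, \lambda_{HL}, \lambda_{LH}, \lambda$:

\[
\theta = \theta_{1b}\left(EY, KY, r;\, g_x, \delta, \lambda_{HL}, \lambda_{LH}, \lambda\right).
\]

In addition, $\frac{\partial \theta_{1b}}{\partial KY} > 0$, $\frac{\partial \theta_{1b}}{\partial EY} > 0$ and $\frac{\partial \theta_{1b}}{\partial g_x} > 0$.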

Proof. Appendix A.

Because $k^*_L > 0$, we are in Case 1 of Theorem 1, which implies that $R^*_H > R^*_L$, i.e. returns are strictly heterogeneous. Therefore, formula (27) for homogeneous returns does not apply. Consequently, the comparative statics with respect to KY also differ in the two cases: $\frac{\partial \theta_{1a}}{\partial KY} < 0$, while $\frac{\partial \theta_{1b}}{\partial KY} > 0$.

The last two comparative statics in this theorem are consistent with Piketty (2014)’s discussion: a higher labor income share, or a higher growth rate of the economy (since n = 0, $g = g_x$), corresponds to a higher Pareto tail index and thus lower top end wealth inequality. However, the first comparative static differs from Piketty (2014)’s prediction that a higher capital to output ratio is associated with higher wealth inequality.20

19The production functions $F_i$ implicitly incorporate depreciation. So to measure total output - gross output - we add back depreciation to the net output given by $F_i$. This practice is more consistent with the modern formulation of the neoclassical growth models such as Cass (1965). Krusell and Smith (2015) show that the distinctions between gross output versus net output and gross saving rate versus net saving rate are important for the predictions of the neoclassical growth models.

Theorem 4 makes restrictive assumptions such as no population growth and log utility. However, it is still fairly general since it allows for any production function as well as any degree of financial friction. For example, changing m certainly changes the equilibrium stationary BGP, as shown in Proposition 1 below. However, top end inequality only changes to the extent that EY, KY, and r change because of the change in m.

Theorem 4 shows that in general top end inequality depends on factors other than the simple gap r − g suggested by Piketty (2014). If the assumptions of Theorem 4 are not satisfied, one should expect that the determinants of top end wealth inequality are fairly complex, and might not be summarized in a simple formula with simple aggregate statistics as inputs.

3 AK Growth Model

In this section we assume that $u(c) \equiv \log c$ and $F_i(k, l) = A_i k$ where $A_H > A_L > 0$. We further assume type-preserving wealth redistribution, i.e. Assumption 2 holds, and no population growth. These assumptions allow us to sharply characterize the balanced growth paths as well as the transitional dynamics. In Online Appendix G, we also characterize the dynamics of the economy under both idiosyncratic and aggregate shocks to investment returns. We show that the aggregate dynamics depend crucially on the process of idiosyncratic shocks.

Although this section helps grasp some qualitative properties of the general model, impatient readers can skip directly to Section 4 on a first reading of the paper.

3.1 Stationary Equilibrium

Theorem 1 does not tell us which case happens depending on the exogenous parameters of the model. Under log utility, the following proposition completely characterizes the equilibrium.21

20A casual observation of cross-country data on capital to income ratio and top end wealth inequality from Piketty (2014), Piketty and Zucman (2014), Alvaredo and Saez (2009), and Saez and Zucman (2016) suggests that countries with higher top end wealth inequality might have a lower capital to income ratio. For example, the U.S. has the highest level of top end wealth inequality, but has a lower capital to income ratio compared to France, Spain, and the U.K. Our preliminary cross-country regression of the top 1% wealth share on KY, controlling for other factors, shows that the coefficient on KY is negative and significant.

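Proposition 1. In a stationary BGP, one of the following three cases happens:

Case 1: (Low m) If

\[
\frac{M_L}{1 + \frac{A_H - A_L}{\lambda_{HL} + \lambda_{LH}}} > m > 0, \tag{29}
\]

then $r = A_L$ and $X^* < 1 - m$.

Case 2: (Intermediate m) If

\[
M_L > m > \frac{M_L}{1 + \frac{A_H - A_L}{\lambda_{HL} + \lambda_{LH}}}, \tag{30}
\]

then

\[
r = A_H - \frac{(1 - m)\lambda_{HL} - m\lambda_{LH}}{m} \in (A_L, A_H)
\]

and $X^* = 1 - m$.

Case 3: (High m) If

\[
1 > m > M_L, \tag{31}
\]

then $r^* = A_H$ and $X^* > 1 - m$.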

Proof. Appendix B.1.
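The case distinction in Proposition 1 is mechanical enough to sketch in code. The function below takes $m$, $M_L$, the productivities and the switching rates as inputs and returns the case number together with the implied interest rate. The function name and the parameter values in the example are hypothetical, and boundary values of $m$ (where the strict inequalities in (29)-(31) fail) are deliberately rejected rather than assigned to a case.

```python
def ak_stationary_case(m, M_L, A_H, A_L, lam_HL, lam_LH):
    """Classify the stationary BGP of the AK model following Proposition 1.

    Returns (case_number, interest_rate).
    """
    threshold = M_L / (1.0 + (A_H - A_L) / (lam_HL + lam_LH))
    if 0.0 < m < threshold:
        return 1, A_L  # condition (29): interest rate pinned down by the L-type return
    if threshold < m < M_L:
        # condition (30): interest rate strictly between A_L and A_H
        r = A_H - ((1.0 - m) * lam_HL - m * lam_LH) / m
        return 2, r
    if M_L < m < 1.0:
        return 3, A_H  # condition (31): interest rate equals the H-type return
    raise ValueError("m lies on a boundary or outside (0, 1)")


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for m in (0.2, 0.45, 0.7):
        print(m, ak_stationary_case(m, M_L=0.5, A_H=0.10, A_L=0.04,
                                    lam_HL=0.1, lam_LH=0.1))
```

In the example, $M_L = 0.5$ is set equal to $\lambda_{HL}/(\lambda_{HL} + \lambda_{LH})$, which is an assumption of this illustration; under it, the Case 2 interest rate approaches $A_L$ at the lower threshold and $A_H$ at $m = M_L$.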

The case of log utility also allows us to determine in closed form the Pareto tail index of the stationary wealth distribution, which is a solution to a quadratic equation similar to (25). Therefore, we can also characterize how the degree of financial friction affects the tail index, or equivalently top end wealth inequality.

Proposition 2 (Financial Kuznets Curve). In stationary BGPs, top end wealth inequality varies with m differently depending on which case in Proposition 1 the equilibrium belongs to:

Case 1 (Low m): Top end wealth inequality is increasing in m.
Case 2 (Intermediate m): Top end wealth inequality is decreasing in m.
Case 3 (High m): Top end wealth inequality is independent of m.

Proof. Appendix B.1.

Numerically, Moll (2012) finds a similar result in a production economy with a continuum of investment productivity types, and we borrow the term “Financial Kuznets Curve” from his analysis. As he explains, in Case 1, top end inequality is increasing in m because of the “leverage effect”: a higher m allows the more productive agents to borrow more at an interest rate determined by the rate of return of the less productive agents, which magnifies the differences in returns and increases wealth inequality. However, at higher m, i.e. in Case 2, top end wealth inequality is decreasing in m because of the “return equalization effect”: a higher m increases the demand to borrow by the more productive agents, which pushes up the interest rate earned by the less productive agents (which is strictly higher than their own rate of return). This increase in the interest rate reduces the differences in returns and decreases wealth inequality. If m is even higher, one reaches Case 3, in which the interest rate earned by the less productive agents is pushed up to the rate of return of the more productive agents. There is essentially no wealth inequality (in a stationary BGP) in this case.

21One minor difference relative to Theorem 1 is that the growth rate of the economy is now totally endogenous, instead of being determined exogenously by $g_x$ and $n$.

As Piketty (2014) suggests, a powerful tool to reduce wealth inequality is a wealth tax. In the following proposition, we consider a proportional wealth tax, τ > 0. We assume that the proceeds from taxing wealth are distributed among the newborns. To simplify the analysis, we assume that the wealth tax proceeds are redistributed within the same type, i.e. tax proceeds from the high-type agents are distributed to the newborns of high type, and similarly for the low type.22 The advantage of this assumption is that the aggregate dynamics do not change. However, the following result should not depend on the precise redistribution scheme.

Proposition 3 (Wealth Tax). Assume $u(c) = \log c$. There exists $\bar{\tau} > 0$ such that the Pareto tail index is decreasing in τ for $\tau \in (0, \bar{\tau})$.

Proof. Appendix B.1. As also shown in the appendix, the effectiveness of the wealth tax in reducing wealth inequality depends on the fundamentals of the model, such as $m$, $\lambda_{HL}$, $\lambda_{LH}$ and $\lambda$.

At first sight, this appears to be a trivial result. However, as shown in Jones (2015), under homogeneous returns, wealth taxes do not affect top end wealth inequality because of the general equilibrium effect. Therefore, this proposition shows that heterogeneous returns are important for the effectiveness of wealth taxes in reducing top end wealth inequality.

3.2 Transitional Path

In this subsection, we investigate the dynamics of the economy over transitional paths. The initial conditions are such that the economy does not start out at a stationary equilibrium described in Proposition 1.

First, we focus on the aggregate dynamics. As shown in Lemma 3 (with $Q_t \equiv 0$), the aggregate dynamics of the economy can be summarized by the evolution of a single endogenous variable, the conditional wealth share, $X_t$.

22The formulation of the model with wealth tax is similar to the one with corporate tax presented in Online Appendix D.

Proposition 4 (Transitional Path). In the long run, the economy converges to a stationary BGP described in Proposition 1. In addition, the speed of transition depends on whether the stationary BGP belongs to Case 1, 2, or 3 in the proposition. In particular:

Case 1: The speed of transition is given by

\[
\Delta = \sqrt{\left(\frac{A_H - A_L}{1 - m} - \lambda_{HL} - \lambda_{LH}\right)^2 + 4\lambda_{LH}\,\frac{A_H - A_L}{1 - m}}, \tag{32}
\]

that is, $X_t - X^* \propto \exp(-\Delta t)$ when $t \to \infty$.

Case 2 and Case 3: $X^*$ is reached in finite time.

Proof. Appendix B.2.
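The rate in (32) can be checked numerically. The sketch below is not the proof in Appendix B.2; it is an illustration built on two assumptions stated here: (i) with $n = 0$ and $Q_t \equiv 0$, equation (16) reduces to $dX_t/dt = a X_t(1 - X_t) - \lambda_{HL}X_t + \lambda_{LH}(1 - X_t)$, and (ii) in Case 1 with log utility both types consume the same fraction of wealth, so that $a = g^*_H - g^*_L = (A_H - A_L)/(1 - m)$ from the levered portfolio in Theorem 1. Function names and parameter values are hypothetical.

```python
import math


def case1_dynamics(A_H, A_L, m, lam_HL, lam_LH):
    """Return (a, Delta, X*) for the Case 1 law of motion of the wealth share."""
    a = (A_H - A_L) / (1.0 - m)  # gap in wealth growth rates between types
    delta = math.sqrt((a - lam_HL - lam_LH) ** 2 + 4.0 * lam_LH * a)  # equation (32)
    x_star = ((a - lam_HL - lam_LH) + delta) / (2.0 * a)  # stable rest point
    return a, delta, x_star


def dX_dt(X, a, lam_HL, lam_LH):
    """Right-hand side of (16) with n = 0 and Q_t = 0."""
    return a * X * (1.0 - X) - lam_HL * X + lam_LH * (1.0 - X)


def simulate_gap(X0, T, a, lam_HL, lam_LH, x_star, dt=0.01):
    """Explicit Euler integration up to time T; returns X_T - X*."""
    X = X0
    for _ in range(int(round(T / dt))):
        X += dt * dX_dt(X, a, lam_HL, lam_LH)
    return X - x_star


if __name__ == "__main__":
    lam_HL, lam_LH = 0.1, 0.01
    a, delta, x_star = case1_dynamics(0.10, 0.04, 1 / 3, lam_HL, lam_LH)
    gap_100 = simulate_gap(x_star + 0.01, 100.0, a, lam_HL, lam_LH, x_star)
    gap_150 = simulate_gap(x_star + 0.01, 150.0, a, lam_HL, lam_LH, x_star)
    estimated_rate = -math.log(gap_150 / gap_100) / 50.0
    print(delta, estimated_rate)  # the two numbers should be close
```

The link to (32) is that the derivative of the right-hand side at the stable rest point equals $-\Delta$, so deviations from $X^*$ decay at rate $\Delta$ once they are small.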

Armed with the characterization of the transitional dynamics in the last proposition, we now seek to understand how idiosyncratic return risks affect the speed of transition of the aggregate economy. We focus on the more interesting case - Case 1 - in which the speed of transition is well-defined and is given by (32).

Proposition 5. Assume that the investment productivity process is symmetric, i.e. $\lambda_{LH} = \lambda_{HL}$, and we are in Case 1 of the last proposition. Then the speed of transition is decreasing in the persistence of the idiosyncratic return risks.

Proof. Appendix B.2.
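The comparative static can also be seen directly from (32). Writing $a \equiv (A_H - A_L)/(1 - m)$ and $\bar{\lambda} \equiv \lambda_{LH} = \lambda_{HL}$ (shorthand introduced only for this sketch), the algebra is:

```latex
\begin{align*}
\Delta &= \sqrt{\left(a - 2\bar{\lambda}\right)^2 + 4\bar{\lambda}\,a}
        = \sqrt{a^2 + 4\bar{\lambda}^2}, \\
\frac{\partial \Delta}{\partial \bar{\lambda}} &= \frac{4\bar{\lambda}}{\sqrt{a^2 + 4\bar{\lambda}^2}} > 0.
\end{align*}
```

Since the expected duration of a type spell is $1/\bar{\lambda}$, higher persistence means a lower $\bar{\lambda}$ and hence a lower $\Delta$, holding $a$ fixed.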

This result echoes the findings in Buera and Shin (2013) and Moll (2014), who show numerically that more persistent idiosyncratic shocks slow down the convergence speed of the aggregate economy. In our framework, we can show this result analytically. In addition, we also show that this result might not hold if the process for idiosyncratic investment risks is not symmetric.

4 Quantitative Implications

In this section, we calibrate the neoclassical growth model presented in Section 2. To investigate the transitional path after changes in corporate tax, we use the modified version of the model with corporate tax presented in Online Appendix D. The closed form expression for the Pareto tail index of the stationary distribution allows us to target the tail index in the data. In the calibrated model, we ask how the labor earnings distribution and the initial wealth distribution matter quantitatively for top end wealth inequality. We also investigate the joint dynamics of top end wealth inequality and the aggregate labor income share and capital to output ratio over transitional paths of the calibrated economy after capital destruction shocks, corporate tax shocks, and financial deregulation shocks.

| Parameter | Value | Target/Source | Target | Model |
|---|---|---|---|---|
| m | 1/3 | Evans and Jovanovic (1989) | | |
| γ | 1-1/1.5 | Piketty and Zucman (2015) | | |
| n | 0.01 | Population growth rate | | |
| gx | 0.02 | Labor productivity growth rate | | |
| δ | 0.06 | Depreciation rate, King and Rebelo (1999) | | |
| λ | 1/75 | Life span | | |
| AL | 1 | Normalization | | |
| AH | 1.1587 | Interest rate | 0.04 | 0.04 |
| α | 0.2627 | Labor income share, Piketty and Zucman (2015) | 0.6 | 0.6 |
| ρ + λ | 0.0204 | Capital output ratio, Piketty and Zucman (2015) | 3.5 | 3.5 |
| λHL | 0.1000 | Average duration of being of high type | 10 yrs | 10 yrs |
| λLH | 9.6593e-05 | Pareto tail index of wealth distribution | 2.00 | 2.00 |

Table 1: Calibration Targets and Results for the Neoclassical Economy

4.1 Main Calibration

For the main calibration, we assume that the production function has constant elasticity of substitution between labor and capital, as described below:

\[
F_i(k, l) = A(i)\left(\alpha k^{\gamma} + (1 - \alpha)\,l^{\gamma}\right)^{1/\gamma} - \delta k,
\]

where $A(i)$ is the productivity of entrepreneurs of type $i \in \{L, H\}$. The $A(i)$’s are allowed to differ across households. $\alpha$ and $\gamma$ are the technology parameters governing factor shares and the elasticity of substitution between capital and labor. Capital depreciates at a constant rate $\delta$. We also assume that corporate profits (after depreciation and wage payments) are taxed at 20%. The tax proceeds, together with the wealth of the dying agents, are redistributed to the newborn in a type-preserving fashion (see the details in Online Appendix D).

Table 1 summarizes the parameters and the associated targeted moments. The calibrated economy is at the equilibrium where the low type is actively producing. Therefore, the interest rate is equal to the after-tax return on capital for the agents of type L.

We set the maximum fraction of capital that can be collateralized to m = 1/3, taken from the estimates by Evans and Jovanovic (1989). We set γ = 1 − 1/1.5, i.e. an elasticity of substitution between labor and capital of 1.5, which is in the middle of the range documented by Piketty and Zucman (2015). As emphasized by Piketty and Zucman (2015), this elasticity allows for the positive co-movement between the capital to output ratio and the capital income share over transitional paths. We set the annual depreciation rate δ = 0.06, which is the common value used in the business cycle literature (see e.g. King and Rebelo (1999)). We normalize the productivity of the low type $A_L = 1$, and calibrate the productivity of the high type $A_H$ to match the annual interest rate of 0.04.

We calibrate the capital share α in the production function to match the aggregate labor income share of 0.6. We set the discount rate ρ to match the aggregate capital output ratio of 3.5. As we have shown in Theorem 4, the labor income share and capital output ratio help determine top end wealth inequality, i.e. the Pareto tail index. We target the average estimates of the Pareto tail index for the US economy inferred from Saez and Zucman (2016)’s top wealth shares. Note that, as highlighted in Theorem 4, with the other parameters set in Table 1, the only remaining parameters that pin down wealth inequality are the degrees of persistence of investment returns given by $\lambda_{HL}$ and $\lambda_{LH}$. Indeed, having calibrated all other parameters, we choose $\lambda_{HL}$ and $\lambda_{LH}$ so that the average duration of being more productive (type H) is 10 years and the Pareto tail index of the wealth distribution is 2.

The high level of top end wealth inequality comes from the persistent difference in the rates of return. In the calibration, the after-tax rate of return of the high type is 8.13% (around 10.2% levered return) compared to the 4% interest rate earned by the low type (or by producing themselves). On average this return difference lasts for 10 years, i.e. $\frac{1}{\lambda_{HL}}$. The return difference and its persistence are very reasonable, yet they are able to deliver high top end wealth inequality.23
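The levered return quoted above follows from the Case 1 portfolio in Theorem 1, $k^*_H = \frac{1}{1-m}$ and $b^*_H = -\frac{m}{1-m}$. The sketch below redoes that arithmetic with the numbers in the text (after-tax return 8.13%, interest rate 4%, m = 1/3); reading the 10.2% figure as the return on this portfolio is an assumption of the sketch, and the function name is hypothetical.

```python
def levered_return(R_H, r, m):
    """Return on an H-agent's wealth when borrowing up to the collateral limit.

    Uses the Case 1 portfolio k = 1/(1-m), b = -m/(1-m) from Theorem 1.
    """
    k = 1.0 / (1.0 - m)
    b = -m / (1.0 - m)
    return R_H * k + r * b


if __name__ == "__main__":
    R_H, r, m, lam_HL = 0.0813, 0.04, 1 / 3, 0.1  # values quoted in the text and Table 1
    print(round(100 * levered_return(R_H, r, m), 1))  # about 10.2 (percent)
    print(1 / lam_HL)  # expected duration of a high-type spell, in years
```

The gap between the levered return and the 4% interest rate is roughly 6 percentage points, which is the return difference that the calibration relies on.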

In Table 1, we have solved for the aggregate variables and the tail index without having to know the whole stationary distribution of wealth. To solve for the stationary distribution of wealth as a solution to the PDEs (24), we need to choose the distribution of initial labor productivity $\Psi$ as well as the redistribution functions $\Gamma_i$. We assume that $x_t$ follows a log normal distribution with standard deviation $\sigma_x$:

\[
\Psi \propto \exp\left(N\left(-\frac{\sigma_x^2}{2},\; \sigma_x^2\right)\right).
\]

In the baseline calibration, we choose $\sigma_x = 1.35$ so that the top 1% income share is 15% as documented by Piketty and Saez (2003).24

We also assume that the redistribution function follows a log normal distribution:

\[
\Gamma_i(\psi) \propto \exp\left(N\left(\mu_i,\; \sigma_\Gamma^2\right)\right).
\]

23The return difference of around 6% is close to the difference between the 90th percentile and 10th percentile returns estimated by Fagereng et al. (2016) using Norwegian household data.

The redistribution function also corresponds to wealth inequality at the beginning of the agents’ life. Using the PSID, Huggett et al. (2011) estimate the variance of the wealth to mean earnings ratio of male household heads aged 20 to 25 to be around 0.849.

Column 4 (Baseline) in Table 2 shows other statistics for the wealth distribution in the baseline calibration. Notice that our model not only matches the degree of wealth inequality observed in the U.S. data at the very top (Pareto tail index, top 0.1% and top 1% wealth shares) but also matches the statistics at lower wealth levels (top 5% and top 10% wealth shares, as well as the Gini coefficient and the fraction of agents with negative wealth). These moments of the wealth distribution are “out-of-sample” in our calibration in the sense that we do not target them while calibrating the parameters of the model.

4.2 The Decomposition of Wealth Inequality

Given the calibrated model in the previous subsection, we can look at different factorsthat determine wealth inequality. In some sense, we carry out a “decomposition” exercise.We ask, among the factors that potentially affect wealth inequality - return heterogeneity,earnings inequality, and initial wealth distribution - which factors contribute the most towealth inequality. Because the aggregate dynamics are not affected by changing earningsinequality (the distribution Ψ of initial productivity) and initial wealth distribution (theredistribution functions Γi), we are able to provide a very clean answer to this questionin our model. First and foremost, top end wealth inequality as measured by the Paretotail index is not affected by these changes, a direct application of Theorem 2.25 But we arealso interested in other moments of the wealth distribution including top wealth shares,Gini coefficients, and the fraction of agents with negative wealth. The following twosubsections carry out this investigation.

²⁴ This level of the standard deviation of earning is higher than Guvenen and Smith (2014)’s estimate of around 0.3. However, since we do not model heterogeneous growth rates in earning, our higher standard deviation captures some of the growth rate heterogeneity estimated by Guvenen and Smith (2014). We can also use the model with heterogeneous earning growth rates in Online Appendix E to match both the standard deviation in levels and in growth rates in Guvenen and Smith (2014).

²⁵ Benhabib et al. (2011) and Benhabib, Bisin, and Luo (2015) also emphasize this point, but in a partial equilibrium model. Benhabib et al. (2011) do not investigate how changing earnings inequality affects other moments of the wealth distribution besides the tail index, but Benhabib, Bisin, and Luo (2015) do.


Table 2: Wealth Inequality Statistics As Functions of Income Inequality (Lognormal Distribution)

                                 Standard Deviation of xt (σx)
Wealth Statistics                0.1      0.5      1.35 (Baseline)   1.5      2
Top 0.1% wealth share            21.4%    21.5%    22.1%             22.2%    22.7%
Top 1% wealth share              29.6%    29.7%    29.9%             30.0%    30.2%
Top 5% wealth share              44.1%    44.1%    44.3%             44.4%    44.7%
Top 10% wealth share             55.1%    55.2%    55.5%             55.6%    56.1%
Gini coefficient                 0.70     0.71     0.75              0.77     0.82
Fraction with negative wealth    10.0%    9.9%     9.2%              9.4%     10.7%

Table 3: Wealth Inequality Statistics As Functions of Income Inequality (Pareto Distribution)

                                 Pareto exponent γx
Wealth Statistics                3.0      2.5      2.0
Top 0.1% wealth share            21.5%    21.6%    21.7%
Top 1% wealth share              29.6%    29.7%    29.7%
Top 5% wealth share              44.1%    44.1%    44.2%
Top 10% wealth share             55.2%    55.2%    55.3%
Gini coefficient                 0.71     0.72     0.72
Fraction with negative wealth    10.0%    9.9%     9.1%

4.2.1 Earnings Inequality

In this subsection we investigate the importance of earnings inequality for wealth inequality. We consider two families of distributions for permanent earning: log normal and Pareto distributions.

First, we consider the log normal distribution of xt as in the baseline calibration. Table 2 shows different statistics of the wealth distribution when we vary σx. As we can see in the table, varying σx does not significantly alter the wealth distribution.

Second, we consider cases in which xt follows a Pareto distribution with Pareto exponent γx:

xt ∝ Pareto(γx).

From the top wealth shares estimates of Piketty and Saez (2003), we can back out the estimates of γx to be between 2 and 3 (a higher exponent corresponds to lower earnings inequality). Table 3 shows different statistics of the wealth distribution when we vary γx. We can see again that varying earnings inequality by changing the exponent of the productivity distribution does not change wealth inequality significantly.
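The text does not spell out how an exponent is backed out from top shares. One standard route, sketched here under the assumption that the distribution is exactly Pareto with exponent γ > 1, uses the fact that the top fraction p then holds a share p^(1 − 1/γ) of the total. The function names below are illustrative and are not taken from the paper.

```python
import math


def pareto_top_share(p, gamma):
    """Share of the total held by the top fraction p under an exact Pareto law.

    Assumes gamma > 1 so that the mean is finite.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a finite mean")
    return p ** (1.0 - 1.0 / gamma)


def pareto_exponent_from_top_share(p, share):
    """Invert the top-share formula: recover gamma from one observed top share."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    if not p < share < 1.0:
        raise ValueError("share must lie strictly between p and 1")
    return 1.0 / (1.0 - math.log(share) / math.log(p))


if __name__ == "__main__":
    # Hypothetical inputs, not data from the paper.
    for gamma in (2.0, 2.5, 3.0):
        share = pareto_top_share(0.01, gamma)
        print(f"gamma = {gamma}: top 1% share = {share:.3f}")
```

Under this relation a lower exponent implies a larger top share, which matches the remark that a higher exponent corresponds to lower inequality. Real data are only approximately Pareto in the tail, so an exponent recovered this way depends on which percentile is used.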

In a recent paper, Kaymak and Poschke (2016) show that rising earnings inequality accounts for more than two thirds of the increase in wealth inequality in the U.S. from the 1970s until recent years.²⁶ In their model, wealth inequality arises from very large and temporary earnings shocks à la Castaneda et al. (2003). We have shown that in our model, in which wealth inequality is driven by heterogeneous returns, changing earnings inequality matters little for wealth inequality.

Table 4: Wealth Inequality Statistics As Functions of Initial Wealth Distribution

                                 Standard Deviation of log ψ (σ2Γ)
Wealth Statistics                0.1      0.849 (Baseline)   2.0
Top 0.1% wealth share            21.6%    22.1%              23.4%
Top 1% wealth share              25.6%    29.9%              38.5%
Top 5% wealth share              31.9%    44.3%              60.1%
Top 10% wealth share             38.6%    55.5%              73.2%
Gini coefficient                 0.63     0.75               0.90
Fraction with negative wealth    3.8%     9.2%               20.0%

4.2.2 Initial Wealth Distribution

In this subsection, we look at how wealth redistribution affects wealth inequality. Recall that in our model, the wealth redistribution functions, Γi, also correspond to wealth inequality at the beginning of the agents’ lifetime. Table 4 shows the statistics of the stationary wealth distribution when we vary σ2Γ around the baseline value of 0.849 estimated by Huggett et al. (2011). We can see that varying the variance of the redistribution function does not affect the top 0.1% wealth share significantly. However, at lower percentiles, such as the top 1% to the top 10%, as well as for Gini coefficients and the fraction of agents with negative wealth, changing wealth redistribution significantly affects these statistics.

4.3 Transitional Paths and Dynamics of Wealth Inequality

Piketty (2014) and Saez and Zucman (2016) show that top end wealth inequality in the major advanced economies, including the U.S., from 1900 to the present follows U-shaped patterns, i.e. wealth inequality was very high at the beginning of the century, then declined through the Great Depression, WWI and WWII, until the 1970s, and has been rising since then. At the aggregate level, Piketty (2014) and Piketty and Zucman (2014) find that the aggregate wealth to income ratio and capital income share have increased in these countries since the 1970s. In Theorem 4, we argue that, in a stationary balanced growth path, top end inequality and these aggregate variables are closely related.²⁷ In this subsection, we investigate the joint dynamics of top end wealth inequality and the aggregate variables over transitional paths, starting from the calibration in the last subsection. We focus on three exercises. In the first one, we look at the dynamics after a large destruction of physical capital, which was experienced by the major European countries during WWI and WWII. In the second one, we study the dynamics after an (unexpected) increase, then an (unexpected) decrease in corporate tax, which resembles the U.S. experience of high corporate tax during WWII (to finance the war) and up until the 1970s and lower corporate tax thereafter (see the discussions in Piketty (2014) and Saez and Zucman (2016)). In the last exercise, we look at the transitional path after an increase in m, which corresponds to the deregulation of the financial system in the U.S. since the 1980s.

²⁶ Empirically, Saez and Zucman (2016) also argue for changing earning inequality as one of the main causes of changing wealth inequality, the other being changing saving rates.

Our solution of the transitional paths is facilitated by the decoupling of the dynamics of the aggregate variables and the distribution of wealth. As shown in Lemma 2, we can first solve for the dynamics of (WH,t, WL,t, qt) over the transitional paths. Then we back out the dynamics of gt and Xt using Lemma 3, and use these variables to solve for the dynamics of the wealth distribution using Lemma 4. Wealth distributions over the transitional paths are computed using the finite difference method detailed in Achdou et al. (2015).

4.3.1 Capital Destruction

We consider the following thought experiment: before period 0, the economy is at its steady state; at period 0, an unexpected shock hits the economy and destroys the same fraction, 20%, of the financial wealth of everyone in the economy. We study the recovery to the initial steady state following this wealth-destruction shock.

Figure 1 plots the Pareto tail index, aggregate growth rate, labor income share, and capital output ratio along the transition path. We highlight several results: (1) the Pareto tail index gradually rises for the first several decades and declines afterwards, i.e. top end wealth inequality decreases then increases; (2) the labor income share decreases;²⁸ and (3) the capital output ratio increases.

These patterns are consistent with the joint dynamics of wealth inequality, the capital to income ratio, and the labor income share observed by Piketty (2014), Piketty and Zucman (2014), and Saez and Zucman (2016). However, the magnitude of the changes in wealth inequality is too small compared to that observed in the data. Simple calculations based on Saez and Zucman (2016) show that the Pareto tail index varies between 1.4 and 2 between the 1940s and recent years.

²⁷ To be more precise, in Theorem 4, wealth and earning are divided by total output (before depreciation) instead of total income (output after depreciation). But in the context of the model, these ratios are more natural, as argued by Krusell and Smith (2015).

²⁸ Right after the shock, high labor income share leads to the initial rise in the tail index, i.e. reduction in wealth inequality, despite the widened return differences due to the higher levels of return to capital after the destruction of capital.

[Figure 1 has four panels, each plotted against t from 0 to 300: Pareto tail index; Interest rate; Capital to output ratio; Labor income share.]

Figure 1: Transition path after a 20% destruction of total capital stock.

4.3.2 Changing Corporate Tax

Now, we consider another experiment in which the corporate tax rate increases unexpectedly from 20% to 30% and stays there for 20 years. Then the tax rate decreases back to 20%, unexpectedly, and stays there forever. Figure 2 depicts the dynamics of top end wealth inequality and the aggregate variables over the transitional path. Similar to the previous experiment, the Pareto tail index gradually rises for the two decades when corporate tax increases, then declines afterwards. However, the magnitude of variation is significantly larger than in the previous experiment. The dynamics of the capital to output ratio and labor income share are also different. The capital to output ratio decreases then increases, and the labor income share increases then decreases.

[Figure 2 has four panels, each plotted against t from 0 to 300: Pareto tail index; Interest rate; Capital to output ratio; Labor income share.]

Figure 2: Transition path after an unexpected increase then decrease in corporate tax.

4.3.3 Financial Deregulation

In the last experiment, we examine the transitional path after m increases to 0.4, from its initial value of 1/3 in the baseline calibration. In the new steady state the Pareto tail index is around 1.85 compared to 2 in the baseline calibration, i.e. top end wealth inequality increases. As discussed in Proposition 2, we start in a region of parameters in which the leverage effect dominates the return equalization effect. Therefore an increase in m leads to an increase in top end wealth inequality.

Figure 3 depicts the joint dynamics of top end wealth inequality and the aggregate variables over the transitional path. While top end wealth inequality increases over time as observed in U.S. data, the aggregate capital to output ratio decreases and labor income share increases over time, unlike what is documented by Piketty and Zucman (2014).

[Figure 3 has four panels, each plotted against t from 0 to 300: Pareto tail index; Interest rate; Capital to output ratio; Labor income share.]

Figure 3: Transition path after an unexpected increase in m.

We also notice that the Pareto tail index evolves very slowly over the transitional paths in Subsections 4.3.1-4.3.3 despite the heterogeneous rates of return. In a recent study, Gabaix et al. (2015, Online Appendix) argue that heterogeneous rates of return should deliver fast transition. However, the authors consider a partial equilibrium setting in which the rates of return are given exogenously. In our paper, by contrast, the rates of return are endogenously determined over the transitional paths, which reduces the speed of transition for wealth inequality.

5 Empirical Analysis

The calibration in Section 4 shows that the model requires significant persistent heterogeneous returns to generate the Pareto tail index of the U.S. wealth distribution. In this section we provide some empirical support for the theoretical assumption of heterogeneous and persistent returns to wealth. Using PSID surveys, we first show that there is great heterogeneity in returns to wealth across households. In particular, wealthy households also receive higher returns. Then we show that the returns are persistent by measuring the correlation of returns to wealth across years; we also provide evidence that the persistence has risen during the last several decades. Next we show that the majority of heterogeneity is explained by households’ fixed effects instead of observable characteristics. Lastly, we show that the wealth mobility of different wealth classes is consistent with the persistent heterogeneous returns to wealth.

We start by describing the PSID data and the definitions of variables. Then we present the empirical findings.

5.1 Data

The PSID keeps track of around 5,000 original families and their descendants. Besides its excellent coverage of demographic and income information, the three waves in 1984, 1989, and 1994 also include wealth supplements that provide information on household wealth and its composition. Starting from 1999, PSID becomes biennial and surveys wealth information every wave. The quality of the wealth survey in PSID is regarded as quite high, as indicated by the high response rate to its questions (see e.g. Hurst et al. (1998) for a discussion).

Our analysis relies on the three early and all later waves that have wealth information. We define the key variables as follows:

• Wealth: The sum of net worths²⁹ of (1) home equity; (2) other real estate; (3) business and farms; (4) vehicles; (5) stocks; (6) savings, checking accounts, and certificates of deposit; (7) bonds, insurance and other assets; (8) IRA accounts; (9) minus other debts.

• Core Asset: Wealth defined above, excluding other debts, home equity, and vehicles.

• Asset Income: The following income components from head and wife: asset income from farm and business, rent, interest, dividends, income from royalty and trust funds.

²⁹ PSID does not report worths and debts separately until 2013. Therefore, while in our theoretical analysis return is defined on asset, we measure return on net wealth in the data.

We define the “Return to Wealth” as the ratio “Asset Income” / “Core Asset”. There are several important considerations:

1. We exclude capital gains. PSID surveys active saving starting from 1989 (e.g. purchases and sales of real estate between survey waves), so in theory one can impute capital gains as a residual from the changes in stock variables and flow variables. However, since our focus is on the correlation of returns across waves, the measurement error in such an imputation would bias our estimates downward. Therefore, we exclude capital gains in our analysis and conjecture that the returns from capital gains co-move with the normal returns.³⁰

2. We exclude home equity and vehicles when computing returns to wealth, because these two assets are in essence durable consumption and usually do not generate monetary returns. The results we report are completely robust to this definition.
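The definition above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and the treatment of households with non-positive core assets (skipped here) is an assumption, since the text does not say how they are handled. The 1% trimming on each side follows the outlier rule stated in the notes to Figure 4 and Table 5.

```python
def return_to_wealth(asset_income, core_asset):
    """Asset income divided by core asset; None when the ratio is undefined."""
    if core_asset <= 0:
        # Assumption: households without positive core assets are skipped.
        return None
    return asset_income / core_asset


def trim_tails(values, percent=1):
    """Drop the lowest and highest `percent` percent of observations."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # observations dropped on each side
    return ordered[k:len(ordered) - k]


if __name__ == "__main__":
    # Hypothetical (asset income, core asset) pairs, not PSID data.
    households = [(50.0, 1000.0), (10.0, 0.0), (-20.0, 400.0)]
    returns = [return_to_wealth(a, c) for a, c in households]
    valid = [r for r in returns if r is not None]
    print(trim_tails(valid))
```

With only a handful of observations the trimming step drops nothing; it starts to bind once there are at least 100 valid returns.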

Additional description of the data can be found in Online Appendix H. Now we are ready to present the facts.

5.2 Heterogeneity and Persistence in Returns to Wealth

First, there is great heterogeneity in returns to wealth across households. Figure 4 plots the histogram of returns to wealth. As shown, most households report returns close to 0, but the distribution exhibits a long right tail.

One most noticeable feature of return heterogeneity is that the returns differ across wealth classes. Figure 5 presents the average annual returns to wealth for families in each wealth quintile. While the bottom 20% of families, on average, face an annual return of less than 2%, wealthy families in the top 20%, on average, receive an annual return more than double that. Note that since we exclude capital gains from the analysis, the overall return is low, but the comparisons across groups should remain valid. The returns for families in every quintile decrease over time, partly reflecting the declining interest rates during the period.

³⁰ Similar to what we do, despite much more household level data from Norway, Fagereng et al. (2016) have to exclude capital gains.

[Figure 4 is a histogram. Vertical axis: Percent. Horizontal axis: Return to wealth, from −0.2 to 0.6.]

Note: Return to wealth is defined as asset income divided by core asset (see text for detailed definitions). This histogram pools all observations from wealth surveys of PSID from 1984 to 2013, and excludes the top and bottom 1% returns as outliers.

Figure 4: Histogram of returns to wealth

We exploit the panel information and compute the correlation of annual returns to wealth between two adjacent waves. The results are presented in Figure 6. Note that for waves 1984-1989-1994-1999, the gap between waves is 5 years, and from 1999 to 2013, the gap is 2 years. The correlations are comparable only for pairs of waves with the same number of gap years.
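The wave-to-wave correlation described above can be sketched as a Pearson correlation over households observed in both waves. This is an illustrative sketch only: the dictionary layout (household id mapped to return) and the function name are assumptions, and it ignores the survey weights and bootstrap standard errors used for Figure 6.

```python
from math import sqrt


def wave_correlation(returns_first, returns_second):
    """Pearson correlation of returns for households present in both waves.

    Each argument maps a household id to that household's return to wealth.
    """
    common = sorted(set(returns_first) & set(returns_second))
    if len(common) < 2:
        raise ValueError("need at least two households observed in both waves")
    x = [returns_first[h] for h in common]
    y = [returns_second[h] for h in common]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("returns have no variation in one of the waves")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical returns, not PSID data; household 4 drops out of the panel.
    wave_a = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.50}
    wave_b = {1: 0.02, 2: 0.04, 3: 0.06}
    print(wave_correlation(wave_a, wave_b))
```

Restricting to households present in both waves is the natural reading of "adjacent two waves", but attrition between waves means each pair of waves may be computed on a different subsample.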

First of all, for all pairs of waves, the correlation is significantly positive, with the 5-year correlation at a level of around 0.3, and the 2-year correlation at a level of 0.4. Second, the correlation is rising over time for the 1984-1989-1994-1999 periods. The rise is significant at the one standard error level. The point estimate for persistence continues to rise after 1999, and only starts to drop after the financial crisis. It starts trending up again during the recovery.

[Figure 5 has one panel per wealth quintile: Bottom 20%; [20%,40%); [40%,60%); [60%,80%); Top 20%. Each panel plots the Average Annual Return with 2SE Bands. Vertical axis: Percent. Horizontal axis: Year, from 1980 to 2010.]

Note: Standard errors are based on bootstrap procedures with 100 repetitions.

Figure 5: Mean annual return to wealth by wealth quintiles. 1984-2013.

How is the heterogeneity in returns to wealth correlated with other household characteristics? We regress return to wealth on a set of observable household characteristics, and the results are reported in Table 5 (note that we do not attempt to claim any causal relationships). The set of covariates that we are interested in includes demographic information, such as the head’s age, race, and years of education, and households’ financial information. Column (1) reports the regression pooling all valid observations together. As shown, returns to wealth do not differ across age groups or education groups, but white households, on average, earn a higher return than others. Home owners do not appear to earn higher returns than renters; households who are actively engaged in business earn a higher return; households who participate in stock markets have a lower return.³¹

Table 5: Returns to wealth and household characteristics

Dependent variable: Return to wealth

                        (1)            (2)
Age                     -6.31e-05      -0.00402
                        (0.000782)     (0.00666)
Age2                    2.37e-06       2.03e-06
                        (7.56e-06)     (1.35e-05)
White                   0.0128***
                        (0.00427)
Year of education       0.000620       0.00531
                        (0.000748)     (0.00344)
Home owner              0.00216        -0.00161
                        (0.00461)      (0.00792)
Has business            0.0594***      -0.00825
                        (0.00488)      (0.00789)
Has stocks              -0.0205***     -0.0217***
                        (0.00418)      (0.00598)
Log family income       0.00650**      0.0114**
                        (0.00264)      (0.00457)
Year FE                 yes            yes
Household FE            no             yes
Observations            25,003         25,003
R-squared               0.013          0.428

No. of Households: 8,965

Notes: only households with the same head member across waves are included in the regressions. Return to wealth is defined as asset income divided by core asset. The bottom and top 1% of returns to wealth are removed as outliers. Column (1) pools all valid observations from year 1984 to 2013. Column (2) includes household dummies as regressors. Standard errors in parentheses are robust and clustered by households.

[Figure 6 has two panels, each with 2SE bands and a vertical axis from 0.2 to 0.5: 5-Year Correlation for the wave pairs 1984−1989, 1989−1994, and 1994−1999; 2-Year Correlation for wave pairs from 1999−2001 through 2011−2013.]

Note: Standard errors are based on bootstrap procedures with 100 repetitions.

Figure 6: Correlation of return to wealth across waves. 1984-2013.

The key property of the regression results is that the covariates altogether only explain a small part of the observed heterogeneity in returns, as indicated by the small R-squared statistic. The regression reported in Column (2) controls for households’ fixed effects by adding dummies for each household in the regression. We can see that the fixed effects absorb most of the heterogeneity explained by observables. For example, being an entrepreneur no longer earns a higher return. The R-squared statistic rises to 42.8% compared to less than 2% without fixed effects: households’ fixed effects explain the majority of return heterogeneity. The fixed effects include households’ unobservable characteristics such as risk preferences, inherited traits of investment ability, or simply luck. Finally, Figure 7 plots the histogram for the household dummies in the regression reported in Column (2). The histogram exhibits similar patterns as the one for returns to wealth: the distribution has clustered mass around zero and a long right tail.

We now discuss the link between the set of facts reported in this subsection and the assumptions in the theoretical models. First of all, the positive correlation between wealth class and returns to wealth is consistent with the model implications. Even though in the model we assume that the draws of investment return shocks do not depend on wealth levels, since shocks are persistent, the correlation between wealth and productivity levels measured at the stationary distribution is positive. It is worth emphasizing that this positive correlation can only be generated with persistent investment returns rather than i.i.d. shocks. Second, the fact that the majority of return heterogeneity comes from households’ fixed effects instead of observable characteristics motivates our modeling choice that the productivity draws do not depend on agents’ characteristics such as the level of human capital, i.e., labor productivity in our model.

³¹ This might be due to the fact that we do not account for capital gains while most of the returns to stocks come from capital gains.
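To make the role of household fixed effects concrete, the sketch below computes the R-squared of a regression of returns on household dummies alone. In that special case the fitted value for each observation is its household's mean return, so the R-squared equals the share of total variance that lies between households. This is a simplified illustration and not the paper's specification, which also includes year effects and the observable covariates; the function name and the sample data are hypothetical.

```python
def fixed_effect_r_squared(returns_by_household):
    """R-squared from regressing returns on a full set of household dummies.

    `returns_by_household` maps a household id to its list of returns
    across waves. Fitted values are the household means.
    """
    all_returns = [r for rs in returns_by_household.values() for r in rs]
    grand_mean = sum(all_returns) / len(all_returns)
    total_ss = sum((r - grand_mean) ** 2 for r in all_returns)
    if total_ss == 0.0:
        raise ValueError("returns have no variation")
    residual_ss = 0.0
    for rs in returns_by_household.values():
        household_mean = sum(rs) / len(rs)
        residual_ss += sum((r - household_mean) ** 2 for r in rs)
    return 1.0 - residual_ss / total_ss


if __name__ == "__main__":
    # Hypothetical panels, not PSID data.
    persistent = {"a": [0.01, 0.02], "b": [0.10, 0.11]}
    transitory = {"a": [0.01, 0.10], "b": [0.02, 0.11]}
    print(fixed_effect_r_squared(persistent))  # close to 1
    print(fixed_effect_r_squared(transitory))  # close to 0
```

When returns differ mostly across households and little within a household over time, the statistic is close to one; when each household's returns fluctuate around a common mean, it is close to zero. Note that with few waves per household, household means absorb some pure noise, so this in-sample statistic overstates true persistence.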

[Figure 7 is a histogram. Vertical axis: Percent. Horizontal axis: estimated household dummy, from −0.5 to 1.]

Figure 7: Histogram of household dummies in the return to wealth regression

5.3 Persistence in Returns Implied by Wealth Class Mobility

In this subsection, we show that the conditional wealth mobility is indicative of persistent returns to wealth. We exploit the panel feature of PSID and assess wealth mobility by inspecting the transition probability from one wealth class to another across survey waves.

We divide households into 3 wealth classes based on their net wealth. Each wealth class consists of an equal number of observations (weights adjusted) in each survey year. We call the bottom 1/3 of households the bottom class, the middle 1/3 the middle class, and the top 1/3 the top class. We further categorize households based on the returns to wealth that they receive in the same year. We call households who receive returns lower than the median within each wave the low return group, and those who receive returns higher than the median the high return group. Then we compute the transition probability between wealth classes from one wave to the next, conditional on being in a given return group in the initial survey year.
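The construction just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: terciles are formed by unweighted rank (the paper adjusts for survey weights), households with a return exactly equal to the median are placed in the low group, and all function names and sample numbers are hypothetical.

```python
from collections import defaultdict

CLASSES = ("Bottom", "Middle", "Top")


def tercile_labels(wealth):
    """Assign each household to a wealth tercile by unweighted rank."""
    n = len(wealth)
    order = sorted(range(n), key=lambda i: wealth[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = CLASSES[min(3 * rank // n, 2)]
    return labels


def return_groups(returns):
    """Split households at the median return (ties go to the low group)."""
    ordered = sorted(returns)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return ["High" if r > median else "Low" for r in returns]


def conditional_transitions(wealth_start, wealth_end, returns_start):
    """Transition probabilities between wealth classes by initial return group.

    Returns a dict keyed by (return group, initial class); each value maps
    the end-of-period class to its estimated probability.
    """
    start = tercile_labels(wealth_start)
    end = tercile_labels(wealth_end)
    groups = return_groups(returns_start)
    counts = defaultdict(lambda: defaultdict(int))
    for s, e, g in zip(start, end, groups):
        counts[(g, s)][e] += 1
    probs = {}
    for key, row in counts.items():
        total = sum(row.values())
        probs[key] = {c: row.get(c, 0) / total for c in CLASSES}
    return probs


if __name__ == "__main__":
    # Hypothetical six-household panel, not PSID data.
    table = conditional_transitions(
        wealth_start=[1, 2, 3, 4, 5, 6],
        wealth_end=[1, 2, 5, 4, 3, 6],
        returns_start=[0.0, 0.1, 0.0, 0.1, 0.0, 0.1],
    )
    for key in sorted(table):
        print(key, table[key])
```

Each row of the resulting table sums to one by construction. Averaging such tables across pairs of waves would be one way to obtain a single matrix per gap length.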

Table 6: 5-year Transition Probability Between Wealth Classes, 1984-1989-1994-1999

                 Low Return                  High Return
t−s ↓ / t →      Bottom   Middle   Top       Bottom   Middle   Top
Bottom           0.62     0.33     0.05      0.43     0.46     0.11
Middle           0.2      0.61     0.19      0.1      0.59     0.31
Top              0.07     0.24     0.7       0.02     0.1      0.88

Table 7: 2-year Transition Probability Between Wealth Classes, 1999-2013

                 Low Return                  High Return
t−s ↓ / t →      Bottom   Middle   Top       Bottom   Middle   Top
Bottom           0.6      0.38     0.02      0.6      0.33     0.07
Middle           0.14     0.71     0.15      0.06     0.66     0.29
Top              0.01     0.16     0.83      0.01     0.06     0.93

Tables 6 and 7 display the results. Each row corresponds to the wealth class in the initial year and each column corresponds to the wealth class in the next year of survey. For example, in Table 6, 0.62 in the first row and first column means that the probability of being in the bottom class in year t conditional on being in the bottom class and low return group in year t − s is 62%, where s is the number of gap years between waves. These statistics are constructed for each wave of survey and then averaged across waves.

We highlight in bold pairs of statistics to compare. First, the upward mobility, i.e. the probability of transiting from the bottom class to the top class, is higher if households received a higher return to wealth in the initial year. This indicates that households who initially received a higher return to wealth are likely to persist in their higher returns and move up the wealth class ladder. We would not expect to observe such patterns if returns to wealth were i.i.d. draws. Similarly, the downward mobility, i.e. the probability of transiting from the top class to the bottom class, is lower if households received a higher return to wealth in the initial year.

Table 8: 5-year Transition Probability Between Wealth Classes, 1984-1989-1994-1999

                 Staying Low                 Low to High
t−s ↓ / t →      Bottom   Middle   Top       Bottom   Middle   Top
Bottom           0.57     0.38     0.05      0.41     0.49     0.1
Middle           0.19     0.64     0.17      0.09     0.64     0.26
Top              0.08     0.25     0.67      0.04     0.23     0.73

                 High to Low                 Staying High
t−s ↓ / t →      Bottom   Middle   Top       Bottom   Middle   Top
Bottom           0.44     0.46     0.1       0.32     0.54     0.14
Middle           0.14     0.55     0.31      0.06     0.59     0.35
Top              0.03     0.11     0.86      0        0.09     0.91

To strengthen the argument that the persistence in wealth classes is indicative of the persistence in returns to wealth, we compute the same transition probability further conditional on return groups in both the initial and the later survey years. Table 8 displays the results. "Staying Low" corresponds to households who started within the low return group in year t − s and stay in the low return group in year t. Definitions are similar for the "Low to High", "High to Low", and "Staying High" groups. As shown, coming from a low return group, households are more likely to transit out of the bottom class if they switch to a high return group. Similarly, coming from a high return group, households are more likely to stay in the top class if they continue to receive high returns. Nevertheless, even conditioning on return groups in both years, households who have received a higher return in the initial year are still more likely to transit out of the bottom class or stay in the top class. This finding indicates that the initial returns do persist for many years.

6 Conclusion

We put forth a tractable neoclassical growth, incomplete markets model with heterogeneous investment returns. The model delivers a fat-tailed distribution of wealth and can be calibrated to match the Pareto tail index estimated from the U.S. data. We show that the tail index depends on factors, observable and unobservable, other than the simple gap between the interest rate and growth rate in the economy put forth by Piketty (2014). Over the transitional paths, the model can replicate qualitative features of the joint dynamics of wealth inequality and aggregate variables observed in the data. However, the speed and magnitude of the changes in wealth inequality fall short of those in the data. We also find some empirical evidence supporting persistent differences in returns to wealth from PSID surveys.

References

Acemoglu, D. (2009). Introduction to Modern Economic Growth. Princeton University Press.

Acemoglu, D. and D. Cao (2015). Innovation by entrants and incumbents. Journal of Economic Theory 157, 255–294.

Acemoglu, D. and J. A. Robinson (2015, February). The rise and decline of general laws of capitalism. Journal of Economic Perspectives 29(1), 3–28.

Achdou, Y., J. Han, J.-M. Lasry, P.-L. Lions, and B. Moll (2015). Heterogeneous agent models in continuous time. mimeo.

Aiyagari, S. R. (1994). Uninsured idiosyncratic risk and aggregate saving. The Quarterly Journal of Economics 109(3), 659–684.

Alvaredo, F. and E. Saez (2009). Income and wealth concentration in Spain from a historical and fiscal perspective. Journal of the European Economic Association 7(5), 1140–1167.

Angeletos, G.-M. (2007). Uninsured idiosyncratic investment risk and aggregate saving. Review of Economic Dynamics 10(1), 1–30.

Barro, R. and X. Sala-i-Martin (2004). Economic Growth (2nd ed.). MIT Press.

Benhabib, J. and A. Bisin (2016). Skewed wealth distributions: Theory and empirics. Working Paper 21924, National Bureau of Economic Research.

Benhabib, J., A. Bisin, and M. Luo (2015). Wealth distribution and social mobility in the US: A quantitative approach. Working Paper 21721, National Bureau of Economic Research.

Benhabib, J., A. Bisin, and S. Zhu (2011). The distribution of wealth and fiscal policy in economies with finitely lived agents. Econometrica 79(1), 123–157.

Benhabib, J., A. Bisin, and S. Zhu (2015). The wealth distribution in Bewley economies with capital income risk. Journal of Economic Theory 159, Part A, 489–515.

Benhabib, J., A. Bisin, and S. Zhu (2016). The distribution of wealth in the Blanchard-Yaari model. Macroeconomic Dynamics 20, 466–481.

Buera, F. J. (2009). A dynamic model of entrepreneurship with borrowing constraints: theory and evidence. Annals of Finance 5(3), 443–464.

Buera, F. J. and B. Moll (2015). Aggregate implications of a credit crunch: The importance of heterogeneity. American Economic Journal: Macroeconomics 7(3), 1–42.

Buera, F. J. and Y. Shin (2011). Self-insurance vs. self-financing: A welfare analysis of the persistence of shocks. Journal of Economic Theory 146(3), 845–862. Incompleteness and Uncertainty in Economics.

Buera, F. J. and Y. Shin (2013). Financial frictions and the persistence of history: A quantitative exploration. Journal of Political Economy 121(2), 221–272.

Cagetti, M. and M. De Nardi (2006). Entrepreneurship, frictions, and wealth. Journal of Political Economy 114(5), 835–870.

Calvet, L. E., J. Y. Campbell, and P. Sodini (2007). Down or out: Assessing the welfare costs of household investment mistakes. Journal of Political Economy 115(5), 707–747.

Cass, D. (1965). Optimum growth in an aggregative model of capital accumulation. The Review of Economic Studies 32(3), 233–240.

Castaneda, A., J. Diaz-Gimenez, and J.-V. Rios-Rull (2003). Accounting for the U.S. earnings and wealth inequality. Journal of Political Economy 111(4), 818–857.

Evans, D. S. and B. Jovanovic (1989). An estimated model of entrepreneurial choice under liquidity constraints. Journal of Political Economy 97(4), 808–827.

Fagereng, A., L. Guiso, D. Malacrino, and L. Pistaferri (2016). Heterogeneity in returns to wealth and the measurement of wealth inequality. Stanford University Working Paper.

Gabaix, X. (2009). Power laws in economics and finance. Annual Review of Economics 1(1), 255–294.

Gabaix, X., J.-M. Lasry, P.-L. Lions, and B. Moll (2015, July). The Dynamics of Inequality. Working Paper 21363, National Bureau of Economic Research.

Guvenen, F. and A. A. Smith (2014). Inferring labor income risk and partial insurance from economic choices. Econometrica 82(6), 2085–2129.

Huggett, M. (1993). The risk-free rate in heterogeneous-agent incomplete-insurance economies. Journal of Economic Dynamics and Control 17(5-6), 953–969.

Huggett, M. (1996). Wealth distribution in life-cycle economies. Journal of Monetary Economics 38(3), 469–494.

Huggett, M., G. Ventura, and A. Yaron (2011, December). Sources of lifetime inequality. American Economic Review 101(7), 2923–54.

Hurst, E., M. C. Luoh, F. P. Stafford, and W. G. Gale (1998). The Wealth Dynamics of American Families, 1984-94. Brookings Papers on Economic Activity 1998(1), 267–337.

Jones, C. I. (2015). Pareto and Piketty: The macroeconomics of top income and wealth inequality. Journal of Economic Perspectives 29(1), 29–46.

Kaymak, B. and M. Poschke (2016). The evolution of wealth inequality over half a century: The role of taxes, transfers and technology. Journal of Monetary Economics 77, 1–25. "Inequality, Institutions, and Redistribution" held at the Stern School of Business, New York University, April 24-25, 2015.

King, R. G. and S. T. Rebelo (1999). Chapter 14 Resuscitating real business cycles. In J. B. Taylor and M. Woodford (Eds.), Handbook of Macroeconomics, Volume 1, Part B, pp. 927–1007. Elsevier.

Krusell, P. and A. A. Smith (2015). Is Piketty's second law of capitalism fundamental? Journal of Political Economy 123(4), 725–748.

Mankiw, N. G. (2015, May). Yes, r > g. So what? American Economic Review 105(5), 43–47.

Moll, B. (2012). Inequality and financial development: A power-law Kuznets curve. Working paper, Princeton University.

Moll, B. (2014). Productivity losses from financial frictions: Can self-financing undo capital misallocation? American Economic Review 104(10), 3186–3221.

Nirei, M. and S. Aoki (2016). Pareto distribution of income in neoclassical growth models. Review of Economic Dynamics 20, 25–42.

Piketty, T. (2014). Capital in the 21st century. Harvard University Press.


Piketty, T. and E. Saez (2003). Income inequality in the United States, 1913-1998. The Quarterly Journal of Economics 118(1), 1–41.

Piketty, T. and G. Zucman (2014). Capital is back: Wealth-income ratios in rich countries, 1700-2010. The Quarterly Journal of Economics.

Piketty, T. and G. Zucman (2015). Chapter 15 - Wealth and inheritance in the long run. In A. B. Atkinson and F. Bourguignon (Eds.), Handbook of Income Distribution, Volume 2 of Handbook of Income Distribution, pp. 1303–1368. Elsevier.

Quadrini, V. (2000). Entrepreneurship, saving, and social mobility. Review of Economic Dynamics 3(1), 1–40.

Ray, D. (2015). Nit-Piketty: A comment on Thomas Piketty's Capital in the Twenty-First Century. CESifo Forum 16(1), 19–25.

Saez, E. and G. Zucman (2016). Wealth inequality in the United States since 1913: Evidence from capitalized income tax data. The Quarterly Journal of Economics.

Stiglitz, J. E. (1969). Distribution of income and wealth among individuals. Econometrica 37(3), 382–397.

Toda, A. A. (2014). Incomplete market dynamics and cross-sectional distributions. Journal of Economic Theory 154, 310–348.

Appendix

A Stationary Equilibrium

Proof of Theorem 1. From the HJB equations (19), we can solve separately for the portfolio choice of the agents with investment productivity $i$ as:

$$\max_{k,b}\; R_i(e)k + rb \tag{33}$$

subject to constraints (9). In a BGP, if $r > R_H(e) > R_L(e)$, then (33) implies that

$$(k_H^*, b_H^*) = (k_L^*, b_L^*) = (0, 1).$$


This, however, contradicts the labor market clearing condition, (23). Similarly, if $r < R_L(e)$, (33) implies that

$$(k_H^*, b_H^*) = (k_L^*, b_L^*) = \left(\frac{1}{1-m},\, -\frac{m}{1-m}\right),$$

which also contradicts the bond market clearing condition. Therefore in a BGP, $r \in [R_L(e), R_H(e)]$.

Case 1: If $r = R_L(e) < R_H(e)$, then (33) implies that $(k_H^*, b_H^*) = \left(\frac{1}{1-m},\, -\frac{m}{1-m}\right)$.

Case 2: If $r \in (R_L(e), R_H(e))$, then (33) implies that $(k_H^*, b_H^*) = \left(\frac{1}{1-m},\, -\frac{m}{1-m}\right)$ and $(k_L^*, b_L^*) = (0, 1)$.

Case 3: If $r = R_H(e) > R_L(e)$, then $(k_L^*, b_L^*) = (0, 1)$.

In Case 1 and Case 2, Lemma 5 shows that $\frac{R_H(e) - mr}{1-m} - c_H^* > r - c_L^*$.

Lemma 5. In Case 1 and Case 2 in the proof of Theorem 1,

$$\frac{R_H(e) - mr}{1-m} - c_H^* > r - c_L^*. \tag{34}$$

Proof. We show this inequality for three cases separately: $\sigma = 1$, $\sigma < 1$, and $\sigma > 1$.

First, for the case $\sigma = 1$, from the HJB equations (19), we have $c_H^* = c_L^* = \rho + \lambda$, therefore (34) immediately follows from $R_H(e) > r$.

Second, for the case $\sigma < 1$: we first show that $v_H > v_L > 0$. Indeed, from the definition of value functions and from the fact that $u(c) > 0$ for all $c > 0$, we have $v_H, v_L > 0$. Assume by contradiction that $v_H \le v_L$. From the first-order condition in $c$ in (19), we obtain

$$(c_i^*)^{-\sigma} = (1-\sigma)v_i. \tag{35}$$

Plugging this into (19) and letting $r_i^* = R_i(e)k_i^* + rb_i^*$ (since we are in Case 1 or Case 2, $r_H^* > r_L^*$), we rewrite (19) as

$$\rho + \lambda + \lambda_{i,-i}\left(1 - \frac{v_{-i}}{v_i}\right) - \sigma c_i^* = (1-\sigma)r_i^*. \tag{36}$$

Because $0 < v_H \le v_L$, $\frac{v_L}{v_H} \ge 1 \ge \frac{v_H}{v_L}$. In addition, (35) implies that $c_H^* \ge c_L^*$. Therefore,

$$\rho + \lambda + \lambda_{HL}\left(1 - \frac{v_L}{v_H}\right) - \sigma c_H^* < \rho + \lambda + \lambda_{LH}\left(1 - \frac{v_H}{v_L}\right) - \sigma c_L^*,$$

which, together with (36) and $\sigma < 1$, contradicts the fact that $r_H^* > r_L^*$. Therefore, by contradiction, $v_H > v_L$. (35) implies that $c_H^* < c_L^*$. (34) then immediately follows from $r_H^* > r_L^*$.

Lastly, for the case $\sigma > 1$: as for the case $\sigma < 1$, we first show that $v_L < v_H < 0$. From the definition of value functions and from the fact that $u(c) < 0$ for all $c > 0$, we have $v_H, v_L < 0$. Assume by contradiction that $v_H \le v_L$. Because $v_H \le v_L < 0$, (35) implies that $c_H^* \le c_L^*$. So

$$\rho + \lambda + \lambda_{HL}\left(1 - \frac{v_L}{v_H}\right) - \sigma c_H^* \ge \rho + \lambda + \lambda_{LH}\left(1 - \frac{v_H}{v_L}\right) - \sigma c_L^*,$$

which, together with (36) and $\sigma > 1$, contradicts the fact that $r_H^* > r_L^*$. Therefore, by contradiction, $v_H > v_L$ (see footnote 32). We rewrite (7) as

$$\rho + \lambda + \lambda_{i,-i}\left(1 - \frac{v_{-i}}{v_i}\right) - r_i^* = -\sigma\left(r_i^* - c_i^*\right). \tag{37}$$

Because $v_L < v_H < 0$, $\frac{v_L}{v_H} > 1 > \frac{v_H}{v_L}$. In addition $r_H^* > r_L^*$, therefore

$$\rho + \lambda + \lambda_{HL}\left(1 - \frac{v_L}{v_H}\right) - r_H^* < \rho + \lambda + \lambda_{LH}\left(1 - \frac{v_H}{v_L}\right) - r_L^*.$$

This inequality, together with (37) and with the fact that $\sigma > 0$, implies $r_H^* - c_H^* > r_L^* - c_L^*$.

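Proof of Theorem 2, Part 1. Since the supports of $\Phi$ and $\Gamma_i$ are bounded, there exist $\bar{x}$ such that $\Phi'(x) = 0$ for all $x \ge \bar{x}$ and $\bar{\psi}$ such that $\Gamma_{i,t}(\psi) = 0$ for all $\psi \ge \bar{\psi}$ and $t \ge 0$. Let $\bar{\omega}$ be defined by

$$\frac{\bar{\omega}\,(W_H + W_L + q)}{(W_H + W_L)\frac{\lambda}{\lambda+n}} - \frac{q}{(W_H + W_L)\frac{\lambda}{\lambda+n}}\,\bar{x} = \bar{\psi}.$$

Then, from the definition of $J_i$,

$$J_i\left(\frac{\omega\,(W_H + W_L + q)}{(W_H + W_L)\frac{\lambda}{\lambda+n}},\; \frac{q}{(W_H + W_L)\frac{\lambda}{\lambda+n}}\right) = 0$$

for all $\omega \ge \bar{\omega}$. The PDEs (24) then become, for $\omega \ge \bar{\omega}$ and $i \in \{H, L\}$,

$$0 = -\omega\,(g_i^* + n)\frac{dp_i^*(\omega)}{d\omega} - p_i^*(\omega)\left(\lambda_{i,-i} + \lambda + n\right) + p_{-i}^*(\omega)\,\lambda_{i,-i}. \tag{38}$$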

Footnote 32: (35) then implies that $c_H^* > c_L^*$.


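We use the following change of variable:

$$z = \log(\omega), \qquad p_i(z) = p_i^*(e^z).$$

We rewrite (38) as

$$p_H'(z) = \frac{\lambda_{HL}}{g_H^* + n}\,p_L(z) - \frac{\lambda_{HL} + \lambda + n}{g_H^* + n}\,p_H(z),$$

$$p_L'(z) = \frac{\lambda_{LH}}{g_L^* + n}\,p_H(z) - \frac{\lambda_{LH} + \lambda + n}{g_L^* + n}\,p_L(z),$$

or equivalently,

$$\begin{bmatrix} p_H'(z) \\ p_L'(z) \end{bmatrix} = \begin{bmatrix} -\dfrac{\lambda_{HL} + \lambda + n}{g_H^* + n} & \dfrac{\lambda_{HL}}{g_H^* + n} \\[2ex] \dfrac{\lambda_{LH}}{g_L^* + n} & -\dfrac{\lambda_{LH} + \lambda + n}{g_L^* + n} \end{bmatrix} \begin{bmatrix} p_H(z) \\ p_L(z) \end{bmatrix}. \tag{39}$$

The eigenvalues of the matrix are the roots of the quadratic equation given in the statement of the theorem.

When $n$ is close to 0, the quadratic equation has two distinct real roots, one positive and one negative. Denote them by $\eta_1 < 0$ and $\eta_2 > 0$.

Given $\eta_1$ and $\eta_2$, the general solution of (39) is

$$p_H(z) = \Psi_H \exp(\eta_1 z) + \tilde{\Psi}_H \exp(\eta_2 z),$$

$$p_L(z) = \Psi_L \exp(\eta_1 z) + \tilde{\Psi}_L \exp(\eta_2 z).$$

Since

$$\lim_{z \to +\infty} p_H(z) = 0, \qquad \lim_{z \to +\infty} p_L(z) = 0,$$

we must have

$$p_H(z) = \Psi_H \exp(\eta_1 z), \qquad p_L(z) = \Psi_L \exp(\eta_1 z),$$

with $\Psi_H, \Psi_L$ positive. This is equivalent to

$$p_H^*(\omega) = \Psi_H\,\omega^{\eta_1}, \qquad p_L^*(\omega) = \Psi_L\,\omega^{\eta_1}.$$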
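The eigenvalue step above can be sketched numerically. The following is a minimal illustration, not the authors' code: it builds the 2x2 matrix in (39) and returns its two eigenvalues from the trace and determinant. The function name `tail_exponents` and all parameter values are hypothetical, chosen only so that $g_H^* > 0 > g_L^*$.

```python
import math


def tail_exponents(g_H, g_L, lam_HL, lam_LH, lam, n=0.0):
    """Return (eta_1, eta_2), the eigenvalues of the matrix in (39), smallest first."""
    a = -(lam_HL + lam + n) / (g_H + n)  # top-left entry
    b = lam_HL / (g_H + n)               # top-right entry
    c = lam_LH / (g_L + n)               # bottom-left entry
    d = -(lam_LH + lam + n) / (g_L + n)  # bottom-right entry
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues for these inputs")
    root = math.sqrt(disc)
    return (trace - root) / 2.0, (trace + root) / 2.0


# Hypothetical inputs (assumptions for illustration only).
eta_1, eta_2 = tail_exponents(g_H=0.03, g_L=-0.02, lam_HL=0.1, lam_LH=0.05, lam=0.02)
theta = -eta_1  # tail index implied by the decaying solution p_i*(omega) ~ omega**eta_1
```

With $n = 0$ and $g_H^* > 0 > g_L^*$ the determinant is negative, so the two roots have opposite signs, which is the configuration used in the proof.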

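Proof of Theorem 2, Part 2. In a stationary BGP, $W_t = N_t G_t W$ and $Q_t = N_t G_t q$. From the definition of total wealth,

$$\omega_t^h = \frac{w_t^h + q_t x_t^h}{(W_t + Q_t)/N_t} = \frac{w_t^h + q\,x_t^h}{G_t W + G_t q}.$$

Therefore

$$\Pr\left(\frac{w_t^h}{W_t/N_t} \ge w\right) = \int_x \Pr\left(\frac{w_t^h}{G_t W} \ge w,\; \frac{x_t^h}{G_t} = x\right) d\Phi(x) = \int_x \Pr\left(\omega_t^h \ge \frac{wW + qx}{W + q},\; \frac{x_t^h}{G_t} = x\right) d\Phi(x).$$

Now, writing $\underline{x}$ and $\bar{x}$ for the lower and upper bounds of the support of $\Phi$,

$$\Pr\left(\omega_t^h \ge \frac{wW + q\underline{x}}{W + q}\right) \ge \int_x \Pr\left(\omega_t^h \ge \frac{wW + qx}{W + q},\; \frac{x_t^h}{G_t} = x\right) d\Phi(x) \ge \Pr\left(\omega_t^h \ge \frac{wW + q\bar{x}}{W + q}\right).$$

By Part 1 in Theorem 2, when $w$ is sufficiently high,

$$\Pr\left(\omega_t^h \ge \frac{wW + q\underline{x}}{W + q}\right) = \Psi\left(\frac{wW + q\underline{x}}{W + q}\right)^{-\theta}$$

and

$$\Pr\left(\omega_t^h \ge \frac{wW + q\bar{x}}{W + q}\right) = \Psi\left(\frac{wW + q\bar{x}}{W + q}\right)^{-\theta}.$$

Taking $w \to \infty$, we obtain the desired bounds.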

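Proof of Theorem 3. Since $n = 0$, from the equations (15) and (16) evaluated at the steady state, we have

$$g_H^* X^* + g_L^*(1 - X^*) = 0$$

and

$$(g_H^* - g_L^*)(1 - X^*)X^* - X^*\lambda_{HL} + (1 - X^*)\lambda_{LH} = 0.$$

We also have $g_H^* \ge 0 \ge g_L^*$. If $g_L^* = 0$ then $g_H^* = g_L^* = 0$. Consequently there is no wealth inequality in a stationary equilibrium, i.e. $\eta_1 = -\infty$. We now show the result in this theorem assuming $g_H^* > 0 > g_L^*$.

After simplification, we can write $g_H^*$ as a function of $g_L^*$:

$$g_H^* = g_H(g_L^*) := \frac{\lambda_{HL}}{1 - \frac{\lambda_{LH}}{g_L^*}}.$$

Plugging this expression for $g_H^*$ into the equations that determine the tail index, (25) with $n = 0$, we obtain $f(\eta_1^*, g_L^*) = 0$, where

$$f(\eta, g_L) = \eta^2 + \eta\left(1 + \frac{\lambda\,(g_L + g_H(g_L))}{g_L\,g_H(g_L)}\right) + \frac{\lambda(\lambda + \lambda_{LH} + \lambda_{HL})}{g_L\,g_H(g_L)}. \tag{40}$$

By the Implicit Function Theorem, $\frac{\partial \eta_1}{\partial g_L^*} = -\frac{\partial f}{\partial g_L}\Big/\frac{\partial f}{\partial \eta}$. We have $\frac{\partial f}{\partial \eta}\big|_{\eta = \eta_1^*} < 0$. Therefore, to prove that $\frac{\partial \eta_1}{\partial g_L^*} < 0$, it suffices to show $\frac{\partial f}{\partial g_L} < 0$. Differentiating (40), we obtain

$$\frac{\partial f}{\partial g_L} = -\eta_1\lambda\,\frac{g_H^2 + g_L^2\frac{\partial g_H}{\partial g_L}}{(g_L g_H)^2} - \frac{\lambda(\lambda + \lambda_{LH} + \lambda_{HL})}{(g_L g_H)^2}\left(g_H + g_L\frac{\partial g_H}{\partial g_L}\right).$$

From the expression for $g_H$,

$$\frac{\partial g_H}{\partial g_L} = -\frac{\lambda_{LH}\lambda_{HL}}{(g_L - \lambda_{LH})^2}.$$

Therefore,

$$g_H + g_L\frac{\partial g_H}{\partial g_L} = \frac{\lambda_{LH}\lambda_{HL}(g_L - \lambda_{LH}) + \lambda_{HL}(g_L - \lambda_{LH})^2 - \lambda_{LH}\lambda_{HL}\,g_L}{(g_L - \lambda_{LH})^2} = \frac{\lambda_{HL}(g_L - \lambda_{LH})^2 - \lambda_{LH}^2\lambda_{HL}}{(g_L - \lambda_{LH})^2} > 0$$

since $g_L < 0$. Now

$$\frac{\partial f}{\partial g_L} < 0 \iff \eta_1\lambda\,\frac{g_H^2 + g_L^2\frac{\partial g_H}{\partial g_L}}{(g_L g_H)^2} > -\frac{\lambda(\lambda + \lambda_{LH} + \lambda_{HL})}{(g_L g_H)^2}\left(g_H + g_L\frac{\partial g_H}{\partial g_L}\right).$$

If $\frac{g_H^2 + g_L^2\frac{\partial g_H}{\partial g_L}}{(g_L g_H)^2} \le 0$, this trivially holds since $\eta_1 < 0$.

If $\frac{g_H^2 + g_L^2\frac{\partial g_H}{\partial g_L}}{(g_L g_H)^2} > 0$, this is equivalent to

$$\eta_1 > \eta^* = -(\lambda + \lambda_{LH} + \lambda_{HL})\,\frac{g_H + g_L\frac{\partial g_H}{\partial g_L}}{g_H^2 + g_L^2\frac{\partial g_H}{\partial g_L}}.$$

Simplifying, we get

$$\eta^* = -(\lambda + \lambda_{LH} + \lambda_{HL})\,\frac{\dfrac{\lambda_{HL}\,g_L(g_L - 2\lambda_{LH})}{(g_L - \lambda_{LH})^2}}{\dfrac{\lambda_{HL}\,g_L^2(\lambda_{HL} - \lambda_{LH})}{(g_L - \lambda_{LH})^2}} = -(\lambda + \lambda_{LH} + \lambda_{HL})\,\frac{g_L - 2\lambda_{LH}}{g_L(\lambda_{HL} - \lambda_{LH})}.$$

Now $\eta_1 > \eta^*$ is equivalent to $f(\eta^*, g_L^*) > 0$. Plugging $\eta = \eta^*$ into $f(\eta, g_L^*)$ and simplifying, we get, at $g_L = g_L^*$,

$$f(\eta^*, g_L) = \frac{g_L^2\left[(\lambda + \lambda_{LH} + \lambda_{HL})(2\lambda_{LH}\lambda_{HL} + \lambda_{LH}\lambda)\right]}{g_L^2(\lambda_{HL} - \lambda_{LH})^2\lambda_{HL}} - \frac{g_L\left[2\lambda_{LH}\lambda_{HL}\lambda + 6\lambda_{LH}^2\lambda_{HL} + 2\lambda_{LH}\lambda_{HL}^2 + 2\lambda_{LH}^2\lambda\right](\lambda + \lambda_{LH} + \lambda_{HL})}{g_L^2(\lambda_{HL} - \lambda_{LH})^2\lambda_{HL}} + \frac{4\lambda_{HL}\lambda_{LH}^2(\lambda + \lambda_{LH} + \lambda_{HL})^2 + \lambda\lambda_{LH}(\lambda + \lambda_{LH} + \lambda_{HL})(\lambda_{HL} - \lambda_{LH})^2}{g_L^2(\lambda_{HL} - \lambda_{LH})^2\lambda_{HL}}.$$

Since every term is positive (recall $g_L^* < 0$), we have $f(\eta^*, g_L^*) > 0$.

Therefore, $\eta_1 > \eta^*$, which implies $\frac{\partial f}{\partial g_L} < 0$ and thus $\frac{\partial \eta_1}{\partial g_L^*} < 0$.

Now, we have $\theta_1 = -\eta_1$ and $g_H^* = \frac{\lambda_{HL}}{1 - \lambda_{LH}/g_L^*}$. Therefore

$$\theta_1 = \theta_1(g_H^*; \lambda_{HL}, \lambda_{LH}, \lambda) = -\eta_1\left(\frac{\lambda_{LH}}{1 - \frac{\lambda_{HL}}{g_H^*}};\; \lambda_{HL}, \lambda_{LH}, \lambda\right).$$

So

$$\frac{\partial \theta_1}{\partial g_H^*} = -\frac{\partial \eta_1}{\partial g_L^*} \cdot \left(-\frac{\lambda_{LH}}{\left(1 - \frac{\lambda_{HL}}{g_H^*}\right)^2}\,\frac{\lambda_{HL}}{(g_H^*)^2}\right) < 0.$$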
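The monotonicity result in Theorem 3 can be spot-checked numerically. The sketch below is only an illustration under assumed parameter values (all numbers and function names are hypothetical): it evaluates $g_H(g_L)$, solves the quadratic $f(\eta, g_L) = 0$ from (40) for its negative root, and compares that root at two values of $g_L < 0$.

```python
import math


def g_H_of_g_L(g_L, lam_HL, lam_LH):
    """g_H as a function of g_L, from the steady-state conditions with n = 0."""
    return lam_HL / (1.0 - lam_LH / g_L)


def eta_1(g_L, lam_HL, lam_LH, lam):
    """Negative root of f(eta, g_L) = eta^2 + b*eta + c in (40)."""
    g_H = g_H_of_g_L(g_L, lam_HL, lam_LH)
    b = 1.0 + lam * (g_L + g_H) / (g_L * g_H)
    c = lam * (lam + lam_LH + lam_HL) / (g_L * g_H)  # negative because g_L < 0 < g_H
    return (-b - math.sqrt(b * b - 4.0 * c)) / 2.0


# Hypothetical parameters (assumptions for illustration only).
low = eta_1(-0.04, lam_HL=0.1, lam_LH=0.1, lam=0.02)
high = eta_1(-0.02, lam_HL=0.1, lam_LH=0.1, lam=0.02)
# Raising g_L toward zero should lower eta_1, i.e. raise theta_1 = -eta_1.
```

A check at a handful of points is of course no substitute for the proof; it only guards against sign slips when the formulas are transcribed.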

Proof of Theorem 4. Because of log utility, $c_i^* = \rho + \lambda$. Thus

$$g_i^* = R_i k_i^* + r b_i^* - (\rho + \lambda) - g_x.$$

When $k_L > 0$, we have $r = R_L < R_H$ and $k_H = \frac{1}{1-m}$. Consequently

$$g_L^* = R_L - (\rho + \lambda) - g_x = r - (\rho + \lambda) - g_x \tag{41}$$

and

$$g_H^* = \frac{R_H - mr}{1-m} - (\rho + \lambda) - g_x. \tag{42}$$

From the bond market clearing condition:

$$0 = (1 - k_H^*)X^* + (1 - k_L^*)(1 - X^*) - \frac{EY/(r - g_x)}{KY + EY/(r - g_x)}. \tag{43}$$

Lastly, at the steady state with $n = 0$, (15) becomes

$$0 = g_H^* X^* + g_L^*(1 - X^*). \tag{44}$$

After lengthy algebraic manipulations using (28), (41), (42), and (43), we arrive at:

$$g_L^* = r - g_x - \frac{(1 - KY(g_x + \delta))(r - g_x)}{(r - g_x)KY + EY} \tag{45}$$

and

$$X^* = \frac{(1 - KY(r + \delta) - EY)(r - g_x)}{(g_H^* - g_L^*)(KY(r - g_x) + EY)}. \tag{46}$$

As shown in the proof of Theorem 3, because $n = 0$,

$$g_H^* = \frac{\lambda_{HL}}{1 - \frac{\lambda_{LH}}{g_L^*}}.$$

Plugging this expression for $g_H^*$ into the equations that determine the tail index, (25) with $n = 0$, we obtain $f(\eta_1^*, g_L^*) = 0$, where $f$ is given by (40). Since $g_L^*$ is a function of $KY$, $EY$, $r$, $g_x$, $\delta$, the tail index is a function of these variables in addition to $\lambda_{HL}$, $\lambda_{LH}$, $\lambda$, which determine $f$.

From the expression for $g_L^*$ in (45), we have

$$\frac{\partial g_L^*}{\partial KY} = (r - g_x)\,\frac{(g_x + \delta)EY + r - g_x}{((r - g_x)KY + EY)^2}.$$

Since $r - g_x = \frac{e}{q} > 0$, it is immediate that $\frac{\partial g_L^*}{\partial KY} > 0$. Similarly,

$$\frac{\partial g_L^*}{\partial EY} = \frac{(r - g_x)(1 - KY(g_x + \delta))}{((r - g_x)KY + EY)^2}.$$

From the equation for $X^*$, (46), since $X^* > 0$ and $r - g_x > 0$, $1 - KY(g_x + \delta) > 1 - KY(r + \delta) - EY > 0$. Therefore, $\frac{\partial g_L^*}{\partial EY} > 0$.

Lastly, we rewrite $g_L^*$ in (45) as

$$g_L^* = r + \delta - \frac{1 - EY}{KY} + \frac{EY}{KY}\,\frac{1 - (r + \delta)KY - EY}{(r - g_x)KY + EY}.$$

So

$$\frac{\partial g_L^*}{\partial g_x} = \frac{EY\,(1 - (r + \delta)KY - EY)}{((r - g_x)KY + EY)^2} > 0$$

since $1 - (r + \delta)KY - EY > 0$, from $X^* > 0$ as argued above.

The last three results, combined with Theorem 3, give us the desired comparative statics stated in this theorem.
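The comparative statics of $g_L^*$ can be checked by finite differences. The sketch below is illustrative only: it codes (45), the rewritten form used for the $g_x$ derivative, and the analytic derivative in $KY$, then compares them numerically. The function names and the parameter values are assumptions, picked so that $r > g_x$ and $1 - (r + \delta)KY - EY > 0$.

```python
def g_L(KY, EY, r, g_x, delta):
    """Equation (45)."""
    a = r - g_x
    return a - (1.0 - KY * (g_x + delta)) * a / (a * KY + EY)


def g_L_alt(KY, EY, r, g_x, delta):
    """Rewritten form of (45) used to differentiate with respect to g_x."""
    slack = 1.0 - (r + delta) * KY - EY
    return r + delta - (1.0 - EY) / KY + (EY / KY) * slack / ((r - g_x) * KY + EY)


def dg_L_dKY(KY, EY, r, g_x, delta):
    """Analytic derivative of (45) with respect to KY."""
    a = r - g_x
    return a * ((g_x + delta) * EY + a) / (a * KY + EY) ** 2


def central_diff(func, args, index, h=1e-6):
    """Central finite difference of func in its index-th positional argument."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2.0 * h)


# Hypothetical calibration (assumption): KY, EY, r, g_x, delta.
params = (3.0, 0.6, 0.05, 0.02, 0.05)
numeric_dKY = central_diff(g_L, params, 0)
```

The same `central_diff` helper can be pointed at the `EY` (index 1) and `g_x` (index 3) arguments to confirm the signs stated in the proof.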

B Proofs for the AK Growth Model

B.1 Balanced Growth Path

Proof of Proposition 1. Because $u(c) = \log c$, the first-order condition in consumption from the HJB equation (19) implies that $c_i^* = \rho + \lambda$.

We look at the three cases in Theorem 1 and derive the conditions that determine which case actually happens in equilibrium for each set of exogenous parameters.

Case 1: In this case $r = A_L < A_H$. Therefore $(k_H^*, b_H^*) = \left(\frac{1}{1-m},\, -\frac{m}{1-m}\right)$. From the bond market clearing condition (22),

$$-\frac{m}{1-m}X^* + b_L^*(1 - X^*) = 0.$$

Therefore, $b_L^* = \frac{m}{1-m}\,\frac{X^*}{1 - X^*}$ and $k_L^* = 1 - b_L^*$. Because $k_L^* \ge 0$, $X^* \le 1 - m$.

Equation (78) becomes:

$$0 = \frac{A_H - A_L}{1-m}X^*(1 - X^*) + (1 - X^*)\lambda_{LH} - X^*\lambda_{HL}. \tag{47}$$

Let $f(x)$ denote the right-hand side (with $x$ standing for $X^*$). $f(x)$ is quadratic in $x$ and the coefficient on the leading term, $x^2$, is negative. In addition $f(0) > 0 > f(1)$. Therefore, there exists a unique $X^* \in (0, 1)$ that solves $f(X^*) = 0$ (the other root is negative).

In order for $X^* < 1 - m$, it is necessary and sufficient that $f(1 - m) < 0$, which is equivalent to (29).

Case 2: In this case $r^* \in (A_L, A_H)$. Therefore $(k_H^*, b_H^*) = \left(\frac{1}{1-m},\, -\frac{m}{1-m}\right)$ and $(k_L^*, b_L^*) = (0, 1)$. The bond market clearing condition (22) implies that $X^* = 1 - m$. Together with equation (47), we obtain

$$0 = (A_H - r)m - (1 - m)\lambda_{HL} + m\lambda_{LH},$$

or

$$A_H - r = \frac{(1 - m)\lambda_{HL} - m\lambda_{LH}}{m} < A_H - A_L,$$

since $r > A_L$. This inequality is equivalent to (30).

Case 3: In this case $r = A_H$. Following the steps for Case 1, we arrive at (31).
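The root-finding argument for Case 1 can be illustrated with a few lines of code. This is a sketch under assumed parameter values, not part of the original proof: `f47` evaluates the right-hand side of (47) and `roots47` returns its two roots in closed form; both names and all numbers are hypothetical.

```python
import math


def f47(x, A_H, A_L, m, lam_HL, lam_LH):
    """Right-hand side of (47) with x standing for X*."""
    c = (A_H - A_L) / (1.0 - m)
    return c * x * (1.0 - x) + (1.0 - x) * lam_LH - x * lam_HL


def roots47(A_H, A_L, m, lam_HL, lam_LH):
    """Roots (x1, x2) of (47) with x1 > x2, from the quadratic formula."""
    c = (A_H - A_L) / (1.0 - m)
    lin = c - lam_LH - lam_HL
    disc = math.sqrt(lin * lin + 4.0 * c * lam_LH)
    return (lin + disc) / (2.0 * c), (lin - disc) / (2.0 * c)


# Hypothetical parameters (assumptions for illustration only).
par = dict(A_H=0.15, A_L=0.05, m=0.2, lam_HL=0.2, lam_LH=0.05)
x1, x2 = roots47(**par)
in_case_1 = x1 < 1.0 - par["m"]  # condition X* < 1 - m
```

Because $f(0) > 0 > f(1)$ and the leading coefficient is negative, `x1` lies in $(0, 1)$ and `x2` is negative, and `in_case_1` agrees with the sign test $f(1 - m) < 0$.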

Proof of Proposition 2. By Theorem 3, the tail index is a function of $g_L^*$ (and $\lambda_{HL}$, $\lambda_{LH}$).

Case 1: As argued above, to show that $\frac{d\eta_1}{dm} > 0$ in this case, we just need to show $\frac{\partial g_L^*}{\partial m} < 0$.

Indeed, from the portfolio choices in Case 1 of Theorem 1 and by definition, the growth rate of the economy is given by

$$g^* = X^*\left(\frac{A_H - mA_L}{1-m} - \rho - \lambda\right) + (1 - X^*)\left(A_L - \rho - \lambda\right).$$

Therefore

$$g_L^* = A_L - \rho - \lambda - g^* = -\frac{A_H - A_L}{1-m}X^*.$$

Differentiating the last expression with respect to $m$, we get

$$\frac{dg_L^*}{dm} = -\frac{A_H - A_L}{(1-m)^2}X^* - \frac{A_H - A_L}{1-m}\,\frac{dX^*}{dm}.$$

In addition, from the equation that determines $X^*$, (47), using the Implicit Function Theorem, we obtain

$$\frac{dX^*}{dm} = -\frac{\frac{A_H - A_L}{(1-m)^2}X^*(1 - X^*)}{\frac{A_H - A_L}{1-m}(1 - 2X^*) - (\lambda_{LH} + \lambda_{HL})} > 0,$$

where the inequality comes from the fact that $X^*$ is the higher root of (47), and thus $\frac{A_H - A_L}{1-m}(1 - 2X^*) - (\lambda_{LH} + \lambda_{HL}) < 0$.

Therefore, $\frac{dg_L^*}{dm} < 0$. By Theorem 3, $\frac{d\eta_1}{dm} > 0$.

Case 2: Similar to Case 1, to show that $\frac{d\eta_1}{dm} < 0$ in this case, we just need to show $\frac{dg_L^*}{dm} > 0$.

From the portfolio choices in Case 2 of Theorem 1 and by definition, the growth rate of the economy is given by

$$g^* = X^*\left(\frac{A_H - mr}{1-m} - \rho - \lambda\right) + (1 - X^*)\left(r - \rho - \lambda\right).$$

Therefore

$$g_L^* = r - \rho - \lambda - g^* = -X^*\frac{A_H - r}{1-m} = -\frac{(1 - m)\lambda_{HL} - m\lambda_{LH}}{m}.$$

So

$$\frac{dg_L^*}{dm} = \frac{1}{m^2}\lambda_{HL} > 0.$$

By Theorem 3, $\frac{d\eta_1}{dm} < 0$.

Case 3: In this case $g_H^* = g_L^* = 0$, therefore the relative wealth distribution does not change over time.
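The signs derived in Cases 1 and 2 can be spot-checked numerically. The sketch below is a hedged illustration with hypothetical parameter values and function names: it evaluates $X^*$ and $g_L^*$ in Case 1 from the higher root of (47), and $g_L^*$ in Case 2 from its closed form, so that small changes in $m$ can be compared.

```python
import math


def x_star_case1(m, A_H, A_L, lam_HL, lam_LH):
    """Higher root of (47), i.e. X* in Case 1."""
    c = (A_H - A_L) / (1.0 - m)
    lin = c - lam_LH - lam_HL
    return (lin + math.sqrt(lin * lin + 4.0 * c * lam_LH)) / (2.0 * c)


def g_L_case1(m, A_H, A_L, lam_HL, lam_LH):
    """g_L* = -(A_H - A_L) / (1 - m) * X* in Case 1."""
    return -(A_H - A_L) / (1.0 - m) * x_star_case1(m, A_H, A_L, lam_HL, lam_LH)


def g_L_case2(m, lam_HL, lam_LH):
    """g_L* = -((1 - m) * lam_HL - m * lam_LH) / m in Case 2."""
    return -((1.0 - m) * lam_HL - m * lam_LH) / m


# Hypothetical parameters (assumptions for illustration only).
args = (0.15, 0.05, 0.2, 0.05)  # A_H, A_L, lam_HL, lam_LH
h = 1e-6
slope_case2 = (g_L_case2(0.3 + h, 0.2, 0.05) - g_L_case2(0.3 - h, 0.2, 0.05)) / (2 * h)
```

In Case 2 the finite-difference slope should match $\lambda_{HL}/m^2$; in Case 1 a small increase in $m$ should raise $X^*$ and lower $g_L^*$.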

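Proof of Proposition 3. Since the redistribution is type preserving, the aggregate dynamics $(X_t, g_{H,t}, g_{L,t})$ do not change in the presence of a wealth tax. However, the dynamics of the wealth distribution change. We assume that $\tau$ is sufficiently small so that $g_{H,t} > \tau$ for all $t$. In particular, in a steady state, we assume that $\tau < g_H^* = \bar{\tau}$.

Following the derivations in the proof of Lemma 4, we can show that, for $i \in \{L, H\}$,

$$\frac{\partial p_i(t, \omega)}{\partial t} = -\omega\left(g_{i,t}^* - \tau\right)\frac{\partial p_i(t, \omega)}{\partial \omega} - \left(\lambda_{i,-i} + \lambda\right)p_i(t, \omega) + \lambda_{-i,i}\,p_{-i}(t, \omega) + \lambda\,\Gamma_{i,t}\left(\frac{X_t(1 + \tau/\lambda)}{M_i}\right), \tag{48}$$

where the last term in (48) captures the fact that the tax proceeds $\tau X_t$ and $\tau(1 - X_t)$ per unit of time are equally redistributed to the newborns (of the same type). $p_i$ also satisfies the boundary conditions $\lim_{\omega \to 0} p_i(t, \omega) = 1$ and $\lim_{\omega \to \infty} p_i(t, \omega) = 0$.

Following the steps in the proof of Theorem 2, at the steady state $p_H(t, \omega) \equiv p_H(\log \omega)$ and $p_L(t, \omega) \equiv p_L(\log \omega)$, and for

$$x > \max\left\{\log\left(\frac{X^*(1 + \tau/\lambda)}{M_H}\right),\; \log\left(\frac{(1 - X^*)(1 + \tau/\lambda)}{M_L}\right)\right\},$$

the system (48) can be rewritten as:

$$\begin{bmatrix} p_H'(x) \\ p_L'(x) \end{bmatrix} = \begin{bmatrix} -\dfrac{\lambda_{HL} + \lambda}{g_H^* - \tau} & \dfrac{\lambda_{LH}}{g_H^* - \tau} \\[2ex] \dfrac{\lambda_{HL}}{g_L^* - \tau} & -\dfrac{\lambda_{LH} + \lambda}{g_L^* - \tau} \end{bmatrix} \begin{bmatrix} p_H(x) \\ p_L(x) \end{bmatrix}.$$

Therefore the steady-state right tail index $\eta_1$ is the lower root of the quadratic equation:

$$f(\eta; \tau) = \left(\frac{\lambda_{HL} + \lambda}{g_H^* - \tau} + \eta\right)\left(\frac{\lambda_{LH} + \lambda}{g_L^* - \tau} + \eta\right) - \frac{\lambda_{LH}}{g_H^* - \tau}\,\frac{\lambda_{HL}}{g_L^* - \tau} = 0.$$

As in the proof of Theorem 3, to show that $\frac{d\eta_1}{d\tau} < 0$, we just need to show that $\frac{\partial f}{\partial \tau} < 0$. After lengthy algebra, we arrive at

$$\frac{\partial f}{\partial \tau} = -\eta\left(\frac{\lambda_{LH} + \lambda_{HL} + 2\lambda}{(g_H^* - \tau)(g_L^* - \tau)} + \eta\left(\frac{1}{g_H^* - \tau} + \frac{1}{g_L^* - \tau}\right)\right).$$

We show that $\frac{\partial f}{\partial \tau} < 0$ in two cases separately:

Case 1: $\frac{1}{g_H^* - \tau} + \frac{1}{g_L^* - \tau} > 0$. Since $g_H^* - \tau > 0 > g_L^* - \tau$ and $\eta_1 < 0$, $\frac{\partial f}{\partial \tau} < 0$.

Case 2: $\frac{1}{g_H^* - \tau} + \frac{1}{g_L^* - \tau} < 0$. Then $\frac{\partial f}{\partial \tau} < 0$ is equivalent to

$$\eta > -\frac{\lambda_{LH} + \lambda_{HL} + 2\lambda}{g_H^* + g_L^* - 2\tau}.$$

This is true if

$$f\left(\eta = -\frac{\lambda_{LH} + \lambda_{HL} + 2\lambda}{g_H^* + g_L^* - 2\tau};\; \tau\right) > 0.$$

After lengthy algebra, we verify that this is indeed the case.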

B.2 Transitional Dynamics

Proof of Proposition 4. From the bond market clearing condition, (10), in each instant, the economy is in one of the three cases:

i) $A_H > r_t > A_L$: agents' optimization yields $(k_H, b_H) = \left(\frac{1}{1-m},\, -\frac{m}{1-m}\right)$ and $(k_L, b_L) = (0, 1)$. The bond market clearing condition then implies

$$-\frac{m}{1-m}X_t + (1 - X_t) = 0,$$

or $X_t = 1 - m$.

ii) $r_t = A_H$:

$$b^*(t, s_t, H)X_t + (1 - X_t) = 0.$$

Since $b^*(t, s_t, H) \ge -\frac{m}{1-m}$, this implies $X_t \ge 1 - m$.

iii) $r_t = A_L$:

$$-\frac{m}{1-m}X_t + b^*(t, s_t, L)(1 - X_t) = 0.$$

Since $b^*(t, s_t, L) \le 1$, this implies $X_t \le 1 - m$.

Now we use this classification to characterize the dynamics of the economy in transitional paths. Let $x_1 > x_2$ denote the two roots of the quadratic equation (47). As shown in the proof of Proposition 1, we have $1 > x_1 > 0 > x_2$.

Case 1: $X^* = x_1 < 1 - m$. When $X_t$ is sufficiently close to $x_1$, $X_t < 1 - m$ and

$$\frac{dX_t}{dt} = \left(\frac{A_H - A_L}{1-m}\right)(1 - X_t)X_t - X_t\lambda_{HL} + (1 - X_t)\lambda_{LH} = -\left(\frac{A_H - A_L}{1-m}\right)(X_t - x_1)(X_t - x_2).$$

We rewrite the equation as

$$\frac{dX_t}{-\left(\frac{A_H - A_L}{1-m}\right)(X_t - x_1)(X_t - x_2)} = dt.$$

Notice that

$$\frac{1}{(X_t - x_1)(X_t - x_2)} = \frac{1}{x_1 - x_2}\left(\frac{1}{X_t - x_1} - \frac{1}{X_t - x_2}\right) \quad\text{and}\quad \frac{dX_t}{X_t - x_i} = d\left(\log(X_t - x_i)\right).$$

Therefore

$$\frac{1}{(x_2 - x_1)\frac{A_H - A_L}{1-m}}\,d\left(\log\left(\frac{X_t - x_1}{X_t - x_2}\right)\right) = dt.$$

When $X_0 < 1 - m$, this equation applies for all $t > 0$. So

$$\frac{X_t - x_1}{X_t - x_2} = \frac{X_0 - x_1}{X_0 - x_2}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}t\right).$$

After rearranging, we arrive at an explicit expression for $X_t$:

$$X_t = \frac{x_1 - x_2\frac{X_0 - x_1}{X_0 - x_2}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}t\right)}{1 - \frac{X_0 - x_1}{X_0 - x_2}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}t\right)}.$$

Given this expression for $X_t$ and because $x_2 - x_1 < 0$, $\lim_{t \to \infty} X_t = x_1$. In addition,

$$X_t - x_1 = \frac{x_1\frac{X_0 - x_1}{X_0 - x_2} - x_2\frac{X_0 - x_1}{X_0 - x_2}}{1 - \frac{X_0 - x_1}{X_0 - x_2}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}t\right)}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}t\right). \tag{49}$$

Therefore, the speed of transition is given by

$$\Delta = (x_1 - x_2)\frac{A_H - A_L}{1-m} = \sqrt{\left(\frac{A_H - A_L}{1-m} - \lambda_{HL} - \lambda_{LH}\right)^2 + 4\lambda_{LH}\frac{A_H - A_L}{1-m}}.$$

Now if $X_0 > 1 - m$, for $t$ such that $X_t > 1 - m$:

$$\frac{dX_t}{dt} = -X_t\lambda_{HL} + (1 - X_t)\lambda_{LH} < \lambda_{LH} - (\lambda_{LH} + \lambda_{HL})(1 - m) < 0, \tag{50}$$

since $m < M_L$. Therefore $X_t$ is strictly decreasing until it reaches $1 - m$ in finite time, i.e. there exists $t^* > 0$ such that $X_{t^*} = 1 - m$. From then on, we can apply (49) to calculate the time to reach $X^*$.

Case 2: $x_1 > 1 - m$ and $X^* = 1 - m$ with $A_L < r < A_H$ in the stationary equilibrium. Starting from $X_0 < 1 - m$, (49) also applies as long as $X_t < 1 - m$. Therefore, $X_t$ reaches $X^* = 1 - m$ at $T$ given by

$$x_1 - (1 - m) = \frac{x_1 - X_0}{X_0 - x_2}\,\frac{x_1 - x_2}{1 - \frac{X_0 - x_1}{X_0 - x_2}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}T\right)}\exp\left((x_2 - x_1)\frac{A_H - A_L}{1-m}T\right). \tag{51}$$

Starting from $X_0 > 1 - m$, $X_t$ reaches $X^* = 1 - m$ in finite time given that $\frac{dX_t}{dt} < \lambda_{LH} - (\lambda_{LH} + \lambda_{HL})(1 - m) < 0$ as shown in (50).

Case 3: $x_1 > 1 - m$ and $X^* > 1 - m$ and $r = A_H$ in the stationary equilibrium. Starting from $X_0 \ge 1 - m$, $r_t = A_H$ for all $t \ge 0$ and $X_t = X_0$ for all $t \ge 0$ since both types earn the same rate of return on wealth.

Starting from $X_0 < 1 - m$, (49) applies as long as $X_t < 1 - m$. Therefore, as argued in Case 2, $X_t$ reaches $X^* = 1 - m$ at $T$ given by (51). From there on, the economy becomes stationary with $X^* = 1 - m$.
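The closed-form transition path in Case 1 can be compared with a direct numerical integration of the law of motion for $X_t$. The sketch below is an illustration under assumed values, not the authors' code: `c` stands for $(A_H - A_L)/(1 - m)$, and the function names, the Euler scheme, and all numbers are hypothetical. It is only meaningful while $X_t < 1 - m$.

```python
import math


def roots(c, lam_HL, lam_LH):
    """Roots x1 > x2 of (47), written in terms of c = (A_H - A_L) / (1 - m)."""
    lin = c - lam_LH - lam_HL
    disc = math.sqrt(lin * lin + 4.0 * c * lam_LH)
    return (lin + disc) / (2.0 * c), (lin - disc) / (2.0 * c)


def x_closed_form(t, X0, c, lam_HL, lam_LH):
    """Explicit expression for X_t derived in Case 1."""
    x1, x2 = roots(c, lam_HL, lam_LH)
    k = (X0 - x1) / (X0 - x2) * math.exp((x2 - x1) * c * t)
    return (x1 - x2 * k) / (1.0 - k)


def x_euler(t, X0, c, lam_HL, lam_LH, steps=20000):
    """Forward-Euler integration of dX/dt = c(1 - X)X - X lam_HL + (1 - X) lam_LH."""
    dt = t / steps
    x = X0
    for _ in range(steps):
        x += dt * (c * (1.0 - x) * x - x * lam_HL + (1.0 - x) * lam_LH)
    return x


# Hypothetical parameters (assumptions for illustration only).
c, lam_HL, lam_LH, X0 = 0.125, 0.2, 0.05, 0.1
x1, x2 = roots(c, lam_HL, lam_LH)
speed = (x1 - x2) * c  # the speed of transition Delta
```

The two paths should agree up to the discretization error of the Euler scheme, and `speed` should equal the square-root expression for $\Delta$.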

Proof of Proposition 5. From the definition of the idiosyncratic investment risk process, after lengthy algebra, we obtain

$$\mathrm{Corr}(A_t, A_{t+s}) = \exp\left(-(\lambda_{HL} + \lambda_{LH})s\right).$$

Therefore the instantaneous persistence of $A_t$ is given by $\lambda_{HL} + \lambda_{LH}$. When $\lambda_{HL} = \lambda_{LH} = \lambda$, the expression for the speed of convergence becomes

$$\Delta = \frac{A_H - A_L}{1-m} + 2\lambda,$$

which is strictly increasing in $2\lambda$, which is the persistence of $A_t$ given above.

However, when the process for idiosyncratic investment risk is not symmetric, it is possible that the speed of convergence is not monotone in the persistence of the idiosyncratic investment risk. Indeed, we have

$$\frac{\partial \Delta}{\partial \lambda_{LH}} = \frac{1}{2\Delta}\left(-2\left(\frac{A_H - A_L}{1-m} - \lambda_{HL} - \lambda_{LH}\right) + 4\frac{A_H - A_L}{1-m}\right) = \frac{1}{\Delta}\left(\frac{A_H - A_L}{1-m} - \lambda_{HL} - \lambda_{LH}\right).$$

Therefore $\frac{\partial \Delta}{\partial \lambda_{LH}} < 0$ if

$$m < 1 - \frac{A_H - A_L}{\lambda_{HL} + \lambda_{LH}}.$$

Notice that this is also consistent with the condition for Case 1 in Proposition 1 if $m$ is sufficiently small.

In this case an increase in $\lambda_{LH}$, while keeping $\lambda_{HL}$ constant, will reduce the speed of convergence while increasing the persistence of the idiosyncratic investment risks. Therefore the speed of convergence and persistence move in opposite directions.
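The autocorrelation formula for a two-state Poisson switching process can be checked numerically. The sketch below is an illustration, not the authors' derivation: it integrates the Kolmogorov equation for the probability of being in state $H$ with a simple Euler scheme and computes the stationary autocorrelation of $A_t$ from the resulting transition probabilities. Function names, step counts, and parameter values are assumptions.

```python
import math


def prob_high(s, start_high, lam_HL, lam_LH, steps=20000):
    """P(state = H at time s | initial state), by Euler steps on dp/ds = -lam_HL p + lam_LH (1 - p)."""
    dt = s / steps
    p = 1.0 if start_high else 0.0
    for _ in range(steps):
        p += dt * (-lam_HL * p + lam_LH * (1.0 - p))
    return p


def autocorr(s, A_H, A_L, lam_HL, lam_LH):
    """Stationary correlation between A_t and A_{t+s}."""
    pi_H = lam_LH / (lam_HL + lam_LH)  # stationary share of the high state
    mean = pi_H * A_H + (1.0 - pi_H) * A_L
    var = pi_H * A_H ** 2 + (1.0 - pi_H) * A_L ** 2 - mean ** 2
    p_hh = prob_high(s, True, lam_HL, lam_LH)
    p_lh = prob_high(s, False, lam_HL, lam_LH)
    e_h = p_hh * A_H + (1.0 - p_hh) * A_L  # E[A_{t+s} | state H at t]
    e_l = p_lh * A_H + (1.0 - p_lh) * A_L  # E[A_{t+s} | state L at t]
    cross = pi_H * A_H * e_h + (1.0 - pi_H) * A_L * e_l
    return (cross - mean ** 2) / var


# Hypothetical parameters (assumptions for illustration only).
rho_2 = autocorr(2.0, A_H=0.15, A_L=0.05, lam_HL=0.2, lam_LH=0.05)
```

The result does not depend on the levels $A_H$ and $A_L$, only on the sum of the two switching rates.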


Online Appendix

C Continuous Time Derivations

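Proof of Lemma 1. By the Principle of Optimality and the fact that after an infinitesimal time interval $\Delta t$, the survival probability of an entrepreneur is $e^{-\lambda\Delta t}$, we have

$$V(t, i_t, x_t, w_t) = \max_{c_t, k_t, b_t}\; u(c_t)\Delta t + e^{-(\rho + \lambda)\Delta t}E_t\left[V(t + \Delta t, i_{t+\Delta t}, x_{t+\Delta t}, w_{t+\Delta t})\right] + o(\Delta t) \tag{52}$$

subject to $k_t \ge 0$ and $mk_t + b_t + Q_t \ge 0$, and

$$w_{t+\Delta t} = w_t + \left(R_{i_t}(e_t)k_t + r_t b_t + e_t x_t - c_t\right)\Delta t + o(\Delta t).$$

Furthermore, under the Poisson arrival rates of idiosyncratic shocks,

$$E_t\left[V(t + \Delta t, i_{t+\Delta t}, x_{t+\Delta t}, w_{t+\Delta t})\right] = e^{-\lambda_{i_t,-i_t}\Delta t}\,V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t}) + \left(1 - e^{-\lambda_{i_t,-i_t}\Delta t}\right)V(t + \Delta t, -i_t, x_{t+\Delta t}, w_{t+\Delta t}) + o(\Delta t).$$

Using the fact that $e^z = 1 + z + o(z)$ for $z$ close to 0, we rewrite the previous expression as

$$E_t\left[V(t + \Delta t, i_{t+\Delta t}, x_{t+\Delta t}, w_{t+\Delta t})\right] = (1 - \lambda_{i_t,-i_t}\Delta t)\,V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t}) + \lambda_{i_t,-i_t}\Delta t\,V(t + \Delta t, -i_t, x_{t+\Delta t}, w_{t+\Delta t}) + o(\Delta t).$$

Plugging this into equation (52), we obtain

$$V(t, i_t, x_t, w_t) = \max_{c_t, k_t, b_t}\; u(c_t)\Delta t + V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t}) - (\rho + \lambda)\Delta t\,V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t}) + \lambda_{i_t,-i_t}\Delta t\left(V(t + \Delta t, -i_t, x_{t+\Delta t}, w_{t+\Delta t}) - V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t})\right) + o(\Delta t).$$

Subtracting $V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t})$ from both sides, dividing both sides by $\Delta t$, then taking the limit $\Delta t \to 0$, and noticing that

$$\lim_{\Delta t \to 0}\frac{V(t, i_t, x_t, w_t) - V(t + \Delta t, i_t, x_{t+\Delta t}, w_{t+\Delta t})}{\Delta t} = -\frac{\partial V}{\partial t} - \frac{\partial V}{\partial w}\frac{dw_t}{dt} - \frac{\partial V}{\partial x}g_x x_t,$$

we obtain (6).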

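Proof of Lemma 2. The total wealth conditional on investment productivity type, $W_{i,t}$, is the sum of wealth across agents whose type is equal to $i$:

$$W_{i,t+\Delta t} = \int_{h:\, i_{t+\Delta t}^h = i} w_{t+\Delta t}^h\,dh = \int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = i} w_{t+\Delta t}^h\,dh + \int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = -i} w_{t+\Delta t}^h\,dh + \int_{h \ge N_t,\, i_{t+\Delta t}^h = i} w_{t+\Delta t}^h\,dh + o(\Delta t).$$

We write the first term as

$$\int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = i} w_{t+\Delta t}^h\,dh = \int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, j_t^h = 1}\left(\left(w_t^h + Q_t^h\right)\left(1 + \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\right) - Q_{t+\Delta t}^h\right)dh + \int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, j_t^h = 0} w_{t+\Delta t}^h\,dh + o(\Delta t)$$

$$= e^{-\lambda_{i,-i}\Delta t}e^{-\lambda\Delta t}\int_{h \le N_t:\, i_t^h = i}\left(\left(w_t^h + Q_t^h\right)\left(1 + \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\right) - Q_{t+\Delta t}^h\right)dh + \left(1 - e^{-\lambda\Delta t}\right)M_i N_t\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\frac{\lambda\,(W_{H,t} + W_{L,t})}{(\lambda + n)N_t} + o(\Delta t),$$

where

$$\int_{h \le N_t:\, i_t^h = i}\left(\left(w_t^h + Q_t^h\right)\left(1 + \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\right) - Q_{t+\Delta t}^h\right)dh = W_{i,t} + q_t M_i M_x G_t N_t + \Delta t\left(k_{i,t}^* R_i(e_t) + b_{i,t}^* r_t - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) - \left(q_t + \frac{dq_t}{dt}\Delta t\right)M_i M_x G_t N_t(1 + g\Delta t) + o(\Delta t).$$

Therefore,

$$\int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = i} w_{t+\Delta t}^h\,dh = W_{i,t} + \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) - \frac{dq_t}{dt}M_i M_x G_t N_t\Delta t - q_t M_i M_x G_t N_t\,g\Delta t - (\lambda_{i,-i} + \lambda)\Delta t\,W_{i,t} + \lambda\Delta t\,M_i\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\frac{\lambda\,(W_{H,t} + W_{L,t})}{\lambda + n} + o(\Delta t).$$

We write the second term as

$$\int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = -i} w_{t+\Delta t}^h\,dh = \int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = -i,\, j_t^h = 1}\left(\left(w_t^h + Q_t^h\right)\left(1 + \Delta t\left(R_{-i}(e_t)k_{-i,t}^* + r_t b_{-i,t}^* - c_{-i,t}^*\right)\right) - Q_{t+\Delta t}^h\right)dh$$

$$= \left(1 - e^{-\lambda_{-i,i}\Delta t}\right)e^{-\lambda\Delta t}\int_{h \le N_t:\, i_t^h = -i}\left(\left(w_t^h + Q_t^h\right)\left(1 + \Delta t\left(R_{-i}(e_t)k_{-i,t}^* + r_t b_{-i,t}^* - c_{-i,t}^*\right)\right) - Q_{t+\Delta t}^h\right)dh,$$

where

$$\int_{h \le N_t:\, i_t^h = -i}\left(\left(w_t^h + Q_t^h\right)\left(1 + \Delta t\left(R_{-i}(e_t)k_{-i,t}^* + r_t b_{-i,t}^* - c_{-i,t}^*\right)\right) - Q_{t+\Delta t}^h\right)dh = W_{-i,t} + o(1).$$

Therefore

$$\int_{h \le N_t:\, i_{t+\Delta t}^h = i,\, i_t^h = -i} w_{t+\Delta t}^h\,dh = \lambda_{-i,i}\Delta t\,W_{-i,t} + o(\Delta t).$$

And lastly,

$$\int_{h \ge N_t,\, i_{t+\Delta t}^h = i} w_{t+\Delta t}^h\,dh = M_i\left(N_{t+\Delta t} - N_t\right)\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\frac{\lambda\,(W_{H,t} + W_{L,t})}{(\lambda + n)N_t} = M_i\,n\Delta t\,N_t\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\frac{\lambda\,(W_{H,t} + W_{L,t})}{(\lambda + n)N_t}.$$

Therefore

$$W_{i,t+\Delta t} - W_{i,t} = \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) - (\lambda_{i,-i} + \lambda)\Delta t\,W_{i,t} - \frac{dq_t}{dt}M_i M_x G_t N_t\Delta t - q_t M_i M_x G_t N_t\,g\Delta t + N_t\lambda\Delta t\,M_i\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\frac{\lambda\,(W_{H,t} + W_{L,t})}{(\lambda + n)N_t} + \lambda_{-i,i}\Delta t\,W_{-i,t} + M_i\,n\Delta t\,N_t\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\frac{\lambda\,(W_{H,t} + W_{L,t})}{(\lambda + n)N_t} + o(\Delta t).$$

Dividing both sides by $\Delta t$ and taking the limit $\Delta t \to 0$,

$$\frac{dW_{i,t}}{dt} = \left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) - \lambda_{i,-i}W_{i,t} + \lambda_{-i,i}W_{-i,t} - \frac{dq_t}{dt}M_i M_x G_t N_t - g\,q_t M_i M_x G_t N_t + M_i\int_\psi \psi\,d\Gamma_{i,t}(\psi)\,\lambda\,(W_{H,t} + W_{L,t}) - \lambda W_{i,t}.$$

Simplifying using the fact that $\sum_i M_i\int_\psi \psi\,d\Gamma_{i,t}(\psi) = 1$, we obtain (12).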

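Proof of Lemma 4. To derive the partial differential equations for the wealth distributions, we use the following measure

$$\mathcal{M}_i(t + \Delta t, \omega, x) = \int_h \mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega \text{ and } i_{t+\Delta t}^h = i \text{ and } \frac{x_{t+\Delta t}^h}{G_{t+\Delta t}} = x\right)dh,$$

where $\mathbf{1}$ is the set indicator function. Since the population grows, we decompose the measure into two components, one with $h \in [0, N_t]$ and the other with $h \in (N_t, N_{t+\Delta t}]$:

$$\int_h \mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, x_{t+\Delta t}^h = x\right)dh = \int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, x_{t+\Delta t}^h = x\right)dh + \int_{h > N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, x_{t+\Delta t}^h = x\right)dh.$$

Conditioning on the type at $t$, we write the first term as

$$\int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, x_{t+\Delta t}^h = x\right)dh = \int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, \frac{x_t^h}{G_t} = x\right)dh + \int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = -i,\, \frac{x_t^h}{G_t} = x\right)dh.$$

Conditioning on whether the death shock hits between $t$ and $t + \Delta t$, we obtain

$$\int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, \frac{x_t^h}{G_t} = x\right)dh = \int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, j_t^h = 0,\, \frac{x_t^h}{G_t} = x\right)dh + \int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, j_t^h = 1,\, \frac{x_{t+\Delta t}^h}{G_{t+\Delta t}} = x\right)dh.$$

Expanding further, we arrive at

$$\int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, j_t^h = 0,\, \frac{x_t^h}{G_t} = x\right)dh = e^{-\lambda_{i,-i}\Delta t}e^{-\lambda\Delta t}\int_{h \le N_t}\mathbf{1}\left(\omega_t^h\,\frac{1 + \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)}{\frac{(W_t + Q_t)(1 + g_t\Delta t)}{N_t(1 + n\Delta t)}} \ge \omega,\, i_t^h = i,\, \frac{x_t^h}{G_t} = x\right)dh$$

$$= \left(1 - (\lambda_{i,-i} + \lambda)\Delta t\right)\mathcal{M}_i\left(t,\, \frac{\omega}{1 + \Delta t\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^* - g_t + n\right)},\, x\right) + o(\Delta t)$$

and

$$\int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = i,\, j_t^h = 1,\, \frac{x_{t+\Delta t}^h}{G_{t+\Delta t}} = x\right)dh = e^{-\lambda_{i,-i}\Delta t}\left(1 - e^{-\lambda\Delta t}\right)M_i N_t\int_\psi \mathbf{1}\left(\frac{\psi\frac{W_t\lambda}{(\lambda + n)N_t} + q_{t+\Delta t}\,x\,G_{t+\Delta t}}{\frac{W_{t+\Delta t} + Q_{t+\Delta t}}{N_{t+\Delta t}}} \ge \omega\right)d\Gamma_i(\psi)\,\phi(x)$$

$$= \lambda\Delta t\,M_i N_t\,\Gamma_i\left(\frac{\omega\frac{W_{t+\Delta t} + Q_{t+\Delta t}}{N_{t+\Delta t}} - q_{t+\Delta t}\,x\,G_{t+\Delta t}}{\frac{W_t\lambda}{N_t(\lambda + n)}}\right)\phi(x) + o(\Delta t) = \lambda\Delta t\,M_i N_t\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) + o(\Delta t).$$

Similarly,

$$\int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = -i,\, \frac{x_{t+\Delta t}^h}{G_{t+\Delta t}} = x\right)dh = \int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = -i,\, j_t^h = 0,\, \frac{x_t^h}{G_t} = x\right)dh.$$

Expanding further, we arrive at

$$\int_{h \le N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, i_t^h = -i,\, j_t^h = 0,\, \frac{x_t^h}{G_t} = x\right)dh = \left(1 - e^{-\lambda_{-i,i}\Delta t}\right)e^{-\lambda\Delta t}\int_{h \le N_t}\mathbf{1}\left(\omega_t^h\left(1 + \Delta t\left(R_{-i}(e_t)k_{-i,t}^* + r_t b_{-i,t}^* - c_{-i,t}^*\right) - g_t\Delta t + n\Delta t\right) \ge \omega,\, i_t^h = -i,\, \frac{x_t^h}{G_t} = x\right)dh$$

$$= \lambda_{-i,i}\Delta t\,\mathcal{M}_{-i}\left(t,\, \frac{\omega}{1 + \Delta t\left(R_{-i}(e_t)k_{-i,t}^* + r_t b_{-i,t}^* - c_{-i,t}^* - g_t + n\right)},\, x\right) + o(\Delta t).$$

Lastly,

$$\int_{h > N_t}\mathbf{1}\left(\omega_{t+\Delta t}^h \ge \omega,\, i_{t+\Delta t}^h = i,\, \frac{x_{t+\Delta t}^h}{G_{t+\Delta t}} = x\right)dh = M_i\left(N_{t+\Delta t} - N_t\right)\int_\psi \mathbf{1}\left(\frac{\psi\frac{W_t\lambda}{(\lambda + n)N_t} + q_{t+\Delta t}\,x\,G_{t+\Delta t}}{\frac{W_{t+\Delta t} + Q_{t+\Delta t}}{N_{t+\Delta t}}} \ge \omega\right)d\Gamma_i(\psi)\,\phi(x)$$

$$= M_i\,n\Delta t\,N_t\,\Gamma_i\left(\frac{\omega\frac{W_{t+\Delta t} + Q_{t+\Delta t}}{N_{t+\Delta t}} - q_{t+\Delta t}\,x\,G_{t+\Delta t}}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) + o(\Delta t) = M_i\,n\Delta t\,N_t\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) + o(\Delta t).$$

Therefore

$$\mathcal{M}_i(t + \Delta t, \omega, x) - \mathcal{M}_i(t, \omega, x) = -\Delta t\,\frac{\partial \mathcal{M}_i}{\partial \omega}\,\omega\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^* - g_t + n\right) - (\lambda_{i,-i} + \lambda)\Delta t\,\mathcal{M}_i(t, \omega, x) + \lambda\Delta t\,M_i N_t\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) + \lambda_{-i,i}\Delta t\,\mathcal{M}_{-i}(t, \omega, x) + M_i\,n\Delta t\,N_t\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) + o(\Delta t).$$

Let $\mathcal{M}_i^0(t, \omega, x) = \frac{\mathcal{M}_i(t, \omega, x)}{M_i N_t}$. The last equation implies

$$\frac{\mathcal{M}_i^0(t + \Delta t, \omega, x) - \mathcal{M}_i^0(t, \omega, x)}{\Delta t} = \frac{\mathcal{M}_i(t + \Delta t, \omega, x) - \mathcal{M}_i(t, \omega, x)}{M_i N_t\,\Delta t} - n\,\mathcal{M}_i^0(t, \omega, x) + o(1)$$

$$= -\frac{\partial \mathcal{M}_i^0}{\partial \omega}\,\omega\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^* - g_t + n\right) - (\lambda_{i,-i} + \lambda)\,\mathcal{M}_i^0(t, \omega, x) + \lambda\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) + \lambda_{-i,i}\frac{M_{-i}}{M_i}\,\mathcal{M}_{-i}^0(t, \omega, x) + n\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) - n\,\mathcal{M}_i^0(t, \omega, x) + o(1).$$

Taking the limit $\Delta t \to 0$, we obtain

$$\frac{\partial \mathcal{M}_i^0}{\partial t} = -\frac{\partial \mathcal{M}_i^0}{\partial \omega}\,\omega\left(R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^* - g_t + n\right) - (\lambda_{i,-i} + \lambda)\,\mathcal{M}_i^0 + \lambda_{-i,i}\frac{M_{-i}}{M_i}\,\mathcal{M}_{-i}^0 + (\lambda + n)\,\Gamma_i\left(\frac{\omega\frac{W_t + Q_t}{N_t} - q_t\,x\,G_t}{\frac{W_t}{N_t}\frac{\lambda}{\lambda + n}}\right)\phi(x) - n\,\mathcal{M}_i^0.$$

Now integrating both sides in $x$ and letting $p_i(t, \omega) = \int \mathcal{M}_i^0(t, \omega, x)\,dx$, we arrive at (17).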

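Proof of Lemma 3. From the definition of $W_{i,t}$ we have

$$\frac{dW_t}{dt} = \sum_i\left(k_{i,t}^* R_i(e_t) + b_{i,t}^* r_t - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) - \frac{dq_t}{dt}M_x G_t N_t - g_x q_t M_x G_t N_t$$

and

$$\frac{dQ_t}{dt} = \frac{dq_t}{dt}M_x G_t N_t + g_x Q_t + nQ_t.$$

Therefore

$$\frac{dW_t}{dt} + \frac{dQ_t}{dt} = \sum_i\left(k_{i,t}^* R_i(e_t) + b_{i,t}^* r_t - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) + nQ_t.$$

Dividing both sides by $W_t + Q_t$, we obtain

$$g_t = \sum_i\left(k_{i,t}^* R_i(e_t) + b_{i,t}^* r_t - c_{i,t}^*\right)\frac{W_{i,t} + q_t M_i M_x G_t N_t}{W_t + Q_t} + n\frac{Q_t}{W_t + Q_t},$$

which is equivalent to (13).

Now, we turn to the dynamics of $X_t$. From the definition of $X_t$, we have

$$\frac{dX_t}{dt} = \frac{\frac{d}{dt}W_{H,t} + M_H\frac{dQ_t}{dt}}{W_t + Q_t} - X_t g_t.$$

From the dynamics of $W_{H,t}$,

$$\frac{dW_{H,t}}{dt}\frac{1}{W_t + Q_t} = \left(k_{H,t}^* R_H(e_t) + b_{H,t}^* r_t - c_{H,t}^*\right)X_t - \lambda_{HL}\frac{W_{H,t}}{W_t + Q_t} + \lambda_{LH}\frac{W_{L,t}}{W_t + Q_t} + \lambda\left(M_H\int_\psi \psi\,d\Gamma_{H,t}(\psi)\frac{W_{L,t}}{W_t + Q_t} - M_L\int_\psi \psi\,d\Gamma_{L,t}(\psi)\frac{W_{H,t}}{W_t + Q_t}\right) - \frac{\frac{dq_t}{dt}M_H M_x G_t N_t + g_x q_t M_H M_x G_t N_t}{W_t + Q_t}.$$

So

$$\frac{\frac{d}{dt}W_{H,t} + M_H\frac{dQ_t}{dt}}{W_t + Q_t} = \left(k_{H,t}^* R_H(e_t) + b_{H,t}^* r_t - c_{H,t}^*\right)X_t - \lambda_{HL}X_t + \lambda_{LH}(1 - X_t) + \lambda\left(M_H\int_\psi \psi\,d\Gamma_{H,t}(\psi)(1 - X_t) - M_L\int_\psi \psi\,d\Gamma_{L,t}(\psi)X_t\right) - \lambda\left(M_H M_L\int_\psi \psi\,d\Gamma_{H,t}(\psi) - M_L M_H\int_\psi \psi\,d\Gamma_{L,t}(\psi)\right)\frac{Q_t}{W_t + Q_t} + nM_H\frac{Q_t}{W_t + Q_t}.$$

Consequently,

$$\frac{dX_t}{dt} = g_{H,t}^* X_t - \lambda_{HL}X_t + \lambda_{LH}(1 - X_t) + \lambda\left(M_H\int_\psi \psi\,d\Gamma_{H,t}(\psi)(1 - X_t) - M_L\int_\psi \psi\,d\Gamma_{L,t}(\psi)X_t\right) - \lambda\left(M_H M_L\int_\psi \psi\,d\Gamma_{H,t}(\psi) - M_L M_H\int_\psi \psi\,d\Gamma_{L,t}(\psi)\right)\frac{Q_t}{W_t + Q_t} + nM_H\frac{Q_t}{W_t + Q_t}.$$

Plugging the expression for $g_t$ back into the earlier expression for $\frac{dX_t}{dt}$, we obtain (78).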


D Model with Corporate Tax

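The output (net of depreciation and wage) is taxed at the rate $\tau_c$. The return to one unit of capital for an agent of investment productivity type $i$ becomes:

$$(1 - \tau_c)\max_l\left\{F_i(1, l) - e_t l\right\}.$$

The constraints on the agents, (2) and (3), become

$$\frac{dw_t^h}{dt} = (1 - \tau_c)\max\left\{F_{i_t^h}\left(k_t^h, l_t^h\right) - e_t l_t^h\right\} + r_t b_t^h + e_t x_t^h - c_t^h \tag{53}$$

$$w_t^h = k_t^h + b_t^h. \tag{54}$$

The HJB equation for $V$, (6), becomes

$$(\rho + \lambda)V - \frac{\partial V}{\partial t} = \max_{c,k,b}\; u(c) + \frac{\partial V}{\partial x}g_x x_t + \frac{\partial V}{\partial w}\left((1 - \tau_c)R_{i_t}(e_t)k + r_t b - c + e_t x_t\right) + \lambda_{i_t^h,-i_t^h}\left(V(t, -i_t, x_t^h, w_t^h) - V(t, i_t, x_t^h, w_t^h)\right),$$

where the maximization problem is subject to the constraints $w_t^h = k + b$, $0 \le k$, and $0 \le mk + (b + Q_t^h)$.

Under the functional form

$$V^h(t, i_t^h, x_t^h, w_t^h) = \begin{cases} v(t, i_t^h)\left(w_t^h + x_t^h q_t\right)^{1-\sigma} & \text{if } \sigma \ne 1 \\[1ex] v(t, i_t^h) + \frac{1}{\rho + \lambda}\log\left(w_t^h + x_t^h q_t\right) & \text{if } \sigma = 1, \end{cases}$$

the HJB equation simplifies to

$$(\rho + \lambda)v(t, i_t^h) - \frac{\partial v(t, i_t^h)}{\partial t} = \max_{c,k,b}\; H_{i_t^h}^{\tau_c}\left(c, k, b; v(t, i_t^h)\right) + \lambda_{i_t^h,-i_t^h}\left(v(t, -i_t^h) - v(t, i_t^h)\right) \tag{55}$$

where

$$H_i^{\tau_c}(c, k, b; t, v) = \begin{cases} u(c) + (1 - \sigma)v\left((1 - \tau_c)R_i(e_t)k + r_t b - c\right) & \text{when } \sigma \ne 1 \\[1ex] \log(c) + \frac{1}{\rho + \lambda}\left((1 - \tau_c)R_i(e_t)k + r_t b - c\right) & \text{when } \sigma = 1 \end{cases} \tag{56}$$

and the maximization problem is subject to

$$1 = k + b \quad\text{and}\quad 0 \le k \quad\text{and}\quad 0 \le mk + b. \tag{57}$$

A recursive equilibrium is characterized by:

• The policy functions are given by

$$c_t^h = c_{i_t^h,t}^*\left(w_t^h + x_t^h q_t\right), \qquad k_t^h = k_{i_t^h,t}^*\left(w_t^h + x_t^h q_t\right), \qquad b_t^h = b_{i_t^h,t}^*\left(w_t^h + x_t^h q_t\right) - x_t^h q_t,$$

where $\left(k_{i,t}^*, b_{i,t}^*, c_{i,t}^*\right)$ solves (56).

• The aggregate dynamics of aggregate conditional wealth satisfy, for $i \in \{H, L\}$,

$$\frac{dW_{i,t}}{dt} = \left((1 - \tau_c)R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^*\right)\left(W_{i,t} + q_t M_i M_x G_t N_t\right) - \lambda_{i,-i}W_{i,t} + \lambda_{-i,i}W_{-i,t} - \frac{dq_t}{dt}M_i M_x G_t N_t - q_t M_i M_x G_t N_t + M_i\int_\psi \psi\,d\Gamma_{i,t}(\psi)\left(\lambda W_t + \tau_c\sum_j k_{j,t}^* R_j(e_t)\left(W_{j,t} + M_j Q_t\right)\right) - \lambda W_{i,t} \tag{58}$$

and

$$\frac{dq_t}{dt} = (r_t - g_x)q_t - e_t. \tag{59}$$

To simplify the analysis, we assume that the revenue from the corporate tax as well as the wealth of the dying agents are redistributed to the same investment productivity type. Under this assumption, the dynamics of conditional wealth $W_{i,t}$, (58), simplify to (12). Notice that the dynamics of human wealth stay the same as without the corporate tax. However, the equilibrium interest rate $r_t$ used to discount future earnings is affected by the corporate tax. For example, when the low type is producing,

$$r_t = (1 - \tau_c)R_L(e_t).$$

Let $g_{i,t}^*$ denote the relative growth rate (before corporate tax):

$$g_{i,t}^* = R_i(e_t)k_{i,t}^* + r_t b_{i,t}^* - c_{i,t}^* - g_t.$$

The PDEs that characterize the evolution of the wealth distribution over time, (17), become

$$\frac{\partial p_i(t, \omega)}{\partial t} = -\frac{\partial p_i(t, \omega)}{\partial \omega}\,\omega\left(g_{i,t}^* - \tau_c R_i(e_t)k_{i,t}^* + n\right) - \left(\lambda_{i,-i} + \lambda + n\right)p_i(t, \omega) + \lambda_{i,-i}\,p_{-i}(t, \omega) + (\lambda + n)\,J_{i,t}\left(\frac{\omega\,(W_t + Q_t)}{W_t\frac{\lambda}{\lambda + n}},\; \frac{q_t G_t N_t}{W_t\frac{\lambda}{\lambda + n}}\right) \tag{60}$$

with the same definition of $J_{i,t}$ and the boundary conditions.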

E Model with Heterogeneous Labor Productivity Growth

$j_t^h$ captures the growth rate of labor productivity. Let $Q_t^h$ denote the present discounted value of future labor income for each agent, i.e. human wealth,

$$Q_t^h = \int_0^\infty \exp\left(-\int_0^{t'} r_{t+t_1'}\,dt_1'\right)e_{t+t'}\,x_{t+t'}^h\,dt'.$$

Since $x_t^h$ grows at the rate $g_x(j_t^h)$, we also have $Q_t^h = q_t(j_t^h)\,x_t^h$, where

$$q_t(j_t^h) = \int_0^\infty \exp\left(-\int_0^{t'}\left(r_{t+t_1'} - g_x(j_t^h)\right)dt_1'\right)e_{t+t'}\,dt'.$$

The dynamics of $q_t(j_t^h)$ can be described by

$$\frac{dq_t(j_t^h)}{dt} = \left(r_t - g_x(j_t^h)\right)q_t(j_t^h) - e_t.$$
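As a simple illustration of the definition of $q_t(j)$, consider the special case of a constant interest rate and wage (an assumption made here only for the example). Setting $dq/dt = 0$ in the ODE above gives $q = e/(r - g_x(j))$, which should coincide with the integral definition. The sketch below checks this numerically with a midpoint rule; the function names, the truncation horizon, and the parameter values are hypothetical.

```python
import math


def q_steady_state(e, r, g_x):
    """Stationary solution of dq/dt = (r - g_x) q - e, valid only for r > g_x."""
    if r <= g_x:
        raise ValueError("requires r > g_x for a finite present value")
    return e / (r - g_x)


def q_by_integration(e, r, g_x, horizon=2000.0, steps=200000):
    """Midpoint-rule approximation of the integral of exp(-(r - g_x) s) * e over [0, horizon]."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * dt
        total += math.exp(-(r - g_x) * s) * e * dt
    return total


# Hypothetical constant wage, interest rate, and productivity growth (assumptions).
q_closed = q_steady_state(e=1.0, r=0.05, g_x=0.02)
q_numeric = q_by_integration(e=1.0, r=0.05, g_x=0.02)
```

A type with faster labor productivity growth has a larger $q$, since future earnings are discounted at the lower effective rate $r - g_x(j)$.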

The growth rate of total labor supply is given by:

gx = ∑j∈J

gx(j)Px(j)

The HJB equation for V is

(ρ + λ

)V − ∂V

∂t= max

c,k,bu(c) +

∂V∂x

gx(jht )xt +

∂V∂w

(Riht(et)k + rtb− c + etxt)

+ λiht ,−ih

t(V(t,−ih

t , jht , xh

t , wht )−V(t, ih

t , jht , xh

t , wht )) (61)

73

where the maximization problem is subject to the constraints wht = k + b and 0 ≤ k and

0 ≤ mk + (b + Qt).We conjecture and verify that the value function has the form:

Vh(t, iht , jh

t , xht , wh

t ) =

v(t, iht )(wh

t + xht qt(jh

t ))1−σ

if σ 6= 1

v(t, iht ) +

1ρ+λ

log(wh

t + xht qt(jh

t ))

if σ=1.

Working with $b_t + Q^h_t$ in place of $b_t$, we obtain the same HJB equation for $v(t, i^h_t)$ as in the baseline model with a homogeneous growth rate for labor productivity, i.e. equations (7)-(9).

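This implies that the policy functions are linear in total wealth, i.e. financial wealth plus human wealth,

$$
\begin{aligned}
k^h_t &= k^*_{i^h_t}(t, s_t)\left(w^h_t + Q^h_t\right) \\
b^h_t &= b^*_{i^h_t}(t, s_t)\left(w^h_t + Q^h_t\right) \\
c^h_t &= c^*_{i^h_t}(t, s_t)\left(w^h_t + Q^h_t\right).
\end{aligned}
$$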

Let $W_{i,t} = \int_{h:\, i^h_t = i} w^h_t\, dh$ denote the total wealth conditional on investment productivity type $i$.

The market clearing condition in the bond market becomes:

$$\sum_{i\in I} b^*_i(t, s_t)\left(W_{i,t} + M_i G_t N_t \sum_{j\in J} P_x(j)\, q_t(j)\right) - G_t N_t \sum_{j\in J} P_x(j)\, q_t(j) = 0, \tag{62}$$

and in the labor market becomes:

$$\sum_{i\in I} S_i(e_t)\, k^*_{i,t}\left(W_{i,t} + M_i G_t N_t \sum_{j\in J} P_x(j)\, q_t(j)\right) = G_t N_t, \tag{63}$$

which depends only on $W_{H,t}$, $W_{L,t}$, and $\{q_t(j)\}_{j\in J}$. This suggests that $\left(W_{H,t}, W_{L,t}, \{q_t(j)\}_{j\in J}\right)$ are sufficient endogenous state variables to determine the interest rate and the wage at time $t$.

F Wealth Inequality in the Standard Neoclassical Growth Model

Consider the special case of our model in which the agents have identical production functions. This model is equivalent to the decentralized version of the standard optimal neoclassical growth model in continuous time (commonly referred to as the Ramsey-Cass-Koopmans model; see Barro and Sala-i-Martin (2004) or Acemoglu (2009) for an extensive exposition) with two differences. First, we assume that the agents do not put weight on the utility of their offspring (otherwise there would be no wealth inequality). Second, we assume that the agents die at a constant rate $\lambda$ as in our main paper, and the wealth of the dying agents in each instant is redistributed equally to the newborns. This assumption guarantees that wealth inequality does not grow without bound over time.

Given the assumption that the agents share the same production function and that production has constant returns to scale, the model is equivalent to one in which the agents rent capital and labor to a representative firm at the rental rate $r_t$ and wage rate $e_t$. In equilibrium

$$r_t = F_K(K_t, L_t) \tag{64}$$

and

$$e_t = F_L(K_t, L_t) \tag{65}$$

where $K_t$ and $L_t$ are the aggregate supply of capital and labor respectively.

We assume that each agent is endowed with $x_t$ units of labor, where $x_t$ grows at the rate $g_x$ over time. Each dying agent is replaced by $1 + \frac{n}{\lambda}$ newborns so that population $N_t$ grows at the rate $n$: $N_t = N_0 e^{nt}$. We index the agents at time $t$ by $h \in \mathcal{N}_t = [0, N_t]$.

Given the sequence of interest rates and wage rates, each agent in the economy solves:

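$$\max_{\{c_t\}_{t=s}^{\infty}} \int_s^\infty e^{-(\rho+\lambda)t}\, u(c_t)\, dt \tag{66}$$

subject to:

$$\dot{w}_t = r_t w_t + e_t x_t - c_t. \tag{67}$$

Newborns at time $t$ receive an equal transfer of wealth, $\frac{W_t}{N_t}\frac{\lambda}{\lambda+n}$, from the dying agents, where $W_t$ is the aggregate wealth in the economy.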

Definition 4. Given an initial distribution of wealth $\{w^h_0\}_{h\in\mathcal{N}_0}$, a competitive equilibrium consists of sequences of interest rates and wage rates $\{r_t, e_t\}_{t=0}^{\infty}$ and an allocation $\{c^h_t, w^h_t\}_{h\in\mathcal{N}_t}$ such that:

1. For each $h \in \mathcal{N}_t$, the allocation $\{c^h_t, w^h_t\}$ solves (66) subject to (67).

2. At time $t$, the interest rate and the wage rate are determined by (64) and (65) with $K_t = \int_{h\in\mathcal{N}_t} w^h_t\, dh$ and $L_t = N_t x_t$.

As in Huggett (1993) and Aiyagari (1994), we look for a special form of competitive equilibrium, a stationary equilibrium, in which the interest rate, the wage rate, and the wealth distribution are constant over time.

Definition 5. A stationary BGP is a competitive equilibrium in which the interest rate and the wage rate are constant over time, $r_t \equiv r$ and $e_t \equiv e$, and the distribution of relative wealth $\frac{w^h_t}{W_t/N_t}$ is constant over time.

We assume CRRA utility function:

$$
u(c) =
\begin{cases}
\dfrac{c^{1-\sigma}}{1-\sigma} & \text{if } \sigma \neq 1 \\[4pt]
\log(c) & \text{if } \sigma = 1.
\end{cases}
$$

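Let $Q_t$ denote the present discounted value of future wages (human wealth) of any agent at time $t$:

$$Q_t = \int_t^\infty e^{-\int_t^{t_1} r_s\, ds}\, e_{t_1}\, x_{t_1}\, dt_1,$$

which implies $Q_t = q_t x_t$ where

$$q_t = \int_t^\infty e^{-\int_t^{t_1} (r_s - g_x)\, ds}\, e_{t_1}\, dt_1,$$

or

$$\dot{q}_t = (r_t - g_x)\, q_t - e_t. \tag{68}$$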

Let $\tilde{w}^h_t$ denote total wealth, i.e. financial plus human wealth, of agent $h$ at time $t$: $\tilde{w}^h_t = w^h_t + Q_t$. Combining (67) and (68), the dynamics of $\tilde{w}^h_t$ are given by

$$\dot{\tilde{w}}^h_t = r_t \tilde{w}^h_t - c^h_t. \tag{69}$$

Because of the functional form of the utility function, the optimization problem of the agents is homogeneous in total wealth. This implies that the consumption function is linear in total wealth:

$$c^h_t = c_t\, \tilde{w}^h_t.$$

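In a stationary equilibrium as in Definition 5, $r_t \equiv r$, and the optimization problem of the agents, written in terms of total wealth $\tilde{w}_t$ (financial plus human wealth), is

$$\max \int_0^\infty e^{-(\rho+\lambda)t}\, u(c_t)\, dt \quad \text{s.t.} \quad \dot{\tilde{w}}_t = r \tilde{w}_t - c_t.$$

Because of the power utility function, this problem is homogeneous and the (normalized) value function can be described by the Hamiltonian:

$$(\rho + \lambda)\, v = \max_c H_i(c; v) \tag{70}$$

where

$$
H_i(c; v) =
\begin{cases}
u(c) + (1-\sigma)\, v\, (r - c) & \text{when } \sigma \neq 1 \\[4pt]
\log(c) + \dfrac{1}{\rho+\lambda}(r - c) & \text{when } \sigma = 1
\end{cases}
\tag{71}
$$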

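which yields the optimal policy function $c_t = c(r)\, \tilde{w}_t$, where $\tilde{w}_t$ is total wealth and $c(r)$ solves (70).

Let $\Omega_t = \int_h \tilde{w}^h_t\, dh$ denote aggregate total wealth. From (69), we have

$$\frac{d\Omega_t}{dt} = (r - c)\,\Omega_t + n N_t x_t q.$$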

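Given that $\Omega_t = W_t + N_t x_t q$, this implies

$$\frac{d\Omega_t}{dt} = \frac{dW_t}{dt} + (n + g_x) N_t x_t q = (r - c)\,\Omega_t + n N_t x_t q.$$

So

$$\frac{\dot{W}_t}{W_t} = \frac{(r - c)(W_t + N_t x_t q)}{W_t} - \frac{g_x N_t x_t q}{W_t}.$$

Since the wealth distribution is also constant over time, $\frac{\dot{W}_t}{W_t} = n + g_x$. Thus

$$(r - c)(w + q) - g_x q = (n + g_x)\, w,$$

where $w = \frac{W_t}{N_t x_t}$ is the average wealth per efficiency unit of labor. Equivalently,

$$(r - c - g_x)\, q = (n + g_x + c - r)\, w. \tag{72}$$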

Since q and w are strictly positive, we must have:

c + gx < r < c + gx + n.

In the BGP, (68) implies that

$$q = \frac{e}{r - g_x}.$$

Plugging this into (72), we obtain

$$w = \frac{e}{r - g_x}\, \frac{r - c(r) - g_x}{n + c(r) + g_x - r}. \tag{73}$$

From (64), we have

$$r = F_K(w N_t x_t, N_t x_t) - \delta = f'(w) - \delta. \tag{74}$$

Similarly,

$$e = F_L(w N_t x_t, N_t x_t) = f(w) - w f'(w). \tag{75}$$

The three equations (73)-(75) determine the three unknowns, $r$, $e$, $w$, which fully characterize a stationary BGP.
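A minimal numerical sketch of how (73)-(75) pin down $(r, e, w)$ follows. It assumes, beyond what the text imposes, a Cobb-Douglas intensive production function $f(w) = w^{\alpha}$. It also uses the closed-form consumption rate $c(r) = (\rho + \lambda - (1-\sigma) r)/\sigma$, which is what the first-order condition of (70)-(71) delivers under CRRA utility. All parameter values and function names are hypothetical.

```python
def solve_bgp(alpha=0.33, delta=0.05, rho=0.03, lam=0.02,
              sigma=1.0, g_x=0.02, n=0.01, tol=1e-12):
    """Bisection on r for the stationary BGP defined by (73)-(75).

    Assumes f(w) = w**alpha (Cobb-Douglas), which the text does not impose.
    Returns (r, e, w, c).
    """
    def c_of_r(r):
        # CRRA consumption rate out of total wealth implied by (70)-(71).
        return (rho + lam - (1.0 - sigma) * r) / sigma

    def w_of_r(r):
        # Invert (74): r = alpha * w**(alpha - 1) - delta.
        return ((r + delta) / alpha) ** (1.0 / (alpha - 1.0))

    def e_of_w(w):
        # (75): e = f(w) - w f'(w) = (1 - alpha) * w**alpha.
        return (1.0 - alpha) * w ** alpha

    def residual(r):
        # Left-hand side minus right-hand side of (73).
        w, c = w_of_r(r), c_of_r(r)
        return w - e_of_w(w) / (r - g_x) * (r - c - g_x) / (n + c + g_x - r)

    # c + g_x < r < c + g_x + n, rewritten using r - c(r) = (r - rho - lam) / sigma.
    lo = rho + lam + sigma * g_x
    hi = rho + lam + sigma * (g_x + n)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    if not (residual(lo) > 0.0 > residual(hi)):
        raise ValueError("no sign change on the admissible interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    w = w_of_r(r)
    return r, e_of_w(w), w, c_of_r(r)

r, e, w, c = solve_bgp()
```

The search is restricted to the interval on which both $q$ and $w$ are positive. At its lower end the right-hand side of (73) vanishes and at its upper end it diverges, so a root exists whenever the interval is non-empty and $r > g_x$ on it.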

Now let $EY$ denote the labor income share, then

$$e = EY \cdot f(w).$$

Therefore

$$w(r + \delta) = w f'(w) = (1 - EY) f(w) = \frac{1 - EY}{EY}\, e.$$

Plugging this back into (73), we obtain

$$1 = \frac{EY}{1 - EY}\, \frac{r + \delta}{r - g_x}\, \frac{r - c - g_x}{n + c + g_x - r}. \tag{76}$$

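Let $\omega^h_t = \frac{w^h_t + q x_t}{(W_t + N_t q x_t)/N_t}$ and $p(\omega) = \Pr(\omega^h_t \ge \omega)$. Following the derivations in the main paper, in a BGP, the PDE for the stationary wealth distribution, (17), becomes

$$0 = -\frac{\partial p(\omega)}{\partial \omega}\, \omega\, (r - c - g_x) - (\lambda + n)\, p(\omega) + (\lambda + n)\, \mathbf{1}\!\left\{\omega \le \frac{w \frac{\lambda}{\lambda+n} + q}{w + q}\right\}.$$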
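The intermediate algebra can be spelled out as follows, as a short worked step under the condition $r - c - g_x > 0$ established above. For $\omega$ above the threshold in the indicator, the last term vanishes and the equation reduces to a first-order linear ODE whose solution is a power law.

```latex
% For \omega > \frac{w\frac{\lambda}{\lambda+n}+q}{w+q} the indicator is zero, so
\[
0 = -\,p'(\omega)\,\omega\,(r-c-g_x) - (\lambda+n)\,p(\omega)
\quad\Longrightarrow\quad
\frac{p'(\omega)}{p(\omega)} = -\,\frac{\lambda+n}{r-c-g_x}\,\frac{1}{\omega},
\]
% and integrating both sides in \omega gives
\[
p(\omega) = C\,\omega^{-\frac{\lambda+n}{r-c-g_x}}
\quad\text{for some constant } C>0 .
\]
```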

Therefore the Pareto tail index of $p$ is given by $\theta = \frac{\lambda + n}{r - c - g_x}$.

From (76), we have

$$r - c - g_x = \frac{n (1 - EY)(r - g_x)}{(1 - EY)(r - g_x) + EY (r + \delta)}.$$

Therefore,

$$\theta = \left(1 + \frac{\lambda}{n}\right)\left(1 + \frac{EY}{1 - EY}\, \frac{r + \delta}{r - g_x}\right).$$

Noticing that $r + \delta = \frac{1 - EY}{KY}$, we obtain (27).
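The substitution $r + \delta = (1 - EY)/KY$ turns the last display into $\theta = (1 + \lambda/n)\left(1 + \frac{EY}{KY\,(r - g_x)}\right)$. The Python sketch below evaluates the index both ways as a consistency check. The function names and numerical inputs are hypothetical and are not a calibration.

```python
def tail_index(r, g_x, labor_share, capital_output, lam, n):
    """theta = (1 + lam/n) * (1 + EY / (KY * (r - g_x))).

    Uses the substitution r + delta = (1 - EY) / KY stated in the text.
    """
    if n <= 0 or r <= g_x or not 0.0 < labor_share < 1.0:
        raise ValueError("requires n > 0, r > g_x and 0 < EY < 1")
    return (1.0 + lam / n) * (1.0 + labor_share / (capital_output * (r - g_x)))

def tail_index_with_delta(r, g_x, labor_share, delta, lam, n):
    """Same index written with r + delta, before the substitution."""
    ratio = labor_share / (1.0 - labor_share) * (r + delta) / (r - g_x)
    return (1.0 + lam / n) * (1.0 + ratio)

# Hypothetical inputs, for illustration only.
r, g_x, EY, delta, lam, n = 0.06, 0.02, 0.65, 0.05, 0.02, 0.01
KY = (1.0 - EY) / (r + delta)      # capital-output ratio implied by r + delta
theta = tail_index(r, g_x, EY, KY, lam, n)
theta_check = tail_index_with_delta(r, g_x, EY, delta, lam, n)
```

A larger $\theta$ corresponds to a thinner right tail. In this formula $\theta$ falls as $r - g_x$ rises, but it also depends on $EY$, $KY$, $\lambda$ and $n$.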


G AK Model with Idiosyncratic and Aggregate Shocks

In this subsection, we investigate how idiosyncratic return risks affect the aggregate dynamics of the economy in the presence of aggregate shocks.

Time $t$ is continuous and runs from 0 to $\infty$. Let $s_t$ denote the aggregate state of the economy at time $t$. We assume that $s_t$ follows a two-state Markov chain, $s_t \in S = \{B, G\}$, with Poisson transition rates $\gamma_{BG}$ and $\gamma_{GB}$ from one state to the other.

The agents can produce output using a linear technology in capital

$$f^h_t(k^h_t) = A^h_t k^h_t.$$

The productivity of agent $h$ at time $t$ is determined by both the aggregate shock and the idiosyncratic shock, $A^h_t = A(s_t, i^h_t)$, and we assume that

$$A(s, i^h_t = H) > A(s, i^h_t = L) \quad \forall s \in S.$$

We assume that $u(c) = \log c$ and type-preserving wealth redistribution. In addition, we focus on a perturbation around a stationary equilibrium in which $r^* = A_L$, i.e. $r_t = A(s_t, L)$ for all $t$.

The aggregate dynamics of the economy are characterized by

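$$g(s, X) = \left(\frac{A(s,H) - m A(s,L)}{1 - m} - \rho - \lambda\right) X + \left(A(s,L) - \rho - \lambda\right)(1 - X), \tag{77}$$

and

$$\frac{dX(t,s)}{dt} = \frac{A(s,H) - A(s,L)}{1 - m}\,(1 - X(t,s))\, X(t,s) - X(t,s)\, \lambda_{HL} + (1 - X(t,s))\, \lambda_{LH}. \tag{78}$$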

In the absence of idiosyncratic shocks, i.e. $A(s,H) = A(s,L) = A(s)$, where $s \in \{G, B\}$, the aggregate growth rate of the economy is only a function of the aggregate shocks, $g(s, X) = g(s)$, where

$$g(s) = A(s) - \rho - \lambda.$$

However, when there are idiosyncratic shocks, i.e. $A(s,H) > A(s,L)$, equations (77) and (78) show that the conditional wealth distribution $X_t$ affects the aggregate growth rate of the economy, in addition to the aggregate shocks.
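The role of $X$ in (77)-(78) can be checked numerically. The sketch below codes the two equations directly. It reads $X$ as the wealth share of the high type, which is how (77) weights its two terms. The function names and parameter values are hypothetical.

```python
def aggregate_growth(A_H, A_L, X, m, rho, lam):
    """Equation (77): aggregate growth rate in the current aggregate state."""
    high = (A_H - m * A_L) / (1.0 - m) - rho - lam   # term weighted by X
    low = A_L - rho - lam                            # term weighted by 1 - X
    return high * X + low * (1.0 - X)

def wealth_share_drift(A_H, A_L, X, m, lam_HL, lam_LH):
    """Equation (78): dX/dt in the current aggregate state."""
    return (A_H - A_L) / (1.0 - m) * (1.0 - X) * X - X * lam_HL + (1.0 - X) * lam_LH

# Hypothetical parameter values.
m, rho, lam = 0.5, 0.03, 0.02
shares = (0.1, 0.5, 0.9)

# No idiosyncratic dispersion: growth equals A - rho - lam for every X.
no_dispersion = [aggregate_growth(0.10, 0.10, X, m, rho, lam) for X in shares]

# With dispersion (A_H > A_L): growth rises with the wealth share X.
with_dispersion = [aggregate_growth(0.14, 0.10, X, m, rho, lam) for X in shares]
```

With identical productivities the leverage term $(A_H - mA_L)/(1-m)$ collapses to $A$, so $X$ drops out of (77). This is the case $g(s, X) = g(s)$ discussed above.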

In addition, equation (78) tells us that there are two ways in which the process for idiosyncratic shocks influences the aggregate dynamics, in particular the evolution of the wealth share $X_t$. The first term on the right hand side of (78) shows that the dispersion of idiosyncratic productivity, i.e. $\frac{A_H - A_L}{1-m}$ (and the financial friction, $m$), matters. The last two terms on the right hand side of (78) show that the persistence of idiosyncratic productivity, i.e. $\lambda_{HL}$ and $\lambda_{LH}$, matters.

The following proposition characterizes the ergodic distribution of the conditional wealth share $X_t$ and shows that the distribution depends on the underlying process for idiosyncratic return risks.

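Proposition 6. The ergodic distribution of the wealth share, $X_t$, has the support $(\overline{x}_B, \overline{x}_G) \subset (0, 1)$ with the following density function

$$\phi_B(x) = \frac{\Phi}{D_B(x)} \left(\frac{x - \overline{x}_B}{x - \underline{x}_B}\right)^{\frac{\gamma_{BG}(1-m)}{(A(B,H) - A(B,L))(\overline{x}_B - \underline{x}_B)}} \left(\frac{\overline{x}_G - x}{x - \underline{x}_G}\right)^{\frac{\gamma_{GB}(1-m)}{(A(G,H) - A(G,L))(\overline{x}_G - \underline{x}_G)}}$$

and

$$\phi_G(x) = -\frac{D_B(x)}{D_G(x)}\, \phi_B(x),$$

where $D_s$ and $\underline{x}_s$, $\overline{x}_s$ are defined by:

$$
\begin{aligned}
D_s(x) &\equiv \frac{A(s,H) - A(s,L)}{1-m}\,(1 - x)\, x - x \lambda_{HL} + (1 - x)\, \lambda_{LH} \\
&\equiv \frac{A(s,H) - A(s,L)}{1-m}\,(\overline{x}_s - x)(x - \underline{x}_s),
\end{aligned}
$$

and the constant $\Phi$ is uniquely determined such that

$$\int_{\overline{x}_B}^{\overline{x}_G} \left(\phi_B(x) + \phi_G(x)\right) dx = 1.$$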

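Proof. Let $P_B(t, x)$ denote $\Pr(X_t \le x, s_t = B)$. We have

$$
\begin{aligned}
P_B(t + \Delta t, x) &= \Pr(X_{t+\Delta t} \le x,\ s_{t+\Delta t} = B) \\
&= \Pr(X_{t+\Delta t} \le x,\ s_t = B)(1 - \gamma_{BG}\Delta t) + \Pr(X_{t+\Delta t} \le x,\ s_t = G)\,\gamma_{GB}\Delta t + o(\Delta t).
\end{aligned}
$$

Now recall that

$$\frac{dX(s,t)}{dt} = [g_H(s, X_t) - g_L(s, X_t)](1 - X(s,t))\, X(s,t) - X(s,t)\, \lambda_{HL} + (1 - X(s,t))\, \lambda_{LH}.$$

Denoting the RHS by $D_s(x)$ for $s = B, G$, we have

$$
\begin{aligned}
P_B(t + \Delta t, x) ={}& \Pr(X_t + D_B(X_t)\Delta t + o(\Delta t) \le x,\ s_t = B)(1 - \gamma_{BG}\Delta t) \\
& + \Pr(X_t + D_G(X_t)\Delta t + o(\Delta t) \le x,\ s_t = G)\,\gamma_{GB}\Delta t + o(\Delta t).
\end{aligned}
$$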


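Notice that

$$
\begin{aligned}
\Pr(X_t + D_B(X_t)\Delta t + o(\Delta t) \le x,\ s_t = B) &= \Pr(X_t \le x - D_B(x)\Delta t + o(\Delta t),\ s_t = B) \\
&= P_B(t,\ x - D_B(x)\Delta t + o(\Delta t)) \\
&= P_B(t, x) - \frac{\partial P_B}{\partial x} D_B(x)\Delta t + o(\Delta t).
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
P_B(t + \Delta t, x) ={}& \left(P_B(t, x) - \frac{\partial P_B}{\partial x} D_B(x)\Delta t + o(\Delta t)\right)(1 - \gamma_{BG}\Delta t) \\
& + \left(P_G(t, x) - \frac{\partial P_G}{\partial x} D_G(x)\Delta t + o(\Delta t)\right)\gamma_{GB}\Delta t + o(\Delta t) \\
={}& P_B(t, x) - \frac{\partial P_B}{\partial x} D_B \Delta t - P_B \gamma_{BG}\Delta t + P_G \gamma_{GB}\Delta t + o(\Delta t).
\end{aligned}
$$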

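We obtain a similar expression for $P_G(t + \Delta t, x)$. Subtracting $P_s(t, x)$ from both sides of the last equations (for $s = B$ and $s = G$ respectively), dividing both sides by $\Delta t$, and then taking the limit $\Delta t \to 0$, we obtain the system of equations for $P_B$, $P_G$:

$$
\begin{aligned}
\frac{\partial P_B}{\partial t} &= -\frac{\partial P_B}{\partial x} D_B - P_B \gamma_{BG} + P_G \gamma_{GB} \\
\frac{\partial P_G}{\partial t} &= -\frac{\partial P_G}{\partial x} D_G + P_B \gamma_{BG} - P_G \gamma_{GB}.
\end{aligned}
$$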

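At the stationary distribution, $\frac{\partial P_s}{\partial t} = 0$, and we have (to save on notation, $P_B$, $P_G$ now denote the stationary distributions):

$$
\begin{aligned}
\frac{dP_B}{dx} D_B &= -P_B \gamma_{BG} + P_G \gamma_{GB} \\
\frac{dP_G}{dx} D_G &= P_B \gamma_{BG} - P_G \gamma_{GB}.
\end{aligned}
\tag{79}
$$

We rewrite $D_s(x) = -\Delta g_s (x - \underline{x}_s)(x - \overline{x}_s)$, where $\Delta g_s = \frac{A(s,H) - A(s,L)}{1-m}$ and $\underline{x}_s < \overline{x}_s$ are the two roots of $D_s$. As shown in Lemma 6 below, $\underline{x}_B < \underline{x}_G < 0 < \overline{x}_B < \overline{x}_G$.

We look for the solution $P_G$, $P_B$ to the system of ODEs (79) with support over $(\overline{x}_B, \overline{x}_G)$. Therefore, $P_G$, $P_B$ satisfy the boundary conditions

$$
\begin{aligned}
P_B(\overline{x}_B) = 0 \quad &\text{and} \quad P_B(\overline{x}_G) = \frac{\gamma_{GB}}{\gamma_{BG} + \gamma_{GB}} \\
P_G(\overline{x}_B) = 0 \quad &\text{and} \quad P_G(\overline{x}_G) = \frac{\gamma_{BG}}{\gamma_{BG} + \gamma_{GB}}.
\end{aligned}
\tag{80}
$$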


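Indeed, we decouple the two ODEs (79) by observing that

$$\frac{dP_B}{dx} D_B + \frac{dP_G}{dx} D_G = 0.$$

Differentiating both sides of the first ODE:

$$\frac{d^2 P_B}{dx^2} D_B + \frac{dP_B}{dx}\frac{dD_B}{dx} = -\gamma_{BG}\frac{dP_B}{dx} + \gamma_{GB}\frac{dP_G}{dx}.$$

For $D_B \neq 0$,

$$\frac{d^2 P_B}{dx^2} = -\frac{dP_B}{dx}\left(\frac{d \log(D_B)}{dx} + \frac{\gamma_{BG}}{D_B} + \frac{\gamma_{GB}}{D_G}\right).$$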

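And therefore,

$$\frac{dP_B}{dx} = C \exp\left(-\int \left(\frac{d \log(D_B)}{dx} + \frac{\gamma_{BG}}{D_B} + \frac{\gamma_{GB}}{D_G}\right) dx\right).$$

Recall that

$$\int \frac{1}{D_s(x)}\, dx = \int \frac{1}{\Delta g_s (\overline{x}_s - \underline{x}_s)}\left(\frac{1}{x - \underline{x}_s} - \frac{1}{x - \overline{x}_s}\right) dx = \frac{1}{\Delta g_s (\overline{x}_s - \underline{x}_s)}\left(\log(x - \underline{x}_s) - \log|\overline{x}_s - x|\right).$$

And therefore, for $x \in (\overline{x}_B, \overline{x}_G)$,

$$
\begin{aligned}
\frac{dP_B}{dx} &= C\, \frac{1}{D_B} \left(\frac{x - \overline{x}_B}{x - \underline{x}_B}\right)^{\frac{\gamma_{BG}}{\Delta g_B (\overline{x}_B - \underline{x}_B)}} \left(\frac{\overline{x}_G - x}{x - \underline{x}_G}\right)^{\frac{\gamma_{GB}}{\Delta g_G (\overline{x}_G - \underline{x}_G)}} \\
\frac{dP_G}{dx} &= -C\, \frac{1}{D_G} \left(\frac{x - \overline{x}_B}{x - \underline{x}_B}\right)^{\frac{\gamma_{BG}}{\Delta g_B (\overline{x}_B - \underline{x}_B)}} \left(\frac{\overline{x}_G - x}{x - \underline{x}_G}\right)^{\frac{\gamma_{GB}}{\Delta g_G (\overline{x}_G - \underline{x}_G)}}.
\end{aligned}
$$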

We can choose the constant C such that the boundary conditions (80) are satisfied.
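The support claimed in the proposition can be illustrated by simulation. The sketch below is a simple Euler discretization of (78) with Poisson switching of the aggregate state. It is not the method used in the proof, and all parameter values and function names are hypothetical. Starting inside the interval between the larger roots of $D_B$ and $D_G$, the simulated path stays inside it.

```python
import math
import random

def drift(x, dg, lam_HL, lam_LH):
    """D_s(x) = dg*(1 - x)*x - x*lam_HL + (1 - x)*lam_LH, with dg = Delta g_s."""
    return dg * (1.0 - x) * x - x * lam_HL + (1.0 - x) * lam_LH

def upper_root(dg, lam_HL, lam_LH):
    """Larger root of D_s(x) = -dg*x**2 + (dg - lam_HL - lam_LH)*x + lam_LH."""
    b = dg - lam_HL - lam_LH
    return (b + math.sqrt(b * b + 4.0 * dg * lam_LH)) / (2.0 * dg)

def simulate(dg, gamma, lam_HL, lam_LH, x0, dt=0.01, steps=100_000, seed=0):
    """Euler path of X; gamma[s] is the Poisson rate of leaving state s."""
    rng = random.Random(seed)
    switch_to = {"B": "G", "G": "B"}
    x, s = x0, "B"
    path, steps_in_B = [], 0
    for _ in range(steps):
        x += drift(x, dg[s], lam_HL, lam_LH) * dt
        if rng.random() < gamma[s] * dt:     # switch with probability gamma * dt
            s = switch_to[s]
        steps_in_B += (s == "B")
        path.append(x)
    return path, steps_in_B / steps

# Hypothetical parameters: dg[s] = (A(s,H) - A(s,L)) / (1 - m).
dg = {"B": 0.04, "G": 0.08}
gamma = {"B": 0.2, "G": 0.1}                 # gamma_BG and gamma_GB
lam_HL, lam_LH = 0.05, 0.02

x_lo = upper_root(dg["B"], lam_HL, lam_LH)   # lower end of the support
x_hi = upper_root(dg["G"], lam_HL, lam_LH)   # upper end of the support
path, share_B = simulate(dg, gamma, lam_HL, lam_LH, x0=0.5 * (x_lo + x_hi))
```

Over a long run, `share_B` can be compared with $\gamma_{GB}/(\gamma_{BG} + \gamma_{GB})$, the value of $P_B$ at the upper boundary in (80).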

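Lemma 6. $\underline{x}_B < \underline{x}_G < 0 < \overline{x}_B < \overline{x}_G$.

Proof. First, observe that $D_s(0) > 0$ and $D_s(-\infty) < 0$. Therefore, $\underline{x}_s < 0$. Then consider

$$
\begin{aligned}
D_G(\underline{x}_B) &= \Delta g_G (1 - \underline{x}_B)\underline{x}_B - \underline{x}_B \lambda_{HL} + (1 - \underline{x}_B)\lambda_{LH} \\
&= \Delta g_B (1 - \underline{x}_B)\underline{x}_B - \underline{x}_B \lambda_{HL} + (1 - \underline{x}_B)\lambda_{LH} + (\Delta g_G - \Delta g_B)(1 - \underline{x}_B)\underline{x}_B \\
&= 0 + (\Delta g_G - \Delta g_B)(1 - \underline{x}_B)\underline{x}_B < 0.
\end{aligned}
$$

By the Intermediate Value Theorem, $\underline{x}_B < \underline{x}_G < 0$.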


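Similarly, $D_s(0) > 0$ and $D_s(1) < 0$. Therefore, $0 < \overline{x}_s < 1$. Consider

$$
\begin{aligned}
D_G(\overline{x}_B) &= \Delta g_G (1 - \overline{x}_B)\overline{x}_B - \overline{x}_B \lambda_{HL} + (1 - \overline{x}_B)\lambda_{LH} \\
&= \Delta g_B (1 - \overline{x}_B)\overline{x}_B - \overline{x}_B \lambda_{HL} + (1 - \overline{x}_B)\lambda_{LH} + (\Delta g_G - \Delta g_B)(1 - \overline{x}_B)\overline{x}_B \\
&= 0 + (\Delta g_G - \Delta g_B)(1 - \overline{x}_B)\overline{x}_B > 0.
\end{aligned}
$$

By the Intermediate Value Theorem, $\overline{x}_B < \overline{x}_G < 1$.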
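The ordering in Lemma 6 and the two sign steps of its proof can be verified numerically for given parameters. The sketch below uses hypothetical values with $\Delta g_G > \Delta g_B$, which is the case the sign arguments above appear to rely on. The function names are illustrative.

```python
import math

def drift(x, dg, lam_HL, lam_LH):
    """D_s(x) = dg*(1 - x)*x - x*lam_HL + (1 - x)*lam_LH, with dg = Delta g_s."""
    return dg * (1.0 - x) * x - x * lam_HL + (1.0 - x) * lam_LH

def roots(dg, lam_HL, lam_LH):
    """Both roots of D_s(x) = -dg*x**2 + (dg - lam_HL - lam_LH)*x + lam_LH, smaller first."""
    b = dg - lam_HL - lam_LH
    disc = math.sqrt(b * b + 4.0 * dg * lam_LH)
    return (b - disc) / (2.0 * dg), (b + disc) / (2.0 * dg)

# Hypothetical values with dg_G > dg_B.
dg_B, dg_G, lam_HL, lam_LH = 0.04, 0.08, 0.05, 0.02
lower_B, upper_B = roots(dg_B, lam_HL, lam_LH)
lower_G, upper_G = roots(dg_G, lam_HL, lam_LH)

# Sign steps of the proof: D_G evaluated at the two roots of D_B.
d_G_at_lower_B = drift(lower_B, dg_G, lam_HL, lam_LH)   # negative in the proof
d_G_at_upper_B = drift(upper_B, dg_G, lam_HL, lam_LH)   # positive in the proof

lemma_holds = lower_B < lower_G < 0.0 < upper_B < upper_G < 1.0
```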

H Empirical Analysis

Data is acquired directly from the official website of PSID (https://psidonline.isr.umich.edu/), using the variable list after looking up relevant variables in each survey year's codebook.

H.1 Definition of variables

Since the data structure varies from year to year, we try our best to maintain a consistent definition of variables.

Net worth. 1984 and 1989: sum of net worth of home equity, real estate, vehicles, business, stocks, savings, bonds, minus debts. 1994-2011: include value of IRA accounts as well. 2013: value and debts are now reported separately for different categories of assets, so the net worth can be computed as value minus remaining debts.

Core asset. The above defined net worth excluding home equity and vehicles.

Asset income. 1984 and 1989: sum of asset part of business income, head's income from rent, head's income from dividend, interest, trust fund, and wife's other income from assets. 1994-2001: sum of asset part of business income of head and wife (reported separately; similarly for the following entries), dividend income of head and wife, interest income of head and wife, income from trust funds of head and wife, rent income of head (rent income is only reported for head), other asset income of wife. 2003-2013: sum of asset part of business income of head and wife, dividend income of head and wife, interest income of head and wife, income from trust funds of head and wife, rent income of head and wife (rent income is now reported for both head and wife; no other asset income of wife is reported).

Return to wealth. Asset income / Core asset.

Has business. Answer "yes" to the following question: Did you (or anyone else in the family there) own a business at any time in [YEAR] or have a financial interest in any business enterprise?

Has stocks. Report net worth of stocks not equal to 0.

Home owners. Answer "Owns or is buying home, either fully or jointly" to the following question: Do you own the (home/apartment), pay rent, or what?

SameHead. Answer one of the following: (1) No change; no movers-in or movers-out of the family; (2) Change in members other than Head or Wife/"Wife" only; (3) Head is the same person as in 1983 but Wife/"Wife" left or died; Head has new Wife/"Wife"; (4) Wife/"Wife" from 1983 is now Head; to the question "Family Composition Change between 1983 and 1984".

H.2 Imputations, sample selection, and outliers

The full sample from 1984 to 2013 comes with 29,023 unique family IDs. For preliminary sample selection, we drop the Latino families introduced in 1990, which leaves us with 26,258 unique family IDs.

We do not allow any imputation of wealth components or asset income components: if any component is missing in a household-year observation, the return to wealth variable is not defined. The rationale for no imputation is that the aspect of the data we are interested in, the correlation of returns across years, is very sensitive to measurement error. To further minimize measurement error, we restrict the households to have the same head when computing the wealth class mobility and the correlation of returns.

We remove households with wealth smaller than $350. We remove the top and bottom 1% of return-to-wealth observations in each survey year, which we deem outliers. This leaves us with 26,687 valid entries of annual returns from pooling observations from all years. This is the sample we use to construct the histograms of returns to wealth and the correlation of returns to wealth between survey years.
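The selection steps above can be sketched in a few lines of Python. This is a hypothetical illustration and not the code used for the paper. The record layout and key names are assumptions. Which wealth measure the $350 floor applies to is also an assumption here (a generic `wealth` field), as is the guard against a zero core asset.

```python
def trim_returns(records, min_wealth=350.0, tail=0.01):
    """Return (year, household_id, return) tuples after the selection steps.

    `records` is an iterable of dicts with hypothetical keys
    'year', 'id', 'wealth', 'asset_income', 'core_asset'; None marks a
    missing component.
    """
    by_year = {}
    for rec in records:
        values = (rec["wealth"], rec["asset_income"], rec["core_asset"])
        if any(v is None for v in values):        # no imputation: drop incomplete rows
            continue
        if rec["wealth"] < min_wealth or rec["core_asset"] == 0:
            continue
        ret = rec["asset_income"] / rec["core_asset"]   # return to wealth
        by_year.setdefault(rec["year"], []).append((ret, rec["id"]))

    kept = []
    for year, rows in sorted(by_year.items()):
        rows.sort()
        cut = int(len(rows) * tail + 1e-9)        # observations dropped in each tail
        for ret, hid in rows[cut:len(rows) - cut]:
            kept.append((year, hid, ret))
    return kept

# Synthetic example: 200 complete rows plus two rows that must be dropped.
records = [
    {"year": 1984, "id": i, "wealth": 1000.0,
     "asset_income": float(i), "core_asset": 1000.0}
    for i in range(200)
]
records.append({"year": 1984, "id": 200, "wealth": 100.0,
                "asset_income": 5.0, "core_asset": 50.0})     # below the floor
records.append({"year": 1984, "id": 201, "wealth": 900.0,
                "asset_income": None, "core_asset": 500.0})   # missing component
kept = trim_returns(records)
```

Trimming is done within each survey year, so the cutoffs adapt to year-specific return distributions rather than to the pooled sample.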

For the regression sample, we remove households with missing covariates, which leaves us with 25,003 valid observations with 8,965 unique household IDs. Therefore the average length of the panel is around 3 years. Table 9 reports the mean statistics of the observations used in regressions. There are more valid observations in the earlier wealth surveys, but overall the summary statistics are stable over time.


Table 9: Mean statistics for observations used in regressions

Year   Obs    Age  White  YrsEduc  HomeOwner  HasBusiness  HasStocks     Income
1984  3842  42.29   0.72    12.44       0.61         0.10       0.22   28306.64
1989  3704  47.33   0.76    12.56       0.73         0.17       0.32   42979.55
1994  3882  43.55   0.73    12.94       0.64         0.13       0.32   46214.36
1999  1644  45.67   0.80    13.54       0.87         0.20       0.32   83507.60
2001  1591  46.74   0.82    13.57       0.87         0.18       0.34   93363.44
2003  1424  48.61   0.82    13.62       0.89         0.17       0.31   95566.06
2005  1591  48.27   0.81    13.57       0.88         0.16       0.29  100372.71
2007  1672  48.77   0.80    13.48       0.86         0.14       0.27  103921.03
2009  1872  48.84   0.79    13.81       0.87         0.17       0.25  116541.50
2011  1889  49.57   0.80    13.95       0.86         0.16       0.22  110423.73
2013  1892  49.59   0.80    14.03       0.83         0.14       0.22  115967.89
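As a quick arithmetic check, the yearly observation counts in Table 9 can be summed and compared with the regression-sample totals quoted in Section H.2 (25,003 observations and 8,965 unique households). The variable names below are illustrative.

```python
# Observation counts by survey year, copied from Table 9.
obs_by_year = {
    1984: 3842, 1989: 3704, 1994: 3882, 1999: 1644, 2001: 1591, 2003: 1424,
    2005: 1591, 2007: 1672, 2009: 1872, 2011: 1889, 2013: 1892,
}

total_obs = sum(obs_by_year.values())
unique_households = 8965                    # reported in Section H.2
avg_panel_length = total_obs / unique_households
```

The ratio is the average number of survey-year observations per household, which is the sense in which the panel length is "around 3".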
