Measuring economic inequalities: Lorenz curve, coefficient of variation and Gini coefficient

Julien R. Barlan (IEP Paris), Institut d’études politiques de Paris
Austin, Texas, USA
Department of Economics - April 2010
[email protected]
http://jrbpagetravaux.blogspot.com/
https://docs.google.com/Doc?docid=0AXe2E1Mm09WIZGhzazhxaDRfMjUzZ25nMjdkZzY&hl=en

In this handout, I aim to introduce a classical economic tool, the Lorenz curve, which measures inequalities in a society by focusing on income distribution. It was developed by the American economist Max Lorenz in the early 20th century, when he was a graduate student at the University of Wisconsin at Madison.

First of all, download the Excel spreadsheet that goes with this handout.

You are strongly advised to reproduce what you are reading while going over this document. In the end, you should be capable of running your own experiments. If you are using this handout to review for an exam or just to do homework, make sure you understand the concepts and the calculations behind all the notions. Moreover, do not skip hand calculations: it is unlikely you will be asked to compute complicated curves or coefficients while taking an exam. Keep my Excel spreadsheet for the more complicated experiments.

I will first go over the theory and the mathematical formalization, before running some experiments about income inequality in the United States. The third part consists of an application to NBA statistics, as an illustration of the usefulness of this tool for measuring the degree of equality of any kind of distribution.

This handout is accessible through my web site http://jrbpagetravaux.blogspot.com/ and the following Wikipedia pages:

       The Lorenz curve: http://en.wikipedia.org/wiki/Lorenz_curve
       The Gini coefficient: http://en.wikipedia.org/wiki/Gini_coefficient
       Income inequality metrics: http://en.wikipedia.org/wiki/Income_inequality_metrics

Credited reproduction is permitted for academic purposes only. If you plan to use this handout in class or in a presentation, please don’t forget to reference my university, Sciences Po Paris (France), and myself.

I) Theory and mathematics of the Lorenz curve


The Lorenz curve is actually nothing more than a very nice way to measure the degree of inequality within a society. We only need two sets of data to compute it: incomes and population. If we know what is owned and by whom, we can organize our knowledge in the following way: we will have two vectors, $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$, one representing the cumulative shares of the population, the other the cumulative shares of the total income held by that population.

The entries $x_i$ of X are shares of the population; the entries $y_i$ of Y are the shares of the total income corresponding to the $x_i$. Let’s take an example: in a society composed of six individuals, one owns $100, two own $200 each and the three remaining possess $300 each. Does that make a total income Y = $600? The answer is no. This is the first mistake to avoid, since we have to scale each income by the size of its group. For example, the third group holds 3 x 300 = $900 overall. So our total income is: 100 + 400 + 900 = $1,400.

Looking at the population, the first group represents $\frac{1}{6} \approx 16.67\%$ of it and actually earns $\frac{100}{1400} \approx 7.14\%$ of total income. The second group weighs 33.33% of the total population and earns $\frac{400}{1400} \approx 28.57\%$ of total income. Similarly, the third group, half the population, owns $\frac{900}{1400} \approx 64.29\%$ of the society’s wealth. We now add those proportions in order to get the cumulative shares of income: the first 16.67% of the population holds 7.14% of the total revenue.

16.67% + 33.33% = 50%, so the poorer half of the population owns just 7.14% + 28.57% = 35.71% of Y. Finally, adding the third and last portion of X, we end up with the pair (X(%), Y(%)) = (100, 100). If you plug the values into the attached spreadsheet, you will end up with the following:

Table 1
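For readers who prefer a script to the spreadsheet, the cumulative shares of this six-person example can be recomputed with a few lines of Python. This is only a minimal sketch: the function name `lorenz_points` and the `(people, income per person)` layout are illustrative choices, not something taken from the Excel file.

```python
# Groups of the six-person example: (number of people, income per person in $).
groups = [(1, 100), (2, 200), (3, 300)]

def lorenz_points(groups):
    """Return cumulative (population %, income %) pairs, starting at (0, 0)."""
    # Groups must be ordered from the poorest to the richest.
    ordered = sorted(groups, key=lambda g: g[1])
    total_people = sum(n for n, _ in ordered)
    # Scale each income by the group size: 100 + 400 + 900 = 1,400.
    total_income = sum(n * y for n, y in ordered)
    points = [(0.0, 0.0)]
    cum_people = 0
    cum_income = 0
    for n, y in ordered:
        cum_people += n
        cum_income += n * y
        points.append((100 * cum_people / total_people,
                       100 * cum_income / total_income))
    return points

points = lorenz_points(groups)
for x, y in points:
    print(f"{x:6.2f}% of the population holds {y:6.2f}% of the income")
```

The four pairs printed are the coordinates of the Lorenz curve for this society.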

Now we can draw the Lorenz curve, labeling the x-axis as the cumulative shares of the population and the y-axis as the cumulative shares of income. Note: you should draw those tables every time, whether in an exam, doing homework or conducting research!


Graph 1

We clearly identify our four points, (0,0) and (100,100) being obvious. Here are the pairs: (0, 0), (16.67, 7.14), (50, 35.71), (100, 100).

How can we interpret the previous graph? It might be difficult, or at least not obvious, without knowing that the perfect equality line represents a society in which income is equally shared among the population. It is a continuous straight line on the interval [0,100]; mathematically, it is the simple function f(a) = a. The interpretation is the following: X% of the population would earn X% of Y, or $y_i = x_i$ for every $i$.

Consequently, the farther a Lorenz curve lies below the perfect equality line, the less equal the society is. But that is just a graphical interpretation. Based on this piece of information, one can only say that the studied society has reached some point on the inequality scale. To go into more detail, we have to introduce two statistical tools.

1. The coefficient of variation

It measures the spread of our weighted income distribution. It relies on some basic statistical tools, namely the weighted mean and the standard deviation, and is the ratio of the latter to the former. The standard deviation measures how far the values lie from the mean. One might argue that the overall sum of the deviations should be zero, and that is exactly right. This is why we must square the differences (see a statistics text if you are not familiar with this analysis). But since $x^2$ is a more than proportionally increasing function, we want each squared deviation from the mean to be weighted by the proportion of the population earning $y_j$. Finally, we divide the standard deviation by the weighted mean $\mu$.
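Written out, the verbal description above corresponds to the following expression. The notation is an assumption made here for readability: $m$ income groups, $n_j$ people in group $j$ each earning $y_j$, and $n$ people in total.

```latex
C = \frac{1}{\mu}\sqrt{\sum_{j=1}^{m}\frac{n_j}{n}\,\left(y_j-\mu\right)^2},
\qquad
\mu = \frac{1}{n}\sum_{j=1}^{m} n_j\,y_j
```

When every $y_j$ equals $\mu$, each term under the square root vanishes and $C = 0$.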


In our example, C is roughly 32%. Remember that it is just a measure of spread. Nevertheless, the larger the spread, the more unequal the society is likely to be. To give an example, in 1950 Mexico, C was equal to 2.5, i.e. 250%. So one can argue that the society we are looking at is comparatively less unequal than mid-20th century Mexico, but is comparatively less equal than a society experiencing perfect equality, where C = 0 (since all deviations from the mean are necessarily 0!), or than any more realistic society with 0 < C < 0.32.
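The 32% figure can be checked by hand or with a short script. The sketch below assumes the weighted definition described in this section (population-weighted squared deviations, square root, division by the weighted mean); the function name is illustrative.

```python
import math

# Six-person example: (number of people, income per person in $).
groups = [(1, 100), (2, 200), (3, 300)]

def coefficient_of_variation(groups):
    """Weighted standard deviation divided by the weighted mean."""
    n = sum(count for count, _ in groups)
    mu = sum(count * y for count, y in groups) / n
    # Each squared deviation is weighted by the population share of the group.
    variance = sum(count / n * (y - mu) ** 2 for count, y in groups)
    return math.sqrt(variance) / mu

C = coefficient_of_variation(groups)
print(f"C = {C:.4f}")  # about 0.32 for this example

# In a perfectly equal society every deviation is zero, hence C = 0.
C_equal = coefficient_of_variation([(6, 250)])
```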

2. The Gini coefficient

Developed by Corrado Gini, an Italian statistician, in the first part of the 20th century, it “takes the difference between all pairs of income and totals the absolute differences”[1], divided by twice the product of the population squared and the weighted mean.
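Read literally, this definition gives $G = \frac{1}{2 n^2 \mu} \sum_j \sum_k n_j n_k |y_j - y_k|$, where the symbols $n_j$, $y_j$, $n$ and $\mu$ are notation assumed here for group sizes, group incomes, total population and weighted mean. A minimal Python sketch of that reading, applied to the six-person example:

```python
# Six-person example: (number of people, income per person in $).
groups = [(1, 100), (2, 200), (3, 300)]

def gini_pairwise(groups):
    """Sum of absolute income differences over all pairs, scaled by 2 * n^2 * mu."""
    n = sum(count for count, _ in groups)
    mu = sum(count * y for count, y in groups) / n
    # Every ordered pair of groups is counted, weighted by both group sizes.
    total = sum(nj * nk * abs(yj - yk)
                for nj, yj in groups
                for nk, yk in groups)
    return total / (2 * n ** 2 * mu)

G = gini_pairwise(groups)
print(f"G = {G:.4f}")
```

For this society the double sum is 2,800 and the denominator is 2 x 36 x 233.33 = 16,800, so G is one sixth.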

Despite this apparently unfriendly calculation, the Gini coefficient is very useful and widely used in the social sciences, not only economics, since it measures the area A between the perfect equality line (E) and the Lorenz curve as a proportion of the area which lies between E and the x-axis. Let’s make it simple. E(x) = x on the interval [0,100]:

Since the antiderivative is $F(x) = \frac{x^2}{2}$, we get $\int_0^{100} x\,dx = F(100) - F(0) = 5000$. In other words we have $G = \frac{A}{5000}$.

We face two extreme values for G. Obviously, $G = 0 \iff A = 0$, so when the Gini coefficient for a Lorenz curve is zero, this means that the latter satisfies $x_i = y_i$, which is E(x), the equal society.

On the other hand, if G = 1, then A = 5000. The society is as unequal as possible, since people in the last category own all the income. Note that we need at least two people placed in two categories for this to hold. If a single person composes the society, it can only be equal, though it does not make much mathematical sense to talk about equality for a single observation. Anyway, those two situations are just the two extremes of the model; they have to be understood, but it is very unlikely that they will happen in real situations.
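The area interpretation and the two extremes can be explored with a small script. The sketch below assumes a piecewise-linear Lorenz curve on the [0, 100] scale and uses the trapezoid rule; the function name is illustrative. One caveat worth seeing in numbers: with a finite number of people the upper extreme is only approached, since one person out of n owning everything gives G = 1 - 1/n under this construction.

```python
def gini_from_area(incomes):
    """Gini as A / 5000, with A the area between E(x) = x and the Lorenz curve."""
    ordered = sorted(incomes)
    n = len(ordered)
    total = sum(ordered)
    # Lorenz points on the [0, 100] scale used in the text.
    xs, ys = [0.0], [0.0]
    running = 0.0
    for i, y in enumerate(ordered, start=1):
        running += y
        xs.append(100 * i / n)
        ys.append(100 * running / total)
    # Trapezoid rule for the area under the piecewise-linear Lorenz curve.
    under = sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
                for i in range(1, len(xs)))
    area_a = 5000 - under
    return area_a / 5000

g_example = gini_from_area([100, 200, 200, 300, 300, 300])
g_equal = gini_from_area([50, 50, 50, 50])
g_two_people = gini_from_area([0, 1])
g_hundred_people = gini_from_area([0] * 99 + [1])
print(g_example, g_equal, g_two_people, g_hundred_people)
```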

II) Income inequality in the United States


1. Geographic

Based on a 2009 report of the Census Bureau[2], we are going to compute a Lorenz curve for income distribution in the US with respect to geography. The population, expressed in terms of households, will be organized by administrative region. There are four of them: Northeast, Midwest, South and West. First of all, we need to order them by increasing income. If we do not pay attention to that ordering, we might end up with a Lorenz curve lying partially or totally above E(x). This is a mistake since it has to lie under E(x); think about it in terms of the ratio of integrals. Table 2 has consequently ordered the regions as follows: South, Midwest, Northeast, and West, using median incomes.
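The ordering issue is easy to reproduce. In the sketch below the four regions and their figures are made up for illustration (they are not the Census values), and `cumulative_points` is a hypothetical helper; the point is only that cumulating groups in the wrong order can push the curve above E(x).

```python
# Hypothetical regions: (label, households in millions, income per household in $).
# These figures are invented for illustration; they are not the Census values.
regions = [("A", 20, 55_000), ("B", 40, 45_000), ("C", 25, 50_000), ("D", 15, 52_000)]

def cumulative_points(rows):
    """Cumulative (population %, income %) pairs in the order the rows are given."""
    total_pop = sum(pop for _, pop, _ in rows)
    total_inc = sum(pop * inc for _, pop, inc in rows)
    points, cum_pop, cum_inc = [], 0, 0
    for _, pop, inc in rows:
        cum_pop += pop
        cum_inc += pop * inc
        points.append((100 * cum_pop / total_pop, 100 * cum_inc / total_inc))
    return points

unsorted_points = cumulative_points(regions)
sorted_points = cumulative_points(sorted(regions, key=lambda r: r[2]))

# Points strictly above the equality line (the final (100, 100) point is skipped).
above_unsorted = [p for p in unsorted_points[:-1] if p[1] > p[0]]
above_sorted = [p for p in sorted_points[:-1] if p[1] > p[0]]
print("above E(x) when unsorted:", above_unsorted)
print("above E(x) when sorted:  ", above_sorted)
```

With the rows sorted by increasing income, no point lies above the equality line.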

Table 2

To avoid confusion, note that we order the groups by increasing income per household. For example, each household in Texas is considered to earn $47,961, which is the lowest per-household figure, even though total income in the South is far higher than in any other part of the country. Make sure that you organize your data accordingly. We can now have a look at the Lorenz curve and our two statistical measures of inequality.

Graph 2

It turns out we are likely to rule out the hypothesis that the United States is geographically unequal when it comes to income. The spread is about 6.3% and G = 3.5%. We ran this study using median incomes. Recall that if there are n households in the Midwest, the median income is the $\frac{n+1}{2}$-th in the distribution, when it is ordered from the lowest to the highest value. For those who are familiar with empirical studies, this choice is not surprising since the median, unlike the mean, is not affected by outlier values. Since we are dealing with very large samples, it is likely the distribution is normally distributed around the mean, but using the median is generally a safer way to estimate inequality. See a statistics textbook if you want to go over those issues in more detail.
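The robustness of the median to outliers can be seen on a toy sample. The incomes below are hypothetical, and Python's standard `statistics` module is used as a stand-in for the spreadsheet.

```python
import statistics

# Hypothetical household incomes in dollars (illustrative values only).
incomes = [30_000, 42_000, 48_000, 51_000, 60_000]
with_outlier = incomes + [5_000_000]

# For an odd number n of ordered values, the median is the (n + 1) / 2-th one.
n = len(incomes)
middle_value = sorted(incomes)[(n + 1) // 2 - 1]

print("median:", statistics.median(incomes), "mean:", statistics.mean(incomes))
print("median with outlier:", statistics.median(with_outlier),
      "mean with outlier:", statistics.mean(with_outlier))
```

A single very large income moves the mean by hundreds of thousands of dollars while the median barely changes.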

2. Age inequalities

As it appears that income is pretty equally spread around the United States, we might now be interested in measuring inequalities among citizens. Let’s make the hypothesis that we might encounter a larger level of inequality when looking at age rather than location. It seems to be a straightforward hypothesis, since young people are not so likely to make money and retired folks earn less than workers. Based on the same Census report, it actually turns out that young people aged 15 to 24 individually make more than elders over 65. Note that 45-54 means people from age 45 to 54. Income is measured in dollars.

Table 3

The population looks less equally distributed than in the geographical test. Let’s have a look at the Lorenz curve.

Graph 3


Compared to our previous experiment where we had C = 6%, we observe a coefficient of variation more than four times bigger. The Gini coefficient confirms what one could see graphically: the area between the Lorenz curve and the perfect equality line has significantly increased.

Question: do you conclude the United States is a country where it does not matter where one lives, for example that one would find as many poor places in the Midwest as in New England?

3. Interpretation and comparison issues

The answer is no. You may have noticed that it is all about categories. The geographical slicing of the United States is just a rough cut into four pieces. Consider the West part of the country: it contains both California and Idaho. Do you believe that income distribution is the same in those two states? Do you think we are likely to find as many rich places, in volume, in California as in Idaho? Once again, the answer is negative. It all depends on categories. We are now dealing with volume issues, whereas we had run a proportional study. To do so, we picked California and Idaho, grouped them with others and created “West”. We could always make up our statistics to argue Idaho is as rich as California; we could try to argue that more cities in Idaho than in California earn more than a certain proportional amount of revenue; but let’s make it clear: California is richer than Idaho. We are just trying to underline that scaling, grouping, and setting up criteria might obscure reality.

All that matters is what “geography” actually means. If it means four regions created in order to have four groups more or less equal in terms of income, then the United States does not know inequality. If it means fifty different states, our conclusion no longer holds.

Let’s take another example: does a state like New York experience an income distribution like the one we observe for the whole United States? Or looking closer, does New York City? Or even closer, does the borough of Manhattan? Zooming in as much as we can, is income equally distributed along Fifth Avenue, between Harlem and Rockefeller Center? We can stick to a negative answer. Everything is just a matter of scale. When looking into the details, we can find some huge evidence of inequality.

If you are about to run your own experiments, you have to clearly define your criteria. Remember that the larger your samples and the larger your groups, the more everything tends to balance out.
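The balancing effect of large groups can be made concrete. In the sketch below, two hypothetical regions have very unequal residents but the same average income; once each resident is replaced by the regional average, as a grouped study implicitly does, the measured inequality vanishes. All numbers are invented, and the mean is used for grouping purely as a simplifying assumption.

```python
def gini(incomes):
    """Pairwise-difference Gini coefficient for a list of individual incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    total = sum(abs(a - b) for a in incomes for b in incomes)
    return total / (2 * n ** 2 * mu)

# Hypothetical incomes (in thousands of dollars) for two regions.
region_1 = [10, 20, 90]
region_2 = [15, 35, 70]

everyone = region_1 + region_2
# Replace every individual by the average of their region.
mean_1 = sum(region_1) / len(region_1)
mean_2 = sum(region_2) / len(region_2)
grouped = [mean_1] * len(region_1) + [mean_2] * len(region_2)

g_individuals = gini(everyone)
g_grouped = gini(grouped)
print(f"Gini across individuals: {g_individuals:.3f}")
print(f"Gini across regional averages: {g_grouped:.3f}")
```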

We can now deal with another issue: comparisons over time. Our studies can lead us to compare two Gini coefficients at t0 and t1. We might conclude that inequality has fallen within a country over a period of time or, on the contrary, that it has risen. The challenge is to find data, and my purpose is not to wrestle with the Census website. If you are interested in those questions, you might want to compute it yourself: you would just have to run the experiment by plugging values into the Excel file that goes with this handout. Nevertheless, you should be aware of the following problem:


Table 4

            Gini coefficient    Coefficient of variation
Period 1    0.20                0.45
Period 2    0.30                0.030

Don’t you notice something unexpected? Since the Gini coefficient has risen, we might conclude that the society has become more unequal between t0 and t1. But on the other hand, we can also notice that the coefficient of variation has fallen. It is confusing, because it would suggest that inequality has shrunk.

Such an empirical observation is not merely hypothetical; it can actually happen. To deal with it, we need to introduce the notions of progressive and regressive transfers. In a progressive transfer, some income is transferred from one group to another, relatively poorer one. Holding total income fixed, this reduces inequality. A regressive transfer is the opposite situation, where we transfer some money from a group to a relatively richer one. Consequently, holding total income fixed, it increases inequality within the society. In a nutshell, if we have such contradictory results, we are dealing with a mix of both kinds of transfer. Looking in detail at how much is transferred each time a transfer occurs can help determine whether the society has become more equal or not. But there is no general rule. We are reaching the limits of the Lorenz curve.
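A small numerical case shows how a mix of transfers can pull the two measures in opposite directions. The four-person society below is hypothetical and was chosen by hand; the helper functions use the unweighted individual-level versions of the two coefficients, which is an assumption made to keep the sketch short.

```python
import math

def gini(incomes):
    n = len(incomes)
    mu = sum(incomes) / n
    total = sum(abs(a - b) for a in incomes for b in incomes)
    return total / (2 * n ** 2 * mu)

def coefficient_of_variation(incomes):
    n = len(incomes)
    mu = sum(incomes) / n
    variance = sum((y - mu) ** 2 for y in incomes) / n
    return math.sqrt(variance) / mu

# Hypothetical society of four people at t0 (total income 120).
period_0 = [10, 20, 30, 60]
# Regressive transfer of 7 from the poorest to the second person,
# progressive transfer of 6 from the richest to the third person.
period_1 = [10 - 7, 20 + 7, 30 + 6, 60 - 6]

g0, g1 = gini(period_0), gini(period_1)
c0, c1 = coefficient_of_variation(period_0), coefficient_of_variation(period_1)
print(f"Gini: {g0:.4f} -> {g1:.4f}")
print(f"Coefficient of variation: {c0:.4f} -> {c1:.4f}")
```

Total income is unchanged, yet the Gini coefficient rises while the coefficient of variation falls, because the latter reacts more strongly to what happens at the top of the distribution.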

Note: Such transfers are redistribution processes because we are not considering an increase in the economy’s wealth. If we were comparing inequality in the United States between 1960 and 2010 using a Lorenz curve, we would have to do it in proportion and not in volume. And we would have to take into account inflation.

III) Applicability of the Lorenz curve: basketball example

The Lorenz curve can actually measure any kind of inequality. One just has to redefine what population and income are. Let’s focus on the Los Angeles Lakers’ scorers. It is known they rely a lot on Kobe Bryant who, so far, has been scoring 27.1 points per game in the 2009-2010 season. It might be interesting to highlight how important he is for his team. The point is to evaluate how unequal the Lakers’ scoring distribution is. We take the ten leading scorers of the team. Necessarily, population is 1 for every player. We just replace “income” by “points per game”. Let’s check out the Excel output.

Table 5
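The same computation can be sketched in Python for any list of points per game. In the example below only the 27.1 figure comes from the text; every other number is a made-up placeholder rather than a real statistic, and the second team is a purely hypothetical balanced roster added for contrast.

```python
def gini(values):
    """Pairwise-difference Gini coefficient, with a population of 1 per player."""
    n = len(values)
    mu = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n ** 2 * mu)

# Points per game for ten scorers. Only the 27.1 figure comes from the text;
# all other numbers are placeholders, not real statistics.
star_team = [27.1, 18.0, 12.0, 10.0, 8.0, 7.0, 5.0, 4.0, 3.0, 2.0]
balanced_team = [14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 9.0, 8.0, 7.0, 6.0]

g_star = gini(star_team)
g_balanced = gini(balanced_team)
share_of_top_scorer = star_team[0] / sum(star_team)

print(f"Gini, team built around one scorer: {g_star:.3f}")
print(f"Gini, balanced team: {g_balanced:.3f}")
print(f"Share of points from the top scorer: {share_of_top_scorer:.1%}")
```

Replacing the placeholder lists with the real points per game of two teams reproduces the comparison made in this section.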


It is clear that Kobe Bryant has a major impact on the Lakers’ season. But it is even more interesting to compare this result to other teams’ statistics in order to derive an overall conclusion. The 2009 NBA Finals opposed the Orlando Magic and the LA Lakers. The Lakers won the first game 100-75, and Bryant scored 40% of their points. Let’s run the same experiment we just did to estimate whether the Magic rely as much on a single player as the Lakers do.


It turns out that the Magic’s scoring distribution is definitely less unequal than the Lakers’. What could we conclude? That the Lakers are better? Or that Kobe Bryant is better than the whole Orlando team? What we can be sure of is that if they were to play again, Kobe would be very likely to be the best scorer. We are also pretty confident when we argue the Lakers would be seriously harmed if Kobe were injured or could not play for any reason. You can try to run the experiment for the San Antonio Spurs, and you would find that their scoring distribution is even more equal than the Magic’s.

Now have a look at your college basketball team’s data and repeat the exercise once again. Assume one player is very good and you find a very unequal distribution. For some reason, your team is about to play the Lakers tonight. According to the Gini coefficient, would you be ready to bet some money that you will win the game? Or would you bet that your best scorer will end up with more field goals than Kobe? Probably not: once again we are facing a comparison problem. We are not really talking about the same thing when comparing the NCAA to the NBA. Players are different, the level of experience is completely different, and even the rules have little to do with each other! We could not seriously derive a conclusion. In the end, one might even argue that if economics is a science, even an imperfect one, sport is much more unpredictable.

[1] Ray, Debraj, Development Economics, Princeton University Press, Princeton NJ, 1998.
[2] http://www.census.gov/prod/2009pubs/p60-236.pdf