lecture 3-4 summarizing relationships among variables ©


Upload: dion-keedy

Post on 28-Mar-2015


Page 1: Lecture 3-4 Summarizing relationships among variables ©

Lecture 3-4

Summarizing relationships among variables

©

Page 2: Lecture 3-4 Summarizing relationships among variables ©

Biases in OLS coefficients

Suppose you are interested in estimating the effect of education on wage.

If your data contain IQ, then you can estimate:

Model 1: (wage)=β0+β1(education)+β2(experience)+β3(IQ)

If your data do not have IQ, then you can only estimate:

Model 2: (wage)=β0+β1(education)+β2(experience)

(IQ is not included in Model 2.)

Page 3: Lecture 3-4 Summarizing relationships among variables ©

Using the “Returns on Education 2” data set, the estimated models are:

Model 1: Wage= -539.4 + 58.1(education)+17.4(experience)+5.6(IQ)

Model 2: Wage= -272.5+ 76.2(education)+ 17.6(experience)

Thus, the effect of education appears to be much larger in Model 2 (76.2 versus 58.1).

Page 4: Lecture 3-4 Summarizing relationships among variables ©

The effect of education appears to be higher in Model 2 because Model 2 suffers from “omitted variable bias”.

When there is an omitted variable that affects both education and wage directly, the estimated effect of education will be biased.

Page 5: Lecture 3-4 Summarizing relationships among variables ©

Model 2: (wage)=β0+β1(education)+β2(experience)

Omitted variable: IQ

IQ (+) positively affects education

IQ (+) positively affects wage

Since IQ affects education positively and, at the same time, affects wage positively, β1 captures the mixed effects of education and IQ on wage.

In this case, β1 is biased upward (i.e., β1 overstates the true effect of education).
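The upward bias can be illustrated with a small simulation. This is only a sketch: the data are synthetic, every number in the data-generating process is invented for illustration, and the helper `ols` is a hypothetical convenience function, not part of the lecture materials.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (all numbers are invented).
iq = rng.normal(100, 15, n)
education = 2 + 0.1 * iq + rng.normal(0, 2, n)   # IQ raises education (+)
experience = rng.uniform(0, 20, n)
wage = (-500 + 60 * education + 17 * experience + 5 * iq   # IQ raises wage (+)
        + rng.normal(0, 100, n))

def ols(y, *regressors):
    """Return OLS coefficients [b0, b1, ...] for y on an intercept plus regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_full = ols(wage, education, experience, iq)   # Model 1: IQ included
b_short = ols(wage, education, experience)      # Model 2: IQ omitted

print("education coefficient with IQ:   ", round(b_full[1], 1))
print("education coefficient without IQ:", round(b_short[1], 1))
```

With IQ included, the education coefficient should land close to the value of 60 built into the simulation; with IQ omitted it should come out noticeably larger, because education picks up part of the IQ effect.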

Page 6: Lecture 3-4 Summarizing relationships among variables ©

In general, suppose that there is an omitted variable Z, that affects both X and Y.

In the model Y=β0+β1X, β1 is biased upward if:

Z affects X (+) and affects Y (+), or

Z affects X (‒) and affects Y (‒).

β1 is biased downward if:

Z affects X (+) and affects Y (‒), or

Z affects X (‒) and affects Y (+).
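The sign rule can be written as a formula. The following is the standard textbook omitted-variable result, stated as a sketch: the symbols β2 (the effect of Z on Y), δ1 (the slope from regressing Z on X) and the tilde on the short-regression estimate are notation introduced here, not taken from the slides.

```latex
% True model, with the omitted variable Z included:
%   Y = \beta_0 + \beta_1 X + \beta_2 Z + u
% Relationship between Z and X:
%   Z = \delta_0 + \delta_1 X + v
% Estimating Y = \beta_0 + \beta_1 X without Z gives, in expectation:
\[
  E[\tilde{\beta}_1] = \beta_1 + \beta_2 \, \delta_1
\]
% Bias = \beta_2 \delta_1:
%   upward   if \beta_2 and \delta_1 have the same sign,
%   downward if \beta_2 and \delta_1 have opposite signs.
```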

Page 7: Lecture 3-4 Summarizing relationships among variables ©

When a coefficient suffers from the “omitted variable bias” problem, the coefficient does not show the causal effect.

There are many other situations where a coefficient can be biased.

Many econometric techniques focus on eliminating these biases (in order to estimate causal effects).

Page 8: Lecture 3-4 Summarizing relationships among variables ©

1. Panel Data

Introduction

Panel data is a data set that contains repeated observations of the same units over time.

Panel data is often used by researchers to extract the `causal’ effect of one variable on another variable.

The purpose of this lecture, however, is to familiarize you with this form of data.

Page 9: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Example-

Open “Panel Data Exercise”.

This data set contains production data of several construction companies for the period between 1990 and 1997. Production for each company is measured by the total material moved, in tons. Employment is measured by the number of persons employed. Equipment is measured by the sum of the engine power of all the equipment used.

Page 10: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Example-

Notice that for each company, observations are collected for several years: you have repeated observations for the same company over time. This is an example of panel data.

Suppose you would like to know how many employees you have to hire in order to achieve a certain level of production.

The simplest model would be:

(Production)=β0+β1(Employment)+β2(Equipment)

Page 11: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Example-

However, when we use panel data, we consider the “year effects” as well.

“Year effect” refers to the aggregate effect of unobserved factors that affect the production of all the companies equally in a particular year. For example, the government may have relaxed the environmental regulation requirements for the construction industry in a particular year. Such a policy would affect the production of all the construction companies equally.

Page 12: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Example: Year effect-

If such a change in governmental regulation is not observed and we (as data analysts) do not take this unobserved factor into consideration, we may mistakenly attribute the year effects to “employment” or “equipment”. This may give an inflated (or deflated) picture of the effects of employment or equipment on the production level.

Page 13: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Incorporating year effects in the model-

The simplest way to incorporate the “year effects” in the model is to incorporate “year dummy variables” in the model.

Often year dummy variables are called “year dummies”.

The following slides show how to construct year dummy variables.

Page 14: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Constructing “year dummy variables”-

We take the “Panel Data exercise: Data A” as an example. This panel data covers the period between 1990 and 1999. Then for each year, except the first year in the data, you construct the dummy variable in the way described in the box.

Construction of year dummies:

Year91 = 1 if year is 1991, 0 otherwise

Year92 = 1 if year is 1992, 0 otherwise

...

Year99 = 1 if year is 1999, 0 otherwise
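The construction rule can be sketched in a few lines of Python. The `years` list is a made-up stand-in for the year column of the exercise data, and the dictionary layout is just one convenient choice.

```python
# Hypothetical year column; in the exercise it would be the year column of Data A.
years = [1990, 1991, 1992, 1999, 1991]

dummy_years = range(1991, 2000)   # every year except the first one (1990)

# One 0/1 column per year: Year91, Year92, ..., Year99.
year_dummies = {
    f"Year{y % 100}": [1 if obs == y else 0 for obs in years]
    for y in dummy_years
}

print(year_dummies["Year91"])  # [0, 1, 0, 0, 1]
```

An observation from 1990 gets 0 in every dummy column, which is what makes 1990 the base year.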

Page 15: Lecture 3-4 Summarizing relationships among variables ©

Panel Data -Incorporating “year dummy variables” in the model-

After constructing the year dummies, we can incorporate these dummy variables in the model in the following way:

(Production)=β0+β1(Employment)+β2(Equipment)+β3Year91+ β4Year92+ β5Year93+ β6Year94+ β7Year95+ β8Year96+ β9Year97+ β10Year98+ β11Year99
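As a rough sketch of how this model could be estimated outside a spreadsheet, the following uses NumPy least squares on a synthetic panel. The 13 firms, the 1990 to 1999 range, and every coefficient in the data-generating process are assumptions made for illustration; they are not the exercise data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, years = 13, np.arange(1990, 2000)

# Synthetic panel: one row per firm per year (all values invented).
year = np.tile(years, n_firms)
employment = rng.uniform(200, 800, year.size)
equipment = rng.uniform(2000, 6000, year.size)
year_effect = dict(zip(years, rng.normal(0, 300, years.size)))
production = (500 + 5 * employment + 0.8 * equipment
              + np.array([year_effect[y] for y in year])
              + rng.normal(0, 100, year.size))

# Design matrix: intercept, Employment, Equipment, Year91..Year99 (1990 is the base year).
dummies = np.column_stack([(year == y).astype(float) for y in years[1:]])
X = np.column_stack([np.ones(year.size), employment, equipment, dummies])

beta, *_ = np.linalg.lstsq(X, production, rcond=None)
print("β1 (Employment):", round(beta[1], 2))
print("β2 (Equipment): ", round(beta[2], 2))
```

The vector `beta` holds twelve entries in the same order as the model: β0, β1, β2, then β3 through β11 for Year91 through Year99.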

Page 16: Lecture 3-4 Summarizing relationships among variables ©

Year dummies: exercise

Use “Panel Data Exercise” Data A to construct the year dummy variables.

Page 17: Lecture 3-4 Summarizing relationships among variables ©

More exercises

Exercise 1. Using the data you constructed in the previous exercise, estimate the effect of employment and equipment on the production level using the following model. Make sure to incorporate year dummy variables in your model.

(Production)=β0+β1(Employment)+β2(Equipment)+β3Year91+ β4Year92+ β5Year93+ β6Year94+ β7Year95+ β8Year96+ β9Year97+ β10Year98+ β11Year99

Page 18: Lecture 3-4 Summarizing relationships among variables ©

More exercises

Exercise 2: Using the results of Exercise 1, answer the following questions.

Exercise 2-1: If a firm hires 600 workers and uses equipment equal to 4000, what would be the expected production of the firm? Assume that the year effect is equal to the year effect of 1998. (For this type of question, use all the coefficients, even if some of them are not statistically significant.)

Exercise 2-2: Suppose that the firm is using equipment equal to 5000. If the firm would like to achieve 7000 tons of production, how many workers does it have to hire? Assume that the year effect is the same as the year effect of 1998.
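The arithmetic behind both questions can be sketched as follows. The four coefficients are hypothetical placeholders chosen only to make the calculation concrete; they are not estimates from the exercise data, so the printed numbers are not the answers to the exercise.

```python
# Hypothetical coefficients, NOT estimates from the exercise data.
b0, b_emp, b_equip, b_year98 = 1000.0, 4.0, 0.5, 200.0

# Exercise 2-1 pattern: plug the inputs into the fitted equation.
# Year98 = 1 and all other year dummies = 0, so only b_year98 enters.
predicted = b0 + b_emp * 600 + b_equip * 4000 + b_year98 * 1
print(predicted)  # 5600.0

# Exercise 2-2 pattern: solve the same equation for Employment.
target, equipment = 7000, 5000
workers = (target - b0 - b_equip * equipment - b_year98) / b_emp
print(workers)  # 825.0
```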

Page 19: Lecture 3-4 Summarizing relationships among variables ©

Notes about year dummy variables

When you use panel data, construct year dummy variables for every year except the first year. (More precisely, there must be at least one year for which you do not use a year dummy.)

If you include a year dummy for all the years, including the first year, you will have a problem called perfect multicollinearity. If this happens, the OLS regression procedure will not work anymore. (Excel will automatically drop one year dummy.)
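One way to see the problem is to check the rank of the regressor matrix: with an intercept and a dummy for every year, the dummies sum to the intercept column, so one column is redundant. The sketch below uses a made-up panel of 3 firms over 1990 to 1999.

```python
import numpy as np

years = np.arange(1990, 2000)
year = np.tile(years, 3)                 # hypothetical panel: 3 firms x 10 years
intercept = np.ones((year.size, 1))

all_dummies = np.column_stack([(year == y).astype(float) for y in years])
X_bad = np.hstack([intercept, all_dummies])          # a dummy for every year
X_ok = np.hstack([intercept, all_dummies[:, 1:]])    # first year dropped

print(X_bad.shape[1], np.linalg.matrix_rank(X_bad))  # 11 columns, rank 10
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))    # 10 columns, rank 10
```

When the rank is lower than the number of columns, the OLS coefficients are not uniquely determined; dropping one year dummy restores full rank.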

Page 20: Lecture 3-4 Summarizing relationships among variables ©

2. Policy analysis using panel data

Regression analysis is widely used for policy analysis.

Examples of policy analysis include the analysis of:

Effect of governmental subsidies for small and medium enterprises on the growth of these enterprises.

Effect of job training on the wage of workers.

Effect of changing the package of a product on the revenue from the product.

Effect of changing the compensation scheme on the productivity of firms.

Page 21: Lecture 3-4 Summarizing relationships among variables ©

Example: The effect of changing the compensation scheme on the productivity

We continue using the “Panel Data Exercise” data set.

Some of the construction companies in the data set began to introduce a new compensation scheme called “productivity bonus”. The productivity bonus is tied to the amount of production (e.g., the company pays $0.003 for each ton of material moved).

We would like to see if the productivity bonus scheme has increased the productivity of these companies, and if so by how much.

Page 22: Lecture 3-4 Summarizing relationships among variables ©

Example: The effect of changing the compensation scheme on the productivity, contd

The simplest way to evaluate the effect of the productivity bonus is to incorporate a dummy variable for the productivity bonus. We can construct a dummy variable for the productivity bonus in the following way.

(Productivity bonus dummy) = 1 if the productivity bonus exists

(Productivity bonus dummy) = 0 if the productivity bonus does not exist
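A minimal sketch of this construction in Python is shown below. The records, the `adoption_year` field and the helper `bonus_dummy` are hypothetical, and the sketch assumes that a company keeps the bonus scheme once it has adopted it.

```python
# Hypothetical records: (company, year, year the bonus was adopted, or None if never).
records = [
    ("A", 1992, 1993),
    ("A", 1993, 1993),
    ("A", 1994, 1993),
    ("B", 1994, None),
]

def bonus_dummy(year, adoption_year):
    """Return 1 if the productivity bonus exists in that year, 0 if it does not."""
    return int(adoption_year is not None and year >= adoption_year)

dummies = [bonus_dummy(year, adopted) for _, year, adopted in records]
print(dummies)  # [0, 1, 1, 0]
```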

Such a dummy variable is often called the “policy dummy variable” since this dummy variable shows if a particular policy (compensation scheme in this example) exists or not.

Page 23: Lecture 3-4 Summarizing relationships among variables ©

Example: The effect of changing the compensation scheme on the productivity, contd

Open the data “Panel Data Exercise, Data C”. This data set contains the productivity bonus dummy.

Notice that from 1993 some of the companies began to introduce the productivity bonus scheme. By the end of the sample period (year 1999), the productivity bonus has become fairly prevalent. (6 out of 13 firms are using the productivity bonus.)

Page 24: Lecture 3-4 Summarizing relationships among variables ©

Example: The effect of changing the compensation scheme on the productivity

Estimate the effect of the productivity bonus on the production by estimating the following model:

(Production)=β0+β1(Employment)+β2(Equipment) +β3(Productivity Bonus Dummy) +β4(Year91) +β5(Year92) +β6(Year93) +β7(Year94) +β8(Year95) +β9(Year96) +β10(Year97) +β11(Year98) +β12(Year99)

Page 25: Lecture 3-4 Summarizing relationships among variables ©

Summary for policy analysis using panel data

Construct a policy dummy variable (productivity bonus dummy for our example)

Construct year dummies for all years except the first year.

Estimate a model including the policy dummy variable and year dummies. The coefficient for the policy dummy variable can be interpreted as the effect of the policy.
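The three steps can be sketched end to end on synthetic data. Everything in the data-generating process is an assumption made for illustration: 13 firms, six of which adopt the bonus in 1993 and keep it, a true policy effect of 400, and a linear time trend standing in for the year effects.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1990, 2000)
n_firms = 13
firm = np.repeat(np.arange(n_firms), years.size)
year = np.tile(years, n_firms)

# Step 1: policy dummy (assumption: firms 0-5 adopt the bonus in 1993 and keep it).
bonus = ((firm < 6) & (year >= 1993)).astype(float)

# Synthetic inputs and output; the true policy effect is set to 400.
employment = rng.uniform(200, 800, year.size)
equipment = rng.uniform(2000, 6000, year.size)
production = (500 + 5 * employment + 0.8 * equipment + 400 * bonus
              + 50 * (year - 1990) + rng.normal(0, 50, year.size))

# Step 2: year dummies for all years except the first.
year_dummies = np.column_stack([(year == y).astype(float) for y in years[1:]])

# Step 3: estimate the model; β3 is the coefficient on the policy dummy.
X = np.column_stack([np.ones(year.size), employment, equipment, bonus, year_dummies])
beta, *_ = np.linalg.lstsq(X, production, rcond=None)
print("β3 (policy dummy):", round(beta[3], 1))
```

In this sketch the estimate of β3 should come out near the built-in value of 400. With real data, reading the coefficient as the effect of the policy still depends on there being no omitted variable that drives both adoption and production.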