Lecture 3-3: Summarizing relationships among variables


Upload: damon-davis

Post on 28-Dec-2015


Topics covered in this lecture note

We will cover several topics about ordinary least squares (OLS) estimation.

1. Testing the statistical significance of the estimated coefficient using t-statistics (i.e., testing whether advertisement spending has any effect on revenue).

2. Ordinary least squares estimation when there is more than one explanatory variable.

3. An introduction to panel data (repeated observations over time)

1. Testing the statistical significance of the estimated coefficient: Example

[Figure: Advertisement and revenue, Product II. Scatter plot with fitted line y = 13.451x + 15440. x-axis: Advertisement spending in 1000 yen (0 to 120); y-axis: Revenue in 1000 yen (0 to 35000).]

•The graph above shows the relationship between advertisement spending and revenue, along with the estimated linear equation.

•The estimated slope coefficient is 13.45. This means that for every additional 1000 yen spent on advertisement, revenue is estimated to increase by about 13.45 thousand yen.
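The interpretation of the slope can be checked numerically. The following is a minimal sketch, not part of the original lecture: the function name `predicted_revenue` is an assumption made for illustration, and the coefficients are the ones printed on the fitted line in the graph.

```python
# Fitted line from the graph: y = 13.451x + 15440 (both variables in 1000 yen).
SLOPE = 13.451
INTERCEPT = 15440.0


def predicted_revenue(ad_spending):
    """Predicted revenue (1000 yen) for a given advertisement spending (1000 yen)."""
    return INTERCEPT + SLOPE * ad_spending


if __name__ == "__main__":
    base = predicted_revenue(50)
    plus_one = predicted_revenue(51)
    # One extra unit of spending (1000 yen) changes the prediction by the slope.
    print(f"Predicted revenue at x=50: {base:.2f}")
    print(f"Change when x goes from 50 to 51: {plus_one - base:.3f}")
```

Whatever starting value of x is chosen, the change in the prediction for one extra unit of spending equals the slope; this is what "the effect of advertisement" means in a linear model.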

Testing the statistical significance of the estimated coefficient: Example, contd

[Figure repeated from the previous slide: Advertisement and revenue, Product II, with fitted line y = 13.451x + 15440.]

However, the graph also seems to indicate that there is not much of a relationship between advertisement spending and revenue.

When we estimate a linear equation, we typically would like to know whether advertisement has any effect on revenue. To answer such a question, just estimating β0 and β1 is not enough. We need more information.

Testing the statistical significance of the estimated coefficient: Example, contd

The following slides describe the procedure to answer the following question: “Would the advertisement have any impact on the revenue?”

Testing the statistical significance of the estimated coefficient: Example, contd

To test if advertisement spending has any impact on the revenue, we need to test whether the slope coefficient is “significantly” different from zero.

1. If the slope coefficient is significantly different from zero, we may conclude that advertisement spending has some effect on the revenue.

2. If the slope coefficient is not significantly different from zero, we conclude that we have no evidence that advertisement spending affects the revenue.

Then, what would be the criterion to decide whether the slope coefficient is “significantly” different from zero?

Testing the statistical significance of the estimated coefficient: Example, contd

To decide whether the slope coefficient is significantly different from zero, we use the “t-statistic”.

OLS estimation produces much more than the estimates of β0 and β1; it also reports the t-statistic. We will now obtain some of this extra information from OLS estimation using Excel.

Testing the statistical significance of the estimated coefficient: Example, contd

Open the data set “OLS Exercise 2-Advertisement and Revenue”.

This is the data set used to produce the graph in the previous slides.

Now, use “Data Analysis” in Excel to estimate the following model:

(Revenue)= β0+β1(Advertisement Spending)
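For readers who want to see what the estimation does behind the scenes, the following is a minimal sketch of the one-variable OLS formulas in plain Python. It is not the lecture's Excel procedure; the function name `simple_ols` and the small data set are made up for illustration and are not the lecture's data.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Made-up numbers for illustration only (NOT the lecture's data set).
    ad_spending = [10, 20, 30, 40, 50]
    revenue = [120, 135, 160, 170, 195]
    b0, b1 = simple_ols(ad_spending, revenue)
    print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Excel's regression tool reports the same two numbers in its "Coefficients" column, together with the extra information discussed on the following slides.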

Testing the statistical significance of the estimated coefficient: Example, contd

                        Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept               15440.18      2796.811        5.520639  5.87E-05  9478.923   21401.45   9478.923     21401.45
Advertisement Spending  13.45107      60.32826        0.222965  0.826571  -115.136   142.0377   -115.136     142.0377

• The table above is the result of the OLS regression.

1. Intercept coefficient (β0) = 15440.18

2. Slope coefficient (β1) = 13.45

3. We also have some extra information, such as the standard error and the t-statistic (“t Stat” in the table). These are the pieces of information needed to test whether the slope coefficient is significantly different from zero.

Testing the statistical significance of the estimated coefficient: Example

-Standard Error-

Since the data contain a lot of noise (unexpected rises and falls in revenue, etc.), the effect of advertisement on revenue (β1) is estimated with some error.

The standard error of a coefficient shows the typical size of this estimation error.

Testing the statistical significance of the estimated coefficient: Example

-Standard Error, contd-

For example, the standard error for the slope coefficient is 60.3. Roughly speaking, this means that the estimate of the slope coefficient (β1) is typically off by about ±60.3.

Thus, the smaller the standard error for β1, the more precise the estimate of the impact of advertisement.

Testing the statistical significance of the estimated coefficient: Example

-t statistic-

•The t-statistic is obtained by dividing the coefficient by its standard error. For example, the t-statistic for the slope coefficient is

13.45107/60.32826=0.222965

•Our confidence that advertisement spending has some impact on revenue increases as the absolute value of the t-statistic increases (this happens when the standard error decreases or the coefficient moves further away from zero).

•We use the t-statistic to test whether the slope coefficient is significantly different from zero.
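The division described above can be reproduced directly from the numbers in the Excel output. This is only an illustrative sketch; the function name `t_statistic` is an assumption, and the inputs are the coefficient and standard error values reported in the regression table.

```python
def t_statistic(coefficient, standard_error):
    """t-statistic = estimated coefficient divided by its standard error."""
    return coefficient / standard_error


if __name__ == "__main__":
    # Values copied from the Excel regression output shown in the lecture.
    slope_t = t_statistic(13.45107, 60.32826)
    intercept_t = t_statistic(15440.18, 2796.811)
    print(f"t-statistic for the slope:     {slope_t:.6f}")
    print(f"t-statistic for the intercept: {intercept_t:.6f}")
```

Both results agree with the “t Stat” column of the table up to rounding.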

The procedure to test the statistical significance of the estimated coefficient

The following is the procedure to test if a coefficient is significantly different from zero.

1. Obtain the t-statistic.

2. Check if the absolute value of the t-statistic is greater than or equal to 2 (that is, t-stat≤‒2 or t-stat≥+2).

3. If the absolute value of the t-statistic is greater than (or equal to) 2, the coefficient is statistically significantly different from zero.

4. If the absolute value of the t-statistic is smaller than 2, then the coefficient is not statistically significantly different from zero.
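The four steps above amount to a one-line rule. The sketch below is an illustration, not part of the original slides: the function name `is_significant` and its `threshold` parameter are assumptions, and the t-statistics are the ones reported in this lecture's Excel outputs.

```python
def is_significant(t_stat, threshold=2.0):
    """Rule of thumb from the lecture: significant if |t| >= threshold."""
    return abs(t_stat) >= threshold


if __name__ == "__main__":
    # t-statistics reported in this lecture's Excel outputs.
    examples = {
        "Advertisement spending (Product II)": 0.222965,
        "Number of promotions (Product A)": 5.073779,
        "Number of promotions (Product C)": -1.30993,
    }
    for name, t in examples.items():
        verdict = "significant" if is_significant(t) else "not significant"
        print(f"{name}: t = {t:.2f} -> {verdict}")
```

Taking the absolute value matters: a large negative t-statistic is just as significant as a large positive one.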

A note on the test of statistical significance of the estimated coefficient 1

When the coefficient is statistically significantly different from zero, we simply say “the coefficient is statistically significant”.

1. If the coefficient is statistically significant, we conclude that the advertisement spending has some impact on the revenue.

2. If the coefficient is not statistically significant, we conclude that we found no evidence that the advertisement spending has an impact on the revenue.

A note on the test of statistical significance of the estimated coefficient 2 (Optional)

The criterion value for the t-statistic that we used for testing statistical significance was 2. More precisely, this criterion value depends on the number of observations and the number of parameters to be estimated. This topic will be discussed in more detail later in the class. Roughly speaking, when you use the criterion value of 2, you are testing the statistical significance of the slope coefficient at the 5% significance level.
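One way to see how the exact criterion value varies is to back it out of Excel's own output. The 95% bounds in the regression table are computed as coefficient ± (criterion value) × (standard error), so the criterion value Excel used can be recovered from a coefficient, its standard error, and its upper bound. The sketch below is illustrative; the function name `implied_critical_value` is an assumption, and the inputs are rows from two regression outputs in this lecture (the advertisement regression and the returns-on-education regression with 935 persons).

```python
def implied_critical_value(coefficient, standard_error, upper_95):
    """Critical value c such that upper_95 = coefficient + c * standard_error."""
    return (upper_95 - coefficient) / standard_error


if __name__ == "__main__":
    # Slope rows from two regression outputs shown in this lecture.
    small_sample = implied_critical_value(13.45107, 60.32826, 142.0377)
    large_sample = implied_critical_value(76.21639, 6.296603998, 88.57355)
    print(f"Advertisement regression:           {small_sample:.3f}")
    print(f"Education regression (935 persons): {large_sample:.3f}")
```

The two values differ slightly (roughly 2.13 versus 1.96), which illustrates why 2 is only a rule of thumb: the exact value is a little larger in small samples and approaches 1.96 as the number of observations grows.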

Exercise

Exercise 1: Open the data “Statistical Significance Exercise”. Use the Product A data to estimate the effect of promotion on the revenue by estimating the following model. Pay particular attention to the statistical significance of the slope coefficient.

(Revenue)=β0+β1(Number of promotion)

Exercise 2: Use the data “Statistical Significance Exercise”. Use the Product C data to estimate the same model.

Exercise 1 Answer

The estimated effect of promotion on the revenue is 99060.15, with a t-statistic equal to 5.07. Since the t-statistic is greater than 2, we conclude that the effect of promotion on the revenue is statistically significant. Given the statistical significance of the coefficient, the estimated slope coefficient of 99060 indicates that, if we increase the number of promotions by one, the revenue is likely to increase by 99060 yen.

Product A             Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept             105827.1      254311.3        0.416132  0.689775  -495524    707177.8   -495524      707177.8
Number of promotions  99060.15      19523.94        5.073779  0.001441  52893.37   145226.9   52893.37     145226.9

Exercise 2 Answer

The estimated effect of promotion on the revenue is -11751.1, with a t-statistic equal to -1.3. Since the absolute value of the t-statistic is smaller than 2, we conclude that the slope coefficient is not statistically significant. In other words, we did not find evidence that promotion has any impact on the revenue from product C.

Product C             Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept             341540.1      111203.4        3.07131   0.018034  78585.82   604494.4   78585.82     604494.4
Number of promotions  -11751.1      8970.74         -1.30993  0.231567  -32963.5   9461.373   -32963.5     9461.373

2. OLS with multiple explanatory variables

Introduction

So far, we have considered a model with only one explanatory variable.

Y=β0+β1X

Often, we have more than one explanatory variable. For example, in addition to promotion, the company may increase the number of sales persons. If we have data about the number of sales persons, we can also incorporate such a variable.

OLS with multiple regressors

-Example: Returns on Education-

Suppose you are considering pursuing more education (going to graduate school, etc.). Then you may want to know if this is worth your effort.

OLS with multiple regressors

-Example: Returns on Education-

To investigate by how much extra education increases your future salary, we can use OLS regression.

Open the data “Returns on education”. The data were collected for 935 persons and contain three variables: weekly wage in dollars, number of years of education, and number of years of work experience.

As an exercise, find the mean, variance, and standard deviation of the three variables.

OLS with multiple regressors

-Example: Returns on Education-

To investigate the effect of education on wage, we may estimate the OLS regression: (wage)=β0+β1(education).

However, wage is affected not only by education, but also by the number of years of work experience. Therefore, it seems better to incorporate “work experience” in the model.

The simplest way to incorporate experience in the model is the following:

(wage)=β0+β1(education)+β2(experience)

Notice that this OLS equation has two explanatory variables on the right-hand side.

OLS with multiple regressors

-Example: Returns on Education-

Excel estimates the coefficients β0, β1 and β2 automatically.

(wage)=β0+β1(education)+β2(experience)

The estimated β1 is the effect of education on wage, holding experience constant. This is the big advantage of OLS with multiple explanatory variables. In the data, education and experience vary at the same time, so it is difficult to see the effect of education separately from the effect of experience just by looking at the data. By including both variables in the model, we can separate the effect of education from the effect of experience.

Exercise: Estimate the model above using Excel.
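To illustrate how OLS separates the two effects, the following sketch estimates a two-regressor model in plain Python by solving the normal equations. It is not the lecture's Excel procedure: the function names `solve_linear_system` and `ols` are assumptions, and the data are made up (generated without noise from known coefficients), not the “Returns on education” data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations update both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(regressors, y):
    """OLS coefficients [b0, b1, ...]; `regressors` is a list of columns."""
    n = len(y)
    columns = [[1.0] * n] + [[float(v) for v in col] for col in regressors]
    k = len(columns)
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(columns[i][t] * columns[j][t] for t in range(n))
            for j in range(k)] for i in range(k)]
    xty = [sum(columns[i][t] * y[t] for t in range(n)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Made-up data (NOT the lecture's data set), built without noise from
    # wage = 100 + 50*education + 10*experience so the answer is known.
    education = [12, 12, 16, 16, 18, 14]
    experience = [5, 10, 2, 8, 1, 6]
    wage = [100 + 50 * e + 10 * x for e, x in zip(education, experience)]
    b0, b1, b2 = ols([education, experience], wage)
    print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Because the made-up wages were constructed from known coefficients, the estimates recover them: even though education and experience change together across persons, the regression attributes to each variable its own effect.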

OLS with multiple regressors

-Example: Returns on Education-

•The estimates are β0=-272.5, β1=76.2 and β2=17.6.

•Also notice that the t-statistic for β1 is 12.1, which is bigger than 2. Therefore, the estimated β1 is statistically significant; education does have an impact on wage.

•Given the statistical significance of β1, we can say that, holding experience constant, one more year of education would increase the weekly wage by $76.2.

•This also means that if you go to graduate school for 2 years, your annual salary would be expected to increase by $76.2*(52 weeks)*(2 years)=$7924.8.

                            Coefficients  Standard Error  t Stat    P-value      Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                   -272.528      107.2627094     -2.54075  0.01122266   -483.032   -62.0234   -483.032     -62.0234
education (in years)        76.21639      6.296603998     12.10436  1.98778E-31  63.85922   88.57355   63.85922     88.57355
work experience (in years)  17.63777      3.1617754       5.578439  3.18016E-08  11.43275   23.84279   11.43275     23.84279

Exercise 2

Open the data “Returns on education 2”.

This is the same data set as “Returns on education 1”, except that it has more variables: it also contains the age and the IQ test score of each person.

Exercise: Add IQ to the model. Does this change the results?

OLS with multiple variables: Application

-Making a model more flexible-

When you specify a model for OLS estimation, the first criterion is simplicity.

(Revenue)=β0+β1(Promotion)

Such a simple equation gives a clear idea of the effect of promotion on revenue.

However, simplicity comes at a cost: a simple model is often not flexible.

OLS with multiple variables: Application

-Making a model more flexible-

The model implicitly assumes that the effect of increasing the number of promotions by one is always the same, no matter how many promotions are already being run. That is, the model assumes that the effect of increasing the number of promotions from 10 to 11 is the same as the effect of increasing it from 40 to 41.

However, it is reasonable to think that the effect of promotion would diminish due to the law of diminishing marginal returns.

See the next example.

-Making a model more flexible: An example-

Open the data set “Making a model more flexible”. These data show the relationship between the number of promotions and revenue for product D.

Plot the relationship between the number of promotions and revenue, then describe the relationship.

-Making a model more flexible: An example-

[Figure: Product D: Number of promotions and revenue. Scatter plot. x-axis: Number of promotions (0 to 45); y-axis: Revenue in yen (0 to 3,500,000).]

•The relationship seems to be a curve, not a straight line.

•The effectiveness of promotion seems to diminish as the number of promotions increases.

•How do we incorporate the “diminishing effectiveness” of promotion in the model?

-Making a model more flexible: An example-

To incorporate the “diminishing effectiveness” in the model, we need to specify a model that can “curve”.

A simple way to achieve this is to estimate the following model:

(Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)^2
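To see why the squared term allows the effect to diminish, it helps to write out the change in predicted revenue from one extra promotion. Here P is shorthand for the number of promotions and Δ(P) for the change; both symbols are introduced only for this illustration.

```latex
\begin{align*}
\Delta(P) &= \bigl[\beta_0 + \beta_1 (P+1) + \beta_2 (P+1)^2\bigr]
           - \bigl[\beta_0 + \beta_1 P + \beta_2 P^2\bigr] \\
          &= \beta_1 + \beta_2 (2P + 1).
\end{align*}
```

If β2 is negative, the increase in revenue from one more promotion shrinks as P grows. If β2 = 0, the increase is always β1, which is the straight-line model.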

-Making a model more flexible: Exercise-

Use the data “Making a model more flexible” and estimate the following model:

(Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)^2

Exercise: Answer

•The estimated equation is

(Revenue)=-295299.7+181554.72(Number of promotion)-2629.84(Number of promotion)^2

•Note that both β1 and β2 are statistically significant.

                         Coefficients  Standard Error  t Stat       P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                -295299.7     166846.4598     -1.76988887  0.093683  -645831    55231.71   -645831      55231.71
Number of promotions     181554.72     16497.65368     11.00488133  2.01E-09  146894.4   216215     146894.4     216215
(Number of promotion)^2  -2629.838     359.2650349     -7.3200508   8.48E-07  -3384.63   -1875.05   -3384.63     -1875.05

More exercises

Exercise 1: Using the estimated equation compute “predicted” revenue for each observation.

Exercise 2: Now plot the predicted revenue and the number of promotions. Also plot the actual revenue and promotions, on the same graph. See how well the model predicts the outcome.

More exercises

Exercise 3: Using the estimated results, compute the expected increase in revenue when you increase the number of promotions from 10 to 11, and from 25 to 26.
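One possible way to set up these computations outside Excel is sketched below. The function names are assumptions made for illustration; the coefficients are taken from the coefficient table of the Product D regression output (β2 = -2629.838 as reported in the table).

```python
# Coefficients from the Excel output for Product D shown in this lecture.
B0 = -295299.7
B1 = 181554.72
B2 = -2629.838


def predicted_revenue(promotions):
    """Predicted revenue (yen) from the estimated quadratic equation."""
    return B0 + B1 * promotions + B2 * promotions ** 2


def revenue_increase(start, end):
    """Change in predicted revenue when promotions go from `start` to `end`."""
    return predicted_revenue(end) - predicted_revenue(start)


if __name__ == "__main__":
    for start in (10, 25):
        change = revenue_increase(start, start + 1)
        print(f"{start} -> {start + 1}: {change:,.0f} yen")
```

Under these estimates, the expected increase from the 11th promotion is much larger than the increase from the 26th, which is the diminishing effectiveness the squared term was added to capture. The same `predicted_revenue` function can be applied to every observation for Exercise 1.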

OLS with multiple variables: Application 2

-Dummy Variables-

Often, our data contain qualitative variables. For example, if you have data about your clients, for each client you may have data about whether the person is male or female. Such data (about gender) is not a quantitative variable but a qualitative variable.

OLS with multiple variables: Application 2

-Dummy Variables-

However, such a qualitative variable is also important in analyzing data. For example, you would like to answer the following question: “which gender consumes more?”


To incorporate such a qualitative variable into the OLS equation, we first convert qualitative information into a quantitative variable called a “dummy variable”.

A dummy variable is a variable that takes 1 if a particular criterion is satisfied, and takes 0 otherwise.

If you would like to incorporate gender information in your model, create the following dummy variable:

Female =1 if the client is female
       =0 if the client is male

Then you can estimate

(Consumer spending)=β0+β1(Number of promotion)+β2(Female)
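Creating the dummy variable is a simple recoding step. The sketch below shows one way to do it in Python; the function name `make_dummy` and the client records are made up for illustration and are not from the lecture's data.

```python
def make_dummy(values, category):
    """Return 1 where the value equals `category`, and 0 otherwise."""
    return [1 if v == category else 0 for v in values]


if __name__ == "__main__":
    # Made-up client records for illustration only.
    gender = ["female", "male", "female", "male", "male"]
    female = make_dummy(gender, "female")
    print(female)
```

The resulting column of zeros and ones can then be used as an explanatory variable in the regression like any other numeric column.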

OLS with multiple variables: Application 2

-Dummy Variables-

Dummy variables are very versatile. Suppose you would like to know whether there are wage differentials between races (for example, between white and black workers). You can then use a dummy variable that takes 1 if the person is black, and 0 otherwise.

Dummy variables can be created for many other purposes. The use of dummy variables is one of the most important techniques in regression analysis.

Dummy variable exercise

Open the data “Dummy variable Exercise”. This data set contains four dummy variables.

Black =1 if the person is black
      =0 otherwise

Married =1 if the person is married
        =0 otherwise

South =1 if the person lives in the South of the USA
      =0 otherwise

Urban =1 if the person lives in an urban area
      =0 otherwise

Dummy variable exercise

Exercise 1: Estimate the following model:

(Wage)=β0+β1(Education)+β2(Experience)+β3(Age)+β4(IQ)+β5(Black)

Then interpret the results.

Dummy variable exercise, Answer

The coefficient for the Black dummy variable is -124.65. The t-statistic is -3.19; its absolute value is greater than 2, so the coefficient is statistically significant. The results indicate that, holding education, experience, age, and IQ constant, the weekly wage is lower for a black person by about $124.65. There appears to be a large wage gap between black persons and others in this sample.

            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   -726.121      165.4365        -4.38912  1.27E-05  -1050.79   -401.448   -1050.79     -401.448
education   52.70889      7.266112        7.25407   8.52E-13  38.44899   66.96879   38.44899     66.96879
experience  11.27217      3.699921        3.046597  0.00238   4.010995   18.53334   4.010995     18.53334
age         13.38011      4.646612        2.879541  0.004074  4.261035   22.49918   4.261035     22.49918
IQ          4.119113      0.997874        4.127889  3.99E-05  2.160765   6.077462   2.160765     6.077462
black       -124.653      39.04528        -3.19253  0.001458  -201.28    -48.0259   -201.28      -48.0259

Dummy variable: More exercises

Use the data “Dummy Variable Exercise”. Specify your own model, estimate it, and interpret the results.