Multiple Regression & OLS violations Week 4 Lecture MG461 Dr. Meredith Rolfe

Upload: jeremy-richard

Post on 13-Jan-2016

Page 1

Multiple Regression & OLS violations

Week 4 Lecture

MG461

Dr. Meredith Rolfe

Page 2

Which group are you in?

1. Group 1
2. Group 2
3. Group 3
4. Group 4
5. Group 5
6. Group 6
7. Group 7
8. Group 8

Responses (options 1 to 8): 10%, 11%, 21%, 8%, 13%, 8%, 16%, 11%

Page 3

Key Goals of the Week

• What is multiple regression?
• How to interpret regression results:
  • estimated regression coefficients
  • significance tests for coefficients
• Violations of OLS assumptions
  • Diagnostics
  • What to do

Page 4

MULTIPLE REGRESSION

Page 5

When to use Regression

• We want to know whether the outcome, y, varies depending on x
• Continuous variables (but many exceptions)
• Observational data (mostly)
• The relationship between x and y is linear

Page 6

Simple Linear Model
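
The equation on this slide did not survive extraction. As an assumption about what it showed, the standard simple linear model in the β0, β1 notation used on later slides can be written as:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
```

Here y_i is the outcome, x_i the single explanatory variable, and ε_i the error for observation i.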


Page 7

Regression is a set of statistical tools to model the conditional expectation…

1. of one variable on another variable.
2. of one variable on one or more other variables.

Responses (options 1 and 2): 76%, 24%

Page 8

Multiple Regression

[Diagram: Compensation, with Performance, Size of Company, Years worked]
[Diagram: Ratings of Supervisor, with Opportunity to learn, Critical of poor performance, Handles complaints]

Page 9

Which best accounts for variation in supervisor ratings?

1. Does not allow special privileges.
2. Opportunity to learn.
3. Too critical of poor performance.
4. Handles employee complaints.

Responses (options 1 to 4): 5%, 21%, 47%, 28%

Page 10

Simple linear model: Rating vs. No Special Privileges

                        Estimate (s.e.)
(Constant)              42.11*** (9.27)
No special privileges   0.42* (0.17)
n                       30
R2                      0.15

Note on significance of coefficients: ***p < 0.001  **p < 0.01  *p < 0.05  . p < 0.1

Source: Chatterjee et al, Regression Analysis by Example

Page 11

SPSS output -> Regression Table

                        Estimate (s.e.)
(Constant)              42.11*** (9.27)
No special privileges   0.42* (0.17)
n                       30
R2                      0.15

Labels on the SPSS output: x variable, βhat0, βhat1, se(βhat0), se(βhat1), t(βhat0-0), t(βhat1-0), ignore
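
The t labels on the SPSS output correspond to dividing each estimate by its standard error (testing against a null value of 0). A minimal sketch using the two rows of the table; the function and variable names are illustrative, not SPSS output fields:

```python
# Estimates and standard errors copied from the regression table on this slide.
estimates = {
    "(Constant)": (42.11, 9.27),
    "No special privileges": (0.42, 0.17),
}


def t_statistic(estimate, std_error, null_value=0.0):
    """t = (estimate - null value) / standard error."""
    return (estimate - null_value) / std_error


t_values = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The larger the absolute t value, the smaller the p-value, which is what the stars in the regression table summarize.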

Page 12

42% of employees value supervisors who don’t grant special privileges?

1. Yes
2. No

Responses (options 1 and 2): 32%, 68%

                        Estimate (s.e.)
(Constant)              42.11*** (9.27)
No special privileges   0.42* (0.17)
n                       30
R2                      0.15

Page 13

Simple linear model #2: Rating vs. Opportunity to Learn

                       Estimate (s.e.)
(Constant)             28.17*** (8.81)
Opportunity to learn   0.65* (0.15)
n                      30
R2                     0.37

Note on significance of coefficients: ***p < 0.001  **p < 0.01  *p < 0.05  . p < 0.1

Page 14

                                  Model 1   Model 2   Model 3   Model 4   Model 5   Model 6
(Constant)                        42.11***  28.17***  14.38*    19.98     50.24**   56.76***
                                  (9.27)    (8.81)    (6.62)    (11.69)   (17.31)   (9.74)
No special privileges             0.42*
                                  (0.17)
Opportunity to learn                        0.65*
                                            (0.15)
Handles complaints                                    0.75***
                                                      (0.15)
Raises based on performance                                     0.69***
                                                                (0.18)
Too critical of poor performance                                          0.19
                                                                          (0.23)
Rate of advancing to better jobs                                                    0.18
                                                                                    (0.22)
n                                 30        30        30        30        30        30
R2                                0.15      0.37      0.68      0.35      0.02      0.02

Page 15

Are these good estimates of the relationship between x and y?

1. Yes
2. No

Responses (options 1 and 2): 44%, 56%

Page 16

Multiple potential explanations…

• Experimental Controls:
  • Random assignment
  • Experimental Design
• Observational data analysis:
  • Statistical Controls

[Diagram: Ratings of Supervisor, with No special privileges, Opportunity to learn, Critical of poor performance, Handles complaints]

Page 17

Multiple Regression Model

[Equation labels: Dependent Variable, Independent Variables, Intercept, Coefficients, Error]

Observation or data point, i, goes from 1…n
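
Only the labels of the slide's equation survived extraction. A reconstruction under the assumption that the slide used the standard form, with the symbols β0, x1,i, βp and σ2 that appear on the following slide:

```latex
y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_p x_{p,i} + \varepsilon_i,
\qquad i = 1, \dots, n, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2
```

y_i is the dependent variable, the x_{k,i} are the independent variables, β0 is the intercept, β1…βp are the coefficients, and ε_i is the error.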

Page 18

WHICH MODEL PARAMETER DO WE NOT NEED TO ESTIMATE?

1. β0
2. x1,i
3. βp
4. σ2

Responses (options 1 to 4): 5%, 20%, 5%, 70%

Page 19

Multiple Regression: OLS Estimates (matrix)

Y = Xβ + ε
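
For the matrix form Y = Xβ + ε, the OLS estimate solves the normal equations (X'X)βhat = X'Y. The sketch below uses simulated data (not the supervisor ratings data) and NumPy; all variable names and the simulated coefficients are assumptions for illustration.

```python
import numpy as np

# Simulated data: 30 observations, a constant plus two x variables.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([10.0, 0.6, -0.2])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS estimate: solve the normal equations (X'X) beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals and the estimate of the error variance sigma^2.
residuals = y - X @ beta_hat
df_resid = n - X.shape[1]
sigma2_hat = residuals @ residuals / df_resid

# Standard errors: square roots of the diagonal of sigma^2 * (X'X)^-1.
std_errors = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

print("beta_hat:", beta_hat)
print("s.e.:    ", std_errors)
```

In practice a least-squares routine such as `np.linalg.lstsq` is numerically safer than forming X'X directly; the normal equations are used here only because they mirror the slide's notation.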

Page 20

                                  Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   ALL
(Constant)                        42.11***  28.17***  14.38*    19.98     50.24**   56.76***  10.79
                                  (9.27)    (8.81)    (6.62)    (11.69)   (17.31)   (9.74)    (11.59)
No special privileges             0.42*                                                       -0.07
                                  (0.17)                                                      (0.14)
Opportunity to learn                        0.65*                                             0.32
                                            (0.15)                                            (0.16)
Handles complaints                                    0.75***                                 0.61***
                                                      (0.15)                                  (0.16)
Raises based on performance                                     0.69***                       0.082
                                                                (0.18)                        (0.22)
Too critical of poor performance                                          0.19                0.038
                                                                          (0.23)              (0.14)
Rate of advancing to better jobs                                                    0.18      -0.21
                                                                                    (0.22)    (0.17)
n                                 30        30        30        30        30        30        30
R2                                0.15      0.37      0.68      0.35      0.02      0.02      0.73

Page 21

Significance of Results

Model Significance
• H0: None of the 1 (or more) independent variables covary with the dependent variable
• HA: At least one of the independent variables covaries with d.v.
• Application: compare two fitted models
• Test: Anova/F-Test
• **assumes errors (ei) are normally distributed

Coefficient Significance
• H0: β1=0, there is no relationship (covariation) between x and y
• HA: β1≠0, there is a relationship (covariation) between x and y
• Application: a single estimated coefficient
• Test: t-test
• **assumes errors (ei) are normally distributed

Page 22

Comparing Models: Anova

                                  Complaints only   Complaints & Learn   ALL
(Constant)                        14.38*            9.87                 10.79
                                  (6.62)            (7.06)               (11.59)
No special privileges                                                    -0.07
                                                                         (0.14)
Opportunity to learn                                0.21                 0.32
                                                    (0.13)               (0.16)
Handles complaints                0.75***           0.64***              0.61***
                                  (0.15)            (0.12)               (0.16)
Raises based on performance                                              0.082
                                                                         (0.22)
Too critical of poor performance                                         0.038
                                                                         (0.14)
Rate of advancing to better jobs                                         -0.21
                                                                         (0.17)
n                                 30                30                   30
R2                                0.68              0.71                 0.73

Anova Model Comparison

All Variables (Full) vs. Complaints & Learn: F=0.53 p=0.72

Complaints & Learn vs. Complaints: F=2.47 p=0.13
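
For nested models, the Anova F statistic can be computed from the two R2 values, the sample size, and the number of slope coefficients in each model. The sketch below applies that textbook formula to the rounded R2 values in the table; because those values are rounded to two decimals, the results only approximate the F values reported on the slide (0.53 and 2.47) and will not reproduce them exactly. The function name is illustrative.

```python
def nested_f(r2_full, r2_restricted, n, k_full, k_restricted):
    """F statistic for H0: the extra coefficients in the full model are all 0.

    k_full and k_restricted count slope coefficients (intercept excluded).
    Returns (F, numerator df, denominator df).
    """
    q = k_full - k_restricted          # number of coefficients being tested
    df_resid = n - k_full - 1          # residual df of the full model
    f_stat = ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / df_resid)
    return f_stat, q, df_resid


# All six variables vs. Complaints & Learn (rounded R2 from the table).
f_all, q_all, df_all = nested_f(0.73, 0.71, n=30, k_full=6, k_restricted=2)

# Complaints & Learn vs. Complaints only.
f_learn, q_learn, df_learn = nested_f(0.71, 0.68, n=30, k_full=2, k_restricted=1)

print(f"Full vs. Complaints & Learn: F({q_all}, {df_all}) = {f_all:.2f}")
print(f"Complaints & Learn vs. Complaints: F({q_learn}, {df_learn}) = {f_learn:.2f}")
```

A small F (and large p-value) means the extra variables do not improve the fit by more than chance would suggest, which is the conclusion both comparisons on the slide point to.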

Page 23

SPEED PRACTICE: INTERPRETING REGRESSION RESULTS

1) p-values & significance
2) Significant coefficients in tables
3) Substantive interpretation of coefficients

Page 24

Does “Critical” have an effect on supervisor ratings?

                                   Coefficient   s.e.    t       p-value (sig)
(Constant)                         10.79         11.59   0.93    0.36
No special privileges              -0.07         0.14    -0.54   0.60
Opportunity to learn               0.32          0.16    1.90    0.07
Handles complaints                 0.61          0.16    3.81    0.009
Raises based on performance        0.082         0.22    0.26    0.80
Too critical of poor performance   0.038         0.14    0.37    0.72
Rate of advancing to better jobs   -0.21         0.17    -1.22   0.24
R2, n: 0.73336

1. Yes
2. No

Responses (options 1 and 2): 33%, 67%

Page 25

Does Income have an effect on Immigration Rate?

                    Coefficient   s.e.        t        p-value (sig)
(Intercept)         -149.6        117.9       -1.27    0.21
Average Income      5.077e-06     1.640e-03   0.003    0.998
% Metropolitan      -5.062e-03    3.129e-01   -0.016   0.987
Average Taxes       -3.974e-02    1.505e-02   -2.64    0.012
Average Education   2.73          1.22        2.25     0.030
Temperature         0.76          0.90        0.84     0.41
R2                  0.28
n                   48

1. Yes
2. No

Responses (options 1 and 2): 50%, 50%

Page 26

Does having a HS Degree affect salary?

                     Coefficient   s.e.     t       p-value (sig)
Intercept            11031.81      383.22   28.79   0.000
Years Experience     546.18        30.52    17.90   0.000
HS Degree            -2996.21      411.75   -7.28   0.000
B.S. Degree          147.82        387.66   0.38    0.705
Management (1=Yes)   6883.53       313.9    21.90   0.000
R2                   0.957
n                    46

1. Yes
2. No

Page 27

Do strike outs affect salary?

              Coefficient   s.e.    t       p-value (sig)
(Intercept)   5.32          0.10    50.86   0.000
Runs          0.0045        0.004   1.00    0.32
Hits          0.012         0.002   5.14    0.00
Home Runs     0.039         0.008   4.81    0.00
Strike Outs   -0.008        0.002   -3.63   0.0003
R2            0.49
n             337

1. Yes
2. No

Responses (options 1 and 2): 95%, 5%

Page 28

Does %Female affect Cigarette Sales?

                           Coefficient   s.e.    t        p-value (sig)
(Intercept)                103.3         245.6   0.42     0.67
Average age                4.52          3.22    1.40     0.17
% with HS Degree           -0.062        0.81    -0.076   0.94
Average Income             0.019         0.010   1.86     0.070
% Black                    0.36          0.48    0.73     0.47
% Female                   -1.05         5.56    -0.19    0.85
Avg. Price of Cigarettes   -3.25         1.03    -3.16    0.0029
R2                         0.32
n                          50

1. Yes
2. No

Responses (options 1 and 2): 11%, 89%

Page 29

PRACTICE 2: SIGNIFICANT COEFFICIENTS IN TABLES

Page 30

Does Total Employment affect CEO Compensation?

1. Yes
2. No

Responses (options 1 and 2): 86%, 14%

Page 31

Does Restructuring Affect Firm ROA?

1. Yes
2. No

Responses (options 1 and 2): 14%, 86%

Page 32

Does firm sales growth affect the length of CEO tenure?

1. Yes
2. No

Responses (options 1 and 2): 75%, 25%

Page 33

Does Total Employment affect CEO Compensation?

1. Yes
2. No

Responses (options 1 and 2): 82%, 18%

Page 34

Are employees more aggressive when their job is stressful?

1. Yes
2. No

Responses (options 1 and 2): 44%, 56%

Page 35

Does employee turnover affect Firm Productivity?

1. Yes
2. No

Responses (options 1 and 2): 91%, 9%

Page 36

PRACTICE 3: INTERPRETING COEFFICIENTS

Page 37

High values of 1983 centralization produce a(n) … in current centralization

1. Increase
2. Decrease

Responses (options 1 and 2): 2%, 98%

Page 38

Corporations are more likely to enter petitions when their market share is…

1. High
2. Low

Responses (options 1 and 2): 81%, 19%

Page 39

Starting compensation is a good predictor of current compensation?

1. True
2. False

Responses (options 1 and 2): 68%, 32%

Page 40

Managers at larger firms get paid more?

1. True
2. False

Responses (options 1 and 2): 18%, 82%

Page 41

More centralized companies invest more in Research?

1. True
2. False

Responses (options 1 and 2): 60%, 40%


Page 46

OLS VIOLATIONS & OTHER ISSUES

Page 47

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors are normally distributed
• Errors have mean of 0: E(εi)=0
• Homoscedastic: Var(εi)=σ2
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• No multicollinearity

Page 48

When is a model linear?

• Linear in the parameters

• Transformations of x and/or y variables can turn a relationship that isn’t linear initially into one that is linear in the parameters
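
As an illustration of "linear in the parameters", consider a curved relationship y = a·exp(b·x). Taking logs gives log(y) = log(a) + b·x, which OLS can fit directly. The sketch below uses made-up, noise-free data and NumPy; the values of a and b are assumptions chosen for the example, not from the lecture.

```python
import numpy as np

# Made-up data following y = a * exp(b * x): curved in x, no noise for clarity.
x = np.linspace(1.0, 10.0, 25)
a_true, b_true = 2.0, 0.3
y = a_true * np.exp(b_true * x)

# After the log transformation, log(y) = log(a) + b * x is linear in the
# parameters log(a) and b, so ordinary least squares applies.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

a_hat = np.exp(coef[0])   # back-transform the intercept
b_hat = coef[1]

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The same idea covers squared terms: y = β0 + β1·x + β2·x² is not linear in x but is linear in β0, β1 and β2.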

Page 49

Example: The Challenger disaster

Page 50

Example: Challenger Shuttle disaster

[Figure; annotation: 30°]

Page 51

What the managers didn’t see…

Page 52

Diagnosis of Non-linearity and/or Errors not normally distributed

• Theoretical expectations
• Scatterplots of y against x variables prior to estimating model
• Scatterplot of yi-hat against ei-hat (predicted y-values against predicted residuals)
• Normal Probability Plot

Page 53

Example: Number of Supervisors & Number of Employees

Page 54

Re-estimated, including x2 (x squared)

Page 55

Solutions to Non-linearity

• Better Model of Structure (transformations)
  • Exponents (squared, cubed)
  • Logs or natural logs (heteroscedasticity)
  • Proportional scaling (divide by x or y)
• If outliers cause the problem, omit them or use robust regression

Page 56

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi)=0
• Homoscedastic: Var(εi)=σ2
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• No multicollinearity

Page 57

Diagnosis of Heteroscedasticity (like non-linearity)

• Theoretical expectations
• Scatterplots of y against x variables prior to estimating model
• Scatterplot of yi-hat against ei-hat (predicted y-values against predicted residuals)
• Scatterplot of xi against ei-hat (observed x-values against predicted residuals)
• Normal Probability Plot
• Statistical Tests (Breusch-Pagan, White, Goldfeld-Quandt)
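
One of the statistical tests listed above, Breusch-Pagan, can be sketched with NumPy alone. This is the studentized (Koenker) variant: regress the squared OLS residuals on the x variables and use n·R2 from that auxiliary regression as the test statistic, which is compared with a chi-square distribution. The data are simulated and the helper names are illustrative, not from the lecture.

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients and residuals (X must include a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan LM statistic: n * R^2 from regressing
    the squared OLS residuals on the columns of X."""
    _, resid = ols_fit(X, y)
    u2 = resid ** 2
    _, aux_resid = ols_fit(X, u2)
    r2_aux = 1.0 - (aux_resid @ aux_resid) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2_aux


# Simulated data whose error standard deviation grows with x.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 3.0 + 2.0 * x + rng.normal(scale=x)

lm_stat = breusch_pagan_lm(X, y)
CHI2_95_DF1 = 3.841  # 95th percentile of chi-square with 1 degree of freedom

print(f"LM = {lm_stat:.1f}; reject homoscedasticity at 5%: {lm_stat > CHI2_95_DF1}")
```

With one x variable the statistic has 1 degree of freedom; with p x variables it has p. A statistical test complements the residual plots, it does not replace them.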

Page 58

OLS estimates of Regression Line

Salary = -34 + 27.47*Runs

Page 59

Distribution of D.V. (Salary)

Page 60

Normal Probability Plot of Salary

Page 61

Baseball Salary and Performance: Residuals vs. Fitted Values

Page 62

Transformed Dependent Variable

log(Salary) = 5.3 + 0.026*Runs
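
With a logged dependent variable, the slope is read as an approximate percentage change rather than a change in salary units. A small sketch using the two coefficients from the fitted equation, assuming "log" is the natural log (the slide does not say) and leaving salary units unspecified:

```python
import math

# Coefficients from the fitted equation on this slide.
intercept, slope = 5.3, 0.026


def predicted_salary(runs):
    """Back-transform the prediction, assuming log is the natural log."""
    return math.exp(intercept + slope * runs)


# Percentage change in predicted salary for one additional run.
pct_change_per_run = (math.exp(slope) - 1.0) * 100.0

# The ratio between predictions one run apart is the same at any starting point.
ratio = predicted_salary(51) / predicted_salary(50)

print(f"One more run: about {pct_change_per_run:.2f}% higher predicted salary")
```

For small coefficients the shortcut "100 × slope percent" (here about 2.6%) is close to the exact value.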

Page 63

Residual Plot of model with Log (Salary)

Page 64

Normal Probability Plot of Residuals

Page 65

Another Example: Salary

                     Coefficient   s.e.     t       p-value (sig)
Intercept            11031.81      383.22   28.79   0.000
Years Experience     546.18        30.52    17.90   0.000
HS Degree            -2996.21      411.75   -7.28   0.000
B.S. Degree          147.82        387.66   0.38    0.705
Management (1=Yes)   6883.53       313.9    21.90   0.000
R2                   0.957
n                    46

Page 66

Plot of Residuals vs. Education (I.V.)

Page 67

Plot of Residuals vs. Education × Manager

Page 68

Solution: Include Interaction Term

                     Coefficient   s.e.     t       p-value (sig)
Intercept            11023.50      79.07    141.7   0.000
Years Experience     496.98        5.57     89.3    0.000
HS Degree            -1730.69      105.33   -16.4   0.000
B.S. Degree          -349.03       97.57    -3.6    0.0009
Management (1=Yes)   7047.32       102.60   68.7    0.000
HS + Management      -3066.04      149.33   -20.5   0.000
BS + Management      1836.49       131.17   14.0    0.000
R2                   0.999
n                    46
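
With interaction terms, the effect of Management depends on education. The sketch below plugs the table's coefficients into a prediction function, assuming HS Degree, B.S. Degree and Management are 0/1 indicators and that the last two rows are product (interaction) terms; the function names and dictionary keys are illustrative.

```python
# Coefficients copied from the interaction model table on this slide.
coef = {
    "Intercept": 11023.50,
    "Years Experience": 496.98,
    "HS Degree": -1730.69,
    "B.S. Degree": -349.03,
    "Management": 7047.32,
    "HS x Management": -3066.04,   # assumed to multiply hs * manager
    "BS x Management": 1836.49,    # assumed to multiply bs * manager
}


def predict_salary(years, hs, bs, manager):
    """Predicted salary; hs, bs and manager are 0/1 indicators."""
    return (
        coef["Intercept"]
        + coef["Years Experience"] * years
        + coef["HS Degree"] * hs
        + coef["B.S. Degree"] * bs
        + coef["Management"] * manager
        + coef["HS x Management"] * hs * manager
        + coef["BS x Management"] * bs * manager
    )


def management_premium(hs, bs):
    """Difference in predicted salary between managers and non-managers."""
    return predict_salary(0, hs, bs, 1) - predict_salary(0, hs, bs, 0)


for label, hs, bs in [("HS", 1, 0), ("B.S.", 0, 1), ("reference group", 0, 0)]:
    print(f"Management premium, {label}: {management_premium(hs, bs):.2f}")
```

Under these assumptions the management premium is no longer a single number: it is the Management coefficient plus whichever interaction coefficient applies to the education group.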

Page 69

Results from Salary Model

Page 70

Solutions for Heteroscedasticity:

• Better Model of Structure:
  • Interaction terms
  • Transformation
• Robust Standard Errors
• Weighted GLM
• ARCH models (in time series)

Page 71

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi)=0
• Homoscedastic: Var(εi)=σ2
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• No multicollinearity

Page 72

Violation 2: Errors not Independent

• Across time
• Across cases (diffusion, network models)
• Time series data, panel data, cluster samples, hierarchical data, repeated measures data, longitudinal data, and other data with dependencies

Page 73

Example: Consumer Spending vs. Money

Page 74

Diagnosis & Solutions:

Diagnosis
• Type of Data
• Durbin-Watson Statistic
• Residual Plots

Solution
• Incorporate dependencies into estimates
• Difference Variables (Cochrane-Orcutt)
• Variables for Seasonality
• Various Time Series Models
• Various network/spatial dependence models
• Structural Models (SUR, SEM)
• GLS (generalized least squares)
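
The Durbin-Watson statistic listed under Diagnosis is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation; values well below 2 suggest positive autocorrelation. The sketch below applies it to two simulated series standing in for regression residuals (an assumption made to keep the example short).

```python
import numpy as np


def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); ranges from 0 to 4."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


rng = np.random.default_rng(2)
shocks = rng.normal(size=300)   # independent "residuals"

# Positively autocorrelated "residuals": e_t = 0.8 * e_{t-1} + shock_t.
ar1 = np.zeros(300)
for t in range(1, 300):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

dw_white = durbin_watson(shocks)
dw_ar1 = durbin_watson(ar1)

print(f"independent errors:    DW = {dw_white:.2f}")
print(f"autocorrelated errors: DW = {dw_ar1:.2f}")
```

In a real analysis the input would be the residuals of the fitted model, ordered in time.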

Page 75

Assumptions of OLS Regression

• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi)=0
• Uncorrelated Errors: Cov(εi,εj)=0 for i≠j
• Homoscedastic: Var(εi)=σ2
• No multicollinearity

Page 76

Problem: Multicollinearity

Diagnosis
• High Correlation between two or more IVs
• Standard errors “blow up”
• Large changes in coefficients between estimated models
• Statistical tests (VIF)

Solutions
• Are the two x’s measuring the same thing: create an index or use PCA
• Get more data!
• Centering of x variables
• Instrumental variables
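
The VIF mentioned under Diagnosis can be computed by regressing each x variable on all the others and taking 1 / (1 − R2) from that regression. The sketch below uses simulated predictors and NumPy; the function name and the rule of thumb in the closing note are assumptions for illustration, not from the lecture.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X.

    X holds the predictors only (no constant column); a constant is added
    inside each auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF for x1, x2, x3:", np.round(vifs, 1))
```

A commonly quoted rule of thumb treats VIF values above about 10 as a sign of problematic multicollinearity; here x1 and x2 should be far above that and x3 close to 1.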