Multiple Regression & OLS violations
Week 4 Lecture
MG461
Dr. Meredith Rolfe
Key Goals of the Week
• What is multiple regression?
• How to interpret regression results:
  • estimated regression coefficients
  • significance tests for coefficients
• Violations of OLS assumptions:
  • diagnostics
  • what to do
MG461, Week 3 Seminar 3
MULTIPLE REGRESSION
When to use Regression
• We want to know whether the outcome, y, varies depending on x
• Continuous variables (but many exceptions)
• Observational data (mostly)
• The relationship between x and y is linear
Simple Linear Model
Regression is a set of statistical tools to model the conditional expectation…
1. of one variable on another variable. (76%)
2. of one variable on one or more other variables. (24%)
Multiple Regression
Possible explanations of Ratings of Supervisor: Compensation, Performance, Size of Company, Years worked, Opportunity to learn, Critical of poor performance, Handles complaints
Which best accounts for variation in supervisor ratings?
1. Does not allow special privileges. (5%)
2. Opportunity to learn. (21%)
3. Too critical of poor performance. (47%)
4. Handles employee complaints. (28%)
Simple linear model: Rating vs. No Special Privileges
                      Estimate (s.e.)
(Constant)            42.11*** (9.27)
No special privileges    0.42* (0.17)
n = 30, R² = 0.15
Note on significance of coefficients: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1
Source: Chatterjee et al., Regression Analysis by Example
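SPSS output -> Regression Table
How the SPSS output maps onto the regression table above:
• (Constant) row: the estimate is β̂0 and the number in parentheses is se(β̂0)
• x variable row (No special privileges): the estimate is β̂1 and the number in parentheses is se(β̂1)
• SPSS also reports the t statistics t = (β̂0 − 0)/se(β̂0) and t = (β̂1 − 0)/se(β̂1)
• Output marked "ignore" is not needed for the regression table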
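The t value SPSS prints is just the estimate divided by its standard error, a test of H0: β1 = 0. A minimal sketch using the rounded numbers from the first table; the critical value 2.048 (two-sided 5% level, 28 degrees of freedom) comes from a standard t table and is an assumption added here, not a number shown on the slide.

```python
# Rounded values from the "Rating vs. No Special Privileges" table
beta_hat = 0.42   # estimated slope
se_beta = 0.17    # its standard error
n = 30            # number of observations
df = n - 2        # residual degrees of freedom in a simple regression

# t statistic for H0: beta = 0 (the t column SPSS prints)
t_stat = (beta_hat - 0) / se_beta

# Two-sided 5% critical value of the t distribution with 28 df,
# taken from a standard t table (not from the slide)
T_CRIT_28 = 2.048

print(f"t = {t_stat:.2f}")                                    # t = 2.47
print("significant at the 5% level:", abs(t_stat) > T_CRIT_28)  # True

# Substantive reading: a one-point higher "no special privileges" score is
# associated with a 0.42-point higher expected rating. It is a slope, not a
# percentage of employees, and in a simple regression nothing else is held constant.
```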
42% of employees value supervisors who don’t grant special privileges?
1. Yes (32%)
2. No (68%)
Simple linear model #2: Rating vs. Opportunity to Learn
                      Estimate (s.e.)
(Constant)            28.17*** (8.81)
Opportunity to learn     0.65* (0.15)
n = 30, R² = 0.37
                                   Model 1   Model 2   Model 3   Model 4   Model 5   Model 6
(Constant)                        42.11***  28.17***    14.38*     19.98   50.24**  56.76***
                                    (9.27)    (8.81)    (6.62)   (11.69)   (17.31)    (9.74)
No special privileges                0.42*
                                    (0.17)
Opportunity to learn                           0.65*
                                              (0.15)
Handles complaints                                   0.75***
                                                        (0.15)
Raises based on performance                                    0.69***
                                                                  (0.18)
Too critical of poor performance                                              0.19
                                                                            (0.23)
Rate of advancing to better jobs                                                        0.18
                                                                                      (0.22)
n                                       30        30        30        30        30        30
R²                                    0.15      0.37      0.68      0.35      0.02      0.02
Are these good estimates of the relationship between x and y?
1. Yes (44%)
2. No (56%)
Multiple potential explanations…
• Experimental controls:
  • Random assignment
  • Experimental design
• Observational data analysis:
  • Statistical controls
Competing explanations of Ratings of Supervisor: No special privileges, Opportunity to learn, Critical of poor performance, Handles complaints
Multiple Regression Model
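$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_p x_{p,i} + \varepsilon_i$
• Dependent variable: $y_i$
• Independent variables: $x_{1,i}, \dots, x_{p,i}$
• Intercept: $\beta_0$
• Coefficients: $\beta_1, \dots, \beta_p$
• Error: $\varepsilon_i$, with variance $\sigma^2$
Observation or data point, i, goes from 1…n.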
WHICH MODEL PARAMETER DO WE NOT NEED TO ESTIMATE?
1. β0 (5%)
2. x1,i (20%)
3. βp (5%)
4. σ² (70%)
Multiple Regression: OLS Estimates (matrix form)
$Y = X\beta + \varepsilon$
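For reference, the standard closed-form OLS solution for this matrix model, assuming X is the n × (p + 1) design matrix with a leading column of ones and that X'X is invertible (it is not under perfect multicollinearity). The last term is the usual unbiased estimate of the error variance σ²:

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top Y, \qquad
\hat{Y} = X\hat{\beta}, \qquad
\hat{\varepsilon} = Y - X\hat{\beta}, \qquad
\hat{\sigma}^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{n - p - 1}
```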
                                   Model 1   Model 2   Model 3   Model 4   Model 5   Model 6       ALL
(Constant)                        42.11***  28.17***    14.38*     19.98   50.24**  56.76***     10.79
                                    (9.27)    (8.81)    (6.62)   (11.69)   (17.31)    (9.74)   (11.59)
No special privileges                0.42*                                                     -0.07
                                    (0.17)                                                    (0.14)
Opportunity to learn                           0.65*                                            0.32
                                              (0.15)                                          (0.16)
Handles complaints                                   0.75***                                 0.61***
                                                        (0.15)                                (0.16)
Raises based on performance                                    0.69***                         0.082
                                                                  (0.18)                      (0.22)
Too critical of poor performance                                              0.19             0.038
                                                                            (0.23)            (0.14)
Rate of advancing to better jobs                                                        0.18     -0.21
                                                                                      (0.22)    (0.17)
n                                       30        30        30        30        30        30        30
R²                                    0.15      0.37      0.68      0.35      0.02      0.02      0.73
Significance of Results
Model significance
• H0: None of the independent variables (one or more) covaries with the dependent variable
• HA: At least one of the independent variables covaries with the d.v.
• Application: compare two fitted models
• Test: ANOVA/F-test (assumes errors, εi, are normally distributed)
Coefficient significance
• H0: β1 = 0, there is no relationship (covariation) between x and y
• HA: β1 ≠ 0, there is a relationship (covariation) between x and y
• Application: a single estimated coefficient
• Test: t-test (assumes errors, εi, are normally distributed)
Comparing Models: ANOVA
                                  Complaints  Complaints         ALL
                                        only     & Learn
(Constant)                            14.38*        9.87       10.79
                                      (6.62)      (7.06)     (11.59)
No special privileges                                          -0.07
                                                              (0.14)
Opportunity to learn                                0.21        0.32
                                                  (0.13)      (0.16)
Handles complaints                   0.75***     0.64***     0.61***
                                      (0.15)      (0.12)      (0.16)
Raises based on performance                                    0.082
                                                              (0.22)
Too critical of poor performance                               0.038
                                                              (0.14)
Rate of advancing to better jobs                               -0.21
                                                              (0.17)
n                                         30          30          30
R²                                      0.68        0.71        0.73
ANOVA Model Comparison
• All variables (full) vs. Complaints & Learn: F = 0.53, p = 0.72
• Complaints & Learn vs. Complaints: F = 2.47, p = 0.13
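For nested models fitted to the same observations, the comparison F statistic can also be written in terms of R². A minimal sketch (the function name `nested_f` is illustrative). Because the table reports R² rounded to two decimals, the sketch gives roughly 0.43 and 2.79 rather than the reported 0.53 and 2.47; software computes F from the unrounded residual sums of squares, so the gap is presumably rounding.

```python
def nested_f(r2_full, r2_reduced, n, p_full, p_reduced):
    """F statistic for a reduced model nested inside a full model.

    p_full and p_reduced count predictors, excluding the intercept.
    """
    q = p_full - p_reduced        # number of coefficients restricted to 0
    df_resid = n - p_full - 1     # residual degrees of freedom of the full model
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)


# Rounded R-squared values from the comparison table (n = 30)
f_all_vs_cl = nested_f(0.73, 0.71, n=30, p_full=6, p_reduced=2)
f_cl_vs_c = nested_f(0.71, 0.68, n=30, p_full=2, p_reduced=1)
print(f"ALL vs. Complaints & Learn: F = {f_all_vs_cl:.2f}")
print(f"Complaints & Learn vs. Complaints: F = {f_cl_vs_c:.2f}")
```

The F value is then compared with an F distribution with (q, n − p_full − 1) degrees of freedom: (4, 23) and (1, 27) here.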
SPEED PRACTICE: INTERPRETING REGRESSION RESULTS
1) p-values & significance
2) Significant coefficients in tables
3) Substantive interpretation of coefficients
Does “Critical” have an effect on supervisor ratings?
                                  Coefficient    s.e.      t  p-value
(Constant)                              10.79   11.59   0.93     0.36
No special privileges                   -0.07    0.14  -0.54     0.60
Opportunity to learn                     0.32    0.16   1.90     0.07
Handles complaints                       0.61    0.16   3.81    0.009
Raises based on performance             0.082    0.22   0.37     0.72
Too critical of poor performance        0.038    0.14   0.26     0.80
Rate of advancing to better jobs        -0.21    0.17  -1.22     0.24
R² = 0.733, n = 36
1. Yes (33%)
2. No (67%)
                    Coefficient        s.e.        t  p-value
(Intercept)              -149.6       117.9    -1.27     0.21
Average Income        5.077e-06   1.640e-03    0.003    0.998
% Metropolitan       -5.062e-03   3.129e-01   -0.016    0.987
Average Taxes        -3.974e-02   1.505e-02    -2.64    0.012
Average Education          2.73        1.22     2.25    0.030
Temperature                0.76        0.90     0.84     0.41
R² = 0.28, n = 48
Does Income have an effect on Immigration Rate?
1. Yes (50%)
2. No (50%)
Does having a HS Degree affect salary?
                    Coefficient     s.e.       t  p-value
Intercept              11031.81   383.22   28.79    0.000
Years Experience         546.18    30.52   17.90    0.000
HS Degree              -2996.21   411.75   -7.28    0.000
B.S. Degree              147.82   387.66    0.38    0.705
Management (1=Yes)      6883.53    313.9   21.90    0.000
R² = 0.957, n = 46
1. Yes
2. No
              Coefficient    s.e.      t  p-value
(Intercept)          5.32    0.10  50.86    0.000
Runs               0.0045   0.004   1.00     0.32
Hits                0.012   0.002   5.14     0.00
Home Runs           0.039   0.008   4.81     0.00
Strike Outs        -0.008   0.002  -3.63   0.0003
R² = 0.49, n = 337
Do strike outs affect salary?
1. Yes (95%)
2. No (5%)
                           Coefficient    s.e.       t  p-value
(Intercept)                      103.3   245.6    0.42     0.67
Average age                       4.52    3.22    1.40     0.17
% with HS Degree                -0.062    0.81  -0.076     0.94
Average Income                   0.019   0.010    1.86    0.070
% Black                           0.36    0.48    0.73     0.47
% Female                         -1.05    5.56   -0.19     0.85
Avg. Price of Cigarettes         -3.25    1.03   -3.16   0.0029
R² = 0.32, n = 50
Does % Female affect Cigarette Sales?
1. Yes (11%)
2. No (89%)
PRACTICE 2: SIGNIFICANT COEFFICIENTS IN TABLES
Does Total Employment affect CEO Compensation?
1. Yes (86%)
2. No (14%)
Does Restructuring affect Firm ROA?
1. Yes (14%)
2. No (86%)
Does firm sales growth affect the length of CEO tenure?
1. Yes (75%)
2. No (25%)
Does Total Employment affect CEO Compensation?
1. Yes (82%)
2. No (18%)
Are employees more aggressive when their job is stressful?
1. Yes (44%)
2. No (56%)
Does employee turnover affect Firm Productivity?
1. Yes (91%)
2. No (9%)
PRACTICE 3: INTERPRETING COEFFICIENTS
High values of 1983 centralization produce a(n) ….. in current centralization
1. Increase (2%)
2. Decrease (98%)
Corporations are more likely to enter petitions when their market share is…
1. High (81%)
2. Low (19%)
Starting compensation is a good predictor of current compensation?
1. True (68%)
2. False (32%)
Managers at larger firms get paid more?
1. True (18%)
2. False (82%)
More centralized companies invest more in research?
1. True (60%)
2. False (40%)
OLS VIOLATIONS & OTHER ISSUES
Assumptions of OLS Regression
• Correctly specified model
• Linear relationship
• Errors are normally distributed
• Errors have mean of 0: E(εi) = 0
• Homoscedastic: Var(εi) = σ²
• Uncorrelated errors: Cov(εi, εj) = 0 for i ≠ j
• No multicollinearity
When is a model linear?
• Linear in the parameters
• Transformations of x and/or y variables can turn a relationship that isn’t linear initially into one that is linear in the parameters
Example: The Challenger Shuttle disaster (figure, with 30° marked)
What the managers didn’t see…
Diagnosis of Non-linearity and/or Errors not normally distributed
• Theoretical expectations
• Scatterplots of y against x variables prior to estimating the model
• Scatterplot of ŷi against êi (predicted y-values against estimated residuals)
• Normal probability plot
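A normal probability plot pairs the ordered residuals with the quantiles a normal distribution would produce. A small sketch using Python's standard-library `statistics.NormalDist`; the plotting positions (i − 0.5)/n are one common convention (an assumption, packages differ slightly), and the residuals are made up.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """(theoretical normal quantile, ordered residual) pairs for a normal probability plot.

    Plotting positions (i - 0.5) / n are one common convention; packages differ slightly.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [(std_normal.inv_cdf((i - 0.5) / n), r) for i, r in enumerate(ordered, start=1)]


# Made-up residuals for illustration
resid = [0.3, -1.2, 0.8, -0.1, 1.9, -0.6, 0.2, -0.9]
for q, r in normal_probability_points(resid):
    print(f"{q:6.2f}  {r:6.2f}")
```

If the points fall close to a straight line, normally distributed errors are plausible; a bow or S-shape suggests skew or heavy tails.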
Example: Number of Supervisors & Number of Employees
Re-estimated, including x²
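Adding x² leaves the model linear in the parameters: ŷ is still a weighted sum of the columns of X, so ordinary least squares applies unchanged. The sketch below solves the normal equations (X'X)b = X'y in pure Python on made-up data (the supervisors/employees data are not reproduced here); real analyses would use a statistics library, and the normal equations can be numerically fragile when X is ill-conditioned.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data with curvature: exactly y = 1 + 0.5x + 2x^2
x = [0, 1, 2, 3, 4]
y = [1.0, 3.5, 10.0, 20.5, 35.0]
X = [[1.0, xi, xi ** 2] for xi in x]  # columns: intercept, x, x^2
b0, b1, b2 = ols(X, y)
print(f"y-hat = {b0:.2f} + {b1:.2f}x + {b2:.2f}x^2")
```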
Solutions to Non-linearity
• Better model of structure (transformations):
  • Polynomial terms (squared, cubed)
  • Logs or natural logs (these also help with heteroscedasticity)
  • Proportional scaling (divide by x or y)
• If outliers cause the problem, omit them or use robust regression
Assumptions of OLS Regression
• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi) = 0
• Homoscedastic: Var(εi) = σ²
• Uncorrelated errors: Cov(εi, εj) = 0 for i ≠ j
• No multicollinearity
Diagnosis of Heteroscedasticity (like non-linearity)
• Theoretical expectations
• Scatterplots of y against x variables prior to estimating the model
• Scatterplot of ŷi against êi (predicted y-values against estimated residuals)
• Scatterplot of xi against êi (observed x-values against estimated residuals)
• Normal probability plot
• Statistical tests (Breusch-Pagan, White, Goldfeld-Quandt)
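One version of the Breusch-Pagan test, Koenker's n·R² form, regresses the squared OLS residuals on the x variable and compares n·R² with a chi-square distribution (1 df with a single regressor). A pure-Python sketch with made-up data; 3.841 is the standard 5% chi-square critical value for 1 df, and the helper names are illustrative.

```python
def simple_ols(x, y):
    """Intercept and slope from regressing y on a single x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    b0, b1 = simple_ols(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Koenker's n * R^2 statistic: squared OLS residuals regressed on x."""
    b0, b1 = simple_ols(x, y)
    sq_resid = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)


CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 df

# Made-up data whose scatter around the line widens as x grows
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.3, 7.5, 10.9, 11.2, 15.8, 14.1, 20.9, 17.5]
lm = breusch_pagan_lm(x, y)
print(f"LM = {lm:.2f}; reject constant variance at 5%: {lm > CHI2_1DF_5PCT}")
```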
OLS Estimates of Regression Line
Salary = -34 + 27.47*Runs
Distribution of D.V. (Salary)
Normal Probability Plot of Salary
Baseball Salary and Performance: Residuals vs. Fitted Values
Transformed Dependent Variable
log(Salary) = 5.3 + 0.026*Runs
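With a logged dependent variable the slope is read multiplicatively. Assuming "log" here is the natural log, one extra run multiplies predicted salary (strictly, the geometric-mean salary) by exp(0.026) ≈ 1.026, i.e. about 2.6% higher. A short sketch of that arithmetic:

```python
import math

b1 = 0.026  # slope on Runs in log(Salary) = 5.3 + 0.026*Runs

# Assumes natural log: one more run multiplies predicted salary by exp(b1)
multiplier = math.exp(b1)
pct_change = 100 * (multiplier - 1)

# For small slopes, 100 * b1 is a close shortcut for the percentage change
shortcut_pct = 100 * b1

print(f"multiplier = {multiplier:.4f} (about {pct_change:.1f}% per run; shortcut {shortcut_pct:.1f}%)")
```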
Residual Plot of Model with log(Salary)
Normal Probability Plot of Residuals
Another Example: Salary
                    Coefficient     s.e.       t  p-value
Intercept              11031.81   383.22   28.79    0.000
Years Experience         546.18    30.52   17.90    0.000
HS Degree              -2996.21   411.75   -7.28    0.000
B.S. Degree              147.82   387.66    0.38    0.705
Management (1=Yes)      6883.53    313.9   21.90    0.000
R² = 0.957, n = 46
Plot of Residuals vs. Education (I.V.)
Plot of Residuals vs. Education × Manager
Solution: Include Interaction Term
                           Coefficient     s.e.       t  p-value
Intercept                     11023.50    79.07   141.7    0.000
Years Experience                496.98     5.57    89.3    0.000
HS Degree                     -1730.69   105.33   -16.4    0.000
B.S. Degree                    -349.03    97.57    -3.6   0.0009
Management (1=Yes)             7047.32   102.60    68.7    0.000
HS Degree × Management        -3066.04   149.33   -20.5    0.000
B.S. Degree × Management       1836.49   131.17    14.0    0.000
R² = 0.999, n = 46
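With interaction terms the management premium is no longer a single number; it depends on education. A sketch that plugs the rounded coefficients from the interaction table into the fitted equation; the dictionary and function names are hypothetical, and the omitted education category (HS Degree = B.S. Degree = 0) is not named on the slide.

```python
# Rounded coefficients from the interaction model (names are hypothetical)
COEF = {
    "intercept": 11023.50,
    "years": 496.98,
    "hs": -1730.69,
    "bs": -349.03,
    "mgmt": 7047.32,
    "hs_x_mgmt": -3066.04,
    "bs_x_mgmt": 1836.49,
}


def predicted_salary(years, hs, bs, mgmt):
    """Fitted salary; hs, bs, mgmt are 0/1 dummies (hs = bs = 0 is the omitted education group)."""
    return (COEF["intercept"]
            + COEF["years"] * years
            + COEF["hs"] * hs
            + COEF["bs"] * bs
            + COEF["mgmt"] * mgmt
            + COEF["hs_x_mgmt"] * hs * mgmt
            + COEF["bs_x_mgmt"] * bs * mgmt)


# Management premium: manager minus non-manager, same education and experience
for label, hs, bs in [("omitted education group", 0, 0), ("HS Degree", 1, 0), ("B.S. Degree", 0, 1)]:
    premium = predicted_salary(5, hs, bs, 1) - predicted_salary(5, hs, bs, 0)
    print(f"{label}: management premium = {premium:.2f}")
```

The premiums work out to 7047.32, 3981.28 and 8883.81, so the Management coefficient on its own only describes the omitted education group.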
Results from Salary Model
Solutions for Heteroscedasticity:
• Better model of structure:
  • Interaction terms
  • Transformation
• Robust standard errors
• Weighted GLM
• ARCH models (in time series)
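Robust (heteroscedasticity-consistent) standard errors keep the OLS coefficient estimates but replace the constant-variance formula for their standard errors. A sketch of the basic HC0 (White) version for a single regressor in pure Python; statistical packages also offer small-sample variants (HC1, HC2, HC3), the function name is illustrative, and the data are made up.

```python
import math


def ols_with_hc0(x, y):
    """Slope, classical SE and HC0 (White) robust SE for y = b0 + b1*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - mean_y) for d, yi in zip(dx, y)) / sxx
    b0 = mean_y - b1 * mean_x
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Classical SE: one common error variance sigma^2 for every observation
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # HC0: each observation's squared residual stands in for its own variance
    se_hc0 = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2)
    return b1, se_classical, se_hc0


# Made-up data whose scatter widens as x grows
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.9, 4.2, 5.7, 8.9, 9.1, 13.8, 12.0, 18.5]
slope, se_ols, se_robust = ols_with_hc0(x, y)
print(f"slope = {slope:.3f}, classical SE = {se_ols:.3f}, HC0 SE = {se_robust:.3f}")
```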
Assumptions of OLS Regression
• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi) = 0
• Homoscedastic: Var(εi) = σ²
• Uncorrelated errors: Cov(εi, εj) = 0 for i ≠ j
• No multicollinearity
Violation 2: Errors not Independent
• Across time
• Across cases (diffusion, network models)
• Time series data, panel data, cluster samples, hierarchical data, repeated measures data, longitudinal data, and other data with dependencies
Example: Consumer Spending vs. Money
Diagnosis & Solutions:
Diagnosis
• Type of data
• Durbin-Watson statistic
• Residual plots
Solutions
• Incorporate dependencies into estimates
• Difference variables (Cochrane-Orcutt)
• Variables for seasonality
• Various time series models
• Various network/spatial dependence models
• Structural models (SUR, SEM)
• GLS (generalized least squares)
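The Durbin-Watson statistic compares successive residuals, d = Σ(e_t − e_{t−1})² / Σe_t², and is roughly 2(1 − ρ̂) for first-order autocorrelation ρ̂. A sketch of d together with the quasi-differencing step that Cochrane-Orcutt applies once ρ is estimated; the function names and residuals are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order.

    d near 2: little first-order autocorrelation; toward 0: positive;
    toward 4: negative autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def quasi_difference(series, rho):
    """Cochrane-Orcutt style transform z_t - rho * z_{t-1}; the first observation is dropped."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Made-up residuals that drift slowly, i.e. positively autocorrelated
resid = [0.8, 0.9, 0.5, 0.2, -0.3, -0.6, -0.4, 0.1, 0.6, 0.7]
d = durbin_watson(resid)
rho_hat = 1 - d / 2  # rough estimate of rho implied by d being about 2(1 - rho)
print(f"d = {d:.2f}, implied rho about {rho_hat:.2f}")
```

In Cochrane-Orcutt, both y and the x variables are quasi-differenced with ρ̂ and the regression is re-estimated, iterating until ρ̂ settles.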
Assumptions of OLS Regression
• Correctly specified model
• Linear relationship
• Errors have mean of 0: E(εi) = 0
• Uncorrelated errors: Cov(εi, εj) = 0 for i ≠ j
• Homoscedastic: Var(εi) = σ²
• No multicollinearity
Problem: Multicollinearity
Diagnosis
• High correlation between two or more IVs
• Standard errors “blow up”
• Large changes in coefficients between estimated models
• Diagnostic statistics (VIF)
Solutions
• Are the two x’s measuring the same thing? Create an index or use PCA
• Get more data!
• Centering of x variables
• Instrumental variables
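The variance inflation factor for predictor j is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing x_j on the other predictors. With exactly two predictors, R_j² is just their squared correlation, which keeps this sketch short; it also shows why centering helps when a model contains both x and x². The data are made up and the function names are illustrative.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: R_j^2 is r^2 between them."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)


# Raw x and x^2 are strongly correlated
x = [1, 2, 3, 4, 5]
x_sq = [v ** 2 for v in x]
raw_vif = vif_two_predictors(x, x_sq)

# Centering x first removes that correlation for these symmetric values
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]
xc_sq = [v ** 2 for v in xc]
centered_vif = vif_two_predictors(xc, xc_sq)

print(f"VIF raw = {raw_vif:.1f}, VIF centered = {centered_vif:.1f}")
```

Here the raw x and x² give a VIF of about 26.7, while the centered versions give exactly 1, because for these symmetric values the centered x and its square are uncorrelated.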