Post on 20-Dec-2015
Multiple Regression
Involves the use of more than one independent variable.
Multivariate analysis involves more than one dependent variable - OMS 633
Adding more variables will help us to explain more variance - the trick becomes: are the additional variables significant and do they improve the overall model? Additionally, the added independent variables should not be too highly related with each other!
Multiple Regression
A sample data set:
Sales = hundreds of gallons
Price = price per gallon
Advertising = hundreds of dollars
Week  Sales  Price  Advertising
1     10     1.3    9
2     6      2      7
3     5      1.7    5
4     12     1.5    14
5     10     1.6    15
6     15     1.2    12
7     5      1.5    6
8     12     1.4    10
9     17     1      15
10    20     1.1    21
Analyzing the output
Evaluate for multicollinearity
State and interpret the equation
Interpret Adjusted R2
Interpret Syx
Are the independent variables significant?
Is the model significant?
Forecast and develop prediction interval
Examine the error terms
Calculate MAD, MSE, MAPE, MPE
Correlation Matrix
Simple correlation for each combination of variables (independents vs. independents; independents vs. dependent)
             Sales     Price     Advertising
Sales        1
Price        -0.86349  1
Advertising  0.891497  -0.65449  1
Multicollinearity
It’s possible that the independent variables are related to one another. If they are highly related, this condition is called multicollinearity.
Problems:
A regression coefficient that is positive in sign in a two-variable model may change to a negative sign.
Estimates of the regression coefficient change greatly from sample to sample because the standard error of the regression coefficient is large.
Highly interrelated independent variables can explain some of the same variance in the dependent variable, so there is no added benefit even though the R-square has increased.
As a rule of thumb, when two independent variables are highly correlated (about .7 or more in absolute value), we would throw one of them out.
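The multicollinearity check can be sketched in a few lines of Python. This is only an illustration: the correlations are copied from the correlation matrix above, the .7 cutoff is the rule of thumb from this slide, and the function name `collinear_pairs` is made up for the example.

```python
# Pairwise correlations copied from the correlation matrix above.
correlations = {
    ("Sales", "Price"): -0.86349,
    ("Sales", "Advertising"): 0.891497,
    ("Price", "Advertising"): -0.65449,
}
DEPENDENT = "Sales"
THRESHOLD = 0.7  # rule-of-thumb cutoff from the slide


def collinear_pairs(corr, dependent, threshold):
    """Return pairs of independent variables whose |r| exceeds the threshold."""
    return [
        pair
        for pair, r in corr.items()
        if dependent not in pair and abs(r) > threshold
    ]


flagged = collinear_pairs(correlations, DEPENDENT, THRESHOLD)
print(flagged)  # [] : Price vs. Advertising is -0.65, just under the cutoff
```

Only the independent-versus-independent pair is tested; a high correlation between an independent variable and Sales is desirable, not a problem.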
Multiple Regression Equation
Gallon Sales = 16.4 - 8.2476 (Price) + .59 (Adv)
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_i X_i$
             Coefficients  Standard Error  t Stat  P-value
Intercept    16.41         4.34            3.78    0.01
Price        -8.25         2.20            -3.76   0.01
Advertising  0.59          0.13            4.38    0.00
Regression Coefficients
b0 is the Y-intercept: the value of sales when X1 and X2 are both 0.
b1 and b2 are net regression coefficients. The change in Y per unit change in the relevant independent variable, holding the other independent variables constant.
Regression Coefficients
For each unit increase ($1.00) in price, sales will decrease 8.25 hundred gallons, holding advertising constant.
For each unit increase ($100, represented as 1) in Advertising, sales will increase .59 hundred gallons, holding price constant.
Be very careful about the units! A value of 10 for advertising means $1,000, because advertising is in hundreds of dollars.
Gallons = 16.41 - 8.25 (1.00) + .59 (10)
= 14.06 hundred gallons, or 1,406 gallons
Regression Coefficients
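How does a one cent increase in price affect sales (holding advertising at $1,000)?
16.4 - 8.25(1.01) + .59(10) = 13.9675
That is 0.0825 hundred gallons (about 8 gallons) fewer than at a price of $1.00.
If price stays at $1.00 and advertising increases by $100, from $1,000 to $1,100:
16.4 - 8.25(1.00) + .59(11) = 14.64
That is 0.59 hundred gallons (59 gallons) more than with $1,000 of advertising.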
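The what-if calculations can be reproduced with a small function. This is a sketch that uses the unrounded coefficients from the regression output (16.40637, -8.24758, 0.585101), so its results differ slightly from the hand calculations with rounded coefficients. The name `predict_sales` is illustrative only.

```python
# Unrounded coefficients from the regression output.
B0, B_PRICE, B_ADV = 16.40637, -8.24758, 0.585101


def predict_sales(price, advertising):
    """Predicted sales in hundreds of gallons.

    price is in dollars per gallon; advertising is in hundreds of dollars.
    """
    return B0 + B_PRICE * price + B_ADV * advertising


base = predict_sales(1.00, 10)      # $1.00 per gallon, $1,000 advertising
price_up = predict_sales(1.01, 10)  # one cent more, advertising held constant
adv_up = predict_sales(1.00, 11)    # $100 more advertising, price held constant

print(round(base, 2))             # 14.01
print(round(price_up - base, 4))  # -0.0825
print(round(adv_up - base, 4))    # 0.5851
```

Each difference equals the corresponding net regression coefficient times the change in that variable, which is exactly what "holding the other variable constant" means.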
Regression Statistics
Standard error of the estimate
R2 and Adjusted R2
Regression Statistics
Multiple R         0.965364
R Square           0.931929
Adjusted R Square  0.91248
Standard Error     1.507196
Observations       10
R2 and Adjusted R2
Same formulas as simple regression.
R2 = SSR/SST (this is an UNADJUSTED R2)
Adjusted R2 from ANOVA = 1 - MSE/(SST/(n-1)), where MSE is the mean square of the residuals
91% of the variance in gallons sold is explained by price per gallon and advertising.
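As a check, both statistics can be recomputed from the sums of squares in the ANOVA output for this example (SSR = 217.6985, SSE = 15.90149, SST = 233.6). This is a minimal sketch; the variable names are illustrative, and k counts the estimated coefficients (intercept plus two slopes), following the k = 3 convention used in these slides.

```python
# Sums of squares from the ANOVA output for the gallon-sales example.
SSR = 217.6985   # regression (explained) sum of squares
SSE = 15.90149   # residual (unexplained) sum of squares
SST = 233.6      # total sum of squares
n = 10           # observations
k = 3            # estimated coefficients: b0, b1, b2

r_squared = SSR / SST
mse = SSE / (n - k)                       # mean square of the residuals
adjusted_r_squared = 1 - mse / (SST / (n - 1))

print(round(r_squared, 4))           # 0.9319
print(round(adjusted_r_squared, 4))  # 0.9125
```

The adjustment penalizes each added variable through the residual degrees of freedom, which is why adjusted R2 is the safer figure when comparing models with different numbers of predictors.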
Standard Error of the Estimate
Measures the standard amount that the actual values (Y) differ from the estimated values (Ŷ).
No change in formula, except that in this example k = 3 (three estimated coefficients: b0, b1 and b2).
Can still use the square root of MSE.
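A quick numerical check, assuming the usual form Syx = sqrt(SSE / (n - k)) with k = 3 as stated on the slide. SSE is taken from the ANOVA output for this example, and the variable names are illustrative.

```python
import math

SSE = 15.90149  # residual sum of squares from the ANOVA output
n = 10          # observations
k = 3           # estimated coefficients (intercept plus two slopes)

mse = SSE / (n - k)                         # MS Residual in the ANOVA table
standard_error_of_estimate = math.sqrt(mse)

print(round(standard_error_of_estimate, 4))  # 1.5072
```

The result matches the "Standard Error" line in the regression statistics, so a typical forecast misses actual sales by roughly 1.5 hundred gallons.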
Evaluate the Independent Variables
Ho: The regression coefficient is not significantly different from zero
HA: The regression coefficient is significantly different from zero
Use the t-stat and the p-value to evaluate EACH independent variable. If an independent variable is NOT significant, we remove it from the model and re-run!
             Coefficients  Standard Error  t Stat    P-value
Intercept    16.40637      4.342519        3.778075  0.00691
Price        -8.24758      2.196057        -3.75563  0.007115
Advertising  0.585101      0.133672        4.377145  0.003246
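Each t Stat in the output is simply the coefficient divided by its standard error. The sketch below recomputes them and compares each with a critical value. The value 2.365 is the two-tailed t-table entry for alpha = .05 with 10 - 3 = 7 degrees of freedom; the .05 level is an assumption made for this example, not something the slides specify.

```python
# (coefficient, standard error) pairs from the regression output.
coefficients = {
    "Intercept": (16.40637, 4.342519),
    "Price": (-8.24758, 2.196057),
    "Advertising": (0.585101, 0.133672),
}
T_CRITICAL = 2.365  # t-table value: two-tailed, alpha = .05 (assumed), df = 7


def t_statistic(coefficient, standard_error):
    """t = coefficient / standard error, testing Ho: coefficient = 0."""
    return coefficient / standard_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}

for name, t in t_stats.items():
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict})")
```

All three absolute t values exceed 2.365, which agrees with the p-values below .05 in the output.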
Evaluate the Model
Ho: The model is NOT valid and there is NOT a statistical relationship between the dependent and independent variables
HA: The model is valid. There is a statistical relationship between the dependent and independent variables.
If F from the ANOVA is greater than the F from the F-table, reject Ho: the model is valid. We can also look at the p-value (Significance F). If the p-value is less than our chosen significance level, we can REJECT Ho.
ANOVA
            df  SS        MS        F         Significance F
Regression  2   217.6985  108.8493  47.91657  8.23E-05
Residual    7   15.90149  2.271641
Total       9   233.6
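The F statistic in the ANOVA table is the ratio of the two mean squares. The sketch below rebuilds it from the sums of squares and degrees of freedom and compares it with a table value. The critical value 4.74 is the F-table entry for alpha = .05 with 2 and 7 degrees of freedom; the .05 level is an assumption for illustration.

```python
# Values from the ANOVA table.
ss_regression, df_regression = 217.6985, 2
ss_residual, df_residual = 15.90149, 7
F_CRITICAL = 4.74  # F-table value: alpha = .05 (assumed), df = (2, 7)

ms_regression = ss_regression / df_regression  # MSR
ms_residual = ss_residual / df_residual        # MSE
f_statistic = ms_regression / ms_residual

model_is_significant = f_statistic > F_CRITICAL
print(round(f_statistic, 2), model_is_significant)  # 47.92 True
```

Since 47.92 is far above 4.74, Ho is rejected, which is the same conclusion the Significance F of 8.23E-05 gives.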
Forecast and Prediction Interval
Same as simple regression - however, many times we will not have the correction factor (formula under the square root). It is acceptable to use the Standard error of the estimate provided in the computer output.
$\hat{Y} \pm Z_{\alpha/2} \, S_{yx} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$
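When the correction factor is not available, the approximate interval is the forecast plus or minus a critical value times Syx. The sketch below assumes a 95% interval with Z = 1.96, which is an illustrative choice and not stated in the slides. With only 10 observations a t-table value could be passed instead. The function name `approximate_interval` is hypothetical.

```python
S_YX = 1.507196  # standard error of the estimate from the computer output


def approximate_interval(y_hat, s_yx, critical_value=1.96):
    """Approximate prediction interval that omits the correction factor."""
    margin = critical_value * s_yx
    return y_hat - margin, y_hat + margin


# Forecast for price = $1.00 and advertising = $1,000 (10 hundreds),
# using the unrounded coefficients from the regression output.
y_hat = 16.40637 - 8.24758 * 1.00 + 0.585101 * 10
low, high = approximate_interval(y_hat, S_YX)

print(round(low, 2), round(high, 2))  # 11.06 16.96
```

Because the correction factor is always at least 1, this shortcut gives an interval that is somewhat too narrow, especially for forecasts far from the mean of the X values.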
Examining the Errors
Heteroscedasticity exists when the residuals do not have a constant variance across an entire range of values.
Run an autocorrelation on the error terms to determine if the errors are random. If the errors are not random, the model needs to be re-evaluated. More on this in Chapter 9.
Evaluate with MAD, MAPE, MPE, MSE
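The four error measures can be computed as follows. This is a sketch under common definitions, with the error taken as actual minus forecast and MAPE and MPE expressed as percentages. The three data points are made up purely to show the arithmetic and are not from the gallon-sales example.

```python
def forecast_errors(actual, forecast):
    """Error for each period, defined here as actual minus forecast."""
    return [y - f for y, f in zip(actual, forecast)]


def mad(actual, forecast):
    """Mean absolute deviation."""
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


def mse(actual, forecast):
    """Mean squared error (divides by the number of periods)."""
    errors = forecast_errors(actual, forecast)
    return sum(e ** 2 for e in errors) / len(errors)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = forecast_errors(actual, forecast)
    return 100 * sum(abs(e) / abs(y) for e, y in zip(errors, actual)) / len(errors)


def mpe(actual, forecast):
    """Mean percentage error, in percent; near zero means little bias."""
    errors = forecast_errors(actual, forecast)
    return 100 * sum(e / y for e, y in zip(errors, actual)) / len(errors)


# Hypothetical data for illustration only.
actual = [10, 20, 40]
forecast = [12, 18, 40]

print(round(mad(actual, forecast), 2))   # 1.33
print(round(mse(actual, forecast), 2))   # 2.67
print(round(mape(actual, forecast), 2))  # 10.0
print(round(mpe(actual, forecast), 2))   # -3.33
```

MAD, MSE and MAPE measure the size of the errors, while the sign of MPE indicates whether the forecasts run consistently high or low.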
Dummy Variables
Used to determine the relationship between qualitative independent variables and a dependent variable.
Differences based on gender
Effect of training/no-training on performance
Seasonal data - quarters
We use 0 and 1 to indicate “off” or “on”. For example, code males as 1 and females as 0.
Dummy Variables
The data show job performance rating along with achievement test score and gender, coded female (0) and male (1).
How do males and females differ in their job performance?
Rating  Test Score  Gender
5       60          0
4       55          0
3       35          0
10      96          0
2       35          0
7       81          0
6       65          0
9       85          0
9       99          1
2       43          1
8       98          1
6       91          1
7       95          1
3       70          1
6       85          1
Dummy Variables
The regression equation:
Job performance = -1.96 +.12 (test score) -2.18 (gender)
Holding gender constant, a one-unit increase in test score increases job performance rating by .12 points.
Holding test score constant, males experience a 2.18 point lower performance rating than females. Or stated differently, females have a 2.18 higher job performance than males, holding test scores constant.
            Coefficients  Standard Error  t Stat  P-value
Intercept   -1.96         0.71            -2.77   0.02
Test Score  0.12          0.01            11.86   0.00
Gender      -2.18         0.45            -4.84   0.00
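The effect of the dummy variable is easiest to see by predicting the rating for a female and a male with the same test score. The sketch below uses the rounded coefficients from the output; the test score of 80 and the name `predict_rating` are arbitrary choices for illustration.

```python
# Rounded coefficients from the regression output.
B0, B_TEST, B_GENDER = -1.96, 0.12, -2.18


def predict_rating(test_score, male):
    """Predicted job performance rating; male is the dummy (1 = male, 0 = female)."""
    return B0 + B_TEST * test_score + B_GENDER * male


female_rating = predict_rating(80, 0)
male_rating = predict_rating(80, 1)

print(round(female_rating, 2))                # 7.64
print(round(male_rating, 2))                  # 5.46
print(round(male_rating - female_rating, 2))  # -2.18
```

The gap is -2.18 at every test score, because the dummy only shifts the intercept: the two groups have parallel regression lines.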
Dummy Variable Analysis
Evaluate for multicollinearity
State and interpret the equation
Interpret Adjusted R2
Interpret Syx
Are the independent variables significant?
Is the model significant?
Forecast and develop prediction interval
Examine the error terms
Calculate MAD, MSE, MAPE, MPE
Model Evaluation
If the variables indicate multicollinearity, run the model and interpret it, but then re-run the best model (i.e., throw out one of the highly correlated variables).
If one of the independent variables is NOT significant (whether a dummy variable or other), throw it out and re-run the model.
If the overall model is not significant - back to the drawing board - need to gather better predictor variables… maybe an elective course!
Stepwise Regression
Sometimes, we will have a great number of variables - running a correlation matrix will help determine if any variables should NOT be in the model (low correlation with the dependent variable).
Can also run different types of regression, such as stepwise regression
Stepwise regression
Adds one variable at a time - one step at a time. Based on explained variance (and highest correlation with the dependent variable). The independent variable that explains the most variance in the dependent variable is entered into the model first.
A partial F-test is performed to determine whether a new variable stays in the model or is eliminated.
Start with the correlation Matrix
                    Unit Sales  Test Score  Age (years)  Anxiety   Experience (Years)  High School GPA
Unit Sales          1
Test Score          0.67612     1
Age (years)         0.798141    0.227706    1
Anxiety             -0.29586    -0.22199    -0.28679     1
Experience (Years)  0.549834    0.349639    0.539568     -0.27869  1
High School GPA     0.621784    0.317772    0.694569     -0.24438  0.3121288           1
Stepwise Regression
F-to-Enter: 4.00    F-to-Remove: 4.00
Response is Unit Sales on 5 predictors, with N = 30

Step      1        2
Constant  -100.85  -86.79
Age (yea  6.97     5.93
T-Value   7.01     10.60
Test Sco           0.200
T-Value            8.13
S         6.85     3.75
R-Sq      63.70    89.48
Stepwise Regression
The equation at Step 1:
Sales = -100.85 + 6.97 (age)
The equation at Step 2:
Sales = -86.79 + 5.93 (age) + .200 (test score)
No other variables are significant; the model stops.
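The first two steps can be traced by hand from the correlation matrix and the R-Sq values. The sketch below is not the stepwise routine itself. It picks the first variable by highest absolute correlation with Unit Sales, then computes a partial F for adding Test Score using the textbook formula in terms of R2. Treating that formula as what the software computes is an assumption, and the function name `partial_f` is illustrative.

```python
# Correlations with Unit Sales, from the first column of the matrix.
corr_with_sales = {
    "Test Score": 0.67612,
    "Age": 0.798141,
    "Anxiety": -0.29586,
    "Experience": 0.549834,
    "High School GPA": 0.621784,
}
F_TO_ENTER = 4.00
N = 30

# Step 1: the variable with the largest |r| enters first.
first_variable = max(corr_with_sales, key=lambda name: abs(corr_with_sales[name]))
r2_step1 = corr_with_sales[first_variable] ** 2  # about 0.637, matching R-Sq 63.70


def partial_f(r2_reduced, r2_full, n, coefficients_full, added=1):
    """Partial F for variables added going from the reduced to the full model."""
    numerator = (r2_full - r2_reduced) / added
    denominator = (1 - r2_full) / (n - coefficients_full)
    return numerator / denominator


# Step 2: Test Score is added; the full model has 3 coefficients
# (constant, age, test score). R-Sq values come from the stepwise output.
f_test_score = partial_f(0.6370, 0.8948, N, coefficients_full=3)

print(first_variable)                   # Age
print(round(f_test_score, 1))           # 66.2
print(f_test_score > F_TO_ENTER)        # True
```

With a single added variable the partial F equals the square of its t value, and 8.13 squared is about 66.1, so the two tests agree up to rounding.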