ANOVA for Regression
ANOVA tests whether the regression model has any explanatory power.
In the case of simple regression analysis the ANOVA test and the test for b1 are identical.
ANOVA for Regression
MSE = SSE/(n-2)
MSR = SSR/p, where p = number of independent variables
F = MSR/MSE
ANOVA Hypothesis Test
H0: b1 = 0
Ha: b1 ≠ 0

Reject H0 if F > Fα, or if p-value < α
Regression and ANOVA

Source of variation   Sum of squares   Degrees of freedom   Mean square       F
Regression            SSR              1                    MSR = SSR/1       F = MSR/MSE
Error                 SSE              n-2                  MSE = SSE/(n-2)
Total                 SST              n-1
ANOVA and Regression

ANOVA
             df     SS     MS     F    Significance F
Regression    1   3364   3364   273   1.23E-15
Residual     27    333   12.3
Total        28   3697

Fα = 4.21, given α = .05, df numerator = 1, df denominator = 27
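The arithmetic behind this table is easy to check directly. A minimal sketch in Python, using SSR = 3364 and SST = 3697 from the table (so SSE = SST − SSR = 333):

```python
# Check the simple-regression ANOVA table: F = MSR/MSE.
SSR, SST = 3364, 3697      # sums of squares from the table above
SSE = SST - SSR            # = 333
n, p = 29, 1               # 29 observations, 1 independent variable
MSR = SSR / p
MSE = SSE / (n - p - 1)    # df = n - 2 = 27
F = MSR / MSE
F_crit = 4.21              # F(.05; 1, 27) from an F table
print(round(F), F > F_crit)  # 273 True -> reject H0
```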
Issues with Hypothesis Test Results
• Correlation does NOT prove causation
• The test does not prove we used the correct functional form
Output with Temperature as Y
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.953884648
R Square             0.909895922
Adjusted R Square    0.906558734
Standard Error       5.053605155
Observations         29

ANOVA
             df    SS            MS            F         Significance F
Regression    1    6963.27661    6963.27661    272.6535  1.23118E-15
Residual     27    689.5509766   25.5389251
Total        28    7652.827586

                          Coefficients   Standard Error   t Stat       P-value    Lower 95%      Upper 95%
Intercept                 67.59301867    1.358242515      49.7650588   4.24E-28   64.80613526    70.3799021
Thousands of cubic feet   -1.372438825   0.083116544      -16.512222   1.23E-15   -1.542979885   -1.20189776
[Figure: Temperature and Natural Gas Consumed. Monthly time series from Jun-07 to Oct-09 of average daily temperature and natural gas use (thousands of cubic feet).]
[Figure: Monthly Natural Gas Use and Temperature. Scatter plot of thousands of cubic feet (0 to 40) against average daily temperature (0 to 80).]
Confidence Interval for Estimated Mean Value of y
xp = particular or given value of x
yp = value of the dependent variable for xp
E(yp) = expected value of yp, or E(y | x = xp)

ŷp = b0 + b1xp is our estimate of E(yp)
Confidence Interval for Estimated Mean Value of y
ŷp ± tα/2 · s_ŷp

where s_ŷp = s · √( 1/n + (xp − x̄)² / Σ(xi − x̄)² )
Computing b0 and b1, Example

 x    y    (xi − x̄)   (yi − ȳ)   (xi − x̄)(yi − ȳ)   (xi − x̄)²
 1   15       -3          3             -9               9
 3   14       -1          2             -2               1
 3   11       -1         -1              1               1
 4   12        0          0              0               0
 9    8        5         -4            -20              25

Sum = 20   60                          -30              36
Mean = 4   12

b1 = -30/36 = -0.833
b0 = 12 - (-0.833)(4) = 15.33
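The least-squares computation in the table above can be reproduced in a few lines. A sketch using the five (x, y) pairs from the example:

```python
# Least-squares slope and intercept for the example data.
x = [1, 3, 3, 4, 9]
y = [15, 14, 11, 12, 8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n                            # 4, 12
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # -30
Sxx = sum((xi - xbar) ** 2 for xi in x)                        # 36
b1 = Sxy / Sxx              # -30/36 = -0.833
b0 = ybar - b1 * xbar       # 15.33
print(round(b1, 3), round(b0, 2))
```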
From example of car age, price:

 x    y    (x − x̄)²     ŷ       (ŷ − ȳ)²    (y − ŷ)²    (y − ȳ)²
 1   15        9       14.50       6.2         0.3          9
 3   14        1       12.84       0.7         1.3          4
 3   11        1       12.84       0.7         3.4          1
 4   12        0       12.01       0.0         0.0          0
 9    8       25        7.86      17.4         0.0         16

Sum = 20   60  36               SSR = 25.0   SSE = 5.0   SST = 30
Mean = 4   12

b1 = -0.833
b0 = 15.33

r² = 25/30 = .833
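The sums of squares and r² in this table can likewise be verified. A sketch in which the coefficients are recomputed from the data rather than taken from the rounded slide values:

```python
# Decompose SST into SSR + SSE and compute r^2 for the example data.
x = [1, 3, 3, 4, 9]
y = [15, 14, 11, 12, 8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]
SSR = sum((yh - ybar) ** 2 for yh in yhat)            # 25.0
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # 5.0
SST = sum((yi - ybar) ** 2 for yi in y)               # 30.0
r2 = SSR / SST                                        # 0.833
```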
Confidence Interval of Conditional Mean

s = √MSE = √( SSE/(n − 2) ) = √(5/3) = 1.29

s_ŷp = s · √( 1/n + (xp − x̄)² / Σ(xi − x̄)² )
     = 1.29 · √( 1/5 + (5 − 4)²/36 )
     = 0.616
Confidence Interval of Conditional Mean

Given 1 − α = .95 and df = 3:

ŷp ± tα/2 · s_ŷp
11.18 ± 3.182(0.616)
11.18 ± 1.96 = (9.22, 13.14)
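As a check, this interval can be reproduced in Python. A sketch; the small differences from the slide's (9.22, 13.14) come from the slide rounding ŷp to 11.18:

```python
from math import sqrt

# 95% confidence interval for the mean of y at xp = 5.
n, xbar, Sxx = 5, 4.0, 36.0
SSE = 5.0
s = sqrt(SSE / (n - 2))                  # 1.29
b1 = -30 / 36
b0 = 12 - b1 * 4                         # exact least-squares coefficients
xp = 5
yhat_p = b0 + b1 * xp                    # about 11.17 (11.18 on the slide)
s_yhat = s * sqrt(1 / n + (xp - xbar) ** 2 / Sxx)   # 0.616
t = 3.182                                # t_{.025} with df = 3
lo, hi = yhat_p - t * s_yhat, yhat_p + t * s_yhat   # about (9.21, 13.13)
```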
Confidence Interval for Predicted Values of y
A confidence interval for a predicted value of y must take into account both random error in the estimate of b1 and the random deviations of individual values from the regression line.
Confidence Interval for Predicted Values of y

s_ind = s · √( 1 + 1/n + (xp − x̄)² / Σ(xi − x̄)² )

ŷp ± tα/2 · s_ind
Confidence Interval of Individual Value

s_ind = 1.29 · √( 1 + 1/5 + (5 − 4)²/36 )
      = 1.29(1.108) = 1.43

Given 1 − α = .95 and df = 3:

ŷp ± tα/2 · s_ind
11.18 ± 3.182(1.43)
11.18 ± 4.55 = (6.63, 15.73)
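The prediction interval only differs from the conditional-mean interval by the extra "1 +" under the square root. A sketch, again with small rounding differences from the slide's (6.63, 15.73):

```python
from math import sqrt

# 95% prediction interval for an individual y at xp = 5.
n, xbar, Sxx = 5, 4.0, 36.0
s = sqrt(5.0 / (n - 2))                  # s = sqrt(SSE/(n-2)) = 1.29
b1 = -30 / 36
yhat_p = (12 - b1 * 4) + b1 * 5          # about 11.17
s_ind = s * sqrt(1 + 1 / n + (5 - xbar) ** 2 / Sxx)  # 1.43
t = 3.182                                # t_{.025} with df = 3
lo, hi = yhat_p - t * s_ind, yhat_p + t * s_ind      # about (6.61, 15.72)
```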
Residual Plots Against x
Residual – the difference between the observed value and the predicted value
Look for:
• Evidence of a nonconstant variance
• Nonlinear relationship
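Computing the residuals themselves is a one-liner. A sketch for the age/price example used earlier, with the rounded coefficients b0 = 15.33 and b1 = -0.833; these are the values one would plot against x:

```python
# Residuals e_i = y_i - yhat_i for the age/price example.
x = [1, 3, 3, 4, 9]
y = [15, 14, 11, 12, 8]
b0, b1 = 15.333, -0.8333
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# Least-squares residuals always sum to (approximately) zero;
# structure in a plot of resid vs. x signals a model problem.
print([round(e, 2) for e in resid])
```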
Regression and Outliers
Outliers can have a disproportionate effect on the estimated regression line.
[Figure: Natural Gas Usage and Temperature. Scatter plot of 000's cubic feet (0 to 40) against temperature (10 to 100).]
               Coefficients
Intercept      36.19972
X Variable 1   -0.44381
Regression and Outliers
One solution is to estimate the model with and without the outlier.
Questions to ask:
• Is the value an error?
• Does the value reflect some unique circumstance?
• Is the data point providing unique information about values outside of the range of other observations?
Chapter 15
Multiple Regression
Regression
Multiple Regression Model
y = b0 + b1x1 + b2x2 + … + bpxp + e

Multiple Regression Equation
E(y) = b0 + b1x1 + b2x2 + … + bpxp

Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + … + bpxp
Car Data

MPG   Weight   Year   Cylinders
 18    3504     70        8
 15    3693     70        8
 18    3436     70        8
 16    3433     70        8
 17    3449     70        8
 15    4341     70        8
 14    4354     70        8
 14    4312     70        8
 14    4425     70        8
 15    3850     70        8
 15    3563     70        8
 14    3609     70        8
  …      …       …        …
Multiple Regression, Example

            Coefficients   Standard Error   t Stat
Intercept   46.3           0.800            57.8
Weight      -0.00765       0.000259         -29.4
R Square 0.687
            Coefficients   Standard Error   t Stat
Intercept   -14.7          3.96             -3.71
Weight      -0.00665       0.000214         -31.0
Year        0.763          0.0490           15.5
R Square 0.807
Multiple Regression, Example
            Coefficients   Standard Error   t Stat
Intercept   -14.4          4.03             -3.58
Weight      -0.00652       0.000460         -14.1
Year        0.760          0.0498           15.2
Cylinders   -0.0741        0.232            -0.319
R Square 0.807
Predicted MPG for a car weighing 4000 lbs, built in 1980 (year coded as 80), with 6 cylinders:

-14.4 - 0.00652(4000) + 0.76(80) - 0.0741(6) = -14.4 - 26.08 + 60.8 - 0.4446 = 19.88
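The same prediction as a sketch in Python, with the coefficients from the three-variable model above:

```python
# Predicted MPG for a 4000 lb, 1980 (year = 80), 6-cylinder car.
b0, b_weight, b_year, b_cyl = -14.4, -0.00652, 0.76, -0.0741
mpg = b0 + b_weight * 4000 + b_year * 80 + b_cyl * 6
print(round(mpg, 2))   # 19.88
```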
Multiple Regression Model
SSE = Σ(yi − ŷi)²
SSR = Σ(ŷi − ȳ)²
SST = Σ(yi − ȳ)²
SST = SSR + SSE
Multiple Coefficient of Determination
The share of the variation explained by the estimated model.
R2 = SSR/SST
F Test for Overall Significance
H0: b1 = b2 = . . . = bp = 0
Ha: One or more of the parameters is not equal to zero
Reject H0 if: F > Fa OrReject H0 if: p-value < a
F = MSR/MSE
ANOVA Table for Multiple Regression Model

Source       Sum of Squares   Degrees of Freedom   Mean Squares          F
Regression   SSR              p                    MSR = SSR/p           F = MSR/MSE
Error        SSE              n-p-1                MSE = SSE/(n-p-1)
Total        SST              n-1
t Test for Coefficients
H0: b1 = 0
Ha: b1 ≠ 0

Reject H0 if t < −tα/2 or t > tα/2, or if p-value < α

t = b1/s_b1

with a t distribution with n − p − 1 degrees of freedom
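The t Stat column in the regression outputs above is just each coefficient divided by its standard error. A sketch using the two-variable model's values:

```python
# t statistic for each coefficient: t = b / s_b.
coefs = {
    "Intercept": (-14.7, 3.96),
    "Weight": (-0.00665, 0.000214),
    "Year": (0.763, 0.0490),
}
t_stats = {name: b / s for name, (b, s) in coefs.items()}
# Compare each |t| to t_{alpha/2} with n - p - 1 degrees of freedom.
```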
Multicollinearity

When two or more independent variables are highly correlated.
When multicollinearity is severe, the estimated values of the coefficients will be unreliable.

Two guidelines for detecting multicollinearity:
• The absolute value of the correlation coefficient for two independent variables exceeds 0.7
• The correlation between an independent variable and some other independent variable is greater than its correlation with the dependent variable
Multicollinearity
            MPG      Weight   Year     Cylinders
MPG         1
Weight      -0.829   1
Year        0.578    -0.300   1
Cylinders   -0.773   0.895    -0.344   1
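One entry of such a matrix can be spot-checked with a plain Pearson correlation. A sketch using only the twelve 1970 rows listed in the Car Data table, so the value differs from the -0.829 computed on the full dataset:

```python
from math import sqrt

# Pearson correlation between MPG and Weight for the 12 listed rows.
mpg = [18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14]
weight = [3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850, 3563, 3609]
n = len(mpg)
mw, mm = sum(weight) / n, sum(mpg) / n
Sxy = sum((w - mw) * (m - mm) for w, m in zip(weight, mpg))
Sxx = sum((w - mw) ** 2 for w in weight)
Syy = sum((m - mm) ** 2 for m in mpg)
r = Sxy / sqrt(Sxx * Syy)   # strongly negative: heavier cars get fewer MPG
print(round(r, 2))
```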