1 chapter 12 simple linear regression simple linear regression model least squares method...
TRANSCRIPT
1
Chapter 12 Simple Linear Regression
Simple Linear Regression ModelLeast Squares Method Coefficient of DeterminationModel AssumptionsTesting for SignificanceUsing the Estimated Regression Equation for Estimation and PredictionComputer SolutionResidual Analysis: Validating Model Assumptions
2
The equation that describes how y is related to x and an error term is called the regression model.The simple linear regression model is:
y = 0 + 1x +
– 0 and 1 are called parameters of the model.
– is a random variable called the error term.
Simple Linear Regression Model
3
n The simple linear regression equation is:
EE((yy) = ) = 00 + + 11xx
• Graph of the regression equation is a straight line.
• 0 is the y intercept of the regression line.
• 1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression EquationSimple Linear Regression Equation
4
Simple Linear Regression Equation
n Positive Linear Relationship
EE((yy))
xx
Slope Slope 11
is positiveis positive
Regression lineRegression line
InterceptIntercept00
5
Simple Linear Regression Equation
n Negative Linear Relationship
EE((yy))
xx
Slope Slope 11
is negativeis negative
Regression lineRegression lineInterceptIntercept00
6
Simple Linear Regression EquationSimple Linear Regression Equation
n No Relationship
EE((yy))
xx
Slope Slope 11
is 0is 0
Regression lineRegression lineInterceptIntercept
00
7
n The estimated simple linear regression equation is:
• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• is the estimated value of y for a given x value.
Estimated Simple Linear Regression Estimated Simple Linear Regression EquationEquation
0 1y b b x 0 1y b b x
yy
8
Estimation Process
Regression Modely = 0 + 1x +
Regression EquationE(y) = 0 + 1x
Unknown Parameters0, 1
Sample Data:Sample Data:x yx y
xx11 y y11
. .. . . .. . xxnn yynn
EstimatedEstimatedRegression EquationRegression Equation
Sample StatisticsSample Statistics
bb00, , bb11
b0 and b1
provide estimates of0 and 1
0 1y b b x 0 1y b b x
9
Least Squares Criterion
where:yi = observed value of the dependent
variable for the ith observationyi = estimated value of the dependent
variable for the ith observation
Least Squares Method
min (y yi i )2min (y yi i )2
^
10
Slope for the Estimated Regression Equation
bx y x y n
x x ni i i i
i i1 2 2
( )/
( ) /b
x y x y n
x x ni i i i
i i1 2 2
( )/
( ) /
The Least Squares Method
11
n y-Intercept for the Estimated Regression Equation
where:xi = value of independent variable for ith observationyi = value of dependent variable for ith observation
x = mean value for independent variable y = mean value for dependent variable n = total number of observations
____
The Least Squares MethodThe Least Squares Method
0 1b y b x 0 1b y b x
12
Example: Reed Auto Sales
Simple Linear Regression Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
13
Example: Reed Auto Sales
n Simple Linear Regression
Number of TV AdsNumber of TV Ads Number of Cars SoldNumber of Cars Sold11 141433 242422 181811 171733 2727
14
Slope for the Estimated Regression Equation
b1 = 220 - (10)(100)/5 = _____
24 - (10)2/5
y-Intercept for the Estimated Regression Equation
b0 = 20 - 5(2) = _____
Estimated Regression Equation
y = 10 + 5x
^
Example: Reed Auto Sales
15
Example: Reed Auto Sales
Scatter Diagram
y = 10 + 5x
0
5
10
15
20
25
30
0 1 2 3 4TV Ads
Ca
rs S
old ^
16
Relationship Among SST, SSR, SSE
SST = SSR + SSE
where: SST = total sum of squares SSR = sum of squares due to
regression SSE = sum of squares due to error
The Coefficient of Determination
( ) ( ) ( )y y y y y yi i i i 2 2 2( ) ( ) ( )y y y y y yi i i i 2 2 2^^
17
n The coefficient of determination is:
r2 = SSR/SST
where: SST = total sum of squares SSR = sum of squares due to
regression
The Coefficient of DeterminationThe Coefficient of Determination
18
Coefficient of Determination
r2 = SSR/SST = 100/114 =The regression relationship is very strong
because 88% of the variation in number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
Example: Reed Auto Sales
19
The Correlation Coefficient
Sample Correlation Coefficient
where: b1 = the slope of the estimated
regressionequation
21 ) of(sign rbrxy 21 ) of(sign rbrxy
ionDeterminat oft Coefficien ) of(sign 1brxy ionDeterminat oft Coefficien ) of(sign 1brxy
xbby 10ˆ xbby 10ˆ
20
Sample Correlation Coefficient
The sign of b1 in the equation is “+”.
rxy = +.9366
Example: Reed Auto Sales
21 ) of(sign rbrxy 21 ) of(sign rbrxy
ˆ 10 5y x ˆ 10 5y x
=+ .8772xyr =+ .8772xyr
21
Model Assumptions
Assumptions About the Error Term 1. The error is a random variable with mean
of zero.2. The variance of , denoted by 2, is the
same for all values of the independent variable.
3. The values of are independent.4. The error is a normally distributed
random variable.
22
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of 1 is zero.
Two tests are commonly used– t Test– F Test
Both tests require an estimate of 2, the variance of in the regression model.
23
An Estimate of 2
The mean square error (MSE) provides the estimate
of 2, and the notation s2 is also used.
s2 = MSE = SSE/(n-2)
where:
Testing for Significance
210
2 )()ˆ(SSE iiii xbbyyy 210
2 )()ˆ(SSE iiii xbbyyy
24
Testing for Significance
An Estimate of – To estimate we take the square root of 2.– The resulting s is called the standard error of
the estimate.
2
SSEMSE
n
s2
SSEMSE
n
s
25
Hypotheses
H0: 1 = 0
Ha: 1 = 0
Test Statistic
where
Testing for Significance: t Test
tbsb
1
1
tbsb
1
1
2)(1
xx
ss
i
b
2)(1
xx
ss
i
b
26
n Rejection Rule
Reject H0 if t < -tor t > t
where: t is based on a t distribution
with n - 2 degrees of freedom
Testing for Significance: Testing for Significance: tt Test Test
27
t Test – Hypotheses
H0: 1 = 0
Ha: 1 = 0
– Rejection Rule For = .05 and d.f. = 3, t.025 =
_____ Reject H0 if t > t.025 = _____
Example: Reed Auto Sales
28
n t Test
• Test Statistics
t t = _____/_____ = 4.63= _____/_____ = 4.63
• Conclusions
t t = 4.63 > 3.182, so reject = 4.63 > 3.182, so reject HH00
Example: Reed Auto SalesExample: Reed Auto Sales
29
Confidence Interval for 1
We can use a 95% confidence interval for 1 to test the hypotheses just used in the t test.H0 is rejected if the hypothesized value of 1 is not included in the confidence interval for 1.
30
The form of a confidence interval for 1 is:
where b1 is the point estimate
is the margin of erroris the t value providing an
areaof /2 in the upper tail of a
t distribution with n - 2 degrees
of freedom
Confidence Interval for 1
12/1 bstb 12/1 bstb
12/ bst 12/ bst2/t 2/t
31
Rejection RuleReject H0 if 0 is not included in
the confidence interval for 1.
95% Confidence Interval for 1
= 5 +/- 3.182(1.08) = 5 +/- 3.44
or ____ to ____
Conclusion0 is not included in the confidence interval.
Reject H0
Example: Reed Auto Sales
12/1 bstb 12/1 bstb
32
n HypothesesHypotheses
HH00: : 11 = 0 = 0
HHaa: : 11 = 0 = 0
n Test StatisticTest Statistic
FF = MSR/MSE = MSR/MSE
Testing for Significance: F Test
33
n Rejection Rule
Reject Reject HH00 if if FF > > FF
where:where: FF is based on an is based on an FF distribution distribution
with 1 d.f. in the numerator andwith 1 d.f. in the numerator and
nn - 2 d.f. in the denominator - 2 d.f. in the denominator
Testing for Significance: Testing for Significance: FF Test Test
34
n F F Test Test
• HypothesesHypotheses
HH00: : 11 = 0 = 0
HHaa: : 11 = 0 = 0
• Rejection RuleRejection Rule
For For = .05 and d.f. = 1, 3: = .05 and d.f. = 1, 3: FF.05.05 = = ____________
Reject Reject HH00 if F > if F > FF.05.05 = ______. = ______.
Example: Reed Auto SalesExample: Reed Auto Sales
35
n F Test
• Test Statistic
FF = MSR/MSE = ____ / ______ = 21.43 = MSR/MSE = ____ / ______ = 21.43
• Conclusion
FF = 21.43 > 10.13, so we reject = 21.43 > 10.13, so we reject HH00..
Example: Reed Auto SalesExample: Reed Auto Sales
36
Some Cautions about theInterpretation of Significance Tests
Rejecting H0: 1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.Just because we are able to reject H0: 1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
37
n Confidence Interval Estimate of E(yp)
n Prediction Interval Estimate of yp
yypp ++ tt/2 /2 ssindind
where:where: confidence coefficient is 1 - confidence coefficient is 1 - andand
tt/2 /2 is based on ais based on a t t distributiondistribution
with with nn - 2 degrees of freedom - 2 degrees of freedom
Using the Estimated Regression Equationfor Estimation and Prediction
/ y t sp yp 2 / y t sp yp 2
38
Point EstimationIf 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
y = 10 + 5(3) = ______ cars^
Example: Reed Auto Sales
39
n Confidence Interval for E(yp)
95% confidence interval estimate of the 95% confidence interval estimate of the meanmean number of cars sold when 3 TV ads are run is:number of cars sold when 3 TV ads are run is:
25 25 ++ 4.61 = ______ to _______ cars 4.61 = ______ to _______ cars
Example: Reed Auto Sales
40
n Prediction Interval for yp
95% prediction interval estimate of the 95% prediction interval estimate of the number of cars sold in number of cars sold in one particular weekone particular week when 3 TV ads are run is:when 3 TV ads are run is:
25 25 ++ 8.28 = _____ to ______ cars 8.28 = _____ to ______ cars
Example: Reed Auto Sales
41
Residual for Observation i
yi – yi
Standardized Residual for Observation i
where:
and
Residual Analysis
^
y ysi i
y yi i
y ysi i
y yi i
^
^
s s hy y ii i 1s s hy y ii i 1^
2
2
)(
)(1
xx
xx
nh
i
ii
2
2
)(
)(1
xx
xx
nh
i
ii
42
Example: Reed Auto Sales
Residuals
Observation Predicted Cars Sold Residuals
1 15 -1
2 25 -1
3 20 -2
4 15 2
5 25 2
Observation Predicted Cars Sold Residuals
1 15 -1
2 25 -1
3 20 -2
4 15 2
5 25 2
43
Example: Reed Auto Sales
Residual Plot
TV Ads Residual Plot
-3
-2
-1
0
1
2
3
0 1 2 3 4TV Ads
Re
sid
ua
ls
44
Residual Analysis
Residual Plot
xx
ˆy y ˆy y
00
Good PatternGood Pattern
Resi
du
al
Resi
du
al
45
Residual Analysis
n Residual Plot
xx
ˆy y ˆy y
00
Nonconstant VarianceNonconstant Variance
Resi
du
al
Resi
du
al
46
Residual AnalysisResidual Analysis
n Residual Plot
xx
ˆy y ˆy y
00
Model Form Not AdequateModel Form Not Adequate
Resi
du
al
Resi
du
al
47
End of Chapter 12