1 1 slide simple linear regression part a n simple linear regression model n least squares method n...
TRANSCRIPT
1 1 Slide
Slide
Simple Linear RegressionPart A
Simple Linear Regression Model Least Squares Method Coefficient of Determination Model Assumptions Testing for Significance
2 2 Slide
Slide
Simple Linear Regression Model
y = b0 + b1x +e
where: b0 and b1 are called parameters of the model,
e is a random variable called the error term.
The simple linear regression model is:
The equation that describes how y is related to x and an error term is called the regression model.
dependent variable
independent variable
3 3 Slide
Slide
Simple Linear Regression Equation
The simple linear regression equation is:
• E(y) is the expected value of y for a given x value.
• b1 is the slope of the regression line.
• b0 is the y intercept of the regression line.
• Graph of the regression equation is a straight line.
E(y) = 0 + 1x
4 4 Slide
Slide
Simple Linear Regression Equation
Positive Linear Relationship
E(y)E(y)
xx
Slope b1
is positive
Regression line
Intercept b0
5 5 Slide
Slide
Simple Linear Regression Equation
Negative Linear Relationship
E(y)E(y)
xx
Slope b1
is negative
Regression lineIntercept b0
6 6 Slide
Slide
Simple Linear Regression Equation
No Relationship
E(y)E(y)
xx
Slope b1
is 0
Regression lineIntercept
b0
7 7 Slide
Slide
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation
0 1y b b x
• is the estimated value of y for a given x value.y• b1 is the slope of the line.
• b0 is the y intercept of the line.
• The graph is called the estimated regression line.
8 8 Slide
Slide
Estimation Process
Regression Modely = b0 + b1x +e
Regression EquationE(y) = b0 + b1x
Unknown Parametersb0, b1
Sample Data:x y
x1 y1
. . . . xn yn
b0 and b1
provide estimates ofb0 and b1
EstimatedRegression Equation
Sample Statistics
b0, b1
0 1y b b x
9 9 Slide
Slide
Least Squares Method
Least Squares Criterion
min (y yi i )2
where:yi = observed value of the dependent variable
for the ith observation^yi = estimated value of the dependent variable
for the ith observation
10 10 Slide
Slide
Slope for the Estimated Regression Equation
1 2
( )( )
( )i i
i
x x y yb
x x
Least Squares Method
11 11 Slide
Slide
y-Intercept for the Estimated Regression Equation
Least Squares Method
0 1b y b x
where:xi = value of independent variable for ith observation
n = total number of observations
_y = mean value for dependent variable
_x = mean value for independent variable
yi = value of dependent variable for ith observation
12 12 Slide
Slide
Reed Auto periodically hasa special week-long sale. As part of the advertisingcampaign Reed runs one ormore television commercialsduring the weekend preceding the sale. Data
from asample of 5 previous sales are shown on the next
slide.
Simple Linear Regression
Example: Reed Auto Sales
13 13 Slide
Slide
Simple Linear Regression
Example: Reed Auto Sales
Number of TV Ads
Number ofCars Sold
13213
1424181727
14 14 Slide
Slide
Estimated Regression Equation
ˆ 10 5y x
1 2
( )( ) 205
( ) 4i i
i
x x y yb
x x
0 1 20 5(2) 10b y b x
Slope for the Estimated Regression Equation
y-Intercept for the Estimated Regression Equation
Estimated Regression Equation
15 15 Slide
Slide
Excel Worksheet (showing data)
Estimated Regression Equation
A B C D1 Week TV Ads Cars Sold 2 1 1 14 3 2 3 24 4 3 2 18 5 4 1 17 6 5 3 27 7
Download Ch12-CarSales.xlsx
16 16 Slide
Slide
Producing a Scatter Diagram and Trend Line
Step 1 Select cells B2:C6Step 2 Click the Insert tab on the RibbonStep 3 In the Charts group, click Scatter
Step 4 When the list of scatter diagram subtypes appears,
Click Scatter with only Markers
Estimated Regression Equation
Step 5 In the Chart Layouts group, click Layout 1
Step 6 Right-click on the Chart Title to display options; choose DeleteStep 7 Select the Horizontal (Value) Axis Title and replace it with TV Ads
17 17 Slide
Slide
Producing a Scatter Diagram and Trend Line
Estimated Regression Equation
Step 8 Select the Vertical (Value) Axis Title and replace it with Cars Sold
Step 9 Right-click on the Series 1 Legend Entry to display a list of options; choose Delete
Step 10 Position the mouse pointer over any Vertical (Value) Axis Major Gridline in the diagram and right-click to display a list of options; choose Delete
18 18 Slide
Slide
Producing a Scatter Diagram and Trend Line
Estimated Regression Equation
Step 11 Position the mouse pointer over any data point in the diagram and right-click to display a list of options; choose Add TrendlineStep 12 When the Format Trendline dialog box appears, Select Trendline Options and then Choose Linear from the Trend/Regression Type list Choose Display Equation on chart Click Close
19 19 Slide
Slide
Scatter Diagram and Trend Line
y = 5x + 10
0
5
10
15
20
25
30
0 1 2 3 4TV Ads
Ca
rs S
old
20 20 Slide
Slide
Coefficient of Determination
Relationship Among SST, SSR, SSE
where: SST = total sum of squares SSR = sum of squares due to regression SSE = sum of squares due to error
SST = SSR + SSE
2( )iy y 2ˆ( )iy y 2ˆ( )i iy y
21 21 Slide
Slide
The coefficient of determination is:
Coefficient of Determination
where:SSR = sum of squares due to regressionSST = total sum of squares
r2 = SSR/SST
22 22 Slide
Slide
Coefficient of Determination
r2 = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 88%of the variability in the number of cars sold can beexplained by the linear relationship between thenumber of TV ads and the number of cars sold.
23 23 Slide
Slide
Using Excel to Produce r 2
Step 3 When the Format Trendline dialog box appears, Select Trendline Options and then Choose Display R-squared value on chart
Click Close
Step 2 Choose Add Trendline
Step 1 Position the mouse pointer over any data point
in the scatter diagram and right click at display a list of options
Coefficient of Determination
24 24 Slide
Slide
Excel Value Worksheet (showing r 2)
Coefficient of Determination
y = 5x + 10
R2 = 0.8772
0
5
10
15
20
25
30
0 1 2 3 4TV Ads
Ca
rs S
old
25 25 Slide
Slide
Sample Correlation Coefficient
21 ) of(sign rbrxy
ionDeterminat oft Coefficien ) of(sign 1brxy
where: b1 = the slope of the estimated regression
equation xbby 10ˆ
Excel approach:rxy = CORREL(x range, y range)
26 26 Slide
Slide
21 ) of(sign rbrxy
The sign of b1 in the equation is “+”.ˆ 10 5y x
=+ .8772xyr
Sample Correlation Coefficient
rxy = +.9366
27 27 Slide
Slide
Using Excel’s Regression Tool
The Regression tool can be used to perform a complete regression analysis.
Excel also has a comprehensive tool in its Data Analysis package called Regression.
Up to this point, you have seen how Excel can be used for various parts of a regression analysis.
28 28 Slide
Slide
Using Excel’s Regression Tool
Excel Worksheet (showing data)
A B C D1 Week TV Ads Cars Sold 2 1 1 14 3 2 3 24 4 3 2 18 5 4 1 17 6 5 3 27 7
29 29 Slide
Slide
Using Excel’s Regression Tool
Performing the Regression Analysis
Step 3 Choose Regression from the list of Analysis Tools
Step 2 In the Analysis group, click Data AnalysisStep 1 Click the Data tab on the Ribbon
30 30 Slide
Slide
Using Excel’s Regression Tool
Excel Regression Dialog Box
31 31 Slide
Slide
Using Excel’s Regression Tool
Excel Value WorksheetA B C D E F G H I
1 Week TV Ads Cars Sold2 1 1 143 2 3 244 3 2 185 4 1 176 5 3 2778 SUMMARY OUTPUT9
10 Regression Statistics11 Multiple R 0.93658581212 R Square 0.87719298213 Adjusted R Square 0.8362573114 Standard Error 2.16024689915 Observations 51617 ANOVA18 df SS MS F Significance F19 Regression 1 100 100 21.42857 0.01898623120 Residual 3 14 4.66666721 Total 4 1142223 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%24 Intercept 10 2.366431913 4.225771 0.024236 2.468950436 17.53104956 2.468950436 17.5310495625 TV Ads 5 1.08012345 4.6291 0.018986 1.562561893 8.437438107 1.562561893 8.43743810726
ANOVA Output
Regression Statistics Output
Data
Estimated Regression Equation Output
32 32 Slide
Slide
Using Excel’s Regression Tool
Note: Columns F-I are not shown.
Excel Value Worksheet (bottom-left portion)
A B C D E2223 Coeffic. Std. Err. t Stat P-value24 Intercept 10 2.36643 4.2258 0.0242425 TV Ads 5 1.08012 4.6291 0.0189926
ˆ 10 5y x
Estimated Regression Equation
33 33 Slide
Slide
Using Excel’s Regression Tool
Note: Columns C-E are hidden.
Excel Value Worksheet (bottom-right portion) A B F G H I
2223 Coeffic. Low. 95% Up. 95% Low. 95.0% Up. 95.0%24 Intercept 10 2.46895 17.53105 2.46895044 17.531049625 TV Ads 5 1.562562 8.437438 1.56256189 8.4374381126
34 34 Slide
Slide
Using Excel’s Regression Tool
Excel Value Worksheet (middle portion)A B C D E F1617 ANOVA18 df SS MS F Significance F19 Regression 1 100 100 21.4286 0.01898623120 Residual 3 14 4.6666721 Total 4 11422
35 35 Slide
Slide
Using Excel’s Regression Tool
Excel Value Worksheet (top portion)
A B C9
10 Regression Statistics11 Multiple R 0.93658581212 R Square 0.87719298213 Adjusted R Square 0.8362573114 Standard Error 2.16024689915 Observations 516
36 36 Slide
Slide
Assumptions About the Error Term e
1. The error is a random variable with mean of zero.1. The error is a random variable with mean of zero.
2. The variance of , denoted by 2, is the same for all values of the independent variable.2. The variance of , denoted by 2, is the same for all values of the independent variable.
3. The values of are independent.3. The values of are independent.
4. The error is a normally distributed random variable.4. The error is a normally distributed random variable.
37 37 Slide
Slide
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of b1 is zero.
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of b1 is zero.
Two tests are commonly used: Two tests are commonly used:
t Testt Test and F TestF Test
Both the t test and F test require an estimate of s 2, the variance of e in the regression model. Both the t test and F test require an estimate of s 2, the variance of e in the regression model.
Individualsignificance
test
Overallsignificance
test
38 38 Slide
Slide
An Estimate of s
Testing for Significance
210
2 )()ˆ(SSE iiii xbbyyy
where:
s 2 = MSE = SSE/(n - 2)
The mean square error (MSE) provides the estimateof s 2, and the notation s2 is also used.
39 39 Slide
Slide
Testing for Significance
An Estimate of s
2
SSEMSE
n
s
• To estimate s we take the square root of s 2.
• The resulting s is called the standard error of the estimate.
40 40 Slide
Slide
Hypotheses
Test Statistic
Testing for Significance: F Test
F = MSR/MSE
0 1: 0H
1: 0aH
41 41 Slide
Slide
Rejection Rule
Testing for Significance: F Test
where:F is based on an F distribution with
1 degree of freedom in the numerator andn - 2 degrees of freedom in the denominator
p-value approach: Reject H0 if p-value < aCritical value approach: Reject H0 if F > F
42 42 Slide
Slide
1. Determine the hypotheses.
2. Specify the level of significance.
3. Compute the value of the test statistic F.
a = .05
Testing for Significance: F Test
0 1: 0H
1: 0aH
p-value and critical value approach
F = MSR/MSE = 100/4.667 = 21.43test statistic
Use Ch12-CarSales.xlsxp-value
43 43 Slide
Slide
4. Compute the p-value.
Testing for Significance: F Testp-value approach
p-value is the area on the right of F with numerator’sdegrees of freedom 1 and denominator’s degrees offreedom n-2=5-2=3
F=21.43p-value
0.01<p-value<0.025
Excel approach:P-value=FIDST()=FDIST(21.43,1,3)=0.019
44 44 Slide
Slide
Testing for Significance: F Test
5. Determine whether to reject H0.
The p-value corresponding to F = 21.43 is less than .05. Hence, we reject H0.
The statistical evidence is sufficient to concludethat we have a significant relationship between thenumber of TV ads aired and the number of cars sold.
p-value approach
45 45 Slide
Slide
4. Compute the critical value.
Testing for Significance: F TestCritical value approach
Critical value is the value when the area on the right ofit is with numerator’s degrees of freedom 1 anddenominator’s degrees offreedom n-2=5-2=3
critical valuealpha level
Excel approach:Critical value=FINV()=FINV(0.05,1,3)=10.13
46 46 Slide
Slide
Testing for Significance: F Test
5. Determine whether to reject H0.
The critical value corresponding to the level of significance 0.05 is 10.13. The test statistic F = 21.43 is greater than the critical value. Hence, we reject H0.
The statistical evidence is sufficient to concludethat we have a significant relationship between thenumber of TV ads aired and the number of cars sold.
Critical value approach
47 47 Slide
Slide
Hypotheses
Test Statistic
Testing for Significance: t Test
0 1: 0H
1: 0aH
1
1
b
bt
s
1 2( )b
i
ss
x xwhere
48 48 Slide
Slide
Rejection Rule
Testing for Significance: t Test
where: t is based on a t distribution
with n - 2 degrees of freedom
p-value approach: Reject H0 if p-value < aCritical value approach: Reject H0 if t < -tor t > t
49 49 Slide
Slide
1. Determine the hypotheses.
2. Specify the level of significance.
3. Compute the test statistic.
a = .05
Testing for Significance: t Test
0 1: 0H
1: 0aH
1
1
b
bt
s
p-value and critical value approach
test statistic
p-value
Use Ch12-CarSales.xlsx
50 50 Slide
Slide
4. Compute the p-value.
Testing for Significance: t Testp-value approach
p-value is the area on both tails beyond the test statistic t.It’s based on n-2=3 degrees of freedom.
T=4.63
upper tail area
0.01<p-value<0.02p-value is from both tails.
Double the range.
Excel approach:P-value=TIDST()=TDIST(4.63,3,2)=0.019
51 51 Slide
Slide
Testing for Significance: t Test
5. Determine whether to reject H0.
The p-value is less than the alpha level. We can reject H0.
The TV Ads independent variable is a significant factorat the 0.05 level.
p-value approach
52 52 Slide
Slide
Testing for Significance: t Test
4. Compute the critical value and identify rejection rule.
Rejection Rule: Reject H0 if t > t or t < -t
Critical value approach
Critical values for two-tailed test are -tand t.At .05 level, they are -t.025and t.025 (with 3 degrees offreedom).
critical value t.025Excel approach: Critical value t.025 =TINV(0.025*2,3)=3.182 - t.025 =-TINV(0.025*2,3)=-3.182
53 53 Slide
Slide
Testing for Significance: t Test
5. Determine whether to reject H0.
t = 4.63 > 3.182. We can reject H0.
The TV Ads independent variable is a significant factor at the 0.05 level.
Critical value approach
54 54 Slide
Slide
Confidence Interval for 1
H0 is rejected if 0 is not included in the confidence interval for 1.
We can use a 95% confidence interval for 1 to test the hypotheses just used in the t test.
55 55 Slide
Slide
The form of a confidence interval for 1 is:
Confidence Interval for 1
11 / 2 bb t s
where is the t value providing an areaof a/2 in the upper tail of a t distributionwith n - 2 degrees of freedom
2/tb1 is the
pointestimat
or
is themarginof error
1/ 2 bt s
56 56 Slide
Slide
Confidence Interval for 1
Reject H0 if 0 is not included in the confidence interval for 1.
0 is not included in the confidence interval. Reject H0
= 5 +/- 3.182(1.08) = 5 +/- 3.4412/1 bstb
or 1.56 to 8.44
Rejection Rule
95% Confidence Interval for 1
Conclusion
b1 sb1Confidence Interval for 1
t.025 = TINV(0.025*2,3)=3.182
57 57 Slide
Slide
Confidence Interval for 1
Confidence Interval for 1
At a new level
A new level can be specified here
58 58 Slide
Slide
Some Cautions about theInterpretation of Significance Tests
Just because we are able to reject H0: b1 = 0 and demonstrate statistical significance does not enable
us to conclude that there is a linear relationshipbetween x and y.
Rejecting H0: b1 = 0 and concluding that the
relationship between x and y is significant does not enable us to conclude that a cause-and-effect
relationship is present between x and y.
59 59 Slide
Slide
Simple Linear RegressionPart B
Using the Estimated Regression Equation for Estimation Residual Analysis: Validating Model Assumptions Outliers and Influential Observations
60 60 Slide
Slide
1. If 3 TV ads are run prior to a sale, what is the estimated number of cars sold?
Estimation of y
^y = 10 + 5(3) = 25 cars
ˆ 10 5y x
61 61 Slide
Slide
Residual Analysis
ˆi iy y
Much of the residual analysis is based on an examination of graphical plots.
Residual for Observation i
The residuals provide the best information about e .
If the assumptions about the error term e appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
62 62 Slide
Slide
Residual Plot Against x
If the assumption that the variance of e is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then
The residual plot should give an overall impression of a horizontal band of points
63 63 Slide
Slide
x
ˆy y
0
Good PatternR
esi
du
al
Residual Plot Against x
64 64 Slide
Slide
Residual Plot Against x
x
ˆy y
0
Resi
du
al
Nonconstant Variance
65 65 Slide
Slide
Residual Plot Against x
x
ˆy y
0
Resi
du
al
Model Form Not Adequate
66 66 Slide
Slide
Residuals
Residual Plot Against x
Observation Predicted Cars Sold Residuals
1 15 -1
2 25 -1
3 20 -2
4 15 2
5 25 2
67 67 Slide
Slide
Using Excel to Produce a Residual Plot
• The output will include two new items:• A plot of the residuals against the
independent variable, and• A list of predicted values of y and the
corresponding residual values.
• When the Regression dialog box appears, we must also select the Residual Plot option.
• The steps outlined earlier to obtain the regression output are performed with one change.
Residual Plot Against x
68 68 Slide
Slide
Excel Value Worksheet (bottom portion)
Residual Plot Against x
A B C2829 RESIDUAL OUTPUT3031 Observation Predicted Cars Sold Residuals32 1 15 -133 2 25 -134 3 20 -235 4 15 236 5 25 237
69 69 Slide
Slide
Residual Plot Against x
TV Ads Residual Plot
-3
-2
-1
0
1
2
3
0 1 2 3 4TV Ads
Res
idu
als
70 70 Slide
Slide
Standardized Residual for Observation i
Standardized Residuals
ˆ
ˆ
i i
i i
y y
y ys
ˆ 1i i iy ys s h
2
2
( )1( )i
ii
x xh
n x x
where:
71 71 Slide
Slide
Standardized Residual Plot
The standardized residual plot can provide insight about the assumption that the error term e has a normal distribution.
If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
72 72 Slide
Slide
Excel’s Regression tool be used to obtain the standardized residuals.
The steps described earlier to conduct a regression analysis are performed with one change:• When the Regression dialog box appears,
we must select the Standardized Residuals option
The Standardized Residuals option does not automatically produce a standardized residual plot.
Standardized Residual Plot
73 73 Slide
Slide
Excel Value Worksheet
Standardized Residual Plot
A B C D2829 RESIDUAL OUTPUT3031 Observation Predicted Y Residuals Standard Residuals32 1 15 -1 -0.53452248433 2 25 -1 -0.53452248434 3 20 -2 -1.06904496835 4 15 2 1.06904496836 5 25 2 1.069044968
74 74 Slide
Slide
Excel’s Chart Wizard can be used to construct the standardized residual plot.
A scatter diagram is developed in which:• The values of the independent variable are
placed on the horizontal axis• The values of the standardized residuals are
placed on the vertical axis
Standardized Residual Plot
75 75 Slide
Slide
Excel Standardized Residual Plot
Standardized Residual Plot
A B C D2829 RESIDUAL OUTPUT3031 Observation Predicted Y ResidualsStandard Residuals32 1 15 -1 -0.53452233 2 25 -1 -0.53452234 3 20 -2 -1.06904535 4 15 2 1.06904536 5 25 2 1.06904537
-1.5
-1
-0.5
0
0.5
1
1.5
0 10 20 30
Cars Sold
Sta
nd
ard
Re
sid
ua
ls
76 76 Slide
Slide
Standardized Residual Plot
All of the standardized residuals are between –1.5 and +1.5 indicating that there is no reason to question the assumption that e has a normal distribution.
77 77 Slide
Slide
Outliers and Influential Observations
Detecting Outliers• An outlier is an observation that is unusual
in comparison with the other data.• Minitab classifies an observation as an
outlier if its standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
• This rule’s shortcoming can be circumvented by using studentized deleted residuals.
• The |i th studentized deleted residual| will be larger than the |i th standardized residual|.