Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons. 13-1
Business Statistics, 4e
by Ken Black
Chapter 13
Simple Regression Analysis
Learning Objectives
• Compute the equation of a simple regression line from a sample of data, and interpret the slope and intercept of the equation.
• Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
• Compute a standard error of the estimate and interpret its meaning.
• Compute a coefficient of determination and interpret it.
• Test hypotheses about the slope of the regression model and interpret the results.
• Estimate values of Y using the regression model.
Regression and Correlation
• Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable.
• Correlation is a measure of the degree of relatedness of two variables.
Simple Regression Analysis
• Bivariate (two-variable) linear regression: the most elementary regression model
– Dependent variable: the variable to be predicted, usually called Y
– Independent variable: the predictor or explanatory variable, usually called X
Airline Cost Data
Number of Passengers (X)   Cost ($1,000) (Y)
61 4.280
63 4.080
67 4.420
69 4.170
70 4.480
74 4.300
76 4.820
81 4.700
86 5.110
91 5.130
95 5.640
97 5.560
Scatter Plot of Airline Cost Data
[Figure: scatter plot of the airline cost data, with Cost ($1,000) on the vertical axis (0 to 6) and Number of Passengers on the horizontal axis (0 to 120)]
Regression Models
Deterministic Regression Model
\[ Y = \beta_0 + \beta_1 X \]
Probabilistic Regression Model
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
$\beta_0$ and $\beta_1$ are population parameters.
$\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.
Equation of the Simple Regression Line
\[ \hat{Y} = b_0 + b_1 X \]
where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
Least Squares Analysis
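\[ b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \]
\[ b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} \]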
Least Squares Analysis
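\[ SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n} \]
\[ SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n} \]
\[ b_1 = \frac{SS_{XY}}{SS_{XX}} \]
\[ b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} \]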
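The computational (shortcut) forms follow from expanding the deviation products. A brief derivation, assuming only the definitions $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$:

\[
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y}) &= \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + n\bar{X}\bar{Y} \\
&= \sum XY - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
&= \sum XY - n\bar{X}\bar{Y} = \sum XY - \frac{(\sum X)(\sum Y)}{n}
\end{aligned}
\]

Replacing Y with X in the same steps gives $\sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$.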
Solving for b1 and b0 of the Regression Line: Airline Cost Example (Part 1)
Number of Passengers   Cost ($1,000)
         X                  Y            X²         XY
        61                4.28         3,721      261.08
        63                4.08         3,969      257.04
        67                4.42         4,489      296.14
        69                4.17         4,761      287.73
        70                4.48         4,900      313.60
        74                4.30         5,476      318.20
        76                4.82         5,776      366.32
        81                4.70         6,561      380.70
        86                5.11         7,396      439.46
        91                5.13         8,281      466.83
        95                5.64         9,025      535.80
        97                5.56         9,409      539.32
ΣX = 930   ΣY = 56.69   ΣX² = 73,764   ΣXY = 4,462.22
Solving for b1 and b0 of the Regression Line: Airline Cost Example (Part 2)
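\[ SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745 \]
\[ SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689 \]
\[ b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407 \]
\[ b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57 \]
\[ \hat{Y} = 1.57 + .0407X \]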
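The hand calculation for the airline cost example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `least_squares` is illustrative and does not come from the slides.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def least_squares(x, y):
    """Return (b0, b1, ss_xy, ss_xx) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    ss_xy = sum_xy - sum_x * sum_y / n   # SS_XY
    ss_xx = sum_x2 - sum_x ** 2 / n      # SS_XX
    b1 = ss_xy / ss_xx                   # sample slope
    b0 = sum_y / n - b1 * sum_x / n      # sample intercept
    return b0, b1, ss_xy, ss_xx


b0, b1, ss_xy, ss_xx = least_squares(X, Y)
print(f"SS_XY = {ss_xy:.3f}, SS_XX = {ss_xx:.0f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
```

The unrounded coefficients agree with the slide values of 1.57 and .0407 once rounded.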
Graph of Regression Line for the Airline Cost Example
[Figure: airline cost data with the fitted regression line, Cost ($1,000) on the vertical axis (0 to 6) and Number of Passengers on the horizontal axis (0 to 120)]
Airline Cost: Excel Summary Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.94820033
R Square 0.89908386
Adjusted R Square 0.88899225
Standard Error 0.17721746
Observations 12
ANOVA
df SS MS F Significance F
Regression 1 2.79803 2.79803 89.092179 2.7E-06
Residual 10 0.31406 0.03141
Total 11 3.11209
                       Coefficients   Standard Error   t Stat    P-value
Intercept              1.56979278     0.33808          4.64322   0.0009175
Number of Passengers   0.0407016      0.00431          9.43887   2.692E-06
Residual Analysis: Airline Cost Example
Number of Passengers   Cost ($1,000)   Predicted Value   Residual
         X                  Y               Ŷ             Y − Ŷ
        61                4.28            4.053            .227
        63                4.08            4.134           -.054
        67                4.42            4.297            .123
        69                4.17            4.378           -.208
        70                4.48            4.419            .061
        74                4.30            4.582           -.282
        76                4.82            4.663            .157
        81                4.70            4.867           -.167
        86                5.11            5.070            .040
        91                5.13            5.274           -.144
        95                5.64            5.436            .204
        97                5.56            5.518            .042
Σ(Y − Ŷ) = -.001
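The residual column can be reproduced directly from the fitted line. A minimal sketch, assuming the rounded coefficients 1.57 and .0407 reported on the slides (variable names are illustrative):

```python
# Airline cost data from the slides.
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

# Rounded coefficients as reported on the slides (an assumption of this sketch).
b0, b1 = 1.57, 0.0407

predicted = [b0 + b1 * x for x in X]                        # Y-hat for each X
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]   # Y - Y-hat

for x, y, y_hat, e in zip(X, Y, predicted, residuals):
    print(f"{x:3d} {y:5.2f} {y_hat:6.3f} {e:7.3f}")
print(f"Sum of residuals: {sum(residuals):.3f}")
```

The residuals sum to roughly zero; the small nonzero total arises because the coefficients were rounded before the predictions were computed.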
Excel Graph of Residuals for the Airline Cost Example
[Figure: residuals (vertical axis, -0.3 to 0.2) plotted against Number of Passengers (horizontal axis, 60 to 100)]
Nonlinear Residual Plot
[Figure: residual plot, residuals about a zero line on the vertical axis against X on the horizontal axis]
Nonconstant Error Variance
[Figure: two residual plots, each showing residuals about a zero line against X]
Graphs of Nonindependent Error Terms
[Figure: two residual plots, each showing residuals about a zero line against X]
Healthy Residual Plot
[Figure: residual plot, residuals about a zero line on the vertical axis against X on the horizontal axis]
Standard Error of the Estimate
Sum of Squares Error
\[ SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY \]
Standard Error of the Estimate
\[ S_e = \sqrt{\frac{SSE}{n - 2}} \]
Determining SSE for the Airline Cost Example
Number of Passengers   Cost ($1,000)   Residual
         X                  Y           Y − Ŷ      (Y − Ŷ)²
        61                4.28           .227       .05153
        63                4.08          -.054       .00292
        67                4.42           .123       .01513
        69                4.17          -.208       .04326
        70                4.48           .061       .00372
        74                4.30          -.282       .07952
        76                4.82           .157       .02465
        81                4.70          -.167       .02789
        86                5.11           .040       .00160
        91                5.13          -.144       .02074
        95                5.64           .204       .04162
        97                5.56           .042       .00176
Σ(Y − Ŷ) = -.001   Σ(Y − Ŷ)² = .31434
Sum of squares of error = SSE = .31434
Standard Error of the Estimate for the Airline Cost Example
Sum of Squares Error
\[ SSE = \sum (Y - \hat{Y})^2 = 0.31434 \]
Standard Error of the Estimate
\[ S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773 \]
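As a cross-check, SSE and the standard error of the estimate can be computed from unrounded coefficients. A minimal sketch using only the standard library (variable names are illustrative):

```python
import math

# Airline cost data from the slides.
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Least squares fit with unrounded coefficients.
x_bar, y_bar = sum(X) / n, sum(Y) / n
ss_xx = sum((x - x_bar) ** 2 for x in X)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# SSE is the sum of squared residuals; S_e divides by n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s_e = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.5f}, S_e = {s_e:.4f}")
```

With unrounded coefficients this gives SSE of about 0.3141 and S_e of about 0.1772, in line with the Excel output. The hand-computed 0.31434 and 0.1773 differ slightly because the residuals were rounded to three decimals before squaring.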
Coefficient of Determination
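\[ SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} \]
\[ SS_{YY} = \text{explained variation} + \text{unexplained variation} \]
\[ SS_{YY} = SSR + SSE \]
\[ 1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}} \]
\[ r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \frac{(\sum Y)^2}{n}} \]
\[ 0 \le r^2 \le 1 \]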
Coefficient of Determination for the Airline Cost Example
\[ SSE = 0.31434 \]
\[ SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209 \]
\[ r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899 \]
89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.
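The coefficient of determination for this example can be reproduced in a few lines. A minimal sketch, assuming the hand-computed SSE of 0.31434 from the slides:

```python
# Cost values (in $1,000) from the airline cost data.
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(Y)

sse = 0.31434  # SSE as computed by hand on the slides (an input to this sketch)

# Total variation in Y: SS_YY = sum(Y^2) - (sum Y)^2 / n
ss_yy = sum(y ** 2 for y in Y) - sum(Y) ** 2 / n

# r^2 is the share of total variation not left in the residuals.
r_squared = 1 - sse / ss_yy
print(f"SS_YY = {ss_yy:.5f}, r^2 = {r_squared:.3f}")
```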
Hypothesis Tests for the Slope of the Regression Model
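\[ H_0: \beta_1 = 0 \qquad H_0: \beta_1 = 0 \qquad H_0: \beta_1 = 0 \]
\[ H_1: \beta_1 \ne 0 \qquad H_1: \beta_1 > 0 \qquad H_1: \beta_1 < 0 \]
\[ t = \frac{b_1 - \beta_1}{S_b} \]
where:
\[ S_b = \frac{S_e}{\sqrt{SS_{XX}}} \]
\[ S_e = \sqrt{\frac{SSE}{n - 2}} \]
\[ SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} \]
$\beta_1$ = the hypothesized slope
$df = n - 2$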
Hypothesis Test: Airline Cost Example (Part 1)
\[ H_0: \beta_1 = 0 \]
\[ H_1: \beta_1 \ne 0 \]
\[ \alpha = .05 \]
\[ df = n - 2 = 12 - 2 = 10 \]
\[ t_{.025,10} = 2.228 \]
If $|t| > 2.228$, reject $H_0$.
If $-2.228 \le t \le 2.228$, do not reject $H_0$.
Hypothesis Test: Airline Cost Example (Part 2)
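\[ t = \frac{b_1 - \beta_1}{S_e / \sqrt{SS_{XX}}} = \frac{.0407 - 0}{.1773 \Big/ \sqrt{73{,}764 - \frac{(930)^2}{12}}} = 9.43 \]
Since $t = 9.43 > 2.228$, reject $H_0$.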
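The test statistic for the slope can be recomputed from the summary values. A minimal sketch, assuming the rounded inputs from the slides (b1 = .0407, S_e = .1773) and the table critical value 2.228, which is hard-coded here rather than looked up:

```python
import math

# Summary values from the slides.
b1 = 0.0407                  # sample slope
s_e = 0.1773                 # standard error of the estimate
n = 12
sum_x, sum_x2 = 930, 73764
beta1_hypothesized = 0.0     # slope under H0

ss_xx = sum_x2 - sum_x ** 2 / n
s_b = s_e / math.sqrt(ss_xx)            # standard error of the slope
t = (b1 - beta1_hypothesized) / s_b

t_critical = 2.228           # t(.025, 10), copied from the slide
reject_h0 = abs(t) > t_critical
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```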
Testing the Overall Model (Part 1)
\[ H_0: \beta_1 = 0 \]
\[ H_1: \beta_1 \ne 0 \]
\[ \alpha = .05 \]
\[ df_{reg} = k = 1 \]
\[ df_{err} = n - k - 1 = 12 - 1 - 1 = 10 \]
\[ F_{.05,1,10} = 4.96 \]
If $F > 4.96$, reject $H_0$.
If $F \le 4.96$, do not reject $H_0$.
Testing the Overall Model (Part 2)
ANOVA
df SS MS F Significance F
Regression 1 2.79803 2.79803 89.092179 2.7E-06
Residual 10 0.31406 0.03141
Total 11 3.11209
\[ F = \frac{MS_{reg}}{MS_{err}} = \frac{SS_{reg} / df_{reg}}{SS_{err} / df_{err}} = \frac{2.7980 / 1}{0.3141 / 10} = \frac{2.7980}{0.03141} = 89.09 \]
F = 89.09 > 4.96, reject H0
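The F statistic follows directly from the ANOVA sums of squares. A minimal sketch, assuming the values in the Excel ANOVA table and the table critical value 4.96 (hard-coded, not computed):

```python
# Sums of squares from the ANOVA table on the slides.
ss_reg = 2.79803
ss_err = 0.31406
n, k = 12, 1                 # observations and number of predictors

df_reg = k
df_err = n - k - 1

ms_reg = ss_reg / df_reg     # mean square regression
ms_err = ss_err / df_err     # mean square error
f_stat = ms_reg / ms_err

f_critical = 4.96            # F(.05, 1, 10), copied from the slide
reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

In simple regression, with a single predictor, F equals the square of the t statistic for the slope: 9.43887 squared is approximately 89.09.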
Point Estimation for the Airline Cost Example
\[ \hat{Y} = 1.57 + 0.0407X \]
For $X = 73$:
\[ \hat{Y} = 1.57 + 0.0407(73) = 4.5411, \text{ or } \$4{,}541.10 \]
Confidence Interval to Estimate Y : Airline Cost Example
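\[ \hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}} \]
where $X_0$ = a particular value of X and $SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$
For $X_0 = 73$ and a 95% confidence level:
\[ 4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \frac{(930)^2}{12}}} = 4.5411 \pm .1220 \]
\[ 4.4191 \le E(Y_{73}) \le 4.6631 \]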
Confidence Interval to Estimate the Average Value of Y for some Values of X: Airline Cost Example
 X    Confidence Interval
62    4.0934 ± .1876    3.9058 to 4.2810
68    4.3376 ± .1461    4.1915 to 4.4837
73    4.5411 ± .1220    4.4191 to 4.6631
85    5.0295 ± .1349    4.8946 to 5.1644
90    5.2330 ± .1656    5.0674 to 5.3986
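The intervals in this table can be regenerated from the summary statistics. A minimal sketch, assuming the rounded slide values (b0 = 1.57, b1 = .0407, S_e = .1773, t = 2.228); the function name `mean_confidence_interval` is illustrative:

```python
import math

# Rounded values taken from the slides (assumptions of this sketch).
B0, B1 = 1.57, 0.0407
S_E = 0.1773
T_CRIT = 2.228               # t(.025, 10)
N = 12
SUM_X, SUM_X2 = 930, 73764


def mean_confidence_interval(x0):
    """Return (y_hat, half_width) of the 95% CI for the mean of Y at x0."""
    x_bar = SUM_X / N
    ss_xx = SUM_X2 - SUM_X ** 2 / N
    y_hat = B0 + B1 * x0
    half_width = T_CRIT * S_E * math.sqrt(1 / N + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, half_width


for x0 in (62, 68, 73, 85, 90):
    y_hat, hw = mean_confidence_interval(x0)
    print(f"{x0}: {y_hat:.4f} ± {hw:.4f}  ({y_hat - hw:.4f} to {y_hat + hw:.4f})")
```

The half-width is smallest near the mean of X (77.5) and grows as X moves away from it, which is why the intervals at 62 and 90 are wider than the one at 73.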
Prediction Interval to Estimate Y for a given value of X
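\[ \hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}} \]
where $X_0$ = a particular value of X and $SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$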
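To see how the extra 1 under the square root widens the interval, the prediction interval can be computed alongside the confidence interval for the mean. A minimal sketch, assuming the rounded airline cost values from the slides; the function name `interval_half_widths` is illustrative, and the prediction interval figures it prints are not taken from the slides:

```python
import math

# Rounded values taken from the slides (assumptions of this sketch).
B0, B1 = 1.57, 0.0407
S_E = 0.1773
T_CRIT = 2.228               # t(.025, 10)
N = 12
SUM_X, SUM_X2 = 930, 73764


def interval_half_widths(x0):
    """Return (y_hat, ci_half_width, pi_half_width) at x0."""
    x_bar = SUM_X / N
    ss_xx = SUM_X2 - SUM_X ** 2 / N
    y_hat = B0 + B1 * x0
    leverage = 1 / N + (x0 - x_bar) ** 2 / ss_xx
    ci = T_CRIT * S_E * math.sqrt(leverage)        # mean of Y at x0
    pi = T_CRIT * S_E * math.sqrt(1 + leverage)    # single new Y at x0
    return y_hat, ci, pi


y_hat, ci, pi = interval_half_widths(73)
print(f"Y-hat = {y_hat:.4f}")
print(f"95% CI: {y_hat - ci:.4f} to {y_hat + ci:.4f}")
print(f"95% PI: {y_hat - pi:.4f} to {y_hat + pi:.4f}")
```

The prediction interval is always wider than the confidence interval at the same X, because it accounts for the variability of an individual observation as well as the uncertainty in the estimated mean.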
Confidence Intervals for Estimation
[Figure: regression plot of Cost (vertical axis, 4 to 6) against Number of Passengers (horizontal axis, 60 to 100), showing the regression line with the 95% CI and 95% PI bands]
MINITAB Regression Analysis of the Airline Cost Example
The regression equation is
Cost = 1.57 + 0.0407 Number of Passengers

Predictor    Coef        StDev       T       P
Constant     1.5698      0.3381      4.64    0.001
Number o     0.040702    0.004312    9.44    0.000

S = 0.1772   R-Sq = 89.9%   R-Sq(adj) = 88.9%

Analysis of Variance

Source            DF    SS        MS        F        P
Regression         1    2.7980    2.7980    89.09    0.000
Residual Error    10    0.3141    0.0314
Total             11    3.1121

Obs   Number o   Cost      Fit       StDev Fit   Residual   St Resid
  1   61.0       4.2800    4.0526    0.0876       0.2274     1.48
  2   63.0       4.0800    4.1340    0.0808      -0.0540    -0.34
  3   67.0       4.4200    4.2968    0.0683       0.1232     0.75
  4   69.0       4.1700    4.3782    0.0629      -0.2082    -1.26
  5   70.0       4.4800    4.4189    0.0605       0.0611     0.37
  6   74.0       4.3000    4.5817    0.0533      -0.2817    -1.67
  7   76.0       4.8200    4.6631    0.0516       0.1569     0.93
  8   81.0       4.7000    4.8666    0.0533      -0.1666    -0.99
  9   86.0       5.1100    5.0701    0.0629       0.0399     0.24
 10   91.0       5.1300    5.2736    0.0775      -0.1436    -0.90
 11   95.0       5.6400    5.4364    0.0912       0.2036     1.34
 12   97.0       5.5600    5.5178    0.0984       0.0422     0.29
Pearson Product-Moment Correlation Coefficient
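\[ r = \frac{SS_{XY}}{\sqrt{(SS_{XX})(SS_{YY})}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} \]
\[ -1 \le r \le 1 \]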
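Applied to the airline cost data used throughout the chapter, the correlation coefficient can be computed from the three sums of squares. A minimal sketch using only the standard library (variable names are illustrative):

```python
import math

# Airline cost data from the slides.
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Computational forms of the sums of squares.
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_xx = sum(x ** 2 for x in X) - sum(X) ** 2 / n
ss_yy = sum(y ** 2 for y in Y) - sum(Y) ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"r = {r:.4f}, r^2 = {r ** 2:.3f}")
```

The result agrees with the Multiple R reported in the Excel summary output (about 0.948), and its square matches the coefficient of determination (about 0.899).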
Three Degrees of Correlation
[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0]