11 - 1
© 1997 Prentice-Hall, Inc.
Learning Objectives
Describe the linear regression model
State the regression modeling steps
Explain least squares
Compute regression coefficients
Describe residual analysis
Predict the response variable
Understand correlational analysis
11 - 2
Probabilistic Models
Hypothesize 2 components
    Deterministic
    Random error
Example: Sales volume is 10 times advertising spending plus random error
    $Y = 10X + \varepsilon$
    Random error may be due to factors other than advertising
11 - 3
Types of Probabilistic Models
[Diagram: Probabilistic Models branch into Regression Models, Correlation Models, and Other Models]
11 - 4
Regression Models
Answer ‘What is the relationship between the variables?’
Equation used
    1 numerical dependent (response) variable
        What is to be predicted
    1 or more numerical or categorical independent (explanatory) variables
Used mainly for prediction
11 - 5
Regression Modeling Steps
Define problem or question
Specify model
Collect data
Do descriptive data analysis
Estimate unknown parameters
Evaluate model
Use model for prediction
11 - 6
Problem Definition
Most critical step
    Don’t want the right answer to the wrong question
What are the model objectives?
Who will use the model?
What will be the benefits?
Are resources available (data etc.)?
How will the results be implemented?
11 - 7
Specifying the Model
Define variables
    Conceptual (e.g., advertising, price)
    Empirical (e.g., list price, regular price)
    Measurement (e.g., $, units)
Hypothesize nature of relationship
    Expected effects (i.e., coefficients’ signs)
    Functional form (linear or non-linear)
    Interactions
11 - 8
Model Specification Is Based on Theory
Economic & business theory
Mathematical theory
Previous research
‘Common sense’
11 - 9
Types of Regression Models
[Diagram: Regression Models split into Simple (1 explanatory variable) and Multiple (2+ explanatory variables); each is either Linear or Non-Linear]
11 - 10
Linear Equations
[Figure: straight line plotted with Y on the vertical axis and X on the horizontal axis]
$Y = mX + b$
    $m$ = slope = (change in Y) / (change in X)
    $b$ = Y-intercept
11 - 11
Linear Regression Model
Relationship between variables is a linear function
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
    $Y_i$: dependent (response) variable
    $X_i$: independent (explanatory) variable
    $\beta_1$: population slope
    $\beta_0$: population Y-intercept
    $\varepsilon_i$: random error
11 - 12
Sample Linear Regression Model
[Figure: plot of Y against X showing observed values, one unsampled observation, and the fitted line]
$Y_i = b_0 + b_1 X_i + e_i$
$\hat{Y}_i = b_0 + b_1 X_i$
    $e_i$ = random error
11 - 13
Scatter Diagram
[Figure: scatter plot of Y against X, both axes running from 0 to 60]
Plot of all $(X_i, Y_i)$ pairs
Suggests how well model will fit
11 - 14
Thinking Challenge
[Figure: scatter plot of Y against X, both axes running from 0 to 60]
How would you draw a line through the points?
How do you determine which line ‘fits best’?
Alone, Group, Class
11 - 15
Least Squares
‘Best fit’ means the differences between actual Y values & predicted Y values are a minimum
    But positive differences offset negative ones
LS minimizes the sum of the squared differences (or errors)
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$
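To illustrate why the errors are squared, here is a minimal sketch that uses the five (X, Y) pairs from the Hasbro example later in this deck. The function and variable names are illustrative and not from the slides. Both candidate lines have errors that sum to zero, so only the sum of squared errors tells them apart.

```python
# Advertising (X) and sales (Y) pairs from the deck's Hasbro example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def errors(b0, b1):
    """Actual Y minus predicted Y for the line Y-hat = b0 + b1 * X."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sum_of_errors(b0, b1):
    return sum(errors(b0, b1))

def sum_of_squared_errors(b0, b1):
    return sum(e ** 2 for e in errors(b0, b1))

flat_line = (2.0, 0.0)   # horizontal line through the mean of Y
ls_line = (-0.1, 0.7)    # least squares line reported later in the deck

for name, (b0, b1) in [("flat", flat_line), ("least squares", ls_line)]:
    print(name,
          round(sum_of_errors(b0, b1), 6),
          round(sum_of_squared_errors(b0, b1), 6))
```

The flat line has a sum of squared errors of 6, the least squares line 1.1, even though the plain sums of errors are both zero.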
11 - 16
Least Squares Graphically
[Figure: four observations with vertical errors $e_1$, $e_2$, $e_3$, $e_4$ measured from the fitted line $\hat{Y}_i = b_0 + b_1 X_i$]
$Y_2 = b_0 + b_1 X_2 + e_2$
LS minimizes $\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2$
11 - 17
Coefficient Equations
Sample regression equation: $\hat{Y}_i = b_0 + b_1 X_i$
Sample slope: $b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$
Sample Y-intercept: $b_0 = \bar{Y} - b_1 \bar{X}$
    $n$ = number of $(X_i, Y_i)$ pairs
    $\bar{X}^2$: average the $X_i$’s, then square
11 - 18
Computation Table
Xi          Yi          Xi^2          Yi^2          XiYi
X1          Y1          X1^2          Y1^2          X1Y1
X2          Y2          X2^2          Y2^2          X2Y2
:           :           :             :             :
Xn          Yn          Xn^2          Yn^2          XnYn
Sum Xi      Sum Yi      Sum Xi^2      Sum Yi^2      Sum XiYi
11 - 19
Interpretation of Coefficients
Slope ($b_1$)
    Estimated Y changes by $b_1$ for each 1 unit increase in X
    Example: If $b_1 = 2$, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising (X)
Y-Intercept ($b_0$)
    Average value of Y when X = 0
    Example: If $b_0 = 4$, then average Sales (Y) is expected to be 4 when Advertising (X) is 0
11 - 20
Parameter Estimation Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
What is the relationship between sales & advertising?
11 - 21
Scatter Diagram Sales vs. Advertising
[Figure: scatter plot of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis, 0 to 5)]
11 - 22
Parameter Estimation Solution Table
Xi      Yi      Xi^2    Yi^2    XiYi
1       1       1       1       1
2       1       4       1       2
3       2       9       4       6
4       2       16      4       8
5       4       25      16      20
Sum:
15      10      55      26      37
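The column totals above feed directly into the coefficient equations. The following sketch (variable names are illustrative) rebuilds the totals from the raw data and computes the sample slope and intercept; it reproduces the values .7 and -.10 quoted on the next slide.

```python
# Advertising (X) and sales (Y) data from the Hasbro example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

# Column totals of the computation table.
sum_x = sum(x)                                   # 15
sum_y = sum(y)                                   # 10
sum_x2 = sum(xi ** 2 for xi in x)                # 55
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 37

x_bar = sum_x / n
y_bar = sum_y / n

# Sample slope and Y-intercept from the coefficient equations.
b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar

print(round(b1, 4), round(b0, 4))
```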
11 - 23
Coefficient Interpretation Solution
Slope ($b_1$)
    Sales Volume (Y) is expected to increase by .7 units for each $1 increase in Advertising (X)
Y-Intercept ($b_0$)
    Average value of Sales Volume (Y) is -.10 units when Advertising (X) is 0
        Difficult to explain to Marketing Manager
        Expect some sales without advertising
11 - 24
Parameter Estimation Excel Output
[Excel output: callouts mark the coefficient column ($b_P$) and its entries $b_0$ and $b_1$]
11 - 25
Evaluating the Model
How well does the model describe the relationship between the variables?
Closeness of ‘best fit’
    The closer the points are to the line, the better
Assumptions met
Significance of parameter estimates
11 - 26
Evaluating Model Steps
Examine variation measures
Do residual analysis
Test coefficients for significance
$\hat{Y}_i = b_0 + b_1 X_i$
11 - 27
Random Error Variation
Variation of actual Y from predicted Y
Measured by standard error of estimate
    Sample standard deviation of e
    Denoted $S_{YX}$
Affects several factors
    Parameter significance
    Prediction accuracy
11 - 28
Standard Error of Estimate
$S_{YX} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$
$S_{YX} = \sqrt{\frac{\sum_{i=1}^{n} Y_i^2 - b_0 \sum_{i=1}^{n} Y_i - b_1 \sum_{i=1}^{n} X_i Y_i}{n - 2}}$
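As a worked check of the computational form, this sketch applies it to the Hasbro example, using the column totals and the coefficients $b_0 = -.10$ and $b_1 = .70$ from the deck. The result matches the 0.60553 used later in the slope test. Variable names are illustrative.

```python
import math

# Totals from the parameter estimation solution table.
n = 5
sum_y = 10
sum_y2 = 26
sum_xy = 37

# Coefficients reported in the deck.
b0 = -0.10
b1 = 0.70

# Unexplained variation (SSE) via the computational form.
sse = sum_y2 - b0 * sum_y - b1 * sum_xy

# Standard error of estimate: n - 2 degrees of freedom in simple regression.
s_yx = math.sqrt(sse / (n - 2))

print(round(sse, 4), round(s_yx, 5))
```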
11 - 29
Measures of Variation in Regression
Total sum of squares (SST)
    Measures variation of observed $Y_i$ around the mean $\bar{Y}$
Explained variation (SSR)
    Variation due to relationship between X & Y
Unexplained variation (SSE)
    Variation due to other factors
11 - 30
Variation Measures
[Figure: an observed $Y_i$ at $X_i$, the fitted line $\hat{Y}_i = b_0 + b_1 X_i$, and the mean $\bar{Y}$, with the three deviations marked]
Total sum of squares: $(Y_i - \bar{Y})^2$
Unexplained sum of squares: $(Y_i - \hat{Y}_i)^2$
Explained sum of squares: $(\hat{Y}_i - \bar{Y})^2$
11 - 31
Coefficient of Determination
Proportion of variation ‘explained’ by relationship between X & Y
$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SSR}{SST} = \frac{b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - n \bar{Y}^2}{\sum_{i=1}^{n} Y_i^2 - n \bar{Y}^2}$
$0 \le r^2 \le 1$
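A short sketch of the same formula applied to the Hasbro example follows. The totals and coefficients come from the deck; the resulting $r^2$ is computed here rather than quoted from a slide, and the variable names are illustrative.

```python
# Totals from the parameter estimation solution table.
n = 5
sum_y = 10
sum_y2 = 26
sum_xy = 37
y_bar = sum_y / n

# Coefficients reported in the deck.
b0 = -0.10
b1 = 0.70

ssr = b0 * sum_y + b1 * sum_xy - n * y_bar ** 2   # explained variation
sst = sum_y2 - n * y_bar ** 2                     # total variation
sse = sst - ssr                                   # unexplained variation

r_squared = ssr / sst
print(round(ssr, 4), round(sst, 4), round(r_squared, 4))
```

Roughly 82% of the variation in sales is ‘explained’ by advertising in this small sample.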
11 - 32
$r^2$ Examples
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$; panels labeled $r^2 = 1$, $r^2 = 1$, $r^2 = .8$, $r^2 = 0$]
11 - 33
Adjusted Coefficient of Determination
Proportion of variation ‘explained’ by relationship between X & Y
Reflects
    Sample size
    Number of independent variables
11 - 34
Coef. of Determination Excel Output
[Excel output: callouts mark $r^2$, $r^2$ adjusted for number of explanatory variables & sample size, and $S_{YX}$]
11 - 35
Residual Analysis
Graphical analysis of residuals
    Plot residuals vs. $X_i$ values
    Residuals are the estimated errors
        Difference between actual $Y_i$ & predicted $Y_i$
Purposes
    Examine functional form (linear vs. non-linear model)
    Evaluate violations of assumptions
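As a sketch of the first step, the residuals for the Hasbro example can be computed from the fitted line $\hat{Y}_i = -.10 + .70 X_i$ reported in the deck. Plotting is left out; the (X, residual) pairs printed here are what a residual plot would show. Names are illustrative.

```python
# Advertising (X) and sales (Y) data from the Hasbro example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

# Fitted coefficients reported in the deck.
b0 = -0.10
b1 = 0.70

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

# Pairs to plot: X on the horizontal axis, residual on the vertical axis.
for xi, e in zip(x, residuals):
    print(xi, round(e, 2))
```

With an intercept in the model, least squares residuals sum to zero, which is a quick sanity check on the arithmetic.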
11 - 36
Linear Regression Assumptions
Normality
    Y values are normally distributed for each X
    Probability distribution of error is normal
Homoscedasticity (constant variance)
Independence of errors
Linearity
11 - 37
Residual Plot for Functional Form
[Figure: two plots of residuals (e) against X; panels labeled Add $X^2$ Term and Correct Specification]
11 - 38
Residual Plot for Homoscedasticity
[Figure: two plots of standardized residuals (SR) against X; panels labeled Heteroscedasticity and Correct Specification]
Heteroscedasticity appears as a fan-shaped pattern.
Standardized residuals are typically used.
11 - 39
Residual Plot for Independence
[Figure: two plots of standardized residuals (SR) against X; panels labeled Not Independent and Correct Specification]
Plots reflect the sequence in which the data were collected.
11 - 40
Residual Analysis Excel Output
11 - 41
Residual Plot Excel Output
11 - 42
Test of Slope Coefficient
Tests if there is a linear relationship between X & Y
Involves population slope $\beta_1$
Hypotheses
    $H_0$: $\beta_1 = 0$ (No linear relationship)
    $H_1$: $\beta_1 \ne 0$ (Linear relationship)
Theoretical basis is sampling distribution of slopes
11 - 43
Test of Slope Parameter Solution
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \ne 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical Value(s): $t = \pm 3.1824$ (.025 in each tail; reject $H_0$ beyond either value)
Test Statistic: $t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.70 - 0}{0.1915} = 3.656$
Decision: Reject at $\alpha = .05$
Conclusion: There is evidence of a relationship
11 - 44
Test Statistic Solution
$S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}} = \frac{0.60553}{\sqrt{55 - 5(3)^2}} = 0.1915$
$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.70 - 0}{0.1915} = 3.656$
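The test statistic arithmetic can be retraced in a few lines. This sketch takes $S_{YX}$, $b_1$, the table totals, and the critical value 3.1824 from the slides; the variable names are illustrative.

```python
import math

# Values taken from the deck.
n = 5
x_bar = 3.0          # mean of X: 15 / 5
sum_x2 = 55          # sum of squared X values
s_yx = 0.60553       # standard error of estimate
b1 = 0.70            # sample slope
beta1_null = 0.0     # hypothesized population slope under H0
t_critical = 3.1824  # two-tail critical value, alpha = .05, df = 3

# Standard error of the slope, then the t statistic.
s_b1 = s_yx / math.sqrt(sum_x2 - n * x_bar ** 2)
t_stat = (b1 - beta1_null) / s_b1

reject_h0 = abs(t_stat) > t_critical
print(round(s_b1, 4), round(t_stat, 3), reject_h0)
```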
11 - 45
Test of Slope Parameter Excel Output
[Excel output: callouts mark $b_P$, $S_{b_P}$, $t = b_P / S_{b_P}$, and the P-value]
11 - 46
Prediction With Regression Models
Types of predictions
    Point estimates
    Interval estimates
What is predicted
    Population mean response ($\mu_{YX}$) for given X
        Point on population regression line
    Individual response ($Y_i$) for given X
11 - 47
What Is Predicted
[Figure: at $X_{given}$, the plot marks the mean Y ($\mu_{YX}$) on the population line $\mu_{YX} = \beta_0 + \beta_1 X$, an individual Y, and the prediction $\hat{Y}$]
11 - 48
Factors Affecting Interval Width
Level of confidence ($1 - \alpha$)
    Width increases as confidence increases
Data dispersion ($S_{YX}$)
    Width increases as variation increases
Sample size
    Width decreases as sample size increases
Distance of $X_{given}$ from mean $\bar{X}$
    Width increases as distance increases
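These factors can be seen in the usual half-width of a confidence interval for the mean response, $t \cdot S_{YX} \sqrt{1/n + (X_{given} - \bar{X})^2 / \sum (X_i - \bar{X})^2}$. That formula is the standard textbook one and is not shown in this transcript, so treat the sketch below as an assumption-laden illustration. It reuses $S_{YX}$ and the critical value from the Hasbro example; `half_width` is a hypothetical helper name.

```python
import math

# Hasbro example inputs; s_yx and t_critical are taken from the deck.
x = [1, 2, 3, 4, 5]
n = len(x)
x_bar = sum(x) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations of X
s_yx = 0.60553
t_critical = 3.1824                         # 95% confidence, df = n - 2 = 3

def half_width(x_given, t=t_critical, s=s_yx):
    """Half-width of the interval for the mean response at x_given.

    Assumes the standard simple-regression formula, which is not
    reproduced in the slides themselves.
    """
    return t * s * math.sqrt(1 / n + (x_given - x_bar) ** 2 / ssx)

# Width grows with distance from the mean of X and with data dispersion.
print(round(half_width(3), 4), round(half_width(5), 4))
print(round(half_width(3, s=2 * s_yx), 4))
```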
11 - 49
Regression Cautions
Violated assumptions
Relevancy of historical data
Level of significance
Extrapolation
Cause & effect
11 - 50
Extrapolation
[Figure: plot of Y against X; Interpolation lies within the Relevant Range of X, Extrapolation lies on either side of it]
11 - 51
Cause & Effect
[Figure: Liquor Consumption plotted against # Teachers]
11 - 52
Types of Probabilistic Models
[Diagram: Probabilistic Models branch into Regression Models, Correlation Models, and Other Models]
11 - 53
Correlation Models
Answer ‘How strong is the linear relationship between 2 variables?’
Coefficient of correlation used
    Population correlation coefficient denoted $\rho$ (rho)
    Values range from -1 to +1
    Measures degree of association
Used mainly for understanding
11 - 54
Sample Coefficient of Correlation
Pearson Product-Moment Coefficient of Correlation:
$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$
$|r| = \sqrt{\text{Coefficient of Determination}}$, with the sign of $b_1$
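For a concrete instance, the sketch below evaluates the Pearson formula on the Hasbro advertising data from earlier in the deck. The value of $r$ is computed here, not quoted from a slide, and the names are illustrative.

```python
import math

# Advertising (X) and sales (Y) data from the Hasbro example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations from the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 4), round(r ** 2, 4))
```

Squaring $r$ gives back the coefficient of determination for the same data, and $r$ is positive because the fitted slope is positive.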
11 - 55
Correlation & Regression Line
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$; panels labeled $r = 1$, $r = -1$, $r = .89$, $r = 0$]
11 - 56
Test of Correlation Coefficient
Shows if there is a linear relationship between 2 numerical variables
Same conclusion as testing population slope $\beta_1$
Hypotheses
    $H_0$: $\rho = 0$ (No correlation)
    $H_1$: $\rho \ne 0$ (Correlation)
11 - 57
Conclusion
Described the linear regression model
Stated the regression modeling steps
Explained least squares
Computed regression coefficients
Described residual analysis
Predicted the response variable
11 - 58
Learning Objectives
Explain the linear multiple regression model
Interpret linear multiple regression computer output
Explain multicollinearity
11 - 59
Multiple Regression Models
[Diagram: Multiple Regression Models split into Linear and Non-Linear; node labels: Linear, Dummy Variable, Interaction, Polynomial, Square Root, Log, Reciprocal, Exponential]
11 - 60
Linear Multiple Regression Model
Relationship between 1 dependent & 2 or more independent variables is a linear function
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_P X_{Pi} + \varepsilon_i$
    $Y_i$: dependent (response) variable
    $X_{1i}, \dots, X_{Pi}$: independent (explanatory) variables
    $\beta_1, \dots, \beta_P$: population slopes
    $\beta_0$: population Y-intercept
    $\varepsilon_i$: random error
11 - 61
Population Multiple Regression Model
Bivariate model
[Figure: response plane over the ($X_1$, $X_2$) plane with intercept $\beta_0$; the observed Y at ($X_{1i}$, $X_{2i}$) deviates from the plane by $\varepsilon_i$]
$\mu_{YX} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
11 - 62
Sample Multiple Regression Model
Bivariate model
[Figure: response plane over the ($X_1$, $X_2$) plane with intercept $b_0$; the observed Y at ($X_{1i}$, $X_{2i}$) deviates from the plane by $e_i$]
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
11 - 63
Regression Modeling Steps
Define problem or question
Specify model
Collect data
Do descriptive data analysis
Estimate unknown parameters
Evaluate model
Use model for prediction
11 - 64
Parameter Estimation
Linear Multiple Regression Model
11 - 65
Multiple Linear Regression Equations
Too complicated by hand! Ouch!
11 - 66
Interpretation of Estimated Coefficients
Slope ($b_P$)
    Estimated Y changes by $b_P$ for each 1 unit increase in $X_P$ holding all other variables constant
    Example: If $b_1 = 2$, then Sales (Y) is expected to increase by 2 for each 1 unit increase in Advertising ($X_1$) given the Number of Sales Rep’s ($X_2$)
Y-Intercept ($b_0$)
    Average value of Y when all $X_P = 0$
11 - 67
Parameter Estimation Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).
You’ve collected the following data:
Resp    Size    Circ
1       1       2
4       8       8
1       3       1
3       5       7
2       6       4
4       10      6
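Although the deck leaves the estimation to Excel, the two-variable case is small enough to sketch by hand. The code below solves the least squares normal equations in deviation form, a standard closed form for exactly two X variables; this is an illustrative route, not necessarily how Excel computes it. It reproduces the slopes .2049 and .2805 quoted later in the deck. Helper names are hypothetical.

```python
# Data from the New York Times example.
resp = [1, 4, 1, 3, 2, 4]     # Y: ad responses (00)
size = [1, 8, 3, 5, 6, 10]    # X1: ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]     # X2: circulation (000)

def mean(values):
    return sum(values) / len(values)

def cross_dev(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

s11 = cross_dev(size, size)
s22 = cross_dev(circ, circ)
s12 = cross_dev(size, circ)
s1y = cross_dev(size, resp)
s2y = cross_dev(circ, resp)

# Solve the 2x2 normal equations for the slopes (Cramer's rule).
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(resp) - b1 * mean(size) - b2 * mean(circ)

print(round(b0, 4), round(b1, 4), round(b2, 4))
```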
11 - 68
Parameter Estimation Excel Output
[Excel output: callouts mark the coefficient column ($b_P$) and its entries $b_0$, $b_1$, and $b_2$]
11 - 69
Interpretation of Coefficients Solution
Slope ($b_1$)
    # Responses to Ad is expected to increase by .2049 hundred (20.49 responses) for each 1 sq. in. increase in Ad Size holding Circulation constant
Slope ($b_2$)
    # Responses to Ad is expected to increase by .2805 hundred (28.05 responses) for each 1 unit (1,000) increase in Circulation holding Ad Size constant
11 - 70
Evaluating the Model
11 - 71
Regression Modeling Steps
Define problem or question
Specify model
Collect data
Do descriptive data analysis
Estimate unknown parameters
Evaluate model
Use model for prediction
11 - 72
Evaluating Multiple Regression Model Steps
Examine variation measures
Do residual analysis
Test parameter significance
    Overall model
    Portions of model
    Individual coefficients
Test for multicollinearity
11 - 73
Coef. of Determination Excel Output
[Excel output: callouts mark $S_{YX}$, $r^2_{Y.12}$, and $r^2_{adj}$]
$r^2_{adj}$ means 95.61% of variation in Y is explained by Ad Size & Circulation, after adjusting for sample size and number of explanatory variables
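The 95.61% figure can be retraced with the usual adjustment, $r^2_{adj} = 1 - (1 - r^2)\frac{n - 1}{n - P - 1}$. That formula is the standard one and is not printed in this transcript, so it is an assumption of this sketch; the sums of squares are the ones in the deck's ANOVA output for the two-variable model.

```python
# Sums of squares from the deck's ANOVA output for the model with X1 and X2.
sst = 9.5000   # total
sse = 0.2503   # residual
n = 6          # observations
p = 2          # explanatory variables

r_squared = 1 - sse / sst

# Adjusted r^2 (standard formula, assumed rather than quoted from the slides).
r_squared_adj = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(r_squared, 4), round(r_squared_adj, 4))
```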
11 - 74
Coefficient of Partial Determination
Proportion of variation in Y ‘explained’ by variable $X_P$ holding all others constant
Must estimate separate models
Denoted $r^2_{Y1.2}$ in two X variables case
    Coefficient of partial determination of $X_1$ with Y holding $X_2$ constant
Useful in selecting X variables
11 - 75
$r^2_{Y1.2}$ Excel Output
ANOVA for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
              df      SS
Regression    2       9.2497
Residual      3       0.2503
Total         5       9.5000
ANOVA for $\hat{Y}_i = b_0 + b_2 X_{2i}$
              df      SS
Regression    1       8.2373
Residual      4       1.2627
Total         5       9.5000
$SSR(X_1 | X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 9.2497 - 8.2373 = 1.0124$
$r^2_{Y1.2} = \frac{SSR(X_1 | X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 | X_2)} = \frac{1.0124}{9.5000 - 9.2497 + 1.0124} = 0.80$
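The arithmetic behind $r^2_{Y1.2}$ is easy to retrace from the two ANOVA outputs. This sketch uses only the sums of squares printed in the deck; the variable names are illustrative.

```python
# Sums of squares from the deck's two ANOVA outputs.
sst = 9.5000            # total, same for both models
ssr_x1_and_x2 = 9.2497  # regression SS, model with X1 and X2
ssr_x2 = 8.2373         # regression SS, model with X2 only

# Extra variation explained by X1 once X2 is already in the model.
ssr_x1_given_x2 = ssr_x1_and_x2 - ssr_x2

# Coefficient of partial determination of X1 with Y holding X2 constant.
r2_y1_2 = ssr_x1_given_x2 / (sst - ssr_x1_and_x2 + ssr_x1_given_x2)

print(round(ssr_x1_given_x2, 4), round(r2_y1_2, 2))
```

The denominator equals the residual sum of squares of the X2-only model (1.2627), so the ratio is the share of variation left over by $X_2$ that $X_1$ then explains.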
11 - 76
Testing Parameters
11 - 77
Evaluating Multiple Regression Model Steps
Examine variation measures
Do residual analysis
Test parameter significance
    Overall model
    Portions of model
    Individual coefficients
Test for multicollinearity
11 - 78
Testing Overall Significance
Shows if there is a linear relationship between all X variables together & Y
Uses F test statistic
Hypotheses
    $H_0$: $\beta_1 = \beta_2 = \dots = \beta_P = 0$
        No linear relationship
    $H_1$: At least one coefficient is not 0
        At least one X variable affects Y
11 - 79
Overall Significance Excel Output
[Excel output: callouts mark the degrees of freedom $P$ (regression), $n - P - 1$ (residual), and $n - 1$ (total), the statistic $F = MSR / MSE$, and the P-value]
11 - 80
Testing Model Portions
Examines the contribution of a set of X variables to the relationship with Y
Null hypothesis:
    Variables in set do not significantly improve the model when all other variables are included
Must estimate separate models
Used in selecting X variables
11 - 81
Testing Model Portions Test Statistic
Test $H_0$: $\beta_1 = 0$ in a 2 variable model
$F = \frac{SSR(X_1 \text{ and } X_2) - SSR(X_2)}{MSE(X_1 \text{ and } X_2)}$
    $SSR(X_1 \text{ and } X_2)$ and $MSE(X_1 \text{ and } X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
    $SSR(X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$
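To make the statistic concrete, this sketch plugs in the sums of squares from the deck's two ANOVA outputs for the newspaper example. The F value is computed here, not quoted from a slide. The critical value is also an assumption of the sketch: with 1 numerator df, the F critical value equals the square of the two-tail t critical value, and the deck's 3.1824 (3 df) applies because this model also has 6 - 2 - 1 = 3 residual df.

```python
# Sums of squares from the deck's ANOVA outputs.
ssr_x1_and_x2 = 9.2497   # regression SS, model with X1 and X2
ssr_x2 = 8.2373          # regression SS, model with X2 only
sse_x1_and_x2 = 0.2503   # residual SS, model with X1 and X2
residual_df = 3          # n - P - 1 = 6 - 2 - 1

mse_x1_and_x2 = sse_x1_and_x2 / residual_df

# Partial F statistic for the contribution of X1 given X2.
f_stat = (ssr_x1_and_x2 - ssr_x2) / mse_x1_and_x2

# Assumed critical value: t critical from the deck, squared (alpha = .05).
f_critical = 3.1824 ** 2

reject_h0 = f_stat > f_critical
print(round(f_stat, 3), round(f_critical, 3), reject_h0)
```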
11 - 82
Multicollinearity
11 - 83
Evaluating Multiple Regression Model Steps
Examine variation measures
Do residual analysis
Test parameter significance
    Overall model
    Portions of model
    Individual coefficients
Test for multicollinearity
11 - 84
Multicollinearity
High correlation between X variables
Coefficients measure combined effect
Leads to unstable coefficients depending on X variables in model
Always exists; matter of degree
Example: Using both Sales & Profit as explanatory variables in same model
11 - 85
Detecting Multicollinearity
Examine correlation matrix
    Correlations between pairs of X variables are higher than their correlations with the Y variable
Examine variance inflation factor (VIF)
    If $VIF_j > 5$, multicollinearity exists
Few remedies
    Obtain new sample data
    Eliminate one correlated X variable
11 - 86
Correlation Matrix Excel Output
[Excel output: callouts mark $r_{Y1}$, $r_{Y2}$, and $r_{12}$]
11 - 87
VIF Excel Output
Regress $X_1$ on $X_2$
$VIF_1 = \frac{1}{1 - r_1^2} = \frac{1}{1 - .5493} = 2.2190 < 5$, so no multicollinearity
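The VIF value can be rebuilt from the raw newspaper data. With only two X variables, the $r^2$ from regressing $X_1$ on $X_2$ is simply their squared correlation, which is the shortcut this sketch takes. The helper name is hypothetical; the threshold of 5 is the deck's rule of thumb.

```python
# Explanatory variables from the New York Times example.
size = [1, 8, 3, 5, 6, 10]   # X1: ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # X2: circulation (000)

def cross_dev(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

# r^2 from regressing X1 on X2 (squared correlation in the two-variable case).
r1_squared = cross_dev(size, circ) ** 2 / (
    cross_dev(size, size) * cross_dev(circ, circ)
)

vif_1 = 1 / (1 - r1_squared)
multicollinearity_flag = vif_1 > 5   # deck's rule of thumb

print(round(r1_squared, 4), round(vif_1, 4), multicollinearity_flag)
```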
11 - 88
This Class...
Please take a moment to answer the following questions in writing:
What was the most important thing you learned in class today?
What do you still have questions about?
How can today’s class be improved?