TRANSCRIPT
Regression II
Dr. Rahim Mahmoudvand
Department of Statistics,
Bu-Ali Sina University
Chapter 4
Model Adequacy
Checking
Ch4: Partial Regression Plot: Definition and Usage
Is a curvature effect for the regressor needed in the model?
A partial regression plot is a variation of the plot of residuals versus the predictor variable.
This plot evaluates whether we have specified the relationship between the response and the regressor variables correctly.
This plot studies the marginal relationship of a regressor given the other variables that are in the model.
The partial regression plot is also called the added-variable plot or the adjusted-variable plot.
Ch4: Partial Regression Plot: The way of working, with an example
In this plot, the response variable y and the regressor xj are both regressed against the other regressors in the model, and the residuals are obtained for each regression. The plot of these residuals against each other provides information about the nature of the marginal relationship for the regressor xj under consideration.
Example: consider the model
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i, \qquad i = 1, 2, \dots, n.$
y is regressed on x2:
$\hat{y}_i(x_2) = \hat{\alpha}_0 + \hat{\alpha}_2 x_{i2}; \qquad e_i(y \mid x_2) = y_i - \hat{y}_i(x_2), \quad i = 1, 2, \dots, n.$
x1 is regressed on x2:
$\hat{x}_{i1}(x_2) = \hat{\gamma}_0 + \hat{\gamma}_2 x_{i2}; \qquad e_i(x_1 \mid x_2) = x_{i1} - \hat{x}_{i1}(x_2), \quad i = 1, 2, \dots, n.$
The partial regression plot for x1 is the plot of $e_i(y \mid x_2)$ against $e_i(x_1 \mid x_2)$.
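The procedure above is easy to verify numerically. Below is a minimal sketch, not from the slides, that computes the two sets of residuals for a simulated model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; the data and all variable names are hypothetical.

```python
# A minimal sketch of a partial regression (added-variable) plot for x1,
# assuming a model y ~ x1 + x2 with hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 1, n)

def resid_on(z, X):
    """Residuals of z after least-squares regression on X (with intercept)."""
    X = np.column_stack([np.ones(len(z)), X])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta

e_y = resid_on(y, x2)    # e(y | x2)
e_x1 = resid_on(x1, x2)  # e(x1 | x2)

# The slope of e_y on e_x1 equals the coefficient of x1 in the full model.
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
print(slope)  # close to 1.5; plotting e_y against e_x1 is the partial regression plot
```

The printed slope illustrates the relationship among residuals derived a few slides below: regressing $e(y \mid x_2)$ on $e(x_1 \mid x_2)$ recovers $\hat{\beta}_1$ from the full fit.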
Ch4: Partial Regression Plot: Interpretation of plot
[Figure: prototype partial regression plots of e(y | x2) against e(x1 | x2)]
• Linear band: regressor x1 enters the model linearly; the line passes through the origin and its slope equals $\hat{\beta}_1$.
• Curved band: a higher-order term in x1, such as $x_1^2$, is required, or a transformation such as replacing x1 with 1/x1 is required.
• Horizontal band: there is no additional useful information in x1 for predicting y.
Ch4: Partial Regression Plot: Relationship among residuals
Consider the model $Y = X\beta + \varepsilon$ and write $X = [X_{(j)}, x_j]$, where $X_{(j)} = [1, x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k]$ is the X matrix with the jth regressor removed, so that
$Y = X_{(j)}\beta_{(j)} + \beta_j x_j + \varepsilon.$
Denoting $H_{(j)} = X_{(j)}(X_{(j)}'X_{(j)})^{-1}X_{(j)}'$:
Y is regressed on X(j): $e(Y \mid X_{(j)}) = (I - H_{(j)})Y$
xj is regressed on X(j): $e(x_j \mid X_{(j)}) = (I - H_{(j)})x_j$
We have:
$e(Y \mid X_{(j)}) = (I - H_{(j)})X_{(j)}\beta_{(j)} + \beta_j(I - H_{(j)})x_j + (I - H_{(j)})\varepsilon = \beta_j\,e(x_j \mid X_{(j)}) + (I - H_{(j)})\varepsilon,$
since $(I - H_{(j)})X_{(j)} = 0$. So the partial regression plot should be a straight line through the origin with slope $\beta_j$.
Ch4: Partial Regression Plot: Shortcomings
• These plots may not give information about the proper form of the relationship if several variables already in the model are incorrectly specified.
• Partial regression plots will not, in general, detect interaction effects among the regressors.
• The presence of strong multicollinearity can cause partial regression plots to give incorrect information about the relationship between the response and the regressor variables.
Ch4: Partial Residual Plots: Definition and usage
• The partial residual plot is closely related to the partial regression plot.
• A partial residual plot is a variation of the plot of residuals versus the predictor.
• It is designed to show the relationship between the response variable and the regressors.
Ch4: Partial Residual Plot: Computation of partial residuals
Consider the model $Y = X\beta + \varepsilon$. Denote by $x_{i,(j)} = [1, x_{i1}, \dots, x_{i,j-1}, x_{i,j+1}, \dots, x_{ik}]$ the ith row of X with the jth regressor removed, so that $\hat{y}_i = x_{i,(j)}\hat{\beta}_{(j)} + \hat{\beta}_j x_{ij}$.
We have:
$e_i = y_i - \hat{y}_i = y_i - x_{i,(j)}\hat{\beta}_{(j)} - \hat{\beta}_j x_{ij}$
The partial residual is defined and calculated by:
$e_i^*(y \mid x_j) = y_i - x_{i,(j)}\hat{\beta}_{(j)} = e_i + \hat{\beta}_j x_{ij}, \qquad i = 1, 2, \dots, n.$
The partial residual plot is the plot of $e_i^*(y \mid x_j)$ against $x_{ij}$.
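As a quick illustration, here is a minimal sketch of the identity $e_i^*(y \mid x_j) = e_i + \hat{\beta}_j x_{ij}$ for a two-regressor model; the simulated data are hypothetical, not from the slides.

```python
# A minimal sketch of partial residuals e*_i = e_i + beta_j_hat * x_ij,
# assuming hypothetical simulated data for y ~ x1 + x2.
import numpy as np

rng = np.random.default_rng(10)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

partial_resid_x1 = e + beta[1] * x1  # plot these against x1
print(partial_resid_x1[:5])
```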
Ch4: Partial Residual Plot: Interpretation of plot
[Figure: prototype partial residual plots of e*(y | xj) against x_ij]
• Linear band: regressor xj enters the model linearly; the line passes through the origin and its slope equals $\hat{\beta}_j$.
• Curved band: a higher-order term in $x_{ij}$, such as $x_{ij}^2$, is required, or a transformation such as replacing $x_{ij}$ with $1/x_{ij}$ is required.
• Horizontal band: there is no additional useful information in xj for predicting y.
Ch4: Other Plots: Regressor versus regressor
• A scatterplot of regressor xi against regressor xj:
• is useful in studying the relationship between regressor variables;
• is useful in detecting multicollinearity.
[Figure: scatterplots of xj against xi]
• One panel shows an observation that is unusual with respect to xj only;
• another shows an observation that is unusual with respect to both xi and xj.
Ch4: Other Plots: Response versus regressor
• A scatterplot of the response y against regressor xi is useful in distinguishing the type of unusual points.
[Figure: scatterplots of y against xi showing three kinds of unusual points]
• An outlier in x space: an influential point; the prediction variance for this point is large, and the residual variance for this point is small.
• An outlier in the y direction: an influential point.
• An outlier in both directions: a leverage point; the prediction variance for this point is large, and the residual variance for this point is small.
Ch4: PRESS Statistic: Computation and usage
• PRESS is generally regarded as a measure of how well a regression model will perform in predicting new data.
• A model with a small value of PRESS is desired.
• PRESS residuals are:
$e_{(i)} = y_i - \hat{y}_{(i)} = \frac{e_i}{1 - h_{ii}}, \qquad i = 1, 2, \dots, n,$
and accordingly the PRESS statistic is defined as follows:
$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2$
• R² for prediction based on the PRESS statistic:
$R^2_{\mathrm{Prediction}} = 1 - \frac{\mathrm{PRESS}}{SS_T}$
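As a sketch of the computation (on assumed simulated data, not the textbook example), the PRESS residuals can be obtained from a single fit via the hat diagonal rather than by n refits:

```python
# A minimal sketch of the PRESS statistic and prediction R^2 via the hat
# matrix diagonal, assuming hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, n)
y = 3 + 2 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

press_resid = e / (1 - np.diag(H))   # e_(i) = e_i / (1 - h_ii)
PRESS = np.sum(press_resid ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2_pred = 1 - PRESS / SST
print(PRESS, R2_pred)
```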
Ch4: PRESS Statistic: Interpretation with an example
Ch4: PRESS Statistic: Interpretation with an example
y is regressed on x1:
$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2 = 733.55$
y is regressed on x1, x2:
$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2 = 459, \qquad \text{with } e_{(9)}^2 = 218$
$R^2_{\mathrm{Prediction}} = 1 - \frac{459}{5784} = 0.9209, \qquad R^2 = 1 - \frac{233.73}{5784} = 0.9596$
So, the model including both x1 and x2 is better than the model in which only x1 is included.
Ch4: Detection and Treatment of Outliers: Tools and methods
Recall that an outlier is an extreme observation: one that is considerably different from the majority of the data.
Detection tools:
• residuals;
• scaled residuals;
• statistical tests.
Outliers can be categorized as:
• bad values, occurring as a result of unusual but explainable events, such as faulty measurement or analysis, incorrect recording of data, or failure of a measuring instrument;
• normally observed values, such as leverage and influential observations.
Treatment:
• Remove bad values.
• Follow up the analysis of outliers; this may help us to improve the process or result in new knowledge concerning factors whose effect on the response was previously unknown.
• The effect of outliers may be checked easily by dropping these points and refitting the regression equation.
Ch4: Detection and Treatment of Outliers: Example 1 (Rocket data)
Quantity   | Obs 5 & 6 in | Obs 5 & 6 out
Intercept  | 2627.82      | 2658.97
Slope      | -37.15       | -37.69
R²         | 0.9018       | 0.9578
MS_Res     | 9244.59      | 3964.63
Ch4: Detection and Treatment of Outliers: Example 2
Country Cigarette Deaths
Australia 480 180
Canada 500 150
Denmark 380 170
Finland 1100 350
UK 1100 460
Iceland 230 60
Netherlands 490 240
Norway 250 90
Sweden 300 110
Switzerland 510 250
USA 1300 200
Regression with all the data: the regression equation is ŷ = 67.6 + 0.228 x, with R² = 54.4%.
Regression without the USA: the regression equation is ŷ = 9.1 + 0.369 x, with R² = 88.9%.
Ch4: Lack of fit of the regression model: What is meant?
"All models are wrong; some models are useful." (George Box)
In the simple linear regression model, if we have n distinct data points we can always fit a polynomial of order up to n − 1. In the process, what we claim to be random error is actually a systematic departure resulting from not fitting enough terms.
[Figure: scatterplots of y against x]
• Perfect linear fitting is always possible when we have two distinct points.
• Perfect linear fitting is not possible in general when we have three (or more) distinct points.
Ch4: Lack of fit of the regression model: A formal test
This test assumes that the normality, independence, and constant-variance requirements are met, and that only the first-order or straight-line character of the relationship is in doubt. To do this test, we have to replicate observations on the response y for at least one level of x. These new data can provide a model-independent estimate of σ².
[Figure: scatterplot of y against x where a straight-line fit is not satisfactory]
Ch4: Lack of fit of the regression model: A formal test
Let $y_{ij}$ denote the jth observation on the response at $x_i$, $j = 1, \dots, n_i$, $i = 1, \dots, m$. So we have $n = \sum_{i=1}^{m} n_i$ observations, and we can write
$Y = (y_{11}, \dots, y_{1n_1},\ y_{21}, \dots, y_{2n_2},\ \dots,\ y_{m1}, \dots, y_{mn_m})', \qquad X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix},$
for the model
$y_{ij} = \beta_0 + \beta_1 x_i + \varepsilon_{ij}, \qquad j = 1, \dots, n_i, \quad i = 1, \dots, m.$
Considering a linear regression:
The least-squares fit minimizes $\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \beta_0 - \beta_1 x_i)^2$ over $(\beta_0, \beta_1)$, giving
$\hat{y}_{ij} = \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad j = 1, \dots, n_i, \quad i = 1, \dots, m;$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}; \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_i - \bar{x})\,y_{ij}}{S_{xx}}; \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{m} n_i x_i.$
Ch4: Lack of fit of the regression model: A formal test
We have:
$e_{ij} = y_{ij} - \hat{y}_{ij} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$
Accordingly, we get:
$SS_{\mathrm{Res}} = \sum_{i=1}^{m}\sum_{j=1}^{n_i} e_{ij}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)\right]^2$
$= \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{y}_i - \hat{y}_i)^2 + 2\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)(\bar{y}_i - \hat{y}_i),$
where the cross-product term is 0, since $\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i) = 0$ for each i.
Ch4: Lack of fit of the regression model: A formal test
Accordingly:
$SS_{\mathrm{Res}} = SS_{\mathrm{PE}} + SS_{\mathrm{LOF}}, \qquad SS_{\mathrm{PE}} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2, \qquad SS_{\mathrm{LOF}} = \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$
• If the assumption of constant variance is satisfied, $SS_{\mathrm{PE}}$ is a model-independent measure of pure error.
• The degrees of freedom for $SS_{\mathrm{PE}}$ are $\sum_{i=1}^{m}(n_i - 1) = n - m$, where $n = \sum_{i=1}^{m} n_i$.
• Note also that $SS_{\mathrm{PE}} = \sum_{i=1}^{m}(n_i - 1)S_i^2$, where $S_i^2$ is the variance of the response at level $x_i$.
• If the fitted values $\hat{y}_i$ are close to the corresponding average responses $\bar{y}_i$, there is a strong indication that the regression function is linear.
• Note that
$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}; \qquad \hat{y}_i = \bar{y} + \hat{\beta}_1(x_i - \bar{x}); \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_i - \bar{x})\,y_{ij}}{S_{xx}}.$
Ch4: Lack of fit of the regression model: A formal test
It is well known that $E(S_i^2) = \sigma^2$, and so we get:
$E(SS_{\mathrm{PE}}) = \sum_{i=1}^{m}(n_i - 1)E(S_i^2) = (n - m)\sigma^2$
But for $SS_{\mathrm{LOF}}$ we have:
$E(SS_{\mathrm{LOF}}) = \sum_{i=1}^{m} n_i\,E(\bar{y}_i - \hat{y}_i)^2 = \sum_{i=1}^{m} n_i\operatorname{var}(\bar{y}_i - \hat{y}_i) + \sum_{i=1}^{m} n_i\left[E(y_i) - \beta_0 - \beta_1 x_i\right]^2$
$= (m - 2)\sigma^2 + \sum_{i=1}^{m} n_i\left[E(y_i) - \beta_0 - \beta_1 x_i\right]^2,$
using
$\operatorname{var}(\hat{y}_i) = \sigma^2\left(\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right); \qquad \operatorname{var}(\bar{y}_i) = \frac{\sigma^2}{n_i}; \qquad \operatorname{cov}(\bar{y}_i, \hat{y}_i) = \operatorname{var}(\hat{y}_i),$
so that
$\sum_{i=1}^{m} n_i\operatorname{var}(\bar{y}_i - \hat{y}_i) = \sum_{i=1}^{m} n_i\left[\operatorname{var}(\bar{y}_i) - \operatorname{var}(\hat{y}_i)\right] = m\sigma^2 - \sigma^2\left(\frac{\sum_i n_i}{n} + \frac{\sum_i n_i(x_i - \bar{x})^2}{S_{xx}}\right) = (m - 2)\sigma^2.$
Ch4: Lack of fit of the regression model: A formal test
An unbiased estimate of the variance can be obtained by:
$MS_{\mathrm{PE}} = \frac{SS_{\mathrm{PE}}}{n - m}, \qquad E(MS_{\mathrm{PE}}) = \sigma^2.$
Moreover, we have:
$E(MS_{\mathrm{LOF}}) = E\left(\frac{SS_{\mathrm{LOF}}}{m - 2}\right) = \sigma^2 + \frac{\sum_{i=1}^{m} n_i\left[E(y_i) - \beta_0 - \beta_1 x_i\right]^2}{m - 2}.$
So, the ratio
$F_0 = \frac{MS_{\mathrm{LOF}}}{MS_{\mathrm{PE}}}$
can be used as a statistic for testing the linearity assumption in the linear regression model. Under the null hypothesis, F0 follows an $F_{m-2,\,n-m}$ distribution, and therefore we conclude that the regression function is not linear if $F_0 > F_{m-2,\,n-m,\,1-\alpha}$.
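A minimal sketch of the whole test, on hypothetical data with replicate x levels (scipy is assumed available for the F p-value):

```python
# A minimal sketch of the lack-of-fit F test for simple linear regression,
# assuming replicate observations at some x levels (data are hypothetical).
import numpy as np
from scipy import stats

x = np.array([1.0, 1.0, 2.0, 3.3, 3.3, 4.0, 4.0, 4.0, 5.6, 6.0])
y = np.array([2.3, 1.8, 2.8, 1.8, 3.7, 2.6, 2.6, 2.2, 3.2, 2.0])

X = np.column_stack([np.ones(len(x)), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ss_res = np.sum((y - X @ beta) ** 2)

# Pure error: pooled within-level variation around the level means.
levels = np.unique(x)
ss_pe = sum(np.sum((y[x == v] - y[x == v].mean()) ** 2) for v in levels)
ss_lof = ss_res - ss_pe

n, m = len(x), len(levels)
F0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))
p = stats.f.sf(F0, m - 2, n - m)
print(F0, p)  # reject linearity when F0 exceeds the F(m-2, n-m) critical value
```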
Ch4: Lack of fit of the regression model: Limitations and solutions
Limitations:
• Ideally, we find that the F ratio for lack of fit is not significant and the hypothesis of significance of regression is rejected.
• Unfortunately, this does not guarantee that the model will be satisfactory as a prediction equation: the model may have been fitted to error only.
Solutions:
• The regression model is likely to be useful as a predictor when the F ratio is at least four or five times the critical value from the F table.
• Compare the range of the fitted values to their average standard error. To do this we can use the following measure for the average standard error:
$\overline{\operatorname{var}}(\hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\widehat{\operatorname{var}}(\hat{y}_i) = \frac{(k+1)\hat{\sigma}^2}{n},$
where $\hat{\sigma}^2$ is a model-independent estimate of the error variance.
Ch4: Lack of fit of the regression model: Multiple version
Repeat observations do not often occur in multiple regression. One solution is to search for points in x space that are near neighbors, that is, sets of observations that have been taken at nearly identical levels of x1, x2, ..., xk.
As a measure of the distance between any two points, for example $x_i = (x_{i1}, \dots, x_{ik})$ and $x_{i'} = (x_{i'1}, \dots, x_{i'k})$, we will use the weighted sum of squared distance (WSSD):
$D_{ii'}^2 = \sum_{j=1}^{k}\left[\frac{\hat{\beta}_j\,(x_{ij} - x_{i'j})}{\sqrt{MS_{\mathrm{Res}}}}\right]^2$
Pairs of points that have small $D_{ii'}^2$ are near neighbors. The residuals at two points with a small value of $D_{ii'}^2$ can be used to obtain an estimate of pure error.
Ch4: Lack of fit of the regression model: Multiple version
There is a relationship between the range of a sample from a normal population and the population standard deviation. For samples of size 2, this relationship is (Exercise):
$\hat{\sigma} = \frac{E}{1.128} = 0.886\,E, \qquad E = |e_i - e_{i'}|.$
An algorithm for sample sizes greater than 2 is:
• First arrange the data points xi1, xi2, ..., xik in order of increasing $\hat{y}_i$.
• Compute the values of $D_{ii'}^2$ for all n − 1 pairs of points with adjacent values of $\hat{y}_i$. Repeat this calculation for the pairs of points separated by one, two, and three intermediate $\hat{y}$ values. This will produce 4n − 10 values of $D_{ii'}^2$.
• Arrange these values in ascending order. Let $E_u$, for u = 1, ..., 4n − 10, denote the range of the residuals at each pair of points, and calculate an estimate of the standard deviation of pure error by
$\hat{\sigma} = \frac{0.886}{m}\sum_{u=1}^{m} E_u,$
where $E_1, E_2, \dots, E_m$ are the ranges associated with the m smallest values of $D_{ii'}^2$.
Chapter 5
Methods to Correct
Model Inadequacy
Ch 5: Transformation and Weighting
The main assumptions in the model $Y = X\beta + \varepsilon$ are:
• $E(\varepsilon) = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2 I$;
• $\varepsilon \sim N(0, \sigma^2 I)$;
• the form of $X\beta$ used in the model is correct.
We use residual analysis to detect violations of these basic assumptions. In this chapter, we focus on methods and procedures for building regression models when some of the above assumptions are violated.
Ch5: Transformation and Weighting: Problems
Problems:
• The error variance is not constant: parameter estimators remain unbiased, but they are no longer BLUE.
• The relationship between y and the regressors is not linear.
Solutions:
• Transformation: use transformed data to stabilize the variance.
• Weighting: use weighted least squares.
Ch5: Transformation: Stabilizing variance
Let $\operatorname{Var}(Y) = c^2\,[E(Y)]^h$. For a smooth transformation $y' = g(y)$, a first-order Taylor expansion about $E(Y)$ gives
$\operatorname{Var}\big(g(Y)\big) \approx \left[g'(E(Y))\right]^2\operatorname{Var}(Y) = c^2\left[g'(E(Y))\right]^2\left[E(Y)\right]^h,$
so the choice $g(y) = y^{1 - h/2}$ makes the variance approximately constant.
Example 1: Poisson data, $\operatorname{Var}(Y_i) = E(Y_i)$, so h = 1 and $y' = y^{1/2}$ stabilizes the variance.
Example 2: Inverse-Gaussian data, $\operatorname{Var}(Y_i) = E^3(Y_i)$, so h = 3 and $y' = y^{-1/2}$ stabilizes the variance.
Ch5: Transformation: Stabilizing variance
Relationship of σ² to E(Y)   | Transformation
σ² ∝ constant                | Y' = Y (no transformation)
σ² ∝ E(Y)                    | Y' = Y^(1/2) (square root; Poisson data)
σ² ∝ E(Y)[1 − E(Y)]          | Y' = arcsin(Y^(1/2)) (binomial data)
σ² ∝ [E(Y)]²                 | Y' = log(Y) (gamma distribution)
σ² ∝ [E(Y)]³                 | Y' = Y^(−1/2) (inverse Gaussian)
σ² ∝ [E(Y)]⁴                 | Y' = Y^(−1)
Ch5: Transformation for stabilizing variance: Limitations
Note that the predicted values are in the transformed scale, so:
Applying the inverse transformation directly to the predicted values
gives an estimate of the median of the distribution of the response
instead of the mean.
Confidence or prediction intervals may be directly converted from one
metric to another. However, there is no assurance that the resulting
intervals in the original units are the shortest possible intervals
Ch5: Transformation: Linearizing model
The linearity assumption is the usual starting point in regression analysis.
Occasionally we find that this assumption is inappropriate.
Nonlinearity may be detected via the
lack-of-fit test,
from scatter diagrams, the matrix of scatterplots,
residual plots such as the partial regression plot,
Prior experience or theoretical considerations .
In some cases a nonlinear function can be linearized by using a suitable
transformation. Such nonlinear models are called intrinsically or
transformably linear.
Ch5: Transformation: Linearizing model
Example 1: $y = \beta_0 e^{\beta_1 x}\varepsilon$.
This function is intrinsically linear, since it can be transformed to a straight line by a logarithmic transformation:
$\log y = \log\beta_0 + \beta_1 x + \log\varepsilon, \qquad \text{i.e.,} \qquad y' = \beta_0' + \beta_1 x + \varepsilon'.$
Example 2: $y = \beta_0 + \beta_1\left(\frac{1}{x}\right) + \varepsilon$.
This function can be linearized by using the reciprocal transformation x' = 1/x:
$y = \beta_0 + \beta_1 x' + \varepsilon.$
Ch5: Transformation: Linearizing model
When transformations such as those described above are employed, the least-squares estimator has least-squares properties with respect to the transformed data, not the original data.

Linearizable function     | Transformation
Y = β0 exp(β1 x)          | Y' = log(Y)
Y = β0 x^β1               | Y' = log(Y) and x' = log(x)
Y = β0 + β1 log(x)        | x' = log(x)
Y = x/(β0 x − β1)         | Y' = 1/Y and x' = 1/x
Ch5: Transformation: Analytical method
Transformation on Y: a power transformation, via the Box-Cox method:
$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, y_G^{\lambda - 1}}, & \lambda \neq 0 \\[4pt] y_G \log(y), & \lambda = 0 \end{cases}$
where $y_G$ is the geometric mean of the observations. Then fit the model
$Y^{(\lambda)} = X\beta + \varepsilon.$
The maximum-likelihood estimate of λ corresponds to the value of λ for which the residual sum of squares from the fitted model, $SS_{\mathrm{Res}}(\lambda)$, is a minimum.
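A minimal sketch of this procedure on hypothetical data: for each λ on a grid, transform y with the scaled Box-Cox formula, fit by least squares, and record $SS_{\mathrm{Res}}(\lambda)$:

```python
# A minimal sketch of choosing the Box-Cox lambda by minimizing SS_Res(lambda)
# for the scaled transformation y^(lambda); data are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(1, 10, n)
y = np.exp(0.3 + 0.2 * x + rng.normal(0, 0.1, n))  # log-scale model, so lambda ~ 0

X = np.column_stack([np.ones(n), x])
y_g = np.exp(np.mean(np.log(y)))  # geometric mean of the observations

def ss_res(lam):
    if abs(lam) < 1e-8:
        z = y_g * np.log(y)
    else:
        z = (y ** lam - 1) / (lam * y_g ** (lam - 1))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return np.sum((z - X @ beta) ** 2)

lams = np.linspace(-2, 2, 81)
profile = [ss_res(l) for l in lams]
print(lams[int(np.argmin(profile))])  # lambda with the smallest SS_Res(lambda)
```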
Ch5: Transformation: Analytical method
Obtaining a suitable value of λ is easy: plot $SS_{\mathrm{Res}}(\lambda)$ against λ for a grid of possible values of λ (usually between −3 and +3) and pick the minimizer.
[Figure: plot of SS_Res(λ) against λ]
Ch5: Transformation: Analytical method with regressors
Suppose that the relationship between y and one or more of the regressor variables is nonlinear, but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied.
Assume $E(Y) = \beta_0 + \beta_1 Z$, where
$Z = \begin{cases} x^{\alpha}, & \alpha \neq 0 \\ \log(x), & \alpha = 0 \end{cases}$
Assuming $\alpha \neq 0$, we expand about $\alpha_0$ in a Taylor series and ignore terms of higher than first order:
$E(Y) = \beta_0 + \beta_1 x^{\alpha_0} + (\alpha - \alpha_0)\,\beta_1 x^{\alpha_0}\log(x) = \gamma_0 + \gamma_1 x_1 + \gamma_2 x_2,$
where $\gamma_0 = \beta_0$, $\gamma_1 = \beta_1$, $\gamma_2 = (\alpha - \alpha_0)\beta_1$, $x_1 = x^{\alpha_0}$, and $x_2 = x^{\alpha_0}\log(x)$.
Ch5: Transformation: Analytical method with regressors
Now use the following algorithm (Box-Tidwell, 1962):
1. Fit the model $E(Y) = \beta_0 + \beta_1 x$ and find the least-squares estimates of $\beta_0$ and $\beta_1$.
2. Fit the model $E(Y) = \beta_0 + \beta_1 x + \gamma_2\,x\log(x)$ and find the least-squares estimates of $\beta_0$, $\beta_1$, and $\gamma_2$.
3. Apply the equality $\hat{\gamma}_2 = (\hat{\alpha}_i - \hat{\alpha}_{i-1})\hat{\beta}_1$ to provide an updated estimate
$\hat{\alpha}_i = \frac{\hat{\gamma}_2}{\hat{\beta}_1} + \hat{\alpha}_{i-1}.$
4. Set $x = x^{\hat{\alpha}_i}$ and repeat steps 1-3 again.
5. Apply the above algorithm until the difference between $\hat{\alpha}_i$ and $\hat{\alpha}_{i-1}$ is small.
(The index i counts the repeats of the algorithm, and $\hat{\alpha}_0 = 1$.)
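A minimal sketch of this iteration on hypothetical data (the true power is α = −1, so the loop should settle near −1):

```python
# A minimal sketch of the Box-Tidwell iteration for one regressor,
# assuming x > 0 and hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(0.5, 5, n)
y = 3 - 7 / x + rng.normal(0, 0.2, n)  # true power alpha = -1

def ols(cols, y):
    Z = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

alpha = 1.0
for _ in range(10):
    w = x ** alpha
    b0, b1 = ols([w], y)                 # step 1: E(Y) = b0 + b1*w
    g = ols([w, w * np.log(w)], y)[2]    # step 2: coefficient of w*log(w)
    new_alpha = g / b1 + alpha           # step 3: update the power
    if abs(new_alpha - alpha) < 1e-4:    # step 5: stop when the change is small
        alpha = new_alpha
        break
    alpha = new_alpha                    # step 4: transform x and repeat
print(alpha)  # close to -1
```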
Ch5: Transformation: Analytical method with regressors
Example:
Iteration 1 ($\hat{\alpha}_0 = 1$):
$\hat{y} = 0.1309 + 0.2411\,x$
$\hat{y} = -2.4168 + 1.5344\,x - 0.462\,x\log(x)$
$\hat{\alpha}_1 = -0.462/0.2411 + 1 = -0.92$
Iteration 2 ($x' = x^{-0.92}$):
$\hat{y} = 3.1039 - 6.6874\,x'$
$\hat{y} = 3.2409 - 6.445\,x' + 0.5994\,x'\log(x')$
$\hat{\alpha}_2 = 0.5994/(-6.6874) + (-0.92) = -1.01$
Ch5: Transformation: Analytical method with regressors
Example:
Model 1 (blue, solid line in the graph):
$\hat{y} = 0.1309 + 0.2411\,x, \qquad R^2 = 0.87$
Model 2 (red, dotted line in the graph):
$\hat{y} = 2.9650 - 6.9693/x, \qquad R^2 = 0.980$
Ch5: Generalized least squares: Covariance matrix is nonsingular
Consider the model $Y = X\beta + \varepsilon$ with the following assumptions:
$E(\varepsilon) = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 V,$
where V is a nonsingular square matrix.
We will approach this problem by transforming the model to a new set of observations that satisfy the standard least-squares assumptions.
Then we will use ordinary least squares on the transformed data.
Ch5: Generalized least squares: Covariance matrix is nonsingular
Since V is nonsingular and positive definite, we can write
$V = K'K = KK,$
where K is a nonsingular, symmetric square matrix.
Define the new variables:
$Z = K^{-1}Y; \qquad B = K^{-1}X; \qquad g = K^{-1}\varepsilon.$
Multiplying both sides of the original regression model by $K^{-1}$ gives:
$Z = B\beta + g.$
This new transformed model has the following properties:
$E(g) = K^{-1}E(\varepsilon) = 0;$
$\operatorname{Var}(g) = E\{[g - E(g)][g - E(g)]'\} = E(gg') = K^{-1}E(\varepsilon\varepsilon')K^{-1} = \sigma^2 K^{-1}VK^{-1} = \sigma^2 K^{-1}KKK^{-1} = \sigma^2 I.$
Ch5: Generalized least squares: Covariance matrix is nonsingular
So, in this transformed model, the error term g has zero mean and constant variance and is uncorrelated. In this model:
$\hat{\beta} = (B'B)^{-1}B'Z = \left[(K^{-1}X)'(K^{-1}X)\right]^{-1}(K^{-1}X)'(K^{-1}Y) = (X'K^{-1}K^{-1}X)^{-1}X'K^{-1}K^{-1}Y = (X'V^{-1}X)^{-1}X'V^{-1}Y$
This estimator is called the generalized least-squares estimator of β. We easily have:
$E(\hat{\beta}) = (X'V^{-1}X)^{-1}X'V^{-1}E(Y) = (X'V^{-1}X)^{-1}X'V^{-1}X\beta = \beta$
$\operatorname{var}(\hat{\beta}) = \sigma^2(B'B)^{-1} = \sigma^2(X'V^{-1}X)^{-1}$
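A minimal sketch of the estimator $(X'V^{-1}X)^{-1}X'V^{-1}Y$ on hypothetical data with a known diagonal V:

```python
# A minimal sketch of the GLS estimator beta_hat = (X'V^-1 X)^-1 X'V^-1 y,
# with a hypothetical known diagonal V (unequal error variances).
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
v = 0.5 + 0.3 * x                       # assumed variance pattern: Var(eps_i) ~ v_i
y = 1 + 2 * x + rng.normal(0, np.sqrt(v))

V_inv = np.diag(1 / v)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_gls, beta_ols)  # both unbiased; GLS has the smaller variance
```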
Ch5: Generalized least squares: Covariance matrix is diagonal
When the errors ε are uncorrelated but have unequal variances, the covariance matrix of ε is
$\sigma^2 V = \sigma^2\operatorname{diag}\left(\frac{1}{w_1}, \frac{1}{w_2}, \dots, \frac{1}{w_n}\right),$
and the estimation procedure is usually called weighted least squares. Let $W = V^{-1} = \operatorname{diag}(w_1, \dots, w_n)$. Then we have:
$\hat{\beta} = (X'WX)^{-1}X'WY,$
which is called the weighted least-squares estimator. Note that observations with large variances will have smaller weights than observations with small variances.
Ch5: Generalized least squares: Covariance matrix is diagonal
For the case of simple linear regression, the weighted least-squares function is
$S(\beta_0, \beta_1) = \sum_{i=1}^{n} w_i\,(y_i - \beta_0 - \beta_1 x_i)^2.$
Taking derivatives with respect to β0 and β1, the resulting least-squares normal equations become:
$\hat{\beta}_0\sum_{i=1}^{n} w_i + \hat{\beta}_1\sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i$
$\hat{\beta}_0\sum_{i=1}^{n} w_i x_i + \hat{\beta}_1\sum_{i=1}^{n} w_i x_i^2 = \sum_{i=1}^{n} w_i x_i y_i$
Exercise: Show that the solutions of the above system coincide with the general formula stated on the previous page.
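A minimal sketch solving these two normal equations directly, with hypothetical weights $w_i$ assumed known:

```python
# A minimal sketch of simple-regression weighted least squares via the
# normal equations above; weights w_i are hypothetical (inverse variances).
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.uniform(0, 10, n)
w = 1 / (1 + x)                         # assumed known weights, w_i = 1/Var_i
y = 2 + 0.5 * x + rng.normal(0, np.sqrt(1 + x))

# Solve the 2x2 weighted normal equations for (b0, b1).
A = np.array([[w.sum(), (w * x).sum()],
              [(w * x).sum(), (w * x * x).sum()]])
b = np.array([(w * y).sum(), (w * x * y).sum()])
b0, b1 = np.linalg.solve(A, b)
print(b0, b1)
```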
Chapter 6
Diagnostics for
Leverage and
Influence
Ch6: Diagnostics for Leverage and influence
In this chapter, we present several diagnostics for leverage and influence.
[Figure: scatterplots of y against x showing two kinds of unusual points]
• An influential point has a noticeable impact on the model coefficients.
• Another kind of point does not affect the estimates of the regression coefficients, but it has a dramatic effect on the model summary statistics, such as R² and the standard errors of the regression coefficients.
Ch6: Diagnostics for Leverage and influence: Importance
A regression coefficient may have a sign that does not make
engineering or scientific sense,
A regressor known to be important may be statistically insignificant,
A model that fits the data well and that is logical from an
application–environment perspective may produce poor predictions.
These situations may be the result of one or perhaps a few influential
observations. Finding these observations then can shed considerable
light on the problems with the model.
Ch6: Diagnostics for Leverage and influence: Leverage
The basic measure is the hat matrix. The hat matrix diagonal is a standardized measure of the distance of the ith observation from the center (or centroid) of the x space. Thus, large hat diagonals reveal observations that are leverage points because they are remote in x space from the rest of the sample.
Rule: if $h_{ii} > 2\bar{h} = 2(K+1)/n$, then the ith observation is a leverage point.
Two problems with this rule:
• if 2(K+1) > n, the cutoff does not apply;
• leverage points are only potentially influential.
Ch6: Diagnostics for Leverage and influence: Measures of influence

Measure  | Formula | Rules
Cook's D | $D_i = \dfrac{(\hat{\beta}_{(i)} - \hat{\beta})'X'X(\hat{\beta}_{(i)} - \hat{\beta})}{(K+1)\,MS_{\mathrm{Res}}}$ | $D_i$ is not an F statistic, but in practice it can be compared with $F_{\alpha,\,K+1,\,n-K-1}$; we consider points with $D_i > 1$ to be influential.
DFBETAS  | $\mathrm{DFBETAS}_{j,i} = \dfrac{\hat{\beta}_j - \hat{\beta}_{j(i)}}{\sqrt{S_{(i)}^2\,C_{jj}}}$ | If $|\mathrm{DFBETAS}_{j,i}| > 2/\sqrt{n}$, the ith observation warrants examination.
DFFITS   | $\mathrm{DFFITS}_i = \dfrac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S_{(i)}^2\,h_{ii}}}$ | If $|\mathrm{DFFITS}_i| > 2\sqrt{(K+1)/n}$, the ith observation warrants attention.
Ch6: Diagnostics for Leverage and influence: Cook's D
There are several equivalent formulas (Exercise):
$D_i = \frac{r_i^2}{K+1}\cdot\frac{\operatorname{var}(\hat{y}_i)}{\operatorname{var}(e_i)} = \frac{r_i^2}{K+1}\cdot\frac{h_{ii}}{1 - h_{ii}}$
• The first factor, $r_i^2$, is big if case i is unusual in the y direction; the second factor, $h_{ii}/(1 - h_{ii})$, is big if case i is unusual in the x direction.
$D_i = \frac{(\hat{y}_{(i)} - \hat{y})'(\hat{y}_{(i)} - \hat{y})}{(K+1)\,MS_{\mathrm{Res}}}$
• This is the squared Euclidean distance that the vector of fitted values moves when the ith observation is deleted.
Ch6: Diagnostics for Leverage and influence: DFBETAS
There is an interesting computational formula for DFBETAS. Let $R = (X'X)^{-1}X'$ and let $r_j' = [r_{j,1}, r_{j,2}, \dots, r_{j,n}]$ denote the jth row of R. Then we can write (Exercise):
$\mathrm{DFBETAS}_{j,i} = \frac{r_{j,i}}{\sqrt{r_j'r_j}}\cdot\frac{t_i}{\sqrt{1 - h_{ii}}}$
• The first factor is a measure of the impact of the ith observation on $\hat{\beta}_j$;
• the second factor is big if case i is unusual in both the x and y directions.
Ch6: Diagnostics for Leverage and influence: DFFITS
DFFITS_i is the number of standard deviations that the fitted value $\hat{y}_i$ changes if observation i is removed. Computationally we may find (Exercise):
$\mathrm{DFFITS}_i = t_i\sqrt{\frac{h_{ii}}{1 - h_{ii}}}$
where $h_{ii}/(1 - h_{ii})$ is the leverage of the ith observation, and $t_i$ (R-student) is big if case i is an outlier. Note, however, that if $h_{ii} \approx 0$, the effect of R-student will be moderated; similarly, a near-zero R-student combined with a high-leverage point could produce a small value of DFFITS.
Ch6: Diagnostics for Leverage and influence: A measure of model performance
The diagnostics $D_i$, $\mathrm{DFBETAS}_{j,i}$, and $\mathrm{DFFITS}_i$ provide insight about the effect of observations on the estimated coefficients $\hat{\beta}_j$ and fitted values $\hat{y}_i$. They do not provide any information about the overall precision of estimation.
Since it is fairly common practice to use the determinant of the covariance matrix as a convenient scalar measure of precision, called the generalized variance, we could define the generalized variance of $\hat{\beta}$ as
$GV(\hat{\beta}) = \left|\operatorname{var}(\hat{\beta})\right| = \left|\sigma^2(X'X)^{-1}\right|.$
Ch6: Diagnostics for Leverage and influence: A measure of model performance
To express the role of the ith observation on the precision of estimation, we could define
$\mathrm{COVRATIO}_i = \frac{\left|(X_{(i)}'X_{(i)})^{-1}S_{(i)}^2\right|}{\left|(X'X)^{-1}MS_{\mathrm{Res}}\right|}, \qquad i = 1, 2, \dots, n.$
Clearly if $\mathrm{COVRATIO}_i > 1$, the ith observation improves the precision of estimation, while if $\mathrm{COVRATIO}_i < 1$, inclusion of the ith point degrades precision. Computationally (Exercise):
$\mathrm{COVRATIO}_i = \frac{\left(S_{(i)}^2\right)^{K+1}}{MS_{\mathrm{Res}}^{\,K+1}}\left(\frac{1}{1 - h_{ii}}\right)$
A cutoff value for COVRATIO is not easy to give, but researchers suggest that if $\mathrm{COVRATIO}_i > 1 + 3(K+1)/n$ or $\mathrm{COVRATIO}_i < 1 - 3(K+1)/n$, then the ith point should be considered influential.
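A minimal sketch computing all four diagnostics of this chapter from a single fit, on hypothetical data; the deleted variance $S_{(i)}^2$ is obtained from the standard closed form rather than by refitting:

```python
# A minimal sketch computing Cook's D, DFFITS, DFBETAS, and COVRATIO from
# first principles on hypothetical data (p = K + 1 parameters).
import numpy as np

rng = np.random.default_rng(6)
n, K = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, n)

p = K + 1
XtX_inv = np.linalg.inv(X.T @ X)
h = np.diag(X @ XtX_inv @ X.T)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
ms_res = e @ e / (n - p)

s2_i = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)  # deleted variance S_(i)^2
t = e / np.sqrt(s2_i * (1 - h))                           # R-student
r = e / np.sqrt(ms_res * (1 - h))                         # studentized residual

cooks_d = r**2 / p * h / (1 - h)
dffits = t * np.sqrt(h / (1 - h))
R = XtX_inv @ X.T                                         # R = (X'X)^-1 X'
dfbetas = (R * (t / np.sqrt(1 - h))) / np.sqrt((R**2).sum(axis=1, keepdims=True))
covratio = (s2_i / ms_res) ** p / (1 - h)

print(np.where(cooks_d > 1), np.where(np.abs(dffits) > 2 * np.sqrt(p / n)))
```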
Chapter 7
Polynomial
Regression Models
Ch 7: Polynomial regression models
Polynomial regression is a subclass of multiple regression.
Example 1: the second-order polynomial in one variable,
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
Example 2: the second-order polynomial in two variables,
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \varepsilon$
• Polynomials are widely used in situations where the response is curvilinear.
• Complex nonlinear relationships can be adequately modeled by polynomials over reasonably small ranges of the x's.
This chapter will survey several problems and issues associated with fitting polynomials.
Ch 7: Polynomial regression models: in one variable
In general, the kth-order polynomial model in one variable is
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon.$
If we set $x_j = x^j$, j = 1, 2, ..., k, then the above model becomes a multiple linear regression model in the k regressors x1, x2, ..., xk. Thus, a polynomial model of order k may be fitted using the techniques studied previously.
Let $E(Y \mid X = x) = g(x)$ be an unknown function. Using a Taylor series expansion about a point a:
$Y = g(x) + \varepsilon \approx \sum_{m=0}^{k}\frac{g^{(m)}(a)}{m!}(x - a)^m + \varepsilon.$
So, polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
Ch 7: Polynomial regression models: in one variable
Example (second-order or quadratic model):
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
We often call β1 the linear effect parameter and β2 the quadratic effect parameter.
The parameter β0 is the mean of y when x = 0, if the range of the data includes x = 0; otherwise β0 has no physical interpretation.
[Figure: numerical example, plot of E(Y) = 5 − 2x − 0.25x² against x]
Ch 7: Polynomial regression models: Important considerations in fitting these models
1. Order of the model: keep the order of the model as low as possible.
2. Model-building strategy: use forward selection or backward elimination.
3. Extrapolation: extrapolation with polynomial models can be extremely hazardous.
[Figure: plot of E(Y) = 5 + 2x − 0.25x² over the region of the original data and the extrapolation region]
Example:
• If we extrapolate beyond the range of the original data, the predicted response turns downward.
• This may be at odds with the true behavior of the system. In general, a polynomial model may turn in unanticipated and inappropriate directions, both in interpolation and in extrapolation.
Ch 7: Polynomial regression models: Important considerations in fitting these models
4. Ill-conditioning I: the matrix inversion calculations will be inaccurate, and considerable error may be introduced into the parameter estimates. Nonessential ill-conditioning caused by the arbitrary choice of origin can be removed by first centering the regressor variables, as in the sketch below.
5. Ill-conditioning II: if the values of x are limited to a narrow range, there can be significant ill-conditioning or multicollinearity in the columns of the X matrix. For example, if x varies between 1 and 2, x² varies between 1 and 4, which could create strong multicollinearity between x and x².
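A minimal sketch of the centering remedy for item 4, on hypothetical data: compare the conditioning of $X'X$ with raw and centered columns.

```python
# A minimal sketch showing how centering x reduces nonessential
# ill-conditioning in a quadratic fit (hypothetical data).
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(100, 110, 50)          # narrow range far from the origin
y = 5 + 0.3 * x + 0.02 * x**2 + rng.normal(0, 1, 50)

X_raw = np.column_stack([np.ones(50), x, x**2])
xc = x - x.mean()
X_cen = np.column_stack([np.ones(50), xc, xc**2])

# Condition numbers of X'X: centering makes the columns far less collinear.
print(np.linalg.cond(X_raw.T @ X_raw), np.linalg.cond(X_cen.T @ X_cen))
```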
Ch 7: Polynomial regression models: Important considerations in fitting these models
Example: Hardwood Concentration in Pulp and Tensile Strength of Kraft Paper
Fitting: the centered quadratic model $y = \beta_0 + \beta_1(x - \bar{x}) + \beta_2(x - \bar{x})^2 + \varepsilon$ gives
$\hat{y} = 45.295 + 2.546\,(x - 7.2632) - 0.635\,(x - 7.2632)^2$
Testing:
$H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$, by
$F_0 = \frac{SS_R(\beta_2 \mid \beta_1, \beta_0)}{MS_{\mathrm{Res}}} = \frac{SS_R(\beta_1, \beta_2 \mid \beta_0) - SS_R(\beta_1 \mid \beta_0)}{MS_{\mathrm{Res}}} = 105.45 > F_{0.01,1,16} = 8.53$
Diagnostics: residual analysis.
Ch 7: Polynomial regression models: in two or more variables
In general, these models are straightforward extensions of the model with one variable. An example of a second-order model in two variables is:
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \varepsilon,$
where β1, β2 are linear effect parameters, β11, β22 are quadratic effect parameters, and β12 is an interaction effect parameter.
This example has received considerable attention, both from researchers and from practitioners. The regression function of this example is called a response surface.
Response surface methodology (RSM) is widely applied in industry for modeling the output response(s) of a process in terms of the important controllable variables and then finding the operating conditions that optimize the response.
Ch 7: Polynomial regression models: in two or more variables
Example:
Observation Run order Temperature (T) Concentration (C ) conversion
1 4 200 15 43
2 12 250 15 78
3 11 200 25 69
4 5 250 25 73
5 6 189.65 20 48
6 7 260.35 20 76
7 3 225 12.93 65
8 1 225 27.07 74
9 8 225 20 76
10 10 225 20 79
11 9 225 20 83
12 2 225 20 81
x1 x2 y
-1 -1 43
1 -1 78
-1 1 69
1 1 73
-1.414 0 48
1.414 0 76
0 -1.414 65
0 1.414 74
0 0 76
0 0 79
0 0 83
0 0 81
The coded variables are
$x_1 = \frac{T - 225}{25}, \qquad x_2 = \frac{C - 20}{5}.$
Ch 7: Polynomial regression models: in two or more variables
Example: the central composite design is widely used for fitting RSM.
[Figure: central composite design in the (x1, x2) plane, with temperature on the horizontal axis and concentration on the vertical axis]
Runs at:
• Corners of the square: (x1, x2) = (−1, −1), (−1, 1), (1, −1), (1, 1)
• Center of the square: (x1, x2) = (0, 0), four replicates
• Axial points: (x1, x2) = (0, −1.414), (0, 1.414), (−1.414, 0), (1.414, 0)
Ch 7: Polynomial regression models: in two or more variables
We fit the second-order model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \varepsilon.$
To do this, we have:
In the coded variables (columns 1, x1, x2, x1², x2², x1x2),
$X = \begin{pmatrix} 1 & -1 & -1 & 1 & 1 & 1\\ 1 & 1 & -1 & 1 & 1 & -1\\ 1 & -1 & 1 & 1 & 1 & -1\\ 1 & 1 & 1 & 1 & 1 & 1\\ 1 & -1.414 & 0 & 2 & 0 & 0\\ 1 & 1.414 & 0 & 2 & 0 & 0\\ 1 & 0 & -1.414 & 0 & 2 & 0\\ 1 & 0 & 1.414 & 0 & 2 & 0\\ 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad y = \begin{pmatrix}43\\78\\69\\73\\48\\76\\65\\74\\76\\79\\83\\81\end{pmatrix}$
$X'X = \begin{pmatrix} 12 & 0 & 0 & 8 & 8 & 0\\ 0 & 8 & 0 & 0 & 0 & 0\\ 0 & 0 & 8 & 0 & 0 & 0\\ 8 & 0 & 0 & 12 & 4 & 0\\ 8 & 0 & 0 & 4 & 12 & 0\\ 0 & 0 & 0 & 0 & 0 & 4 \end{pmatrix}, \qquad X'y = \begin{pmatrix}845\\78.592\\33.726\\511\\541\\-31\end{pmatrix}$
The normal equations $X'X\hat{\beta} = X'y$ read
$12\hat{\beta}_0 + 8\hat{\beta}_{11} + 8\hat{\beta}_{22} = 845; \qquad 8\hat{\beta}_1 = 78.592; \qquad 8\hat{\beta}_2 = 33.726;$
$8\hat{\beta}_0 + 12\hat{\beta}_{11} + 4\hat{\beta}_{22} = 511; \qquad 8\hat{\beta}_0 + 4\hat{\beta}_{11} + 12\hat{\beta}_{22} = 541; \qquad 4\hat{\beta}_{12} = -31,$
with solution
$\hat{\beta} = (79.75,\ 9.83,\ 4.22,\ -8.88,\ -5.13,\ -7.75)'.$
Ch 7: Polynomial regression models: in two or more variables
So, the fitted model in the coded variables is:
$\hat{y} = 79.75 + 9.83\,x_1 + 4.22\,x_2 - 8.88\,x_1^2 - 5.13\,x_2^2 - 7.75\,x_1x_2$
And in terms of the original data, the model is:
$\hat{y} = 79.75 + 9.83\left(\frac{T-225}{25}\right) + 4.22\left(\frac{C-20}{5}\right) - 8.88\left(\frac{T-225}{25}\right)^2 - 5.13\left(\frac{C-20}{5}\right)^2 - 7.75\left(\frac{T-225}{25}\right)\left(\frac{C-20}{5}\right)$
$= -1105.56 + 8.0242\,T + 22.994\,C - 0.0142\,T^2 - 0.20502\,C^2 - 0.062\,TC$
We use the coded data for computation of the sums of squares:
$\hat{y} = X\hat{\beta} = (43.96,\ 79.11,\ 67.89,\ 72.04,\ 48.11,\ 75.90,\ 63.54,\ 75.46,\ 79.75,\ 79.75,\ 79.75,\ 79.75)'$
$SS_R = \sum_{i=1}^{12}(\hat{y}_i - \bar{y})^2 = 1733.57; \qquad SS_T = \sum_{i=1}^{12}(y_i - \bar{y})^2 = 1768.92$

Source of variation | SS      | D.F. | MS     | F     | P-value
Regression          | 1733.58 | 5    | 346.72 | 58.87 | <0.0001
Residual            | 35.34   | 6    | 5.89   |       |
Total               | 1768.92 | 11   |        |       |
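A minimal sketch reproducing this fit from the coded design above (numpy only); the printed coefficients should agree with $\hat{\beta}$ up to rounding:

```python
# A minimal sketch reproducing the CCD second-order fit above with numpy.
import numpy as np

a = 1.414
x1 = np.array([-1, 1, -1, 1, -a, a, 0, 0, 0, 0, 0, 0])
x2 = np.array([-1, -1, 1, 1, 0, 0, -a, a, 0, 0, 0, 0])
y = np.array([43, 78, 69, 73, 48, 76, 65, 74, 76, 79, 83, 81.0])

X = np.column_stack([np.ones(12), x1, x2, x1**2, x2**2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # approximately [79.75, 9.83, 4.22, -8.88, -5.13, -7.75]
```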
Ch 7: Polynomial regression models: in two or more variables
So, if we fit only the linear model in the coded variables, we have:
Here X has columns 1, x1, x2 only, so
$X'X = \begin{pmatrix}12 & 0 & 0\\ 0 & 8 & 0\\ 0 & 0 & 8\end{pmatrix}, \qquad X'y = \begin{pmatrix}845\\78.592\\33.726\end{pmatrix},$
and the normal equations give
$\hat{\beta}_0 = \frac{845}{12} = 70.42, \qquad \hat{\beta}_1 = \frac{78.592}{8} = 9.83, \qquad \hat{\beta}_2 = \frac{33.726}{8} = 4.22.$
$\hat{y} = X\hat{\beta} = (56.37,\ 76.03,\ 64.81,\ 84.47,\ 56.52,\ 84.32,\ 64.45,\ 76.39,\ 70.42,\ 70.42,\ 70.42,\ 70.42)'$
$SS_R = \sum_{i=1}^{12}(\hat{y}_i - \bar{y})^2 = 914.41; \qquad SS_T = \sum_{i=1}^{12}(y_i - \bar{y})^2 = 1768.92$

Source of variation | SS      | D.F. | MS     | F    | P-value
Regression          | 914.41  | 2    | 457.21 | 4.82 | 0.0377
Residual            | 854.51  | 9    | 94.95  |      |
Total               | 1768.92 | 11   |        |      |
Ch 7: Polynomial regression models: in two or more variables
As the last four rows of the matrix X above (the center runs) are the same, we can divide SS_Res into two components and do a lack-of-fit test. We have:
Source of variation                 | SS      | D.F. | MS     | F      | P-value
Regression                          | 1733.58 | 5    | 346.72 | 58.87  | <0.0001
  SSR(β1, β2 | β0)                  | 914.4   | 2    | 457.2  |        |
  SSR(β11, β22, β12 | β1, β2, β0)   | 819.2   | 3    | 273.1  |        |
Residual                            | 35.34   | 6    | 5.89   |        |
  Lack of fit                       | 8.59    | 3    | 2.83   | 0.3176 | 0.8120
  Pure error                        | 26.75   | 3    | 8.92   |        |
Total                               | 1768.92 | 11   |        |        |

Here $n_i = 1$ for runs i = 1, ..., 8, but the four center runs (76, 79, 83, 81) are replicates with mean $(76 + 79 + 83 + 81)/4 = 79.75$, so
$SS_{\mathrm{PE}} = \sum_{i}(n_i - 1)S_i^2 = (4 - 1)S_9^2 = 26.75; \qquad SS_{\mathrm{LOF}} = SS_{\mathrm{Res}} - SS_{\mathrm{PE}} = 35.34 - 26.75 = 8.59.$
Ch 7: Polynomial regression models: in two or more variables
As the quadratic model is significant for the data, we can do tests on the individual variables to drop unimportant terms, if there are any. We use the following statistic, where the $C_{jj}$ are the diagonal entries of the matrix $(X'X)^{-1}$:
$t_j = \frac{\hat{\beta}_j}{\sqrt{\widehat{\operatorname{var}}(\hat{\beta}_j)}} = \frac{\hat{\beta}_j}{\sqrt{C_{jj}\,MS_{\mathrm{Res}}}}$

Variable  | Estimated coefficient | Standard error | t      | P-value
Intercept | 79.75                 | 1.21           | 65.72  |
x1        | 9.83                  | 0.86           | 11.45  | 0.0001
x2        | 4.22                  | 0.86           | 4.913  | 0.0027
x1²       | -8.88                 | 0.96           | -9.25  | 0.0001
x2²       | -5.13                 | 0.96           | -5.341 | 0.0018
x1x2      | -7.75                 | 1.21           | -6.386 | 0.0007

$(X'X)^{-1} = \begin{pmatrix} \tfrac{1}{4} & 0 & 0 & -\tfrac{1}{8} & -\tfrac{1}{8} & 0\\ 0 & \tfrac{1}{8} & 0 & 0 & 0 & 0\\ 0 & 0 & \tfrac{1}{8} & 0 & 0 & 0\\ -\tfrac{1}{8} & 0 & 0 & \tfrac{5}{32} & \tfrac{1}{32} & 0\\ -\tfrac{1}{8} & 0 & 0 & \tfrac{1}{32} & \tfrac{5}{32} & 0\\ 0 & 0 & 0 & 0 & 0 & \tfrac{1}{4} \end{pmatrix}, \qquad MS_{\mathrm{Res}} = 5.89.$
Ch 7: Polynomial regression models: in two or more variables
Generally we prefer to fit the full quadratic model whenever possible, unless there are large differences between the full and the reduced model in terms of PRESS and adjusted R²:
$\hat{y} = 79.75 + 9.83\,x_1 + 4.22\,x_2 - 8.88\,x_1^2 - 5.13\,x_2^2 - 7.75\,x_1x_2$
Using the equation $h_{ii} = x_i'(X'X)^{-1}x_i$ we have the table below. Note that runs 1 to 8 all have the same $h_{ii}$, as these points are equidistant from the center of the design; the last four runs have $h_{ii} = 0.25$.
x1 x2 y ŷ ei hii ti e[i]
-1 -1 43 43.96 -0.96 0.625 -0.67 -2.55
1 -1 78 79.11 -1.11 0.625 -0.74 -2.95
-1 1 69 67.89 1.11 0.625 0.75 2.96
1 1 73 72.04 0.96 0.625 0.65 2.56
-1.414 0 48 48.11 -0.11 0.625 -0.07 -0.29
1.414 0 76 75.90 0.10 0.625 0.07 0.28
0 -1.414 65 63.54 1.46 0.625 0.98 3.89
0 1.414 74 75.46 -1.46 0.625 0.99 -3.90
0 0 76 79.75 -3.75 0.250 -1.78 -5.00
0 0 79 79.75 -0.75 0.250 -0.36 -1.00
0 0 83 79.75 3.25 0.250 1.55 4.33
0 0 81 79.75 1.25 0.250 0.59 1.67
For example, for run 1, with $x_1' = (1, -1, -1, 1, 1, 1)$,
$h_{11} = x_1'(X'X)^{-1}x_1 = 0.625.$
$R^2 = 0.98, \qquad R^2_{\mathrm{Adj}} = 0.96, \qquad R^2_{\mathrm{Predicted}} = 0.94$
Ch 7: Polynomial regression models: in two or more variables
[Figure: residual plots for the fitted quadratic model]
• Normality holds (from the normal probability plot of the residuals);
• The variance is stable (from the plot of residuals against fitted values);
• Independence holds (from the plot of residuals against run order).
Ch 7: Polynomial regression models: Orthogonal polynomial
Consider the kth-order polynomial model in one variable,
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon.$
Generally the columns of the X matrix will not be orthogonal. One approach to deal with this problem is orthogonal polynomials. In this approach we fit the following model:
$Y = \alpha_0 + \alpha_1 P_1(x) + \alpha_2 P_2(x) + \cdots + \alpha_k P_k(x) + \varepsilon,$
where $P_j(x)$ is a jth-order orthogonal polynomial, defined such that:
$\sum_{i=1}^{n} P_r(x_i)P_s(x_i) = 0, \quad r \neq s; \qquad P_0(x_i) = 1, \quad i = 1, \dots, n.$
Ch 7: Polynomial regression models: Orthogonal polynomial
With this model, since the columns of X are orthogonal,
$X'X = \operatorname{diag}\left(\sum_{i=1}^{n}P_0^2(x_i),\ \sum_{i=1}^{n}P_1^2(x_i),\ \dots,\ \sum_{i=1}^{n}P_k^2(x_i)\right),$
and therefore
$\hat{\alpha}_j = \frac{\sum_{i=1}^{n}P_j(x_i)\,y_i}{\sum_{i=1}^{n}P_j^2(x_i)}, \qquad j = 0, 1, \dots, k.$
The $P_j(x)$ can be determined by the Gram-Schmidt process. In the case where the levels of x are equally spaced with spacing d, we have:
$P_0(x_i) = 1$
$P_1(x_i) = \lambda_1\left(\dfrac{x_i - \bar{x}}{d}\right)$
$P_2(x_i) = \lambda_2\left[\left(\dfrac{x_i - \bar{x}}{d}\right)^2 - \dfrac{n^2 - 1}{12}\right]$
$P_3(x_i) = \lambda_3\left[\left(\dfrac{x_i - \bar{x}}{d}\right)^3 - \left(\dfrac{x_i - \bar{x}}{d}\right)\dfrac{3n^2 - 7}{20}\right]$
$P_4(x_i) = \lambda_4\left[\left(\dfrac{x_i - \bar{x}}{d}\right)^4 - \left(\dfrac{x_i - \bar{x}}{d}\right)^2\dfrac{3n^2 - 13}{14} + \dfrac{3(n^2 - 1)(n^2 - 9)}{560}\right]$
where the $\lambda_j$ are constants chosen so that the polynomials take convenient (integer) values.
Ch 7: Polynomial regression models: Orthogonal polynomial
Gram-Schmidt process: consider an arbitrary set $S = \{U_1, \dots, U_k\}$ and denote by $\langle U_i, U_j\rangle$ the inner product of $U_i$ and $U_j$. Then the set $S' = \{V_1, \dots, V_k\}$ is orthogonal when computed as below:
$V_1 = U_1$
$V_2 = U_2 - \dfrac{\langle U_2, V_1\rangle}{\langle V_1, V_1\rangle}V_1$
$V_3 = U_3 - \dfrac{\langle U_3, V_1\rangle}{\langle V_1, V_1\rangle}V_1 - \dfrac{\langle U_3, V_2\rangle}{\langle V_2, V_2\rangle}V_2$
$\vdots$
$V_k = U_k - \sum_{j=1}^{k-1}\dfrac{\langle U_k, V_j\rangle}{\langle V_j, V_j\rangle}V_j$
Normalizing:
$e_j = \dfrac{V_j}{\sqrt{\langle V_j, V_j\rangle}}, \qquad j = 1, \dots, k.$
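A minimal sketch of this process applied to the power basis $1, x, x^2$ (using the equally spaced x levels of the example two slides below):

```python
# A minimal sketch of the Gram-Schmidt construction of orthogonal
# polynomial columns from the powers 1, x, x^2, ... (x levels as in the example).
import numpy as np

x = np.array([50, 75, 100, 125, 150, 175, 200, 225, 250, 275.0])
k = 2
U = [x**j for j in range(k + 1)]  # U_j = x^(j-1) in the slide's notation

V = []
for u in U:
    v = u.copy()
    for w in V:                    # subtract projections on the earlier V's
        v -= (u @ w) / (w @ w) * w
    V.append(v)

# The columns are now orthogonal: V_r . V_s = 0 for r != s.
print(np.round([V[0] @ V[1], V[0] @ V[2], V[1] @ V[2]], 8))
```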
Ch 7: Polynomial regression models: Orthogonal polynomial
In polynomial regression with one variable, assume that $U_j = x^{j-1}$. Applying the Gram-Schmidt process, we have:
$V_2 = x - \frac{\langle x, \mathbf{1}\rangle}{\langle \mathbf{1}, \mathbf{1}\rangle}\,\mathbf{1} = x - \bar{x}.$
Normalizing:
$P_1(x) = \frac{V_2}{\sqrt{\langle V_2, V_2\rangle}} = \frac{x - \bar{x}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$
If the levels of x are equally spaced with spacing d, so that $x_i = x_1 + (i - 1)d$ and $x_i - \bar{x} = d\left(i - \frac{n+1}{2}\right)$, we have:
$\sum_{i=1}^{n}(x_i - \bar{x})^2 = d^2\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2 = \frac{d^2\,n(n^2 - 1)}{12}.$
So, in this case we have:
$P_1(x) = \frac{x - \bar{x}}{d\sqrt{n(n^2 - 1)/12}}.$
Exercise: Give a proof for the other $P_j(x)$ listed earlier by a similar method.
Note: each $P_j(x)$ may be multiplied by an arbitrary constant $\lambda_j$.
Ch 7: Polynomial regression models: Orthogonal polynomial
Example:

x   | y
50  | 335
75  | 326
100 | 316
125 | 313
150 | 311
175 | 314
200 | 318
225 | 328
250 | 337
275 | 345

$x_i - x_{i-1} = 25$ for all i, so the levels of x are equally spaced, and we have
$P_1(x_i) = 2(i - 5.5), \qquad P_2(x_i) = 0.5\left[(i - 5.5)^2 - \frac{99}{12}\right].$

i  | P0(xi) | P1(xi) | P2(xi) | yi
1  | 1      | -9     | 6      | 335
2  | 1      | -7     | 2      | 326
3  | 1      | -5     | -1     | 316
4  | 1      | -3     | -3     | 313
5  | 1      | -1     | -4     | 311
6  | 1      | 1      | -4     | 314
7  | 1      | 3      | -3     | 318
8  | 1      | 5      | -1     | 328
9  | 1      | 7      | 2      | 337
10 | 1      | 9      | 6      | 345

$\hat{\alpha}_0 = \frac{\sum_i P_0(x_i)y_i}{\sum_i P_0^2(x_i)} = \bar{y} = 324.3; \qquad \hat{\alpha}_1 = \frac{\sum_i P_1(x_i)y_i}{\sum_i P_1^2(x_i)} = 0.74; \qquad \hat{\alpha}_2 = \frac{\sum_i P_2(x_i)y_i}{\sum_i P_2^2(x_i)} = 2.8.$
Ch 7: Polynomial regression models: Orthogonal polynomial
Then, the fitted model is:
Source of variation | SS      | D.F. | MS      | F      | P-value
Regression          | 1213.43 | 2    | 606.72  | 159.24 | <0.0001
  Linear            | 181.89  | 1    | 181.89  | 47.74  | 0.0002
  Quadratic         | 1031.54 | 1    | 1031.54 | 270.75 | <0.0001
Residual            | 26.67   | 7    | 3.81    |        |
Total               | 1240.1  | 9    |         |        |

$\hat{y}_i = 324.3 + 0.74\,P_1(x_i) + 2.8\,P_2(x_i)$
$= 324.3 + 1.48\,(i - 5.5) + 1.4\left[(i - 5.5)^2 - \frac{99}{12}\right]$
$= 346.96 - 13.92\,i + 1.4\,i^2$
Chapter 8
Indicator Variables
Ch 8: Indicator Variables
The variables employed in regression analysis are often quantitative variables. Example: temperature, distance, income. These variables have a well-defined scale of measurement.
In some situations it is necessary to use qualitative or categorical variables as predictor variables. Example: sex, operators, employment status. In general, these variables have no natural scale of measurement.
Question: how can we account for the effect that these variables may have on the response?
This is done through the use of indicator variables. Sometimes indicator variables are called dummy variables.
Ch 8: Indicator Variables: Example 1
Y = life of a cutting tool; x1 = lathe speed (RPM); x2 = type of cutting tool, a qualitative variable with two levels (tool types A and B).
Let
$x_2 = \begin{cases}0 & \text{if the observation is from tool type A}\\ 1 & \text{if the observation is from tool type B}\end{cases}$
Assuming that a first-order model is appropriate, we have $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$:
Tool type A: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \epsilon = \beta_0 + \beta_1 x_1 + \epsilon$
Tool type B: $Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \epsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon$
Ch 8: Indicator Variables
[Figure: two parallel regression lines of tool life against lathe speed x1 (RPM)]
$E(Y \mid x_2 = 0) = \beta_0 + \beta_1 x_1$, tool type A
$E(Y \mid x_2 = 1) = (\beta_0 + \beta_2) + \beta_1 x_1$, tool type B
• The regression lines are parallel;
• β2 is a measure of the difference in mean tool life resulting from changing from tool type A to tool type B;
• The variance of the error is assumed to be the same for both tool types A and B.
Ch 8: Indicator Variables: Example 2
Consider again Example 1, but here assume that x2 = type of cutting tool is qualitative with three levels (tool types A, B, and C).
Define two indicator variables:
$(x_2, x_3) = \begin{cases}(0, 0) & \text{if the observation is from tool type A}\\ (1, 0) & \text{if the observation is from tool type B}\\ (0, 1) & \text{if the observation is from tool type C}\end{cases}$
Assuming that a first-order model is appropriate, we have $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$, i.e.:
Tool type A: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3(0) + \epsilon = \beta_0 + \beta_1 x_1 + \epsilon$
Tool type B: $Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \beta_3(0) + \epsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon$
Tool type C: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3(1) + \epsilon = (\beta_0 + \beta_3) + \beta_1 x_1 + \epsilon$
In general, a qualitative variable with l levels is represented by l − 1 indicator variables, each taking on the values 0 and 1. A fitting sketch follows.
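A minimal sketch of fitting such a model with one indicator variable, on hypothetical data simulated to resemble the numerical example that follows:

```python
# A minimal sketch of fitting a model with an indicator (dummy) variable
# for a two-level tool type, on hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(8)
n = 20
speed = rng.uniform(500, 1000, n)
tool_b = (np.arange(n) % 2).astype(float)       # 0 = type A, 1 = type B
life = 37 - 0.03 * speed + 15 * tool_b + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), speed, tool_b])
b0, b1, b2 = np.linalg.lstsq(X, life, rcond=None)[0]
print(b0, b1, b2)  # b2 estimates the A-to-B shift in mean tool life
```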
Ch 8: Indicator Variables: Numerical example
yi xi1 xi2
18.73 610 A
14.52 950 A
17.43 720 A
14.54 840 A
13.44 980 A
24.39 530 A
13.34 680 A
22.71 540 A
12.68 890 A
19.32 730 A
30.16 670 B
27.09 770 B
25.40 880 B
26.05 1000 B
33.49 760 B
35.62 590 B
26.07 910 B
36.78 650 B
34.95 810 B
43.67 500 B
We fit model Y=β0+ β1x1+ β2x2+ϵ
$X'X = \begin{pmatrix}20 & 15010 & 10\\ 15010 & 11717500 & 7540\\ 10 & 7540 & 10\end{pmatrix}, \qquad X'y = \begin{pmatrix}490.38\\ 356515.7\\ 319.28\end{pmatrix}$
Solving the normal equations $X'X\hat{\beta} = X'y$ gives
$\hat{\beta} = \begin{pmatrix}36.99\\ -0.03\\ 15.00\end{pmatrix}.$
The fitted values are $\hat{y} = X\hat{\beta} = (20.76,\ 11.71,\ \dots,\ 38.69)'$, and
$SS_R = \sum_{i=1}^{20}(\hat{y}_i - \bar{y})^2 = 1418.03; \qquad SS_T = \sum_{i=1}^{20}(y_i - \bar{y})^2 = 1575.09.$
Ch 8: Indicator Variables: Numerical example
Then, the fitted model is:
$\hat{y} = 36.99 - 0.03\,x_1 + 15.00\,x_2$

Source of variation | SS      | D.F. | MS     | F     | P-value
Regression          | 1418.03 | 2    | 709.02 | 79.75 | <0.0001
Residual            | 157.06  | 17   | 9.24   |       |
Total               | 1575.09 | 19   |        |       |

Variable  | Estimated coefficient | Standard error | t     | P-value
Intercept | 36.99                 |                |       |
x1        | -0.03                 | 0.005          | -5.89 | <0.00001
x2        | 15                    | 1.360          | 11.04 | <0.00001
Ch 8: Indicator Variables: Comparing regression models
Consider the case of simple linear regression where the n observations can be formed into M groups, with the mth group having $n_m$ observations. The most general model consists of M separate equations:
$Y = \beta_{0m} + \beta_{1m}x + \varepsilon, \qquad m = 1, 2, \dots, M.$
It is often of interest to compare this general model to a more restrictive one. Indicator variables are helpful in this regard. Using indicator variables we can write:
$Y = (\beta_{01} + \beta_{11}x)D_1 + (\beta_{02} + \beta_{12}x)D_2 + \cdots + (\beta_{0M} + \beta_{1M}x)D_M + \varepsilon,$
where $D_i = 1$ when group i is selected and 0 otherwise. We call this model the full model (FM). It has 2M parameters, so the degrees of freedom for $SS_{\mathrm{Res}}(\mathrm{FM})$ are n − 2M.
Exercise: Let $SS_{\mathrm{Res}}(\mathrm{FM}_m)$ denote the residual sum of squares of the model $Y = \beta_{0m} + \beta_{1m}x + \varepsilon$ fitted to group m alone. Show that $SS_{\mathrm{Res}}(\mathrm{FM}) = SS_{\mathrm{Res}}(\mathrm{FM}_1) + SS_{\mathrm{Res}}(\mathrm{FM}_2) + \cdots + SS_{\mathrm{Res}}(\mathrm{FM}_M)$.
We consider three cases:
1) Parallel lines: $\beta_{11} = \beta_{12} = \cdots = \beta_{1M}$
2) Concurrent lines: $\beta_{01} = \beta_{02} = \cdots = \beta_{0M}$
3) Coincident lines: $\beta_{11} = \beta_{12} = \cdots = \beta_{1M}$ and $\beta_{01} = \beta_{02} = \cdots = \beta_{0M}$
Ch8: Indicator Variables: Parallel lines
In the parallel-lines case all M slopes are identical, but the intercepts may differ. So here we want to test:
$H_0: \beta_{11} = \beta_{12} = \cdots = \beta_{1M} = \beta_1$
Recall that this procedure involves fitting a full model (FM) and a reduced model (RM) restricted by the null hypothesis, and computing the F statistic
$F_0 = \frac{\left[SS_{\mathrm{Res}}(\mathrm{RM}) - SS_{\mathrm{Res}}(\mathrm{FM})\right]/(df_{\mathrm{RM}} - df_{\mathrm{FM}})}{SS_{\mathrm{Res}}(\mathrm{FM})/df_{\mathrm{FM}}};$
$H_0$ is rejected when $F_0 > F_{\alpha,\ df_{\mathrm{RM}} - df_{\mathrm{FM}},\ df_{\mathrm{FM}}}$.
Under $H_0$ the full model reduces to
$Y = \beta_{01} + \beta_1 x + \alpha_2 D_2 + \cdots + \alpha_M D_M + \varepsilon.$
In this model $df_{\mathrm{RM}} = n - (M + 1)$.
Therefore, using the above F statistic, we can test the hypothesis $H_0$. This is the analysis of covariance.
Ch 8: Indicator Variables: Concurrent and coincident lines
In the concurrent-lines case all M intercepts are identical, but the slopes may differ:
$H_0: \beta_{01} = \beta_{02} = \cdots = \beta_{0M} = \beta_0$
Under $H_0$ the full model reduces to
$Y = \beta_0 + \beta_{11}x + \alpha_2\,xD_2 + \cdots + \alpha_M\,xD_M + \varepsilon.$
In this model $df_{\mathrm{RM}} = n - (M + 1)$. In this way, similarly to the parallel-lines case, we can test $H_0$ using the above F statistic.
In the coincident-lines case we want to test:
$H_0: \beta_{01} = \cdots = \beta_{0M} = \beta_0$ and $\beta_{11} = \cdots = \beta_{1M} = \beta_1$
Under $H_0$ the full model reduces to the simple model
$Y = \beta_0 + \beta_1 x + \varepsilon.$
In this model $df_{\mathrm{RM}} = n - 2$. In this way, similarly to the parallel-lines case, we can test $H_0$ using the above F statistic.
Ch 8: Indicator Variables: Regression approach to analysis of variance
Consider a one-way model:
$y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \dots, k;\ j = 1, 2, \dots, n.$
In the fixed-effects case we test:
$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$
$H_1: \tau_i \neq 0$ for at least one i
Source of variation | SS | Df | MS | F
Treatment | $SS_T = n\sum_{i=1}^{k}(\bar{y}_{i.} - \bar{y}_{..})^2$ | k − 1 | $SS_T/(k-1)$ | $MS_T/MS_{\mathrm{Res}}$
Error | $SS_{\mathrm{Res}} = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$ | k(n − 1) | $SS_{\mathrm{Res}}/[k(n-1)]$ |
Total | $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$ | kn − 1 | |
Ch 8: Indicator Variables: Regression approach to analysis of variance
The equivalent regression model for the one-way model
$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \dots, k;\ j = 1, 2, \dots, n,$
is:
$y_{ij} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_{k-1}x_{k-1,j} + \varepsilon_{ij},$
where
$x_{ij} = \begin{cases}1 & \text{if observation } j \text{ is from treatment } i\\ 0 & \text{otherwise}\end{cases}$
Relationship between the two models:
$\beta_0 = \mu_k; \qquad \beta_i = \mu_i - \mu_k, \quad i = 1, \dots, k - 1.$
Exercise: Find the relationship among the sums of squares in regression and one-way ANOVA.
Ch 8: Indicator Variables: Regression approach to analysis of variance
For the case k = 3 (with n = 3 observations per treatment) we have:
$X = \begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0 \end{pmatrix}, \qquad y = (y_{11}, y_{12}, y_{13}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32}, y_{33})'$
$X'X = \begin{pmatrix}9 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3\end{pmatrix}, \qquad X'y = \begin{pmatrix}9\bar{y}_{..}\\ 3\bar{y}_{1.}\\ 3\bar{y}_{2.}\end{pmatrix}$
Solving $X'X\hat{\beta} = X'y$ gives
$\hat{\beta}_0 = \bar{y}_{3.}; \qquad \hat{\beta}_1 = \bar{y}_{1.} - \bar{y}_{3.}; \qquad \hat{\beta}_2 = \bar{y}_{2.} - \bar{y}_{3.}.$
The hypotheses
$H_0: \tau_1 = \tau_2 = \tau_3 = 0$ versus $H_1: \tau_i \neq 0$ for at least one i
are equivalent to
$H_0: \beta_0 = \mu$ and $\beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 \neq 0$ or $\beta_2 \neq 0$ (or both).