TRANSCRIPT
Regression II
Dr. Rahim Mahmoudvand
Department of Statistics,
Bu-Ali Sina University
Chapter 4
Model Adequacy
Checking
Ch4: Partial Regression Plot: Definition and Usage
Is a curvature effect for the regressor needed in the model?
A partial regression plot is a variation of the plot of residuals versus the predictor variable.
This plot evaluates whether we have specified the relationship between the response and the regressor variables correctly.
This plot studies the marginal relationship of a regressor given the other variables that are in the model.
The partial regression plot is also called the added-variable plot or the adjusted-variable plot.
Ch4: Partial Regression Plot: The way of working, with an example
In this plot, the response variable y and the regressor xj are both regressed against the other regressors in the model, and the residuals are obtained for each regression. The plot of these residuals against each other provides information about the nature of the marginal relationship for the regressor xj under consideration.
Example: consider the model
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i, \qquad i = 1, 2, \dots, n.$
y is regressed on x2:
$\hat{y}_i(x_2) = \hat{\alpha}_0 + \hat{\alpha}_2 x_{i2}; \qquad e_i(y \mid x_2) = y_i - \hat{y}_i(x_2), \quad i = 1, 2, \dots, n.$
x1 is regressed on x2:
$\hat{x}_{i1}(x_2) = \hat{\gamma}_0 + \hat{\gamma}_2 x_{i2}; \qquad e_i(x_1 \mid x_2) = x_{i1} - \hat{x}_{i1}(x_2), \quad i = 1, 2, \dots, n.$
The partial regression plot for x1 is the plot of $e_i(y \mid x_2)$ against $e_i(x_1 \mid x_2)$.
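The procedure above is easy to verify numerically. Below is a minimal sketch, not from the slides, that computes the two sets of residuals for a simulated model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; the data and all variable names are hypothetical.

```python
# A minimal sketch of a partial regression (added-variable) plot for x1,
# assuming a model y ~ x1 + x2 with hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 1, n)

def resid_on(z, X):
    """Residuals of z after least-squares regression on X (with intercept)."""
    X = np.column_stack([np.ones(len(z)), X])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta

e_y = resid_on(y, x2)    # e(y | x2)
e_x1 = resid_on(x1, x2)  # e(x1 | x2)

# The slope of e_y on e_x1 equals the coefficient of x1 in the full model.
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
print(slope)  # close to 1.5; plotting e_y against e_x1 is the partial regression plot
```

The printed slope illustrates the relationship among residuals derived a few slides below: regressing $e(y \mid x_2)$ on $e(x_1 \mid x_2)$ recovers $\hat{\beta}_1$ from the full fit.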
Ch4: Partial Regression Plot: Interpretation of plot
[Figure: prototype partial regression plots of e(y | x2) against e(x1 | x2)]
• Linear band: regressor x1 enters the model linearly; the line passes through the origin and its slope equals $\hat{\beta}_1$.
• Curved band: a higher-order term in x1, such as $x_1^2$, is required, or a transformation such as replacing x1 with 1/x1 is required.
• Horizontal band: there is no additional useful information in x1 for predicting y.
Ch4: Partial Regression Plot: Relationship among residuals
Consider the model $Y = X\beta + \varepsilon$ and write $X = [X_{(j)}, x_j]$, where $X_{(j)} = [1, x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k]$ is the X matrix with the jth regressor removed, so that
$Y = X_{(j)}\beta_{(j)} + \beta_j x_j + \varepsilon.$
Denoting $H_{(j)} = X_{(j)}(X_{(j)}'X_{(j)})^{-1}X_{(j)}'$:
Y is regressed on X(j): $e(Y \mid X_{(j)}) = (I - H_{(j)})Y$
xj is regressed on X(j): $e(x_j \mid X_{(j)}) = (I - H_{(j)})x_j$
We have:
$e(Y \mid X_{(j)}) = (I - H_{(j)})X_{(j)}\beta_{(j)} + \beta_j(I - H_{(j)})x_j + (I - H_{(j)})\varepsilon = \beta_j\,e(x_j \mid X_{(j)}) + (I - H_{(j)})\varepsilon,$
since $(I - H_{(j)})X_{(j)} = 0$. So the partial regression plot should be a straight line through the origin with slope $\beta_j$.
Ch4: Partial Regression Plot: Shortcomings
• These plots may not give information about the proper form of the relationship if several variables already in the model are incorrectly specified.
• Partial regression plots will not, in general, detect interaction effects among the regressors.
• The presence of strong multicollinearity can cause partial regression plots to give incorrect information about the relationship between the response and the regressor variables.
Ch4: Partial Residual Plots: Definition and usage
• The partial residual plot is closely related to the partial regression plot.
• A partial residual plot is a variation of the plot of residuals versus the predictor.
• It is designed to show the relationship between the response variable and the regressors.
Ch4: Partial Residual Plot: Computation of partial residuals
Consider the model $Y = X\beta + \varepsilon$. Denote by $x_{i,(j)} = [1, x_{i1}, \dots, x_{i,j-1}, x_{i,j+1}, \dots, x_{ik}]$ the ith row of X with the jth regressor removed, so that $\hat{y}_i = x_{i,(j)}\hat{\beta}_{(j)} + \hat{\beta}_j x_{ij}$.
We have:
$e_i = y_i - \hat{y}_i = y_i - x_{i,(j)}\hat{\beta}_{(j)} - \hat{\beta}_j x_{ij}$
The partial residual is defined and calculated by:
$e_i^*(y \mid x_j) = y_i - x_{i,(j)}\hat{\beta}_{(j)} = e_i + \hat{\beta}_j x_{ij}, \qquad i = 1, 2, \dots, n.$
The partial residual plot is the plot of $e_i^*(y \mid x_j)$ against $x_{ij}$.
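As a quick illustration, here is a minimal sketch of the identity $e_i^*(y \mid x_j) = e_i + \hat{\beta}_j x_{ij}$ for a two-regressor model; the simulated data are hypothetical, not from the slides.

```python
# A minimal sketch of partial residuals e*_i = e_i + beta_j_hat * x_ij,
# assuming hypothetical simulated data for y ~ x1 + x2.
import numpy as np

rng = np.random.default_rng(10)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

partial_resid_x1 = e + beta[1] * x1  # plot these against x1
print(partial_resid_x1[:5])
```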
Ch4: Partial Residual Plot: Interpretation of plot
[Figure: prototype partial residual plots of e*(y | xj) against x_ij]
• Linear band: regressor xj enters the model linearly; the line passes through the origin and its slope equals $\hat{\beta}_j$.
• Curved band: a higher-order term in $x_{ij}$, such as $x_{ij}^2$, is required, or a transformation such as replacing $x_{ij}$ with $1/x_{ij}$ is required.
• Horizontal band: there is no additional useful information in xj for predicting y.
Ch4: Other Plots: Regressor versus regressor
• A scatterplot of regressor xi against regressor xj:
• is useful in studying the relationship between regressor variables;
• is useful in detecting multicollinearity.
[Figure: scatterplots of xj against xi]
• One panel shows an observation that is unusual with respect to xj only;
• another shows an observation that is unusual with respect to both xi and xj.
Ch4: Other Plots: Response versus regressor
• A scatterplot of the response y against regressor xi is useful in distinguishing the type of unusual points.
[Figure: scatterplots of y against xi showing three kinds of unusual points]
• An outlier in x space: an influential point; the prediction variance for this point is large, and the residual variance for this point is small.
• An outlier in the y direction: an influential point.
• An outlier in both directions: a leverage point; the prediction variance for this point is large, and the residual variance for this point is small.
Ch4: PRESS Statistic: Computation and usage
• PRESS is generally regarded as a measure of how well a regression model will perform in predicting new data.
• A model with a small value of PRESS is desired.
• PRESS residuals are:
$e_{(i)} = y_i - \hat{y}_{(i)} = \frac{e_i}{1 - h_{ii}}, \qquad i = 1, 2, \dots, n,$
and accordingly the PRESS statistic is defined as follows:
$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2$
• R² for prediction based on the PRESS statistic:
$R^2_{\mathrm{Prediction}} = 1 - \frac{\mathrm{PRESS}}{SS_T}$
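As a sketch of the computation (on assumed simulated data, not the textbook example), the PRESS residuals can be obtained from a single fit via the hat diagonal rather than by n refits:

```python
# A minimal sketch of the PRESS statistic and prediction R^2 via the hat
# matrix diagonal, assuming hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, n)
y = 3 + 2 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

press_resid = e / (1 - np.diag(H))   # e_(i) = e_i / (1 - h_ii)
PRESS = np.sum(press_resid ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2_pred = 1 - PRESS / SST
print(PRESS, R2_pred)
```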
Ch4: PRESS Statistic: Interpretation with an example
Ch4: PRESS Statistic: Interpretation with an example
y is regressed on x1:
$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2 = 733.55$
y is regressed on x1, x2:
$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2 = 459, \qquad \text{with } e_{(9)}^2 = 218$
$R^2_{\mathrm{Prediction}} = 1 - \frac{459}{5784} = 0.9209, \qquad R^2 = 1 - \frac{233.73}{5784} = 0.9596$
So, the model including both x1 and x2 is better than the model in which only x1 is included.
Ch4: Detection and Treatment of Outliers: Tools and methods
Recall that an outlier is an extreme observation: one that is considerably different from the majority of the data.
Detection tools:
• residuals;
• scaled residuals;
• statistical tests.
Outliers can be categorized as:
• bad values, occurring as a result of unusual but explainable events, such as faulty measurement or analysis, incorrect recording of data, or failure of a measuring instrument;
• normally observed values, such as leverage and influential observations.
Treatment:
• Remove bad values.
• Follow up the analysis of outliers; this may help us to improve the process or result in new knowledge concerning factors whose effect on the response was previously unknown.
• The effect of outliers may be checked easily by dropping these points and refitting the regression equation.
Ch4: Detection and Treatment of Outliers: Example 1 (Rocket data)
Quantity   | Obs 5 & 6 in | Obs 5 & 6 out
Intercept  | 2627.82      | 2658.97
Slope      | -37.15       | -37.69
R²         | 0.9018       | 0.9578
MS_Res     | 9244.59      | 3964.63
Ch4: Detection and Treatment of Outliers: Example 2
Country Cigarette Deaths
Australia 480 180
Canada 500 150
Denmark 380 170
Finland 1100 350
UK 1100 460
Iceland 230 60
Netherlands 490 240
Norway 250 90
Sweden 300 110
Switzerland 510 250
USA 1300 200
Regression with all the data: the regression equation is ŷ = 67.6 + 0.228 x, with R² = 54.4%.
Regression without the USA: the regression equation is ŷ = 9.1 + 0.369 x, with R² = 88.9%.
Ch4: Lack of fit of the regression model: What is meant?
"All models are wrong; some models are useful." (George Box)
In the simple linear regression model, if we have n distinct data points we can always fit a polynomial of order up to n − 1. In the process, what we claim to be random error is actually a systematic departure resulting from not fitting enough terms.
[Figure: scatterplots of y against x]
• Perfect linear fitting is always possible when we have two distinct points.
• Perfect linear fitting is not possible in general when we have three (or more) distinct points.
Ch4: Lack of fit of the regression model: A formal test
This test assumes that the normality, independence, and constant-variance requirements are met, and that only the first-order or straight-line character of the relationship is in doubt. To do this test, we have to replicate observations on the response y for at least one level of x. These new data can provide a model-independent estimate of σ².
[Figure: scatterplot of y against x where a straight-line fit is not satisfactory]
Ch4: Lack of fit of the regression model: A formal test
Let $y_{ij}$ denote the jth observation on the response at $x_i$, $j = 1, \dots, n_i$, $i = 1, \dots, m$. So we have $n = \sum_{i=1}^{m} n_i$ observations, and we can write
$Y = (y_{11}, \dots, y_{1n_1},\ y_{21}, \dots, y_{2n_2},\ \dots,\ y_{m1}, \dots, y_{mn_m})', \qquad X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix},$
for the model
$y_{ij} = \beta_0 + \beta_1 x_i + \varepsilon_{ij}, \qquad j = 1, \dots, n_i, \quad i = 1, \dots, m.$
Considering a linear regression:
The least-squares fit minimizes $\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \beta_0 - \beta_1 x_i)^2$ over $(\beta_0, \beta_1)$, giving
$\hat{y}_{ij} = \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad j = 1, \dots, n_i, \quad i = 1, \dots, m;$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}; \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_i - \bar{x})\,y_{ij}}{S_{xx}}; \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{m} n_i x_i.$
Ch4: Lack of fit of the regression model: A formal test
We have:
$e_{ij} = y_{ij} - \hat{y}_{ij} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$
Accordingly, we get:
$SS_{\mathrm{Res}} = \sum_{i=1}^{m}\sum_{j=1}^{n_i} e_{ij}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)\right]^2$
$= \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{y}_i - \hat{y}_i)^2 + 2\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)(\bar{y}_i - \hat{y}_i),$
where the cross-product term is 0, since $\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i) = 0$ for each i.
Ch4: Lack of fit of the regression model: A formal test
Accordingly:
$SS_{\mathrm{Res}} = SS_{\mathrm{PE}} + SS_{\mathrm{LOF}}, \qquad SS_{\mathrm{PE}} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2, \qquad SS_{\mathrm{LOF}} = \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$
• If the assumption of constant variance is satisfied, $SS_{\mathrm{PE}}$ is a model-independent measure of pure error.
• The degrees of freedom for $SS_{\mathrm{PE}}$ are $\sum_{i=1}^{m}(n_i - 1) = n - m$, where $n = \sum_{i=1}^{m} n_i$.
• Note also that $SS_{\mathrm{PE}} = \sum_{i=1}^{m}(n_i - 1)S_i^2$, where $S_i^2$ is the variance of the response at level $x_i$.
• If the fitted values $\hat{y}_i$ are close to the corresponding average responses $\bar{y}_i$, there is a strong indication that the regression function is linear.
• Note that
$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}; \qquad \hat{y}_i = \bar{y} + \hat{\beta}_1(x_i - \bar{x}); \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_i - \bar{x})\,y_{ij}}{S_{xx}}.$
Ch4: Lack of fit of the regression model: A formal test
It is well known that $E(S_i^2) = \sigma^2$, and so we get:
$E(SS_{\mathrm{PE}}) = \sum_{i=1}^{m}(n_i - 1)E(S_i^2) = (n - m)\sigma^2$
But for $SS_{\mathrm{LOF}}$ we have:
$E(SS_{\mathrm{LOF}}) = \sum_{i=1}^{m} n_i\,E(\bar{y}_i - \hat{y}_i)^2 = \sum_{i=1}^{m} n_i\operatorname{var}(\bar{y}_i - \hat{y}_i) + \sum_{i=1}^{m} n_i\left[E(y_i) - \beta_0 - \beta_1 x_i\right]^2$
$= (m - 2)\sigma^2 + \sum_{i=1}^{m} n_i\left[E(y_i) - \beta_0 - \beta_1 x_i\right]^2,$
using
$\operatorname{var}(\hat{y}_i) = \sigma^2\left(\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right); \qquad \operatorname{var}(\bar{y}_i) = \frac{\sigma^2}{n_i}; \qquad \operatorname{cov}(\bar{y}_i, \hat{y}_i) = \operatorname{var}(\hat{y}_i),$
so that
$\sum_{i=1}^{m} n_i\operatorname{var}(\bar{y}_i - \hat{y}_i) = \sum_{i=1}^{m} n_i\left[\operatorname{var}(\bar{y}_i) - \operatorname{var}(\hat{y}_i)\right] = m\sigma^2 - \sigma^2\left(\frac{\sum_i n_i}{n} + \frac{\sum_i n_i(x_i - \bar{x})^2}{S_{xx}}\right) = (m - 2)\sigma^2.$
Ch4: Lack of fit of the regression model: A formal test
An unbiased estimate of the variance can be obtained by:
$MS_{\mathrm{PE}} = \frac{SS_{\mathrm{PE}}}{n - m}, \qquad E(MS_{\mathrm{PE}}) = \sigma^2.$
Moreover, we have:
$E(MS_{\mathrm{LOF}}) = E\left(\frac{SS_{\mathrm{LOF}}}{m - 2}\right) = \sigma^2 + \frac{\sum_{i=1}^{m} n_i\left[E(y_i) - \beta_0 - \beta_1 x_i\right]^2}{m - 2}.$
So, the ratio
$F_0 = \frac{MS_{\mathrm{LOF}}}{MS_{\mathrm{PE}}}$
can be used as a statistic for testing the linearity assumption in the linear regression model. Under the null hypothesis, F0 follows an $F_{m-2,\,n-m}$ distribution, and therefore we conclude that the regression function is not linear if $F_0 > F_{m-2,\,n-m,\,1-\alpha}$.
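A minimal sketch of the whole test, on hypothetical data with replicate x levels (scipy is assumed available for the F p-value):

```python
# A minimal sketch of the lack-of-fit F test for simple linear regression,
# assuming replicate observations at some x levels (data are hypothetical).
import numpy as np
from scipy import stats

x = np.array([1.0, 1.0, 2.0, 3.3, 3.3, 4.0, 4.0, 4.0, 5.6, 6.0])
y = np.array([2.3, 1.8, 2.8, 1.8, 3.7, 2.6, 2.6, 2.2, 3.2, 2.0])

X = np.column_stack([np.ones(len(x)), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ss_res = np.sum((y - X @ beta) ** 2)

# Pure error: pooled within-level variation around the level means.
levels = np.unique(x)
ss_pe = sum(np.sum((y[x == v] - y[x == v].mean()) ** 2) for v in levels)
ss_lof = ss_res - ss_pe

n, m = len(x), len(levels)
F0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))
p = stats.f.sf(F0, m - 2, n - m)
print(F0, p)  # reject linearity when F0 exceeds the F(m-2, n-m) critical value
```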
Ch4: Lack of fit of the regression model: Limitations and solutions
Limitations:
• Ideally, we find that the F ratio for lack of fit is not significant and the hypothesis of significance of regression is rejected.
• Unfortunately, this does not guarantee that the model will be satisfactory as a prediction equation: the model may have been fitted to error only.
Solutions:
• The regression model is likely to be useful as a predictor when the F ratio is at least four or five times the critical value from the F table.
• Compare the range of the fitted values to their average standard error. To do this we can use the following measure for the average standard error:
$\overline{\operatorname{var}}(\hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\widehat{\operatorname{var}}(\hat{y}_i) = \frac{(k+1)\hat{\sigma}^2}{n},$
where $\hat{\sigma}^2$ is a model-independent estimate of the error variance.
Ch4: Lack of fit of the regression model: Multiple version
Repeat observations do not often occur in multiple regression. One solution is to search for points in x space that are near neighbors, that is, sets of observations that have been taken at nearly identical levels of x1, x2, ..., xk.
As a measure of the distance between any two points, for example $x_i = (x_{i1}, \dots, x_{ik})$ and $x_{i'} = (x_{i'1}, \dots, x_{i'k})$, we will use the weighted sum of squared distance (WSSD):
$D_{ii'}^2 = \sum_{j=1}^{k}\left[\frac{\hat{\beta}_j\,(x_{ij} - x_{i'j})}{\sqrt{MS_{\mathrm{Res}}}}\right]^2$
Pairs of points that have small $D_{ii'}^2$ are near neighbors. The residuals at two points with a small value of $D_{ii'}^2$ can be used to obtain an estimate of pure error.
Ch4: Lack of fit of the regression model: Multiple version
There is a relationship between the range of a sample from a normal population and the population standard deviation. For samples of size 2, this relationship is (Exercise):
$\hat{\sigma} = \frac{E}{1.128} = 0.886\,E, \qquad E = |e_i - e_{i'}|.$
An algorithm for sample sizes greater than 2 is:
• First arrange the data points xi1, xi2, ..., xik in order of increasing $\hat{y}_i$.
• Compute the values of $D_{ii'}^2$ for all n − 1 pairs of points with adjacent values of $\hat{y}_i$. Repeat this calculation for the pairs of points separated by one, two, and three intermediate $\hat{y}$ values. This will produce 4n − 10 values of $D_{ii'}^2$.
• Arrange these values in ascending order. Let $E_u$, for u = 1, ..., 4n − 10, denote the range of the residuals at each pair of points, and calculate an estimate of the standard deviation of pure error by
$\hat{\sigma} = \frac{0.886}{m}\sum_{u=1}^{m} E_u,$
where $E_1, E_2, \dots, E_m$ are the ranges associated with the m smallest values of $D_{ii'}^2$.
Chapter 5
Methods to Correct
Model Inadequacy
Ch 5: Transformation and Weighting
The main assumptions in the model $Y = X\beta + \varepsilon$ are:
• $E(\varepsilon) = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2 I$;
• $\varepsilon \sim N(0, \sigma^2 I)$;
• the form of $X\beta$ used in the model is correct.
We use residual analysis to detect violations of these basic assumptions. In this chapter, we focus on methods and procedures for building regression models when some of the above assumptions are violated.
Ch5: Transformation and Weighting: Problems
Problems:
• The error variance is not constant: parameter estimators remain unbiased, but they are no longer BLUE.
• The relationship between y and the regressors is not linear.
Solutions:
• Transformation: use transformed data to stabilize the variance.
• Weighting: use weighted least squares.
Ch5: Transformation: Stabilizing variance
Let $\operatorname{Var}(Y) = c^2\,[E(Y)]^h$. For a smooth transformation $y' = g(y)$, a first-order Taylor expansion about $E(Y)$ gives
$\operatorname{Var}\big(g(Y)\big) \approx \left[g'(E(Y))\right]^2\operatorname{Var}(Y) = c^2\left[g'(E(Y))\right]^2\left[E(Y)\right]^h,$
so the choice $g(y) = y^{1 - h/2}$ makes the variance approximately constant.
Example 1: Poisson data, $\operatorname{Var}(Y_i) = E(Y_i)$, so h = 1 and $y' = y^{1/2}$ stabilizes the variance.
Example 2: Inverse-Gaussian data, $\operatorname{Var}(Y_i) = E^3(Y_i)$, so h = 3 and $y' = y^{-1/2}$ stabilizes the variance.
Ch5: Transformation: Stabilizing variance
Relationship of σ² to E(Y)   | Transformation
σ² ∝ constant                | Y' = Y (no transformation)
σ² ∝ E(Y)                    | Y' = Y^(1/2) (square root; Poisson data)
σ² ∝ E(Y)[1 − E(Y)]          | Y' = arcsin(Y^(1/2)) (binomial data)
σ² ∝ [E(Y)]²                 | Y' = log(Y) (gamma distribution)
σ² ∝ [E(Y)]³                 | Y' = Y^(−1/2) (inverse Gaussian)
σ² ∝ [E(Y)]⁴                 | Y' = Y^(−1)
Ch5: Transformation for stabilizing variance: Limitations
Note that the predicted values are in the transformed scale, so:
Applying the inverse transformation directly to the predicted values
gives an estimate of the median of the distribution of the response
instead of the mean.
Confidence or prediction intervals may be directly converted from one
metric to another. However, there is no assurance that the resulting
intervals in the original units are the shortest possible intervals
Ch5: Transformation: Linearizing model
The linearity assumption is the usual starting point in regression analysis.
Occasionally we find that this assumption is inappropriate.
Nonlinearity may be detected via the
lack-of-fit test,
from scatter diagrams, the matrix of scatterplots,
residual plots such as the partial regression plot,
Prior experience or theoretical considerations .
In some cases a nonlinear function can be linearized by using a suitable
transformation. Such nonlinear models are called intrinsically or
transformably linear.
Ch5: Transformation: Linearizing model
Example 1: $y = \beta_0 e^{\beta_1 x}\varepsilon$.
This function is intrinsically linear, since it can be transformed to a straight line by a logarithmic transformation:
$\log y = \log\beta_0 + \beta_1 x + \log\varepsilon, \qquad \text{i.e.,} \qquad y' = \beta_0' + \beta_1 x + \varepsilon'.$
Example 2: $y = \beta_0 + \beta_1\left(\frac{1}{x}\right) + \varepsilon$.
This function can be linearized by using the reciprocal transformation x' = 1/x:
$y = \beta_0 + \beta_1 x' + \varepsilon.$
Ch5: Transformation: Linearizing model
When transformations such as those described above are employed, the least-squares estimator has least-squares properties with respect to the transformed data, not the original data.

Linearizable function     | Transformation
Y = β0 exp(β1 x)          | Y' = log(Y)
Y = β0 x^β1               | Y' = log(Y) and x' = log(x)
Y = β0 + β1 log(x)        | x' = log(x)
Y = x/(β0 x − β1)         | Y' = 1/Y and x' = 1/x
Ch5: Transformation: Analytical method
Transformation on Y: a power transformation, via the Box-Cox method:
$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, y_G^{\lambda - 1}}, & \lambda \neq 0 \\[4pt] y_G \log(y), & \lambda = 0 \end{cases}$
where $y_G$ is the geometric mean of the observations. Then fit the model
$Y^{(\lambda)} = X\beta + \varepsilon.$
The maximum-likelihood estimate of λ corresponds to the value of λ for which the residual sum of squares from the fitted model, $SS_{\mathrm{Res}}(\lambda)$, is a minimum.
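A minimal sketch of this procedure on hypothetical data: for each λ on a grid, transform y with the scaled Box-Cox formula, fit by least squares, and record $SS_{\mathrm{Res}}(\lambda)$:

```python
# A minimal sketch of choosing the Box-Cox lambda by minimizing SS_Res(lambda)
# for the scaled transformation y^(lambda); data are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(1, 10, n)
y = np.exp(0.3 + 0.2 * x + rng.normal(0, 0.1, n))  # log-scale model, so lambda ~ 0

X = np.column_stack([np.ones(n), x])
y_g = np.exp(np.mean(np.log(y)))  # geometric mean of the observations

def ss_res(lam):
    if abs(lam) < 1e-8:
        z = y_g * np.log(y)
    else:
        z = (y ** lam - 1) / (lam * y_g ** (lam - 1))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return np.sum((z - X @ beta) ** 2)

lams = np.linspace(-2, 2, 81)
profile = [ss_res(l) for l in lams]
print(lams[int(np.argmin(profile))])  # lambda with the smallest SS_Res(lambda)
```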
Ch5: Transformation: Analytical method
Obtaining a suitable value of λ is easy: plot $SS_{\mathrm{Res}}(\lambda)$ against λ for a grid of possible values of λ (usually between −3 and +3) and pick the minimizer.
[Figure: plot of SS_Res(λ) against λ]
Ch5: Transformation: Analytical method with regressors
Suppose that the relationship between y and one or more of the regressor variables is nonlinear, but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied.
Assume $E(Y) = \beta_0 + \beta_1 Z$, where
$Z = \begin{cases} x^{\alpha}, & \alpha \neq 0 \\ \log(x), & \alpha = 0 \end{cases}$
Assuming $\alpha \neq 0$, we expand about $\alpha_0$ in a Taylor series and ignore terms of higher than first order:
$E(Y) = \beta_0 + \beta_1 x^{\alpha_0} + (\alpha - \alpha_0)\,\beta_1 x^{\alpha_0}\log(x) = \gamma_0 + \gamma_1 x_1 + \gamma_2 x_2,$
where $\gamma_0 = \beta_0$, $\gamma_1 = \beta_1$, $\gamma_2 = (\alpha - \alpha_0)\beta_1$, $x_1 = x^{\alpha_0}$, and $x_2 = x^{\alpha_0}\log(x)$.
Ch5: Transformation: Analytical method with regressors
Now use the following algorithm (Box-Tidwell, 1962):
1. Fit the model $E(Y) = \beta_0 + \beta_1 x$ and find the least-squares estimates of $\beta_0$ and $\beta_1$.
2. Fit the model $E(Y) = \beta_0 + \beta_1 x + \gamma_2\,x\log(x)$ and find the least-squares estimates of $\beta_0$, $\beta_1$, and $\gamma_2$.
3. Apply the equality $\hat{\gamma}_2 = (\hat{\alpha}_i - \hat{\alpha}_{i-1})\hat{\beta}_1$ to provide an updated estimate
$\hat{\alpha}_i = \frac{\hat{\gamma}_2}{\hat{\beta}_1} + \hat{\alpha}_{i-1}.$
4. Set $x = x^{\hat{\alpha}_i}$ and repeat steps 1-3 again.
5. Apply the above algorithm until the difference between $\hat{\alpha}_i$ and $\hat{\alpha}_{i-1}$ is small.
(The index i counts the repeats of the algorithm, and $\hat{\alpha}_0 = 1$.)
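A minimal sketch of this iteration on hypothetical data (the true power is α = −1, so the loop should settle near −1):

```python
# A minimal sketch of the Box-Tidwell iteration for one regressor,
# assuming x > 0 and hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(0.5, 5, n)
y = 3 - 7 / x + rng.normal(0, 0.2, n)  # true power alpha = -1

def ols(cols, y):
    Z = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

alpha = 1.0
for _ in range(10):
    w = x ** alpha
    b0, b1 = ols([w], y)                 # step 1: E(Y) = b0 + b1*w
    g = ols([w, w * np.log(w)], y)[2]    # step 2: coefficient of w*log(w)
    new_alpha = g / b1 + alpha           # step 3: update the power
    if abs(new_alpha - alpha) < 1e-4:    # step 5: stop when the change is small
        alpha = new_alpha
        break
    alpha = new_alpha                    # step 4: transform x and repeat
print(alpha)  # close to -1
```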
Ch5: Transformation: Analytical method with regressors
Example:
Iteration 1 ($\hat{\alpha}_0 = 1$):
$\hat{y} = 0.1309 + 0.2411\,x$
$\hat{y} = -2.4168 + 1.5344\,x - 0.462\,x\log(x)$
$\hat{\alpha}_1 = -0.462/0.2411 + 1 = -0.92$
Iteration 2 ($x' = x^{-0.92}$):
$\hat{y} = 3.1039 - 6.6874\,x'$
$\hat{y} = 3.2409 - 6.445\,x' + 0.5994\,x'\log(x')$
$\hat{\alpha}_2 = 0.5994/(-6.6874) + (-0.92) = -1.01$
Ch5: Transformation: Analytical method with regressors
Example:
Model 1 (blue, solid line in the graph):
$\hat{y} = 0.1309 + 0.2411\,x, \qquad R^2 = 0.87$
Model 2 (red, dotted line in the graph):
$\hat{y} = 2.9650 - 6.9693/x, \qquad R^2 = 0.980$
Ch5: Generalized least squares: Covariance matrix is nonsingular
Consider the model $Y = X\beta + \varepsilon$ with the following assumptions:
$E(\varepsilon) = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 V,$
where V is a nonsingular square matrix.
We will approach this problem by transforming the model to a new set of observations that satisfy the standard least-squares assumptions.
Then we will use ordinary least squares on the transformed data.
Ch5: Generalized least squares: Covariance matrix is nonsingular
Since V is nonsingular and positive definite, we can write
$V = K'K = KK,$
where K is a nonsingular, symmetric square matrix.
Define the new variables:
$Z = K^{-1}Y; \qquad B = K^{-1}X; \qquad g = K^{-1}\varepsilon.$
Multiplying both sides of the original regression model by $K^{-1}$ gives:
$Z = B\beta + g.$
This new transformed model has the following properties:
$E(g) = K^{-1}E(\varepsilon) = 0;$
$\operatorname{Var}(g) = E\{[g - E(g)][g - E(g)]'\} = E(gg') = K^{-1}E(\varepsilon\varepsilon')K^{-1} = \sigma^2 K^{-1}VK^{-1} = \sigma^2 K^{-1}KKK^{-1} = \sigma^2 I.$
Ch5: Generalized least squares: Covariance matrix is nonsingular
So, in this transformed model, the error term g has zero mean and constant variance and is uncorrelated. In this model:
$\hat{\beta} = (B'B)^{-1}B'Z = \left[(K^{-1}X)'(K^{-1}X)\right]^{-1}(K^{-1}X)'(K^{-1}Y) = (X'K^{-1}K^{-1}X)^{-1}X'K^{-1}K^{-1}Y = (X'V^{-1}X)^{-1}X'V^{-1}Y$
This estimator is called the generalized least-squares estimator of β. We easily have:
$E(\hat{\beta}) = (X'V^{-1}X)^{-1}X'V^{-1}E(Y) = (X'V^{-1}X)^{-1}X'V^{-1}X\beta = \beta$
$\operatorname{var}(\hat{\beta}) = \sigma^2(B'B)^{-1} = \sigma^2(X'V^{-1}X)^{-1}$
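A minimal sketch of the estimator $(X'V^{-1}X)^{-1}X'V^{-1}Y$ on hypothetical data with a known diagonal V:

```python
# A minimal sketch of the GLS estimator beta_hat = (X'V^-1 X)^-1 X'V^-1 y,
# with a hypothetical known diagonal V (unequal error variances).
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
v = 0.5 + 0.3 * x                       # assumed variance pattern: Var(eps_i) ~ v_i
y = 1 + 2 * x + rng.normal(0, np.sqrt(v))

V_inv = np.diag(1 / v)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_gls, beta_ols)  # both unbiased; GLS has the smaller variance
```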
Ch5: Generalized least squares: Covariance matrix is diagonal
When the errors ε are uncorrelated but have unequal variances, the covariance matrix of ε is
$\sigma^2 V = \sigma^2\operatorname{diag}\left(\frac{1}{w_1}, \frac{1}{w_2}, \dots, \frac{1}{w_n}\right),$
and the estimation procedure is usually called weighted least squares. Let $W = V^{-1} = \operatorname{diag}(w_1, \dots, w_n)$. Then we have:
$\hat{\beta} = (X'WX)^{-1}X'WY,$
which is called the weighted least-squares estimator. Note that observations with large variances will have smaller weights than observations with small variances.
Ch5: Generalized least squares: Covariance matrix is diagonal
For the case of simple linear regression, the weighted least-squares function is
$S(\beta_0, \beta_1) = \sum_{i=1}^{n} w_i\,(y_i - \beta_0 - \beta_1 x_i)^2.$
Taking derivatives with respect to β0 and β1, the resulting least-squares normal equations become:
$\hat{\beta}_0\sum_{i=1}^{n} w_i + \hat{\beta}_1\sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i$
$\hat{\beta}_0\sum_{i=1}^{n} w_i x_i + \hat{\beta}_1\sum_{i=1}^{n} w_i x_i^2 = \sum_{i=1}^{n} w_i x_i y_i$
Exercise: Show that the solutions of the above system coincide with the general formula stated on the previous page.
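A minimal sketch solving these two normal equations directly, with hypothetical weights $w_i$ assumed known:

```python
# A minimal sketch of simple-regression weighted least squares via the
# normal equations above; weights w_i are hypothetical (inverse variances).
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.uniform(0, 10, n)
w = 1 / (1 + x)                         # assumed known weights, w_i = 1/Var_i
y = 2 + 0.5 * x + rng.normal(0, np.sqrt(1 + x))

# Solve the 2x2 weighted normal equations for (b0, b1).
A = np.array([[w.sum(), (w * x).sum()],
              [(w * x).sum(), (w * x * x).sum()]])
b = np.array([(w * y).sum(), (w * x * y).sum()])
b0, b1 = np.linalg.solve(A, b)
print(b0, b1)
```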
Chapter 6
Diagnostics for
Leverage and
Influence
Ch6: Diagnostics for Leverage and influence
In this chapter, we present several diagnostics for leverage and influence.
[Figure: scatterplots of y against x showing two kinds of unusual points]
• An influential point has a noticeable impact on the model coefficients.
• Another kind of point does not affect the estimates of the regression coefficients, but it has a dramatic effect on the model summary statistics, such as R² and the standard errors of the regression coefficients.
Ch6: Diagnostics for Leverage and influence: Importance
A regression coefficient may have a sign that does not make
engineering or scientific sense,
A regressor known to be important may be statistically insignificant,
A model that fits the data well and that is logical from an
application–environment perspective may produce poor predictions.
These situations may be the result of one or perhaps a few influential
observations. Finding these observations then can shed considerable
light on the problems with the model.
Ch6: Diagnostics for Leverage and influence: Leverage
The basic measure is the hat matrix. The hat matrix diagonal is a standardized measure of the distance of the ith observation from the center (or centroid) of the x space. Thus, large hat diagonals reveal observations that are leverage points because they are remote in x space from the rest of the sample.
Rule: if $h_{ii} > 2\bar{h} = 2(K+1)/n$, then the ith observation is a leverage point.
Two problems with this rule:
• if 2(K+1) > n, the cutoff does not apply;
• leverage points are only potentially influential.
Ch6: Diagnostics for Leverage and influence: Measures of influence

Measure  | Formula | Rules
Cook's D | $D_i = \dfrac{(\hat{\beta}_{(i)} - \hat{\beta})'X'X(\hat{\beta}_{(i)} - \hat{\beta})}{(K+1)\,MS_{\mathrm{Res}}}$ | $D_i$ is not an F statistic, but in practice it can be compared with $F_{\alpha,\,K+1,\,n-K-1}$; we consider points with $D_i > 1$ to be influential.
DFBETAS  | $\mathrm{DFBETAS}_{j,i} = \dfrac{\hat{\beta}_j - \hat{\beta}_{j(i)}}{\sqrt{S_{(i)}^2\,C_{jj}}}$ | If $|\mathrm{DFBETAS}_{j,i}| > 2/\sqrt{n}$, the ith observation warrants examination.
DFFITS   | $\mathrm{DFFITS}_i = \dfrac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{S_{(i)}^2\,h_{ii}}}$ | If $|\mathrm{DFFITS}_i| > 2\sqrt{(K+1)/n}$, the ith observation warrants attention.
Ch6: Diagnostics for Leverage and influence: Cook's D
There are several equivalent formulas (Exercise):
$D_i = \frac{r_i^2}{K+1}\cdot\frac{\operatorname{var}(\hat{y}_i)}{\operatorname{var}(e_i)} = \frac{r_i^2}{K+1}\cdot\frac{h_{ii}}{1 - h_{ii}}$
• The first factor, $r_i^2$, is big if case i is unusual in the y direction; the second factor, $h_{ii}/(1 - h_{ii})$, is big if case i is unusual in the x direction.
$D_i = \frac{(\hat{y}_{(i)} - \hat{y})'(\hat{y}_{(i)} - \hat{y})}{(K+1)\,MS_{\mathrm{Res}}}$
• This is the squared Euclidean distance that the vector of fitted values moves when the ith observation is deleted.
Ch6: Diagnostics for Leverage and influence: DFBETAS
There is an interesting computational formula for DFBETAS. Let $R = (X'X)^{-1}X'$ and let $r_j' = [r_{j,1}, r_{j,2}, \dots, r_{j,n}]$ denote the jth row of R. Then we can write (Exercise):
$\mathrm{DFBETAS}_{j,i} = \frac{r_{j,i}}{\sqrt{r_j'r_j}}\cdot\frac{t_i}{\sqrt{1 - h_{ii}}}$
• The first factor is a measure of the impact of the ith observation on $\hat{\beta}_j$;
• the second factor is big if case i is unusual in both the x and y directions.
Ch6: Diagnostics for Leverage and influence: DFFITS
DFFITS_i is the number of standard deviations that the fitted value $\hat{y}_i$ changes if observation i is removed. Computationally we may find (Exercise):
$\mathrm{DFFITS}_i = t_i\sqrt{\frac{h_{ii}}{1 - h_{ii}}}$
where $h_{ii}/(1 - h_{ii})$ is the leverage of the ith observation, and $t_i$ (R-student) is big if case i is an outlier. Note, however, that if $h_{ii} \approx 0$, the effect of R-student will be moderated; similarly, a near-zero R-student combined with a high-leverage point could produce a small value of DFFITS.
Ch6: Diagnostics for Leverage and influence: A measure of model performance
The diagnostics $D_i$, $\mathrm{DFBETAS}_{j,i}$, and $\mathrm{DFFITS}_i$ provide insight about the effect of observations on the estimated coefficients $\hat{\beta}_j$ and fitted values $\hat{y}_i$. They do not provide any information about the overall precision of estimation.
Since it is fairly common practice to use the determinant of the covariance matrix as a convenient scalar measure of precision, called the generalized variance, we could define the generalized variance of $\hat{\beta}$ as
$GV(\hat{\beta}) = \left|\operatorname{var}(\hat{\beta})\right| = \left|\sigma^2(X'X)^{-1}\right|.$
Ch6: Diagnostics for Leverage and influence: A measure of model performance
To express the role of the ith observation on the precision of estimation, we could define
$\mathrm{COVRATIO}_i = \frac{\left|(X_{(i)}'X_{(i)})^{-1}S_{(i)}^2\right|}{\left|(X'X)^{-1}MS_{\mathrm{Res}}\right|}, \qquad i = 1, 2, \dots, n.$
Clearly if $\mathrm{COVRATIO}_i > 1$, the ith observation improves the precision of estimation, while if $\mathrm{COVRATIO}_i < 1$, inclusion of the ith point degrades precision. Computationally (Exercise):
$\mathrm{COVRATIO}_i = \frac{\left(S_{(i)}^2\right)^{K+1}}{MS_{\mathrm{Res}}^{\,K+1}}\left(\frac{1}{1 - h_{ii}}\right)$
A cutoff value for COVRATIO is not easy to give, but researchers suggest that if $\mathrm{COVRATIO}_i > 1 + 3(K+1)/n$ or $\mathrm{COVRATIO}_i < 1 - 3(K+1)/n$, then the ith point should be considered influential.
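A minimal sketch computing all four diagnostics of this chapter from a single fit, on hypothetical data; the deleted variance $S_{(i)}^2$ is obtained from the standard closed form rather than by refitting:

```python
# A minimal sketch computing Cook's D, DFFITS, DFBETAS, and COVRATIO from
# first principles on hypothetical data (p = K + 1 parameters).
import numpy as np

rng = np.random.default_rng(6)
n, K = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, n)

p = K + 1
XtX_inv = np.linalg.inv(X.T @ X)
h = np.diag(X @ XtX_inv @ X.T)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
ms_res = e @ e / (n - p)

s2_i = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)  # deleted variance S_(i)^2
t = e / np.sqrt(s2_i * (1 - h))                           # R-student
r = e / np.sqrt(ms_res * (1 - h))                         # studentized residual

cooks_d = r**2 / p * h / (1 - h)
dffits = t * np.sqrt(h / (1 - h))
R = XtX_inv @ X.T                                         # R = (X'X)^-1 X'
dfbetas = (R * (t / np.sqrt(1 - h))) / np.sqrt((R**2).sum(axis=1, keepdims=True))
covratio = (s2_i / ms_res) ** p / (1 - h)

print(np.where(cooks_d > 1), np.where(np.abs(dffits) > 2 * np.sqrt(p / n)))
```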
Chapter 7
Polynomial
Regression Models
Ch 7: Polynomial regression models
Polynomial regression is a subclass of multiple regression.
Example 1: the second-order polynomial in one variable,
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
Example 2: the second-order polynomial in two variables,
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \varepsilon$
• Polynomials are widely used in situations where the response is curvilinear.
• Complex nonlinear relationships can be adequately modeled by polynomials over reasonably small ranges of the x's.
This chapter will survey several problems and issues associated with fitting polynomials.
Ch 7: Polynomial regression models: in one variable
In general, the kth-order polynomial model in one variable is
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon.$
If we set $x_j = x^j$, j = 1, 2, ..., k, then the above model becomes a multiple linear regression model in the k regressors x1, x2, ..., xk. Thus, a polynomial model of order k may be fitted using the techniques studied previously.
Let $E(Y \mid X = x) = g(x)$ be an unknown function. Using a Taylor series expansion about a point a:
$Y = g(x) + \varepsilon \approx \sum_{m=0}^{k}\frac{g^{(m)}(a)}{m!}(x - a)^m + \varepsilon.$
So, polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
Ch 7: Polynomial regression models: in one variable
Example (second-order or quadratic model):
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
We often call β1 the linear effect parameter and β2 the quadratic effect parameter.
The parameter β0 is the mean of y when x = 0, if the range of the data includes x = 0; otherwise β0 has no physical interpretation.
[Figure: numerical example, plot of E(Y) = 5 − 2x − 0.25x² against x]
Ch 7: Polynomial regression models: Important considerations in fitting these models
1. Order of the model: keep the order of the model as low as possible.
2. Model-building strategy: use forward selection or backward elimination.
3. Extrapolation: extrapolation with polynomial models can be extremely hazardous.
[Figure: plot of E(Y) = 5 + 2x − 0.25x² over the region of the original data and the extrapolation region]
Example:
• If we extrapolate beyond the range of the original data, the predicted response turns downward.
• This may be at odds with the true behavior of the system. In general, a polynomial model may turn in unanticipated and inappropriate directions, both in interpolation and in extrapolation.
Ch 7: Polynomial regression models: Important considerations in fitting these models
4. Ill-conditioning I: the matrix inversion calculations will be inaccurate, and considerable error may be introduced into the parameter estimates. Nonessential ill-conditioning caused by the arbitrary choice of origin can be removed by first centering the regressor variables, as in the sketch below.
5. Ill-conditioning II: if the values of x are limited to a narrow range, there can be significant ill-conditioning or multicollinearity in the columns of the X matrix. For example, if x varies between 1 and 2, x² varies between 1 and 4, which could create strong multicollinearity between x and x².
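A minimal sketch of the centering remedy for item 4, on hypothetical data: compare the conditioning of $X'X$ with raw and centered columns.

```python
# A minimal sketch showing how centering x reduces nonessential
# ill-conditioning in a quadratic fit (hypothetical data).
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(100, 110, 50)          # narrow range far from the origin
y = 5 + 0.3 * x + 0.02 * x**2 + rng.normal(0, 1, 50)

X_raw = np.column_stack([np.ones(50), x, x**2])
xc = x - x.mean()
X_cen = np.column_stack([np.ones(50), xc, xc**2])

# Condition numbers of X'X: centering makes the columns far less collinear.
print(np.linalg.cond(X_raw.T @ X_raw), np.linalg.cond(X_cen.T @ X_cen))
```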
Ch 7: Polynomial regression models: Important considerations in fitting these models
Example: Hardwood Concentration in Pulp and Tensile Strength of Kraft Paper
Fitting: the centered quadratic model $y = \beta_0 + \beta_1(x - \bar{x}) + \beta_2(x - \bar{x})^2 + \varepsilon$ gives
$\hat{y} = 45.295 + 2.546\,(x - 7.2632) - 0.635\,(x - 7.2632)^2$
Testing:
$H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$, by
$F_0 = \frac{SS_R(\beta_2 \mid \beta_1, \beta_0)}{MS_{\mathrm{Res}}} = \frac{SS_R(\beta_1, \beta_2 \mid \beta_0) - SS_R(\beta_1 \mid \beta_0)}{MS_{\mathrm{Res}}} = 105.45 > F_{0.01,1,16} = 8.53$
Diagnostics: residual analysis.
Ch 7: Polynomial regression models: in two or more variables
In general, these models are straightforward extensions of the model with one variable. An example of a second-order model in two variables is:
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \varepsilon,$
where β1, β2 are linear effect parameters, β11, β22 are quadratic effect parameters, and β12 is an interaction effect parameter.
This example has received considerable attention, both from researchers and from practitioners. The regression function of this example is called a response surface.
Response surface methodology (RSM) is widely applied in industry for modeling the output response(s) of a process in terms of the important controllable variables and then finding the operating conditions that optimize the response.
Ch 7: Polynomial regression models: in two or more variables
Example:
Observation Run order Temperature (T) Concentration (C ) conversion
1 4 200 15 43
2 12 250 15 78
3 11 200 25 69
4 5 250 25 73
5 6 189.65 20 48
6 7 260.35 20 76
7 3 225 12.93 65
8 1 225 27.07 74
9 8 225 20 76
10 10 225 20 79
11 9 225 20 83
12 2 225 20 81
x1 x2 y
-1 -1 43
1 -1 78
-1 1 69
1 1 73
-1.414 0 48
1.414 0 76
0 -1.414 65
0 1.414 74
0 0 76
0 0 79
0 0 83
0 0 81
The coded variables are
$x_1 = \frac{T - 225}{25}, \qquad x_2 = \frac{C - 20}{5}.$
Ch 7: Polynomial regression models: in two or more variables
Example: the central composite design is widely used for fitting RSM.
[Figure: central composite design in the (x1, x2) plane, with temperature on the horizontal axis and concentration on the vertical axis]
Runs at:
• Corners of the square: (x1, x2) = (−1, −1), (−1, 1), (1, −1), (1, 1)
• Center of the square: (x1, x2) = (0, 0), four replicates
• Axial points: (x1, x2) = (0, −1.414), (0, 1.414), (−1.414, 0), (1.414, 0)
Ch 7: Polynomial regression models: in two or more variables
We fit the second-order model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11}x_1^2 + \beta_{22}x_2^2 + \beta_{12}x_1x_2 + \varepsilon.$
To do this, we have:
In the coded variables (columns 1, x1, x2, x1², x2², x1x2),
$X = \begin{pmatrix} 1 & -1 & -1 & 1 & 1 & 1\\ 1 & 1 & -1 & 1 & 1 & -1\\ 1 & -1 & 1 & 1 & 1 & -1\\ 1 & 1 & 1 & 1 & 1 & 1\\ 1 & -1.414 & 0 & 2 & 0 & 0\\ 1 & 1.414 & 0 & 2 & 0 & 0\\ 1 & 0 & -1.414 & 0 & 2 & 0\\ 1 & 0 & 1.414 & 0 & 2 & 0\\ 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad y = \begin{pmatrix}43\\78\\69\\73\\48\\76\\65\\74\\76\\79\\83\\81\end{pmatrix}$
$X'X = \begin{pmatrix} 12 & 0 & 0 & 8 & 8 & 0\\ 0 & 8 & 0 & 0 & 0 & 0\\ 0 & 0 & 8 & 0 & 0 & 0\\ 8 & 0 & 0 & 12 & 4 & 0\\ 8 & 0 & 0 & 4 & 12 & 0\\ 0 & 0 & 0 & 0 & 0 & 4 \end{pmatrix}, \qquad X'y = \begin{pmatrix}845\\78.592\\33.726\\511\\541\\-31\end{pmatrix}$
The normal equations $X'X\hat{\beta} = X'y$ read
$12\hat{\beta}_0 + 8\hat{\beta}_{11} + 8\hat{\beta}_{22} = 845; \qquad 8\hat{\beta}_1 = 78.592; \qquad 8\hat{\beta}_2 = 33.726;$
$8\hat{\beta}_0 + 12\hat{\beta}_{11} + 4\hat{\beta}_{22} = 511; \qquad 8\hat{\beta}_0 + 4\hat{\beta}_{11} + 12\hat{\beta}_{22} = 541; \qquad 4\hat{\beta}_{12} = -31,$
with solution
$\hat{\beta} = (79.75,\ 9.83,\ 4.22,\ -8.88,\ -5.13,\ -7.75)'.$
Ch 7: Polynomial regression models: in two or more variables
So, the fitted model in the coded variables is:
$\hat{y} = 79.75 + 9.83\,x_1 + 4.22\,x_2 - 8.88\,x_1^2 - 5.13\,x_2^2 - 7.75\,x_1x_2$
And in terms of the original data, the model is:
$\hat{y} = 79.75 + 9.83\left(\frac{T-225}{25}\right) + 4.22\left(\frac{C-20}{5}\right) - 8.88\left(\frac{T-225}{25}\right)^2 - 5.13\left(\frac{C-20}{5}\right)^2 - 7.75\left(\frac{T-225}{25}\right)\left(\frac{C-20}{5}\right)$
$= -1105.56 + 8.0242\,T + 22.994\,C - 0.0142\,T^2 - 0.20502\,C^2 - 0.062\,TC$
We use the coded data for computation of the sums of squares:
$\hat{y} = X\hat{\beta} = (43.96,\ 79.11,\ 67.89,\ 72.04,\ 48.11,\ 75.90,\ 63.54,\ 75.46,\ 79.75,\ 79.75,\ 79.75,\ 79.75)'$
$SS_R = \sum_{i=1}^{12}(\hat{y}_i - \bar{y})^2 = 1733.57; \qquad SS_T = \sum_{i=1}^{12}(y_i - \bar{y})^2 = 1768.92$

Source of variation | SS      | D.F. | MS     | F     | P-value
Regression          | 1733.58 | 5    | 346.72 | 58.87 | <0.0001
Residual            | 35.34   | 6    | 5.89   |       |
Total               | 1768.92 | 11   |        |       |
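A minimal sketch reproducing this fit from the coded design above (numpy only); the printed coefficients should agree with $\hat{\beta}$ up to rounding:

```python
# A minimal sketch reproducing the CCD second-order fit above with numpy.
import numpy as np

a = 1.414
x1 = np.array([-1, 1, -1, 1, -a, a, 0, 0, 0, 0, 0, 0])
x2 = np.array([-1, -1, 1, 1, 0, 0, -a, a, 0, 0, 0, 0])
y = np.array([43, 78, 69, 73, 48, 76, 65, 74, 76, 79, 83, 81.0])

X = np.column_stack([np.ones(12), x1, x2, x1**2, x2**2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # approximately [79.75, 9.83, 4.22, -8.88, -5.13, -7.75]
```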
Ch 7: Polynomial regression models: in two or more variables
So, if we fit only the linear model in the coded variables, we have:
Here X has columns 1, x1, x2 only, so
$X'X = \begin{pmatrix}12 & 0 & 0\\ 0 & 8 & 0\\ 0 & 0 & 8\end{pmatrix}, \qquad X'y = \begin{pmatrix}845\\78.592\\33.726\end{pmatrix},$
and the normal equations give
$\hat{\beta}_0 = \frac{845}{12} = 70.42, \qquad \hat{\beta}_1 = \frac{78.592}{8} = 9.83, \qquad \hat{\beta}_2 = \frac{33.726}{8} = 4.22.$
$\hat{y} = X\hat{\beta} = (56.37,\ 76.03,\ 64.81,\ 84.47,\ 56.52,\ 84.32,\ 64.45,\ 76.39,\ 70.42,\ 70.42,\ 70.42,\ 70.42)'$
$SS_R = \sum_{i=1}^{12}(\hat{y}_i - \bar{y})^2 = 914.41; \qquad SS_T = \sum_{i=1}^{12}(y_i - \bar{y})^2 = 1768.92$

Source of variation | SS      | D.F. | MS     | F    | P-value
Regression          | 914.41  | 2    | 457.21 | 4.82 | 0.0377
Residual            | 854.51  | 9    | 94.95  |      |
Total               | 1768.92 | 11   |        |      |
Ch 7: Polynomial regression models: in two or more variables
As the last four rows of the matrix X above (the center runs) are the same, we can divide SS_Res into two components and do a lack-of-fit test. We have:
Source of variation                 | SS      | D.F. | MS     | F      | P-value
Regression                          | 1733.58 | 5    | 346.72 | 58.87  | <0.0001
  SSR(β1, β2 | β0)                  | 914.4   | 2    | 457.2  |        |
  SSR(β11, β22, β12 | β1, β2, β0)   | 819.2   | 3    | 273.1  |        |
Residual                            | 35.34   | 6    | 5.89   |        |
  Lack of fit                       | 8.59    | 3    | 2.83   | 0.3176 | 0.8120
  Pure error                        | 26.75   | 3    | 8.92   |        |
Total                               | 1768.92 | 11   |        |        |

Here $n_i = 1$ for runs i = 1, ..., 8, but the four center runs (76, 79, 83, 81) are replicates with mean $(76 + 79 + 83 + 81)/4 = 79.75$, so
$SS_{\mathrm{PE}} = \sum_{i}(n_i - 1)S_i^2 = (4 - 1)S_9^2 = 26.75; \qquad SS_{\mathrm{LOF}} = SS_{\mathrm{Res}} - SS_{\mathrm{PE}} = 35.34 - 26.75 = 8.59.$
Ch 7: Polynomial regression models: in two or more variables
As the quadratic model is significant for the data, we can do tests on the individual variables to drop unimportant terms, if there are any. We use the following statistic, where the $C_{jj}$ are the diagonal entries of the matrix $(X'X)^{-1}$:
$t_j = \frac{\hat{\beta}_j}{\sqrt{\widehat{\operatorname{var}}(\hat{\beta}_j)}} = \frac{\hat{\beta}_j}{\sqrt{C_{jj}\,MS_{\mathrm{Res}}}}$

Variable  | Estimated coefficient | Standard error | t      | P-value
Intercept | 79.75                 | 1.21           | 65.72  |
x1        | 9.83                  | 0.86           | 11.45  | 0.0001
x2        | 4.22                  | 0.86           | 4.913  | 0.0027
x1²       | -8.88                 | 0.96           | -9.25  | 0.0001
x2²       | -5.13                 | 0.96           | -5.341 | 0.0018
x1x2      | -7.75                 | 1.21           | -6.386 | 0.0007

$(X'X)^{-1} = \begin{pmatrix} \tfrac{1}{4} & 0 & 0 & -\tfrac{1}{8} & -\tfrac{1}{8} & 0\\ 0 & \tfrac{1}{8} & 0 & 0 & 0 & 0\\ 0 & 0 & \tfrac{1}{8} & 0 & 0 & 0\\ -\tfrac{1}{8} & 0 & 0 & \tfrac{5}{32} & \tfrac{1}{32} & 0\\ -\tfrac{1}{8} & 0 & 0 & \tfrac{1}{32} & \tfrac{5}{32} & 0\\ 0 & 0 & 0 & 0 & 0 & \tfrac{1}{4} \end{pmatrix}, \qquad MS_{\mathrm{Res}} = 5.89.$
Ch 7: Polynomial regression models: in two or more variables
Generally we prefer to fit the full quadratic model whenever possible, unless there are large differences between the full and the reduced model in terms of PRESS and adjusted R²:
$\hat{y} = 79.75 + 9.83\,x_1 + 4.22\,x_2 - 8.88\,x_1^2 - 5.13\,x_2^2 - 7.75\,x_1x_2$
Using the equation $h_{ii} = x_i'(X'X)^{-1}x_i$ we have the table below. Note that runs 1 to 8 all have the same $h_{ii}$, as these points are equidistant from the center of the design; the last four runs have $h_{ii} = 0.25$.
x1 x2 y ŷ ei hii ti e[i]
-1 -1 43 43.96 -0.96 0.625 -0.67 -2.55
1 -1 78 79.11 -1.11 0.625 -0.74 -2.95
-1 1 69 67.89 1.11 0.625 0.75 2.96
1 1 73 72.04 0.96 0.625 0.65 2.56
-1.414 0 48 48.11 -0.11 0.625 -0.07 -0.29
1.414 0 76 75.90 0.10 0.625 0.07 0.28
0 -1.414 65 63.54 1.46 0.625 0.98 3.89
0 1.414 74 75.46 -1.46 0.625 0.99 -3.90
0 0 76 79.75 -3.75 0.250 -1.78 -5.00
0 0 79 79.75 -0.75 0.250 -0.36 -1.00
0 0 83 79.75 3.25 0.250 1.55 4.33
0 0 81 79.75 1.25 0.250 0.59 1.67
For example, for run 1, with $x_1' = (1, -1, -1, 1, 1, 1)$,
$h_{11} = x_1'(X'X)^{-1}x_1 = 0.625.$
$R^2 = 0.98, \qquad R^2_{\mathrm{Adj}} = 0.96, \qquad R^2_{\mathrm{Predicted}} = 0.94$
Ch 7: Polynomial regression models: in two or more variables
[Figure: residual plots for the fitted quadratic model]
• Normality holds (from the normal probability plot of the residuals);
• The variance is stable (from the plot of residuals against fitted values);
• Independence holds (from the plot of residuals against run order).
Ch 7: Polynomial regression models: Orthogonal polynomial
Consider the kth-order polynomial model in one variable,
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon.$
Generally the columns of the X matrix will not be orthogonal. One approach to deal with this problem is orthogonal polynomials. In this approach we fit the following model:
$Y = \alpha_0 + \alpha_1 P_1(x) + \alpha_2 P_2(x) + \cdots + \alpha_k P_k(x) + \varepsilon,$
where $P_j(x)$ is a jth-order orthogonal polynomial, defined such that:
$\sum_{i=1}^{n} P_r(x_i)P_s(x_i) = 0, \quad r \neq s; \qquad P_0(x_i) = 1, \quad i = 1, \dots, n.$
Ch 7: Polynomial regression models: Orthogonal polynomial
With this model, since the columns of X are orthogonal,
$X'X = \operatorname{diag}\left(\sum_{i=1}^{n}P_0^2(x_i),\ \sum_{i=1}^{n}P_1^2(x_i),\ \dots,\ \sum_{i=1}^{n}P_k^2(x_i)\right),$
and therefore
$\hat{\alpha}_j = \frac{\sum_{i=1}^{n}P_j(x_i)\,y_i}{\sum_{i=1}^{n}P_j^2(x_i)}, \qquad j = 0, 1, \dots, k.$
The $P_j(x)$ can be determined by the Gram-Schmidt process. In the case where the levels of x are equally spaced with spacing d, we have:
$P_0(x_i) = 1$
$P_1(x_i) = \lambda_1\left(\dfrac{x_i - \bar{x}}{d}\right)$
$P_2(x_i) = \lambda_2\left[\left(\dfrac{x_i - \bar{x}}{d}\right)^2 - \dfrac{n^2 - 1}{12}\right]$
$P_3(x_i) = \lambda_3\left[\left(\dfrac{x_i - \bar{x}}{d}\right)^3 - \left(\dfrac{x_i - \bar{x}}{d}\right)\dfrac{3n^2 - 7}{20}\right]$
$P_4(x_i) = \lambda_4\left[\left(\dfrac{x_i - \bar{x}}{d}\right)^4 - \left(\dfrac{x_i - \bar{x}}{d}\right)^2\dfrac{3n^2 - 13}{14} + \dfrac{3(n^2 - 1)(n^2 - 9)}{560}\right]$
where the $\lambda_j$ are constants chosen so that the polynomials take convenient (integer) values.
Ch 7: Polynomial regression models: Orthogonal polynomial
Gram-Schmidt process: consider an arbitrary set $S = \{U_1, \dots, U_k\}$ and denote by $\langle U_i, U_j\rangle$ the inner product of $U_i$ and $U_j$. Then the set $S' = \{V_1, \dots, V_k\}$ is orthogonal when computed as below:
$V_1 = U_1$
$V_2 = U_2 - \dfrac{\langle U_2, V_1\rangle}{\langle V_1, V_1\rangle}V_1$
$V_3 = U_3 - \dfrac{\langle U_3, V_1\rangle}{\langle V_1, V_1\rangle}V_1 - \dfrac{\langle U_3, V_2\rangle}{\langle V_2, V_2\rangle}V_2$
$\vdots$
$V_k = U_k - \sum_{j=1}^{k-1}\dfrac{\langle U_k, V_j\rangle}{\langle V_j, V_j\rangle}V_j$
Normalizing:
$e_j = \dfrac{V_j}{\sqrt{\langle V_j, V_j\rangle}}, \qquad j = 1, \dots, k.$
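A minimal sketch of this process applied to the power basis $1, x, x^2$ (using the equally spaced x levels of the example two slides below):

```python
# A minimal sketch of the Gram-Schmidt construction of orthogonal
# polynomial columns from the powers 1, x, x^2, ... (x levels as in the example).
import numpy as np

x = np.array([50, 75, 100, 125, 150, 175, 200, 225, 250, 275.0])
k = 2
U = [x**j for j in range(k + 1)]  # U_j = x^(j-1) in the slide's notation

V = []
for u in U:
    v = u.copy()
    for w in V:                    # subtract projections on the earlier V's
        v -= (u @ w) / (w @ w) * w
    V.append(v)

# The columns are now orthogonal: V_r . V_s = 0 for r != s.
print(np.round([V[0] @ V[1], V[0] @ V[2], V[1] @ V[2]], 8))
```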
Ch 7: Polynomial regression models: Orthogonal polynomial
In polynomial regression with one variable, assume that $U_j = x^{j-1}$. Applying the Gram-Schmidt process, we have:
$V_2 = x - \frac{\langle x, \mathbf{1}\rangle}{\langle \mathbf{1}, \mathbf{1}\rangle}\,\mathbf{1} = x - \bar{x}.$
Normalizing:
$P_1(x) = \frac{V_2}{\sqrt{\langle V_2, V_2\rangle}} = \frac{x - \bar{x}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$
If the levels of x are equally spaced with spacing d, so that $x_i = x_1 + (i - 1)d$ and $x_i - \bar{x} = d\left(i - \frac{n+1}{2}\right)$, we have:
$\sum_{i=1}^{n}(x_i - \bar{x})^2 = d^2\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2 = \frac{d^2\,n(n^2 - 1)}{12}.$
So, in this case we have:
$P_1(x) = \frac{x - \bar{x}}{d\sqrt{n(n^2 - 1)/12}}.$
Exercise: Give a proof for the other $P_j(x)$ listed earlier by a similar method.
Note: each $P_j(x)$ may be multiplied by an arbitrary constant $\lambda_j$.
Ch 7: Polynomial regression models: Orthogonal polynomial
Example:

x   | y
50  | 335
75  | 326
100 | 316
125 | 313
150 | 311
175 | 314
200 | 318
225 | 328
250 | 337
275 | 345

$x_i - x_{i-1} = 25$ for all i, so the levels of x are equally spaced, and we have
$P_1(x_i) = 2(i - 5.5), \qquad P_2(x_i) = 0.5\left[(i - 5.5)^2 - \frac{99}{12}\right].$

i  | P0(xi) | P1(xi) | P2(xi) | yi
1  | 1      | -9     | 6      | 335
2  | 1      | -7     | 2      | 326
3  | 1      | -5     | -1     | 316
4  | 1      | -3     | -3     | 313
5  | 1      | -1     | -4     | 311
6  | 1      | 1      | -4     | 314
7  | 1      | 3      | -3     | 318
8  | 1      | 5      | -1     | 328
9  | 1      | 7      | 2      | 337
10 | 1      | 9      | 6      | 345

$\hat{\alpha}_0 = \frac{\sum_i P_0(x_i)y_i}{\sum_i P_0^2(x_i)} = \bar{y} = 324.3; \qquad \hat{\alpha}_1 = \frac{\sum_i P_1(x_i)y_i}{\sum_i P_1^2(x_i)} = 0.74; \qquad \hat{\alpha}_2 = \frac{\sum_i P_2(x_i)y_i}{\sum_i P_2^2(x_i)} = 2.8.$
Ch 7: Polynomial regression models: Orthogonal polynomial
Then, the fitted model is:
Source of variation | SS      | D.F. | MS      | F      | P-value
Regression          | 1213.43 | 2    | 606.72  | 159.24 | <0.0001
  Linear            | 181.89  | 1    | 181.89  | 47.74  | 0.0002
  Quadratic         | 1031.54 | 1    | 1031.54 | 270.75 | <0.0001
Residual            | 26.67   | 7    | 3.81    |        |
Total               | 1240.1  | 9    |         |        |

$\hat{y}_i = 324.3 + 0.74\,P_1(x_i) + 2.8\,P_2(x_i)$
$= 324.3 + 1.48\,(i - 5.5) + 1.4\left[(i - 5.5)^2 - \frac{99}{12}\right]$
$= 346.96 - 13.92\,i + 1.4\,i^2$
Chapter 8
Indicator Variables
Ch 8: Indicator Variables
The variables employed in regression analysis are often quantitative variables. Example: temperature, distance, income. These variables have a well-defined scale of measurement.
In some situations it is necessary to use qualitative or categorical variables as predictor variables. Example: sex, operators, employment status. In general, these variables have no natural scale of measurement.
Question: how can we account for the effect that these variables may have on the response?
This is done through the use of indicator variables. Sometimes indicator variables are called dummy variables.
Ch 8: Indicator Variables: Example 1
Y = life of a cutting tool; x1 = lathe speed (RPM); x2 = type of cutting tool, a qualitative variable with two levels (tool types A and B).
Let
$x_2 = \begin{cases}0 & \text{if the observation is from tool type A}\\ 1 & \text{if the observation is from tool type B}\end{cases}$
Assuming that a first-order model is appropriate, we have $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$:
Tool type A: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \epsilon = \beta_0 + \beta_1 x_1 + \epsilon$
Tool type B: $Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \epsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon$
Ch 8: Indicator Variables
[Figure: two parallel regression lines of tool life against lathe speed x1 (RPM)]
$E(Y \mid x_2 = 0) = \beta_0 + \beta_1 x_1$, tool type A
$E(Y \mid x_2 = 1) = (\beta_0 + \beta_2) + \beta_1 x_1$, tool type B
• The regression lines are parallel;
• β2 is a measure of the difference in mean tool life resulting from changing from tool type A to tool type B;
• The variance of the error is assumed to be the same for both tool types A and B.
Ch 8: Indicator Variables: Example 2
Consider again Example 1, but here assume that x2 = type of cutting tool is qualitative with three levels (tool types A, B, and C).
Define two indicator variables:
$(x_2, x_3) = \begin{cases}(0, 0) & \text{if the observation is from tool type A}\\ (1, 0) & \text{if the observation is from tool type B}\\ (0, 1) & \text{if the observation is from tool type C}\end{cases}$
Assuming that a first-order model is appropriate, we have $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$, i.e.:
Tool type A: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3(0) + \epsilon = \beta_0 + \beta_1 x_1 + \epsilon$
Tool type B: $Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \beta_3(0) + \epsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon$
Tool type C: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3(1) + \epsilon = (\beta_0 + \beta_3) + \beta_1 x_1 + \epsilon$
In general, a qualitative variable with l levels is represented by l − 1 indicator variables, each taking on the values 0 and 1. A fitting sketch follows.
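A minimal sketch of fitting such a model with one indicator variable, on hypothetical data simulated to resemble the numerical example that follows:

```python
# A minimal sketch of fitting a model with an indicator (dummy) variable
# for a two-level tool type, on hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(8)
n = 20
speed = rng.uniform(500, 1000, n)
tool_b = (np.arange(n) % 2).astype(float)       # 0 = type A, 1 = type B
life = 37 - 0.03 * speed + 15 * tool_b + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), speed, tool_b])
b0, b1, b2 = np.linalg.lstsq(X, life, rcond=None)[0]
print(b0, b1, b2)  # b2 estimates the A-to-B shift in mean tool life
```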
Ch 8: Indicator Variables: Numerical example
yi xi1 xi2
18.73 610 A
14.52 950 A
17.43 720 A
14.54 840 A
13.44 980 A
24.39 530 A
13.34 680 A
22.71 540 A
12.68 890 A
19.32 730 A
30.16 670 B
27.09 770 B
25.40 880 B
26.05 1000 B
33.49 760 B
35.62 590 B
26.07 910 B
36.78 650 B
34.95 810 B
43.67 500 B
We fit model Y=β0+ β1x1+ β2x2+ϵ
$X'X = \begin{pmatrix}20 & 15010 & 10\\ 15010 & 11717500 & 7540\\ 10 & 7540 & 10\end{pmatrix}, \qquad X'y = \begin{pmatrix}490.38\\ 356515.7\\ 319.28\end{pmatrix}$
Solving the normal equations $X'X\hat{\beta} = X'y$ gives
$\hat{\beta} = \begin{pmatrix}36.99\\ -0.03\\ 15.00\end{pmatrix}.$
The fitted values are $\hat{y} = X\hat{\beta} = (20.76,\ 11.71,\ \dots,\ 38.69)'$, and
$SS_R = \sum_{i=1}^{20}(\hat{y}_i - \bar{y})^2 = 1418.03; \qquad SS_T = \sum_{i=1}^{20}(y_i - \bar{y})^2 = 1575.09.$
Ch 8: Indicator Variables: Numerical example
Then, the fitted model is:
$\hat{y} = 36.99 - 0.03\,x_1 + 15.00\,x_2$

Source of variation | SS      | D.F. | MS     | F     | P-value
Regression          | 1418.03 | 2    | 709.02 | 79.75 | <0.0001
Residual            | 157.06  | 17   | 9.24   |       |
Total               | 1575.09 | 19   |        |       |

Variable  | Estimated coefficient | Standard error | t     | P-value
Intercept | 36.99                 |                |       |
x1        | -0.03                 | 0.005          | -5.89 | <0.00001
x2        | 15                    | 1.360          | 11.04 | <0.00001
Ch 8: Indicator Variables: Comparing regression models
Consider the case of simple linear regression where the n observations can be formed into M groups, with the mth group having $n_m$ observations. The most general model consists of M separate equations:
$Y = \beta_{0m} + \beta_{1m}x + \varepsilon, \qquad m = 1, 2, \dots, M.$
It is often of interest to compare this general model to a more restrictive one. Indicator variables are helpful in this regard. Using indicator variables we can write:
$Y = (\beta_{01} + \beta_{11}x)D_1 + (\beta_{02} + \beta_{12}x)D_2 + \cdots + (\beta_{0M} + \beta_{1M}x)D_M + \varepsilon,$
where $D_i = 1$ when group i is selected and 0 otherwise. We call this model the full model (FM). It has 2M parameters, so the degrees of freedom for $SS_{\mathrm{Res}}(\mathrm{FM})$ are n − 2M.
Exercise: Let $SS_{\mathrm{Res}}(\mathrm{FM}_m)$ denote the residual sum of squares of the model $Y = \beta_{0m} + \beta_{1m}x + \varepsilon$ fitted to group m alone. Show that $SS_{\mathrm{Res}}(\mathrm{FM}) = SS_{\mathrm{Res}}(\mathrm{FM}_1) + SS_{\mathrm{Res}}(\mathrm{FM}_2) + \cdots + SS_{\mathrm{Res}}(\mathrm{FM}_M)$.
We consider three cases:
1) Parallel lines: $\beta_{11} = \beta_{12} = \cdots = \beta_{1M}$
2) Concurrent lines: $\beta_{01} = \beta_{02} = \cdots = \beta_{0M}$
3) Coincident lines: $\beta_{11} = \beta_{12} = \cdots = \beta_{1M}$ and $\beta_{01} = \beta_{02} = \cdots = \beta_{0M}$
Ch8: Indicator Variables: Parallel lines
In the parallel-lines case all M slopes are identical, but the intercepts may differ. So here we want to test:
$H_0: \beta_{11} = \beta_{12} = \cdots = \beta_{1M} = \beta_1$
Recall that this procedure involves fitting a full model (FM) and a reduced model (RM) restricted by the null hypothesis, and computing the F statistic
$F_0 = \frac{\left[SS_{\mathrm{Res}}(\mathrm{RM}) - SS_{\mathrm{Res}}(\mathrm{FM})\right]/(df_{\mathrm{RM}} - df_{\mathrm{FM}})}{SS_{\mathrm{Res}}(\mathrm{FM})/df_{\mathrm{FM}}};$
$H_0$ is rejected when $F_0 > F_{\alpha,\ df_{\mathrm{RM}} - df_{\mathrm{FM}},\ df_{\mathrm{FM}}}$.
Under $H_0$ the full model reduces to
$Y = \beta_{01} + \beta_1 x + \alpha_2 D_2 + \cdots + \alpha_M D_M + \varepsilon.$
In this model $df_{\mathrm{RM}} = n - (M + 1)$.
Therefore, using the above F statistic, we can test the hypothesis $H_0$. This is the analysis of covariance.
Ch 8: Indicator Variables: Concurrent and coincident lines
In the concurrent-lines case all M intercepts are identical, but the slopes may differ:
$H_0: \beta_{01} = \beta_{02} = \cdots = \beta_{0M} = \beta_0$
Under $H_0$ the full model reduces to
$Y = \beta_0 + \beta_{11}x + \alpha_2\,xD_2 + \cdots + \alpha_M\,xD_M + \varepsilon.$
In this model $df_{\mathrm{RM}} = n - (M + 1)$. In this way, similarly to the parallel-lines case, we can test $H_0$ using the above F statistic.
In the coincident-lines case we want to test:
$H_0: \beta_{01} = \cdots = \beta_{0M} = \beta_0$ and $\beta_{11} = \cdots = \beta_{1M} = \beta_1$
Under $H_0$ the full model reduces to the simple model
$Y = \beta_0 + \beta_1 x + \varepsilon.$
In this model $df_{\mathrm{RM}} = n - 2$. In this way, similarly to the parallel-lines case, we can test $H_0$ using the above F statistic.
Ch 8: Indicator Variables: Regression approach to analysis of variance
Consider a one-way model:
$y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \dots, k;\ j = 1, 2, \dots, n.$
In the fixed-effects case we test:
$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$
$H_1: \tau_i \neq 0$ for at least one i
Source of variation | SS | Df | MS | F
Treatment | $SS_T = n\sum_{i=1}^{k}(\bar{y}_{i.} - \bar{y}_{..})^2$ | k − 1 | $SS_T/(k-1)$ | $MS_T/MS_{\mathrm{Res}}$
Error | $SS_{\mathrm{Res}} = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$ | k(n − 1) | $SS_{\mathrm{Res}}/[k(n-1)]$ |
Total | $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$ | kn − 1 | |
Ch 8: Indicator Variables: Regression approach to analysis of variance
The equivalent regression model for the one-way model
$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \dots, k;\ j = 1, 2, \dots, n,$
is:
$y_{ij} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_{k-1}x_{k-1,j} + \varepsilon_{ij},$
where
$x_{ij} = \begin{cases}1 & \text{if observation } j \text{ is from treatment } i\\ 0 & \text{otherwise}\end{cases}$
Relationship between the two models:
$\beta_0 = \mu_k; \qquad \beta_i = \mu_i - \mu_k, \quad i = 1, \dots, k - 1.$
Exercise: Find the relationship among the sums of squares in regression and one-way ANOVA.
Ch 8: Indicator Variables: Regression approach to analysis of variance
For the case k = 3 (with n = 3 observations per treatment) we have:
$X = \begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0 \end{pmatrix}, \qquad y = (y_{11}, y_{12}, y_{13}, y_{21}, y_{22}, y_{23}, y_{31}, y_{32}, y_{33})'$
$X'X = \begin{pmatrix}9 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3\end{pmatrix}, \qquad X'y = \begin{pmatrix}9\bar{y}_{..}\\ 3\bar{y}_{1.}\\ 3\bar{y}_{2.}\end{pmatrix}$
Solving $X'X\hat{\beta} = X'y$ gives
$\hat{\beta}_0 = \bar{y}_{3.}; \qquad \hat{\beta}_1 = \bar{y}_{1.} - \bar{y}_{3.}; \qquad \hat{\beta}_2 = \bar{y}_{2.} - \bar{y}_{3.}.$
The hypotheses
$H_0: \tau_1 = \tau_2 = \tau_3 = 0$ versus $H_1: \tau_i \neq 0$ for at least one i
are equivalent to
$H_0: \beta_0 = \mu$ and $\beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 \neq 0$ or $\beta_2 \neq 0$ (or both).