Lecture #7 - 2/15/2005 Slide 1 of 36
Advanced Regression Topics: Violation of Assumptions
Lecture 7
February 15, 2005
Applied Regression Analysis
Overview
Today’s Lecture
Introductory Example
Nonconstant Variance
Nonlinearity
Optimal Transformations
Nonnormality
Wrapping Up
Today’s Lecture
■ Revisiting residuals.
◆ Outliers aside, what other things are important to look for:
■ Nonconstant variance.
■ Nonlinearity.
■ Nonnormality.
Snow Geese
From Weisberg (1985, p. 102):
"Aerial survey methods are regularly used to estimate the number of snow geese in their summer range areas west of Hudson Bay in Canada. To obtain estimates, small aircraft fly over the range and, when a flock of geese is spotted, an experienced person estimates the number of geese in the flock. To investigate the reliability of this method of counting, an experiment was conducted in which an airplane carrying two observers flew over 45 flocks, and each observer made an independent estimate of the number of birds in each flock. Also, a photograph of the flock was taken so that an exact count of the number of birds in the flock could be made (data by Cook and Jacobsen, 1978)."
Snow Geese

[Figure: map of the Hudson Bay region]
Regression Analysis
■ Using the first observer in the plane, we consider the relationship between this person's count and that from the photograph:
[Scatterplot: photo count vs. observer 1 count]
Regression Analysis
■ One way of analyzing these data is to fit a regression that attempts to predict the count in the photo from the count by the observer.
■ Using SPSS, this regression was estimated, giving the following statistics:
Coefficient Estimate SE t p-value
a - intercept 26.65 8.61 3.09 0.003
b - slope 0.88 0.08 11.37 < 0.001
Statistic Estimate
SSreg 254,769.50
SSres 84,790.18
R² 0.750
Assumptions
■ But, remember, before we can interpret these results, we must first check our assumptions.
■ Assumptions of regression analyses revolve around the residuals, e = Y − Y′ = Y − (a + bX).
■ In particular, we specified that all residuals were:
◆ Independent (or non-correlated).
◆ Identically distributed.
◆ Normally distributed, with:
■ A zero mean.
■ A constant variance.
e ∼ N(0, σ²e)
Residual Plot
■ From a previous lecture, recall that an easy way to check assumptions was to look at a plot of the standardized residuals against the unstandardized predicted values:
[Residual plot: standardized residuals vs. unstandardized predicted values]
■ Do you notice any problems from this plot?
Nonconstant Variance
■ One of the primary assumptions in a linear regression is that var(ei) = σ²e for all i = 1, . . . , N observations.
[Residual plot: standardized residuals vs. unstandardized predicted values; the spread of the residuals fans out as the predicted value grows]
■ In our example, this assumption is clearly violated.
Nonconstant Variance Detection
■ Detecting nonconstant variance is often accomplished by examining the residual plot.
■ However, using visual inspection can lead to some problems:
◆ Subjective interpretation, relying on experience.
◆ "How much is too much?" Nonconstant variance is really a matter of degree.
■ For this reason, one can construct a statistical hypothesis test for the constancy of variance.
Nonconstant Variance Detection Test
■ Note that nonconstancy in var(ei) can be caused by:
◆ The response, Y.
◆ The predictors, X.
◆ Some other quantity not involved in the regression, such as:
■ Observations over time.
■ Observations related by space (spatial orientation).
■ Any (or all) of these features can be put into a large matrix, Z, so each observation i has a row vector zi.
Nonconstant Variance Detection Test
■ Given our suspected cause of nonconstancy of variance for each observation zi, we can assume:
var(ei) = σ²e exp(λ′zi)
■ This is a very technical way of making the variance a function of other variables.
■ This form places the following constraints on our variance:
1. var(ei) > 0 for all observations zi.
2. The variance depends on zi and λ, but only through the linear function λ′zi.
3. var(ei) is monotonic (either increasing or decreasing) across each component of zi.
4. If λ = 0, then var(ei) = σ²e for all i.
Nonconstant Variance Detection Test: Steps
■ STEP 0: Determine what is causing the nonconstant variance.
◆ Let's go back to our geese data example, and assume the nonconstancy in variance was due to the predictor.
■ Humans have a more difficult time detecting the number of geese consistently as the number observed gets very large.
◆ Because we feel that the nonconstancy in variability is caused by our predictor variables, we will construct Z from X.
◆ Note that X has a column vector of ones for the intercept.
Nonconstant Variance Detection Test: Steps
1. Estimate the regression line for the original model (Y′ = a + bX), and save the unstandardized residuals for each observation: ei = Yi − (a + bXi).
2. For each observation, compute the scaled squared residuals, ui:
ui = e²i / σ̂²e,
where σ̂²e is the ML estimate of σ²e (differing from MSerror because of a denominator of N rather than N − k − 1):
σ̂²e = (∑ e²i) / N, summing over i = 1, . . . , N
Nonconstant Variance Detection Test: Steps
3. Compute the regression of ui onto zi.
■ Obtain from this regression the SSreg.
■ Obtain the dfreg, which is the number of predictors in Z (not including the intercept).
4. Compute the Score statistic (using SSreg from step 3):
S = SSreg / 2
5. Test the hypothesis that λ = 0 by obtaining a p-value for S, which is distributed χ²(dfreg), where dfreg is from step 3.
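The five steps above can be sketched outside of SPSS as well. Below is a minimal Python sketch (numpy/scipy) of the score test on synthetic data; the data-generating values are illustrative assumptions, not the geese data.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Hypothetical data whose error standard deviation grows with x.
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x)

# Step 1: fit the original regression and save the residuals.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Step 2: scaled squared residuals, using the ML variance (denominator N).
sigma2 = np.sum(e**2) / n
u = e**2 / sigma2

# Step 3: regress u on Z (here Z = X, since we suspect the predictor),
# and take the regression sum of squares from that fit.
g = np.linalg.lstsq(X, u, rcond=None)[0]
ss_reg = np.sum((X @ g - u.mean())**2)

# Steps 4-5: score statistic and chi-square p-value (df = 1 predictor in Z).
S = ss_reg / 2
p = chi2.sf(S, df=1)
print(S, p)  # strong heteroscedasticity here, so S is large and p is tiny
```

A small S (p above α) would instead be consistent with constant variance (λ = 0).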
From Our Example
1. Original regression estimates found from SPSS: Analyze...Regression...Linear.
■ Save the unstandardized residuals from the Save button menu.
2. For each observation, compute the scaled squared residuals, ui = e²i / σ̂²e:
■ Compute σ̂²e = (∑ e²i) / N.
◆ In SPSS: Transform...Compute, and make a new variable that is the squared value of the unstandardized residual.
◆ Then find the average of the new variable from Analyze...Descriptive Statistics...Descriptives.
◆ Alternative: take SSres from the step 1 output and divide by N.
◆ σ̂²e = 1,884.23.
■ Compute ui in SPSS by going to Transform...Compute.
From Our Example
3. Compute the regression of ui onto zi.
■ In SPSS: Analyze...Regression...Linear.
4. Compute the Score statistic S = SSreg / 2, using SSreg from step 3. Get this from the SPSS output.
■ S = 162.82 / 2 = 81.41
■ This has dfreg = 1.
5. Get the p-value for S.
■ In Excel, type "=chidist(81.41,1)".
■ p < 0.001.
Based on these results, we reject the null hypothesis of constant variance and conclude that our example violates the constant variance assumption.
Nonconstant Variance...Now What?
■ We found in our example that we have statistical evidence for nonconstant variance.
■ The biggest result of nonconstant variance is that our regression line does not accurately represent all cases in our sample.
■ Also a problem is that the hypothesis tests we use are based on the assumption that ei ∼ N(0, σ²e).
■ When nonconstant variance is found, two options are possible:
1. Estimate the regression using alternate methods.
◆ Weighted least squares.
◆ Median regression.
2. Transform either the response or the predictors.
Remedy #1: Alternate Estimation Algorithms
■ Much like the mean is extremely sensitive to highly skewed (outlying) observations, least squares estimates have a low "breakdown" point: a small number of extreme observations can badly distort them.
■ Instead of finding regression parameters that minimize:
∑ (Y − Y′)², summing over i = 1, . . . , N,
alternative optimization criteria exist.
Alternate Estimation Algorithms
■ Two possible alternatives:
◆ Weighted Least Squares (WLS):
∑ wi(Y − Y′)², summing over i = 1, . . . , N
■ Can be performed in SPSS.
◆ Minimum absolute deviation:
∑ |Y − Y′|
■ Much more technical.
■ The simplex optimization method involves linear programming.
Remedy #2: Variance Stabilizing Transformations
■ The second (and perhaps most commonly used) remedy for nonconstant variance is to transform the response variable Y.
(Weisberg, 1985; p. 134)

Transformation    Situation                      Reason
√Y                var(ei) ∝ E(Yi)                Poisson counts
√Y + √(Y + 1)     ''                             Poisson, small Y values
ln(Y)             var(ei) ∝ [E(Yi)]²             Broad range of Y
ln(Y + 1)         ''                             Some Y are zero
1/Y               var(ei) ∝ [E(Yi)]⁴             Y bunched near zero
1/(Y + 1)         ''                             Some Y are zero
sin⁻¹(√Y)         var(ei) ∝ E(Yi)(1 − E(Yi))     Binomial proportions
Nonlinear Relationship Between Predictor(s) and Y
■ Situations occur where a non-linear relationship between the predictors and Y is present.
■ For example, imagine that the true relationship between Y and X is something like:
Y = aX^b
■ To use the linear regression techniques we are well aware of, this function must be transformed (both Y and X):
ln(Y) = ln(a) + b ln(X)
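The linearization above can be checked with noise-free power-law data: an ordinary linear fit of ln(Y) on ln(X) recovers b as the slope and ln(a) as the intercept. A minimal sketch with hypothetical values a = 3, b = 1.7:

```python
import numpy as np

# Noise-free power-law data Y = a * X^b (hypothetical a = 3, b = 1.7).
a_true, b_true = 3.0, 1.7
x = np.linspace(1, 20, 50)
y = a_true * x**b_true

# Fit the linearized model ln(Y) = ln(a) + b ln(X).
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

print(slope, np.exp(intercept))  # recovers b = 1.7 and a = 3
```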
Nonlinear Relationship Between Predictor(s) and Y
■ Depending on the situation, not all functions can be made linear:
Y = a1·e^(b1X1) + a2·e^(b2X2)
■ Furthermore, depending on the error dependency (multiplicative or additive), transformations will not lead to errors with the distributional assumptions of linear regression.
■ Linear regression can only go so far, so if data have a functional relationship, other methods may be better suited.
Detection of Nonlinearity
■ Often, nonlinearity is detected visually, through use of the residual plots.
[Residual plot: standardized residuals vs. unstandardized predicted values, showing a curved (nonlinear) pattern]
Possible Remedy for Nonlinearity
■ As discussed, a possible remedy for nonlinearity is to use a transformation of both Y and X.
◆ Situations occur where a non-linear relationship between the predictors and Y is present.
(Weisberg, 1985; p. 142)

Y Transformation    X Transformation    Form
ln(Y)               ln(X)               Y = a·X1^b1·X2^b2···Xk^bk
ln(Y)               X                   Y = a·e^(b1X1 + b2X2 + ... + bkXk)
Y                   ln(X)               Y = a + b1 ln(X1) + b2 ln(X2) + ... + bk ln(Xk)
1/Y                 1/X                 Y = 1 / (a + b1/X1 + b2/X2 + ... + bk/Xk)
1/Y                 X                   Y = 1 / (a + b1X1 + b2X2 + ... + bkXk)
Y                   1/X                 Y = a + b1/X1 + b2/X2 + ... + bk/Xk
Parameterizing Transformations
■ Instead of choosing some type of transformation function seemingly arbitrarily, statistical techniques have been developed to transform both Y and the set of all predictors X based on known functions.
■ These techniques bear mention because from time to time you will encounter estimates based on them.
■ Furthermore, a clear functional relationship between Y and X may not be known, either from substantive theory or empirical results.
■ In these situations, linear methods are often easier to rely upon because of parsimony (real or perceived).
Transformation of Y : Optimization
■ Consider the family of regression models (power models):
Y^λ = Xb + e.
■ Finding λ gives an idea of the power relationship between the response and the predictor variables.
■ Such transformations are often referred to as Box-Cox transformations, and involve iterative techniques to find the most likely value of λ.
■ Another transformation method is called Atkinson's score method.
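A maximum-likelihood Box-Cox fit is available in scipy. The sketch below is a simplification of the regression version: it chooses λ to make the response alone look normal, rather than jointly with the model. Lognormal data are used because their "right" power transformation is the log, i.e. λ near 0:

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(3)

# Lognormal data: the appropriate power transformation is the log (lambda ~ 0).
y = rng.lognormal(mean=1.0, sigma=0.5, size=1000)

# With lmbda=None, boxcox returns the transformed data and the ML lambda.
y_transformed, lam = boxcox(y)
print(lam)  # close to 0
```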
Transformation of X: Optimization
■ Similar methods have been developed for transforming X.
■ Transforming X is inherently more difficult.
■ Oftentimes absurd results can be the outcome.
■ Many times non-significant relationships obscure any needed transformations.
Nonnormality
■ The last assumption to be checked is the normality of the residuals.
■ Detecting nonnormality can be tricky, and often depends on sample size.
■ Statistically speaking, there is no hypothesis test that can conclude that a variable is normally distributed; tests can only reject normality.
■ Violations of this assumption lead to inaccurate p-values in hypothesis tests (only).
■ The p-values, however, are fairly robust to violations of this assumption.
Detection of Nonnormality: Probability Plots
■ The easiest way to detect nonnormal errors is to use a Q-Q plot.
■ A Q-Q plot is a plot of an ordered variable against its expected values for a variable from a normal distribution, for a sample of size N.
■ In SPSS: Graphs...Q-Q (check Standardize values and be sure Test distribution is Normal).
■ The plot of the data should fall on the line produced on the plot.
■ If not, the data are not from a normal distribution.
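Outside SPSS, the same check is available as scipy's probplot, which also reports the correlation between the ordered data and the normal quantiles; a correlation near 1 means the points fall on the line. A minimal sketch comparing normal and skewed (illustrative) residuals:

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(4)

normal_resid = rng.normal(0, 1, 200)
skewed_resid = rng.exponential(1, 200)

# probplot returns the ordered data vs. normal quantiles plus a line fit;
# r is the correlation between the two -- near 1 when the data are normal.
(_, (slope_n, intercept_n, r_norm)) = probplot(normal_resid)
(_, (slope_s, intercept_s, r_skew)) = probplot(skewed_resid)

print(r_norm, r_skew)  # r_norm is closer to 1 than r_skew
```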
Detection of Nonnormality: Probability Plots

[Figure: Q-Q plot of the residuals]
Detection of Nonnormality: Hypothesis Tests
■ Additionally, statistical hypothesis tests have been developed to test the null hypothesis that the data (in this case the residuals) came from a normal distribution.
◆ Shapiro-Wilk test.
◆ Kolmogorov-Smirnov test.
■ In SPSS: get the unstandardized residuals and go to Analyze...Descriptive Statistics...Explore.
◆ Put the residuals in the Dependent List box.
◆ Click on Plots and check "Normality plots with tests".
■ If the p-value is less than some α, then reject the null hypothesis: the residuals are not from a normal distribution.
■ These tests have little validity in small samples.
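Both tests are also available in scipy. The sketch below runs them on clearly nonnormal (exponential, illustrative) residuals, where both should reject; note that standardizing before the Kolmogorov-Smirnov test is a shortcut (strictly, estimating the mean and variance from the data calls for the Lilliefors correction):

```python
import numpy as np
from scipy.stats import shapiro, kstest

rng = np.random.default_rng(5)

# Clearly nonnormal residuals (exponential): both tests should reject.
resid = rng.exponential(1.0, 200)

w_stat, p_sw = shapiro(resid)
z = (resid - resid.mean()) / resid.std()
ks_stat, p_ks = kstest(z, 'norm')

print(p_sw, p_ks)  # both far below alpha = 0.05
```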
Test for Correlated Errors
■ Finally, there is a hypothesis test for seriation effects in the residuals.
■ The Durbin-Watson statistic tests for correlation between adjacent observations.
■ Really, this test is only valid if observations were made at equal time intervals.
■ In SPSS: Analyze...Regression...Linear.
◆ Click on the Statistics box.
◆ Under Residuals, check Durbin-Watson.
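The statistic itself is simple to compute by hand: d = ∑(et − et−1)² / ∑e²t, which is near 2 for uncorrelated residuals and well below 2 for positively autocorrelated ones. A minimal sketch on simulated residuals (illustrative AR(1) coefficient):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500

# Independent residuals: d should be near 2.
e_indep = rng.normal(0, 1, n)

# Positively autocorrelated residuals (AR(1), phi = 0.9): d well below 2.
e_ar = np.empty(n)
e_ar[0] = rng.normal()
for t in range(1, n):
    e_ar[t] = 0.9 * e_ar[t - 1] + rng.normal()

def durbin_watson(e):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; near 2 means no autocorrelation."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

d_indep = durbin_watson(e_indep)
d_ar = durbin_watson(e_ar)
print(d_indep, d_ar)
```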
Here it is: Your Moment of Zen
■ Regression diagnostics are important parts of an analysis that must not be overlooked.
■ Oftentimes, inferences can be wrong if assumptions of the regression have not been met.
■ Transformations can take forever, and may not get you closer to a good result with respect to assumptions.
■ Listen to your data; they are trying to tell you something.
Next Time
■ Bringing it all together: Case studies in regression.