Lecture #7 - 2/15/2005 Slide 1 of 36
Advanced Regression Topics: Violation of Assumptions
Lecture 7
February 15, 2005
Applied Regression Analysis
Overview
Today’s Lecture
Introductory Example
Nonconstant Variance
Nonlinearity
Optimal Transformations
Nonnormality
Wrapping Up
Today’s Lecture
■ Revisiting residuals.
◆ Outliers aside, what other things are important to look for:
■ Nonconstant variance.
■ Nonlinearity.
■ Nonnormality.
Snow Geese
From Weisberg (1985, p. 102):
"Aerial survey methods are regularly used to estimate the number of snow geese in their summer range areas west of Hudson Bay in Canada. To obtain estimates, small aircraft fly over the range and, when a flock of geese is spotted, an experienced person estimates the number of geese in the flock. To investigate the reliability of this method of counting, an experiment was conducted in which an airplane carrying two observers flew over 45 flocks, and each observer made an independent estimate of the number of birds in each flock. Also, a photograph of the flock was taken so that an exact count of the number of birds in the flock could be made (data by Cook and Jacobsen, 1978)."
Snow Geese

[Figure: map of the Hudson Bay region]
Regression Analysis
■ Using the first observer in the plane, we consider the relationship between this person's count and that from the photograph:
[Scatterplot: photo count vs. observer 1 count]
Regression Analysis
■ One way of analyzing these data is to fit a regression that attempts to predict the count in the photo from the count by the observer.
■ Using SPSS, this regression was estimated, giving the following statistics:
Coefficient Estimate SE t p-value
a - intercept 26.65 8.61 3.09 0.003
b - slope 0.88 0.08 11.37 < 0.001
Statistic Estimate
SSreg 254,769.50
SSres 84,790.18
R² 0.750
Assumptions
■ But, remember, before we can interpret these results, we must first check our assumptions.
■ Assumptions of regression analyses revolve around the residuals, e = Y − Y′ = Y − (a + bX).
■ In particular, we specified that all residuals were:
◆ Independent (or non-correlated).
◆ Identically distributed.
◆ Normally distributed, with:
■ A zero mean.
■ A constant variance.
e ∼ N(0, σ²e)
Residual Plot
■ From a previous lecture, recall that an easy way to check assumptions was to look at a plot of the standardized residuals against the unstandardized predicted values:
[Residual plot: standardized residuals vs. unstandardized predicted values]
■ Do you notice any problems from this plot?
Nonconstant Variance
■ One of the primary assumptions in a linear regression is that var(ei) = σ²e for all i = 1, . . . , N observations.
[Residual plot: standardized residuals vs. unstandardized predicted values; the spread of the residuals fans out as the predicted value grows]
■ In our example, this assumption is clearly violated.
Nonconstant Variance Detection
■ Detecting nonconstant variance is often accomplished by examining the residual plot.
■ However, using visual inspection can lead to some problems:
◆ Subjective interpretation, relying on experience.
◆ "How much is too much?" Nonconstant variance is really a matter of degree.
■ For this reason, one can construct a statistical hypothesis test for the constancy of variance.
Nonconstant Variance Detection Test
■ Note that nonconstancy in var(ei) can be caused by:
◆ The response, Y.
◆ The predictors, X.
◆ Some other quantity not involved in the regression, such as:
■ Observations over time.
■ Observations related by space (spatial orientation).
■ Any (or all) of these features can be put into a large matrix, Z, so each observation i has a row vector zi.
Nonconstant Variance Detection Test
■ Given our suspected cause of nonconstancy of variance for each observation zi, we can assume:
var(ei) = σ²e exp(λ′zi)
■ This is a very technical way of making the variance a function of other variables.
■ This form places the following constraints on our variance:
1. var(ei) > 0 for all observations zi.
2. The variance depends on zi and λ, but only through the linear function λ′zi.
3. var(ei) is monotonic (either increasing or decreasing) across each component of zi.
4. If λ = 0, then var(ei) = σ²e for all i.
Nonconstant Variance Detection Test: Steps
■ STEP 0: Determine what is causing the nonconstant variance.
◆ Let's go back to our geese data example, and assume the nonconstancy in variance was due to the predictor.
■ Humans have a more difficult time detecting the number of geese consistently as the number observed gets very large.
◆ Because we feel that the nonconstancy in variability is caused by our predictor variables, we will construct Z from X.
◆ Note that X has a column vector of ones for the intercept.
Nonconstant Variance Detection Test: Steps
1. Estimate the regression line for the original model (Y′ = a + bX), and save the unstandardized residuals for each observation: ei = Yi − (a + bXi).
2. For each observation, compute the scaled squared residuals, ui:
ui = e²i / σ̂²e,
where σ̂²e is the ML estimate of σ²e (differing from MSerror because of a denominator of N rather than N − k − 1):
σ̂²e = (∑ e²i) / N, summing over i = 1, . . . , N
Nonconstant Variance Detection Test: Steps
3. Compute the regression of ui onto zi.
■ Obtain from this regression the SSreg.
■ Obtain the dfreg, which is the number of predictors in Z (not including the intercept).
4. Compute the Score statistic (using SSreg from step 3):
S = SSreg / 2
5. Test the hypothesis that λ = 0 by obtaining a p-value for S, which is distributed χ²(dfreg), where dfreg is from step 3.
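The five steps above can be sketched outside of SPSS as well. Below is a minimal Python sketch (numpy/scipy) of the score test on synthetic data; the data-generating values are illustrative assumptions, not the geese data.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Hypothetical data whose error standard deviation grows with x.
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x)

# Step 1: fit the original regression and save the residuals.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Step 2: scaled squared residuals, using the ML variance (denominator N).
sigma2 = np.sum(e**2) / n
u = e**2 / sigma2

# Step 3: regress u on Z (here Z = X, since we suspect the predictor),
# and take the regression sum of squares from that fit.
g = np.linalg.lstsq(X, u, rcond=None)[0]
ss_reg = np.sum((X @ g - u.mean())**2)

# Steps 4-5: score statistic and chi-square p-value (df = 1 predictor in Z).
S = ss_reg / 2
p = chi2.sf(S, df=1)
print(S, p)  # strong heteroscedasticity here, so S is large and p is tiny
```

A small S (p above α) would instead be consistent with constant variance (λ = 0).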
From Our Example
1. Original regression estimates found from SPSS: Analyze...Regression...Linear.
■ Save the unstandardized residuals from the Save button menu.
2. For each observation, compute the scaled squared residuals, ui = e²i / σ̂²e:
■ Compute σ̂²e = (∑ e²i) / N.
◆ In SPSS: Transform...Compute, and make a new variable that is the squared value of the unstandardized residual.
◆ Then find the average of the new variable from Analyze...Descriptive Statistics...Descriptives.
◆ Alternative: take SSres from the step 1 output and divide by N.
◆ σ̂²e = 1,884.23.
■ Compute ui in SPSS by going to Transform...Compute.
From Our Example
3. Compute the regression of ui onto zi.
■ In SPSS: Analyze...Regression...Linear.
4. Compute the Score statistic S = SSreg / 2, using SSreg from step 3. Get this from the SPSS output.
■ S = 162.82 / 2 = 81.41
■ This has dfreg = 1.
5. Get the p-value for S.
■ In Excel, type "=chidist(81.41,1)".
■ p < 0.001.
Based on these results, we reject the null hypothesis of constant variance and conclude that our example violates the constant variance assumption.
Nonconstant Variance...Now What?
■ We found in our example that we have statistical evidence for nonconstant variance.
■ The biggest result of nonconstant variance is that our regression line does not accurately represent all cases in our sample.
■ Also a problem is that the hypothesis tests we use are based on the assumption that ei ∼ N(0, σ²e).
■ When nonconstant variance is found, two options are possible:
1. Estimate the regression using alternate methods.
◆ Weighted least squares.
◆ Median regression.
2. Transform either the response or the predictors.
Remedy #1: Alternate Estimation Algorithms
■ Much like the mean is extremely sensitive to highly skewed (outlying) observations, least squares estimates have a low "breakdown" point: a small number of extreme observations can badly distort them.
■ Instead of finding regression parameters that minimize:
∑ (Y − Y′)², summing over i = 1, . . . , N,
alternative optimization criteria exist.
Alternate Estimation Algorithms
■ Two possible alternatives:
◆ Weighted Least Squares (WLS):
∑ wi(Y − Y′)², summing over i = 1, . . . , N
■ Can be performed in SPSS.
◆ Minimum absolute deviation:
∑ |Y − Y′|
■ Much more technical.
■ The simplex optimization method involves linear programming.
Remedy #2: Variance Stabilizing Transformations
■ The second (and perhaps most commonly used) remedy for nonconstant variance is to transform the response variable Y.
(Weisberg, 1985; p. 134)

Transformation    Situation                      Reason
√Y                var(ei) ∝ E(Yi)                Poisson counts
√Y + √(Y + 1)     ''                             Poisson, small Y values
ln(Y)             var(ei) ∝ [E(Yi)]²             Broad range of Y
ln(Y + 1)         ''                             Some Y are zero
1/Y               var(ei) ∝ [E(Yi)]⁴             Y bunched near zero
1/(Y + 1)         ''                             Some Y are zero
sin⁻¹(√Y)         var(ei) ∝ E(Yi)(1 − E(Yi))     Binomial proportions
Nonlinear Relationship Between Predictor(s) and Y
■ Situations occur where a non-linear relationship between the predictors and Y is present.
■ For example, imagine that the true relationship between Y and X is something like:
Y = aX^b
■ To use the linear regression techniques we are well aware of, this function must be transformed (both Y and X):
ln(Y) = ln(a) + b ln(X)
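The linearization above can be checked with noise-free power-law data: an ordinary linear fit of ln(Y) on ln(X) recovers b as the slope and ln(a) as the intercept. A minimal sketch with hypothetical values a = 3, b = 1.7:

```python
import numpy as np

# Noise-free power-law data Y = a * X^b (hypothetical a = 3, b = 1.7).
a_true, b_true = 3.0, 1.7
x = np.linspace(1, 20, 50)
y = a_true * x**b_true

# Fit the linearized model ln(Y) = ln(a) + b ln(X).
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

print(slope, np.exp(intercept))  # recovers b = 1.7 and a = 3
```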
Nonlinear Relationship Between Predictor(s) and Y
■ Depending on the situation, not all functions can be made linear:
Y = a1·e^(b1X1) + a2·e^(b2X2)
■ Furthermore, depending on the error dependency (multiplicative or additive), transformations will not lead to errors with the distributional assumptions of linear regression.
■ Linear regression can only go so far, so if data have a functional relationship, other methods may be better suited.
Detection of Nonlinearity
■ Often, nonlinearity is detected visually, through use of the residual plots.
[Residual plot: standardized residuals vs. unstandardized predicted values, showing a curved (nonlinear) pattern]
Possible Remedy for Nonlinearity
■ As discussed, a possible remedy for nonlinearity is to use a transformation of both Y and X.
◆ Situations occur where a non-linear relationship between the predictors and Y is present.
(Weisberg, 1985; p. 142)

Y Transformation    X Transformation    Form
ln(Y)               ln(X)               Y = a·X1^b1·X2^b2···Xk^bk
ln(Y)               X                   Y = a·e^(b1X1 + b2X2 + ... + bkXk)
Y                   ln(X)               Y = a + b1 ln(X1) + b2 ln(X2) + ... + bk ln(Xk)
1/Y                 1/X                 Y = 1 / (a + b1/X1 + b2/X2 + ... + bk/Xk)
1/Y                 X                   Y = 1 / (a + b1X1 + b2X2 + ... + bkXk)
Y                   1/X                 Y = a + b1/X1 + b2/X2 + ... + bk/Xk
Parameterizing Transformations
■ Instead of choosing some type of transformation function seemingly arbitrarily, statistical techniques have been developed to transform both Y and the set of all predictors X based on known functions.
■ These techniques bear mention because from time to time you will encounter estimates based on them.
■ Furthermore, a clear functional relationship between Y and X may not be known, either from substantive theory or empirical results.
■ In these situations, linear methods are often easier to rely upon because of parsimony (real or perceived).
Transformation of Y : Optimization
■ Consider the family of regression models (power models):
Y^λ = Xb + e.
■ Finding λ gives an idea of the power relationship between the response and the predictor variables.
■ Such transformations are often referred to as Box-Cox transformations, and involve iterative techniques to find the most likely value of λ.
■ Another transformation method is called Atkinson's score method.
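A maximum-likelihood Box-Cox fit is available in scipy. The sketch below is a simplification of the regression version: it chooses λ to make the response alone look normal, rather than jointly with the model. Lognormal data are used because their "right" power transformation is the log, i.e. λ near 0:

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(3)

# Lognormal data: the appropriate power transformation is the log (lambda ~ 0).
y = rng.lognormal(mean=1.0, sigma=0.5, size=1000)

# With lmbda=None, boxcox returns the transformed data and the ML lambda.
y_transformed, lam = boxcox(y)
print(lam)  # close to 0
```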
Transformation of X: Optimization
■ Similar methods have been developed for transforming X.
■ Transforming X is inherently more difficult.
■ Oftentimes absurd results can be the outcome.
■ Many times non-significant relationships obscure any needed transformations.
Nonnormality
■ The last assumption to be checked is the normality of the residuals.
■ Detecting nonnormality can be tricky, and often depends on sample size.
■ Statistically speaking, there is no hypothesis test that can conclude that a variable is normally distributed; tests can only reject normality.
■ Violations of this assumption lead to inaccurate p-values in hypothesis tests (only).
■ The p-values, however, are fairly robust to violations of this assumption.
Detection of Nonnormality: Probability Plots
■ The easiest way to detect nonnormal errors is to use a Q-Q plot.
■ A Q-Q plot is a plot of an ordered variable against its expected values for a variable from a normal distribution, for a sample of size N.
■ In SPSS: Graphs...Q-Q (check Standardize values and be sure Test distribution is Normal).
■ The plot of the data should fall on the line produced on the plot.
■ If not, the data are not from a normal distribution.
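Outside SPSS, the same check is available as scipy's probplot, which also reports the correlation between the ordered data and the normal quantiles; a correlation near 1 means the points fall on the line. A minimal sketch comparing normal and skewed (illustrative) residuals:

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(4)

normal_resid = rng.normal(0, 1, 200)
skewed_resid = rng.exponential(1, 200)

# probplot returns the ordered data vs. normal quantiles plus a line fit;
# r is the correlation between the two -- near 1 when the data are normal.
(_, (slope_n, intercept_n, r_norm)) = probplot(normal_resid)
(_, (slope_s, intercept_s, r_skew)) = probplot(skewed_resid)

print(r_norm, r_skew)  # r_norm is closer to 1 than r_skew
```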
Detection of Nonnormality: Probability Plots

[Figure: Q-Q plot of the residuals]
Detection of Nonnormality: Hypothesis Tests
■ Additionally, statistical hypothesis tests have been developed to test the null hypothesis that the data (in this case the residuals) came from a normal distribution.
◆ Shapiro-Wilk test.
◆ Kolmogorov-Smirnov test.
■ In SPSS: get the unstandardized residuals and go to Analyze...Descriptive Statistics...Explore.
◆ Put the residuals in the Dependent List box.
◆ Click on Plots and check "Normality plots with tests".
■ If the p-value is less than some α, then reject the null hypothesis: the residuals are not from a normal distribution.
■ These tests have little validity in small samples.
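Both tests are also available in scipy. The sketch below runs them on clearly nonnormal (exponential, illustrative) residuals, where both should reject; note that standardizing before the Kolmogorov-Smirnov test is a shortcut (strictly, estimating the mean and variance from the data calls for the Lilliefors correction):

```python
import numpy as np
from scipy.stats import shapiro, kstest

rng = np.random.default_rng(5)

# Clearly nonnormal residuals (exponential): both tests should reject.
resid = rng.exponential(1.0, 200)

w_stat, p_sw = shapiro(resid)
z = (resid - resid.mean()) / resid.std()
ks_stat, p_ks = kstest(z, 'norm')

print(p_sw, p_ks)  # both far below alpha = 0.05
```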
Test for Correlated Errors
■ Finally, there is a hypothesis test for seriation effects in the residuals.
■ The Durbin-Watson statistic tests for correlation between adjacent observations.
■ Really, this test is only valid if observations were made at equal time intervals.
■ In SPSS: Analyze...Regression...Linear.
◆ Click on the Statistics box.
◆ Under Residuals, check Durbin-Watson.
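The statistic itself is simple to compute by hand: d = ∑(et − et−1)² / ∑e²t, which is near 2 for uncorrelated residuals and well below 2 for positively autocorrelated ones. A minimal sketch on simulated residuals (illustrative AR(1) coefficient):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500

# Independent residuals: d should be near 2.
e_indep = rng.normal(0, 1, n)

# Positively autocorrelated residuals (AR(1), phi = 0.9): d well below 2.
e_ar = np.empty(n)
e_ar[0] = rng.normal()
for t in range(1, n):
    e_ar[t] = 0.9 * e_ar[t - 1] + rng.normal()

def durbin_watson(e):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; near 2 means no autocorrelation."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

d_indep = durbin_watson(e_indep)
d_ar = durbin_watson(e_ar)
print(d_indep, d_ar)
```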
Here it is: Your Moment of Zen
■ Regression diagnostics are important parts of an analysis that must not be overlooked.
■ Oftentimes, inferences can be wrong if assumptions of the regression have not been met.
■ Transformations can take forever, and may not get you closer to a good result with respect to assumptions.
■ Listen to your data; they are trying to tell you something.
Next Time
■ Bringing it all together: Case studies in regression.