
Checking Model Adequacy

Can we trust our analytic results?

Bruce A Craig

Department of Statistics
Purdue University

STAT 514 Topic 5 1

Model Checking Diagnostics

Theoretical justification of standard analysis assumes

1. Errors independent
2. Errors normally distributed
3. Errors have constant variance

Need to check if these conditions reasonably met

$$y_{ij} = \left(\bar{y}_{..} + (\bar{y}_{i.} - \bar{y}_{..})\right) + (y_{ij} - \bar{y}_{i.})$$

$$y_{ij} = \hat{y}_{ij} + \hat{\epsilon}_{ij}$$

observed = predicted + residual

Diagnostics primarily use predicted and residual values

Residuals are our “estimates” of unobservable errors
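The decomposition above can be sketched in a few lines of Python. This is only an illustration under assumptions: the data values and the helper name `fit_one_way` are hypothetical and are not part of the course's SAS code. In the one-way model the predicted value for every unit is its group mean, and the residual is the observed value minus that mean.

```python
def fit_one_way(groups):
    """Return (predicted, residuals), one inner list per treatment group.

    In the one-way ANOVA model the fitted value for y_ij is the group mean,
    so the residual is y_ij minus the mean of group i.
    """
    predicted, residuals = [], []
    for ys in groups:
        mean_i = sum(ys) / len(ys)
        predicted.append([mean_i] * len(ys))
        residuals.append([y - mean_i for y in ys])
    return predicted, residuals


# Hypothetical responses for three treatment groups (not from the notes)
groups = [[7.0, 9.0, 8.0], [12.0, 15.0, 12.0], [20.0, 18.0, 19.0]]
pred, res = fit_one_way(groups)

for ys, ps, rs in zip(groups, pred, res):
    for y, p, r in zip(ys, ps, rs):
        print(f"observed={y:5.1f}  predicted={p:5.1f}  residual={r:5.1f}")
```

Within each group the residuals sum to zero, which is why they carry only N - a degrees of freedom.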


Common Diagnostics

Normally distributed errors
  Histogram of residuals
  Normal probability plot / QQ plot of residuals
  Shapiro-Wilk / Kolmogorov-Smirnov test of residuals

Errors have constant variance
  Plot residuals $\hat{\epsilon}_{ij}$ vs predicted values $\hat{y}_{ij}$ (residual plot)
  Modified Levene’s test

Errors are independent
  Plot residuals $\hat{\epsilon}_{ij}$ vs time/space variable
  Durbin-Watson test
  Plot residuals $\hat{\epsilon}_{ij}$ vs other variable of interest


Normally distributed errors

Histogram of residuals

Is histogram approximately bell-shaped?

QQ Plot of residuals

Does the relationship look approximately linear?

Created by plotting:
  Ordered residuals $\hat{\epsilon}_{(t)}$ vs associated quantile values $z_t$,
  where $z_t$ are such that $P(Z \le z_t) = P_t$
  with $P_t = (t - 0.5)/N$ for $t = 1, 2, \ldots, N$
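The QQ-plot coordinates described above can be computed directly. The sketch below is a minimal Python illustration, not the PROC UNIVARIATE implementation; the function name `qq_points` and the residual values are hypothetical. It uses the standard library's `NormalDist.inv_cdf` to turn each $P_t$ into the standard normal quantile $z_t$.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each ordered residual with the normal quantile z_t, P_t = (t - 0.5)/N."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((t - 0.5) / n) for t in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical residuals (not from the notes)
points = qq_points([0.4, -1.8, 2.1, -0.3, 0.9])
for z, e in points:
    print(f"z = {z:7.3f}   ordered residual = {e:5.1f}")
```

If the errors are roughly Normal, these points should fall close to a straight line.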


Constant Variance

Often the variance of a response is linked to its mean

Can investigate visually through residual plot

Plot $\hat{\epsilon}_{ij}$ vs $\hat{y}_{ij}$
  Is the range of $\hat{\epsilon}_{ij}$ constant for different levels of $\hat{y}_{ij}$?

Modified Levene’s (Brown-Forsythe) test

Compute $|y_{ij} - m_i|$ where $m_i$ is the median of group $i$

Compare average “deviations” using F-test
  Absolute value a more robust measure of deviation
  Median a more robust measure of center
  Other tests more sensitive to Normal assumption
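The Brown-Forsythe calculation can be sketched as a one-way ANOVA on the absolute deviations from the group medians. The Python below is an illustration, not the `hovtest=bf` implementation in SAS, and the function name is hypothetical. The data values are an assumption: they are taken to be the tensile-strength data from Montgomery's text, which these notes analyze but do not list.

```python
from statistics import median


def brown_forsythe_f(groups):
    """One-way ANOVA F statistic on absolute deviations from group medians."""
    devs = [[abs(y - median(g)) for y in g] for g in groups]
    n_total = sum(len(d) for d in devs)
    n_groups = len(devs)
    grand_mean = sum(sum(d) for d in devs) / n_total
    group_means = [sum(d) / len(d) for d in devs]
    ss_between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    ss_within = sum((v - m) ** 2 for d, m in zip(devs, group_means) for v in d)
    df_between, df_within = n_groups - 1, n_total - n_groups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between, ss_within


# ASSUMED data: tensile strength by cotton percentage (15, 20, 25, 30, 35)
tensile = [
    [7, 7, 15, 11, 9],
    [12, 17, 12, 18, 18],
    [14, 18, 18, 19, 19],
    [19, 25, 22, 19, 23],
    [7, 10, 11, 15, 11],
]
f_stat, df1, df2, ss_b, ss_w = brown_forsythe_f(tensile)
print(f"SS(percent) = {ss_b:.4f}, SS(error) = {ss_w:.4f}, F = {f_stat:.2f} on ({df1}, {df2}) df")
```

A small F statistic gives no evidence that the spread of the deviations differs across groups.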


Independence

A well-planned experiment often goes a long way towards satisfying this assumption. Randomization protects against unknown factors

Plot of the residuals over time (or space)

Is there a drift or pattern over time/space?

Plot residuals versus other relevant variables

  Often variables omitted from analysis
  Experimental conditions (e.g., temp)
  May result in inclusion of new factor in next experiment


Outliers/Unusual Observations

May observe “unusual” responses in diagnostic plots

  Results in large $|\hat{\epsilon}_{ij}|$
  Do not simply remove. Investigate!

Helpful to have detailed lab/experimental notes
  Don’t look only for excuses or typos

How important are they? Do conclusions change with and without them in the analysis?

Formal tests (e.g. standardized residuals) can be used but only confirm whether they are “unusual.”
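A standardized residual divides each residual by the root mean square error. The Python sketch below makes assumptions throughout: the residual values and function names are hypothetical, and the cutoff of 3 is a common rule of thumb rather than something stated in these notes. The MSE of 8.06 is the value reported in the GLM output for the tensile example later in these notes.

```python
import math


def standardized_residuals(residuals, mse):
    """Divide each residual by sqrt(MSE)."""
    root_mse = math.sqrt(mse)
    return [e / root_mse for e in residuals]


def flag_unusual(residuals, mse, cutoff=3.0):
    """Indices whose standardized residual exceeds the cutoff in absolute value.

    cutoff=3.0 is an assumed rule of thumb, not a value from the notes.
    """
    std = standardized_residuals(residuals, mse)
    return [i for i, d in enumerate(std) if abs(d) > cutoff]


residuals = [5.2, -2.8, 0.4, -9.1]  # hypothetical residuals
mse = 8.06                          # MSE from the tensile-strength ANOVA table
print(standardized_residuals(residuals, mse))
print(flag_unusual(residuals, mse))
```

A flagged point is only confirmed as unusual; whether to keep it still depends on investigating the experimental record.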


Diagnostics Example

data one;
  infile 'c:\saswork\data\tensile.dat';
  input percent strength time;
run;

* One-way ANOVA; save predicted values and residuals for diagnostics;
proc glm data=one;
  class percent;
  model strength=percent;
  means percent / hovtest=bf;   ***Modified Levene (Brown-Forsythe) test;
  output out=diag p=pred r=res;
run;

* Residuals vs predicted values;
proc sort data=diag; by pred; run;
symbol1 v=circle i=sm50;
title1 'Residual Plot';
proc gplot data=diag;
  plot res*pred / frame;
run;

* Histogram and normal QQ plot of the residuals;
proc univariate data=diag noprint;
  var res;
  qqplot res / normal (L=1 mu=est sigma=est);
  histogram res / normal;
run;

* Residuals vs time order, drawn with two amounts of smoothing;
proc sort data=diag; by time; run;
symbol1 v=circle i=sm75;
proc gplot data=diag;
  plot res*time / vref=0 vaxis=-6 to 6 by 1;
run;
symbol1 v=circle i=sm50;
proc gplot data=diag;
  plot res*time / vref=0 vaxis=-6 to 6 by 1;
run;


The GLM Procedure

Dependent Variable: strength

                            Sum of
Source             DF      Squares    Mean Square   F Value   Pr > F
Model               4  475.7600000    118.9400000     14.76   <.0001
Error              20  161.2000000      8.0600000
Corrected Total    24  636.9600000

R-Square   Coeff Var   Root MSE   strength Mean
0.746923    18.87642   2.839014        15.04000

Source     DF    Type I SS    Mean Square   F Value   Pr > F
percent     4  475.7600000    118.9400000     14.76   <.0001

Source     DF  Type III SS    Mean Square   F Value   Pr > F
percent     4  475.7600000    118.9400000     14.76   <.0001

Brown and Forsythe’s Test for Homogeneity of strength Variance
ANOVA of Absolute Deviations from Group Medians

                  Sum of      Mean
Source     DF    Squares    Square   F Value   Pr > F
percent     4     4.9600    1.2400      0.32   0.8626
Error      20    78.0000    3.9000


Non-constant Variance

Often response variable’s mean and variance linked

F test robust to violations (especially if balanced)

Why the concern?
  Comparison of treatments depends on MSE
  Incorrect confidence interval lengths
  More Type I and Type II errors

Variance-Stabilizing Transformations
  Common transformations: $\sqrt{y}$, $\log(y)$, $1/y$, $\arcsin(\sqrt{y})$, and $1/\sqrt{y}$

Box-Cox transformations
  uses maximum likelihood procedures
  can approximate using relationship $\sigma_i = \theta \mu_i^{\beta}$
  transformation is $y^{1-\beta}$

Distribution often more “Normal” after transformation


Transformations

Consider an RV $X$ with $E(X) = \mu_x$ and $V(X) = \sigma_x^2$

Define $Y = f(X)$; what are the mean and variance of $Y$?

Delta Method Approximation

First-order Taylor series expansion about $\mu_x$

Assume $f(X)$ such that $f'(\mu_x) \neq 0$

Then $f(X) \approx f(\mu_x) + (X - \mu_x) f'(\mu_x)$

$E(Y) = E(f(X)) \approx E(f(\mu_x)) + E((X - \mu_x) f'(\mu_x)) = f(\mu_x)$

$V(Y) = V(f(X)) \approx [f'(\mu_x)]^2 V(X) = [f'(\mu_x)]^2 \sigma_x^2$
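As a worked instance of the approximation, take $f(X) = \log X$ with $\mu_x > 0$. The choice of $f$ here is only an illustration; it uses the same symbols as the formulas above.

```latex
f(X) = \log X, \qquad f'(\mu_x) = \frac{1}{\mu_x} \neq 0
\\[4pt]
E(\log X) \approx \log \mu_x
\\[4pt]
V(\log X) \approx \left[\frac{1}{\mu_x}\right]^2 \sigma_x^2 = \frac{\sigma_x^2}{\mu_x^2}
```

The approximate variance of $\log X$ is the squared coefficient of variation of $X$, so a response whose standard deviation is proportional to its mean has roughly constant variance on the log scale.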


Variance-Stabilizing Transformation

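Suppose response $y$ is such that $\sigma_y^2 = g(\mu_y)$

Want to find $f(y)$ such that $V(f(y)) \approx c$

Using Delta method: $V(f(y)) \approx [f'(\mu_y)]^2 \sigma_y^2$

Therefore want to choose $f$ such that $[f'(\mu_y)]^2 g(\mu_y) \approx c$, that is, $f'(\mu) \propto 1/\sqrt{g(\mu)}$

Examples (up to multiplicative constants)

$g(\mu) = \mu$ (Poisson): $f(y) = \int \frac{1}{\sqrt{\mu}}\, d\mu \rightarrow f(y) = \sqrt{y}$

$g(\mu) = \mu(1 - \mu)$ (Binomial): $f(y) = \int \frac{1}{\sqrt{\mu(1 - \mu)}}\, d\mu \rightarrow f(y) = \arcsin(\sqrt{y})$

$g(\mu) = \mu^{2\beta}$ (Box-Cox): $f(y) = \int \mu^{-\beta}\, d\mu \rightarrow f(y) = y^{1-\beta}$

$g(\mu) = \mu^2$ (Box-Cox): $f(y) = \int \frac{1}{\mu}\, d\mu \rightarrow f(y) = \log y$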
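The Poisson case can be checked numerically. For the square root, the delta method gives $V(\sqrt{y}) \approx [1/(2\sqrt{\mu})]^2 \mu = 1/4$ regardless of $\mu$. The Python sketch below computes the variance of $\sqrt{Y}$ by summing over the Poisson probability mass function. The function name and the truncation point of the sum are assumptions made for illustration.

```python
import math


def var_of_sqrt_poisson(mu):
    """Variance of sqrt(Y) for Y ~ Poisson(mu), by summing over the pmf.

    The sum is truncated far in the upper tail (an assumed cutoff of
    mu + 20 standard deviations + 50), where the remaining mass is negligible.
    """
    upper = int(mu + 20 * math.sqrt(mu) + 50)
    mean_sqrt = 0.0
    for k in range(upper + 1):
        # log pmf avoids overflow in mu**k and k!
        log_pmf = -mu + k * math.log(mu) - math.lgamma(k + 1)
        mean_sqrt += math.exp(log_pmf) * math.sqrt(k)
    # Var(sqrt(Y)) = E(Y) - [E(sqrt(Y))]^2, and E(Y) = mu
    return mu - mean_sqrt ** 2


for mu in (10, 50, 100):
    print(f"mu = {mu:3d}: Var(Y) = {mu:3d}, Var(sqrt(Y)) = {var_of_sqrt_poisson(mu):.4f}")
```

The raw variance grows with the mean, while the variance on the square-root scale stays close to 1/4.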


Box-Cox Transformation

Perform analysis of variance on transformed y

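$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, \dot{y}^{\lambda - 1}} & \lambda \neq 0 \\[6pt] \dot{y} \log y & \lambda = 0 \end{cases}$$

$\dot{y}$ is the geometric mean of the observations

$$\dot{y} = \left( \prod_{i=1}^{a} \prod_{j=1}^{n_i} y_{ij} \right)^{1/N}$$

Find $\lambda$ which minimizes SSE

Need to rescale $y^{\lambda}$ for direct comparison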
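The search for $\lambda$ can be sketched as a grid search over the error sum of squares of the rescaled response. This Python illustration is not PROC TRANSREG, which uses a likelihood criterion. The data and function names are hypothetical, and the grid mirrors the `lambda=-2 to 2 by .2` option used in the SAS code in these notes.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def boxcox_scaled(y, lam, gm):
    """Rescaled Box-Cox transform; gm is the geometric mean of all observations."""
    if abs(lam) < 1e-12:
        return gm * math.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))


def sse_one_way(groups, lam):
    """Error sum of squares of the one-way ANOVA fit to the transformed response."""
    gm = geometric_mean([y for g in groups for y in g])
    sse = 0.0
    for g in groups:
        z = [boxcox_scaled(y, lam, gm) for y in g]
        z_bar = sum(z) / len(z)
        sse += sum((v - z_bar) ** 2 for v in z)
    return sse


def best_lambda(groups, grid):
    return min(grid, key=lambda lam: sse_one_way(groups, lam))


# Hypothetical positive responses whose spread grows with the mean
groups = [[1.1, 0.9, 1.3, 0.8], [4.2, 5.6, 3.1, 4.9], [16.0, 22.5, 11.8, 19.3]]
grid = [round(-2 + 0.2 * k, 1) for k in range(21)]  # -2.0, -1.8, ..., 2.0
print("lambda minimizing SSE on the grid:", best_lambda(groups, grid))
```

Dividing by $\lambda \dot{y}^{\lambda-1}$ puts every transformed response on the scale of the original data, which is what makes the SSE values comparable across $\lambda$.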


Box-Cox λ Approximation

Box-Cox is assuming $\sigma_y^2 = C \mu_y^{2\beta}$

The appropriate transformation power is $\lambda = 1 - \beta$

Consider estimating $\beta$ using the fact that

$$\log(\sigma_y) = \log(C)/2 + \beta \log(\mu_y)$$

Regress sample estimates, $\log(S_i)$ versus $\log(\bar{y}_{i.})$, and use the slope as the estimate of $\beta$
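The regression step can be sketched as follows. The Python is an illustration with hypothetical data and function names: it computes each group's sample mean and standard deviation, then takes the least-squares slope of $\log(S_i)$ on $\log(\bar{y}_{i.})$ as $\hat{\beta}$ and reports $1 - \hat{\beta}$ as the suggested power.

```python
import math
from statistics import mean, stdev


def estimate_beta(groups):
    """Least-squares slope of log(group SD) on log(group mean)."""
    log_mu = [math.log(mean(g)) for g in groups]
    log_sd = [math.log(stdev(g)) for g in groups]
    x_bar, y_bar = mean(log_mu), mean(log_sd)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_mu, log_sd))
    sxx = sum((x - x_bar) ** 2 for x in log_mu)
    return sxy / sxx


# Hypothetical groups built so the SD is exactly proportional to the mean
base = [1.0, 2.0, 3.0]
groups = [[c * b for b in base] for c in (1.0, 2.0, 4.0)]

beta_hat = estimate_beta(groups)
print(f"beta-hat = {beta_hat:.3f}, suggested power 1 - beta-hat = {1 - beta_hat:.3f}")
```

When the standard deviation is proportional to the mean, the slope is 1 and the suggested power is 0, which corresponds to the log transformation.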


trans.sas

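title1 'Increasing Variance Example';
data one;
  infile 'c:\saswork\data\boxcox.dat';
  input trt resp;
run;

* Fit the one-way model and save predicted values and residuals;
proc glm data=one;
  class trt;
  model resp=trt;
  output out=diag p=pred r=res;
run;

title1 'Residual Plot';
symbol1 v=circle i=none;
proc sort data=diag; by pred; run;
proc gplot data=diag;
  plot res*pred / frame;
run;

* Group means and standard deviations (BY processing needs sorted data);
proc sort data=one; by trt; run;
proc univariate data=one noprint;
  var resp; by trt;
  output out=two mean=mu std=sigma;
run;

* Regress log(sigma) on log(mu); the slope estimates beta;
data three; set two;
  logmu = log(mu); logsig = log(sigma);
run;
proc reg data=three;
  model logsig = logmu;
run;

title1 'Mean vs Std Dev';
symbol1 v=circle i=rl;
proc gplot data=three;
  plot logsig*logmu / regeqn;
run;

* Box-Cox search over lambda = -2, -1.8, ..., 2;
proc transreg data=one;
  model boxcox(resp / lambda=-2 to 2 by .2) = class(trt);
run;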


What if no transformation works?

There can be situations in which there is no power relationship between the group variances and means

If residuals appear Normal, consider weighted ANOVA or linear mixed model

If residuals are skewed, consider a generalized linear model where you assume an alternative distribution for $y$

Seek assistance if uncertain what to do or how to do it


Example: Soldering Study

Experiment designed to assess the strength of five types of flux used in soldering wire boards

Forty units were used in the experiment

Units randomly and equally assigned to flux type ($n_i = 8$)

Y: strength

X: type of flux


SAS Commands

data a1;
  infile 'u:\.www\datasets525\CH18TA02.txt';
  input strength type;
run;

proc gplot;                     ***scatterplot;
  plot strength*type;
run;

proc glm;
  class type;
  model strength=type;
  means type / hovtest=bf;      ***Modified Levene test;
  lsmeans type / stderr;        ***Alternative to means;
run;

proc transreg;
  model boxcox(strength / lambda=-2 to 2 by .2) = class(type);
run;


Scatterplot


Output

Source       DF   Sum of Squares   Mean Square   F Value   Pr > F
Model         4      353.6120850    88.4030213     41.93   <.0001
Error        35       73.7988250     2.1085379
Cor Total    39      427.4109100

Brown and Forsythe’s Test for Homogeneity of strength Variance
ANOVA of Absolute Deviations from Group Medians

                Sum of      Mean
Source   DF    Squares    Square   F Value   Pr > F
type      4     9.3477    2.3369      2.94   0.0341
Error    35    27.8606    0.7960

          strength    Standard
type        LSMEAN       Error   Pr > |t|
1       15.4200000   0.5133880     <.0001
2       18.5275000   0.5133880     <.0001
3       15.0037500   0.5133880     <.0001
4        9.7412500   0.5133880     <.0001
5       12.3400000   0.5133880     <.0001


Log Transform Suggested?


SAS Commands

proc sort data=a1; by type; run;   ***BY processing needs sorted data;
proc means data=a1;                ***Obtain sample variances for weights;
  var strength;
  by type;
  output out=a2 var=s2;
run;

data a2; set a2; wt=1/s2; run;     ***Weight = 1 / sample variance;
data a3; merge a1 a2; by type; run;

proc glm data=a3;                  ***Weighted ANOVA;
  class type;
  model strength=type;
  weight wt;
  lsmeans type / stderr cl;        ***Must use lsmeans over means here;
run;

proc mixed data=a1;                ***Diff variances for all types;
  class type;
  model strength=type / ddfm=kr;
  repeated / group=type;
run;


GLM Output

                            Sum of
Source             DF      Squares    Mean Square   F Value   Pr > F
Model               4  324.2130988     81.0532747     81.05   <.0001
Error              35   35.0000000      1.0000000
Corrected Total    39  359.2130988

Least Squares Means

          strength    Standard
type        LSMEAN       Error   Pr > |t|   95% Confidence Limits
1       15.4200000   0.4373949     <.0001   14.532041   16.307959
2       18.5275000   0.4429921     <.0001   17.628178   19.426822
3       15.0037500   0.8791614     <.0001   13.218957   16.788543
4        9.7412500   0.2887129     <.0001    9.155132   10.327368
5       12.3400000   0.2720294     <.0001   11.787751   12.892249


MIXED Output

Cov Parm    Group     Estimate
Residual    type 1      1.5305
Residual    type 2      1.5699
Residual    type 3      6.1834
Residual    type 4      0.6668
Residual    type 5      0.5920

Fit Statistics

-2 Res Log Likelihood        122.1
AIC (smaller is better)      132.1
AICC (smaller is better)     134.2
BIC (smaller is better)      140.6

Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4        13.73       0.0082


MIXED Output

Type 3 Tests of Fixed Effects

          Num    Den
Effect     DF     DF   F Value   Pr > F
type        4   14.8     71.78   <.0001

Least Squares Means

                            Standard
Effect   type   Estimate       Error   DF   t Value   Pr > |t|
type     1       15.4200      0.4374    7     35.25     <.0001
type     2       18.5275      0.4430    7     41.82     <.0001
type     3       15.0038      0.8792    7     17.07     <.0001
type     4        9.7413      0.2887    7     33.74     <.0001
type     5       12.3400      0.2720    7     45.36     <.0001


Summary

GLM (weighted ANOVA) and MIXED analysis provide the same factor level estimates and standard errors.

Without ddfm=kr, F test and df are also the same.

With ddfm=kr, F test similar to Welch F test and factor level df more reasonable.

Can consider groups of factor levels with similar variances
  Group1=1 : Type 1 and 2
  Group1=2 : Type 3
  Group1=3 : Type 4 and 5
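The agreement between the two analyses can be checked by hand: with a separate variance for each flux type, the standard error of a factor level mean is $\sqrt{s_i^2 / n_i}$. The Python sketch below recomputes these from the variance estimates in the MIXED output above and $n_i = 8$; the function and variable names are hypothetical.

```python
import math

# Variance estimates by flux type, copied from the MIXED "Cov Parm" table
variances = {1: 1.5305, 2: 1.5699, 3: 6.1834, 4: 0.6668, 5: 0.5920}
n_per_type = 8  # units per flux type in the soldering study


def mean_standard_errors(variances, n):
    """Standard error of each factor level mean under its own variance."""
    return {level: math.sqrt(v / n) for level, v in variances.items()}


standard_errors = mean_standard_errors(variances, n_per_type)
for flux_type, se in standard_errors.items():
    print(f"type {flux_type}: SE = {se:.4f}")
```

These values match the standard errors in both the weighted GLM and the MIXED least squares means tables, up to rounding of the printed variance estimates.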


SAS Commands

data a1;                           ***Defining the groups;
  set a1;
  group1 = 3;
  if type=1 | type=2 then group1=1;
  if type=3 then group1=2;
run;

proc mixed data=a1;                ***Diff variances for each group1;
  class type;
  model strength=type / ddfm=kr;
  repeated / group=group1;
run;


MIXED Output

Cov Parm    Group      Estimate
Residual    Group 1      1.5502
Residual    Group 2      6.1834
Residual    Group 3      0.6294

Fit Statistics

-2 Res Log Likelihood        122.1
AIC (smaller is better)      128.1
AICC (smaller is better)     128.9   ***Much better fit
BIC (smaller is better)      133.2

Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 2        13.70       0.0011


MIXED Output

Type 3 Tests of Fixed Effects

          Num    Den
Effect     DF     DF   F Value   Pr > F
type        4   19.8     77.68   <.0001

Least Squares Means

                            Standard
Effect   type   Estimate       Error   DF   t Value   Pr > |t|
type     1       15.4200      0.4402   14     35.03     <.0001
type     2       18.5275      0.4402   14     42.09     <.0001
type     3       15.0038      0.8792    7     17.07     <.0001
type     4        9.7413      0.2805   14     34.73     <.0001
type     5       12.3400      0.2805   14     43.99     <.0001


Near-Zero/Truncated Values

Can have heavily skewed distribution near zero
  Concentration of rare contaminant
  Number of defects in assembly line
  Number of birds at a given site

Measurements may also be truncated or censored
  Concentration of contaminant (below detectable level)
  Lifetime of component (does not fail during study)

Transformations sometimes successful
  $\log(x + 0.001)$, $\sqrt{x + 0.001}$

Should likely consider more advanced models
  Assume non-Normal errors (generalized linear models)
  Zero-inflated or two-part models (e.g., ZIP model)


Background Reading

Normality assumption: Montgomery 3.4.1

Residual plots: Montgomery 3.4.2 - 3.4.4

Variance-stabilizing transformation: Montgomery 3.4.3

Box-Cox transformation: Montgomery 15.1.1
