spssmixed

8/8/2019 SPSSMixed

1/82

1

Mixed Analysis ofVariance Models with

SPSS

Robert A.Yaffee, Ph.D.

Statistics, Social Science, and MappingGroup

Information Technology Services/Academic

Computing ServicesOffice location: 75 Third Avenue, Level C-3

Phone: 212-998-3402

8/8/2019 SPSSMixed

2/82

2

Outline

1. Classification of Effects

2. Random Effects

1. Two-Way Random Layout

2. Solutions and estimates

3. General linear model

1. Fixed Effects Models1. The one-way layout

4. Mixed Model theory1. Proper error terms

5. Two-way layout6. Full-factorial model

1. Contrasts with interaction terms

2. Graphing Interactions

8/8/2019 SPSSMixed

3/82

3

Outline-Contd

Repeated Measures ANOVA

Advantages of Mixed Modelsover GLM.

8/8/2019 SPSSMixed

4/82

4

Definition of Mixed

Models by their

component effects

1. Mixed Models contain bothfixed and random effects

2. Fixed Effects: factors forwhich the only levels underconsideration are contained

in the coding of those effects3. Random Effects: Factors for

which the levels contained inthe coding of those factors

are a random sample of thetotal number of levels in thepopulation for that factor.

8/8/2019 SPSSMixed

5/82

8/8/2019 SPSSMixed

6/82

6

Classification of effects

1. There are main effects:

Linear Explanatory Factors

2. There are interaction effects:

Joint effects over and above

the component main effects.

8/8/2019 SPSSMixed

7/82

7

8/8/2019 SPSSMixed

8/82

8

Classification of Effects-

contd

Hierarchical designs have nested

effects. Nested effects are

those with subjects within

groups.

An example would be patients

nested within doctors anddoctors nested within hospitals

This could be expressed by

patients(doctors)

doctors(hospitals)

8/8/2019 SPSSMixed

9/82

9

8/8/2019 SPSSMixed

10/82

10

Between and Within-

Subject effects

Such effects may sometimes be fixed orrandom. Theirclassification depends on the experimental designBetween-subjects effects are those who are in one group or

another but not in both. Experimental group is a fixed effectbecause the manager is considering only those groups in hisexperiment. One group is the experimental group and theother is the control group. Therefore, this grouping

factor is a between- subject effect.

Within-subject effects are experienced by subjectsrepeatedly over time. Trial is a random effect whenthere are several trials in the repeated measuresdesign; all subjects experience all of the trials. Trial istherefore a within-subject effect.Operator may be a fixed or random effect, dependingupon whether one is generalizing beyond the sample

If operator is a random effect, then themachine*operator interaction is a random effect.There are contrasts: These contrast the values of onelevel with those of other levels of the same effect.

8/8/2019 SPSSMixed

11/82

11

Between Subject

effects

Gender: One is either male or

female, but not both.

Group: One is either in the

control, experimental, or the

comparison group but not more

than one.

8/8/2019 SPSSMixed

12/82

12

Within-Subjects Effects

These are repeated effects.

Observation 1, 2, and 3 mightbe the pre, post, and follow-up

observations on each person.

Each person experiences all of

these levels or categories.

These are found in repeated

measures analysis of variance.

8/8/2019 SPSSMixed

13/82

13

Repeated Observations

are Within-Subjects

effects

Trial 1 Trial 2 Trial 3

Group

Group is a between subjects effect, whereas Trial is a

within subjects effect.

8/8/2019 SPSSMixed

14/82

14

The General

Linear Model1. The main effects general

linear model can be

parameterized as

( )

( )

( )

exp ~ ( , )

ij i j ij

ij

i i

j j

ij

Y b

where

Y observation for ith

grand mean an unknown fixed parm

effect of ith value of a

b effect of jth value of b b

erimental error N

Q E I

E

Q

E E QQ

I W

!

!

!

! !

! 20

8/8/2019 SPSSMixed

15/82

15

A factorial model

If an interaction term were included,

the formula would be

ij i i ij ijy e Q E F EF!

The interaction or crossed effect is the joint effect, over andabove the individual main effects. Therefore, the main effects

must be in the model for the interaction to be properly specified.

( ) ( )

i j ij

ij

y

y

EF Q E Q F Q

E F Q

!

!

8/8/2019 SPSSMixed

16/82

16

Higher-Order

InteractionsIf 3-way interactions are in the

model, then the main effects

and all lower order interactions

must be in the model for the 3-

way interaction to be properly

specified. For example, a3-way interaction model would

be:

ijk i j k ij ik jk

ijk ijk

y a b c ab ac bc

abc e

Q!

8/8/2019 SPSSMixed

17/82

17

The General Linear

Model In matrix terminology, the

general linear model may be

expressed as

Y X

where

Y theobserved data vector

X the design matrix

thevector of unknown fixed effect parameters

thevector of errors

F I

F

I

!

!

!

!

!

8/8/2019 SPSSMixed

18/82

18

Assumptions

Of the general linear model

( )

var( )

var( )

( )

E

I

Y I

E Y

I

I

F

!

!

!!

2

2

0

8/8/2019 SPSSMixed

19/82

19

General Linear Model

Assumptions-contd1. Residual Normality.

2. Homogeneity of error variance3. Functional form of Model:

Linearity of Model

4. No Multicollinearity5. Independence of observations

6. No autocorrelation of errors

7. No influential outliers

We have to test for these to be sure that the model is

valid.

We will discuss the robustness of the model in face

of violations of these assumptions.

We will discuss recourses when these assumptions are

violated.

8/8/2019 SPSSMixed

20/82

20

Explanation of these

assumptions1. Functional form of Model: Linearity of

Model: These models only analyze thelinear relationship.

2. Independence of observations3. Representativeness of sample

4. Residual Normality: So the alpharegions of the significance tests areproperly defined.

5. Homogeneity of error variance: So theconfidence limits may be easily found.

6. No Multicollinearity: Prevents efficientestimation of the parameters.

7. No autocorrelation of errors:

Autocorrelation inflates the R2 ,F and ttests.

8. No influential outliers: They bias theparameter estimation.

8/8/2019 SPSSMixed

21/82

21

Diagnostic tests for these

assumptions1. Functional form of Model:

Linearity of Model: Pair plot

2. Independence of observations:Runs test

3. Representativeness of sample:Inquire about sample design

4. Residual Normality: SK or SW

test5. Homogeneity of error variance

Graph of Zresid * Zpred

6. No Multicollinearity: Corr of X

7. No autocorrelation of errors: ACF8. No influential outliers: Leverage

and Cooks D.

8/8/2019 SPSSMixed

22/82

22

Testing for outliers

Frequencies analysis of stdres

cksd.

Look for standardized residuals

greater than 3.5 or less than

3.5

And look for Cooks D.

8/8/2019 SPSSMixed

23/82

23

Studentized Residuals

( )

( )

( )i

s ii

i

s

i

i

i

ees h

here

e studentized residual

s standard deviation hereith obs is deleted

h leverage statistic

!

!

!

!

21

Belsley et al (1980) recommend the use of studentized

Residuals to determine whether there is an outlier.

8/8/2019 SPSSMixed

24/82

24

Influence of Outliers

1. Leverage is measured by the

diagonal components of the

hat matrix.

2. The hat matrix comes from

the formula for the regression

of Y. '( ' ) '

'( ' ) ' ,

,

Y Y

here the hat atrix H

Therefore

Y HY

F

! !

!

!

1

1

8/8/2019 SPSSMixed

25/82

25

Leverage and the Hat

matrix1. The hat matrix transforms Y into the

predicted scores.

2. The diagonals of the hat matrix indicate

which values will be outliers or not.3. The diagonals are therefore measures

of leverage.

4. Leverage is bounded by two limits: 1/nand 1. The closer the leverage is to

unity, the more leverage the value has.5. The trace of the hat matrix = the

number of variables in the model.

6. When the leverage > 2p/n then there ishigh leverage according to Belsley et

al. (198

0) cited in Long, J.F. ModernMethods of Data Analysis (p.262). Forsmaller samples, Vellman and Welsch(1981) suggested that 3p/n is thecriterion.

8/8/2019 SPSSMixed

26/82

26

Cooks D

1. Another measure of

influence.

2. This is a popular one. The

formula for it is:

' ( )

i i

ii i

h e

Cook s D p h s h

!

2

2

1

1 1

Cook and Weisberg(1982) suggested that values of

D that exceeded 50% of the F distribution (df = p, n-p)are large.

8/8/2019 SPSSMixed

27/82

27

Cooks D in SPSS

Finding the influential outliers

Select those observations for which

cksd > (4*p)/n

Belsley suggests 4/(n-p-1) as a

cutoff

If cksd > (4*p)/(n-p-1);

8/8/2019 SPSSMixed

28/82

28

What to do with outliers

1. Check coding to spot typos

2. Correct typos

3. If observational outlier is correct,examine the dffits option to seethe influence on the fitting

statistics.4. This will show the standardized

influence of the observation onthe fit. If the influence of the

outlier is bad, then considerremoval or replacement of it withimputation.

8/8/2019 SPSSMixed

29/82

29

Decomposition of the

Sums of Squares1. Mean deviations are computed

when means are subtracted from

individual scores.1. This is done for the total, the

group mean, and the error terms.

2. Mean deviations are squared andthese are called sums of squares

3. Variances are computed bydividing the Sums of Squares by

their degrees of freedom.4. The total Variance = Model

Variance + error variance

F l f D iti

8/8/2019 SPSSMixed

30/82

30

Formula for Decomposition

of Sums of Squares

SS total = SS error + SSmodel

. .

. .

. .

( ) ( ..)

( ) ( ) ( ..)

( ) ( ) ( ..)

i j ij j j

i j ij j j

i j ij j j

y y y y y y

total effect error within model effect

we square the terms

y y y y y y

and sum them over the data set

y y y y y y

SS total SSerror Group SS

where SS Sums of

!

!

!

! !

!

2 2 2

2 2 2

Squares

8/8/2019 SPSSMixed

31/82

31

Variance

DecompositionDividing each of the sums of

squares by their respective

degrees of freedom yields thevariances.

Total variance= error variance

+ model variance.

in fixed effects models

model varianceF

error variance!

SStotal SSerror SSmodel

n n k k ! 1 1

8/8/2019 SPSSMixed

32/82

32

Proportion of Variance

ExplainedR2 = proportion of variance

explained.

SStotal = SSmodel + SSerrror

Divide all sides by SStotal

SSmodel/SStotal

=1 - SSError/SStotal

R2=1 - SSError/SStotal

8/8/2019 SPSSMixed

33/82

33

The Omnibus F test

The omnibus F test is a test that all of the means of the

levels of the main effects and

as well as any interactions specifiedare not significantly different from one another.

Suppose the model is a one way anova on breaking

pressure of bonds of different metals.

Suppose there are three metals: nickel, iron, and

Copper.

H0: Mean(Nickel)= mean (Iron) = mean(Copper)

Ha: Mean(Nickel) ne Mean(Iron) or

Mean(Nickel) ne Mean(Copper)or Mean(Iron) ne Mean(Copper)

Testing different Levels of

8/8/2019 SPSSMixed

34/82

34

Testing different Levels of

a Factor against one

another

Contrast are tests of the mean

of one level of a factor against

other levels.

:

:a

H

H

Q Q Q

Q Q

Q Q

Q Q

! !

{

{ {

0 1 2 3

1 2

2 3

1 3

8/8/2019 SPSSMixed

35/82

35

Contrasts-contd

A contrast statement computes

' '( ' )

( )

L L V L LZ Z

Frank L

F F - - - !

1

The estimated V- is the generalized inverse of the

coefficient matrix of the mixed model.

The L vector is the kb vector.

The numerator df is the rank(L) and the denominator

df is taken from the fixed effects table unless otherwise

specified.

C i f h F

8/8/2019 SPSSMixed

36/82

36

Construction of the F

tests in different modelsThe F test is a ratio of two variances (Mean Squares).

It is constructed by dividing the MS of the effect to be

tested by a MS of the denominator term. The division

should leave only the effect to be tested left over as a remainder.

A Fixed Effects model F test for a = MSa/MSerror.

A Random Effects model F test for a = MSa/MSab

A Mixed Effects model F test for b = MSa/MSab

A Mixed Effects model F test for ab = MSab/MSerror

8/8/2019 SPSSMixed

37/82

37

Data format

The data format for a GLM is that

of wide data.

D t F t f Mi d

8/8/2019 SPSSMixed

38/82

38

Data Format for Mixed

Models is Long

C i f Wid t

8/8/2019 SPSSMixed

39/82

39

Conversion of Wide to

Long Data Format Click on Data in the header bar

Then click on Restructure in the

pop-down menu

A restr ct re i ard

8/8/2019 SPSSMixed

40/82

40

A restructure wizard

appearsSelect restructure selected variables into cases

and click onNext

A Variables to Cases: Number of

8/8/2019 SPSSMixed

41/82

41

A Variables to Cases: Number of

Variable Groups dialog box appears.

We select one and click on next.

We select the repeated

8/8/2019 SPSSMixed

42/82

42

p

variables and move them

to the target variable box

After moving the repeated variables into the target variable

b h fi d i bl i h Fi d i bl

8/8/2019 SPSSMixed

43/82

43

box, we move the fixed variables into the Fixed variable

box, and select a variable for case idin this case,

subject.

Then we click on Next

A create index variables dialog box

8/8/2019 SPSSMixed

44/82

44

appears. We leave the number of index

variables to be created at one and click on

next at the bottom of the box

When the following box

8/8/2019 SPSSMixed

45/82

45

appears we just type in

time and select Next.

When the options dialog box appears we select the

8/8/2019 SPSSMixed

46/82

46

When the options dialog box appears, we select the

option for dropping variables not selected.

We then click on Finish.

We thus obtain our data

8/8/2019 SPSSMixed

47/82

47

We thus obtain our data

in long format

8/8/2019 SPSSMixed

48/82

48

The Mixed Model

The Mixed Model uses long data

format. It includes fixed and

random effects.It can be used to model merely fixed

or random effects, by zeroing out

the other parameter vector.

The F tests for the fixed, random, andmixed models differ.

Because the Mixed Model has the

parameter vector for both of these

and can estimate the error

covariance matrix for each, it can

provide the correct standard errors

for either thefixed or random effects.

8/8/2019 SPSSMixed

49/82

49

The Mixed Model

'

y X Z

where

fixed effects parameter estimates X fixed effects

Z Random effects parameter estimates

random effects

errors

Variance of y V ZGZ R

G and R require covariancestructure

fitting

F K I

F

KI

!

!!

!

!!

! !

Mixed Model Theory-

8/8/2019 SPSSMixed

50/82

50

Mixed Model Theory

contdLittle et al.(p.139) note that u and e areuncorrelated random variables with 0

means and covariances, G and R,

respectively.

' ,

( ' ) '

' ( )

Because the

covariance matrix

V ZGZ R

the solution for

X V X X V y

u GZ V y X

F

F

!

!

!

1 1

V-

is a generalized inverse. Because V is usuallysingular and noninvertible AVA = V- is an

augmented matrix that is invertible. It can later

be transformed back to V.

The G and R matrices must be positive definite.

In the Mixed procedure, the covariance type of

the random (generalized) effects defines thestructure of G and a repeated covariance type

defines structure of R.

Mixed Model

8/8/2019 SPSSMixed

51/82

51

Mixed Model

Assumptions

0u

EI

!

-

0

0

u GVariance

RI

!

- -

A linear relationship between dependent and

independent variables

Random Effects

8/8/2019 SPSSMixed

52/82

52

Random Effects

Covariance Structure This defines the structure of

the G matrix, the random

effects, in the mixed model.

Possible structures permitted

by current version of SPSS:

Scaled Identity Compound Symmetry

AR(1)

Huynh-Feldt

Structures of Repeated

8/8/2019 SPSSMixed

53/82

53

p

effects (R matrix)-contdVariance Components

W

W

WW

! -

2

1

2

2

23

2

4

0 0 0

0 0 0

0 0 0

0 0 0

Co pound Sy etry

W W W W W

W W W W W

W W W W W

W W W W W

! -

2 2 2 2 2

1 1 1 1

2 2 2 2 2

1 2 1 1

2 2 2 2 2

1 1 3 1

2 2 2 2 2

1 1 1 4

( )AR

V V V

V V V

V V V

V V V

! -

2 3

2

2

3 2

1

11

1

1

8/8/2019 SPSSMixed

54/82

8/8/2019 SPSSMixed

55/82

R matrix, defines the

correlation among

8/8/2019 SPSSMixed

56/82

56

correlation among

repeated random effects

.

.R

W W W W

W W W W

W W W W

W W W W

W W W W

W W W W

!

2

1 1 1

2

1 1 1

2

1 1 1

21 1 1

2

1 1 1

2

1 1 1

One can specify the nature of the correlation among the

repeated random effects.

GLM Mixed Model

8/8/2019 SPSSMixed

57/82

57

GLM Mixed Model

The General Linear Model is a special case of the

Mixed Model with Z = 0 (which means that

Zu disappears from the model) and

2R I

W!

Mixed Analysis of a Fixed

8/8/2019 SPSSMixed

58/82

58

Effects model

SPSS tests these fixed effects just as it does with the GLM

Procedure with type III sums of squares.

We analyze the breaking pressure of bonds made

from three metals. We assume that we do not

generalize beyond our sample and that our

effects are all fixed.

Tests of Fixed Effects is performed with the help of

the L matrix by constructing the following F test:

' '[ ( ' ) ']( )

L L X V X L LFrank L

F F

!1

Numerator df = rank(L)

Denominator df = RESID (n-rank(X)df = Satherth

Estimation:Newton

8/8/2019 SPSSMixed

59/82

59

Scoring

i i sH g

here

g gr adient atrix of stderivatives

H Hessian atrix of nd derivatives

s incre ent of step parameter

F F !

!

!

!

11

1

2

Estimation: Minimization

8/8/2019 SPSSMixed

60/82

60

of the objective functions

_ a

1

1

1

1 1

1 1

1( , ): log | | log ' (1 log(2 / ))

2 2 2

1( , ) : log | | log | ' |

2 2

log ' 1 log | 2 /( ) |2 2

( ' ) '

( ' ) '

( '

n nM L G R V r V r n

nREM L G R V X V X

n p n pr V r n p

here r y X X V X X V y

p rank of X

so that the probabilities ofX V X X V y and

GZ V

T

T

F

K

!

!

!

!

!

! 1( ' ) .y X are maximizedF

UsingNewton Scoring, the following functions are

minimized

Significance of

8/8/2019 SPSSMixed

61/82

61

Parameters

1 1

1 1 1

: 0

'

' '

' '

L is a linear combination

Ho

tLCL

where

X R X X R Z C

Z R X ZR Z G

F

K

F

K

F

K

!

-

- !

!

-

Test one covariance

structure against the other

8/8/2019 SPSSMixed

62/82

62

structure against the other

with the IC The rule of thumb is smaller is

better

-2LL

AIC Akaike

AICC Hurvich and Tsay

BIC Bayesian Info Criterion

Bozdogans CAIC

Measures of Lack of fit:

8/8/2019 SPSSMixed

63/82

63

The information Criteria-2LL is called the deviance. It is a

measure of sum of squared errors.

AIC = -2LL + 2p (p=# parms)

BIC = Schwartz Bayesian Info

criterion = 2LL + plog(n)

AICC= Hurvich and Tsays smallsample correction on AIC: -2LL +

2p(n/(n-p-1))

CAIC = -2LL + p(log(n) + 1)

Procedures for Fitting the

8/8/2019 SPSSMixed

64/82

64

Mixed Model One can use the LR test or the

lesser of the information criteria.

The smaller the informationcriterion, the better the model

happens to be.

We try to go from a larger to asmaller information criterion

when we fit the model.

LR test

8/8/2019 SPSSMixed

65/82

65

LR test

1. To test whether one model is

significantly better than the

other.2. To test random effect for

statistical significance

3. To test covariance structureimprovement

4. To test both.

5. Distributed as a6. With df= p2 p1 where pi =#

parms in model i

G2

Applying the LR test

8/8/2019 SPSSMixed

66/82

66

Applying the LR test

We obtain the -2LL from theunrestricted model.

We obtain the -2LL from therestricted model.

We subtract the latter from thelarger former.

That is a chi-square with df=the difference in the number ofparameters.

We can look this up anddetermine whether or not it isstatistically significant.

Advantages of the Mixed

8/8/2019 SPSSMixed

67/82

67

Model1. It can allow random effects to be

properly specified and computed,

unlike the GLM.2. It can allow correlation of errors,

unlike the GLM. It therefore has

more flexibility in modeling the error

covariance structure.3. It can allow the error terms to exhibit

nonconstant variability, unlike the

GLM, allowing more flexibility in

modeling the dependent variable.

4. It can handle missing data, whereas

the repeated measures GLM cannot.

Programming A Repeated

Measures ANOVA with

8/8/2019 SPSSMixed

68/82

68

PROC Mixed

Select the Mixed Linear Option in Analysis

Move subject ID into the

subjects box and the

8/8/2019 SPSSMixed

69/82

69

subjects box and the

repeated variable into the

repeated box.

Click on continue

We specify subjects and

repeated effects with the

8/8/2019 SPSSMixed

70/82

70

next dialog box

We set the repeated covariance type to Diagonal

& click on continue

Defining the Fixed

8/8/2019 SPSSMixed

71/82

71

EffectsWhen the next dialog box appears,

we insert the dependent Response

variable and the fixed effects ofanxiety and tension

Click on continue

We select the Fixed

8/8/2019 SPSSMixed

72/82

72

effects to be tested

Move them into the model box,

selecting main effects, and type III

8/8/2019 SPSSMixed

73/82

73

sum of squares

Click on continue

When the Linear Mixed

Models dialog box

8/8/2019 SPSSMixed

74/82

74

appears, select random

Under random effects, select

scaled identity as covariance

type and move subjects over into

8/8/2019 SPSSMixed

75/82

75

yp j

combinations

Click on continue

Select Statistics and check of the

following in the dialog box that

appears

8/8/2019 SPSSMixed

76/82

76

appears

Then click continue

When the Linear Mixed

Models box appears, click

8/8/2019 SPSSMixed

77/82

77

ok

You will get your tests

8/8/2019 SPSSMixed

78/82

78

Estimates of Fixed effects

and covariance

8/8/2019 SPSSMixed

79/82

79

parameters

R matrix

8/8/2019 SPSSMixed

80/82

80

Rerun the model with

different nested covariance

8/8/2019 SPSSMixed

81/82

81

structures and compare theinformation criteria

The lower the information criterion, the better fit

the nested model has. Caveat: If the models are

not nested, they cannot be compared with the

information criteria.

GLM vs. Mixed

8/8/2019 SPSSMixed

82/82

82

GLM has

means

lsmeans

sstype 1,2,3,4

estimates using OLS or WLSone has to program the correct F tests for randomeffects.

losses cases with missing values.

Mixed has

lsmeans

sstypes 1 and 3

estimates using maximum likelihood, general methodsof moments, or restricted maximum likelihood

ML

MIVQUE0

REML

gives correct std errors and confidence intervals for

random effectsAutomatically provides correct standard errors foranalysis.

Can handle missing values

spssmixed

Documents