spssmixed
TRANSCRIPT
-
8/8/2019 SPSSMixed
1/82
1
Mixed Analysis ofVariance Models with
SPSS
Robert A.Yaffee, Ph.D.
Statistics, Social Science, and MappingGroup
Information Technology Services/Academic
Computing ServicesOffice location: 75 Third Avenue, Level C-3
Phone: 212-998-3402
-
8/8/2019 SPSSMixed
2/82
2
Outline
1. Classification of Effects
2. Random Effects
1. Two-Way Random Layout
2. Solutions and estimates
3. General linear model
1. Fixed Effects Models1. The one-way layout
4. Mixed Model theory1. Proper error terms
5. Two-way layout6. Full-factorial model
1. Contrasts with interaction terms
2. Graphing Interactions
-
8/8/2019 SPSSMixed
3/82
3
Outline-Contd
Repeated Measures ANOVA
Advantages of Mixed Modelsover GLM.
-
8/8/2019 SPSSMixed
4/82
4
Definition of Mixed
Models by their
component effects
1. Mixed Models contain bothfixed and random effects
2. Fixed Effects: factors forwhich the only levels underconsideration are contained
in the coding of those effects3. Random Effects: Factors for
which the levels contained inthe coding of those factors
are a random sample of thetotal number of levels in thepopulation for that factor.
-
8/8/2019 SPSSMixed
5/82
-
8/8/2019 SPSSMixed
6/82
6
Classification of effects
1. There are main effects:
Linear Explanatory Factors
2. There are interaction effects:
Joint effects over and above
the component main effects.
-
8/8/2019 SPSSMixed
7/82
7
-
8/8/2019 SPSSMixed
8/82
8
Classification of Effects-
contd
Hierarchical designs have nested
effects. Nested effects are
those with subjects within
groups.
An example would be patients
nested within doctors anddoctors nested within hospitals
This could be expressed by
patients(doctors)
doctors(hospitals)
-
8/8/2019 SPSSMixed
9/82
9
-
8/8/2019 SPSSMixed
10/82
10
Between and Within-
Subject effects
Such effects may sometimes be fixed orrandom. Theirclassification depends on the experimental designBetween-subjects effects are those who are in one group or
another but not in both. Experimental group is a fixed effectbecause the manager is considering only those groups in hisexperiment. One group is the experimental group and theother is the control group. Therefore, this grouping
factor is a between- subject effect.
Within-subject effects are experienced by subjectsrepeatedly over time. Trial is a random effect whenthere are several trials in the repeated measuresdesign; all subjects experience all of the trials. Trial istherefore a within-subject effect.Operator may be a fixed or random effect, dependingupon whether one is generalizing beyond the sample
If operator is a random effect, then themachine*operator interaction is a random effect.There are contrasts: These contrast the values of onelevel with those of other levels of the same effect.
-
8/8/2019 SPSSMixed
11/82
11
Between Subject
effects
Gender: One is either male or
female, but not both.
Group: One is either in the
control, experimental, or the
comparison group but not more
than one.
-
8/8/2019 SPSSMixed
12/82
12
Within-Subjects Effects
These are repeated effects.
Observation 1, 2, and 3 mightbe the pre, post, and follow-up
observations on each person.
Each person experiences all of
these levels or categories.
These are found in repeated
measures analysis of variance.
-
8/8/2019 SPSSMixed
13/82
13
Repeated Observations
are Within-Subjects
effects
Trial 1 Trial 2 Trial 3
Group
Group is a between subjects effect, whereas Trial is a
within subjects effect.
-
8/8/2019 SPSSMixed
14/82
14
The General
Linear Model1. The main effects general
linear model can be
parameterized as
( )
( )
( )
exp ~ ( , )
ij i j ij
ij
i i
j j
ij
Y b
where
Y observation for ith
grand mean an unknown fixed parm
effect of ith value of a
b effect of jth value of b b
erimental error N
Q E I
E
Q
E E QQ
I W
!
!
!
! !
! 20
-
8/8/2019 SPSSMixed
15/82
15
A factorial model
If an interaction term were included,
the formula would be
ij i i ij ijy e Q E F EF!
The interaction or crossed effect is the joint effect, over andabove the individual main effects. Therefore, the main effects
must be in the model for the interaction to be properly specified.
( ) ( )
i j ij
ij
y
y
EF Q E Q F Q
E F Q
!
!
-
8/8/2019 SPSSMixed
16/82
16
Higher-Order
InteractionsIf 3-way interactions are in the
model, then the main effects
and all lower order interactions
must be in the model for the 3-
way interaction to be properly
specified. For example, a3-way interaction model would
be:
ijk i j k ij ik jk
ijk ijk
y a b c ab ac bc
abc e
Q!
-
8/8/2019 SPSSMixed
17/82
17
The General Linear
Model In matrix terminology, the
general linear model may be
expressed as
Y X
where
Y theobserved data vector
X the design matrix
thevector of unknown fixed effect parameters
thevector of errors
F I
F
I
!
!
!
!
!
-
8/8/2019 SPSSMixed
18/82
18
Assumptions
Of the general linear model
( )
var( )
var( )
( )
E
I
Y I
E Y
I
I
F
!
!
!!
2
2
0
-
8/8/2019 SPSSMixed
19/82
19
General Linear Model
Assumptions-contd1. Residual Normality.
2. Homogeneity of error variance3. Functional form of Model:
Linearity of Model
4. No Multicollinearity5. Independence of observations
6. No autocorrelation of errors
7. No influential outliers
We have to test for these to be sure that the model is
valid.
We will discuss the robustness of the model in face
of violations of these assumptions.
We will discuss recourses when these assumptions are
violated.
-
8/8/2019 SPSSMixed
20/82
20
Explanation of these
assumptions1. Functional form of Model: Linearity of
Model: These models only analyze thelinear relationship.
2. Independence of observations3. Representativeness of sample
4. Residual Normality: So the alpharegions of the significance tests areproperly defined.
5. Homogeneity of error variance: So theconfidence limits may be easily found.
6. No Multicollinearity: Prevents efficientestimation of the parameters.
7. No autocorrelation of errors:
Autocorrelation inflates the R2 ,F and ttests.
8. No influential outliers: They bias theparameter estimation.
-
8/8/2019 SPSSMixed
21/82
21
Diagnostic tests for these
assumptions1. Functional form of Model:
Linearity of Model: Pair plot
2. Independence of observations:Runs test
3. Representativeness of sample:Inquire about sample design
4. Residual Normality: SK or SW
test5. Homogeneity of error variance
Graph of Zresid * Zpred
6. No Multicollinearity: Corr of X
7. No autocorrelation of errors: ACF8. No influential outliers: Leverage
and Cooks D.
-
8/8/2019 SPSSMixed
22/82
22
Testing for outliers
Frequencies analysis of stdres
cksd.
Look for standardized residuals
greater than 3.5 or less than
3.5
And look for Cooks D.
-
8/8/2019 SPSSMixed
23/82
23
Studentized Residuals
( )
( )
( )i
s ii
i
s
i
i
i
ees h
here
e studentized residual
s standard deviation hereith obs is deleted
h leverage statistic
!
!
!
!
21
Belsley et al (1980) recommend the use of studentized
Residuals to determine whether there is an outlier.
-
8/8/2019 SPSSMixed
24/82
24
Influence of Outliers
1. Leverage is measured by the
diagonal components of the
hat matrix.
2. The hat matrix comes from
the formula for the regression
of Y. '( ' ) '
'( ' ) ' ,
,
Y Y
here the hat atrix H
Therefore
Y HY
F
! !
!
!
1
1
-
8/8/2019 SPSSMixed
25/82
25
Leverage and the Hat
matrix1. The hat matrix transforms Y into the
predicted scores.
2. The diagonals of the hat matrix indicate
which values will be outliers or not.3. The diagonals are therefore measures
of leverage.
4. Leverage is bounded by two limits: 1/nand 1. The closer the leverage is to
unity, the more leverage the value has.5. The trace of the hat matrix = the
number of variables in the model.
6. When the leverage > 2p/n then there ishigh leverage according to Belsley et
al. (198
0) cited in Long, J.F. ModernMethods of Data Analysis (p.262). Forsmaller samples, Vellman and Welsch(1981) suggested that 3p/n is thecriterion.
-
8/8/2019 SPSSMixed
26/82
26
Cooks D
1. Another measure of
influence.
2. This is a popular one. The
formula for it is:
' ( )
i i
ii i
h e
Cook s D p h s h
!
2
2
1
1 1
Cook and Weisberg(1982) suggested that values of
D that exceeded 50% of the F distribution (df = p, n-p)are large.
-
8/8/2019 SPSSMixed
27/82
27
Cooks D in SPSS
Finding the influential outliers
Select those observations for which
cksd > (4*p)/n
Belsley suggests 4/(n-p-1) as a
cutoff
If cksd > (4*p)/(n-p-1);
-
8/8/2019 SPSSMixed
28/82
28
What to do with outliers
1. Check coding to spot typos
2. Correct typos
3. If observational outlier is correct,examine the dffits option to seethe influence on the fitting
statistics.4. This will show the standardized
influence of the observation onthe fit. If the influence of the
outlier is bad, then considerremoval or replacement of it withimputation.
-
8/8/2019 SPSSMixed
29/82
29
Decomposition of the
Sums of Squares1. Mean deviations are computed
when means are subtracted from
individual scores.1. This is done for the total, the
group mean, and the error terms.
2. Mean deviations are squared andthese are called sums of squares
3. Variances are computed bydividing the Sums of Squares by
their degrees of freedom.4. The total Variance = Model
Variance + error variance
F l f D iti
-
8/8/2019 SPSSMixed
30/82
30
Formula for Decomposition
of Sums of Squares
SS total = SS error + SSmodel
. .
. .
. .
( ) ( ..)
( ) ( ) ( ..)
( ) ( ) ( ..)
i j ij j j
i j ij j j
i j ij j j
y y y y y y
total effect error within model effect
we square the terms
y y y y y y
and sum them over the data set
y y y y y y
SS total SSerror Group SS
where SS Sums of
!
!
!
! !
!
2 2 2
2 2 2
Squares
-
8/8/2019 SPSSMixed
31/82
31
Variance
DecompositionDividing each of the sums of
squares by their respective
degrees of freedom yields thevariances.
Total variance= error variance
+ model variance.
in fixed effects models
model varianceF
error variance!
SStotal SSerror SSmodel
n n k k ! 1 1
-
8/8/2019 SPSSMixed
32/82
32
Proportion of Variance
ExplainedR2 = proportion of variance
explained.
SStotal = SSmodel + SSerrror
Divide all sides by SStotal
SSmodel/SStotal
=1 - SSError/SStotal
R2=1 - SSError/SStotal
-
8/8/2019 SPSSMixed
33/82
33
The Omnibus F test
The omnibus F test is a test that all of the means of the
levels of the main effects and
as well as any interactions specifiedare not significantly different from one another.
Suppose the model is a one way anova on breaking
pressure of bonds of different metals.
Suppose there are three metals: nickel, iron, and
Copper.
H0: Mean(Nickel)= mean (Iron) = mean(Copper)
Ha: Mean(Nickel) ne Mean(Iron) or
Mean(Nickel) ne Mean(Copper)or Mean(Iron) ne Mean(Copper)
Testing different Levels of
-
8/8/2019 SPSSMixed
34/82
34
Testing different Levels of
a Factor against one
another
Contrast are tests of the mean
of one level of a factor against
other levels.
:
:a
H
H
Q Q Q
Q Q
Q Q
Q Q
! !
{
{ {
0 1 2 3
1 2
2 3
1 3
-
8/8/2019 SPSSMixed
35/82
35
Contrasts-contd
A contrast statement computes
' '( ' )
( )
L L V L LZ Z
Frank L
F F - - - !
1
The estimated V- is the generalized inverse of the
coefficient matrix of the mixed model.
The L vector is the kb vector.
The numerator df is the rank(L) and the denominator
df is taken from the fixed effects table unless otherwise
specified.
C i f h F
-
8/8/2019 SPSSMixed
36/82
36
Construction of the F
tests in different modelsThe F test is a ratio of two variances (Mean Squares).
It is constructed by dividing the MS of the effect to be
tested by a MS of the denominator term. The division
should leave only the effect to be tested left over as a remainder.
A Fixed Effects model F test for a = MSa/MSerror.
A Random Effects model F test for a = MSa/MSab
A Mixed Effects model F test for b = MSa/MSab
A Mixed Effects model F test for ab = MSab/MSerror
-
8/8/2019 SPSSMixed
37/82
37
Data format
The data format for a GLM is that
of wide data.
D t F t f Mi d
-
8/8/2019 SPSSMixed
38/82
38
Data Format for Mixed
Models is Long
C i f Wid t
-
8/8/2019 SPSSMixed
39/82
39
Conversion of Wide to
Long Data Format Click on Data in the header bar
Then click on Restructure in the
pop-down menu
A restr ct re i ard
-
8/8/2019 SPSSMixed
40/82
40
A restructure wizard
appearsSelect restructure selected variables into cases
and click onNext
A Variables to Cases: Number of
-
8/8/2019 SPSSMixed
41/82
41
A Variables to Cases: Number of
Variable Groups dialog box appears.
We select one and click on next.
We select the repeated
-
8/8/2019 SPSSMixed
42/82
42
p
variables and move them
to the target variable box
After moving the repeated variables into the target variable
b h fi d i bl i h Fi d i bl
-
8/8/2019 SPSSMixed
43/82
43
box, we move the fixed variables into the Fixed variable
box, and select a variable for case idin this case,
subject.
Then we click on Next
A create index variables dialog box
-
8/8/2019 SPSSMixed
44/82
44
appears. We leave the number of index
variables to be created at one and click on
next at the bottom of the box
When the following box
-
8/8/2019 SPSSMixed
45/82
45
appears we just type in
time and select Next.
When the options dialog box appears we select the
-
8/8/2019 SPSSMixed
46/82
46
When the options dialog box appears, we select the
option for dropping variables not selected.
We then click on Finish.
We thus obtain our data
-
8/8/2019 SPSSMixed
47/82
47
We thus obtain our data
in long format
-
8/8/2019 SPSSMixed
48/82
48
The Mixed Model
The Mixed Model uses long data
format. It includes fixed and
random effects.It can be used to model merely fixed
or random effects, by zeroing out
the other parameter vector.
The F tests for the fixed, random, andmixed models differ.
Because the Mixed Model has the
parameter vector for both of these
and can estimate the error
covariance matrix for each, it can
provide the correct standard errors
for either thefixed or random effects.
-
8/8/2019 SPSSMixed
49/82
49
The Mixed Model
'
y X Z
where
fixed effects parameter estimates X fixed effects
Z Random effects parameter estimates
random effects
errors
Variance of y V ZGZ R
G and R require covariancestructure
fitting
F K I
F
KI
!
!!
!
!!
! !
Mixed Model Theory-
-
8/8/2019 SPSSMixed
50/82
50
Mixed Model Theory
contdLittle et al.(p.139) note that u and e areuncorrelated random variables with 0
means and covariances, G and R,
respectively.
' ,
( ' ) '
' ( )
Because the
covariance matrix
V ZGZ R
the solution for
X V X X V y
u GZ V y X
F
F
!
!
!
1 1
V-
is a generalized inverse. Because V is usuallysingular and noninvertible AVA = V- is an
augmented matrix that is invertible. It can later
be transformed back to V.
The G and R matrices must be positive definite.
In the Mixed procedure, the covariance type of
the random (generalized) effects defines thestructure of G and a repeated covariance type
defines structure of R.
Mixed Model
-
8/8/2019 SPSSMixed
51/82
51
Mixed Model
Assumptions
0u
EI
!
-
0
0
u GVariance
RI
!
- -
A linear relationship between dependent and
independent variables
Random Effects
-
8/8/2019 SPSSMixed
52/82
52
Random Effects
Covariance Structure This defines the structure of
the G matrix, the random
effects, in the mixed model.
Possible structures permitted
by current version of SPSS:
Scaled Identity Compound Symmetry
AR(1)
Huynh-Feldt
Structures of Repeated
-
8/8/2019 SPSSMixed
53/82
53
p
effects (R matrix)-contdVariance Components
W
W
WW
! -
2
1
2
2
23
2
4
0 0 0
0 0 0
0 0 0
0 0 0
Co pound Sy etry
W W W W W
W W W W W
W W W W W
W W W W W
! -
2 2 2 2 2
1 1 1 1
2 2 2 2 2
1 2 1 1
2 2 2 2 2
1 1 3 1
2 2 2 2 2
1 1 1 4
( )AR
V V V
V V V
V V V
V V V
! -
2 3
2
2
3 2
1
11
1
1
-
8/8/2019 SPSSMixed
54/82
-
8/8/2019 SPSSMixed
55/82
R matrix, defines the
correlation among
-
8/8/2019 SPSSMixed
56/82
56
correlation among
repeated random effects
.
.R
W W W W
W W W W
W W W W
W W W W
W W W W
W W W W
!
2
1 1 1
2
1 1 1
2
1 1 1
21 1 1
2
1 1 1
2
1 1 1
One can specify the nature of the correlation among the
repeated random effects.
GLM Mixed Model
-
8/8/2019 SPSSMixed
57/82
57
GLM Mixed Model
The General Linear Model is a special case of the
Mixed Model with Z = 0 (which means that
Zu disappears from the model) and
2R I
W!
Mixed Analysis of a Fixed
-
8/8/2019 SPSSMixed
58/82
58
Effects model
SPSS tests these fixed effects just as it does with the GLM
Procedure with type III sums of squares.
We analyze the breaking pressure of bonds made
from three metals. We assume that we do not
generalize beyond our sample and that our
effects are all fixed.
Tests of Fixed Effects is performed with the help of
the L matrix by constructing the following F test:
' '[ ( ' ) ']( )
L L X V X L LFrank L
F F
!1
Numerator df = rank(L)
Denominator df = RESID (n-rank(X)df = Satherth
Estimation:Newton
-
8/8/2019 SPSSMixed
59/82
59
Scoring
i i sH g
here
g gr adient atrix of stderivatives
H Hessian atrix of nd derivatives
s incre ent of step parameter
F F !
!
!
!
11
1
2
Estimation: Minimization
-
8/8/2019 SPSSMixed
60/82
60
of the objective functions
_ a
1
1
1
1 1
1 1
1( , ): log | | log ' (1 log(2 / ))
2 2 2
1( , ) : log | | log | ' |
2 2
log ' 1 log | 2 /( ) |2 2
( ' ) '
( ' ) '
( '
n nM L G R V r V r n
nREM L G R V X V X
n p n pr V r n p
here r y X X V X X V y
p rank of X
so that the probabilities ofX V X X V y and
GZ V
T
T
F
K
!
!
!
!
!
! 1( ' ) .y X are maximizedF
UsingNewton Scoring, the following functions are
minimized
Significance of
-
8/8/2019 SPSSMixed
61/82
61
Parameters
1 1
1 1 1
: 0
'
' '
' '
L is a linear combination
Ho
tLCL
where
X R X X R Z C
Z R X ZR Z G
F
K
F
K
F
K
!
-
- !
!
-
Test one covariance
structure against the other
-
8/8/2019 SPSSMixed
62/82
62
structure against the other
with the IC The rule of thumb is smaller is
better
-2LL
AIC Akaike
AICC Hurvich and Tsay
BIC Bayesian Info Criterion
Bozdogans CAIC
Measures of Lack of fit:
-
8/8/2019 SPSSMixed
63/82
63
The information Criteria-2LL is called the deviance. It is a
measure of sum of squared errors.
AIC = -2LL + 2p (p=# parms)
BIC = Schwartz Bayesian Info
criterion = 2LL + plog(n)
AICC= Hurvich and Tsays smallsample correction on AIC: -2LL +
2p(n/(n-p-1))
CAIC = -2LL + p(log(n) + 1)
Procedures for Fitting the
-
8/8/2019 SPSSMixed
64/82
64
Mixed Model One can use the LR test or the
lesser of the information criteria.
The smaller the informationcriterion, the better the model
happens to be.
We try to go from a larger to asmaller information criterion
when we fit the model.
LR test
-
8/8/2019 SPSSMixed
65/82
65
LR test
1. To test whether one model is
significantly better than the
other.2. To test random effect for
statistical significance
3. To test covariance structureimprovement
4. To test both.
5. Distributed as a6. With df= p2 p1 where pi =#
parms in model i
G2
Applying the LR test
-
8/8/2019 SPSSMixed
66/82
66
Applying the LR test
We obtain the -2LL from theunrestricted model.
We obtain the -2LL from therestricted model.
We subtract the latter from thelarger former.
That is a chi-square with df=the difference in the number ofparameters.
We can look this up anddetermine whether or not it isstatistically significant.
Advantages of the Mixed
-
8/8/2019 SPSSMixed
67/82
67
Model1. It can allow random effects to be
properly specified and computed,
unlike the GLM.2. It can allow correlation of errors,
unlike the GLM. It therefore has
more flexibility in modeling the error
covariance structure.3. It can allow the error terms to exhibit
nonconstant variability, unlike the
GLM, allowing more flexibility in
modeling the dependent variable.
4. It can handle missing data, whereas
the repeated measures GLM cannot.
Programming A Repeated
Measures ANOVA with
-
8/8/2019 SPSSMixed
68/82
68
PROC Mixed
Select the Mixed Linear Option in Analysis
Move subject ID into the
subjects box and the
-
8/8/2019 SPSSMixed
69/82
69
subjects box and the
repeated variable into the
repeated box.
Click on continue
We specify subjects and
repeated effects with the
-
8/8/2019 SPSSMixed
70/82
70
next dialog box
We set the repeated covariance type to Diagonal
& click on continue
Defining the Fixed
-
8/8/2019 SPSSMixed
71/82
71
EffectsWhen the next dialog box appears,
we insert the dependent Response
variable and the fixed effects ofanxiety and tension
Click on continue
We select the Fixed
-
8/8/2019 SPSSMixed
72/82
72
effects to be tested
Move them into the model box,
selecting main effects, and type III
-
8/8/2019 SPSSMixed
73/82
73
sum of squares
Click on continue
When the Linear Mixed
Models dialog box
-
8/8/2019 SPSSMixed
74/82
74
appears, select random
Under random effects, select
scaled identity as covariance
type and move subjects over into
-
8/8/2019 SPSSMixed
75/82
75
yp j
combinations
Click on continue
Select Statistics and check of the
following in the dialog box that
appears
-
8/8/2019 SPSSMixed
76/82
76
appears
Then click continue
When the Linear Mixed
Models box appears, click
-
8/8/2019 SPSSMixed
77/82
77
ok
You will get your tests
-
8/8/2019 SPSSMixed
78/82
78
Estimates of Fixed effects
and covariance
-
8/8/2019 SPSSMixed
79/82
79
parameters
R matrix
-
8/8/2019 SPSSMixed
80/82
80
Rerun the model with
different nested covariance
-
8/8/2019 SPSSMixed
81/82
81
structures and compare theinformation criteria
The lower the information criterion, the better fit
the nested model has. Caveat: If the models are
not nested, they cannot be compared with the
information criteria.
GLM vs. Mixed
-
8/8/2019 SPSSMixed
82/82
82
GLM has
means
lsmeans
sstype 1,2,3,4
estimates using OLS or WLSone has to program the correct F tests for randomeffects.
losses cases with missing values.
Mixed has
lsmeans
sstypes 1 and 3
estimates using maximum likelihood, general methodsof moments, or restricted maximum likelihood
ML
MIVQUE0
REML
gives correct std errors and confidence intervals for
random effectsAutomatically provides correct standard errors foranalysis.
Can handle missing values