Contrast Coding
Or: One of These Levels is Not Like the Others
Scott Fraundorf (and Tuan Lam)
MLM Reading Group – 03.10.11
Administrivia
● 3/10 (TODAY): Contrast coding overview
● 4/7: Simple vs main effects
● 4/21: Principal components analysis
● 1st week of May: Harald Baayen visit
Outline
● Why use contrast coding?
● Example contrasts
● Contrast estimates
● Contrasts in R
● Multiple comparisons
● How does it work?
● Other kinds of coding
● Interactions
Why Use Contrast Coding?
● Scott's example study:
● Examining recall memory for spoken discourse as a function of:
– Location of disfluencies (categorical variable)
– Prior story knowledge (continuous variable)
[Diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]
Why Use Contrast Coding?
● Regression equation: Predicts values
● Could use this to predict whether or not something will be remembered
● But in cognitive psych:
– Often interested in the effect of specific levels
– Test which ones differ significantly
Contrast Coding
● Example: Fluent vs. disfluencies in typical locations vs. in atypical locations
● Which ones differ significantly?
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
Contrast Coding
● Contrasts: Test differences between specific levels
– Same as a planned comparison in an ANOVA
– Also analogous to a post-hoc test
● Planned comparisons vs post-hoc tests
– If we are deciding tests post-hoc, greater chance of capitalizing on chance / spurious effect
– Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
– We are basically on the honor system here: no way to prove the comparison was planned ahead of time
Contrasts!
● Contrasts are like weighted sums of means
– In a multiple regression / MLM context, the estimates also depend on the other variables in the model
● Using your scale to test what's different
Contrast Coding
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
It looks like the Fluent stories might not be remembered as well.
Let's use a contrast to test this.
Contrasts
TYPICAL ATYPICAL FLUENT
Question 1: Do disfluencies affect recall?
Contrasts
Contrast weights are assigned:
TYPICAL = .33, ATYPICAL = .33, FLUENT = -.66
One side positive. One side negative.
This determines which levels are being compared (+ versus -).
Doesn't really matter which side you choose as the + side. It just affects the sign of the result, but not magnitude or statistical significance.
Contrasts
Contrast weights are assigned:
TYPICAL = .33, ATYPICAL = .33, FLUENT = -.66
One side positive. One side negative.
Codes add up to zero.
Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.)
abs(.33) + abs(-.66) = .99 ≈ 1 (exactly 1 with weights of 1/3 and -2/3)
Contrasts
Can conceptualize the comparison as:
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)
(holding other variables constant)
TYPICAL = .33, ATYPICAL = .33, FLUENT = -.66
One side positive. One side negative.
Codes add up to zero.
Does the contrast differ significantly from zero?
If so, the difference between levels is significant.
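A minimal sketch of the weighted-sum idea in R. The condition means below are made-up illustrative numbers, not the study's data, and the names `cond_means`, `c1`, and `contrast_value` are hypothetical. Exact thirds are used instead of .33 and -.66 so the codes sum to exactly zero.

```r
# Hypothetical condition means (illustrative only, NOT the study's data)
cond_means <- c(Typical = 0.70, Atypical = 0.65, Fluent = 0.55)

# Contrast 1 weights: both disfluent conditions versus Fluent
c1 <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

# The codes add up to zero (up to floating-point error)
sum(c1)

# The contrast as a weighted sum of the condition means;
# the significance test asks whether this quantity differs from zero
contrast_value <- sum(c1 * cond_means)
contrast_value
```

Flipping the sign of every weight flips the sign of `contrast_value` but leaves its magnitude unchanged.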
Contrasts
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)
[Diagram: TYPICAL (.33) and ATYPICAL (.33) versus FLUENT (-.66); the difference is marked significant (*)]
Contrast Coding
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories; the difference tested by Contrast 1 is marked *]
Our first contrast reveals that fluent stories are remembered worse.
Now let's look at Typical vs Atypical.
We always have j – 1 contrasts, where j = the # of levels of the factor.
So, here 2 contrasts are needed to fully describe the factor.
Contrasts
TYPICAL ATYPICAL
Question 2: Does location of disfluencies matter?
Contrasts
Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)
TYPICAL = .50, ATYPICAL = -.50, FLUENT = 0 (zeroed out here!)
One side positive. One side negative.
Codes add up to zero.
Sum of absolute values of codes is 1.
Contrast Coding
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories; one comparison is marked * and the other n.s.]
One Important Point!
● Choice of contrasts doesn't affect total variance accounted for by variable
● Only about differences between levels
● Can divide this up in multiple different ways and still account for same total variance
[Diagram label: LOCATION IN STORY]
Why -.5 and .5?
● Why [-.5 .5] instead of [-1 1]?
● Doesn't affect significance test
● Does affect β weight (estimate)
– Std error is also scaled accordingly
[Model output shown for FILLER LOCATION coded as [-1 1] and as [-.5 .5]]
Contrast Estimates
[Diagram: ATYPICAL LOCATION and TYPICAL LOCATION with contrast codes .5 and -.5; the codes are 1 unit apart]
Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant
In this case, a 1-unit change in contrast IS the difference between the levels' codes
Thus, the contrast correctly represents .04825 as the difference between the conditions
Contrast Estimates
[Diagram: ATYPICAL LOCATION and TYPICAL LOCATION with contrast codes 1 and -1; the codes are 2 units apart]
Here, the total difference between the levels' codes is 2
So, a 1-unit change in the contrast is only HALF the difference between the levels' codes
Thus, the estimate of the contrast is .024 … only half the difference between the conditions
Contrast Estimates
Beta weight (estimate) represents the effect of a 1-unit change in the contrast
[Diagram: codes .5 and -.5, 1 unit apart] 1 unit change in contrast IS the difference between levels (.04825 in this case)
[Diagram: codes 1 and -1, 2 units apart] 1 unit change in contrast IS only half the difference between levels
So Why -.5 and .5?
● The estimate tells you directly about the difference in means!
– The actual difference between conditions is .048
– It would be perfectly correct to describe .024 as half the difference between levels and you could even put a CI around it … it's just less intuitive for your readers
● Both contrasts would account for the same amount of variance
● This is just another case of deciding the scale of a variable
– Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
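The scaling point can be checked with a small simulation. This is only a sketch on made-up data using plain `lm` rather than `lmer`; the variable names (`half_code`, `unit_code`, and so on) are hypothetical.

```r
set.seed(1)

# Simulated, balanced data (hypothetical): 50 observations per location
location <- rep(c("Typical", "Atypical"), each = 50)
recall <- ifelse(location == "Typical", 0.70, 0.65) + rnorm(100, sd = 0.1)

half_code <- ifelse(location == "Typical", 0.5, -0.5)  # codes 1 unit apart
unit_code <- ifelse(location == "Typical", 1, -1)      # codes 2 units apart

fit_half <- lm(recall ~ half_code)
fit_unit <- lm(recall ~ unit_code)

# Estimate under [-.5 .5]: the full difference between the condition means
b_half <- unname(coef(fit_half)["half_code"])
# Estimate under [-1 1]: half of that difference
b_unit <- unname(coef(fit_unit)["unit_code"])

# The t statistics (and therefore the p values) are identical
t_half <- summary(fit_half)$coefficients["half_code", "t value"]
t_unit <- summary(fit_unit)$coefficients["unit_code", "t value"]

c(b_half = b_half, b_unit = b_unit, t_half = t_half, t_unit = t_unit)
```

Only the scale of the estimate and its standard error changes between the two codings; the fitted values are the same.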
Imbalanced Designs
● You may have an unequal number of observations per cell
– e.g. some data lost, or responses not codable
● Correct for this in your contrast codes if you want things centered
– Ask Tuan or Scott about how to do this :)
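One common way to center codes in an unbalanced design is to subtract the mean of the coded variable, so that the codes average to zero over the observed data. This is a sketch of that general idea, not necessarily the approach the presenters have in mind; the data and names are hypothetical.

```r
# Hypothetical unbalanced data: 60 Typical observations, 40 Atypical
location <- factor(rep(c("Typical", "Atypical"), times = c(60, 40)))

raw_code <- ifelse(location == "Typical", 0.5, -0.5)
mean(raw_code)         # not zero, because the cells are unequal

# Mean-center the code over the observations actually present
centered_code <- raw_code - mean(raw_code)
mean(centered_code)    # zero, up to floating-point error

# The two levels are still exactly 1 unit apart, so the estimate
# keeps its interpretation as the difference between conditions
unique(centered_code)
```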
Contrasts in R
● To check what the current contrasts are:
– contrasts(YourDataFrame$VariableName)
● To set the contrasts:
– contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0))
● Each c(xx,yy,zz) is the weights for one of the contrasts you want to run
● e.g. c(.33, .33, -.66) is one contrast
● After setting contrasts, run the lmer model to get the results of the contrasts
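A small runnable sketch of the steps above on a made-up data frame. The data frame `d`, the column `Location`, and the contrast names `Disfluency` and `TypVsAtyp` are hypothetical. One thing worth watching: the weights are matched to `levels()` in order, and R sorts levels alphabetically unless told otherwise, so the sketch sets the level order explicitly.

```r
# Hypothetical data frame; only the factor matters for this illustration
d <- data.frame(
  Location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 4),
                    levels = c("Typical", "Atypical", "Fluent"))
)

# The weights below are assigned to levels in this order
levels(d$Location)

# One column per contrast, one row per level
contrasts(d$Location) <- cbind(
  Disfluency = c(.33, .33, -.66),  # Typical and Atypical versus Fluent
  TypVsAtyp  = c(.50, -.50, 0)     # Typical versus Atypical
)

# Check what was stored
contrasts(d$Location)
```

Without the explicit `levels = ...` argument, the default order would be Atypical, Fluent, Typical, and the same weight vectors would test different comparisons.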
Contrasts in R
● Should have j – 1 contrasts, where j = # of levels of the factor
● If using a subset of data, some levels of the factor may no longer be present
– e.g. you dropped a condition
– But, R still “remembers” that these levels exist and will get mad you didn't specify enough contrasts
– Fix this by reconverting to a factor:
– YourDataFrame$Variable = factor(YourDataFrame$Variable)
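A quick illustration of the "remembered" levels, using a made-up data frame (the names `d` and `sub` are hypothetical):

```r
# Hypothetical data with three conditions
d <- data.frame(Location = factor(c("Typical", "Atypical", "Fluent", "Typical")))

# Drop the Fluent condition
sub <- d[d$Location != "Fluent", , drop = FALSE]

# The unused level is still attached to the factor
n_before <- nlevels(sub$Location)

# Reconverting to a factor discards levels that no longer occur
# (droplevels(sub$Location) would do the same thing here)
sub$Location <- factor(sub$Location)
n_after <- nlevels(sub$Location)

c(before = n_before, after = n_after)
```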
Another R Tip
● To see the mean of each level of an I.V.:
– tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
– Could also do median, sd, etc.
● For a 2-way (or more!) table
– tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)
● Doesn't work if you have missing values
– But Tuan has made a version of tapply that fixes this problem
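As an alternative to a custom function (this is base R behavior, not Tuan's version, which is not shown in these slides), `tapply` passes extra arguments on to the summary function, so `na.rm = TRUE` can be handed to `mean`. The data below are made up.

```r
# Hypothetical data with one missing response
d <- data.frame(
  Recall   = c(0.7, NA, 0.6, 0.5),
  Location = c("Typical", "Typical", "Atypical", "Atypical")
)

# A missing value makes the mean of its cell NA
raw_means <- tapply(d$Recall, d$Location, mean)

# Extra arguments after the function are passed through to mean()
cell_means <- tapply(d$Recall, d$Location, mean, na.rm = TRUE)

raw_means
cell_means
```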
Multiple Comparisons (Here Comes Trouble!)
Multiple Comparisons
● Lots of comparisons you can run
● Suppose we tested both young & older adults on the disfluency task:
FLUENT / YOUNGER      FLUENT / OLDER
TYPICAL / YOUNGER     TYPICAL / OLDER
ATYPICAL / YOUNGER    ATYPICAL / OLDER
Multiple Comparisons
● Some comparisons are (wholly or partially) redundant
● Suppose we find typical > fluent, but typical and atypical don't reliably differ
– Should expect atypical > fluent (to at least some degree)
● Or, we find a main effect of age
– Would expect to find an effect of age within at least some conditions if we looked at them individually
Multiple Comparisons
● Some comparisons are (wholly or partially) redundant
● j – 1 contrasts actually describe everything
– j = # of levels
[Diagram: FLUENT versus the MEAN OF Typical and Atypical, difference = .35730; TYPICAL versus ATYPICAL, difference = .04825]
Can calculate all differences between levels based on this!
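As a worked illustration of recovering the remaining pairwise differences from the two contrast estimates: the two values come from the slide, but the direction of each difference (which level is higher) is an assumption made here for the arithmetic, and the variable names are hypothetical.

```r
# Values from the slide; the sign conventions are ASSUMED for illustration
disfl_vs_fluent <- 0.35730  # mean(Typical, Atypical) - Fluent
typ_vs_atyp     <- 0.04825  # Typical - Atypical

# Typical sits half the Typical-Atypical gap above the disfluent mean,
# and Atypical sits half the gap below it
typ_minus_fluent  <- disfl_vs_fluent + typ_vs_atyp / 2
atyp_minus_fluent <- disfl_vs_fluent - typ_vs_atyp / 2

c(Typical_vs_Fluent = typ_minus_fluent, Atypical_vs_Fluent = atyp_minus_fluent)
```

The two derived differences themselves differ by exactly the Typical versus Atypical estimate, which is why testing them separately adds no new information.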
Multiple Comparisons
● Want to avoid multiple comparisons
● Error rate increases if you run overlapping, redundant tests
● Suppose we have the wrong value for one of the means (due to sampling error, etc.)
● In a single test, we set alpha so there is a 5% chance of incorrectly rejecting H0 (α = .05)
Multiple Comparisons
● But now we run a 2nd test comparing that same “bad” condition to another condition
● Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
● Not an independent 5% chance of error
● Multiple tests compound Type I error rate
Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
– Matrix of contrasts is made up of orthogonal vectors
– Can think of this as the contrasts being uncorrelated with each other
Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .33      x     .50      =    .165
ATYPICAL       .33      x    -.50      =   -.165
FLUENT        -.66      x      0       =     0
                                   SUM =     0
Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .50      x     .50      =    .25
ATYPICAL      -.50      x      0       =     0
FLUENT          0       x    -.50      =     0
                                   SUM =    .25 (not 0, so these two contrasts are not orthogonal)
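The same check can be done in R by summing the element-wise products of two weight vectors. A minimal sketch, with hypothetical variable names:

```r
# Orthogonal pair from the first table
c1 <- c(.33, .33, -.66)   # Typical and Atypical versus Fluent
c2 <- c(.50, -.50, 0)     # Typical versus Atypical
orth_sum <- sum(c1 * c2)

# Non-orthogonal pair from the second table
c3 <- c(.50, -.50, 0)     # Typical versus Atypical
c4 <- c(.50, 0, -.50)     # Typical versus Fluent
nonorth_sum <- sum(c3 * c4)

c(orthogonal = orth_sum, non_orthogonal = nonorth_sum)

# For a whole contrast matrix, crossprod() gives every pairwise sum at once;
# the off-diagonal entries should be 0 for an orthogonal set
crossprod(cbind(c1, c2))
```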
Corrections
● “But, Scott, I really want to do more than j – 1 comparisons”
● Can apply corrections to control Type I error
● Bonferroni: Multiply p value by # of comparisons
– Worst case scenario
● Less conservative corrections may be available
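In R, corrections of this kind are available through `p.adjust` in the base `stats` package. The p values below are made up for illustration, and Holm's method is shown only as one example of a less conservative option.

```r
# Hypothetical uncorrected p values from three comparisons
p_raw <- c(0.010, 0.020, 0.040)

# Bonferroni: each p value multiplied by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: a step-down procedure that is never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm)
```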
How Does it Work?
[Diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]
Behind the scenes...
How Does it Work?
[Diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$
● Each categorical factor gets coded as j - 1 variables
– j = number of levels in that factor
– Number of contrasts you have
How Does it Work?
● Each coded variable represents one of your contrasts
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$
CONTRAST 1:
$X_2$ = .33 if typical location for disfluencies
$X_2$ = .33 if atypical
$X_2$ = -.66 if fluent
Value of contrast: $\beta_2$
● Sig. difference between levels if $\beta$ differs from 0
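To see the coded variables R builds behind the scenes, `model.matrix` can be applied to a factor with contrasts set. A minimal sketch with one observation per level; the factor name `loc` is hypothetical.

```r
# One observation per level, with the level order set explicitly
loc <- factor(c("Typical", "Atypical", "Fluent"),
              levels = c("Typical", "Atypical", "Fluent"))

# Two contrasts for a three-level factor (j - 1 = 2)
contrasts(loc) <- cbind(c(.33, .33, -.66), c(.50, -.50, 0))

# The design matrix: an intercept column plus one numeric column per contrast
X <- model.matrix(~ loc)
X
```

Each row of `X` holds the values that the contrast variables take for an observation at that level, which are the numbers that get multiplied by the corresponding β in the regression equation.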
Other Kinds of Coding
● Dummy/Treatment Coding
– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(
            X2    X3
Typical      1     0
Atypical     0     1
Fluent       0     0
● Sum/Effects Coding
– Test whether each level differs from overall mean or from chance
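Base R ships functions that generate both kinds of coding matrix, which makes it easy to inspect them. In this sketch, `base = 3` is chosen only so that the third level plays the role of the baseline, as Fluent does in the table above.

```r
# The coding R applies by default to unordered and ordered factors
getOption("contrasts")

# Dummy/treatment coding for 3 levels, with level 3 as the baseline (all zeros)
treat <- contr.treatment(3, base = 3)
treat

# Sum/effects coding for 3 levels: each column sums to zero
eff <- contr.sum(3)
eff
```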
Contrasts & Interactions
● Contrasts also apply in cases where we have interactions between variables
● Interaction term represents whether the value of the contrast depends on another variable
● We'll see some examples on the next slides
Interaction Example
● Suppose we also sampled different age groups in the disfluency experiment
– 3 x 2 design
● What are possible patterns of results?
[Design grid: Group (YOUNG ADULTS, OLDER ADULTS) x Story Type (FLUENT, TYPICAL, ATYPICAL)]
YOUNG ADULTS: Fluent, young | Typical disfluencies, young | Atypical disfluencies, young
OLDER ADULTS: Fluent, older | Typical disfluencies, older | Atypical disfluencies, older
Possible Result 1
● Contrast 1 significant
– Effect of disfluencies
● Contrast 2 non-sig.
– Location irrelevant
● No effect of age at all in this case
– Everything the same for both age groups
[Figure: two bar-chart panels, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]
SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    no
AGE           no
C1 x AGE      no
C2 x AGE      no
Possible Result 2
● Contrast 2 is now significant
– Typical > atypical
● Still no effect of AGE
[Figure: two bar-chart panels, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]
SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           no
C1 x AGE      no
C2 x AGE      no
Possible Result 3
● Now, AGE effect
– Older adults remember more across the board
● But, no interaction
– Disfluency effect is the same for both age groups
[Figure: two bar-chart panels, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]
SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      no
Possible Result 4
● Contrast 1 interacts with AGE
– Presence of disfluencies differs across age
– Effect only for young adults
● Contrast 2 (location) still same in all cases
[Figure: two bar-chart panels, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]
SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      no
Possible Result 5
● Now, Contrast 2 also interacts with AGE
– Reversal of Typical vs Atypical effect across age
[Figure: two bar-chart panels, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]
SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      yes
Possible Result 6
● Contrast 2 interaction but not Contrast 1
– Typical vs Atypical comparison does depend on age
– Overall effect of having fillers does not
[Figure: two bar-chart panels, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]
SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      yes
Interactions in R
● Implementing interactions in an R model formula (lmer or otherwise):
– A + B: Main effects of A and B, no interaction
– A * B: All possible interactions and main effects of A and B
– A : B: Interaction of A and B, no main effect (unless you add it separately)
● In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
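One way to confirm what a formula expands to, without fitting anything, is to ask `terms` for its term labels. In this sketch `y`, `A`, and `B` are placeholder names, not variables from the study.

```r
# Main effects only
labels_add <- attr(terms(y ~ A + B), "term.labels")

# A * B is shorthand for A + B + A:B
labels_star <- attr(terms(y ~ A * B), "term.labels")

# The interaction term alone, with no main effects
labels_colon <- attr(terms(y ~ A:B), "term.labels")

list(add = labels_add, star = labels_star, colon = labels_colon)
```

With many predictors, listing the specific `:` terms you care about keeps the model to the interactions you actually want to test.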