contrast coding - learning research and development centercontrast coding contrasts: test...

Contrast Coding

Or: One of These Levels is Not Like the Others

Scott Fraundorf (and Tuan Lam)

MLM Reading Group – 03.10.11

Administrivia

● 3/10 (TODAY): Contrast coding overview
● 4/7: Simple vs main effects
● 4/21: Principal components analysis
● 1st week of May: Harald Baayen visit

Outline

● Why use contrast coding?
● Example contrasts
● Contrast estimates
● Contrasts in R
● Multiple comparisons
● How does it work?
● Other kinds of coding
● Interactions

Why Use Contrast Coding?

● Scott's example study:

● Examining recall memory for spoken discourse as a function of:

● Location of disfluencies (categorical variable)
● Prior story knowledge (continuous variable)

[Diagram of the model: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

Why Use Contrast Coding?

● Regression equation: Predicts values
● Could use this to predict whether or not something will be remembered
● But in cognitive psych:
● Often interested in the effect of specific levels
● Test which ones differ significantly




Contrast Coding

● Example: Fluent vs. disfluencies in typical locations vs. in atypical locations

● Which ones differ significantly?

[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions]

Contrast Coding

● Contrasts: Test differences between specific levels
– Same as a planned comparison in an ANOVA
– Also analogous to a post-hoc test
● Planned comparisons vs post-hoc tests
– If we are deciding tests post-hoc, there is a greater chance of capitalizing on chance / a spurious effect
– Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
– We are basically on the honor system here: there is no way to prove the comparison was planned ahead of time

Contrasts!

● Contrasts are like weighted sums of means
– In a multiple regression / MLM context, also subject to other variables in the model

● Using your scale to test what's different

Contrast Coding

[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for the Typical, Atypical, and Fluent conditions]

It looks like the Fluent stories might not be remembered as well.

Let's use a contrast to test this.

Contrasts

TYPICAL ATYPICAL FLUENT

Question 1: Do disfluencies affect recall?

Contrasts

Contrast weights are assigned:

TYPICAL   ATYPICAL   FLUENT
.33       .33        -.66

One side positive. One side negative. This determines which levels are being compared (+ versus -).

It doesn't really matter which side you choose as the + side. It just affects the sign of the result, but not its magnitude or statistical significance.

Contrasts

Contrast weights are assigned:

TYPICAL   ATYPICAL   FLUENT
.33       .33        -.66

Codes add up to zero.

Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.)

abs(.33) + abs(-.66) ≈ 1 (exactly 1 with unrounded weights of 1/3 and -2/3)


Contrasts

Can conceptualize the comparison as:
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)
(holding other variables constant)

Does the contrast differ significantly from zero?

If so, the difference between levels is significant.
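To make the "weighted sum of means" idea concrete, here is a minimal sketch in R. The cell means are hypothetical (they are not the study's data), and the weights use exact thirds rather than the rounded .33 / .66. It only computes the conceptual weighted sum; it does not fit a model or run the significance test.

```r
# Hypothetical cell means (proportion of story recalled); NOT the study's data.
cell_means = c(Typical = 0.70, Atypical = 0.66, Fluent = 0.50)

# Contrast 1 weights, written as exact thirds instead of the rounded .33 / .66.
contrast1 = c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

# The codes add up to zero (up to floating-point rounding).
sum(contrast1)

# Absolute values of the + code and the - code sum to 1.
abs(contrast1[["Typical"]]) + abs(contrast1[["Fluent"]])

# The contrast, viewed as a weighted sum of the cell means.
contrast_value = sum(contrast1 * cell_means)
contrast_value
```

If the disfluent and fluent conditions had the same average recall, this weighted sum would be zero, which is why the test asks whether the contrast differs from zero.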


Contrasts

Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)

TYPICAL   ATYPICAL   FLUENT
.33       .33        -.66

* (Contrast 1 differs significantly from zero)


Contrast Coding

[Figure: bar chart of % of story recalled (y-axis, 0.45 to 0.8) by condition; * marks the significant first contrast]

Our first contrast reveals that fluent stories are remembered worse.

Now let's look at Typical vs Atypical.

We always have j – 1 contrasts, where j = the # of levels of the factor. So here, 2 contrasts are needed to fully describe the factor.

Contrasts

TYPICAL ATYPICAL

Question 2: Does location of disfluencies matter?

Contrasts

Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)

TYPICAL   ATYPICAL   FLUENT
.50       -.50       0 (zeroed out here!)

Sum of absolute values of codes is 1.


Contrast Coding

[Figure: bar chart of % of story recalled (y-axis, 0.45 to 0.8) by condition; one comparison is marked * and the other n.s.]

One Important Point!

● Choice of contrasts doesn't affect total variance accounted for by variable
● Only about differences between levels
● Can divide this up in multiple different ways and still account for same total variance

LOCATION IN STORY
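A quick way to convince yourself of this point is to fit the same model under two different sets of contrasts and compare the variance accounted for. The sketch below uses simulated data and an ordinary lm (not the lmer model from the talk); the data frame, column names, and the second coding are made up for illustration.

```r
set.seed(1)

# Simulated data, purely illustrative: 3 conditions, 30 observations each.
cell_means = c(Typical = 0.70, Atypical = 0.66, Fluent = 0.50)
d = data.frame(
  location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 30),
                    levels = c("Typical", "Atypical", "Fluent"))
)
d$recall = unname(cell_means[as.character(d$location)]) + rnorm(nrow(d), sd = 0.1)

# Coding A: disfluent vs fluent, then typical vs atypical.
dA = d
contrasts(dA$location) = cbind(c(1/3, 1/3, -2/3), c(.5, -.5, 0))

# Coding B: a different (hypothetical) way of dividing up the same factor.
dB = d
contrasts(dB$location) = cbind(c(.5, 0, -.5), c(0, .5, -.5))

# R-squared for the factor is the same under both codings.
r2A = summary(lm(recall ~ location, data = dA))$r.squared
r2B = summary(lm(recall ~ location, data = dB))$r.squared
all.equal(r2A, r2B)
```

The individual coefficients differ between the two fits because they answer different questions, but the overall fit does not.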



Why -.5 and .5?

● Why [-.5 .5] instead of [-1 1]?
● Doesn't affect significance test
● Does affect β weight (estimate)
– Std error is also scaled accordingly

[Model output shown for FILLER LOCATION coded [-1 1] and for FILLER LOCATION coded [-.5 .5]]

Contrast Estimates

[Diagram: ATYPICAL LOCATION and TYPICAL LOCATION placed on the CONTRAST CODE axis at .5 and -.5, a distance of 1 apart]

Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant

In this case, a 1-unit change in contrast IS the difference between the levels' codes

Thus, the contrast correctly represents .04825 as the difference between the conditions

Contrast Estimates

[Diagram: ATYPICAL LOCATION and TYPICAL LOCATION placed on the CONTRAST CODE axis at 1 and -1, a distance of 2 apart]

Here, the total difference between the levels' codes is 2

So, a 1-unit change in the contrast is only HALF the difference between the levels' codes

Thus, the estimate of the contrast is .024 … only half the difference between the conditions

Contrast Estimates

Beta weight (estimate) represents the effect of a 1-unit change in the contrast

CONTRAST CODES   DISTANCE BETWEEN CODES   A 1-UNIT CHANGE IN THE CONTRAST...
[.5 -.5]         1                        IS the difference between levels (.04825 in this case)
[1 -1]           2                        is only half the difference between levels

So Why -.5 and .5?

● Better tells you about the difference in means!
– The actual difference between conditions is .048
– It would be perfectly correct to describe .024 as half the difference between levels, and you could even put a CI around it … it's just less intuitive for your readers

[Model output shown for FILLER LOCATION coded [-1 1] and for FILLER LOCATION coded [-.5 .5]]

● Both contrasts would account for the same amount of variance

● This is just another case of deciding the scale of a variable

– Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
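The scaling point can be checked directly. This is a minimal sketch with simulated two-condition data and an ordinary lm (an assumption; the talk's model is an lmer fit); all names here are illustrative.

```r
set.seed(2)

# Simulated two-condition data, purely illustrative.
d = data.frame(
  location = factor(rep(c("Typical", "Atypical"), each = 40),
                    levels = c("Typical", "Atypical"))
)
d$recall = ifelse(d$location == "Typical", 0.70, 0.65) + rnorm(nrow(d), sd = 0.1)

# Same factor, two scalings of the same contrast.
half = d
contrasts(half$location) = matrix(c(.5, -.5), ncol = 1)
full = d
contrasts(full$location) = matrix(c(1, -1), ncol = 1)

coef_half = coef(summary(lm(recall ~ location, data = half)))
coef_full = coef(summary(lm(recall ~ location, data = full)))

# Row 2 is the contrast; columns are Estimate, Std. Error, t value, Pr(>|t|).
coef_half[2, ]
coef_full[2, ]

# The raw difference in condition means, for comparison with the estimates.
cond_means = tapply(d$recall, d$location, mean)
mean_diff = cond_means[["Typical"]] - cond_means[["Atypical"]]
```

With [-.5 .5] the estimate equals the difference in means; with [-1 1] the estimate and its standard error are both halved, so the t value and p value are unchanged.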

Imbalanced Designs

● You may have an unequal number of observations per cell
– e.g. some data lost, or responses not codable
● Correct for this in your contrast codes if you want things centered
– Ask Tuan or Scott about how to do this :)



Contrasts in R

● To check what the current contrasts are:
– contrasts(YourDataFrame$VariableName)
● To set the contrasts:
– contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0))
● Each c(xx,yy,zz) is the weights for one of the contrasts you want to run
● e.g. c(.33, .33, -.66) is one contrast
● After setting contrasts, run the lmer model to get the results of the contrasts
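Here is a small end-to-end sketch of checking and setting contrasts. The data frame, column name, and contrast labels are hypothetical. One thing worth watching: R orders factor levels alphabetically by default, so the rows of the contrast matrix must be matched to the order given by levels(), not to the order you have in your head.

```r
# Hypothetical data frame; the column and level names are illustrative.
d = data.frame(
  Location = factor(c("Typical", "Atypical", "Fluent",
                      "Typical", "Atypical", "Fluent"))
)

# Default level order is alphabetical: Atypical, Fluent, Typical.
default_order = levels(d$Location)

# Reorder so the contrast rows line up with Typical, Atypical, Fluent.
d$Location = factor(d$Location, levels = c("Typical", "Atypical", "Fluent"))

# One column per contrast; column names become part of the coefficient names.
contrast_matrix = cbind(DisfluentVsFluent = c(.33, .33, -.66),
                        TypicalVsAtypical = c(.50, -.50, 0))
contrasts(d$Location) = contrast_matrix

# Check what was set.
contrasts(d$Location)

# The model would then be fit as usual, e.g. (hypothetical formula, needs lme4):
# lmer(Recall ~ Location + PriorKnowledge + (1 | Subject) + (1 | Item), data = d)
```

Naming the columns is optional, but it makes the model summary easier to read than the default numbered labels.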

Contrasts in R

● Should have j – 1 contrasts, where j = # of levels of the factor
● If using a subset of data, some levels of the factor may no longer be present
– e.g. you dropped a condition
– But R still “remembers” that these levels exist and will get mad that you didn't specify enough contrasts
– Fix this by reconverting to a factor:
● YourDataFrame$Variable = factor(YourDataFrame$Variable)
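The dropped-level problem is easy to reproduce. A minimal sketch with a made-up data frame:

```r
# Hypothetical data frame with three conditions.
d = data.frame(
  Location = factor(c("Typical", "Atypical", "Fluent",
                      "Typical", "Atypical", "Fluent"),
                    levels = c("Typical", "Atypical", "Fluent"))
)

# Drop the Fluent condition; the factor still remembers all 3 levels.
sub = d[d$Location != "Fluent", , drop = FALSE]
levels_before = nlevels(sub$Location)

# Re-create the factor so that only the levels actually present remain.
sub$Location = factor(sub$Location)
levels_after = nlevels(sub$Location)

# With 2 levels, j - 1 = 1 contrast is needed.
contrasts(sub$Location) = matrix(c(.5, -.5), ncol = 1)
contrasts(sub$Location)
```

Base R's droplevels() does the same job and can be applied to a whole data frame at once.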

Another R Tip

● To see the mean of each level of an I.V.:
– tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
– Could also do median, sd, etc.
● For a 2-way (or more!) table:
– tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)
● Doesn't work if you have missing values
– But Tuan has made a version of tapply that fixes this problem
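This is not Tuan's version, but one common way to handle missing values is to pass na.rm = TRUE through tapply, which forwards extra arguments to the summary function. A sketch with made-up data:

```r
# Hypothetical data with one missing recall score.
d = data.frame(
  Recall   = c(0.70, 0.60, NA, 0.50, 0.80, 0.40),
  Location = c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent"),
  Age      = c("Young", "Older", "Young", "Older", "Young", "Older")
)

# Without na.rm, any cell that contains an NA gets an NA mean.
means_with_na = tapply(d$Recall, d$Location, mean)

# Extra arguments are passed on to mean(), so na.rm = TRUE drops the NA first.
means_by_location = tapply(d$Recall, d$Location, mean, na.rm = TRUE)

# Two-way table of means (Location by Age).
means_2way = tapply(d$Recall, list(d$Location, d$Age), mean, na.rm = TRUE)
means_2way
```

A cell whose only observation is missing still comes out as NaN, since there is nothing left to average.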



Multiple Comparisons

(Here Comes Trouble!)

Multiple Comparisons

● Lots of comparisons you can run
● Suppose we tested both young & older adults on the disfluency task:

FLUENT / YOUNGER      FLUENT / OLDER
TYPICAL / YOUNGER     TYPICAL / OLDER
ATYPICAL / YOUNGER    ATYPICAL / OLDER


● Some comparisons are (wholly or partially) redundant
● Suppose we find typical > fluent, but typical and atypical don't reliably differ
● Should expect atypical > fluent (to at least some degree)
● Or, we find a main effect of age
● Would expect to find an effect of age within at least some conditions if we looked at them individually


● j – 1 contrasts actually describe everything
● j = # of levels

[Diagram: FLUENT vs. the MEAN OF Typical and Atypical differ by .35730; TYPICAL vs. ATYPICAL differ by .04825]

Can calculate all differences between levels based on this!
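As a worked example of recovering the remaining pairwise differences from the two contrast values, here is the arithmetic in R. The two numbers come from the slide; the sign conventions (disfluent minus fluent, typical minus atypical) are assumptions made for illustration.

```r
# Values from the slide; the direction of each difference is an assumption.
disfluent_minus_fluent = .35730   # mean(Typical, Atypical) - Fluent
typical_minus_atypical = .04825   # Typical - Atypical

# Typical sits half of the Typical-Atypical gap above the disfluent mean,
# and Atypical sits half of that gap below it.
typical_minus_fluent  = disfluent_minus_fluent + typical_minus_atypical / 2
atypical_minus_fluent = disfluent_minus_fluent - typical_minus_atypical / 2

typical_minus_fluent
atypical_minus_fluent
```

No third contrast is needed: both remaining differences follow from the first two.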

Multiple Comparisons

● Want to avoid multiple comparisons
● Error rate increases if you run overlapping, redundant tests
● Suppose we have the wrong value for one of the means (due to sampling error, etc.)
● In a single test, we set alpha so there is a 5% chance of incorrectly rejecting H0 (α = .05)

Multiple Comparisons

● But now we run a 2nd test comparing that same “bad” condition to another condition
● Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
● Not an independent 5% chance of error
● Multiple tests compound Type I error rate

Orthogonality

● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
– Matrix of contrasts is made up of orthogonal vectors
– Can think of this as the contrasts being uncorrelated with each other



            CONTRAST 1       CONTRAST 2       PRODUCT
TYPICAL     .33         x    .50         =    .165
ATYPICAL    .33         x    -.50        =    -.165
FLUENT      -.66        x    0           =    0
                                         SUM = 0



            CONTRAST 1       CONTRAST 2       PRODUCT
TYPICAL     .50         x    .50         =    .25
ATYPICAL    -.50        x    0           =    0
FLUENT      0           x    -.50        =    0
                                         SUM = .25
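The orthogonality check is just a sum of products, so it is easy to run before fitting anything. A minimal sketch, with the three contrasts named c1, c2, and c3 purely for illustration:

```r
c1 = c(Typical = .33, Atypical = .33,  Fluent = -.66)  # disfluent vs fluent
c2 = c(Typical = .50, Atypical = -.50, Fluent = 0)     # typical vs atypical
c3 = c(Typical = .50, Atypical = 0,    Fluent = -.50)  # typical vs fluent

# Orthogonal pair: products of weights sum to 0.
sum_c1_c2 = sum(c1 * c2)

# Non-orthogonal pair: products of weights sum to .25.
sum_c2_c3 = sum(c2 * c3)

# For a whole set at once, the off-diagonal entries of t(M) %*% M
# are the pairwise sums of products and should all be 0.
M = cbind(c1, c2)
crossprod(M)
```

The pair c2 and c3 overlaps because both involve the Typical condition, which is the correlated-tests situation described earlier.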

Corrections

● “But, Scott, I really want to do more than j – 1 comparisons”

● Can apply corrections to control Type I error

● Bonferroni: Multiply p value by # of comparisons

– Worst case scenario

● Less conservative corrections may be available
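Base R's p.adjust covers both the Bonferroni correction and some less conservative alternatives. The p values below are hypothetical; Holm is shown as one example of a less conservative method, not as the one the talk had in mind.

```r
# Hypothetical p values from three comparisons.
p = c(0.010, 0.030, 0.040)

# Bonferroni: multiply each p value by the number of comparisons (capped at 1).
p_bonf = p.adjust(p, method = "bonferroni")

# Holm: a step-down procedure that is never more conservative than Bonferroni.
p_holm = p.adjust(p, method = "holm")

rbind(raw = p, bonferroni = p_bonf, holm = p_holm)
```

The adjusted values are then compared against the usual alpha level.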



How Does it Work?

[Diagram of the model, with terms PRIOR KNOWLEDGE, SUBJECT, and ITEM]

Behind the scenes...

How Does it Work?

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

● Each categorical factor gets coded as j - 1 variables
● j = number of levels in that factor
● Number of contrasts you have

[Diagram of the model, with terms PRIOR KNOWLEDGE, SUBJECT, and ITEM]

How Does it Work?

● Each coded variable represents one of your contrasts

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

CONTRAST 1:

$X_2 = \begin{cases} .33 & \text{if typical location for disfluencies} \\ .33 & \text{if atypical} \\ -.66 & \text{if fluent} \end{cases}$

Value of contrast: $\beta_2$

● Sig. difference between levels if $\beta_2$ differs from 0
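You can see the coded variables R builds behind the scenes with model.matrix. The sketch below uses a made-up three-row data frame, one row per condition; the column labels C1 and C2 are illustrative.

```r
# One row per condition, purely for illustration.
d = data.frame(
  Location = factor(c("Typical", "Atypical", "Fluent"),
                    levels = c("Typical", "Atypical", "Fluent"))
)
contrasts(d$Location) = cbind(C1 = c(.33, .33, -.66),
                              C2 = c(.50, -.50, 0))

# Design matrix: an intercept column plus j - 1 = 2 coded variables.
X = model.matrix(~ Location, data = d)
X
```

Each row of X holds the values that get multiplied by the β weights for an observation in that condition.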



Other Kinds of Coding

● Dummy/Treatment Coding
– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(

            X2   X3
Typical     1    0
Atypical    0    1
Fluent      0    0

● Sum/Effects Coding
– Test whether each level differs from overall mean or from chance
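R ships with generator functions for both of these codings, so you can inspect the matrices directly. A minimal sketch; choosing Fluent as the baseline is an assumption made for illustration.

```r
lv = c("Typical", "Atypical", "Fluent")

# Treatment (dummy) coding with Fluent (the 3rd level) as the baseline:
# the baseline row is all zeros, every other level gets its own 0/1 column.
treat = contr.treatment(lv, base = 3)
treat

# Sum (effects) coding: each column sums to zero across the levels.
sums = contr.sum(lv)
sums

# What R uses by default for unordered factors.
getOption("contrasts")[["unordered"]]
```

Either matrix can be assigned with contrasts(), in the same way as a hand-built contrast matrix.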



Contrasts & Interactions

● Contrasts also apply in cases where we have interactions between variables

● Interaction term represents whether the value of the contrast depends on another variable

● We'll see some examples on the next slides

Interaction Example

● Suppose we also sampled different age groups in the disfluency experiment
– 3 x 2 design
● What are possible patterns of results?

                        Story Type
Group           FLUENT           TYPICAL                        ATYPICAL
YOUNG ADULTS    Fluent, young    Typical disfluencies, young    Atypical disfluencies, young
OLDER ADULTS    Fluent, older    Typical disfluencies, older    Atypical disfluencies, older

Possible Result 1

● Contrast 1 significant
– Effect of disfluencies
● Contrast 2 non-sig.
– Location irrelevant
● No effect of age at all in this case
– Everything the same for both age groups

[Figure: two panels (YOUNG, OLDER), y-axis 0 to 9, x-axis categories labeled Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    no
AGE           no
C1 x AGE      no
C2 x AGE      no

Possible Result 2

● Contrast 2 is now significant
– Typical > atypical
● Still no effect of AGE

[Figure: two panels (YOUNG, OLDER), y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           no
C1 x AGE      no
C2 x AGE      no

Possible Result 3

● Now, AGE effect
– Older adults remember more across the board
● But, no interaction
– Disfluency effect is the same in both age groups

[Figure: two panels (YOUNG, OLDER), y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      no

Possible Result 4

● Contrast 1 interacts with AGE
– Effect of the presence of disfluencies differs across age
● Effect only for young adults
● Contrast 2 (location) still same in all cases

[Figure: two panels (YOUNG, OLDER), y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      no

Possible Result 5

● Now, Contrast 2 also interacts with AGE
– Reversal of Typical vs Atypical effect across age

[Figure: two panels (YOUNG, OLDER), y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      yes

Possible Result 6

● Contrast 2 interaction but not Contrast 1
– Typical vs Atypical comparison does depend on age
– Overall effect of having fillers does not

[Figure: two panels (YOUNG, OLDER), y-axis 0 to 9]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      yes

Interactions in R

● Implementing interactions in an R model formula (lmer or otherwise):
– A + B
● Main effects of A and B, no interaction
– A * B
● All possible interactions and main effects of A and B
– A : B
● Interaction of A and B, no main effect (unless you add it separately)
● In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
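To see exactly which terms a formula expands to without fitting anything, you can ask terms() for its term labels. The helper name term_labels and the variables y, A, B, C are placeholders for illustration.

```r
# Small helper (hypothetical name): list the terms a formula expands to.
term_labels = function(f) attr(terms(f), "term.labels")

main_only   = term_labels(y ~ A + B)   # main effects only
full_cross  = term_labels(y ~ A * B)   # main effects plus the interaction
inter_only  = term_labels(y ~ A : B)   # interaction only

# (A + B + C)^2 keeps all main effects and two-way interactions,
# but leaves out the three-way interaction.
two_way_max = term_labels(y ~ (A + B + C)^2)

main_only
full_cross
inter_only
two_way_max
```

The ^2 form is one way to cap the order of interactions when there are many predictors.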
