los angeles r users group - july 12 2011 - part 1

Using R for multilevel modeling of salmon habitat Yasmin Lucero, Statistical Consultant Kelly Burnett, PNW Research Station, USFS Kelly Christiansen, PNW Research Station, USFS E. Ashley Steel, PNW Research Station, USFS Eli Holmes, NW Fisheries Science Center, NOAA Acknowledgements: NRC-RAP, National Academy of Sciences ISEMP Monitoring Program, NOAA

Upload: rusersla

Post on 03-Jul-2015


Page 1: Los Angeles R users group - July 12 2011 - Part 1

Using R for multilevel modeling of salmon habitat

Yasmin Lucero, Statistical Consultant

Kelly Burnett, PNW Research Station, USFS

Kelly Christiansen, PNW Research Station, USFS

E. Ashley Steel, PNW Research Station, USFS

Eli Holmes, NW Fisheries Science Center, NOAA

Acknowledgements: NRC-RAP, National Academy of Sciences; ISEMP Monitoring Program, NOAA

Page 2: Los Angeles R users group - July 12 2011 - Part 1

Outline

• Background on fish ecology and the data

• Background on multilevel modeling

• Demo of lme4 package in R

Page 3: Los Angeles R users group - July 12 2011 - Part 1

Schooling Juvenile Coho Salmon

The big goal: measure effect of stream habitat quality on fish survival

Photo by David Wolman

Page 4: Los Angeles R users group - July 12 2011 - Part 1

Land Area Affected by Endangered Species Act Listings of Salmon & Steelhead

* 28 distinct population segments: 6 endangered, 22 threatened

* 176,000 sq. miles in Washington, Oregon, Idaho & California

* 61% of Washington’s land area, 55% of Oregon’s, 26% of Idaho’s, & 32% of California’s

February 2008

study area

Page 5: Los Angeles R users group - July 12 2011 - Part 1

The Data

~266 study sites

Oregon coastal region

juvenile coho salmon habitat

sparsely sampled, longitudinal study design

12 year time series

35 data layers

~100 landscape level variates

~22 habitat level variates

Oregon

Page 6: Los Angeles R users group - July 12 2011 - Part 1

Abundance increases over time due to variation in ocean conditions (i.e., a driver external to our analysis)

[Figure: two panels, both titled coho.obs. Panel 1: fs.coho.obs (0 to 8) by fs.year, 1998 to 2009. Panel 2: year coefficient (0.2 to 1.0) by year, 1998 to 2008.]

Page 7: Los Angeles R users group - July 12 2011 - Part 1

Sparsely sampled longitudinal data

[Figure: 16 panels of fs.coho.obs (0.0 to 3.0) against year (1998 to 2008), one panel per site: 17100201010102, 17100203040402, 17100204050303, 17100206010603, 17100202030201, 17100203040602, 17100205040105, 17100303080202, 17100203020501, 17100203070101, 17100205070202, 17100304010604, 17100203020902, 17100203090101, 17100206010504, 17100305060202.]

• Only fish data has time component

• year effects exogenous

• Landscape data everywhere

• Habitat data some places

• Fish data some places

• Not always same places

Figure Legend. Mean density of coho at 16 frequently visited sites for 1998–2009

Page 8: Los Angeles R users group - July 12 2011 - Part 1

How the landscape data is acquired

GIS map layers

summarize across area surrounding study site

Page 9: Los Angeles R users group - July 12 2011 - Part 1

shallow, highly channelized

high structure: rocks and woody debris

habitat level data is collected by survey visits: it is labor intensive to collect and therefore less abundant

gradient, pool density, debris, flow rates, drainage area, channel width, etc.

Page 10: Los Angeles R users group - July 12 2011 - Part 1

Multilevel structure for two reasons

Page 11: Los Angeles R users group - July 12 2011 - Part 1

landscape

habitat

fish

Multilevel structure for two reasons:

(1) longitudinal sampling design

(2) varying scales of predictors

Page 12: Los Angeles R users group - July 12 2011 - Part 1

Generalized linear mixed models (aka hierarchical, multilevel, or random effects models)

[Diagram: one state containing three schools, each school containing three classes.]

student_score ~ class_average + school_average + state_average

canonical example: school test scores
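The formula on this slide is shorthand for the idea rather than lme4 syntax. In lme4, each level of the hierarchy would usually enter as a grouping factor with its own random intercept. The following is a minimal sketch on simulated data; the column names (score, school, class_id), the group sizes, and the effect sizes are all invented for illustration, and the state level is left out because there is only one state.

```r
library(lme4)

set.seed(1)

# Hypothetical layout: 6 schools, 4 classes per school, 15 students per class.
n_school <- 6; n_class <- 4; n_student <- 15
d <- expand.grid(student = seq_len(n_student),
                 class   = seq_len(n_class),
                 school  = seq_len(n_school))
d$school   <- factor(d$school)
d$class_id <- interaction(d$school, d$class, drop = TRUE)  # unique class label

# Simulate scores as school effect + class effect + student-level noise.
school_eff <- rnorm(n_school, sd = 5)
class_eff  <- rnorm(nlevels(d$class_id), sd = 3)
d$score <- 70 + school_eff[as.integer(d$school)] +
  class_eff[as.integer(d$class_id)] + rnorm(nrow(d), sd = 8)

# Random intercepts for schools and for classes nested within schools.
fit_school <- lmer(score ~ 1 + (1 | school) + (1 | class_id), data = d)

print(VarCorr(fit_school))   # variance attributed to each level
```

The VarCorr() output splits the total variation into a school part, a class part, and a residual (student) part, which is the quantity the diagram is meant to convey.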

Page 13: Los Angeles R users group - July 12 2011 - Part 1

state

school 1 school 2 school 3 school 4

class 1 class 2 class 3 class 4

student 1 student 2 student 3 student 4

state level predictors

school level predictors

class level predictors

student level predictors

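$\mathrm{Norm}(0,\ \sigma^2_{state})$

$\mathrm{Norm}(\mu_{state\,1},\ \sigma^2_{school})$

$\mathrm{Norm}(\mu_{school\,1},\ \sigma^2_{class})$

$\mathrm{Norm}(\mu_{class\,3},\ \sigma^2_{student})$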

Page 14: Los Angeles R users group - July 12 2011 - Part 1

Our model structure is not so complicated

global

site 1 site 2 site 3 site 4

obs 1 obs 2 obs 3 obs 4

landscape level predictors

habitat level predictors & year effects

Page 15: Los Angeles R users group - July 12 2011 - Part 1

Modeling presence/absence of fish: logistic mixed model with site and year effects

$$\operatorname{logit}(\Pr\{y_i = 1\}) = \alpha_{year} + \beta_{h1} x_{h1} + \beta_{h2} x_{h2} + \dots + \alpha_{site}$$

$$\alpha_{site} \sim \mathrm{Norm}(\beta_{l1} x_{l1} + \beta_{l2} x_{l2} + \dots,\ \sigma^2_{site})$$

year effects

habitat level predictors

site effects

landscape level predictors
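A model of this shape can be sketched with lme4's glmer(). Because the site effect is normal around a linear function of the landscape predictors, substituting it into the first equation leaves the landscape predictors as ordinary fixed effects plus a zero-mean random intercept per site. The example below uses simulated data only; the variable names (present, gradient, elev), the sample sizes, and the coefficients are assumptions made for illustration, not values from the study.

```r
library(lme4)

set.seed(42)

# Hypothetical data: 40 sites, each surveyed in 6 years.
n_site <- 40
years  <- 2001:2006
d <- expand.grid(site = factor(seq_len(n_site)), year = factor(years))

# One made-up landscape predictor (constant within a site) and
# one made-up habitat predictor (varies by visit), both standardized.
elev       <- rnorm(n_site)
d$elev     <- elev[as.integer(d$site)]
d$gradient <- rnorm(nrow(d))

# Simulate presence/absence from the two-level model.
alpha_site <- rnorm(n_site, mean = 0.8 * elev, sd = 1)   # site effects
alpha_year <- seq(-1, 1, length.out = length(years))     # year effects
eta <- alpha_year[as.integer(d$year)] - 1.0 * d$gradient +
  alpha_site[as.integer(d$site)]
d$present <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Year as a fixed factor with no global intercept, habitat and landscape
# predictors as fixed effects, and a random intercept per site.
fit <- glmer(present ~ year - 1 + gradient + elev + (1 | site),
             data = d, family = binomial)

print(fixef(fit))     # year, habitat, and landscape coefficients
print(VarCorr(fit))   # between-site variance left after the predictors
```

Dropping the intercept with `- 1` gives each year its own coefficient, which matches treating year effects as exogenous nuisance terms.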

Page 16: Los Angeles R users group - July 12 2011 - Part 1

Fit a lot of models, some predictors rose to the top

[Figure: AIC (about 1100 to 1300) plotted against logLik (about −620 to −520) for candidate models m1 through m34.]

Best predictors:

• gradient

• debris level

• drainage area

• mean elevation
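Fitting many candidate models and ranking them can be done by keeping the fits in a named list and tabulating logLik and AIC. This is a rough sketch under stated assumptions: the data are simulated, the predictors x1 to x3 are placeholders, and only the list name models.ls is borrowed from the slides.

```r
library(lme4)

set.seed(7)

# Hypothetical data: 30 sites, 8 visits each, three made-up predictors.
d <- data.frame(site = factor(rep(seq_len(30), each = 8)))
d$x1 <- rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$x3 <- rnorm(nrow(d))
site_eff  <- rnorm(30)
d$present <- rbinom(nrow(d), 1,
                    plogis(site_eff[as.integer(d$site)] + 1.2 * d$x1 - 0.8 * d$x2))

# Candidate fixed-effect structures, all sharing a site random intercept.
forms <- list(
  m1 = present ~ 1 + (1 | site),
  m2 = present ~ x1 + (1 | site),
  m3 = present ~ x1 + x2 + (1 | site),
  m4 = present ~ x1 + x2 + x3 + (1 | site)
)
models.ls <- lapply(forms, function(f) glmer(f, data = d, family = binomial))

# One row per model, sorted so the lowest AIC comes first.
cmp <- data.frame(model  = names(models.ls),
                  logLik = sapply(models.ls, function(m) as.numeric(logLik(m))),
                  AIC    = sapply(models.ls, AIC))
cmp <- cmp[order(cmp$AIC), ]
print(cmp)
```

Plotting `cmp$AIC` against `cmp$logLik` would give a scatter of the same kind as the figure on this slide.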

Page 17: Los Angeles R users group - July 12 2011 - Part 1

Overall model performance is strong at some things, weak at others

[Figure: two panels. Panel 1: histogram of fitted probabilities, fitted(models.ls$m24), with count (0 to 800) against fitted probability (0.0 to 1.0). Panel 2: fitted probabilities (0.0 to 1.0) by observed presence/absence (0, 1).]
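Both views of model fit come from the vector returned by fitted(). The sketch below shows the idea on simulated data; to keep it runnable with base R alone, an ordinary logistic regression from glm() stands in for the mixed model, and the predictor x is a made-up placeholder. fitted() on a glmer fit also returns probabilities on the 0 to 1 scale.

```r
set.seed(3)

# Hypothetical presence/absence data with one made-up predictor.
n <- 500
x <- rnorm(n)
present <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))

# Stand-in model: an ordinary logistic regression (not a mixed model).
m <- glm(present ~ x, family = binomial)
p <- fitted(m)   # fitted probabilities on the 0-1 scale

# Histogram counts of fitted probabilities in ten equal-width bins.
bins <- cut(p, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
print(table(bins))

# Fitted probabilities summarized separately for absences (0) and presences (1).
by_outcome <- tapply(p, present, summary)
print(by_outcome)

# Uncomment to draw the plots themselves:
# hist(p, main = "histogram of fitted probabilities")
# boxplot(p ~ present, xlab = "presence/absence", ylab = "fitted probability")
```

A model that discriminates well puts most fitted probabilities for observed absences near 0 and most for observed presences near 1; overlap between the two groups is where the model is weak.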

Page 18: Los Angeles R users group - July 12 2011 - Part 1

Another look at model fit: some heavy outliers

[Figure: observed presence/absence of coho (data) against fitted probability (0.0 to 1.0), with a legend for years 1998 to 2009.]

pa.obs ~ fs.year + (fs.grad.rs + fs.cfs.down.rs + fs.vol.len.rs + el.mean.rs | catchment) - 1

Page 19: Los Angeles R users group - July 12 2011 - Part 1

conclusions

• site matters

• we can explain about half of the between-site variation with 4-5 predictors

• habitat data more valuable than landscape data

• a small number of predictions are very wrong, and we can’t seem to improve them
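One way to put a number on "how much of the site effect is explained" is to compare the estimated between-site variance from a model with no predictors to the variance from a model that includes them. The slides do not say how the figure of about half was computed, so this is only a sketch of one plausible approach. The data are simulated, and the names elev, m_null, m_pred, and site_var are invented for the example.

```r
library(lme4)

set.seed(11)

# Hypothetical data: 60 sites, 10 visits each, one site-level predictor.
n_site <- 60; n_visit <- 10
d <- data.frame(site = factor(rep(seq_len(n_site), each = n_visit)))
elev      <- rnorm(n_site)
site_eff  <- 1.0 * elev + rnorm(n_site, sd = 1)   # part explained, part not
d$elev    <- elev[as.integer(d$site)]
d$present <- rbinom(nrow(d), 1, plogis(site_eff[as.integer(d$site)]))

# Site random intercept only, then the same model plus the predictor.
m_null <- glmer(present ~ 1 + (1 | site), data = d, family = binomial)
m_pred <- glmer(present ~ elev + (1 | site), data = d, family = binomial)

# Helper (hypothetical): pull the site-level intercept variance from a fit.
site_var <- function(m) as.numeric(VarCorr(m)$site[1, 1])

v_null <- site_var(m_null)
v_pred <- site_var(m_pred)
explained <- 1 - v_pred / v_null   # share of site variance accounted for

print(c(null = v_null, with_predictor = v_pred, explained = explained))
```

This comparison is most defensible when the added predictors vary only between sites; adding visit-level predictors to a logistic mixed model can rescale the random-effect variance, so the ratio should be read as a rough guide.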

Page 20: Los Angeles R users group - July 12 2011 - Part 1

Thanks. [email protected]

Page 21: Los Angeles R users group - July 12 2011 - Part 1

Model predicted probabilities given presence/absence with and without site effects

[Figure: two panels, m0 and m1, each showing Pr{coho present} (0.0 to 1.0) by observed coho presence (FALSE, TRUE).]