Los Angeles R Users Group - July 12, 2011 - Part 1

Post on 03-Jul-2015


Using R for multilevel modeling of salmon habitat

Yasmin Lucero, Statistical Consultant

Kelly Burnett, PNW Research Station, USFS
Kelly Christiansen, PNW Research Station, USFS
E. Ashley Steel, PNW Research Station, USFS
Eli Holmes, NW Fisheries Science Center, NOAA

Acknowledgements: NRC-RAP, National Academy of Sciences; ISEMP Monitoring Program, NOAA

Outline

• Background on fish ecology and the data

• Background on multilevel modeling

• Demo of lme4 package in R

Schooling Juvenile Coho Salmon

The big goal: measure the effect of stream habitat quality on fish survival

Photo by David Wolman

Land Area Affected by Endangered Species Act Listings of Salmon & Steelhead

* 28 distinct population segments: 6 endangered, 22 threatened

* 176,000 sq. miles in Washington, Oregon, Idaho & California

* 61% of Washington’s land area, 55% of Oregon’s, 26% of Idaho’s, & 32% of California’s

February 2008

study area

The Data

• ~266 study sites
• Oregon coastal region
• juvenile coho salmon habitat
• sparsely sampled, longitudinal study design
• 12-year time series
• 35 data layers
• ~100 landscape-level variates
• ~22 habitat-level variates

Oregon

Abundance increases over time due to variation in ocean conditions (i.e., factors external to our analysis)

[Figure: coho.obs by fs.year (fs.coho.obs), 1998–2009; second panel: estimated year coefficient by year]

Sparsely sampled longitudinal data

[Figure: fs.coho.obs by year, one panel per site for 16 sites labeled by site ID (e.g. 17100201010102, 17100203040402, 17100205070202)]

• Only the fish data has a time component
• Year effects are exogenous
• Landscape data everywhere
• Habitat data at some places
• Fish data at some places
• Not always the same places

Figure Legend. Mean density of coho at 16 frequently visited sites for 1998–2009

How the landscape data is acquired

GIS map layers: summarize across the area surrounding the study site

shallow, highly channelized

high structure: rocks and woody debris

Habitat-level data is collected by survey visits: labor-intensive to collect, and therefore less abundant

gradient, pool density, debris, flow rates, drainage area, channel width, etc.

Multilevel structure for two reasons

[Diagram: nested levels: landscape, habitat, fish]

(1) longitudinal sampling design
(2) varying scales of predictors

Generalized linear mixed models (aka hierarchical, multilevel, or random effects models)

[Diagram: a state containing three schools, each containing three classes]

student_score ~ class_average + school_average + state_average

canonical example: school test scores

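[Diagram: state → schools 1–4 → classes 1–4 → students 1–4]

Predictors can enter at every level (state level, school level, class level, and student level predictors), and each level varies around its parent:

state: $\mathrm{Norm}(0, \sigma^2_{\text{state}})$

school: $\mathrm{Norm}(\mu_{\text{state}_1}, \sigma^2_{\text{school}})$

class: $\mathrm{Norm}(\mu_{\text{school}_1}, \sigma^2_{\text{class}})$

student: $\mathrm{Norm}(\mu_{\text{class}_3}, \sigma^2_{\text{student}})$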
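The nested Normal structure of the school example can be simulated directly, which makes it concrete that each level's mean is drawn around its parent's mean. This is a minimal base-R sketch with hypothetical sizes (4 schools, 4 classes per school, 4 students per class) and made-up standard deviations; it is not the salmon data.

```r
set.seed(1)

# Hypothetical standard deviations for each level of the hierarchy
sd_state   <- 1.0
sd_school  <- 0.7
sd_class   <- 0.5
sd_student <- 0.3

n_school  <- 4  # schools in the state
n_class   <- 4  # classes per school
n_student <- 4  # students per class

# State mean: Norm(0, sd_state^2)
mu_state <- rnorm(1, mean = 0, sd = sd_state)

# School means: Norm(mu_state, sd_school^2)
mu_school <- rnorm(n_school, mean = mu_state, sd = sd_school)

# Class means: each class is centred on its own school's mean
school_of_class <- rep(seq_len(n_school), each = n_class)
mu_class <- rnorm(length(school_of_class), mean = mu_school[school_of_class], sd = sd_class)

# Student scores: each student is centred on its own class's mean
class_of_student <- rep(seq_along(mu_class), each = n_student)
score <- rnorm(length(class_of_student), mean = mu_class[class_of_student], sd = sd_student)

students <- data.frame(
  school = factor(school_of_class[class_of_student]),
  class  = factor(class_of_student),
  score  = score
)
head(students)
```

Fitting `lme4::lmer(score ~ 1 + (1 | school/class), data = students)` to data like this should give rough estimates of the school, class, and student variance components, although four schools are far too few to pin down the school-level variance.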

Our model structure is not so complicated

[Diagram: global → sites 1–4 → obs 1–4]

landscape level predictors (site level)

habitat level predictors & year effects (observation level)

Modeling presence/absence of fish: logistic mixed model with site and year effects

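$$\operatorname{logit}(\Pr\{y_i = 1\}) = \beta_{\text{year}} x_{y} + \beta_{1} x_{h1} + \beta_{2} x_{h2} + \dots + \alpha_{\text{site}}$$

$$\alpha_{\text{site}} \sim \mathrm{Norm}(\gamma_{l1} x_{l1} + \gamma_{l2} x_{l2} + \dots,\ \sigma^2_{\text{site}})$$

year effects: $\beta_{\text{year}} x_{y}$

habitat level predictors: $\beta_{1} x_{h1} + \beta_{2} x_{h2} + \dots$

site effects: $\alpha_{\text{site}}$

landscape level predictors: $\gamma_{l1} x_{l1} + \gamma_{l2} x_{l2} + \dots$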
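One way this logistic mixed model can be expressed with `lme4::glmer` is sketched below on simulated data, since the study data are not included here; all variable names (`site`, `year`, `x_h1`, `x_l1`, `present`) and effect sizes are hypothetical. Because a landscape predictor is constant within a site, putting it in the fixed part of the formula next to a `(1 | site)` random intercept is equivalent to letting the site effect have mean γ_l1 x_l1 with variance σ²_site. `factor(year)` gives a separate fixed effect for each year.

```r
library(lme4)

set.seed(42)
n_site <- 60
years  <- 1998:2009

# Site-level (landscape) predictor: one value per site
sites <- data.frame(site = factor(seq_len(n_site)), x_l1 = rnorm(n_site))

# Site effects: alpha_site ~ Norm(gamma_l1 * x_l1, sigma_site^2)
gamma_l1   <- 0.8
sigma_site <- 1.0
sites$alpha <- rnorm(n_site, mean = gamma_l1 * sites$x_l1, sd = sigma_site)

# Sparse longitudinal design: each site is visited in 4 randomly chosen years
obs <- do.call(rbind, lapply(seq_len(n_site), function(s) {
  data.frame(site = sites$site[s], year = sample(years, 4))
}))
obs <- merge(obs, sites, by = "site")

# Observation-level (habitat) predictor and hypothetical year effects
obs$x_h1  <- rnorm(nrow(obs))
beta_year <- setNames(seq(-1, 0.5, length.out = length(years)), years)
beta_h1   <- 0.6

eta <- beta_year[as.character(obs$year)] + beta_h1 * obs$x_h1 + obs$alpha
obs$present <- rbinom(nrow(obs), size = 1, prob = plogis(eta))

# Year effects plus habitat and landscape predictors as fixed effects,
# site as a random intercept
m <- glmer(present ~ factor(year) + x_h1 + x_l1 + (1 | site),
           data = obs, family = binomial)
summary(m)
```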

We fit many models; some predictors rose to the top

[Figure: AIC vs. logLik for candidate models m1–m34]

Best predictors:

• gradient
• debris level
• drainage area
• mean elevation
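Ranking candidate models by AIC, as in the AIC vs. logLik plot, can be sketched by fitting a list of `glmer` formulas and tabulating `logLik()` and `AIC()`. The data here are simulated with one informative predictor (`x_good`) and one pure-noise predictor (`x_noise`); these names and the four candidate models are hypothetical and do not reproduce the authors' 34 models.

```r
library(lme4)

set.seed(7)
n_site  <- 50
site    <- factor(rep(seq_len(n_site), each = 5))
x_good  <- rnorm(length(site))   # affects presence
x_noise <- rnorm(length(site))   # unrelated to presence
u_site  <- rnorm(n_site, sd = 1)[as.integer(site)]
present <- rbinom(length(site), 1, plogis(-0.2 + 1.0 * x_good + u_site))
d <- data.frame(site, x_good, x_noise, present)

# Candidate models, all with a site random intercept
candidates <- list(
  m_null  = present ~ 1 + (1 | site),
  m_good  = present ~ x_good + (1 | site),
  m_noise = present ~ x_noise + (1 | site),
  m_both  = present ~ x_good + x_noise + (1 | site)
)
fits <- lapply(candidates, function(f) glmer(f, data = d, family = binomial))

# Log-likelihood and AIC for each model, lowest AIC first
tab <- data.frame(
  model  = names(fits),
  logLik = sapply(fits, function(f) as.numeric(logLik(f))),
  AIC    = sapply(fits, AIC)
)
tab[order(tab$AIC), ]
```

Plotting `tab$logLik` against `tab$AIC` gives the same kind of picture as the slide: models that add parameters without improving the likelihood drift upward in AIC.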

Overall model performance is strong at some things, weak at others

[Figure, left: histogram of fitted probabilities, fitted(models.ls$m24) (x-axis: fitted probabilities, y-axis: count). Right: fitted probability by observed presence/absence (0/1)]
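Both diagnostics on this slide (the histogram of fitted probabilities and their spread within observed absences and presences) can be sketched from any fitted binomial `glmer` model, since `fitted()` returns probabilities on the response scale including the site effects. The data and names below are simulated and hypothetical.

```r
library(lme4)

set.seed(3)
n_site  <- 40
site    <- factor(rep(seq_len(n_site), each = 6))
x       <- rnorm(length(site))
u_site  <- rnorm(n_site, sd = 1.2)[as.integer(site)]
present <- rbinom(length(site), 1, plogis(0.3 + 0.8 * x + u_site))
d <- data.frame(site, x, present)

m <- glmer(present ~ x + (1 | site), data = d, family = binomial)
p_hat <- fitted(m)  # fitted probabilities, including site effects

# Distribution of fitted probabilities
hist(p_hat, breaks = 20, xlab = "fitted probabilities", main = "")

# Fitted probability grouped by observed absence (0) / presence (1)
boxplot(p_hat ~ present, data = d,
        xlab = "presence/absence", ylab = "fitted probability")

# Numeric summary of the same grouping
tapply(p_hat, d$present, summary)
```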

Another look at model fit: some heavy outliers

[Figure: observed p/a of coho obs (data) against fitted probability, for years 1998–2009]

pa.obs ~ fs.year + (fs.grad.rs + fs.cfs.down.rs + fs.vol.len.rs + el.mean.rs | catchment) - 1
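To list the "heavy outliers", one option (an assumption, not necessarily the procedure used in the talk) is to rank observations by their Pearson residuals: a presence with a tiny fitted probability, or an absence with a fitted probability near 1, gets a large absolute residual. The sketch uses simulated data and hypothetical names, and a simpler model than the one above.

```r
library(lme4)

set.seed(11)
n_site <- 40
site   <- factor(rep(seq_len(n_site), each = 6))
x      <- rnorm(length(site))
u_site <- rnorm(n_site)[as.integer(site)]
d <- data.frame(site, x)
d$present <- rbinom(nrow(d), 1, plogis(0.5 * x + u_site))

m <- glmer(present ~ x + (1 | site), data = d, family = binomial)
d$p_hat <- fitted(m)

# Pearson residual for a 0/1 outcome: (y - p) / sqrt(p * (1 - p))
d$pearson <- (d$present - d$p_hat) / sqrt(d$p_hat * (1 - d$p_hat))

# The ten observations the model gets most wrong: presences with very low
# fitted probability, or absences with very high fitted probability
worst <- head(d[order(abs(d$pearson), decreasing = TRUE), ], 10)
worst
```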

Conclusions

• Site matters

• We can explain about half of the variation in why site matters with 4–5 predictors

• Habitat data are more valuable than landscape data

• A small number of predictions are very wrong, and we can’t seem to improve them
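One common way to put a number on how much of the site-to-site variation a set of site-level predictors explains (not necessarily the calculation behind the conclusion above) is the proportional reduction in the site random-intercept variance: 1 − σ²_site(with predictors) / σ²_site(null model). The sketch below uses simulated data and a hypothetical landscape predictor `x_l`.

```r
library(lme4)

set.seed(5)
n_site <- 80
x_l    <- rnorm(n_site)               # hypothetical site-level predictor
alpha  <- 0.9 * x_l + rnorm(n_site)   # site effects, partly explained by x_l
site   <- factor(rep(seq_len(n_site), each = 6))
d <- data.frame(site = site, x_l = x_l[as.integer(site)])
d$present <- rbinom(nrow(d), 1, plogis(alpha[as.integer(site)]))

m_null <- glmer(present ~ 1 + (1 | site), data = d, family = binomial)
m_land <- glmer(present ~ x_l + (1 | site), data = d, family = binomial)

# Estimated variance of the site random intercept in a fitted model
site_var <- function(m) {
  vc <- as.data.frame(VarCorr(m))
  vc$vcov[vc$grp == "site"]
}

# Proportional reduction in unexplained between-site variance
prop_explained <- 1 - site_var(m_land) / site_var(m_null)
prop_explained
```

In logistic models the variance components live on the latent logit scale and can shift when predictors are added, so this ratio is best read as a rough summary rather than an exact R².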

Thanks. yasmin.lucero@gmail.com

Model predicted probabilities given presence/absence with and without site effects

[Figure: Pr{coho present} by coho presence (FALSE/TRUE), two panels labeled m0 and m1]
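Predicted probabilities with and without site effects can both be obtained from one `glmer` fit: `predict(m, type = "response")` uses the estimated site intercepts, while `re.form = NA` sets them to zero. The m0 and m1 panels may instead come from two separately fitted models; this sketch uses a single model on simulated data with hypothetical names.

```r
library(lme4)

set.seed(9)
n_site  <- 40
site    <- factor(rep(seq_len(n_site), each = 6))
x       <- rnorm(length(site))
u_site  <- rnorm(n_site, sd = 1.5)[as.integer(site)]
present <- rbinom(length(site), 1, plogis(0.6 * x + u_site))
d <- data.frame(site, x, present)

m <- glmer(present ~ x + (1 | site), data = d, family = binomial)

# With site effects (conditional on the estimated random intercepts)
p_with    <- predict(m, type = "response")
# Without site effects (random intercepts set to zero)
p_without <- predict(m, re.form = NA, type = "response")

present_lgl <- d$present == 1  # FALSE/TRUE grouping, as in the figure
op <- par(mfrow = c(1, 2))
boxplot(p_without ~ present_lgl, xlab = "coho presence",
        ylab = "Pr{coho present}", main = "without site effects")
boxplot(p_with ~ present_lgl, xlab = "coho presence",
        ylab = "Pr{coho present}", main = "with site effects")
par(op)
```

When site effects are large, the "with" panel typically separates presences from absences much more clearly than the "without" panel, which is one way of seeing that site matters.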
