
Post on 21-Jan-2016


Modeling Presence/Absence Data

Acknowledgements to WyomingFishing.net (electro-fishing pics) and Michael Houts (Wolf data and article)

• How are fish numbers calculated?

• “There are approximately 3200 trout per mile that are greater than 6 inches on the Miracle Mile…66% are Browns, 29% are Rainbows and 5% are Snake River Cutts”

• Where does this information come from?

• Lots of ways to count fish and do fish surveys…will discuss a bit about electrofishing

Counting Fish

The Miracle Mile


Background – Electrofishing

• Electrofishing
  – Portable generator
  – DC current from generator is at ??? volts to immobilize fish
  – Probes are electrodes which provide positive end of current
  – Nets are called “dip nets”
  – Back-up samplers catch missed fish
  – Fish placed in a flooded net

• Determining correct voltage is important…too little voltage will not allow sufficient capture…too much…well, we know what that means!

• Voltage studies involve setting up tanks with similar water chemistry to stream of interest…place fish in tank, provide a voltage amount, observe if fish immobilized

• electricfish.csv contains such a dataset

How Much Voltage to Use??


Since our interest is in determining what sort of relationship may exist between Y and X, we can take a regression approach. We believe that the probability of immobilization, $\pi$, is a function of X = voltage.


If we use the following linear relationship between Y and X

$$\pi = \beta_0 + \beta_1\,\text{Voltage},$$

Problems???
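One way to see the problem is to fit a straight line to 0/1 responses and look at its predictions away from the middle of the data. The sketch below uses simulated stand-in data, not the values in electricfish.csv; the voltage grid, the "true" curve, and the names `sim`, `linear_fit`, and `new_volts` are all assumptions made for illustration.

```r
# Simulated stand-in for a voltage study (hypothetical values, not electricfish.csv)
set.seed(1)
voltage  <- rep(seq(100, 180, by = 10), each = 10)
p_true   <- plogis(-14 + 0.1 * voltage)          # assumed true S-shaped curve
response <- rbinom(length(voltage), size = 1, prob = p_true)
sim      <- data.frame(Voltage = voltage, Response = response)

# Straight-line fit to the 0/1 responses
linear_fit <- lm(Response ~ Voltage, data = sim)

# Predict at voltages inside and outside the simulated range
new_volts   <- data.frame(Voltage = c(50, 100, 140, 180, 250))
linear_pred <- predict(linear_fit, newdata = new_volts)
print(cbind(new_volts, linear_pred))

# Count predictions that cannot be probabilities
n_illegal <- sum(linear_pred < 0 | linear_pred > 1)
print(n_illegal)
```

Nothing in the straight-line model keeps its predictions between 0 and 1, so low voltages give negative "probabilities" and high voltages give values above 1.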

Need “Legal” Estimates

Need a function to relate mean to predictor (voltage) which binds it to an appropriate scale. i.e. if you need a mean that can’t assume negative values, use a log

$$\ln \mu(x) = \beta_0 + \beta_1 x_1 \quad\Longleftrightarrow\quad \mu(x) = \exp(\beta_0 + \beta_1 x_1)$$

“Legal” continued…

For Bernoulli data, we want the mean to be between 0 and 1…a common mechanism for achieving this is the use of the logit link

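$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x_1$$

$$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x_1)}{1 + \exp(\beta_0 + \beta_1 x_1)}$$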
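The logit link and its inverse are short enough to write out by hand, which can make the "legal estimates" point concrete. This is a minimal sketch; the helper names `logit` and `inv_logit` are made up for illustration, and base R already provides the same functions as `qlogis()` and `plogis()`.

```r
# Logit link: maps a probability in (0, 1) onto the whole real line
logit <- function(p) log(p / (1 - p))

# Inverse logit: maps any real number back into (0, 1)
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# However extreme the linear predictor, the implied mean stays a legal probability
eta <- c(-10, -2, 0, 2, 10)
p   <- inv_logit(eta)
print(p)

# The hand-written versions agree with the base R equivalents
print(all.equal(logit(p), qlogis(p)))
print(all.equal(inv_logit(eta), plogis(eta)))
```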

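# Read the voltage study data (assumes electricfish.csv is in the working directory)
electric <- read.csv("electricfish.csv")

# Logit fit using the glm function
model1 <- glm(Response ~ Voltage, family = binomial(logit), data = electric)
summary(model1)

# Linear regression fit using the lm function
model2 <- lm(Response ~ Voltage, data = electric)

# Store both sets of fitted values alongside the data
logitfits  <- fitted(model1)
linearfits <- fitted(model2)
electric$logits <- logitfits
electric$linear <- linearfits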

Using R

Plotting Using ‘Lattice’

R Code:

library(lattice)
# Observed responses as black points, with smooth curves through the
# logit fitted values and the linear fitted values
xyplot(electric$Response + electric$logits + electric$linear ~ electric$Voltage,
       aspect = 1,
       panel = function(x, y) {
         panel.xyplot(electric$Voltage, electric$Response, col = "black")
         panel.xyplot(electric$Voltage, electric$logits, type = "smooth")
         panel.xyplot(electric$Voltage, electric$linear, type = "smooth")
       },
       xlab = "Voltage", ylab = "Immobilized", data = electric)

The ‘best fit’ logit to our data compared to the ‘best fit’ line is given by

Predicted Probabilities

The model is not in terms of $\pi_i$ though; it is instead in terms of the ‘log of the odds’ or the ‘logit’

$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1\,\text{Voltage}$$

Thus, in ‘logistic regression’ we model the log odds of an event. Even though we are likely more interested in $\pi$ than we are in the log odds, the log odds have ‘nicer mathematical properties’…so we do inference for the log odds and then back-transform to get the estimated value of $\pi$

$$\hat{\pi}_i = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1\,\text{Voltage})}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1\,\text{Voltage})}$$
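The "inference on the log odds, then back-transform" step might look like the following in R. This is only a sketch on simulated stand-in data (not electricfish.csv); the data frame `sim`, the voltages in `new_volts`, and the 1.96 multiplier for an approximate 95% Wald interval are assumptions chosen for illustration.

```r
# Simulated stand-in data (hypothetical values, not electricfish.csv)
set.seed(42)
sim <- data.frame(Voltage = rep(seq(100, 180, by = 10), each = 10))
sim$Response <- rbinom(nrow(sim), size = 1,
                       prob = plogis(-14 + 0.1 * sim$Voltage))

fit <- glm(Response ~ Voltage, family = binomial(link = "logit"), data = sim)

# Estimates and standard errors on the log-odds (link) scale
new_volts <- data.frame(Voltage = c(120, 140, 160))
link_pred <- predict(fit, newdata = new_volts, type = "link", se.fit = TRUE)

# Back-transform the estimate and the interval endpoints to probabilities
pi_hat <- plogis(link_pred$fit)
lower  <- plogis(link_pred$fit - 1.96 * link_pred$se.fit)
upper  <- plogis(link_pred$fit + 1.96 * link_pred$se.fit)
print(cbind(new_volts, pi_hat, lower, upper))

# The point estimates match asking predict() for the response scale directly
resp_pred <- predict(fit, newdata = new_volts, type = "response")
```

Building the interval on the log-odds scale and transforming the endpoints keeps both limits inside 0 and 1, which an interval computed directly on the probability scale would not guarantee.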

So our estimated model is given by:

$$\hat{\pi} = \frac{\exp(-13.84 + 0.1008\,\text{Voltage})}{1 + \exp(-13.84 + 0.1008\,\text{Voltage})}$$
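The estimated model can be evaluated directly from the two reported coefficients. In the sketch below only -13.84 and 0.1008 come from the slides; the function name `pi_hat`, the grid of voltages, and `v50` are illustrative assumptions, and the grid is not claimed to match the range of the actual study.

```r
b0 <- -13.84   # estimated intercept reported above
b1 <- 0.1008   # estimated slope per volt reported above

# Estimated probability of immobilization at a given voltage
pi_hat <- function(voltage) {
  eta <- b0 + b1 * voltage          # estimated log odds
  exp(eta) / (1 + exp(eta))         # back-transformed to a probability
}

# Illustrative voltages (hypothetical grid)
volts <- seq(100, 180, by = 20)
print(data.frame(Voltage = volts, pi_hat = round(pi_hat(volts), 3)))

# Voltage at which the estimated probability is 0.5 (log odds equal to zero)
v50 <- -b0 / b1
print(v50)
```

Setting the log odds to zero gives a voltage of roughly 137 volts for a 50% estimated chance of immobilization under this fit.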

[Figure annotations: “Small change in prob” and “Big change in prob”]
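The "small change" versus "big change" contrast presumably refers to how the same increase in voltage moves the probability by different amounts along the S-shaped curve. Assuming that reading, differentiating the fitted logistic function with respect to voltage makes the point explicit:

```latex
\frac{d\pi}{d\,\text{Voltage}}
  = \beta_1 \, \pi \, (1 - \pi),
\qquad
\max_{\pi} \; \beta_1 \, \pi (1 - \pi) = \frac{\beta_1}{4}
\quad \text{at } \pi = 0.5 .
```

With the estimated slope of 0.1008, the probability rises by about 0.1008 / 4 = 0.0252 per volt near $\hat{\pi} = 0.5$, but only by about 0.1008 × 0.01 × 0.99 ≈ 0.001 per volt where $\hat{\pi} = 0.01$.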

• Understanding habitat selection by animals, plants, and aquatic species is an important problem faced by ecologists and wildlife biologists worldwide

• If we know what sort of habitat critters select for, we can better manage these species

• Will consider a data set, wolves_geo.csv, which reports wolf occurrence in two years following wolf re-introduction in the Greater Yellowstone Area

Resource Selection

• RD_DENSITY is a measure of, well, road density

• WOLVES_99 = 2 means the data came after 1999

• MAJOR_LC = codes for major land cover types (descriptions in landcover.txt)

• Paper: Houts03.pdf

Data Description

Project

Fit the logistic regression model to the 1999 data set and create a column of predicted “probabilities” of wolf occurrence.

Are the “predicted probabilities” really “probabilities”?

Can we use this model to predict likely wolf occurrence across the five state region? Why/why not?

Can you think of a better ‘design’ for building the initial model?
