
Post on 19-Dec-2015


1

Experimental design and analyses of experimental data

Lesson 6

Logistic regression

Generalized Linear Models (GENMOD)

2

Logistic regression

• Used when the response is dichotomous (0 or 1).

• Used when the response is a fraction between 0 and 1 (a number of events out of a number of trials).

3

Example:

Does predation of eggs in nests of Oyster catcher depend on

• the distance from the nest to the nearest nest of Herring gull?

• the vegetation surrounding the nest?

• the number of eggs in the nest?

4

Data:

OBS   DIST   EGGS   VEG   KILLED
1     0.5    3      B     3
2     1.0    7      C     5
3     5.7    5      B     1
4     3.8    9      A     6
5     3.0    7      C     5
6     6.1    8      A     3
........
57    3.3    3      A     3

5

Analysis of dichotomous data:

• Nests are categorized according to whether predation has occurred or not.

• No predation is scored as 0

• Predation is scored as 1

6

[Figure: Plus/minus predator visit to Oyster catcher nest. x-axis: Distance (m) from nearest Herring gull nest (0 to 10); y-axis: Visit to nest (0 or 1).]

7

The purpose is to fit a model to the data: a model that predicts the probability of a nest being predated.

8

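The logistic regression model:

$$\pi_i = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}} = \frac{e^{y}}{1 + e^{y}}$$

where $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

and the error term $\varepsilon \sim \mathrm{BIN}(0, \pi(1-\pi))$, i.e. binomial with mean 0 and variance $\pi(1-\pi)$.

The logit transformation:

$$\ln\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p = y$$

The odds $\pi/(1-\pi)$ is the ratio between the probability of a positive and a negative event.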

9

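$$y = 0: \quad \pi = \frac{e^{0}}{1 + e^{0}} = \frac{1}{1 + 1} = \frac{1}{2}$$

$$y \to -\infty: \quad \pi = \frac{e^{y}}{1 + e^{y}} \to \frac{0}{1 + 0} = 0$$

$$y \to \infty: \quad \pi = \frac{e^{y}}{1 + e^{y}} = \frac{1}{e^{-y} + 1} \to \frac{1}{0 + 1} = 1$$

So that

$$-\infty < y < \infty \quad \Rightarrow \quad 0 < \pi < 1$$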
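The behaviour of the logistic model at the extremes can be checked numerically. The following is a minimal Python sketch; the function names `logistic`, `logit` and `odds` are chosen here for illustration and do not come from the slides.

```python
import math

def logistic(y):
    """Inverse logit: map a linear predictor y to a probability in (0, 1)."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)  # avoids overflow for large negative y
    return e / (1.0 + e)

def odds(p):
    """Ratio between the probability of a positive and a negative event."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds: the transformation that makes the model linear in x."""
    return math.log(odds(p))

# The probability stays strictly between 0 and 1 for any finite y.
for y in (-20.0, -2.0, 0.0, 2.0, 20.0):
    pi = logistic(y)
    print(f"y = {y:6.1f}   pi = {pi:.6f}   odds = {odds(pi):.4g}")
```

Applying `logit` to `logistic(y)` returns `y` again (up to rounding), which is why the linear predictor can be read as the log-odds.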

10

How to do it in SAS

11

OPTIONS LINESIZE = 90;

/* Example of logistic regression */
/* The example is inspired by Dorthe Lahrmann's investigations of Oyster catchers (strandskader) on Langli in Ho Bugt */

DATA logist;
  INFILE 'h:\lin-mod\logist.prn' FIRSTOBS=2;
  INPUT dist eggs veg $ killed;
  /* dist   = distance to the nearest nest of Herring gull (sølvmåge) */
  /* eggs   = number of Oyster catcher eggs in a nest */
  /* veg    = vegetation type surrounding an Oyster catcher nest */
  /* killed = number of eggs in the nest that were killed */

  /* If killed > 0 then the nest has been visited by a predator at least once */
  IF killed > 0 THEN visit = 1;
  IF killed = 0 THEN visit = 0;
RUN;

12

/* Example A: Analysis of whether a nest has been visited by predators or not, i.e. visit = 1 or 0 */

PROC GENMOD; /* The procedure is Generalized Linear Models */
  TITLE 'Eksempel A';
  CLASS veg; /* veg is a class variable */
  MODEL visit = dist veg / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
  /* DIST = distribution function (here chosen as binomial) */
  /* LINK = the model uses a logit transformation */
  /* TYPE3 = a Type 3 analysis is used to evaluate the contribution of each factor to the dependent variable */
  /* DSCALE = an option which tells SAS to scale the error in order to meet the demands of the model. If the scale parameter, sqrt(Deviance/DF), is approximately 1, scaling is not needed. */
  /* OBSTATS = gives the predicted values as well as their confidence limits */
RUN;

13

Eksempel A 10:19 Thursday, November 22, 2001 87

The GENMOD Procedure

Model Information

Description Value

Data Set WORK.LOGIST

Distribution BINOMIAL

Link Function LOGIT

Dependent Variable VISIT

Observations Used 57

Number Of Events 52

Number Of Trials 57

Class Level Information

Class Levels Values

VEG 3 A B C

14

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 53 20.2819 0.3827

Scaled Deviance 53 53.0000 1.0000

Pearson Chi-Square 53 22.2740 0.4203

Scaled Pearson X2 53 58.2057 1.0982

Log Likelihood . -26.5000 .

Value column (Deviance and Pearson Chi-Square): these values indicate the fit of the model. Low values (for a given DF) indicate a good fit.

Value/DF column: these values should be close to unity if the model's assumptions are met. Values less than unity indicate underdispersion (variance less than expected). Values greater than unity indicate overdispersion (variance greater than expected).

Scaled Deviance and Scaled Pearson X2 rows: values after scaling with DSCALE.

15

Analysis Of Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 8.5639 2.1271 16.2093 0.0001

DIST 1 -1.0032 0.2651 14.3173 0.0002

VEG A 1 0.2489 0.9555 0.0678 0.7945

VEG B 1 0.4370 0.9250 0.2232 0.6366

VEG C 0 0.0000 0.0000 . .

SCALE 0 0.6186 0.0000 . .

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis

Source NDF DDF F Pr>F ChiSquare Pr>Chi

DIST 1 53 34.8596 0.0001 34.8596 0.0001

VEG 2 53 0.1118 0.8944 0.2237 0.8942
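The NOTE in the output says that the scale parameter is the square root of DEVIANCE/DOF. As a quick arithmetic check, the sketch below recomputes the scale parameter and the scaled statistics from the unscaled values printed for Example A (Deviance 20.2819, Pearson Chi-Square 22.2740, DF 53). The variable names are illustrative only.

```python
import math

# Unscaled goodness-of-fit values copied from the Example A output (full model).
deviance = 20.2819
pearson_chi_square = 22.2740
df = 53

# Dispersion estimate used by DSCALE, and the reported SCALE parameter.
dispersion = deviance / df
scale = math.sqrt(dispersion)

# Scaled statistics are the unscaled ones divided by the dispersion estimate.
scaled_deviance = deviance / dispersion
scaled_pearson = pearson_chi_square / dispersion

print(f"Deviance/DF       = {dispersion:.4f}")
print(f"SCALE             = {scale:.4f}")
print(f"Scaled Deviance   = {scaled_deviance:.4f}")
print(f"Scaled Pearson X2 = {scaled_pearson:.4f}")
```

The results agree with the printed SCALE (0.6186) and Scaled Pearson X2 (58.2057). A scale parameter well below 1, as here, points to underdispersion.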

16

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 55 20.3675 0.3703

Scaled Deviance 55 55.0000 1.0000

Pearson Chi-Square 55 21.6364 0.3934

Scaled Pearson X2 55 58.4265 1.0623

Log Likelihood . -27.5000 .

Analysis Of Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 8.8288 2.0182 19.1363 0.0001

DIST 1 -1.0012 0.2587 14.9777 0.0001

SCALE 0 0.6085 0.0000 . .

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis

Source NDF DDF F Pr>F ChiSquare Pr>Chi

DIST 1 55 36.4999 0.0001 36.4999 0.0001

17

Observation Statistics

VISIT Pred Xbeta Std HessWgt Lower Upper Resraw

1 0.9998 8.3283 1.8909 0.000652 0.9903 1.0000 0.000242

1 0.9996 7.8277 1.7639 0.001075 0.9875 1.0000 0.000398

1 0.9578 3.1222 0.6185 0.1091 0.8710 0.9871 0.0422

1 0.9935 5.0244 1.0628 0.0175 0.9498 0.9992 0.006533

1 0.9971 5.8253 1.2605 0.007924 0.9663 0.9998 0.002943

1 0.9383 2.7217 0.5356 0.1563 0.8418 0.9775 0.0617

1 0.9971 5.8253 1.2605 0.007924 0.9663 0.9998 0.002943

1 0.9973 5.9255 1.2854 0.007173 0.9679 0.9998 0.002663

0 0.3358 -0.6822 0.5813 0.6023 0.1392 0.6123 -0.3358

1 0.9764 3.7229 0.7525 0.0622 0.9045 0.9945 0.0236

0 0.7150 ..........................................
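The columns of the observation statistics can be reproduced by hand from the reduced model (INTERCEPT 8.8288, DIST -1.0012). The sketch below is a hedged illustration: it assumes that the confidence limits are formed on the logit scale as Xbeta ± 1.96 × Std and then back-transformed, which matches the printed values but is not stated on the slides. The Std values are copied from the table, and the pairing of table rows with distances 0.5, 1.0 and 5.7 is inferred from the Xbeta values.

```python
import math

# Parameter estimates copied from the reduced Example A output.
B0, B1 = 8.8288, -1.0012
Z95 = 1.96  # assumption: normal quantile used for the 95% limits

def logistic(y):
    """Back-transform a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-y))

def observation_stats(dist, std_xbeta, visit):
    """Recompute Xbeta, Pred, Lower, Upper and Resraw for one nest."""
    xbeta = B0 + B1 * dist
    pred = logistic(xbeta)
    return {
        "xbeta": xbeta,
        "pred": pred,
        "lower": logistic(xbeta - Z95 * std_xbeta),
        "upper": logistic(xbeta + Z95 * std_xbeta),
        "resraw": visit - pred,  # raw residual: observed minus predicted
    }

# (dist, Std copied from the table, observed visit)
rows = [(0.5, 1.8909, 1), (1.0, 1.7639, 1), (5.7, 0.6185, 1)]
for dist, std, visit in rows:
    s = observation_stats(dist, std, visit)
    print(f"dist={dist:4.1f}  Xbeta={s['xbeta']:.4f}  Pred={s['pred']:.4f}  "
          f"Lower={s['lower']:.4f}  Upper={s['upper']:.4f}  Resraw={s['resraw']:.6f}")
```

Small differences in the last digit are expected because the parameter estimates are rounded to four decimals.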

18

[Figure: Predicted values and 95% confidence limits. x-axis: Distance (m) from nearest Herring gull nest (0 to 10); y-axis: Visit to nest (0.00 to 1.00).]

19

/* Example B: Analysis of the fraction of eggs in a nest that are lost */

PROC GENMOD; /* The procedure is Generalized Linear Models */
  TITLE 'Eksempel B';
  CLASS veg; /* veg is a class variable */
  MODEL killed/eggs = dist veg eggs / DIST=binomial LINK=logit TYPE3 DSCALE OBSTATS;
  /* DIST = distribution function (here chosen as binomial) */
  /* LINK = the model uses a logit transformation */
  /* TYPE3 = a Type 3 analysis is used to determine the contribution of the individual factors to the dependent variable */
  /* DSCALE = option that can be used if Deviance/DF is different from 1.
     It reduces the risk of Type I errors if the scale parameter is > 1
     and the risk of Type II errors if the scale parameter is < 1 */
  /* OBSTATS = gives the predicted values and their confidence limits */
RUN;

Note that this procedure takes the absolute number of eggs killed out of the total number of eggs into consideration, and not merely the proportion of killed eggs.
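Why the absolute numbers matter can be illustrated with two hypothetical nests that have the same proportion of killed eggs but different clutch sizes (1 of 2 and 5 of 10; these numbers are invented for illustration and are not taken from the data set). The sketch compares how strongly each nest's binomial log-likelihood penalizes a value of π that differs from the observed proportion.

```python
import math

def binom_loglik(r, n, pi):
    """Binomial log-likelihood of r killed eggs out of n for a given pi."""
    return (math.log(math.comb(n, r))
            + r * math.log(pi)
            + (n - r) * math.log(1.0 - pi))

def loglik_drop(r, n, pi_alt):
    """Loss in log-likelihood when pi_alt is used instead of the observed r/n."""
    return binom_loglik(r, n, r / n) - binom_loglik(r, n, pi_alt)

# Hypothetical nests: same proportion killed (0.5), different clutch sizes.
small = loglik_drop(1, 2, 0.7)
large = loglik_drop(5, 10, 0.7)

print(f"small nest (1 of 2):  drop = {small:.4f}")
print(f"large nest (5 of 10): drop = {large:.4f}")
```

The larger nest penalizes the same deviation five times as much, so nests with many eggs carry more weight in the fit than nests with few eggs.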

20

Eksempel B 12:26 Thursday, November 22, 2001 7

The GENMOD Procedure

Model Information

Description Value

Data Set WORK.LOGIST

Distribution BINOMIAL

Link Function LOGIT

Dependent Variable KILLED

Dependent Variable EGGS

Observations Used 57

Number Of Events 183

Number Of Trials 336

Class Level Information

Class Levels Values

VEG 3 A B C

21

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 52 53.9491 1.0375

Scaled Deviance 52 52.0000 1.0000

Pearson Chi-Square 52 44.1413 0.8489

Scaled Pearson X2 52 42.5465 0.8182

Log Likelihood . -171.3777 .

22

Analysis Of Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 2.6437 0.5644 21.9369 0.0001

DIST 1 -0.5284 0.0623 71.9060 0.0001

VEG A 1 0.1425 0.3629 0.1541 0.6946

VEG B 1 0.1623 0.3602 0.2029 0.6524

VEG C 0 0.0000 0.0000 . .

EGGS 1 -0.0314 0.0637 0.2433 0.6219

SCALE 0 1.0186 0.0000 . .

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis

Source NDF DDF F Pr>F ChiSquare Pr>Chi

DIST 1 52 97.2164 0.0001 97.2164 0.0001

VEG 2 52 0.1135 0.8929 0.2271 0.8927

EGGS 1 52 0.2443 0.6232 0.2443 0.6211

23

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 55 54.5182 0.9912

Scaled Deviance 55 55.0000 1.0000

Pearson Chi-Square 55 45.0882 0.8198

Scaled Pearson X2 55 45.4867 0.8270

Log Likelihood . -179.6600 .

Analysis Of Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 2.5156 0.2950 72.7128 0.0001

DIST 1 -0.5212 0.0589 78.3656 0.0001

SCALE 0 0.9956 0.0000 . .

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.

LR Statistics For Type 3 Analysis

Source NDF DDF F Pr>F ChiSquare Pr>Chi

DIST 1 55 107.8859 0.0001 107.8859 0.0001
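The estimates of the reduced model for Example B (INTERCEPT 2.5156, DIST -0.5212) can be turned into quantities that are easier to interpret. The sketch below is an illustration, not part of the original analysis; the function and variable names are chosen here.

```python
import math

# Parameter estimates copied from the reduced Example B output.
B0, B1 = 2.5156, -0.5212

def fraction_lost(dist):
    """Predicted fraction of eggs lost at a given distance (m) from a gull nest."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * dist)))

# Each extra metre multiplies the odds of an egg being lost by exp(B1).
odds_ratio_per_metre = math.exp(B1)

# Distance at which the linear predictor is 0, i.e. half of the eggs are lost.
half_distance = -B0 / B1

print(f"odds ratio per metre  = {odds_ratio_per_metre:.3f}")
print(f"distance with pi=0.5  = {half_distance:.2f} m")
for d in (0, 2, 4, 6, 8, 10):
    print(f"dist = {d:2d} m   predicted fraction lost = {fraction_lost(d):.3f}")
```

With these estimates the odds of losing an egg fall by roughly 40% per metre, and the predicted loss is one half at a little under 5 m from the nearest Herring gull nest.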

24

[Figure: Predicted values and 95% confidence limits. x-axis: Distance (m) from nearest Herring gull nest (0 to 10); y-axis: Fraction of eggs removed (0.0 to 1.0).]

25

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 52 53.9491 1.0375

Scaled Deviance 52 52.0000 1.0000

Pearson Chi-Square 52 44.1413 0.8489

Scaled Pearson X2 52 42.5465 0.8182

Log Likelihood . -171.3777 .

What is this? (the Log Likelihood value)

26

The likelihood function

27

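A nest contains n eggs of which r are eaten by predators. The probability that a given egg is eaten is denoted π. The probability that exactly r of the eggs are killed is given by the binomial distribution:

$$P(r) = \binom{n}{r} \pi^{r} (1 - \pi)^{n - r}$$

where

$$\pi = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}$$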
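The binomial probability is straightforward to evaluate. The following minimal Python sketch uses a hypothetical nest with n = 3 eggs and π = 0.5 (values chosen for illustration); `binomial_pmf` is a name introduced here.

```python
import math

def binomial_pmf(r, n, pi):
    """Probability that exactly r of n eggs are killed when each is killed with probability pi."""
    return math.comb(n, r) * pi**r * (1.0 - pi)**(n - r)

# Hypothetical nest: 3 eggs, each eaten with probability 0.5.
n, pi = 3, 0.5
probabilities = [binomial_pmf(r, n, pi) for r in range(n + 1)]

for r, p in enumerate(probabilities):
    print(f"P(r = {r}) = {p:.3f}")
print(f"sum = {sum(probabilities):.3f}")
```

The probabilities over r = 0, 1, ..., n sum to 1, as they must for a probability distribution.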

28

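r1 = number of killed eggs out of n1 eggs in the first nest

r2 = number of killed eggs out of n2 eggs in the second nest

ri = number of killed eggs out of ni eggs in the ith nest

$$P(r_1) = \binom{n_1}{r_1} \pi_1^{r_1} (1 - \pi_1)^{n_1 - r_1}$$

$$P(r_2) = \binom{n_2}{r_2} \pi_2^{r_2} (1 - \pi_2)^{n_2 - r_2}$$

$$P(r_3) = \binom{n_3}{r_3} \pi_3^{r_3} (1 - \pi_3)^{n_3 - r_3}$$

$$P(r_i) = \binom{n_i}{r_i} \pi_i^{r_i} (1 - \pi_i)^{n_i - r_i}$$

The probability of observing exactly r1, r2, ..., ri, ... events is P(r1) times P(r2) times P(r3) and so on:

$$L = P(r_1)\, P(r_2)\, P(r_3) \cdots P(r_i) \cdots P(r_k) = \prod_{i=1}^{k} P(r_i)$$

Log-likelihood function:

$$\ln L = \ln P(r_1) + \ln P(r_2) + \ln P(r_3) + \dots + \ln P(r_i) + \dots + \ln P(r_k) = \sum_{i=1}^{k} \ln P(r_i)$$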
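The log-likelihood can be evaluated directly for any choice of parameters. The sketch below is illustrative only: it uses just the seven nests that are printed on the data slide (not the full set of 57) and a model with distance as the only covariate. It compares ln L at β0 = β1 = 0 with ln L at the estimates reported for the reduced Example B model.

```python
import math

# (dist, eggs, killed) for the seven nests shown on the data slide only.
NESTS = [
    (0.5, 3, 3), (1.0, 7, 5), (5.7, 5, 1), (3.8, 9, 6),
    (3.0, 7, 5), (6.1, 8, 3), (3.3, 3, 3),
]

def log_likelihood(b0, b1, nests):
    """Sum of ln P(r_i) over nests, with pi_i from the logistic model."""
    total = 0.0
    for dist, n, r in nests:
        pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * dist)))
        total += (math.log(math.comb(n, r))
                  + r * math.log(pi)
                  + (n - r) * math.log(1.0 - pi))
    return total

ll_null = log_likelihood(0.0, 0.0, NESTS)            # pi = 0.5 for every nest
ll_reported = log_likelihood(2.5156, -0.5212, NESTS)  # estimates from the slides

print(f"ln L at (0, 0)            = {ll_null:.4f}")
print(f"ln L at (2.5156, -0.5212) = {ll_reported:.4f}")
```

The reported estimates give a higher log-likelihood than the null values even on this small subset. The absolute values will not match the Log Likelihood printed by SAS, which is based on all 57 nests.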

29

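Maximum likelihood

The parameters of

$$\pi_i = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}$$

are found as the values that maximize the likelihood of observing exactly r1, r2, ..., ri, ... positive events out of n1, n2, ..., ni, ... events.

The maximum value of L can be found by differentiating L with respect to β0, β1, ..., βp and setting each derivative equal to 0.

Because ln L increases with L, the same parameter values are found by differentiating ln L instead:

$$\frac{\partial \ln L}{\partial \beta_0} = 0, \quad \frac{\partial \ln L}{\partial \beta_1} = 0, \quad \frac{\partial \ln L}{\partial \beta_2} = 0, \quad \dots, \quad \frac{\partial \ln L}{\partial \beta_p} = 0$$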
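The equations obtained by setting the derivatives of ln L to zero have no closed-form solution, so they are solved iteratively. The sketch below uses Newton-Raphson for a model with distance as the only covariate; this is one common approach and is not claimed to be the exact algorithm SAS uses. It is fitted to the seven nests printed on the data slide only, so the estimates will differ from those in the SAS output.

```python
import math

# (dist, eggs, killed) for the seven nests shown on the data slide only.
NESTS = [
    (0.5, 3, 3), (1.0, 7, 5), (5.7, 5, 1), (3.8, 9, 6),
    (3.0, 7, 5), (6.1, 8, 3), (3.3, 3, 3),
]

def score_and_information(b0, b1, nests):
    """First derivatives of ln L and the (negated) second-derivative matrix."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for x, n, r in nests:
        pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        resid = r - n * pi           # observed minus expected killed eggs
        w = n * pi * (1.0 - pi)      # binomial variance of r
        u0 += resid
        u1 += x * resid
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (u0, u1), (i00, i01, i11)

def fit_logistic(nests, tol=1e-10, max_iter=50):
    """Newton-Raphson: repeat beta <- beta + information^-1 * score."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1, nests)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit_logistic(NESTS)
(u0, u1), _ = score_and_information(b0_hat, b1_hat, NESTS)
print(f"beta0 = {b0_hat:.4f}, beta1 = {b1_hat:.4f}")
print(f"derivatives of ln L at the solution: {u0:.2e}, {u1:.2e}")
```

At the solution both derivatives are numerically zero, which is the maximum likelihood condition stated above.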
