logistic regression and odds ratios 818 - lecture 0… · odds ratio used to compare two...

Post on 20-Jun-2020


Logistic Regression and Odds Ratios

Psych 818 - DeShon

Dichotomous Response

Used when the outcome or DV is a dichotomous random variable

Can only take one of two possible values (1, 0)

Pass/Fail

Disease/No Disease

Agree/Disagree

True/False

Present/Absent

This data structure causes problems for OLS regression

Dichotomous Response

Properties of dichotomous response variables (Y)

POSITIVE RESPONSE (Success = 1): p

NEGATIVE RESPONSE (Failure = 0): q = (1 - p)

Observed proportion of successes: $\bar{Y} = p$

Var(Y) = p*q

Oops! Variance depends on the mean
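As a brief sketch of where that variance comes from (standard Bernoulli algebra, not shown on the slide): because Y takes only the values 0 and 1, Y squared equals Y, so the second moment equals the mean.

```latex
\begin{aligned}
E(Y)   &= 1 \cdot p + 0 \cdot q = p, \\
E(Y^2) &= 1^2 \cdot p + 0^2 \cdot q = p, \\
\operatorname{Var}(Y) &= E(Y^2) - [E(Y)]^2 = p - p^2 = p(1 - p) = pq .
\end{aligned}
```

The variance is therefore a function of the mean p alone, and it is largest at p = .5.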

Dichotomous Response

Let’s generate some (0,1) data

Y <- rbinom(n = 1000, size = 1, prob = .3)

mean(Y) = 0.295

μ = .3

var(Y) = 0.208

σ² = (.3 * .7) = .21

hist(Y)

[Figure: Histogram of Y. Horizontal axis: Y, from 0.0 to 1.0; vertical axis: frequency, ticks from 0 to 700.]

Describing Dichotomous Data

Proportion of successes (p)

Odds

Odds of an event is the probability it occurs divided by the probability it does not occur

p/(1-p)

if p = .53; odds = .53/.47 = 1.13

Modeling Y (Categorical X)

Odds Ratio

Used to compare two proportions across groups

odds for males = .53/(1-.53) = 1.13

odds for females = .62/(1-.62) = 1.63

Odds-ratio = 1.63/1.13 = 1.44

The odds of getting a 1 are 1.44 times as large for a female as for a male

Or… 1.13/1.63 = 0.69

The odds of getting a 1 for a male are .69 times the odds for a female

OR > 1: increased odds for group 1 relative to 2

OR = 1: no difference in odds for group 1 relative to 2

OR < 1: lower odds for group 1 relative to 2
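The arithmetic behind the male/female comparison can be sketched in a few lines of R. The variable and function names are illustrative only; the proportions .53 and .62 are the ones used on the slide. Working from unrounded odds gives an odds ratio of about 1.447, slightly different from the figure obtained by dividing the rounded odds.

```r
# Proportions of "1" responses in each group (values from the slide).
p_male   <- 0.53
p_female <- 0.62

# Odds = probability the event occurs / probability it does not occur.
odds <- function(p) p / (1 - p)

odds_male   <- odds(p_male)
odds_female <- odds(p_female)

# Odds ratio in both directions; the two are reciprocals of each other.
or_female_vs_male <- odds_female / odds_male
or_male_vs_female <- odds_male / odds_female

print(c(odds_male = odds_male, odds_female = odds_female,
        or_female_vs_male = or_female_vs_male,
        or_male_vs_female = or_male_vs_female))
```

Swapping which group goes in the numerator inverts the odds ratio, which is why the two directions multiply to 1.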

Modeling Y (Categorical X)

Odds-ratio for a 2 x 2 table

Odds(Hi) = 11/4

Odds(Lo) = 2/6

O.R. = (11/4)/(2/6) = 8.25

Odds of HD are 8.25 times larger for high cholesterol

                       Heart Disease
Cholesterol in Diet     Y      N    Total
Hi                     11      4     15
Lo                      2      6      8
Total                  13     10     23
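A minimal R sketch of the same calculation, using the cell counts of the cholesterol by heart disease table (Hi: 11 with disease, 4 without; Lo: 2 with disease, 6 without). The object names are illustrative and not part of the lecture.

```r
# 2 x 2 table: rows = cholesterol in diet, columns = heart disease (Y/N).
tab <- matrix(c(11, 4,
                 2, 6),
              nrow = 2, byrow = TRUE,
              dimnames = list(Cholesterol  = c("Hi", "Lo"),
                              HeartDisease = c("Y", "N")))

# Odds of heart disease within each cholesterol group.
odds_hi <- tab["Hi", "Y"] / tab["Hi", "N"]
odds_lo <- tab["Lo", "Y"] / tab["Lo", "N"]

# Odds ratio, and the equivalent cross-product form (a*d)/(b*c).
odds_ratio    <- odds_hi / odds_lo
cross_product <- (tab["Hi", "Y"] * tab["Lo", "N"]) /
                 (tab["Hi", "N"] * tab["Lo", "Y"])

print(tab)
print(c(odds_hi = odds_hi, odds_lo = odds_lo, odds_ratio = odds_ratio))
```

The cross-product form is often the more convenient one, since it uses the four cell counts directly.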

Odds-Ratio

Ranges from 0 to infinity

[Figure: number line for the odds ratio, marked at 0 and 1]

Tends to be skewed

Often transform to log-odds to get symmetry

The log-OR comparing females to males = log(1.44) = 0.36

The log-OR comparing males to females = log(1/1.44) = -0.36

Modeling Y (Continuous X)

We need to form a general prediction model

Standard OLS regression won’t work

The errors of a dichotomous variable cannot be normally distributed with constant variance

Also, the estimated parameters don’t make much sense

Let’s look at a scatterplot of dichotomous data…

Dichotomous Scatterplot

What smooth function can we use to model something that looks like this?

Dichotomous Scatterplot

OLS regression? Smooth but…

Dichotomous Scatterplot

Could break X into groups to form a more continuous scale for Y

proportion or percentage scale

Dichotomous Scatterplot

Now, plot the categorized data

Notice the “S” shape? = sigmoid

Notice that we just shifted to a continuous scale?

Dichotomous Scatterplot

We can fit a smooth function by modeling the probability of success (“1”) directly

Model the probability of a ‘1’ rather than the (0,1) data directly

Another Example

Another Example (cont)

Logistic Equation

E(y|x) = $\pi(x)$ = probability that a person with a given x-score will have a score of ‘1’ on Y

$$\pi(x) = \frac{e^{u}}{1 + e^{u}}, \qquad u = \alpha + \beta_1 x$$

Could just expand u to include more predictors for a multiple logistic regression

Logistic Regression

$\alpha$ - shifts the distribution (value of x where $\pi$ = .5)

$\beta_1$ - reflects the steepness of the transition (slope)
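The roles of the two parameters can be illustrated with a short R sketch. The values of `alpha` and `beta` below are hypothetical, chosen only for illustration, and `logistic()` is an illustrative helper rather than anything from the lecture.

```r
# Hypothetical parameter values, for illustration only.
alpha <- -4
beta  <- 0.1

# pi(x) = exp(u) / (1 + exp(u)), with u = alpha + beta * x
logistic <- function(x, alpha, beta) {
  u <- alpha + beta * x
  exp(u) / (1 + exp(u))
}

# The curve crosses .5 where u = 0, that is, at x = -alpha / beta.
x_half <- -alpha / beta
print(logistic(x_half, alpha, beta))

# Probabilities over a grid of x values trace out the S-shaped curve.
x <- seq(0, 80, by = 10)
print(round(logistic(x, alpha, beta), 3))

# A larger slope gives a steeper transition around the same midpoint
# when alpha is rescaled so that -alpha / beta is unchanged.
print(round(logistic(x, alpha * 3, beta * 3), 3))
```

Base R's `plogis()` computes the same function of `u`, so `plogis(alpha + beta * x)` could be used in place of the helper.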

Features of Logistic Regression

Change in probability is not constant (linear) with constant changes in X

probability of a success (Y = 1) given the predictor variable (X) is a non-linear function

Can rewrite the logistic equation as an Odds

$$\frac{\hat{P}(Y = 1 \mid X_i)}{1 - \hat{P}(Y = 1 \mid X_i)} = \frac{\hat{\pi}}{1 - \hat{\pi}} = e^{(b_0 + b_1 X_{1i})}$$

Logit Transform

Can linearize the logistic equation by using the “logit” transformation

apply the natural log to both sides of the equation

Yields the logit or log-odds:

$$\ln\left[\frac{\hat{P}(Y = 1 \mid X)}{1 - \hat{P}(Y = 1 \mid X)}\right] = \ln\left[\frac{\hat{\pi}}{1 - \hat{\pi}}\right] = b_0 + b_1 X_1$$

Logit Transformation

The logit transformation puts the interpretation of the regression estimates back on familiar footing

$b_0$ (intercept) = expected value of the logit (log-odds) when X = 0

$b_1$ (slope) = ‘logit difference’ = the amount the logit (log-odds) changes with a one unit change in X

Logit

Logit: the natural log of the odds

often called a log odds

logit scale is continuous, linear, and functions much like a z-score scale.

p = 0.50, then logit = 0

p = 0.70, then logit = 0.85

p = 0.30, then logit = -0.85
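These logit values can be checked directly in R. Base R's `qlogis()` returns log(p/(1-p)); the hand-written `logit()` helper below is an illustrative equivalent. For p = .70 the logit is approximately 0.847, and for p = .30 it is the same value with the sign reversed.

```r
# Logit (log-odds) of a probability, written out explicitly.
logit <- function(p) log(p / (1 - p))

p <- c(0.50, 0.70, 0.30)

# Hand-written version and the built-in quantile function of the
# standard logistic distribution give the same values.
logit_manual  <- logit(p)
logit_builtin <- qlogis(p)

print(round(cbind(p, logit_manual, logit_builtin), 3))
```

The symmetry around p = .5 is what makes the logit scale behave somewhat like a z-score scale.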

Odds-Ratios and Logistic Regression

The slope may also be interpreted as the log odds-ratio associated with a unit increase in x

exp($\beta$) = odds-ratio

Compare the log odds (logit) of a person with a score of x to a person with a score of x+1

$$\operatorname{logit}(\pi(x)) = \alpha + \beta x$$

$$\operatorname{logit}(\pi(x + 1)) = \alpha + \beta (x + 1) = \alpha + \beta x + \beta$$
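The remaining step, sketched here under the same single-predictor model with intercept $\alpha$ and slope $\beta$, is to subtract the two logits and exponentiate:

```latex
\begin{aligned}
\operatorname{logit}(\pi(x+1)) - \operatorname{logit}(\pi(x))
  &= (\alpha + \beta x + \beta) - (\alpha + \beta x) = \beta, \\[4pt]
\log\left[\frac{\pi(x+1) / (1 - \pi(x+1))}{\pi(x) / (1 - \pi(x))}\right]
  &= \beta, \\[4pt]
\frac{\text{odds at } x+1}{\text{odds at } x}
  &= e^{\beta}.
\end{aligned}
```

The difference does not depend on x, so the odds ratio for a one unit increase is the same everywhere along the scale even though the change in probability is not.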

There and back again…

If the data are consistent with a logistic function, then the relationship between the model and the logit is linear

The logit scale is somewhat difficult to understand

Could interpret as odds but people seem to prefer probability as the natural scale, so…

$$\log\frac{p}{1 - p} = \operatorname{logit}(p) = \alpha + \beta x$$

There and back again…

Logit: $\log\dfrac{p}{1 - p} = \operatorname{logit}(p) = \alpha + \beta x$

Odds: $\dfrac{p}{1 - p} = e^{\alpha + \beta x}$

Probability: $p = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
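The three scales can be traced for a single case in R. The values of `alpha`, `beta`, and `x` are hypothetical and serve only to show the conversions in each direction.

```r
# Hypothetical coefficients and predictor value, for illustration only.
alpha <- -2
beta  <- 0.5
x     <- 3

# Logit scale: linear in x.
logit_value <- alpha + beta * x

# Odds scale: exponentiate the logit.
odds_value <- exp(logit_value)

# Probability scale: odds / (1 + odds).
p <- odds_value / (1 + odds_value)

# And back again: probability -> odds -> logit.
odds_back  <- p / (1 - p)
logit_back <- log(odds_back)

print(c(logit = logit_value, odds = odds_value, probability = p,
        logit_back = logit_back))
```

In practice `plogis()` and `qlogis()` perform the logit-to-probability and probability-to-logit steps in one call each.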

Estimation

Don’t meet OLS assumptions so some variant of MLE is used

Let’s develop the likelihood

Assuming observations are independent…

$$p(y_i = 1) = \pi_i$$

$$p(y_i = 0) = 1 - \pi_i$$

$$\text{pdf}: \quad f_i(y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}; \qquad y_i = 0, 1; \quad i = 1, 2, \ldots, n$$

$$\text{joint pdf}: \quad \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

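Estimation

Likelihood

recall..

$$\text{joint pdf}: \quad \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

$$\text{log transform} = \sum_{i=1}^{n} \left[ y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) \right] + \sum_{i=1}^{n} \log(1 - \pi_i)$$

$$\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta x_i$$

$$1 - \pi_i = \frac{1}{1 + \exp(\alpha + \beta x_i)}$$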

Estimation

Upon substitution…

$$\log l = l(\alpha, \beta) = \sum_{i=1}^{n} y_i (\alpha + \beta x_i) - \sum_{i=1}^{n} \log[1 + \exp(\alpha + \beta x_i)]$$
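As a rough sketch of what maximizing this log-likelihood looks like in practice, the R code below simulates data, maximizes l(α, β) numerically with `optim()`, and compares the result with `glm()`. The data are simulated, and `alpha_true`, `beta_true`, the sample size, and the seed are arbitrary assumptions; this is not the estimation routine used in the lecture (`glm()` itself uses Fisher scoring).

```r
set.seed(818)

# Simulated data; the true parameter values are arbitrary illustration values.
n <- 500
x <- rnorm(n)
alpha_true <- -0.5
beta_true  <- 1
y <- rbinom(n, size = 1, prob = plogis(alpha_true + beta_true * x))

# Log-likelihood from the slide:
# sum(y_i * (alpha + beta * x_i)) - sum(log(1 + exp(alpha + beta * x_i)))
loglik <- function(par, x, y) {
  eta <- par[1] + par[2] * x
  sum(y * eta) - sum(log1p(exp(eta)))
}

# optim() minimizes by default; fnscale = -1 makes it maximize instead.
fit_optim <- optim(par = c(0, 0), fn = loglik, x = x, y = y,
                   method = "BFGS", control = list(fnscale = -1))

# Reference fit using R's built-in logistic regression.
fit_glm <- glm(y ~ x, family = binomial)

print(rbind(optim = fit_optim$par, glm = unname(coef(fit_glm))))
```

The two sets of estimates should agree closely, since both are maximizing the same likelihood.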

Example

Heart Disease & Age

100 participants

DV = presence of heart disease

IV = Age

Heart Disease Example

[Figure: heart disease example plot; vertical axis ticks from 0.0 to 1.0]

Heart Disease Example

library(MASS)

glm(formula = y ~ x, family = binomial, data = mydata)

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -5.30945 1.13365 -4.683 2.82e-06 ***

age 0.11092 0.02406 4.610 4.02e-06 ***

Null deviance: 136.66 on 99 degrees of freedom

Residual deviance: 107.35 on 98 degrees of freedom

AIC: 111.35

Number of Fisher Scoring iterations: 4

Heart Disease Example

Logistic regression

$$\hat{\pi}(x) = \frac{e^{-5.31 + .111(x)}}{1 + e^{-5.31 + .111(x)}}$$

Odds-Ratio

exp(.111) = 1.117

[Figure: fitted logistic curve for the heart disease example; vertical axis ticks from 0.1 to 0.7]
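The fitted equation can be evaluated in R using the coefficient estimates reported in the glm output (-5.30945 and 0.11092). The ages in `ages` are hypothetical values picked for illustration, and the object names are not from the lecture.

```r
# Coefficient estimates taken from the reported glm output.
b0 <- -5.30945
b1 <-  0.11092

# Odds ratio for a one-year increase in age.
odds_ratio_1yr <- exp(b1)

# Predicted probability of heart disease at a few hypothetical ages.
ages  <- c(30, 40, 50, 60)
p_hat <- plogis(b0 + b1 * ages)

# Age at which the predicted probability equals .5 (where the logit is 0).
age_half <- -b0 / b1

print(round(odds_ratio_1yr, 3))
print(round(data.frame(age = ages, p_hat = p_hat), 3))
print(round(age_half, 1))
```

Each additional year multiplies the odds of heart disease by the same factor, while the change in predicted probability depends on where along the curve the comparison is made.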

Heart Disease Example

In terms of logits…

[Figure: heart disease example on the logit scale; vertical axis ticks from -3 to 0]
