binary logistic regression

Binary Logistic Regression “To be or not to be, that is the question..”(William Shakespeare, “Hamlet”)

Upload: stian

Post on 18-Jan-2016


Page 1: Binary Logistic Regression

Binary Logistic Regression

“To be or not to be, that is the question.” (William Shakespeare, “Hamlet”)

Page 2: Binary Logistic Regression

Binary Logistic Regression

Also known as “logistic” or sometimes “logit” regression

Foundation from which more complex models are derived, e.g., multinomial regression and ordinal logistic regression

Page 3: Binary Logistic Regression

Dichotomous Variables

Two categories indicating whether an event has occurred or some characteristic is present

Sometimes called “binary” or “binomial” variables

Page 4: Binary Logistic Regression

Dichotomous DVs

Placed in foster care or not

Diagnosed with a disease or not

Abused or not

Pregnant or not

Service provided or not

Page 5: Binary Logistic Regression

Single (Dichotomous) IV Example

DV = continue fostering, 0 = no, 1 = yes

Customary to code category of interest 1 and the other category 0

IV = married, 0 = not married, 1 = married

N = 131 foster families

Are two-parent families more likely to continue fostering than one-parent families?

Page 6: Binary Logistic Regression

Crosstabulation

Table 2.1

Relationship between marital status and continuation is statistically significant [χ²(1, N = 131) = 5.65, p = .017]

A higher percentage of two-parent families (62.20%) than single-parent families (40.82%) planned to continue fostering
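The chi-square test above can be checked by hand from the cell counts. Table 2.1 is not reproduced in this transcript, so the counts below are an assumption: they are reconstructed from the reported percentages and N = 131 (20 of 49 one-parent families and 51 of 82 two-parent families planning to continue). The function names are hypothetical, and the p-value shortcut applies only to 1 degree of freedom. This is a minimal sketch, not the original analysis.

```python
import math

def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def chi2_p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

# ASSUMED counts, reconstructed from 40.82%, 62.20%, and N = 131.
# Rows: one-parent, two-parent. Columns: not continue, continue.
counts = [[29, 20], [31, 51]]

chi2 = pearson_chi2_2x2(counts)
p_value = chi2_p_value_df1(chi2)
print(f"chi2(1, N = {sum(map(sum, counts))}) = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumed counts the result should agree with the reported statistic to rounding.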

Page 7: Binary Logistic Regression

Strength & Direction of Relationships

Different ways to quantify the relationship between IV(s) and DV

Probabilities

Odds

Odds Ratio (OR)

• Also abbreviated as e^B, Exp(B) (on SPSS output), or exp(B)

% change

Page 8: Binary Logistic Regression

Roadmap to Computations

Probabilities

Odds
• p / (1 - p)

Odds Ratios
• Odds(1) / Odds(0)

% change
• 100(OR - 1)

Page 9: Binary Logistic Regression

Probabilities

Percentages in Table 2.1 as probabilities (e.g., 62.20% as .6220)

p
• Probability that event will occur (continue)
• e.g., probability that one-parent families plan to continue is .4082

1 – p
• Probability that event will not occur (not continue)
• e.g., probability that one-parent families do not plan to continue is .5918 (1 - .4082)

Page 10: Binary Logistic Regression

Odds

Ratio of probability that event will occur to probability that it will not

e.g., odds of continuation for one-parent families are .69 (.4082 / .5918)

Can range from 0 to positive infinity

$$\text{odds} = \frac{p}{1 - p}$$

Page 11: Binary Logistic Regression

Probabilities and Odds

Table 2.2

Odds = 1
• Both outcomes equally likely

Odds > 1
• Probability that event will occur greater than probability that it will not

Odds < 1
• Probability that event will occur less than probability that it will not

Page 12: Binary Logistic Regression

Odds Ratio (OR)

Odds of the event for one value of the IV (two-parent families) divided by the odds for a different value of the IV, usually a value one unit lower (one-parent families)

e.g., odds of continuing for two-parent families more than double the odds for one-parent families
• OR = 1.6455 / .6898 = 2.39
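The path from probabilities to odds to an odds ratio can be sketched in a few lines. The probabilities are the ones reported on the slides (.4082 and .6220); the function and variable names are illustrative only.

```python
def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

p_one_parent = 0.4082   # probability of continuing, one-parent families
p_two_parent = 0.6220   # probability of continuing, two-parent families

odds_one_parent = odds(p_one_parent)
odds_two_parent = odds(p_two_parent)

# Odds ratio: odds for the higher-coded group divided by odds for the lower-coded group
odds_ratio = odds_two_parent / odds_one_parent

print(f"odds (one-parent) = {odds_one_parent:.4f}")
print(f"odds (two-parent) = {odds_two_parent:.4f}")
print(f"OR = {odds_ratio:.2f}")
```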

Page 13: Binary Logistic Regression

OR (cont’d)

Plays a central role in quantifying the strength and direction of relationships between IVs and DVs in binary, multinomial, and ordinal logistic regression

OR < 1 indicates a negative relationship

OR > 1 indicates a positive relationship

OR = 1 indicates no linear relationship

Page 14: Binary Logistic Regression

ORs > 1

e.g., OR of 2.39

A one-unit increase in the independent variable increases the odds of continuing by a factor of 2.39

The odds of continuing are 2.39 times higher for two-parent compared to one-parent families

Page 15: Binary Logistic Regression

ORs < 1

e.g., OR = .50

A one-unit increase in the independent variable decreases the odds of continuing by a factor of .50

The odds that two-parent families will continue are .50 (or one-half) of the odds that one-parent families will continue

Page 16: Binary Logistic Regression

ORs < 1 (cont’d)

Compute reciprocal (i.e., 1 / .50 = 2.00)

Express relationship as opposite event of interest (e.g., discontinuing)

A one-unit increase in the independent variable increases the odds of discontinuing by a factor of 2.00

The odds that two-parent families will discontinue are 2.00 times (or twice) the odds of one-parent families

Page 17: Binary Logistic Regression

OR to Percentage Change

% change = 100(OR – 1)

Alternative way to express OR

e.g., A one-unit increase in the independent variable increases the odds of continuing by 139.00%

• 100(2.39 – 1) = 139.00

e.g., A one-unit increase in the independent variable decreases the odds of continuing by 50.00%

• 100(.50 – 1) = -50.00

Page 18: Binary Logistic Regression

Comparing OR > 1 and OR < 1

Compute reciprocal of one of the ORs

e.g., OR of 2.00 and an OR of .50

Reciprocal of .50 is 2.00 (1 / .50 = 2.00)

ORs are equal in size (but not in direction of the relationship)
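The percentage-change formula and the reciprocal trick for comparing ORs on either side of 1 can be sketched as follows. The helper names are hypothetical; the OR values are the examples used on the slides.

```python
import math

def percent_change(odds_ratio):
    """Percentage change in the odds for a one-unit increase in the IV."""
    return 100 * (odds_ratio - 1)

def reciprocal(odds_ratio):
    """Re-express an OR in terms of the opposite event (e.g., discontinuing)."""
    return 1 / odds_ratio

print(f"OR 2.39 -> {percent_change(2.39):.2f}% change in odds")
print(f"OR 0.50 -> {percent_change(0.50):.2f}% change in odds")

# An OR of .50 and an OR of 2.00 are equal in size but opposite in direction
same_size = math.isclose(reciprocal(0.50), 2.00)
print(f"1 / .50 = {reciprocal(0.50):.2f}; same size as OR 2.00: {same_size}")
```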

Page 19: Binary Logistic Regression

Qualitative Descriptors for OR

Table 2.3

Use cautiously with IVs that aren’t dichotomous

Page 20: Binary Logistic Regression

Question & Answer

Are two-parent families more likely to continue fostering than one-parent families?

Yes. The odds of continuing are 2.39 times (139%) higher for two-parent compared to one-parent families. The probability of continuing is .41 for one-parent families and .62 for two-parent families.

Page 21: Binary Logistic Regression

Binary Logistic Regression Example

DV = continue fostering, 0 = no, 1 = yes

Customary to code category of interest 1 and the other category 0

IV = married, 0 = not married, 1 = married

N = 131 foster families

Are two-parent families more likely to continue fostering than one-parent families?

Page 22: Binary Logistic Regression

Statistical Significance

Table 2.4

Relationship between marital status and continuation is statistically significant (Wald χ² = 5.544, p = .019)

Page 23: Binary Logistic Regression

Direction of Relationship

B = slope

Positive slope, positive relationship
• OR > 1

Negative slope, negative relationship
• OR < 1

0 slope, no linear relationship
• OR = 1

Page 24: Binary Logistic Regression

Direction/Strength of Relationship

Positive relationship between marital status and continuation

Two-parent families more likely to continue

B = .869

Exp(B) = OR = 2.385

• % change = 100(2.385 - 1) = 139%

The odds of continuing are 2.39 times (139%) higher for two-parent compared to one-parent families

Page 25: Binary Logistic Regression

Roadmap to Computations

Logits
• ln(p / (1 – p)) = L
• L is short for ln(p / (1 – p))

Odds
• e^L

Probabilities
• e^L / (1 + e^L)

Odds Ratios
• Odds(1) / Odds(0)

% change
• 100(OR - 1)

Page 26: Binary Logistic Regression

Binary Logistic Regression Model

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$, or

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \eta$$

π is the probability of the event

η (eta) is the abbreviation for the linear predictor (right hand side of this equation)

k = number of independent variables

Page 27: Binary Logistic Regression

Logit Link

ln(π / (1 - π))

Log of the odds that the DV equals 1 (event occurs)

Connects (i.e., links) DV to linear combination of IVs

Page 28: Binary Logistic Regression

Estimated Logits (L)

$$\ln\left(\frac{p}{1 - p}\right) = a + B_1 X_1 + B_2 X_2 + \dots + B_k X_k$$

ln(p / (1 – p))
• Log of the odds that the DV equals 1 (event occurs)

Estimated logit, L
• Does not have intuitive or substantive meaning
• Useful for examining curvilinear relationships and interaction effects
• Primarily useful for estimating probabilities, odds, and ORs

Page 29: Binary Logistic Regression

Estimated Logits (L)

L(Continue) = a + B_Married X_Married

L(Continue) = -.372 + (.869)(X_Married)

a = intercept

B = slope

Page 30: Binary Logistic Regression

Logit to Odds

If L = 0: Odds = e^L = e^0 = 1.00

If L = .50: Odds = e^L = e^.50 = 1.65

If L = 1.00: Odds = e^L = e^1.00 = 2.72

Page 31: Binary Logistic Regression

Logits to Odds (cont’d)

Table 2.4

One-parent families
• L(Continue) = -.372 = -.372 + (.869)(0)
• Odds of continuing = e^-.372 = .69

Two-parent families
• L(Continue) = .497 = -.372 + (.869)(1)
• Odds of continuing = e^.497 = 1.65

Page 32: Binary Logistic Regression

Odds to OR

OR = 1.65 / .69 = 2.39, or

e^.869 = 2.39, labeled Exp(B)

Table 2.4
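The logit-to-odds-to-OR steps can be retraced with the reported coefficients (a = -.372, B = .869). This is a minimal sketch with illustrative names; because the coefficients are rounded to three decimals, results may differ from the slide values in the second decimal place.

```python
import math

A = -0.372   # intercept, as reported on the slides
B = 0.869    # slope for married (0 = not married, 1 = married)

def logit(married):
    """Estimated logit L(Continue) for a given value of the married IV."""
    return A + B * married

odds_one_parent = math.exp(logit(0))   # e^L for one-parent families
odds_two_parent = math.exp(logit(1))   # e^L for two-parent families

# The ratio of the two odds equals e^B, which SPSS labels Exp(B)
odds_ratio = odds_two_parent / odds_one_parent

print(f"odds (one-parent) = {odds_one_parent:.2f}")
print(f"odds (two-parent) = {odds_two_parent:.2f}")
print(f"OR = {odds_ratio:.2f}, exp(B) = {math.exp(B):.2f}")
```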

Page 33: Binary Logistic Regression

OR to Percentage Change

% change = 100(OR – 1)

e.g., A one-unit increase in the independent variable increases the odds of continuing by 139.00%

• 100(2.39 – 1) = 139.00

e.g., A one-unit increase in the independent variable decreases the odds of continuing by 50.00%

• 100(.50 – 1) = -50.00

Page 34: Binary Logistic Regression

Logits to Probabilities

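One-parent families, L(Continue) = -.372

Two-parent families, L(Continue) = .497

$$p_{(\text{Continue})} = \frac{e^{L}}{1 + e^{L}}$$

One-parent families:

$$p_{(\text{Continue})} = \frac{e^{-.372}}{1 + e^{-.372}} = .41$$

Two-parent families:

$$p_{(\text{Continue})} = \frac{e^{.497}}{1 + e^{.497}} = .62$$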
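The conversion from logits to probabilities, p = e^L / (1 + e^L), can be sketched with the two logits given on this slide. The function name is hypothetical.

```python
import math

def logit_to_probability(logit):
    """Inverse of the logit link: p = e^L / (1 + e^L)."""
    return math.exp(logit) / (1 + math.exp(logit))

p_one_parent = logit_to_probability(-0.372)
p_two_parent = logit_to_probability(0.497)

print(f"p(Continue), one-parent = {p_one_parent:.2f}")
print(f"p(Continue), two-parent = {p_two_parent:.2f}")
```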

Page 35: Binary Logistic Regression

Question & Answer

Are two-parent families more likely to continue fostering than one-parent families?

Yes. The odds of continuing are 2.39 times (139%) higher for two-parent compared to one-parent families. The probability of continuing is .41 for one-parent families and .62 for two-parent families.

Page 36: Binary Logistic Regression

Single (Quantitative) IV Example

DV = continue fostering, 0 = no, 1 = yes
• Customary to code category of interest 1 and other category 0

IV = number of resources

N = 131 foster families

Are foster families with more resources more likely to continue fostering?

Page 37: Binary Logistic Regression

Statistical Significance

Table 2.5

Relationship between resources and continuation is statistically significant (Wald χ² = 4.924, p = .026)

H0: β = 0, β ≥ 0, β ≤ 0, same as H0: OR = 1, OR ≥ 1, OR ≤ 1

Likelihood ratio χ² better than Wald

Page 38: Binary Logistic Regression

Direction/Strength of Relationship

Positive relationship between resources and continuation

Families with more resources are more likely to continue

B = .212

Exp(B) = OR = 1.237

• % change = 100(1.237 – 1) = 24%

The odds of continuing are 1.24 times (24%) higher for each additional resource

Page 39: Binary Logistic Regression

Estimated Logits

L(Continue) = -1.227 + (.212)(X)
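The estimated equation can be turned into logits, odds, and probabilities for any number of resources. The sketch below uses the rounded coefficients shown above and covers resources 1 through 11; values may differ from the slide figures in the second decimal place because of that rounding. The function name is illustrative.

```python
import math

# Coefficients as reported: L(Continue) = -1.227 + (.212)(X)
INTERCEPT = -1.227
SLOPE = 0.212

def predict(resources):
    """Return (logit, odds, probability) for a given number of resources."""
    logit = INTERCEPT + SLOPE * resources
    odds = math.exp(logit)
    probability = odds / (1 + odds)
    return logit, odds, probability

for x in range(1, 12):
    logit, odds, probability = predict(x)
    print(f"{x:>2}  logit={logit:6.2f}  odds={odds:5.2f}  p={probability:4.2f}")
```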

Page 40: Binary Logistic Regression

Figures

Resources.xls

Page 41: Binary Logistic Regression

Effect of Resources on Continuation (Logits)

[Figure: logits (vertical axis) plotted against resources (horizontal axis)]

Resources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

Logits: -1.01, -0.80, -0.59, -0.38, -0.16, 0.05, 0.26, 0.47, 0.68, 0.90, 1.11

Page 42: Binary Logistic Regression

Effect of Resources on Continuation (Odds)

[Figure: odds (vertical axis) plotted against resources (horizontal axis)]

Resources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

Odds: 0.36, 0.45, 0.55, 0.69, 0.85, 1.05, 1.30, 1.60, 1.98, 2.45, 3.03

Page 43: Binary Logistic Regression

Effect of Resources on Continuation (Probabilities)

[Figure: probabilities (vertical axis) plotted against resources (horizontal axis)]

Resources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

Probabilities: 0.27, 0.31, 0.36, 0.41, 0.46, 0.51, 0.56, 0.62, 0.66, 0.71, 0.75

Page 44: Binary Logistic Regression

Question & Answer

Are foster families with more resources more likely to continue fostering?

Yes. The odds of continuing are 1.24 times (24%) higher for each additional resource. The probability of continuing is .31 for families with 2 resources, .51 for families with 6 resources, and .71 for families with 10 resources.

Page 45: Binary Logistic Regression

Relationship of Linear Predictor to Logits, Odds & p

Relationship between linear predictor and logits is linear

Relationship between linear predictor and odds is non-linear

Relationship between linear predictor and p is non-linear

Challenge is to summarize changes in odds and probabilities associated with changes in IVs in the most meaningful and parsimonious way
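A short numeric sketch makes the linear versus non-linear distinction concrete. The linear predictor values below are arbitrary illustrative points, not data from the study. Each one-unit step adds the same amount to the logit and multiplies the odds by the same factor (e), but changes the probability by a different amount depending on where the step starts.

```python
import math

etas = [-2.0, -1.0, 0.0, 1.0, 2.0]          # equally spaced linear predictor values
odds = [math.exp(eta) for eta in etas]       # odds = e^eta
probs = [o / (1 + o) for o in odds]          # p = odds / (1 + odds)

# Ratio of successive odds: constant, so the odds change multiplicatively
odds_ratios = [b / a for a, b in zip(odds, odds[1:])]

# Difference of successive probabilities: not constant
prob_steps = [b - a for a, b in zip(probs, probs[1:])]

print("odds ratios per step:", [round(r, 3) for r in odds_ratios])
print("probability change per step:", [round(s, 3) for s in prob_steps])
```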

Page 46: Binary Logistic Regression

Logit as Function of Linear Predictor

[Figure: logit (vertical axis, -3.00 to 3.00) plotted against the linear predictor (horizontal axis, -3.00 to 3.00)]

Page 47: Binary Logistic Regression

Odds as Function of Linear Predictor

[Figure: odds (vertical axis, .00 to 21.00) plotted against the linear predictor (horizontal axis, -3.00 to 3.00)]

Page 48: Binary Logistic Regression

Probabilities as Function of Linear Predictor

[Figure: probability (vertical axis, .00 to 1.00) plotted against the linear predictor (horizontal axis, -3.00 to 3.00)]

Page 49: Binary Logistic Regression

IVs to z-scores

z-scores (standard scores)

Only the IV (not DV)--semi-standardized slopes

One-unit increase in the IV refers to a one-standard-deviation increase

OR interpreted as expected change in the odds associated with a one standard deviation increase in the IV

Conversion to z-scores changes intercept, slope, and OR, but not associated test statistics

Table 2.6 (compare to Table 2.5)

Page 50: Binary Logistic Regression

Figures

zResources.xls

Page 51: Binary Logistic Regression

Effect of zResources on Continuation (Probabilities)

[Figure: probabilities (vertical axis) plotted against standardized resources (horizontal axis)]

Standardized Resources: -3, -2, -1, 0, 1, 2, 3

Probabilities: 0.26, 0.34, 0.44, 0.54, 0.64, 0.73, 0.80

Page 52: Binary Logistic Regression

Question & Answer

Are foster families with more resources more likely to continue fostering?

Yes. The odds of continuing are 1.51 times (51%) higher for each one standard deviation (1.93) increase in resources. The probability of continuing is .34 for families with resources two standard deviations below the mean, .54 for families with the mean number of resources (6.60), and .73 for families with resources two standard deviations above the mean.
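The standardized results can be approximated from the raw-score model. Table 2.6 is not reproduced in this transcript, so the standardized coefficients below are derived rather than copied. The derivation uses the raw coefficients reported earlier (a = -1.227, B = .212), the mean (6.60) and standard deviation (1.93) quoted above, and the usual identities for a z-scored IV: slope × SD and intercept + slope × mean. Names are illustrative and small rounding differences are expected.

```python
import math

A_RAW, B_RAW = -1.227, 0.212   # raw-score intercept and slope (reported)
MEAN, SD = 6.60, 1.93          # mean and SD of resources (reported)

# Derived (not taken from Table 2.6): coefficients for the z-scored IV
b_z = B_RAW * SD               # slope per one-SD increase
a_z = A_RAW + B_RAW * MEAN     # logit at the mean (z = 0)
or_z = math.exp(b_z)           # OR per one-SD increase

def probability(z):
    """Estimated probability of continuing at z standard deviations from the mean."""
    logit = a_z + b_z * z
    return math.exp(logit) / (1 + math.exp(logit))

print(f"OR per SD = {or_z:.2f}")
for z in (-2, 0, 2):
    print(f"z = {z:+d}: p = {probability(z):.2f}")
```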

Page 53: Binary Logistic Regression

IVs Centered

Centering

Typically center on mean

Useful when testing interactions, curvilinear relationships, or when no meaningful 0 point (e.g., no family with 0 resources)

Centering doesn’t change slope, OR, or associated test statistics, but does change the intercept

Table 2.7 (compare to Table 2.5)
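That centering moves only the intercept can be checked numerically. Table 2.7 is not reproduced in this transcript, so the centered intercept below is derived from the raw coefficients (a = -1.227, B = .212) and the mean of 6.60 quoted on the Q&A slides. The function names are hypothetical.

```python
import math

A_RAW, B_RAW, MEAN = -1.227, 0.212, 6.60

# Derived intercept for the mean-centered IV; the slope is unchanged
a_centered = A_RAW + B_RAW * MEAN

def inv_logit(logit):
    return math.exp(logit) / (1 + math.exp(logit))

def p_raw(resources):
    """Probability from the raw-score model."""
    return inv_logit(A_RAW + B_RAW * resources)

def p_centered(c_resources):
    """Probability from the centered model (c_resources = resources - mean)."""
    return inv_logit(a_centered + B_RAW * c_resources)

# Same family described two ways: 4 resources above the mean
print(f"raw model:      p = {p_raw(MEAN + 4):.2f}")
print(f"centered model: p = {p_centered(4):.2f}")
print(f"OR per resource (both models) = {math.exp(B_RAW):.2f}")
```

Both models give identical predictions; only the meaning of the intercept changes, from the logit at 0 resources to the logit at the mean.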

Page 54: Binary Logistic Regression

Figures

cResources.xls

Page 55: Binary Logistic Regression

Effect of cResources on Continuation (Probabilities)

[Figure: probabilities (vertical axis) plotted against centered resources (horizontal axis)]

Centered Resources: -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5

Probabilities: 0.29, 0.34, 0.39, 0.44, 0.49, 0.54, 0.60, 0.65, 0.69, 0.74, 0.77

Page 56: Binary Logistic Regression

Question & Answer

Are foster families with more resources more likely to continue fostering?

Yes. The odds of continuing are 1.24 times (24%) higher for each additional resource. The probability of continuing is .34 for families with 4 resources below the mean, .54 for families with the mean number of resources (6.60), and .74 for families with 4 resources above the mean.

Page 57: Binary Logistic Regression

Multiple IV Example

DV = continue fostering, 0 = no, 1 = yes
• Customary to code the category of interest as 1 and the other category as 0

IV = married, 0 = not married, 1 = married

IV = number of resources (z-scores)

N = 131 foster families

Are foster families with more resources more likely to continue fostering, controlling for marital status?

Page 58: Binary Logistic Regression

Statistical Significance

Table 2.12

Relationship between set of IVs and continuation is statistically significant (χ² = 6.58, p = .037)

H0: β1 = β2 = βk = 0, same as H0: ψ1 = ψ2 = ψk = 1

ψ (psi) is symbol for population value of OR

Page 59: Binary Logistic Regression

Statistical Significance (cont’d)

Table 2.13

Relationship between resources and continuation is not statistically significant, controlling for marital status (χ² = .92, p = .338)

Relationship between marital status and continuation is not statistically significant, controlling for resources (χ² = 1.42, p = .234)

H0: β = 0, β ≥ 0, β ≤ 0, same as H0: ψ = 1, ψ ≥ 1, ψ ≤ 1

ψ (psi) is symbol for population value of OR

Likelihood ratio χ² better than Wald

Page 60: Binary Logistic Regression

Statistical Significance (cont’d)

Table 2.9

Relationship between resources and continuation is not statistically significant, controlling for marital status (χ² = .91, p = .340)

Relationship between marital status and continuation is not statistically significant, controlling for resources (χ² = 1.41, p = .235)

H0: β = 0, β ≥ 0, β ≤ 0, same as H0: ψ = 1, ψ ≥ 1, ψ ≤ 1

ψ (psi) is symbol for population value of OR

Wald χ², but likelihood ratio χ² better

Page 61: Binary Logistic Regression

Estimated Logits

L(Continue) = -.183 + (.228)(X_zResources) + (.570)(X_Married)

Page 62: Binary Logistic Regression

ORs & Percentage Change

OR_zResources = 1.256 (ns)
• The odds of continuing are 1.26 times (26%) higher for each one standard deviation (1.93) increase in resources, controlling for marital status

OR_Married = 1.769 (ns)
• The odds of continuing are 1.77 times (77%) higher for two-parent compared to one-parent families, controlling for resources
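The two-IV equation can be used to recover both ORs and the predicted probabilities for any combination of marital status and standardized resources. This is a minimal sketch using the rounded coefficients reported above (-.183, .228, .570); the names are illustrative.

```python
import math

INTERCEPT = -0.183
B_ZRESOURCES = 0.228   # slope for resources in z-score units
B_MARRIED = 0.570      # slope for married (0 = not married, 1 = married)

def probability(z_resources, married):
    """Estimated probability of continuing for one combination of IV values."""
    logit = INTERCEPT + B_ZRESOURCES * z_resources + B_MARRIED * married
    return math.exp(logit) / (1 + math.exp(logit))

or_zresources = math.exp(B_ZRESOURCES)   # OR per SD of resources, holding married constant
or_married = math.exp(B_MARRIED)         # OR for married, holding resources constant

print(f"OR zResources = {or_zresources:.3f}, OR Married = {or_married:.3f}")
for married, label in ((0, "One-Parent"), (1, "Two-Parent")):
    row = "  ".join(f"{probability(z, married):.2f}" for z in (-2, 0, 2))
    print(f"{label}: {row}   (z = -2, 0, +2)")
```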

Page 63: Binary Logistic Regression

Figures

Married & zResources.xls

Page 64: Binary Logistic Regression

Effect of Resources and Marital Status on Plans to Continue Fostering (Odds)

[Figure: odds (vertical axis) plotted against standardized resources (horizontal axis), one series each for one-parent and two-parent families]

Standardized Resources: -3, -2, -1, 0, 1, 2, 3

One-Parent: 0.42, 0.53, 0.66, 0.83, 1.05, 1.31, 1.65

Two-Parent: 0.74, 0.93, 1.17, 1.47, 1.85, 2.32, 2.92

Page 65: Binary Logistic Regression

Effect of Resources and Marital Status on Plans to Continue Fostering (Probabilities)

[Figure: probabilities (vertical axis) plotted against standardized resources (horizontal axis), one series each for one-parent and two-parent families]

Standardized Resources: -3, -2, -1, 0, 1, 2, 3

One-Parent: 0.30, 0.35, 0.40, 0.45, 0.51, 0.57, 0.62

Two-Parent: 0.43, 0.48, 0.54, 0.60, 0.65, 0.70, 0.74

Page 66: Binary Logistic Regression

Presenting Odds and Probabilities in Tables

Tables 2.10 and 2.11

Page 67: Binary Logistic Regression

Question & Answer

Are foster families with more resources more likely to continue fostering, controlling for marital status?

No (ns). The odds of continuing are 1.26 times (26%) higher for each one standard deviation (1.93) increase in resources, controlling for marital status.

Cont’d

Page 68: Binary Logistic Regression

Question & Answer (cont’d)

For one-parent families the probability of continuing is .35 for families with resources two standard deviations below the mean, .45 for families with the mean number of resources, and .57 for families with resources two standard deviations above the mean. For two-parent families the probability of continuing is .48 for families with resources two standard deviations below the mean, .60 for families with the mean number of resources, and .70 for families with resources two standard deviations above the mean.

Page 69: Binary Logistic Regression

Comparing the Relative Strength of IVs

Size of slope and OR depend on how the IV is measured

When IVs measured the same way (e.g., two dichotomous IVs or two continuous IVs transformed to z-scores) relative strength can be compared

Nothing comparable to standardized slope (Beta)

Page 70: Binary Logistic Regression

Nested Models

[Diagram: hierarchy of nested models]

Three-IV model: IV1, IV2, IV3

Two-IV models: (IV1, IV2); (IV2, IV3); (IV1, IV3)

One-IV models: IV1; IV2; IV3

Page 71: Binary Logistic Regression

Nested Models (cont’d)

One regression model is nested within another if it contains a subset of variables included in the model within which it’s nested, and same cases are analyzed in both models

The more complex model called the “full model”

The nested model called the “reduced model”

Comparison of full and reduced models allows you to examine whether one or more variable(s) in the full model contribute to explanation of the DV

Page 72: Binary Logistic Regression

Sequential Entry of IVs

Used to compare full and reduced models
• e.g., family resources entered first, and then marital status

F change used in linear regression

Page 73: Binary Logistic Regression

Sequential Entry of IVs (cont’d)

SPSS GZLM doesn’t allow sequential entry of IVs
• Estimate models separately and compare omnibus likelihood ratio χ² values

Reduced model χ²(1) = 5.168

Full model χ²(2) = 6.585

χ² difference = 6.585 – 5.168 = 1.417

df difference = 2 – 1 = 1

p = .234

Chi-square Difference.xls
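The chi-square difference test between the full and reduced models can be sketched with the standard library alone. The chi-square values are those reported on this slide. The p-value line relies on an identity that holds only when the difference has 1 degree of freedom, so it is not a general-purpose routine.

```python
import math

chi2_reduced, df_reduced = 5.168, 1   # resources only
chi2_full, df_full = 6.585, 2         # resources + marital status

chi2_diff = chi2_full - chi2_reduced
df_diff = df_full - df_reduced

# Upper-tail chi-square probability; this erfc identity is valid ONLY for df = 1
if df_diff != 1:
    raise ValueError("this shortcut assumes a 1-df difference")
p_value = math.erfc(math.sqrt(chi2_diff / 2))

print(f"chi2 difference = {chi2_diff:.3f}, df = {df_diff}, p = {p_value:.3f}")
```

A non-significant difference here indicates that adding marital status does not improve on the resources-only model, which matches the per-variable likelihood ratio test reported earlier.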

Page 74: Binary Logistic Regression

Assumptions Necessary for Testing Hypotheses

No assumptions unique to binary logistic regression other than ones discussed in GZLM lecture

Page 75: Binary Logistic Regression

Model Evaluation

Evaluate your model before you test hypotheses or interpret substantive results
• Outliers
• Analogs of R²

Page 76: Binary Logistic Regression

Outliers

Atypical cases

Can lead to flawed conclusions

Can provide theoretical insights

Common causes
• Data entry errors
• Model misspecification
• Rare events

Page 77: Binary Logistic Regression

Outliers (cont’d)

Leverage

Residuals
• Standardized or unstandardized deviance residuals

Influence
• Cook’s D

Page 78: Binary Logistic Regression

Leverage

Think of a seesaw

Leverage value for each case

Cases with greater leverage can exert a disproportionately large influence

No clear benchmarks
• Identify cases with substantially different leverage values than those of other cases

Page 79: Binary Logistic Regression

Residuals

Difference between actual and estimated values of the DV for a case

Residual for each case

Large residual indicates a case for which model fits poorly

Page 80: Binary Logistic Regression

Residuals (cont’d)

Standardized or unstandardized deviance residuals
• Not normally distributed
• Values less than -2 or greater than +2 warrant some concern
• Values less than -3 or greater than +3 merit close inspection
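The slides do not show how a deviance residual is computed. The sketch below uses the standard unstandardized formula for a 0/1 outcome, sign(y - p) × sqrt(-2 × log-likelihood of the case), and applies the ±2 and ±3 guidelines above. The example cases are hypothetical and the function names are illustrative.

```python
import math

def deviance_residual(y, p):
    """Unstandardized deviance residual for a 0/1 outcome y and fitted probability p."""
    log_likelihood = math.log(p) if y == 1 else math.log(1 - p)
    sign = 1 if y == 1 else -1
    return sign * math.sqrt(-2 * log_likelihood)

def flag(residual):
    """Apply the rough guidelines from the slide: beyond +/-2 and beyond +/-3."""
    size = abs(residual)
    if size > 3:
        return "inspect closely"
    if size > 2:
        return "some concern"
    return "ok"

# HYPOTHETICAL cases: (observed outcome, fitted probability)
cases = [(0, 0.41), (1, 0.62), (1, 0.05), (1, 0.005)]
for y, p in cases:
    d = deviance_residual(y, p)
    print(f"y = {y}, p = {p:.3f}: deviance residual = {d:+.2f} ({flag(d)})")
```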

Page 81: Binary Logistic Regression

Influence

Cases whose deletion results in substantial changes to regression coefficients

Cook’s D for each case
• Approximate aggregate change in regression parameters resulting from deletion of a case

Values of 1.0 or more indicate a problematic degree of influence for an individual case

Page 82: Binary Logistic Regression

Index Plot

Scatterplot

Horizontal axis (X)
• Case id

Vertical axis (Y)
• Leverage values, or
• Residuals, or
• Cook’s D

Page 83: Binary Logistic Regression

Index Plot: Leverage Values

Page 84: Binary Logistic Regression

Index Plot: Standardized Deviance Residuals

Page 85: Binary Logistic Regression

Index Plot: Cook’s D

Page 86: Binary Logistic Regression

Analogs of R²

None in standard use and each may give different results

Typically much smaller than R² values in linear regression

Difficult to interpret

Page 87: Binary Logistic Regression

Multicollinearity

SPSS GZLM doesn’t compute multicollinearity statistics

Use SPSS linear regression

Problematic levels
• Tolerance < .10 or VIF > 10
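The two benchmarks are the same rule stated two ways, because VIF is the reciprocal of tolerance. The sketch below assumes the usual definitions, which the slides do not spell out: tolerance = 1 - R², where R² comes from regressing one IV on the other IVs. The R² values are hypothetical.

```python
def tolerance(r_squared):
    """Tolerance for an IV, given R^2 from regressing it on the other IVs."""
    return 1 - r_squared

def vif(r_squared):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r_squared)

def is_problematic(r_squared):
    """Apply the benchmarks from the slide: tolerance < .10 or VIF > 10."""
    return tolerance(r_squared) < 0.10 or vif(r_squared) > 10

# HYPOTHETICAL R^2 values for two different IVs
for r2 in (0.50, 0.95):
    print(f"R^2 = {r2:.2f}: tolerance = {tolerance(r2):.2f}, "
          f"VIF = {vif(r2):.1f}, problematic = {is_problematic(r2)}")
```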

Page 88: Binary Logistic Regression

Additional Topics

Polychotomous IVs

Curvilinear relationships

Interactions

Page 89: Binary Logistic Regression

Overview of the Process

Select IVs and decide whether to test curvilinear relationships or interactions

Carefully screen and clean data

Transform and code variables as needed

Estimate regression model

Examine assumptions necessary to estimate binary regression model, examine model fit, and revise model as needed

Page 90: Binary Logistic Regression

Overview of the Process (cont’d)

Test hypotheses about the overall model and specific model parameters, such as ORs

Create tables and graphs to present results in the most meaningful and parsimonious way

Interpret results of the estimated model in terms of logits, probabilities, odds, and odds ratios, as appropriate

Page 91: Binary Logistic Regression

Additional Regression Models for Dichotomous DVs

Binary probit regression

Substantive results essentially indistinguishable from binary logistic regression

Choice between this and binary logistic regression largely one of convenience and discipline-specific convention

Many researchers prefer binary logistic regression because it provides odds ratios whereas probit regression does not, and binary logistic regression comes with a wider variety of fit statistics

Page 92: Binary Logistic Regression

Additional Regression Models for Dichotomous DVs (cont’d)

Complementary log-log (clog-log) and log-log models
• Probability of the event is very small or large

Loglinear regression
• Limited to categorical IVs

Discriminant analysis
• Limited to continuous IVs

Limited to continuous IVs