binary logistic regression with spss

Binary Logistic Regression with SPSS Karl L. Wuensch Dept of Psychology East Carolina University


Post on 05-Feb-2017


Page 1: Binary Logistic Regression with SPSS


Page 2: Binary Logistic Regression with SPSS

Download the Instructional Document

• http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-MV.htm

• Click on Binary Logistic Regression.
• Save to desktop.
• Open the document.

Page 3: Binary Logistic Regression with SPSS

When to Use Binary Logistic Regression

• The criterion variable is dichotomous.
• Predictor variables may be categorical or continuous.
• If predictors are all continuous and nicely distributed, may use discriminant function analysis.
• If predictors are all categorical, may use logit analysis.

Page 4: Binary Logistic Regression with SPSS

Wuensch & Poteat, 1998

• Cats being used as research subjects.
• Stereotaxic surgery.
• Subjects pretend they are on a university research committee.
• Complaint filed by animal rights group.
• Vote to stop or continue the research.

Page 5: Binary Logistic Regression with SPSS

Purpose of the Research

• Cosmetic
• Theory Testing
• Meat Production
• Veterinary
• Medical

Page 6: Binary Logistic Regression with SPSS

Predictor Variables

• Gender
• Ethical Idealism (9-point Likert)
• Ethical Relativism (9-point Likert)
• Purpose of the Research

Page 7: Binary Logistic Regression with SPSS

Model 1: Decision = Gender
• Decision: 0 = stop, 1 = continue
• Gender: 0 = female, 1 = male
• Model is the logit:

$$\text{logit} = \ln(\text{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = a + bX$$

• $\hat{Y}$ is the predicted probability of the event which is coded with 1 (continue the research) rather than with 0 (stop the research).
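A minimal Python sketch of the logit and its inverse, to make the link between $\hat{Y}$ and the log odds concrete. The helper names `logit` and `inv_logit` are illustrative, not SPSS functions.

```python
import math

def logit(p: float) -> float:
    """Log odds: ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    """Inverse of the logit: converts log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical illustration: a predicted probability of .40 for "continue"
z = logit(0.40)         # log odds, about -0.405
p_back = inv_logit(z)   # recovers .40
```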

Page 8: Binary Logistic Regression with SPSS

Iterative Maximum Likelihood Procedure

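• SPSS starts with arbitrary regression coefficients.

• Adjusts the regression coefficients iteratively to find those that maximize the likelihood of the observed data (equivalently, minimize -2 log likelihood).

• Stops when the estimates change by less than a small criterion (e.g., .001), converging on the final model.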
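SPSS's exact iteration scheme is not described on these slides; the sketch below uses Newton-Raphson, one standard way to maximize the logistic likelihood, for a single predictor. The function name and the .001 stopping rule are assumptions chosen to echo the SPSS footnote. The data are rebuilt from the gender by decision counts shown later (200 women, 60 of whom voted to continue; 115 men, 68 of whom voted to continue).

```python
import math

def fit_logistic_1pred(x, y, tol=1e-3, max_iter=25):
    """Newton-Raphson maximum likelihood for ln(odds) = a + b*x.

    Returns (a, b, iterations). Stops when both estimates change by less than tol.
    """
    a, b = 0.0, 0.0  # arbitrary starting coefficients
    for it in range(1, max_iter + 1):
        g_a = g_b = 0.0           # score (gradient of the log likelihood)
        h_aa = h_ab = h_bb = 0.0  # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step = inverse(information) times score, written out for a 2x2 matrix
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            return a, b, it
    return a, b, max_iter

# Gender (0 = female, 1 = male) and decision (1 = continue), rebuilt from the crosstab counts
gender = [0] * 200 + [1] * 115
decision = [1] * 60 + [0] * 140 + [1] * 68 + [0] * 47
a, b, iterations = fit_logistic_1pred(gender, decision)
print(round(a, 3), round(b, 3))  # -0.847 1.217
```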

Page 9: Binary Logistic Regression with SPSS

SPSS
• Bring the data into SPSS: http://core.ecu.edu/psyc/wuenschk/SPSS/Logistic.sav

• Analyze, Regression, Binary Logistic

Page 10: Binary Logistic Regression with SPSS
Page 11: Binary Logistic Regression with SPSS

• Dependent = Decision
• Covariate(s) = Gender, OK

Page 12: Binary Logistic Regression with SPSS

Look at the Output

• We have 315 cases.

Case Processing Summary
Unweighted Cases(a)                          N    Percent
Selected Cases    Included in Analysis     315      100.0
                  Missing Cases              0         .0
                  Total                    315      100.0
Unselected Cases                             0         .0
Total                                      315      100.0
a. If weight is in effect, see classification table for the total number of cases.

Page 13: Binary Logistic Regression with SPSS

Block 0 Model, Odds
• Look at Variables in the Equation.
• The model contains only the intercept (constant, B0), a function of the marginal distribution of the decisions.

Variables in the Equation
                     B     S.E.    Wald    df   Sig.   Exp(B)
Step 0  Constant   -.379   .115   10.919    1   .001    .684

$$\ln(\text{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = -.379$$
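For the intercept-only model, the Block 0 values have closed forms: B0 is the log of the marginal odds, and its standard error is $\sqrt{1/n_1 + 1/n_0}$. A small Python check, assuming only the counts 128 (continue) and 187 (stop):

```python
import math

n_continue, n_stop = 128, 187  # marginal counts of the decisions

b0 = math.log(n_continue / n_stop)               # intercept = ln(odds of "continue")
se_b0 = math.sqrt(1 / n_continue + 1 / n_stop)   # standard error of a marginal log odds
wald = (b0 / se_b0) ** 2                         # Wald chi-square, df = 1
exp_b0 = math.exp(b0)                            # Exp(B): odds of "continue"

print(f"B = {b0:.3f}, S.E. = {se_b0:.3f}, Wald = {wald:.3f}, Exp(B) = {exp_b0:.3f}")
```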

Page 14: Binary Logistic Regression with SPSS

Exponentiate Both Sides
• Exponentiate both sides of the equation: $e^{-.379}$ = .684 = Exp(B0) = odds of deciding to continue the research.
• 128 voted to continue the research, 187 to stop it.

$$\frac{\hat{Y}}{1-\hat{Y}} = \text{Exp}(-.379) = .684 = \frac{128}{187}$$

Page 15: Binary Logistic Regression with SPSS

Probabilities

• Randomly select one participant.
• P(votes continue) = 128/315 = 40.6%
• P(votes stop) = 187/315 = 59.4%
• Odds = 40.6/59.4 = .684
• Repeatedly sample one participant and guess how that participant will vote.

Page 16: Binary Logistic Regression with SPSS

Humans vs. Goldfish

• Humans Match Probabilities
  – (suppose p = .7, q = .3)
  – .7(.7) + .3(.3) = .49 + .09 = .58

• Goldfish Maximize Probabilities
  – .7(1) = .70

• The goldfish win!
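The humans-versus-goldfish arithmetic as a Python sketch; the function names are hypothetical. Matching earns $p^2 + q^2$ expected hits, maximizing earns $\max(p, q)$.

```python
def matching_accuracy(p: float) -> float:
    """Expected hit rate when guesses match the base rates (guess 'yes' with probability p)."""
    q = 1.0 - p
    return p * p + q * q

def maximizing_accuracy(p: float) -> float:
    """Expected hit rate when the more common outcome is always guessed."""
    return max(p, 1.0 - p)

print(round(matching_accuracy(0.7), 2), round(maximizing_accuracy(0.7), 2))  # 0.58 0.7

# Applied to the vote data: p = 128/315 for "continue"
p_continue = 128 / 315
```

With the vote data, maximizing gives the 59.4% that SPSS achieves in Block 0, while matching would give only about 52%.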

Page 17: Binary Logistic Regression with SPSS

SPSS Model 0 vs. Goldfish
• Look at the Classification Table for Block 0.
• SPSS predicts “STOP” for every participant.
• SPSS is as smart as a goldfish here.

Classification Table(a,b)
                                      Predicted decision
Observed                           stop   continue   Percentage Correct
Step 0  decision   stop             187          0                100.0
                   continue         128          0                   .0
        Overall Percentage                                         59.4
a. Constant is included in the model.
b. The cut value is .500

Page 18: Binary Logistic Regression with SPSS

Block 1 Model

• Gender has now been added to the model.
• Model Summary: -2 Log Likelihood = how poorly the model fits the data.

Model Summary
        -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
Step 1  399.913(a)          .078                   .106
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.

Page 19: Binary Logistic Regression with SPSS

Block 1 Model

• For intercept only, -2LL = 425.566.
• Add gender and -2LL = 399.913.
• Omnibus Tests: Drop in -2LL = 25.653 = Model χ².
• df = 1, p < .001.

Omnibus Tests of Model Coefficients
               Chi-square   df   Sig.
Step 1  Step       25.653    1   .000
        Block      25.653    1   .000
        Model      25.653    1   .000
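Because gender is a single dichotomy, each model's predicted probabilities equal the observed group proportions, so both -2LL values can be checked directly from the counts. A Python sketch (helper name hypothetical); the df = 1 tail probability uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$:

```python
import math

def minus_2ll(groups):
    """-2 log likelihood when each group's predicted probability equals its observed proportion.

    groups: list of (n_event, n_nonevent) pairs, one per distinct predictor pattern.
    """
    ll = 0.0
    for n1, n0 in groups:
        n = n1 + n0
        ll += n1 * math.log(n1 / n) + n0 * math.log(n0 / n)
    return -2.0 * ll

intercept_only = minus_2ll([(128, 187)])        # everyone shares one probability
with_gender = minus_2ll([(60, 140), (68, 47)])  # women, men
model_chi2 = intercept_only - with_gender       # drop in -2LL
p_value = math.erfc(math.sqrt(model_chi2 / 2))  # upper tail of chi-square with df = 1
```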

Page 20: Binary Logistic Regression with SPSS

Variables in the Equation

• ln(odds) = -.847 + 1.217Gender

$$\text{ODDS} = e^{a + b\,\text{Gender}}$$

Variables in the Equation
                       B     S.E.    Wald    df   Sig.   Exp(B)
Step 1(a)  gender    1.217   .245   24.757    1   .000    3.376
           Constant  -.847   .154   30.152    1   .000     .429
a. Variable(s) entered on step 1: gender.
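With one dichotomous predictor, the maximum likelihood estimates also have closed forms: the constant is the women's log odds, the slope is the log odds ratio, and the slope's standard error is the familiar $\sqrt{\sum 1/n_{ij}}$ (Woolf) formula. A Python check against the table, using the crosstab counts:

```python
import math

# Counts from the gender x decision crosstab
f_stop, f_cont = 140, 60  # women
m_stop, m_cont = 47, 68   # men

a = math.log(f_cont / f_stop)                        # constant: ln(odds) for women
b = math.log((m_cont / m_stop) / (f_cont / f_stop))  # slope: ln(odds ratio)
se_a = math.sqrt(1 / f_cont + 1 / f_stop)
se_b = math.sqrt(1 / f_cont + 1 / f_stop + 1 / m_cont + 1 / m_stop)  # Woolf S.E. of ln(OR)
wald_a, wald_b = (a / se_a) ** 2, (b / se_b) ** 2   # Wald chi-squares, df = 1
```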

Page 21: Binary Logistic Regression with SPSS

Odds, Women

• A woman is only .429 as likely to decide to continue the research as she is to decide to stop it.

$$\text{ODDS} = e^{-.847 + 1.217(0)} = e^{-.847} = .429$$

Page 22: Binary Logistic Regression with SPSS

Odds, Men

• A man is 1.448 times as likely to vote to continue the research as to stop the research.

$$\text{ODDS} = e^{-.847 + 1.217(1)} = e^{.370} = 1.448$$

Page 23: Binary Logistic Regression with SPSS

Odds Ratio

• 1.217 was the B (slope) for Gender; 3.376 is the Exp(B), that is, the exponentiated slope, the odds ratio.

• Men's odds of voting to continue the research are 3.376 times women's odds.

$$\frac{\text{odds}_{\text{male}}}{\text{odds}_{\text{female}}} = \frac{1.448}{.429} = 3.376 = e^{1.217}$$

Page 24: Binary Logistic Regression with SPSS

Convert Odds to Probabilities

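• For our women, $\hat{Y} = \dfrac{\text{ODDS}}{1+\text{ODDS}} = \dfrac{.429}{1.429} = .30$

• For our men, $\hat{Y} = \dfrac{\text{ODDS}}{1+\text{ODDS}} = \dfrac{1.448}{2.448} = .59$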
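The odds and probability arithmetic from the last few slides, as a Python sketch using the rounded coefficients; the function names are hypothetical.

```python
import math

a, b = -0.847, 1.217  # rounded coefficients from Variables in the Equation

def odds(gender: int) -> float:
    """Predicted odds of 'continue' for gender 0 (female) or 1 (male)."""
    return math.exp(a + b * gender)

def prob_from_odds(o: float) -> float:
    """Convert odds to a probability: ODDS / (1 + ODDS)."""
    return o / (1.0 + o)

odds_f, odds_m = odds(0), odds(1)
odds_ratio = odds_m / odds_f  # equals exp(b)
p_f, p_m = prob_from_odds(odds_f), prob_from_odds(odds_m)
```

Because the coefficients are rounded to three decimals, the odds ratio comes out as 3.377 rather than SPSS's 3.376.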

Page 25: Binary Logistic Regression with SPSS

Classification

• Decision Rule: If Prob(event) ≥ Cutoff, then predict the event will take place.

• By default, SPSS uses .5 as the Cutoff.
• For every man, Prob(continue) = .59, so predict he will vote to continue.
• For every woman, Prob(continue) = .30, so predict she will vote to stop it.

Page 26: Binary Logistic Regression with SPSS

Overall Success Rate
• Look at the Classification Table.

• SPSS beat the Goldfish!

$$\frac{140 + 68}{315} = \frac{208}{315} = 66\%$$

Classification Table(a)
                                      Predicted decision
Observed                           stop   continue   Percentage Correct
Step 1  decision   stop             140         47                 74.9
                   continue          60         68                 53.1
        Overall Percentage                                         66.0
a. The cut value is .500
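A sketch of how a classification table follows from predicted probabilities and a cutoff, using the slides' decision rule (predict the event when Prob(event) is at least the cutoff); the helper names are hypothetical. Each woman gets .30 and each man .59, as in the gender model.

```python
def classify(prob: float, cutoff: float = 0.5) -> int:
    """Predict the event (1) when its predicted probability is at least the cutoff."""
    return 1 if prob >= cutoff else 0

def classification_table(probs, observed, cutoff=0.5):
    """Counts keyed by (observed, predicted), with 1 = continue and 0 = stop."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for p, y in zip(probs, observed):
        table[(y, classify(p, cutoff))] += 1
    return table

# Gender model: 200 women (60 continue) at .30, then 115 men (68 continue) at .59
probs = [0.30] * 200 + [0.59] * 115
observed = [1] * 60 + [0] * 140 + [1] * 68 + [0] * 47
table = classification_table(probs, observed)
overall = (table[(0, 0)] + table[(1, 1)]) / len(observed)  # proportion correct
```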

Page 27: Binary Logistic Regression with SPSS

Sensitivity
• P(correct prediction | event did occur)
• P(predict Continue | subject voted to Continue)
• Of all those who voted to continue the research, for how many did we correctly predict that?

$$\frac{68}{60 + 68} = \frac{68}{128} = 53\%$$

Page 28: Binary Logistic Regression with SPSS

Specificity
• P(correct prediction | event did not occur)
• P(predict Stop | subject voted to Stop)
• Of all those who voted to stop the research, for how many did we correctly predict that?

$$\frac{140}{140 + 47} = \frac{140}{187} = 75\%$$

Page 29: Binary Logistic Regression with SPSS

False Positive Rate
• P(incorrect prediction | predicted occurrence)
• P(subject voted to Stop | we predicted Continue)
• Of all those for whom we predicted a vote to Continue the research, how often were we wrong?

$$\frac{47}{47 + 68} = \frac{47}{115} = 41\%$$

Page 30: Binary Logistic Regression with SPSS

False Negative Rate
• P(incorrect prediction | predicted nonoccurrence)
• P(subject voted to Continue | we predicted Stop)
• Of all those for whom we predicted a vote to Stop the research, how often were we wrong?

$$\frac{60}{60 + 140} = \frac{60}{200} = 30\%$$
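The four rates, written as a Python function over the cells of a classification table. Note that the slides define the false positive and false negative rates conditional on the prediction (some texts call the first one the false discovery rate); the function follows the slides' definitions. The function name and argument order are illustrative.

```python
def rates(stop_stop, stop_cont, cont_stop, cont_cont):
    """Rates as defined on these slides, from a classification table.

    Each argument is a count named observed_predicted, e.g. stop_cont = observed stop, predicted continue.
    """
    return {
        "sensitivity": cont_cont / (cont_stop + cont_cont),          # P(predict event | event)
        "specificity": stop_stop / (stop_stop + stop_cont),          # P(predict non-event | non-event)
        "false_positive_rate": stop_cont / (stop_cont + cont_cont),  # P(wrong | predicted event)
        "false_negative_rate": cont_stop / (stop_stop + cont_stop),  # P(wrong | predicted non-event)
    }

gender_model = rates(140, 47, 60, 68)
```

Calling rates(152, 35, 54, 74) on the Model 3 classification table gives 58%, 81%, 32%, and 26%.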

Page 31: Binary Logistic Regression with SPSS

Pearson χ²

• Analyze, Descriptive Statistics, Crosstabs
• Rows = Gender; Columns = Decision

Page 32: Binary Logistic Regression with SPSS

Crosstabs Statistics

• Statistics, Chi-Square, Continue

Page 33: Binary Logistic Regression with SPSS

Crosstabs Cells
• Cells, Observed Counts, Row Percentages

Page 34: Binary Logistic Regression with SPSS

Crosstabs Output

• Continue, OK
• 59% & 30% match logistic’s predictions.

gender * decision Crosstabulation
                                     decision
                                  stop    continue     Total
gender  Female  Count              140          60       200
                % within gender  70.0%       30.0%    100.0%
        Male    Count               47          68       115
                % within gender  40.9%       59.1%    100.0%
Total           Count              187         128       315
                % within gender  59.4%       40.6%    100.0%

Page 35: Binary Logistic Regression with SPSS

Crosstabs Output

• Likelihood Ratio χ² = 25.653, as with logistic.

Chi-Square Tests
                       Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square     25.685(b)    1   .000
Likelihood Ratio       25.653       1   .000
N of Valid Cases       315
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.73.
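A Python sketch of both statistics for a 2 × 2 table, without continuity correction; the function name is hypothetical. The likelihood ratio statistic $G^2 = 2\sum O\ln(O/E)$ is the same quantity as the drop in -2LL from logistic regression.

```python
import math

def chi_square_2x2(table):
    """Pearson and likelihood-ratio chi-square for a 2x2 table of counts (list of rows)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    pearson = lr = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            observed = table[i][j]
            min_expected = min(min_expected, expected)
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G^2
                lr += 2 * observed * math.log(observed / expected)
    return pearson, lr, min_expected

# Rows: female, male; columns: stop, continue
pearson, lr, min_expected = chi_square_2x2([[140, 60], [47, 68]])
```

The same function reproduces the conditional tests in the mock-trial example: 4.353 for the jurors who did not deliberate and 6.405 for those who did.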

Page 36: Binary Logistic Regression with SPSS

Model 2: Decision = Idealism, Relativism, Gender

• Analyze, Regression, Binary Logistic
• Dependent = Decision
• Covariate(s) = Gender, Idealism, Relatvsm

Page 37: Binary Logistic Regression with SPSS
Page 38: Binary Logistic Regression with SPSS

• Click Options and check “Hosmer-Lemeshow goodness of fit” and “CI for exp(B) 95%.”

• Continue, OK.

Page 39: Binary Logistic Regression with SPSS

Comparing Nested Models
• With only intercept and gender, -2LL = 399.913.
• Adding idealism and relativism dropped -2LL to 346.503, a drop of 53.41.
• χ²(2) = 399.913 - 346.503 = 53.41, p = ?

Model Summary
        -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
Step 1  346.503(a)          .222                   .300
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

Page 40: Binary Logistic Regression with SPSS

Obtain p
• Transform, Compute
• Target Variable = p
• Numeric Expression = 1 - CDF.CHISQ(53.41,2)

Page 41: Binary Logistic Regression with SPSS

p = ?
• OK
• Data Editor, Variable View
• Set Decimal Points to 5 for p

Page 42: Binary Logistic Regression with SPSS

p < .0001
• Data Editor, Data View
• p = .00000
• Adding the ethical ideology variables significantly improved the model.
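Outside SPSS, the same upper-tail probability can be computed directly. For even df the chi-square survival function has a closed form, used in this Python sketch (the function name is hypothetical; for odd df you would need a library routine such as SciPy's `chi2.sf`):

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square > x) for an even number of df.

    Uses the closed form exp(-x/2) * sum over k < df/2 of (x/2)**k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles only positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total

p_ideology = chi2_sf_even_df(53.41, 2)  # same quantity as 1 - CDF.CHISQ(53.41,2)
```

The same function gives p ≈ .0766 for the χ²(4) = 8.443 reported when Purpose is added.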

Page 43: Binary Logistic Regression with SPSS

Hosmer-Lemeshow
• H₀: predictions made by the model fit perfectly with observed group memberships.

• Cases are arranged in order by their predicted probability on the criterion.

• Then divided into (usually) ten bins with approximately equal n.

• This gives ten rows in the table.

Page 44: Binary Logistic Regression with SPSS

For each bin and each event, we have the number of observed cases and the expected number predicted from the model.

Contingency Table for Hosmer and Lemeshow Test
                  decision = stop        decision = continue
Step 1  Bin    Observed   Expected    Observed   Expected    Total
          1          29     29.331           3      2.669       32
          2          30     27.673           2      4.327       32
          3          28     25.669           4      6.331       32
          4          20     23.265          12      8.735       32
          5          22     20.693          10     11.307       32
          6          15     18.058          17     13.942       32
          7          15     15.830          17     16.170       32
          8          10     12.920          22     19.080       32
          9          12      9.319          20     22.681       32
         10           6      4.241          21     22.759       27

Page 45: Binary Logistic Regression with SPSS

• Note that expected frequencies decline in the first column and rise in the second.

• The nonsignificant chi-square indicates good fit of the data to the model.

Hosmer and Lemeshow Test
        Chi-square   df   Sig.
Step 1       8.810    8   .359
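The Hosmer-Lemeshow statistic is the Pearson chi-square summed over both columns of the contingency table, with df equal to the number of bins minus 2. A Python sketch using the contingency table values reproduces the SPSS result:

```python
import math

# (observed stop, expected stop, observed continue, expected continue) for each of the 10 bins
bins = [
    (29, 29.331, 3, 2.669), (30, 27.673, 2, 4.327), (28, 25.669, 4, 6.331),
    (20, 23.265, 12, 8.735), (22, 20.693, 10, 11.307), (15, 18.058, 17, 13.942),
    (15, 15.830, 17, 16.170), (10, 12.920, 22, 19.080), (12, 9.319, 20, 22.681),
    (6, 4.241, 21, 22.759),
]

hl = sum((o0 - e0) ** 2 / e0 + (o1 - e1) ** 2 / e1 for o0, e0, o1, e1 in bins)
df = len(bins) - 2

# Upper-tail chi-square probability; this closed form is valid because df is even
half = hl / 2.0
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
```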

Page 46: Binary Logistic Regression with SPSS

Hosmer-Lemeshow
• There are problems with this procedure.
• Hosmer and Lemeshow have acknowledged this.
• Even with good fit, the test may be significant if sample sizes are large.
• Even with poor fit, the test may not be significant if sample sizes are small.
• The number of bins can have a big effect on the results of this test.

Page 47: Binary Logistic Regression with SPSS

Linearity of the Logit
• We have assumed that the log odds are related to the predictors in a linear fashion.
• Use the Box-Tidwell test to evaluate this assumption.
• For each continuous predictor, compute the natural log.
• Include in the model interactions between each predictor and its natural log.
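A sketch of the Box-Tidwell product term in Python. The scores are hypothetical; the natural log requires strictly positive values, which holds for a 1 to 9 Likert scale.

```python
import math

def box_tidwell_term(x: float) -> float:
    """Product of a predictor with its natural log (requires x > 0)."""
    if x <= 0:
        raise ValueError("the natural log needs a positive predictor value")
    return x * math.log(x)

# Hypothetical idealism scores on the 9-point scale
idealism = [3.2, 5.0, 6.8, 8.1]
idealism_x_ln = [box_tidwell_term(v) for v in idealism]  # entered alongside idealism itself
```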

Page 48: Binary Logistic Regression with SPSS

Box-Tidwell

• If an interaction is significant, there is a problem.

• For the troublesome predictor, try including the square of that predictor.

• That is, add a polynomial component to the model.

• See T-Test versus Binary Logistic Regression

Page 49: Binary Logistic Regression with SPSS
Page 50: Binary Logistic Regression with SPSS
Page 51: Binary Logistic Regression with SPSS

Variables in the Equation
                                      B     S.E.    Wald   df   Sig.   Exp(B)
Step 1(a)  gender                   1.147   .269   18.129   1   .000    3.148
           idealism                 1.130  1.921     .346   1   .556    3.097
           relatvsm                 1.656  2.637     .394   1   .530    5.240
           idealism by idealism_LN  -.652   .690     .893   1   .345     .521
           relatvsm by relatvsm_LN  -.479   .949     .254   1   .614     .620
           Constant                -5.015  5.877     .728   1   .393     .007
a. Variable(s) entered on step 1: gender, idealism, relatvsm, idealism * idealism_LN, relatvsm * relatvsm_LN.

No Problem Here.

Page 52: Binary Logistic Regression with SPSS

Model 3: Decision = Idealism, Relativism, Gender, Purpose
• Need 4 dummy variables to code the five purposes.
• Consider the Medical group a reference group.
• Dummy variables are: Cosmetic, Theory, Meat, Veterin.
• 0 = not in this group, 1 = in this group.

Page 53: Binary Logistic Regression with SPSS

Add the Dummy Variables

• Analyze, Regression, Binary Logistic
• Add to the Covariates: Cosmetic, Theory, Meat, Veterin.
• OK

Page 54: Binary Logistic Regression with SPSS

Block 0
• Look at “Variables not in the Equation.”
• “Score” is a score chi-square that approximates how much -2LL would drop if a single variable were added to the model with intercept only (for gender, 25.685 versus the actual drop of 25.653).

Variables not in the Equation
                                 Score   df   Sig.
Step 0  Variables  gender       25.685    1   .000
                   idealism     47.679    1   .000
                   relatvsm      7.239    1   .007
                   cosmetic       .003    1   .955
                   theory        2.933    1   .087
                   meat           .556    1   .456
                   veterin        .013    1   .909
        Overall Statistics      77.665    7   .000

Page 55: Binary Logistic Regression with SPSS

Effect of Adding Purpose
• Our previous model had -2LL = 346.503.
• Adding Purpose dropped -2LL to 338.060.
• χ²(4) = 8.443, p = .0766.
• But I make planned comparisons (with medical reference group) anyhow!

Model Summary
        -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
Step 1  338.060(a)          .243                   .327
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Page 56: Binary Logistic Regression with SPSS

Classification Table

• YOU calculate the sensitivity, specificity, false positive rate, and false negative rate.

Classification Table(a)
                                      Predicted decision
Observed                           stop   continue   Percentage Correct
Step 1  decision   stop             152         35                 81.3
                   continue          54         74                 57.8
        Overall Percentage                                         71.7
a. The cut value is .500

Page 57: Binary Logistic Regression with SPSS

Answer Key

• Sensitivity = 74/128 = 58%
• Specificity = 152/187 = 81%
• False Positive Rate = 35/109 = 32%
• False Negative Rate = 54/206 = 26%

Page 58: Binary Logistic Regression with SPSS

Wald Chi-Square

• A conservative test of the unique contribution of each predictor.

• Presented in Variables in the Equation.
• Alternative: drop one predictor from the model, observe the increase in -2LL, and test via χ².

Page 59: Binary Logistic Regression with SPSS

Variables in the Equation
                                                          95.0% C.I. for EXP(B)
                        B      Wald   df   Sig.   Exp(B)      Lower      Upper
Step 1(a)  gender     1.255   20.586   1   .000    3.508      2.040      6.033
           idealism   -.701   37.891   1   .000     .496       .397       .620
           relatvsm    .326    6.634   1   .010    1.386      1.081      1.777
           cosmetic   -.709    2.850   1   .091     .492       .216      1.121
           theory    -1.160    7.346   1   .007     .314       .136       .725
           meat       -.866    4.164   1   .041     .421       .183       .966
           veterin    -.542    1.751   1   .186     .581       .260      1.298
           Constant   2.279    4.867   1   .027    9.766
a. Variable(s) entered on step 1: gender, idealism, relatvsm, cosmetic, theory, meat, veterin.
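The Wald chi-square is $(B/\text{S.E.})^2$, and the 95% CI for Exp(B) is $\exp(B \pm 1.96\,\text{S.E.})$. Since this table omits the S.E. column, the sketch below backs the S.E. out of the CI limits for gender; because the limits are rounded, the result is approximate. Function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def se_from_ci(lower: float, upper: float) -> float:
    """Back out the S.E. of B from a 95% CI for Exp(B) (limits on the odds-ratio scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

def wald(b: float, se: float) -> float:
    """Wald chi-square (df = 1): the squared ratio of B to its standard error."""
    return (b / se) ** 2

# Gender row: B = 1.255, 95% CI for Exp(B) = 2.040 to 6.033
se_gender = se_from_ci(2.040, 6.033)
wald_gender = wald(1.255, se_gender)
```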

Page 60: Binary Logistic Regression with SPSS

Odds Ratios – Exp(B)
• Odds of approval more than cut in half (.496) for each one-point increase in Idealism.
• Odds of approval multiplied by 1.39 for each one-point increase in Relativism.
• Odds of approval if purpose is Theory Testing are only .314 what they are for Medical Research.

• Odds of approval if purpose is Agricultural Research (Meat Production) are only .421 what they are for Medical Research.

Page 61: Binary Logistic Regression with SPSS

Inverted Odds Ratios

• Some folks have problems with odds ratios less than 1.

• Just invert the odds ratio.
• For example, 1/.421 = 2.38.
• That is, respondents' odds of approving the medical research were more than twice (2.38 times) their odds of approving the research designed to feed the poor in the third world.
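Exponentiating and inverting the Purpose coefficients in Python (a sketch; the B values are taken from Variables in the Equation):

```python
import math

# B values for the purpose dummies (medical research is the reference group)
b = {"cosmetic": -0.709, "theory": -1.160, "meat": -0.866, "veterin": -0.542}

odds_ratio = {k: math.exp(v) for k, v in b.items()}    # Exp(B): that purpose vs. medical
inverted = {k: 1.0 / v for k, v in odds_ratio.items()}  # medical vs. that purpose
```

Because the B values are rounded, a few results differ from the table in the third decimal (for example, .313 rather than .314 for Theory).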

Page 62: Binary Logistic Regression with SPSS

Classification Decision Rule

• Consider a screening test for cancer.
• Which is the more serious error?

  – False Positive: test says you have cancer, but you do not.

  – False Negative: test says you do not have cancer, but you do.

• Want to reduce the False Negative rate?

Page 63: Binary Logistic Regression with SPSS

Classification Decision Rule
• Analyze, Regression, Binary Logistic
• Options
• Classification Cutoff = .4, Continue, OK

Page 64: Binary Logistic Regression with SPSS

Effect of Lowering Cutoff

• YOU calculate the Sensitivity, Specificity, False Positive Rate, and False Negative Rate for the model with the cutoff at .4.

• Fill in the table on page 15 of the handout.

Page 65: Binary Logistic Regression with SPSS

Answer Key

Value                   Cutoff = .5   Cutoff = .4
Sensitivity                     58%           75%
Specificity                     81%           72%
False Positive Rate             32%           36%
False Negative Rate             26%           19%
Overall % Correct               72%           73%

Page 66: Binary Logistic Regression with SPSS

SAS Rules

• See, on page 16 of the handout, how easy SAS makes it to see the effect of changing the cutoff.

• SAS classification tables remove bias (using a jackknifed classification procedure); SPSS does not have this feature.

Page 67: Binary Logistic Regression with SPSS

Presenting the Results

• See the handout.

Page 68: Binary Logistic Regression with SPSS

Interaction Terms

• May want to standardize continuous predictor variables.

• Compute the interaction terms, or
• Let Logistic compute them.
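A Python sketch of standardizing two predictors and forming their product term. The data are hypothetical, and the z-scores use the sample (n - 1) standard deviation.

```python
import statistics

def zscores(values):
    """Standardize to mean 0, SD 1 (sample SD, n - 1 denominator)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical scores on two continuous predictors for five cases
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]
z1, z2 = zscores(x1), zscores(x2)
interaction = [a * b for a, b in zip(z1, z2)]  # product term entered alongside z1 and z2
```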

Page 69: Binary Logistic Regression with SPSS

Deliberation and Physical Attractiveness in a Mock Trial

• Subjects are mock jurors in a criminal trial.
• For half, the defendant is plain; for the other half, physically attractive.
• Half recommend a verdict with no deliberation; half deliberate first.

Page 70: Binary Logistic Regression with SPSS

Get the Data

• Bring Logistic2x2x2.sav into SPSS.
• Each row is one cell in a 2x2x2 contingency table.
• Could do a logit analysis, but will do logistic regression instead.

Page 71: Binary Logistic Regression with SPSS
Page 72: Binary Logistic Regression with SPSS

• Tell SPSS to weight cases by Freq: Data, Weight Cases.

Page 73: Binary Logistic Regression with SPSS

• Dependent = Guilty.
• Covariates = Delib, Plain.
• In the left pane, highlight Delib and Plain.

Page 74: Binary Logistic Regression with SPSS

• Then click >a*b> to create the interaction term.

Page 75: Binary Logistic Regression with SPSS

• Under Options, ask for the Hosmer-Lemeshow test and confidence intervals on the odds ratios.

Page 76: Binary Logistic Regression with SPSS

Significant Interaction

• The interaction is large and significant (odds ratio of .030), so we shall ignore the main effects.

Variables in the Equation

                                                       95.0% C.I. for EXP(B)
                           Wald   df   Sig.   Exp(B)      Lower      Upper
Step 1(a)  Delib          3.697    1   .054     .338       .112      1.021
           Plain          4.204    1   .040    3.134      1.052      9.339
           Delib by Plain 8.075    1   .004     .030       .003       .338
           Constant        .037    1   .847    1.077
a. Variable(s) entered on step 1: Delib, Plain, Delib * Plain.

Page 77: Binary Logistic Regression with SPSS

• Use Crosstabs to test the conditional effects of Plain at each level of Delib.

• Split file by Delib.

Page 78: Binary Logistic Regression with SPSS

• Analyze, Crosstabs.
• Rows = Plain, Columns = Guilty.
• Statistics, Chi-square, Continue.
• Cells, Observed Counts and Row Percentages.
• Continue, OK.

Page 79: Binary Logistic Regression with SPSS

Rows = Plain, Columns = Guilty

Page 80: Binary Logistic Regression with SPSS

• For those who did deliberate, the odds of a guilty verdict are 1/29 when the defendant was plain and 8/22 when she was attractive, yielding a conditional odds ratio of 0.09483.

Plain * Guilty Crosstabulation(a)
                                      Guilty
                                   No       Yes      Total
Plain  Attractive  Count            22         8         30
                   % within Plain  73.3%    26.7%    100.0%
       Plain       Count            29         1         30
                   % within Plain  96.7%     3.3%    100.0%
Total              Count            51         9         60
                   % within Plain  85.0%    15.0%    100.0%
a. Delib = Yes

Page 81: Binary Logistic Regression with SPSS

• For those who did not deliberate, the odds of a guilty verdict are 27/8 when the defendant was plain and 14/13 when she was attractive, yielding a conditional odds ratio of 3.1339.

Plain * Guilty Crosstabulation(a)
                                      Guilty
                                   No       Yes      Total
Plain  Attractive  Count            13        14         27
                   % within Plain  48.1%    51.9%    100.0%
       Plain       Count             8        27         35
                   % within Plain  22.9%    77.1%    100.0%
Total              Count            21        41         62
                   % within Plain  33.9%    66.1%    100.0%
a. Delib = No

Page 82: Binary Logistic Regression with SPSS

Interaction Odds Ratio
• The interaction odds ratio is simply the ratio of these conditional odds ratios, that is, .09483/3.1339 = 0.030.

• Among those who did not deliberate, the plain defendant was found guilty significantly more often than the attractive defendant, χ²(1, N = 62) = 4.353, p = .037.

• Among those who did deliberate, the attractive defendant was found guilty significantly more often than the plain defendant, χ²(1, N = 60) = 6.405, p = .011.
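The conditional and interaction odds ratios computed from the two crosstabs, as a Python sketch (helper names hypothetical):

```python
def odds(yes: int, no: int) -> float:
    """Odds of a guilty verdict from (guilty, not guilty) counts."""
    return yes / no

# (guilty, not guilty) counts from the crosstabs, split by deliberation
delib_yes = {"plain": (1, 29), "attractive": (8, 22)}
delib_no = {"plain": (27, 8), "attractive": (14, 13)}

def conditional_or(cells) -> float:
    """Odds of a guilty verdict for the plain defendant relative to the attractive one."""
    return odds(*cells["plain"]) / odds(*cells["attractive"])

or_delib = conditional_or(delib_yes)     # about .0948
or_no_delib = conditional_or(delib_no)   # about 3.134
interaction_or = or_delib / or_no_delib  # about .030
```

The no-deliberation odds ratio, 3.134, is also the Exp(B) for Plain in the logistic model, the effect of Plain when Delib = 0.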

Page 83: Binary Logistic Regression with SPSS

Interaction Between Continuous and Dichotomous Predictor

Page 84: Binary Logistic Regression with SPSS

Interaction Falls Short of Significance

Page 85: Binary Logistic Regression with SPSS

Standardizing Predictors

• Most helpful with continuous predictors.
• Especially when you want to compare the relative contributions of predictors in the model.

• Also useful when the predictor is measured in units that are not intrinsically meaningful.

Page 86: Binary Logistic Regression with SPSS

Predicting Retention in ECU’s Engineering Program

Page 87: Binary Logistic Regression with SPSS

Practice Your New Skills

• Try the exercises in the handout.