Upload: amandla

Post on 23-Feb-2016

Page 1: Advanced Models and Methods  in Behavioral Research

Advanced Methods and Models in Behavioral Research –

Advanced Models and Methods in Behavioral Research

• Chris Snijders

• [email protected]

• 3 ects

• http://www.chrissnijders.com/ammbr (=studyguide)

• literature: Field book + separate course material

• laptop exam (+ assignments)

ToDo (if not done yet):

Enroll in 0a611

Page 2: Advanced Models and Methods  in Behavioral Research

The methods package

• MMBR (6 ects)
  – Blumberg: questions, reliability, validity, research design
  – Field: SPSS: factor analysis, multiple regression, ANcOVA, sample size etc

• AMMBR (3 ects)
  – Field (1 chapter): logistic regression
  – literature through website: conjoint analysis, multi-level regression

Page 3: Advanced Models and Methods  in Behavioral Research

Models and methods: topics

• t-test, Cronbach's alpha, etc
• multiple regression, analysis of (co)variance and factor analysis
• logistic regression
• conjoint analysis / repeated measures

– Stata next to SPSS
– “Finding new questions”
– Some data collection

In the background: “now you should be able to deal with data on your own”

Page 4: Advanced Models and Methods  in Behavioral Research

Methods in brief (1)

• Logistic regression: target Y, predictors Xi. Y is a binary variable (0/1).

- Why not just multiple regression?
- Interpretation is more difficult
- goodness of fit is non-standard
- ...

(and it is a chapter in Field)

Page 5: Advanced Models and Methods  in Behavioral Research

Methods in brief (2)

• Conjoint analysis

Underlying assumption: for each user, the "utility" of an offer can be written as

U(x1,x2, ... , xn) = c0 + c1 x1 + ... + cn xn

- 10 Euro p/m
- 2 years fixed
- free phone
- ...

How attractive is this offer to you?
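To make the additive utility assumption concrete, here is a minimal Python sketch. The coefficient values and the 0/1 coding of the attributes are made up for illustration; in a real conjoint analysis they would be estimated from the respondents' ratings.

```python
# Hypothetical coefficients c0..c3 (not estimated from any real data).
COEFFICIENTS = {
    "intercept": 2.0,          # c0
    "price_10_euro_pm": 1.5,   # c1, x1 = 1 if the offer costs 10 Euro per month
    "two_years_fixed": -0.8,   # c2, x2 = 1 if the contract is fixed for 2 years
    "free_phone": 1.2,         # c3, x3 = 1 if a free phone is included
}

def utility(offer):
    """U(x1, ..., xn) = c0 + c1*x1 + ... + cn*xn for a single offer."""
    total = COEFFICIENTS["intercept"]
    for attribute, value in offer.items():
        total += COEFFICIENTS[attribute] * value
    return total

offer = {"price_10_euro_pm": 1, "two_years_fixed": 1, "free_phone": 1}
print(f"utility of the offer: {utility(offer):.2f}")
```

Each attribute contributes to the utility separately, which is what lets the method recover the value of one attribute from ratings of complete offers.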

Page 6: Advanced Models and Methods  in Behavioral Research

Conjoint analysis as an “in between method”

Between
Which phone do you like and why?
What would your favorite phone be?

And:
Let’s keep track of what people buy.

We have:

Page 7: Advanced Models and Methods  in Behavioral Research

Local Master Thesis example:

Fiber to the home

Speed: really fast
Price: sort of high
Installation: free!
Your neighbors: are in!

How attractive is this to you?

(Roel Schuring)

Page 8: Advanced Models and Methods  in Behavioral Research

Coming up with new ideas (3)

“More research is necessary”

But on what?

YOU: come up with sensible new ideas, given previous research

Page 9: Advanced Models and Methods  in Behavioral Research

Stata next to SPSS

• It’s just better (faster, better written, more possibilities, better programmable …)

• Multi-level regression is much easier than in SPSS

• It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments)

• More stable

• BTW Supports OSX as well… (anybody?)

Page 10: Advanced Models and Methods  in Behavioral Research

Every advantage has a disadvantage

• Output less “polished”

• It takes some extra work to get you started

• The Logistic Regression chapter in the Field book uses SPSS (but still readable for the larger part)

• (and it’s not campus software, but subfaculty software)

• Installation …

Page 11: Advanced Models and Methods  in Behavioral Research

Logistic Regression Analysis

Credit where credit is due: slides adapted from Gerrit Rooks

That is: your Y variable is 0/1: Now what?

Page 12: Advanced Models and Methods  in Behavioral Research

The main points

1. Why do we have to know and sometimes use logistic regression?

2. What is the underlying model? What is maximum likelihood estimation?

3. Logistics of logistic regression analysis
   1. Estimate coefficients
   2. Assess model fit
   3. Interpret coefficients
   4. Check residuals

4. An example (with some output)

Page 13: Advanced Models and Methods  in Behavioral Research

Page 14: Advanced Models and Methods  in Behavioral Research

Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD)

ID     age    CHD
1      20     0
2      23     0
3      24     0
4      25     1
…
98     64     0
99     65     1
100    69     1

Page 15: Advanced Models and Methods  in Behavioral Research

A graphic representation of the data

[Figure: scatter plot of CHD (vertical axis) against Age (horizontal axis)]

Page 16: Advanced Models and Methods  in Behavioral Research

Let’s just try regression analysis

pr(CHD|age) = -.54 +.022*Age

Page 17: Advanced Models and Methods  in Behavioral Research

... linear regression is not a suitable model for probabilities

pr(CHD|age) = -.54 +.0218107*Age
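Why the straight line is unsuitable can be seen by simply plugging in some ages. The sketch below uses the fitted line from the slide; the ages are arbitrary illustration values.

```python
def linear_probability(age):
    """Fitted line from the slide: pr(CHD|age) = -.54 + .0218107 * age."""
    return -0.54 + 0.0218107 * age

for age in (20, 45, 69, 80):
    p = linear_probability(age)
    flag = "" if 0.0 <= p <= 1.0 else "  <- not a valid probability"
    print(f"age {age}: {p:.3f}{flag}")
```

For young people the line predicts a negative "probability", and for old enough people it predicts more than 1.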

Page 18: Advanced Models and Methods  in Behavioral Research

In this graph for 8 age groups, I plotted the probability of having a heart disease (proportion)

Page 19: Advanced Models and Methods  in Behavioral Research

A nonlinear model is probably better here

Page 20: Advanced Models and Methods  in Behavioral Research

Something like this

Page 21: Advanced Models and Methods  in Behavioral Research

This is the logistic regression model

$$\Pr(Y \mid X_1) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

Page 22: Advanced Models and Methods  in Behavioral Research

Predicted probabilities are always between 0 and 1

$$\Pr(Y \mid X_1) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

(the linear part, b_0 + b_1 X_1, is similar to classic regression analysis)
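A quick numerical check of the "always between 0 and 1" claim, as a minimal Python sketch. The coefficients b0 = -5 and b1 = 0.1 are arbitrary illustration values, not estimates from the CHD data.

```python
import math

def logistic(b0, b1, x):
    """Pr(Y|X) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Arbitrary coefficients, and x-values far outside any realistic age range.
for x in (-100, 0, 50, 100, 200):
    print(f"x = {x:5d}: Pr = {logistic(-5.0, 0.1, x):.6f}")
```

However extreme the linear part b0 + b1*x becomes, the logistic transformation squeezes it into the interval from 0 to 1.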

Page 23: Advanced Models and Methods  in Behavioral Research

Side note: this is similar to MMBR …

Suppose Y is a percentage (so between 0 and 1).

Then consider

…which will ensure that the estimated Y will vary between 0 and 1

and after some rearranging this is the same as
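The two formulas on this slide did not survive in the transcript. Judging from the logistic model on the preceding slides and the regression on log(Y/(1-Y)) on the next one, they are presumably the following (a reconstruction under that assumption, not the original slide content):

```latex
Y = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}
\quad\Longleftrightarrow\quad
\frac{Y}{1 - Y} = e^{b_0 + b_1 X_1}
\quad\Longleftrightarrow\quad
\ln\!\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X_1
```

The middle step follows because 1 - Y = e^{-(b_0 + b_1 X_1)} / (1 + e^{-(b_0 + b_1 X_1)}), so the ratio Y / (1 - Y) leaves only the exponential.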

Page 24: Advanced Models and Methods  in Behavioral Research

… (continued)

And one “solution” might be:

- Change all Y values that are 0 to 0.001
- Change all Y values that are 1 to 0.999

Now run regression on log(Y/(1-Y)) …

… but that really is sort of higgledy-piggledy …
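To see why this "solution" is unsatisfactory, here is a small Python sketch of the transformation. The function name and the alternative cut-off 0.0001 are assumptions for illustration only.

```python
import math

def clamped_logit(y, eps=0.001):
    """log(Y / (1 - Y)) after moving 0 to eps and 1 to 1 - eps."""
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y))

for y in (0, 0.25, 0.5, 1):
    print(f"Y = {y}: logit = {clamped_logit(y):.3f}")

# The transformed value of Y = 0 depends heavily on the arbitrary cut-off.
print(f"eps = 0.001:  {clamped_logit(0, eps=0.001):.2f}")
print(f"eps = 0.0001: {clamped_logit(0, eps=0.0001):.2f}")
```

With a 0/1 variable all observations sit at the two clamped values, so the regression results would be driven by the choice of cut-off.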

Page 25: Advanced Models and Methods  in Behavioral Research

Logistics of logistic regression

1. How do we estimate the coefficients?
2. How do we assess model fit?
3. How do we interpret coefficients?
4. How do we check regression assumptions?

Page 26: Advanced Models and Methods  in Behavioral Research

Kinds of estimation in regression

• Ordinary Least Squares (we fit a line through a cloud of dots)

• Maximum likelihood (we find the parameters that are the most likely, given our data)

We never bothered to consider maximum likelihood in standard multiple regression, because you can show that OLS and maximum likelihood lead to exactly the same estimator there (in MR, that is; in general they differ).

Actually, maximum likelihood has superior statistical properties (efficiency, consistency, invariance, …)

Page 27: Advanced Models and Methods  in Behavioral Research

Maximum likelihood estimation

• Method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data

$$\Pr(Y \mid X_1) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

Unknown parameters: b_0 and b_1

Page 28: Advanced Models and Methods  in Behavioral Research

Maximum likelihood estimation

• First we have to construct the “likelihood function” (probability of obtaining the observed set of data).

Likelihood = pr(obs1)*pr(obs2)*pr(obs3)…*pr(obsn)

Assuming that observations are independent

Page 29: Advanced Models and Methods  in Behavioral Research

Log-likelihood

• For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities)

LL= ln[pr(obs1)]+ln[pr(obs2)]+ln[pr(obs3)]…+ln[pr(obsn)]
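The relation between the likelihood (a product) and the log-likelihood (a sum) can be checked numerically. This is a minimal sketch that uses only the seven rows visible in the data table earlier in the deck, not the full 100 cases, together with the coefficients from the SPSS output shown later (-5.309 and .111).

```python
import math

# The seven rows visible in the slide's data table (not the full data set).
AGES = [20, 23, 24, 25, 64, 65, 69]
CHD = [0, 0, 0, 1, 0, 1, 1]
B0, B1 = -5.309, 0.111

def prob_observation(age, chd):
    """pr(obs): p if CHD = 1 and 1 - p if CHD = 0."""
    p = 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))
    return p if chd == 1 else 1.0 - p

likelihood = 1.0
log_likelihood = 0.0
for age, chd in zip(AGES, CHD):
    likelihood *= prob_observation(age, chd)
    log_likelihood += math.log(prob_observation(age, chd))

print(f"likelihood     = {likelihood:.6f}")
print(f"log-likelihood = {log_likelihood:.4f}")
print(f"exp(LL)        = {math.exp(log_likelihood):.6f}")
```

With many observations the product becomes an extremely small number, which is one of the technical reasons for working with the sum of logs instead.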

Page 30: Advanced Models and Methods  in Behavioral Research

Some subtleties

• In OLS, we did not need stochastic assumptions to be able to calculate a best-fitting line (we need them only for the estimates of the confidence intervals). With maximum likelihood estimation we need them from the start

(and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood)

Page 31: Advanced Models and Methods  in Behavioral Research

And this is what it looks like …

Page 32: Advanced Models and Methods  in Behavioral Research

Note: optimizing log-likelihoods is difficult

• It’s iterative (“searching the landscape”)
  – it might not converge
  – it might converge to the wrong answer
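What "iterative" means can be sketched with a bare-bones Newton-Raphson search in Python. This is an illustration only: it uses the seven rows visible in the data table earlier in the deck (not the full data), starts at b0 = b1 = 0, and adds simple step-halving; statistical packages use more refined variants of this idea.

```python
import math

AGES = [20, 23, 24, 25, 64, 65, 69]   # seven visible rows only
CHD = [0, 0, 0, 1, 0, 1, 1]

def prob(b0, b1, age):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

def log_likelihood(b0, b1):
    return sum(
        math.log(prob(b0, b1, age) if chd == 1 else 1.0 - prob(b0, b1, age))
        for age, chd in zip(AGES, CHD)
    )

def gradient(b0, b1):
    """First derivatives of the log-likelihood with respect to b0 and b1."""
    residuals = [chd - prob(b0, b1, age) for age, chd in zip(AGES, CHD)]
    return sum(residuals), sum(r * age for r, age in zip(residuals, AGES))

def fit(max_iter=50, tol=1e-8):
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        g0, g1 = gradient(b0, b1)
        # Entries of the 2x2 information matrix (minus the second derivatives).
        weights = [prob(b0, b1, a) * (1.0 - prob(b0, b1, a)) for a in AGES]
        h00 = sum(weights)
        h01 = sum(w * a for w, a in zip(weights, AGES))
        h11 = sum(w * a * a for w, a in zip(weights, AGES))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        # Halve the step as long as it would lower the log-likelihood.
        scale = 1.0
        current = log_likelihood(b0, b1)
        while log_likelihood(b0 + scale * step0, b1 + scale * step1) < current:
            scale /= 2.0
        b0 += scale * step0
        b1 += scale * step1
        if abs(scale * step0) < tol and abs(scale * step1) < tol:
            return b0, b1, iteration
    raise RuntimeError("no convergence within max_iter iterations")

b0_hat, b1_hat, n_iter = fit()
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f} after {n_iter} iterations")
```

The `RuntimeError` branch is the "it might not converge" case; with perfectly separated data, for instance, the estimates would keep growing.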

Page 33: Advanced Models and Methods  in Behavioral Research

Nasty implication: extreme cases should be left out

(some handwaving here)

Page 34: Advanced Models and Methods  in Behavioral Research

Example (with some SPSS output)

Page 35: Advanced Models and Methods  in Behavioral Research

Estimation of coefficients: SPSS Results

Variables in the Equation

                       B        S.E.     Wald      df    Sig.    Exp(B)
Step 1a   age          ,111     ,024     21,254    1     ,000    1,117
          Constant     -5,309   1,134    21,935    1     ,000    ,005

a. Variable(s) entered on step 1: age.

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-5.3 + .11 X)}}$$

Page 36: Advanced Models and Methods  in Behavioral Research

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-5.3 + .11 X)}}$$

Page 37: Advanced Models and Methods  in Behavioral Research

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-5.3 + .11 X)}}$$

This function fits best: other values of b0 and b1 give worse results (that is, other values have a smaller likelihood value)

Page 38: Advanced Models and Methods  in Behavioral Research

Illustration 1: suppose we chose .05X instead of .11X

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-5.3 + .05 X)}}$$

Page 39: Advanced Models and Methods  in Behavioral Research

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-5.3 + .40 X)}}$$

Illustration 2: suppose we chose .40X instead of .11X
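The two illustrations can be mimicked numerically by comparing log-likelihoods for the three slopes. A minimal sketch, assuming only the seven rows visible in the data table earlier in the deck (so the numbers differ from what the full data set of 100 cases would give):

```python
import math

AGES = [20, 23, 24, 25, 64, 65, 69]   # seven visible rows only
CHD = [0, 0, 0, 1, 0, 1, 1]

def log_likelihood(b0, b1):
    total = 0.0
    for age, chd in zip(AGES, CHD):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))
        total += math.log(p if chd == 1 else 1.0 - p)
    return total

for b1 in (0.05, 0.11, 0.40):
    print(f"b1 = {b1:.2f}: LL = {log_likelihood(-5.3, b1):.2f}")
```

Of these three candidates, the slope .11 gives the highest (least negative) log-likelihood on these rows; too flat and too steep curves both fit worse.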

Page 40: Advanced Models and Methods  in Behavioral Research

Logistics of logistic regression

• Estimate the coefficients (and their conf.int.)
• Assess model fit
  – Between model comparisons
  – Pseudo R2 (similar to multiple regression)
  – Predictive accuracy
• Interpret coefficients
• Check regression assumptions

Page 41: Advanced Models and Methods  in Behavioral Research

Model fit: comparisons between models

The log-likelihood ratio test statistic can be used to test the fit of a model

$$\chi^2 = 2\,[LL(\text{New}) - LL(\text{baseline})]$$

(New: the full model; baseline: the reduced model)

The test statistic has a chi-square distribution

NOTE This is sort of similar to the variance decomposition tables you see in MR!

Page 42: Advanced Models and Methods  in Behavioral Research

Page 43: Advanced Models and Methods  in Behavioral Research

Between model comparisons: the likelihood ratio test

Full model:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}$$

Reduced model:

$$P(Y) = \frac{1}{1 + e^{-(b_0)}}$$

$$\chi^2 = 2\,[LL(\text{New}) - LL(\text{baseline})]$$

The model including only an intercept is often called the empty model. SPSS uses this model as a default.

Page 44: Advanced Models and Methods  in Behavioral Research

Between model comparison: SPSS output

$$\chi^2 = 2\,LL(\text{New}) - 2\,LL(\text{baseline})$$

Omnibus Tests of Model Coefficients

                   Chi-square    df    Sig.
Step 1   Step      29,310        1     ,000
         Block     29,310        1     ,000
         Model     29,310        1     ,000

(Chi-square is the test statistic, Sig. its associated significance)

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       107,353a             ,254                    ,341

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than ,001.
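The chi-square of 29,310 can be reproduced by hand. The sketch below assumes that the empty model predicts the overall proportion of CHD for everybody, and takes the counts (43 with CHD, 57 without) from the row totals of the classification table shown later in the deck. The p-value uses the fact that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math

N_CHD, N_NO_CHD = 43, 57        # row totals of the classification table
MINUS2LL_NEW = 107.353          # -2 Log likelihood from the Model Summary

n = N_CHD + N_NO_CHD
p_empty = N_CHD / n             # the empty model predicts this for everybody
ll_baseline = N_CHD * math.log(p_empty) + N_NO_CHD * math.log(1.0 - p_empty)

# chi-square = 2[LL(New) - LL(baseline)] = -2LL(baseline) - (-2LL(New))
chi_square = -2.0 * ll_baseline - MINUS2LL_NEW
p_value = math.erfc(math.sqrt(chi_square / 2.0))   # valid for df = 1 only

print(f"-2LL(baseline) = {-2.0 * ll_baseline:.3f}")
print(f"chi-square     = {chi_square:.3f}")
print(f"p-value (1 df) = {p_value:.2e}")
```

The p-value is far below ,001, which is why SPSS prints ,000 in the Sig. column.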

Page 45: Advanced Models and Methods  in Behavioral Research

Overall model fit: pseudo R2

Just like in multiple regression, pseudo R2 ranges 0.0 to 1.0

– Cox and Snell
  • cannot theoretically reach 1

– Nagelkerke
  • adjusted so that it can reach 1

$$R^2_{\text{LOGIT}} = \frac{-2LL(\text{Model})}{-2LL(\text{Empty})}$$

Empty: log-likelihood of model before any predictors were entered
Model: log-likelihood of the model that you want to test

NOTE: R2 in logistic regression tends to be (even) smaller than in multiple regression
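The Cox & Snell and Nagelkerke values in the SPSS Model Summary can be recomputed from the reported -2 log likelihood and chi-square. This sketch uses the standard definitions of the two measures, with n = 100 observations as in the example.

```python
import math

N = 100
MINUS2LL_MODEL = 107.353                        # from the Model Summary
CHI_SQUARE = 29.310                             # from the Omnibus Tests table
MINUS2LL_EMPTY = MINUS2LL_MODEL + CHI_SQUARE    # -2LL of the empty model

# Cox & Snell: 1 - (L_empty / L_model)^(2/n) = 1 - exp(-chi_square / n)
cox_snell = 1.0 - math.exp(-CHI_SQUARE / N)

# Largest value Cox & Snell can take: 1 - (L_empty)^(2/n)
max_cox_snell = 1.0 - math.exp(-MINUS2LL_EMPTY / N)

# Nagelkerke rescales Cox & Snell so that the maximum becomes 1.
nagelkerke = cox_snell / max_cox_snell

print(f"Cox & Snell R2 = {cox_snell:.3f}")
print(f"Nagelkerke R2  = {nagelkerke:.3f}")
```

Because the maximum of Cox & Snell is below 1 here (about .75), Nagelkerke's version is always the larger of the two.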

Page 46: Advanced Models and Methods  in Behavioral Research

Overall model fit: Classification table

We predict 74% correctly

Classification Table (a)

                               Predicted chd = 0    Predicted chd = 1    Percentage correct
Step 1   Observed chd = 0      45                   12                   78,9
         Observed chd = 1      14                   29                   67,4
         Overall percentage                                              74,0

a. The cut value is ,500
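The percentages in the classification table follow directly from the four cell counts, as this short Python sketch shows. The dictionary layout is just one convenient way to store the table.

```python
# Cell counts from the classification table: (observed, predicted) -> count
TABLE = {(0, 0): 45, (0, 1): 12, (1, 0): 14, (1, 1): 29}

total = sum(TABLE.values())
correct = TABLE[(0, 0)] + TABLE[(1, 1)]

overall = 100.0 * correct / total
correct_no_chd = 100.0 * TABLE[(0, 0)] / (TABLE[(0, 0)] + TABLE[(0, 1)])
correct_chd = 100.0 * TABLE[(1, 1)] / (TABLE[(1, 0)] + TABLE[(1, 1)])

print(f"overall correct:       {overall:.1f}%")
print(f"correct among chd = 0: {correct_no_chd:.1f}%")
print(f"correct among chd = 1: {correct_chd:.1f}%")
```

Note that these numbers depend on the cut value of ,500: a case is predicted to have CHD when its predicted probability exceeds that value.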

Page 47: Advanced Models and Methods  in Behavioral Research

Overall model fit: Classification table

14 cases had CHD, while according to our model this shouldn't have happened (observed chd = 1, predicted chd = 0)

Page 48: Advanced Models and Methods  in Behavioral Research

Overall model fit: Classification table

12 cases didn’t have CHD, while according to our model this should have happened (observed chd = 0, predicted chd = 1)

Page 49: Advanced Models and Methods  in Behavioral Research

Logistics of logistic regression

• Estimate the coefficients
• Assess model fit
• Interpret coefficients
  – Direction
  – Significance
  – Magnitude
• Check regression assumptions

Page 50: Advanced Models and Methods  in Behavioral Research

The Odds Ratio

We had:

$$p(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + \dots + b_n X_n)}} = \frac{e^{b_0 + b_1 X_1 + \dots + b_n X_n}}{1 + e^{b_0 + b_1 X_1 + \dots + b_n X_n}}$$

And after some rearranging we can get

Page 51: Advanced Models and Methods  in Behavioral Research

Magnitude of association: Percentage change in odds

$$\text{Odds}_i = \frac{\text{prob}_{\text{event}}}{1 - \text{prob}_{\text{event}}}$$

Probability    Odds
25%            0.33
50%            1
75%            3
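The small table can be reproduced with a two-line function; the reverse direction (from odds back to a probability) is added for convenience. Both function names are chosen here for illustration.

```python
def odds(probability):
    """Odds = prob / (1 - prob)."""
    return probability / (1.0 - probability)

def probability_from_odds(odds_value):
    """Inverse of odds(): prob = odds / (1 + odds)."""
    return odds_value / (1.0 + odds_value)

for p in (0.25, 0.50, 0.75):
    print(f"probability {p:.0%} -> odds {odds(p):.2f}")
```

Odds of 3 mean that the event is three times as likely to happen as not to happen.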

Page 52: Advanced Models and Methods  in Behavioral Research

Interpreting coefficients: direction

• original b reflects changes in logit: b>0 implies positive relationship

• exponentiated b reflects the “changes in odds”: exp(b) > 1 implies a positive relationship

$$\text{logit} = \ln\!\left(\frac{p(y)}{1 - p(y)}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

Page 53: Advanced Models and Methods  in Behavioral Research

3. Interpreting coefficients: magnitude

• The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful.

• exp(b) is the effect of the independent variable on the odds, more useful for calculating the size of an effect

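$$\text{logit} = \ln\!\left(\frac{p(y)}{1 - p(y)}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

$$\text{Odds} = \frac{p(y)}{1 - p(y)} = e^{b_0}\, e^{b_1 x_1}\, e^{b_2 x_2} \cdots e^{b_n x_n}$$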
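For the CHD example this means that every extra year of age multiplies the odds by exp(b). A minimal Python check, using the coefficients from the SPSS "Variables in the Equation" table:

```python
import math

B_CONSTANT = -5.309   # B for Constant in the SPSS output
B_AGE = 0.111         # B for age in the SPSS output

def odds_at(age):
    """Odds = exp(b0) * exp(b1 * age)."""
    return math.exp(B_CONSTANT) * math.exp(B_AGE * age)

ratio_one_year = odds_at(51) / odds_at(50)
ratio_ten_years = odds_at(60) / odds_at(50)

print(f"exp(b) for age:        {math.exp(B_AGE):.3f}")
print(f"odds(51) / odds(50):   {ratio_one_year:.3f}")
print(f"odds(60) / odds(50):   {ratio_ten_years:.3f}")
print(f"change in odds / year: {100.0 * (math.exp(B_AGE) - 1.0):.1f}%")
```

The first two numbers agree with the Exp(B) column of the SPSS output (1,117), and the ratio does not depend on the starting age.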

Page 54: Advanced Models and Methods  in Behavioral Research

Another way to get an idea of the size of effects: Calculating predicted probabilities

$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-5.3 + .11 X)}}$$

For somebody of 20 years old, the predicted probability is .04

For somebody of 70 years old, the predicted probability is .92
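These predicted probabilities can be recomputed from the formula above. A minimal sketch with the rounded coefficients from the slide (-5.3 and .11):

```python
import math

def predicted_probability(age, b0=-5.3, b1=0.11):
    """Pr(CHD | age) = 1 / (1 + exp(-(b0 + b1 * age)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

for age in (20, 45, 70):
    print(f"age {age}: predicted probability = {predicted_probability(age):.3f}")
```

Going from 20 to 70 years moves the predicted probability from a few percent to a little over 90 percent.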

Page 55: Advanced Models and Methods  in Behavioral Research

But this gets more complicated when you have more than a single X-variable

(see blackboard)

Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2! (yuck!)
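A numerical illustration of this conclusion, as a sketch under made-up assumptions: the second predictor x2 and its coefficient 2.0 are hypothetical, and the age coefficients are the rounded ones from the example.

```python
import math

B0, B1, B2 = -5.3, 0.11, 2.0   # B2 (and the predictor x2) are hypothetical

def prob(x1, x2):
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x1 + B2 * x2)))

def odds(x1, x2):
    p = prob(x1, x2)
    return p / (1.0 - p)

for x2 in (0, 1):
    change = prob(50, x2) - prob(40, x2)
    ratio = odds(50, x2) / odds(40, x2)
    print(f"x2 = {x2}: change in probability = {change:.3f}, odds ratio = {ratio:.3f}")
```

The same ten-year increase in x1 changes the predicted probability by a different amount for x2 = 0 and x2 = 1, whereas the odds ratio is identical in both cases. That is one reason to report effects in terms of odds.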

Page 56: Advanced Models and Methods  in Behavioral Research

Testing significance of coefficients

• In linear regression analysis this statistic is used to test significance

• In logistic regression something similar exists

• however, when b is large, standard error tends to become inflated, hence underestimation (Type II errors are more likely)

$$\text{Wald} = \frac{b}{SE_b}$$

(b: estimate; SE_b: standard error of estimate; reference distribution: t-distribution)

Note: This is not the Wald Statistic SPSS presents!!!

Page 57: Advanced Models and Methods  in Behavioral Research

Interpreting coefficients: significance

• SPSS presents

$$\text{Wald} = \frac{b^2}{SE_b^2}$$

• While Andy Field thinks SPSS presents this (at least in the 2nd version of the book):

$$\text{Wald} = \frac{b}{SE_b}$$
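That SPSS reports the squared version can be checked against the "Variables in the Equation" table. A minimal sketch; small differences with the SPSS values arise because the printed B and S.E. are rounded. The p-value assumes a chi-square distribution with 1 degree of freedom, for which the tail probability equals erfc(sqrt(x/2)).

```python
import math

def wald_squared(b, se):
    """The statistic SPSS presents: (b / SE_b)^2."""
    return (b / se) ** 2

def p_value_chi2_1df(statistic):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))

# (name, B, S.E., Wald as printed by SPSS)
ROWS = [("age", 0.111, 0.024, 21.254), ("Constant", -5.309, 1.134, 21.935)]

for name, b, se, spss_wald in ROWS:
    w = wald_squared(b, se)
    print(f"{name}: b/SE = {b / se:.3f}, (b/SE)^2 = {w:.3f}, "
          f"SPSS Wald = {spss_wald}, p = {p_value_chi2_1df(w):.1e}")
```

The unsquared ratio b/SE is around 4.6 in both rows, nowhere near the 21-something that SPSS prints.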

Page 58: Advanced Models and Methods  in Behavioral Research

Page 59: Advanced Models and Methods  in Behavioral Research

Logistics of logistic regression

• Estimate the coefficients
• Assess model fit
• Interpret coefficients
• Check regression assumptions

Page 60: Advanced Models and Methods  in Behavioral Research

Checking assumptions

0. Independent data points (no tests for that, just think about your data)
   Problem: likelihood function is wrong otherwise + confidence intervals too small

1. Influential data points & Residuals
   – Follow Samantha's tips in Field; we will get back to this later

2. No multi-collinearity (Stata: “collin”)

3. All relevant variables included (Stata: “linktest”, nb regression: “ovtest”)

4. Hosmer & Lemeshow (Stata: “estat gof”)
   – Divides sample in subgroups
   – Checks whether there are differences between observed and predicted between subgroups
   – Test should not be significant, if so: indication of lack of fit

Page 61: Advanced Models and Methods  in Behavioral Research

1. Residual statistics: Field’s rules of thumb

Page 62: Advanced Models and Methods  in Behavioral Research

1. Examining residuals in logistic regression

Isolate points for which the model fits poorly
Isolate influential data points

Page 63: Advanced Models and Methods  in Behavioral Research

2. No multi-collinearity

• Problem = same as in regression, the net effect of two (or more) collinear variables will be zero (see MMBR)

• In regression: Stata-command is “vif”:

    reg y x      // Stata’s regression command
    vif          // the variance-inflation-factors

• In logistic regression: Stata-command is “collin”

    logit y x    // Stata’s logit regr. command
    collin       // the variance-inflation-factors
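What a variance inflation factor measures can be sketched in Python for the simplest case of two predictors, where VIF = 1 / (1 - r^2) with r the correlation between them. The data values are made up for illustration, and this is not a replacement for Stata's output.

```python
import math

def pearson_r(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2); only valid for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

X1 = [1, 2, 3, 4]
UNRELATED = [1, -1, -1, 1]             # made-up values, uncorrelated with X1
NEARLY_THE_SAME = [1.1, 1.9, 3.2, 3.9]  # made-up values, almost a copy of X1

print(f"VIF with unrelated predictor:        {vif_two_predictors(X1, UNRELATED):.2f}")
print(f"VIF with nearly collinear predictor: {vif_two_predictors(X1, NEARLY_THE_SAME):.2f}")
```

A VIF of 1 means no inflation of the variance of the coefficient; the nearly collinear predictor gives a value far above the rule-of-thumb threshold of 10 that is often used.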

Page 64: Advanced Models and Methods  in Behavioral Research

NOTE: “collin” is not standard Stata

help ... (if you know and have the command)

net search … (otherwise)

findit … (otherwise)

Page 65: Advanced Models and Methods  in Behavioral Research

3. All relevant variables included: Model specification

• Note that this refers to how the given variables are included, for example whether transformations of them are needed (not to the inclusion of totally different variables) (compare Stata’s “ovtest” in multiple regression)

In Stata: linktest

Many specification tests consider whether including y-hat and (y-hat)^2 would improve your model. If yes, keep adding transformations of your variables

Page 66: Advanced Models and Methods  in Behavioral Research

4. Hosmer & Lemeshow

Test divides sample in subgroups, checks whether difference between observed and predicted is about equal in these groups

Test should not be significant (indicating no difference)

Page 67: Advanced Models and Methods  in Behavioral Research

Time for an example…

Page 68: Advanced Models and Methods  in Behavioral Research

Logistic regression

• Y = 0/1
• Multiple regression (or ANcOVA) is not right
• You consider either the odds or the log(odds)
• It is estimated through “maximum likelihood”
• Interpretation is a bit more complicated than normal
• Assumption testing is a bit more concrete than in multiple regression (also because now we can do this with Stata)

Page 69: Advanced Models and Methods  in Behavioral Research

8 groups – run a logistic regression in Stata

• Create groups, choose a data set

• Create a do-file that reads in the data, and runs a logistic regression (along the lines of the commands in the example file, BUT WITH MORE COMMENTS ABOUT WHAT YOU FIND)

• Start now, deliver by this Saturday

• Participation mandatory