Post on 13-Jan-2016


Term 4, 2006 BIO656--Multilevel Models 1

140.656 Multi-Level Statistical Models

If you did not receive the welcome email from me, email me at: (tlouis@jhsph.edu)

ROOM CHANGE, AGAIN!

• Starting Thursday, March 30th and henceforth, lectures will be in W2030

• Labs will still be in W2009

Prerequisites, Resources and Grading

Learning Objectives

Content & Approach

Approach

• Lectures include basic illustrations and case studies, structuring an approach and interpreting results
– Labs address computing and amplify on the foregoing

• My approach is formal, but not “mathematical”

• To understand MLMs, you need a very good understanding of single-level models
– If you understand these, you are ready to go multi-level!

Structure

RULES FOR HOMEWORK, MID-TERM AND PROJECT

Homework
• Must be individually prepared, but you can get help
• Homework due dates should be honored
• Turn in hard copy for grading

The in-class midterm
• Must be prepared absolutely independently
• During the exam, no advice or information can be obtained from others
• You can use your notes and reference materials

The term project
• Must be individually prepared, but you can get help
• Must be electronically submitted

Handouts and the Web

• Virtually all course materials will be on the web
• Check frequently for updates

• I’ve provided hard copy of the general information sheet

• However, other lectures will be on the web in PowerPoint format and won’t be handed out

• Download to your computer so you have an electronic version of each part

• Print if you need hard copy, but do it 4 or 6 to a page to save paper
• More generally, try to “go electronic,” printing sparingly

COMPUTING & DATA

• We will support WinBUGS and Stata

• We provide partial support for SAS, which should be used only by current SAS users; we aren’t teaching it from scratch

• Some homeworks require use of WinBUGS and another “traditional” program (Stata, SAS, R, ...)

• We provide datasets, including some in the WinBUGS examples

WHY BUGS?

• Freeware!

• In MLMs, it’s important to see distributions
– e.g., skewness of the sampling distribution of variance component estimates

• It’s important to incorporate all uncertainties in estimating random effects

• Note that WinBUGS isn’t very friendly for data input

• And, it’s difficult to produce P-values

STATISTICAL MODELS

• A statistical model is an approximation
• Almost never is there a “correct” or “best” model, no holy grail

• A model is a tool for structuring a statistical approach and addressing a scientific question

• An effective model combines the data with prior information to address a question

MULTI-LEVEL MODELS

• Biological, physical, psycho/social processes that influence health occur at many levels:
– Cell → Organ → Person → Family → Neighborhood → City → Society → ... → Solar system
– Crew → Vessel → Fleet → ...
– Block → Block Group → Tract → ...
– Visit → Patient → Physician → Clinic → HMO → ...

• Covariates can be at each level
• Many “units of analysis”

• More modern and flexible parlance and approach: “many variance components”

Example: Alcohol Abuse

• Cell: neurochemistry

• Organ: ability to metabolize ethanol

• Person: genetic susceptibility to addiction

• Family: alcohol abuse in the home

• Neighborhood: availability of bars

• Society: regulations; organizations; social norms

ALCOHOL ABUSE: A multi-level, interaction model

• Interaction between existence of bars & state drunk-driving laws

• Alcohol abuse in a family & ability to metabolize ethanol

• Genetic predisposition to addiction & household environment

• State regulations about intoxication & job requirements

Many names for similar, but not identical, models, analyses and goals

• Multi-Level Models
• Random effects models
• Mixed models
• Random coefficient models
• Hierarchical models
• Bayesian Models

We don’t need MLMs

• If your question is about slopes on regressors, you can run a standard regression and (usually) get valid slope estimates

$Y = \beta_0 + \beta_1(\text{areal monitor}) + \beta_2(\text{home monitor}) + \dots$

$Y = \beta_0 + \beta_1(\text{zipcode income}) + \beta_2(\text{personal income}) + \dots$

$\text{logit}(P) = \dots$

• Analysis can be followed by computing a “robust” SE to get valid inferences
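
As a rough illustration of the “robust” SE idea, here is a minimal sketch for a single regressor, assuming the observations fall into known clusters. The function name `slope_with_se` and the toy data are hypothetical, and the sandwich formula shown is the basic cluster-robust version with no small-sample correction; statistical packages typically apply one.

```python
import math

def slope_with_se(x, y, cluster):
    """OLS slope of y on x (with intercept), its model-based SE, and a
    cluster-robust (sandwich) SE that allows correlated errors within clusters."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Model-based SE: assumes independent, equal-variance errors.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    naive_se = math.sqrt(sigma2 / sxx)

    # Cluster-robust SE: add up (x - xbar) * residual within each cluster,
    # then square the cluster totals instead of the individual terms.
    totals = {}
    for xi, ei, g in zip(x, resid, cluster):
        totals[g] = totals.get(g, 0.0) + (xi - xbar) * ei
    robust_se = math.sqrt(sum(t * t for t in totals.values())) / sxx
    return slope, naive_se, robust_se

# Hypothetical toy data: a cluster-level regressor, two people per cluster.
x = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
y = [1.0, 1.0, 0.0, 0.0, 2.0, 2.0]
slope, naive_se, robust_se = slope_with_se(x, y, ["a", "a", "b", "b", "c", "c"])
print(slope, naive_se, robust_se)
```

The slope estimate is the same whichever SE is used; only the uncertainty statement changes. With so few clusters the robust SE is itself unreliable, so the toy data only show the mechanics.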

We do need MLMs

• If your question is about variance components, you need to build the multi-level model

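$Y_{ijkl} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon_{ijkl}$

$\text{Var}(Y_{ijkl}) = \text{Var}(\epsilon_{ijkl}) = V_{\text{Hospital}} + V_{\text{Clinic}} + V_{\text{Physician}} + V_{\text{Patient}} + V_{\text{unexplained}}$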

• These variances depend on what Xs are in the model

We do need MLMs

• To create a broad class of correlation structures
– Longitudinal correlations
– Nested correlations

• To structure the improvement of unit-level estimates (latent effects) and to make unit-level predictions

MLMs are effective in producing “working models” that incorporate stochastic realities

• Producing efficient population estimates
• Broadening the inference beyond “these units”
• Protecting against some types of informative missing data processes
• Producing correlation structures
• Generating “overdispersed” versions of standard models
• Structuring estimation of latent effects

But, MLMs can be fragile and care is needed

MLMs are not and should not be

• A religion

• A truth

• The only way to model multi-level data! 

Improving individual-level estimates
Similar to the BUGS rat data

• Dependent variable ($Y_{ij}$) is weight for rat “i” at age $X_{ij}$

$i = 1, \dots, I\ (= 10); \quad j = 1, \dots, J\ (= 5)$

$X_{ij} = X_j = (-14, -7, 0, 7, 14) = (8-22,\ 15-22,\ 22-22,\ 29-22,\ 36-22)$

$Y_{ij} = b_{i0} + b_{i1} X_j + \epsilon_{ij}$

– As usual, the intercept depends on the centering

• Analyses
– Each rat has its own line
– All rats follow the same line: $b_{i0} = \beta_0$, $b_{i1} = \beta_1$
– A compromise between these two

Each rat has its own (LSE, MLE) line (with the population line)

[Figure: fitted line for each rat, with the population line labeled “Pop line”]

A multi-level model: Each rat has its own line, but the lines come from the same distribution

• The $b_{i0}$ are independent $\text{Normal}(\beta_0, \tau_0^2)$

• The $b_{i1}$ are independent $N(\beta_1, \tau_1^2)$

Overdispersion
• Sample variance of the OLS estimated intercepts: $345 = SE_{\text{int}}^2 + \tau_0^2 = 320 + \tau_0^2 \Rightarrow \tau_0^2 = 25,\ \tau_0 = 5$

• Sample variance of the OLS estimated slopes: $4.25 = SE_{\text{slope}}^2 + \tau_1^2 = 3.25 + \tau_1^2 \Rightarrow \tau_1^2 = 1.00,\ \tau_1 = 1.00$
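
The overdispersion arithmetic above can be written as a small method-of-moments calculation. This is only a sketch: the function name `between_unit_variance` and the name `tau2` for the between-rat variance are assumptions made here, and the truncation at zero is an added safeguard that the slide does not need because both differences are positive.

```python
import math

def between_unit_variance(sample_var_of_estimates, sampling_var):
    """Method-of-moments estimate of the between-unit variance (tau^2).

    The sample variance of the unit-specific OLS estimates mixes sampling
    noise (SE^2) with true between-unit variation (tau^2). Subtracting SE^2
    leaves tau^2; truncate at 0 because a variance cannot be negative.
    """
    return max(0.0, sample_var_of_estimates - sampling_var)

# Numbers from the slide: intercepts 345 = 320 + tau0^2, slopes 4.25 = 3.25 + tau1^2.
tau2_int = between_unit_variance(345.0, 320.0)
tau2_slope = between_unit_variance(4.25, 3.25)
print(tau2_int, math.sqrt(tau2_int))      # 25.0 5.0
print(tau2_slope, math.sqrt(tau2_slope))  # 1.0 1.0
```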

A compromise: each rat has its own line, but the lines come from the same distribution

[Figure: compromise line for each rat, with the population line labeled “Pop line”]
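
One common form of such a compromise, sketched here under the assumption of a normal model with known variances, pulls each rat’s own estimate toward the population value by a weight that depends on the two variance components. The function name `shrink` and the example estimates (6.0 and 5.0) are hypothetical; only the variances 1.00 and 3.25 come from the slope calculation on the preceding slide, and this is not necessarily the exact estimator behind the figure.

```python
def shrink(ols_estimate, pop_estimate, tau2, se2):
    """Compromise between a unit's own OLS estimate and the population value.

    weight = tau2 / (tau2 + se2) is the share of the observed spread that is
    real between-unit variation: weight 1 keeps the unit's own estimate,
    weight 0 returns the population value.
    """
    weight = tau2 / (tau2 + se2)
    return pop_estimate + weight * (ols_estimate - pop_estimate)

# Variances from the slope calculation; the two estimates are hypothetical.
slope_weight = 1.00 / (1.00 + 3.25)
rat_slope = shrink(ols_estimate=6.0, pop_estimate=5.0, tau2=1.00, se2=3.25)
print(slope_weight, rat_slope)
```

With these variances the weight on a rat’s own slope is under one quarter, so the compromise lines sit much closer to the population line than the individual OLS lines do.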

ONE-WAY RANDOM EFFECTS ANOVA

Simulated “Neighborhood Clustering”

• Random mean for each of 10 neighborhoods ($J = 10$): $b_1, b_2, \dots, b_{10} \overset{iid}{\sim} N(10, 9)$

• Random deviation from neighborhood mean for each of 10 persons in each neighborhood ($n = 10$): $Y_{ij} = b_j + e_{ij}$, $e_{ij} \overset{iid}{\sim} N(0, 4)$

Conditional independence
Over-dispersion: Variance of each point is 13 (= 4 + 9)
Correlation: Measurements within each cluster are correlated
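
The simulation described above can be sketched with the Python standard library. The function name `simulate_neighborhoods`, its parameter names, and the seed are assumptions made here, and the second argument of each normal distribution is read as a variance, which is what makes the total 13 = 9 + 4.

```python
import random
import statistics

def simulate_neighborhoods(n_clusters=10, n_per_cluster=10, mean=10.0,
                           var_between=9.0, var_within=4.0, seed=656):
    """Draw a random mean per neighborhood, then persons around that mean."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        # random.gauss takes a standard deviation, so convert the variances.
        b_j = rng.gauss(mean, var_between ** 0.5)
        data.append([b_j + rng.gauss(0.0, var_within ** 0.5)
                     for _ in range(n_per_cluster)])
    return data

clusters = simulate_neighborhoods()
all_points = [y for cluster in clusters for y in cluster]
within = [statistics.variance(cluster) for cluster in clusters]
print("total sample variance:", statistics.variance(all_points))
print("average within-neighborhood variance:", statistics.mean(within))
```

With only 10 neighborhoods, the total sample variance from a single run can be well away from 13; the within-neighborhood variance is estimated from far more information and should sit closer to 4.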

Intra-class Correlation (ICC)

• Correlation of two observations in the same cluster:

ICC = Var(Between)/ Var(Total)

= 1 – Var(Within)/Var(Total)

Estimated ICC: 0.67 = (9.8-3.2)/9.8

True ICC: 0.69 = 9/(9 + 4) = 9/13
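
The two ICC calculations above can be sketched as follows. The function names `estimated_icc` and `true_icc` are hypothetical, and the estimate uses the simple 1 – Var(Within)/Var(Total) form with sample variances and equal cluster sizes, which is an assumption about how the 9.8 and 3.2 were obtained.

```python
import statistics

def estimated_icc(clusters):
    """ICC estimate: 1 - Var(Within) / Var(Total), using sample variances."""
    all_points = [y for cluster in clusters for y in cluster]
    var_total = statistics.variance(all_points)
    # Equal cluster sizes assumed, so the pooled within-cluster variance is
    # the plain average of the per-cluster sample variances.
    var_within = statistics.mean([statistics.variance(c) for c in clusters])
    return 1.0 - var_within / var_total

def true_icc(var_between, var_within):
    """ICC implied by known variance components."""
    return var_between / (var_between + var_within)

print(round((9.8 - 3.2) / 9.8, 2))   # estimated ICC from the slide
print(round(true_icc(9.0, 4.0), 2))  # true ICC, 9/13
```

Note that this simple estimate can be negative when clusters happen to differ less than chance alone would suggest, so it is usually truncated at zero in practice.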

[Figures not reproduced in this transcript. Surviving labels: “V(b)”; “regression line”, “Pop line”, “45° line”]

WEIGHTED MEANS

INFERENCE SPACE (Sanders)

• The choice between fixed and random effects depends in part on the reference population (the inference space)

– These studies or people
– Studies or people like these
– .........

Random Effects should replace “unit of analysis”

• Models contain Fixed-effects, Random effects (via Variance Components) and other correlation-inducers

• There are many “units” and so in effect no single set of units

• Random Effects induce unexplained (co)variance
• Some of the unexplained (co)variance may be explicable by including additional covariates
• MLMs are one way to induce a structure and estimate the REs


END OF PART I