HSRP 734: Advanced Statistical Methods, May 29, 2008

Page 1: HSRP 734:  Advanced Statistical Methods May 29, 2008

HSRP 734: Advanced Statistical Methods

May 29, 2008

Page 2: HSRP 734:  Advanced Statistical Methods May 29, 2008

Finish talking about Association Measures:

Odds Ratio

• OR=2 of Disease for Exposed vs. Not exposed

• What is the interpretation?

• “Exposed patients have twice the odds of disease versus patients that were not exposed.”

Page 3: HSRP 734:  Advanced Statistical Methods May 29, 2008

Finish talking about Association Measures:

Relative Risk

• RR=2.5 of Disease for Exposed vs. Not exposed

• What is the interpretation?

• “Exposed patients are 2.5 times as likely to have the disease versus patients that were not exposed.”

Page 4: HSRP 734:  Advanced Statistical Methods May 29, 2008

Finish talking about Association Measures

• OR is not close to RR

• Unless Pr(disease) is low in both the Exposed and the Not exposed groups

• “Rare” disease
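The rare-disease point can be checked numerically. The sketch below is only an illustration: the risks are hypothetical values chosen so that RR = 2.5 in both scenarios, and the function names are not from the course materials.

```python
def relative_risk(p_exposed, p_unexposed):
    """Ratio of the disease probabilities (risks) in the two groups."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the disease odds p / (1 - p) in the two groups."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks (assumed for illustration): RR is 2.5 in both rows.
scenarios = [("rare disease", 0.005, 0.002), ("common disease", 0.50, 0.20)]
for label, p1, p0 in scenarios:
    rr = relative_risk(p1, p0)
    or_ = odds_ratio(p1, p0)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
```

With the rare-disease risks the OR stays close to 2.5, while with the common-disease risks it moves well away from the RR.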

Page 5: HSRP 734:  Advanced Statistical Methods May 29, 2008

Finish talking about Association Measures

• Confidence intervals for Odds Ratio

• Confidence intervals for Relative Risk

Page 6: HSRP 734:  Advanced Statistical Methods May 29, 2008

Measures of Disease Association

                       Exposed (E)   Not Exposed (not E)   Total
Disease (D)                 a                 b              n1
No Disease (not D)          c                 d              n0
Total                       m1                m0             N

Page 7: HSRP 734:  Advanced Statistical Methods May 29, 2008

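Confidence Interval for Odds Ratio

Confidence limits are based on the sampling distribution of

$$\ln \widehat{OR} = \ln\frac{\hat{p}_1 (1-\hat{p}_2)}{\hat{p}_2 (1-\hat{p}_1)} = \ln\frac{ad}{bc}$$

which is normal or approximately normal with

$$\text{Mean} \approx \ln\frac{ad}{bc}$$

and

$$\text{Variance} \approx \frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}$$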

Page 8: HSRP 734:  Advanced Statistical Methods May 29, 2008

Confidence Interval for Odds Ratio

1. Calculate $M = \ln \widehat{OR} = \ln\frac{ad}{bc}$

2. Calculate $S = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$

3. Calculate 95% CI $= e^{M \pm 1.96\,S}$

*Use if N>25
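The three steps translate directly into a few lines of code. This is a minimal sketch, not the SAS procedure used in class; the function name and the cell counts are hypothetical, and a, b, c, d follow the 2x2 table layout from Page 6.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the log-based normal approximation."""
    m = math.log((a * d) / (b * c))            # Step 1: M = ln(OR)
    s = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Step 2: S
    return math.exp(m), math.exp(m - z * s), math.exp(m + z * s)  # Step 3


# Hypothetical counts (assumed for illustration only).
or_hat, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the estimated OR, and every cell count must be nonzero.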

Page 9: HSRP 734:  Advanced Statistical Methods May 29, 2008

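Confidence Interval for Risk Ratio

Confidence limits are based on the sampling distribution of

$$\ln \widehat{RR} = \ln\frac{\hat{p}_1}{\hat{p}_2} = \ln\frac{a/(a+b)}{c/(c+d)}$$

which is normal or approximately normal with

$$\text{Mean} \approx \ln\frac{p_1}{p_2}$$

and

$$\text{Variance} \approx \frac{b}{a(a+b)}+\frac{d}{c(c+d)}$$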

Page 10: HSRP 734:  Advanced Statistical Methods May 29, 2008

Confidence Interval for Risk Ratio

1. Calculate $M = \ln \widehat{RR} = \ln\frac{a/(a+b)}{c/(c+d)}$

2. Calculate $S = \sqrt{\frac{b}{a(a+b)}+\frac{d}{c(c+d)}}$

3. Calculate 95% CI $= e^{M \pm 1.96\,S}$

*Use if N>25
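A sketch of the same recipe for the risk ratio follows. It assumes, as the formulas on this slide do, that the two proportions being compared are a/(a+b) and c/(c+d); the function name and counts are hypothetical.

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) with p1 = a/(a+b) and p2 = c/(c+d)."""
    m = math.log((a / (a + b)) / (c / (c + d)))              # Step 1: M = ln(RR)
    s = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))     # Step 2: S
    return math.exp(m), math.exp(m - z * s), math.exp(m + z * s)  # Step 3


# Hypothetical counts (assumed for illustration only).
rr_hat, lower, upper = risk_ratio_ci(a=20, b=80, c=10, d=90)
print(f"RR = {rr_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```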

Page 11: HSRP 734:  Advanced Statistical Methods May 29, 2008

SAS Enterprise:

or_rr.sas7bdat

Page 12: HSRP 734:  Advanced Statistical Methods May 29, 2008

SAS websites

• Online help:

http://support.sas.com/onlinedoc/913/docMainpage.jsp

• UCLA: http://www.ats.ucla.edu/stat/SAS/

• SAS SUGI:

http://support.sas.com/events/sasglobalforum/previous/index.html

Page 13: HSRP 734:  Advanced Statistical Methods May 29, 2008

Categorical Data Analysis

1. Understand the Multinomial probability mass function

2. Compute Goodness-of-fit tests and chi-squared tests for association

3. Test for association in the presence of a possibly confounding third factor (e.g., disease versus exposure from 3 sites)

Page 14: HSRP 734:  Advanced Statistical Methods May 29, 2008

Categorical Data Analysis

• Motivation – How do we estimate and test the magnitude of a posited relationship when the outcome of interest is categorical?

– e.g., An international study examines the relationship between age at first birth and the development of breast cancer

• Age = categorized into age groups

Page 15: HSRP 734:  Advanced Statistical Methods May 29, 2008

Categorical Data Analysis

                          Age at first birth
             <20     20-24    25-29    30-34    >=35     Total
Cancer       320     1206     1011     463      220      3220
No cancer    1422    4432     2893     1092     406      10245
Total        1742    5638     3904     1555     626      13465

Page 16: HSRP 734:  Advanced Statistical Methods May 29, 2008

Categorical Data Analysis

• Research question

– Is there a relationship between age at first birth and Cancer status?

• Better to convert the table into percentages (easier to see)

• Turns out that there is a significant relationship (p<0.001)
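Converting the counts to percentages takes one line per age group. The sketch below uses the counts from the age-at-first-birth table; the variable and function names are illustrative.

```python
# Counts taken from the age-at-first-birth table.
age_groups = ["<20", "20-24", "25-29", "30-34", ">=35"]
cancer = [320, 1206, 1011, 463, 220]
no_cancer = [1422, 4432, 2893, 1092, 406]


def percent_with_cancer(cases, non_cases):
    """Percentage of women with cancer within each age group (column %)."""
    return [100.0 * c / (c + n) for c, n in zip(cases, non_cases)]


percentages = percent_with_cancer(cancer, no_cancer)
for group, pct in zip(age_groups, percentages):
    print(f"{group:>6}: {pct:5.1f}% with cancer")
```

The percentage with cancer rises steadily with age at first birth, which is the pattern that the formal test of association picks up.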

Page 17: HSRP 734:  Advanced Statistical Methods May 29, 2008

Categorical Data Analysis

• Statistical techniques involve

– Probability distribution for categorical data

– Tests for relationship in a RxC table

R = # of Rows in Table

C = # of Columns in Table

Page 18: HSRP 734:  Advanced Statistical Methods May 29, 2008

Probability Distribution for Categorical Outcomes

• Fun for Friday night:

– Go home and flip a quarter 10,000 times. Determine if there is evidence that one side is falling down more.

Page 19: HSRP 734:  Advanced Statistical Methods May 29, 2008

Probability Distributions for Categorical Data

– Bernoulli (1 toss of a coin, outcome=H,T)

– Binomial (10 tosses of a coin, outcome=0,1,2..,10 heads)

– Multinomial (throw 10 balls into 4 pigeon holes ABCD, outcome= (3A,2B,1C,4D))

Page 20: HSRP 734:  Advanced Statistical Methods May 29, 2008

Why use multinomial for testing?

• Relationship between 2 categorical variables

– RxC table analysis

– Based on multinomial distribution

Page 21: HSRP 734:  Advanced Statistical Methods May 29, 2008

Why use multinomial for testing?

• Example:

2 level exposure status (Exposed, Not exposed),

3 level outcome (severe, mild, no disease)

– Treat 2x3=6 outcomes as categorical or a multinomial distribution with 6 pigeon holes

– The expected probabilities of the pigeon holes are specified under some kind of assumption (e.g., independence)

Page 22: HSRP 734:  Advanced Statistical Methods May 29, 2008

Level of Measurement

– Categorical response

• dichotomous

• ordinal (>2 categories, ordered)

• nominal (>2 categories, not ordered)

– Dichotomous use Binomial distribution

– Ordinal, Nominal use Multinomial distribution

Page 23: HSRP 734:  Advanced Statistical Methods May 29, 2008

Multinomial Distribution

• Multinomial experiment:

1. Experiment consists of n identical and independent trials

2. Each trial results in one of K outcomes

3. Let $p_i$ be the probability of outcome i

a. Each $p_i$ remains constant from trial to trial

b. $\sum_{i=1}^{K} p_i = 1$

• The pmf for k outcomes is:

$$P(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

• Notes:

$$\sum_{i=1}^{k} p_i = 1; \qquad \sum_{i=1}^{k} n_i = n; \qquad E(n_i) = n p_i$$

Page 24: HSRP 734:  Advanced Statistical Methods May 29, 2008

Example of a Multinomial Experiment

Consider an unfair die and 6 tosses:

Let

Outcome i   1     2     3     4     5     6
p_i         0.3   0.1   0.1   0.1   0.0   0.4
n_i         2     0     1     1     0     2

Find the probability of this outcome

$$\Pr(2,0,1,1,0,2) = \frac{6!}{2!\,0!\,1!\,1!\,0!\,2!}\,(0.3)^2 (0.1)^0 (0.1)^1 (0.1)^1 (0.0)^0 (0.4)^2 = \frac{720}{4}\,(0.000144) = 0.02592$$
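The same calculation can be reproduced in code. This is a minimal sketch of the multinomial pmf using only the standard library; the function name is illustrative.

```python
import math


def multinomial_pmf(counts, probs):
    """P(n_1, ..., n_k) = n! / (n_1! ... n_k!) * p_1^n_1 * ... * p_k^n_k."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for count in counts:
        coefficient //= math.factorial(count)  # stays an exact integer
    probability = float(coefficient)
    for count, p in zip(counts, probs):
        probability *= p ** count  # note: 0.0 ** 0 evaluates to 1.0
    return probability


# Unfair die from the slide: probabilities and observed counts for faces 1-6.
probs = [0.3, 0.1, 0.1, 0.1, 0.0, 0.4]
counts = [2, 0, 1, 1, 0, 2]
print(round(multinomial_pmf(counts, probs), 5))  # 0.02592
```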

Page 25: HSRP 734:  Advanced Statistical Methods May 29, 2008

Simple Multinomial Experiments

Classical example: Mendel. Sample from the second generation of seeds resulting from crossing yellow round peas and green wrinkled peas (N=556)

          Yellow                  Green
    Round     Wrinkled      Round     Wrinkled
     315        101          108         32

Page 26: HSRP 734:  Advanced Statistical Methods May 29, 2008

Mendel’s Laws of Inheritance suggest that we should expect the following ratios:

9/16, 3/16, 3/16, 1/16

For N = 556, the expected number of each outcome is:

E(YR) = 556 x 9/16 = 312.75

E(YW) = 556 x 3/16 = 104.25

E(GR) = 556 x 3/16 = 104.25

E(GW) = 556 x 1/16 = 34.75

Page 27: HSRP 734:  Advanced Statistical Methods May 29, 2008

             Yellow                    Green
      Round       Wrinkled       Round       Wrinkled
       315          101           108           32
     (312.75)     (104.25)      (104.25)      (34.75)

(Expected counts in parentheses)

Page 28: HSRP 734:  Advanced Statistical Methods May 29, 2008

Multinomial distribution

• The observed cell counts are not identical to the expected cell counts

• Under the assumption of a multinomial model with the stated probabilities, how might we determine how unlikely it is to observe these data?

Page 29: HSRP 734:  Advanced Statistical Methods May 29, 2008

Chi-square GOF Test

• Hypothesis: observed cell counts are consistent with the multinomial probabilities

• Theoretical result

$$\chi^2 = \sum_{i=1}^{k} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = \sum_{i=1}^{k} \frac{(n_i - N p_i)^2}{N p_i} \;\overset{dist}{\sim}\; \chi^2_{k-1}$$

• Require that expected cell counts not be too small

• Expected counts > 5.

Page 30: HSRP 734:  Advanced Statistical Methods May 29, 2008

Chi-square distribution

• Remarks about Chi-squared distribution:

1. Nonsymmetric

2. Strictly positive

3. Different chi-squared distribution for each df.

Page 31: HSRP 734:  Advanced Statistical Methods May 29, 2008
Page 32: HSRP 734:  Advanced Statistical Methods May 29, 2008

Chi-square GOF Test

• Applying this test to Mendel’s peas example yields

• H0: pYR = 9/16, pYW = 3/16, pGR = 3/16, pGW = 1/16

• H1: at least one pi differs from hypothesized value

                        Yellow                  Green
                   Round     Wrinkled      Round     Wrinkled
Observed (ni)       315        101          108         32
Expected (Npi)     312.75     104.25       104.25      34.75

Page 33: HSRP 734:  Advanced Statistical Methods May 29, 2008

Chi-square GOF Test

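$$\chi^2 = \sum_{i=1}^{k} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = \sum_{i=1}^{k} \frac{(n_i - N p_i)^2}{N p_i}$$

$$= \frac{(315-312.75)^2}{312.75} + \frac{(101-104.25)^2}{104.25} + \frac{(108-104.25)^2}{104.25} + \frac{(32-34.75)^2}{34.75} = 0.47$$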

Page 34: HSRP 734:  Advanced Statistical Methods May 29, 2008

Chi-square GOF Test

• Therefore, we observed $\chi^2 = 0.47$ from a multinomial experiment with k = 4. Thus, df = k-1 = 3.

For $\alpha = 0.05$,

$$\chi^2_{1-\alpha,\,k-1} = \chi^2_{1-0.05,\,4-1} = \chi^2_{0.95,\,3} = 7.81$$

• Thus, the observed chi-squared statistic is not greater than the critical value for $\alpha = 0.05$ and df = 3.

• We fail to find evidence that these data depart from the hypothesized probabilities, i.e., the model fits the data well.
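The goodness-of-fit calculation for Mendel's peas can be sketched as follows. The function name is illustrative, and the critical value 7.81 is taken from the slide rather than computed.

```python
def chi_square_gof(observed, probs):
    """Pearson goodness-of-fit statistic, expected counts, and df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


# Mendel's peas: YR, YW, GR, GW with hypothesized ratios 9:3:3:1.
observed = [315, 101, 108, 32]
probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
statistic, expected, df = chi_square_gof(observed, probs)

CRITICAL_95_DF3 = 7.81  # critical value quoted on the slide (alpha = 0.05, df = 3)
print(f"chi-square = {statistic:.2f}, df = {df}")
print("reject H0" if statistic > CRITICAL_95_DF3 else "fail to reject H0")
```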

Page 35: HSRP 734:  Advanced Statistical Methods May 29, 2008

Testing association in 2x2 table

• This method translates to testing cross-tabulation tables for RxC cases

• Here the cells are formed by cross-classification of 2 variables

• Null hypothesis is the 2 variables are independent

• Simplest case : 2x2 table

Page 36: HSRP 734:  Advanced Statistical Methods May 29, 2008

Testing association in 2x2 table

• Testing for independence or no association

• Similar idea to checking goodness-of-fit

– Compare what you see to what you hypothesized to be true

– You did, in fact, hypothesize “independence”

Page 37: HSRP 734:  Advanced Statistical Methods May 29, 2008

Basic Inference for 2x2 Tables

• 2x2 Contingency Table

                    Column Levels
Row Levels     1       2       Total
1              n11     n12     n1+
2              n21     n22     n2+
Total          n+1     n+2     N

Page 38: HSRP 734:  Advanced Statistical Methods May 29, 2008

Chi-square GOF Test for 2x2 Tables

• H0: There is no association between row and columns

• Under H0, the expected cell counts are the product of the marginal probabilities and the sample size. Why?

$$\text{EXPECTED}_{ij} = N \cdot \frac{n_{i+}}{N} \cdot \frac{n_{+j}}{N} = \frac{\text{Total}_{ROW} \times \text{Total}_{COL}}{\text{Total}_{OVERALL}}$$

• The classic Pearson’s chi-squared test of independence

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}} \;\overset{dist}{\sim}\; \chi^2_{1}$$

• df = (2-1) x (2-1) = 1

• Conservatively, we require EXPECTEDij ≥ 5 for all i, j

Page 39: HSRP 734:  Advanced Statistical Methods May 29, 2008

Other Tests for 2x2 Tables

• Two alternative tests

– Yates’ continuity-corrected chi-square statistic

– Mantel-Haenszel chi-square statistic

• For sufficiently large sample size, all three Chi-squared statistics are approximately equal and all have a Chi-squared distribution with 1 df

Page 40: HSRP 734:  Advanced Statistical Methods May 29, 2008

When to use Chi-square vs. Fisher’s Exact

• When the expected cell counts are less than 5, it is better to use the Fisher’s exact test.
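For reference, Fisher's exact test on a 2x2 table can be sketched with the hypergeometric distribution. This is one common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one, holding the margins fixed); the function name and the example counts are hypothetical, and statistical packages may use slightly different two-sided conventions.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    smallest, largest = max(0, col1 - row2), min(row1, col1)
    p_observed = table_probability(a)
    # Small tolerance guards against floating-point ties.
    return sum(
        table_probability(x)
        for x in range(smallest, largest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small table (assumed for illustration): all expected counts are 2.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```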

Page 41: HSRP 734:  Advanced Statistical Methods May 29, 2008

Summary of the Use of the χ² test

• Test of goodness-of-fit

Determine whether or not a sample of observed values of some random variable is compatible with the hypothesis that the sample was drawn from a population with a specified distributional form (e.g., specified probabilities of certain events)

Page 42: HSRP 734:  Advanced Statistical Methods May 29, 2008

Summary of the Use of the χ² test

• Test of independence

Test the null hypothesis that two criteria of classification (variables) are independent

Page 43: HSRP 734:  Advanced Statistical Methods May 29, 2008

Summary of the Use of the χ² test

• Test of homogeneity

Test the null hypothesis that the samples are drawn from populations that are homogeneous with respect to some factor (i.e., no association between group and factor)

Page 44: HSRP 734:  Advanced Statistical Methods May 29, 2008

Summary of the Use of the χ² test

• Could consider this test as answering:

“Are the Row factor and Column factor associated?”

Page 45: HSRP 734:  Advanced Statistical Methods May 29, 2008

Categorical Data Analysis

• Ideas of multinomial and chi-squared test generalize to testing RxC association and RxCxK association

• Example:

– 2 exposure status, 2 disease status, 3 sites

– 2x2x3 association analysis

Page 46: HSRP 734:  Advanced Statistical Methods May 29, 2008

Test of General Association (R x C Table)

• Consider a study designed to test whether there exists an association between political party affiliation and residency within specific counties

                           County
Party          Buncombe   Transylvania   Halifax
Democrat          221          160          360
Independent       200          291          160
Republican        208          106          316

Page 47: HSRP 734:  Advanced Statistical Methods May 29, 2008

• Notation for general RxC table

             Response Variable Categories
Group     1      2      …      c      Total
1         n11    n12    …      n1c    n1+
2         n21    n22    …      n2c    n2+
…         …      …      …      …      …
r         nr1    nr2    …      nrc    nr+
Total     n+1    n+2    …      n+c    N

Page 48: HSRP 734:  Advanced Statistical Methods May 29, 2008

Test of General Association

• H0: There is no association between rows and columns

H1: There exists a dependence between rows and columns

• Under H0, the expected cell counts are the product of the corresponding marginal probabilities and the sample size.

$$\text{Expected}_{ij} = N \cdot \frac{n_{i+}}{N} \cdot \frac{n_{+j}}{N} = \frac{\text{Total}_{ROW} \times \text{Total}_{COL}}{\text{Total}_{OVERALL}}$$

• The classic Pearson’s chi-squared test of independence

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}} \;\overset{dist}{\sim}\; \chi^2_{(r-1)(c-1)}$$
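As an illustration, the test of general association can be applied to the party-by-county table from Page 46. This is a standard-library sketch, not the SAS output shown in class; `chi2_sf` is a helper written here for integer degrees of freedom, and all names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable, integer df."""
    half = x / 2.0
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)                 # exact for df = 2
    else:
        k, tail = 1, math.erfc(math.sqrt(half))      # exact for df = 1
    while k < df:                                    # step df up by 2 each pass
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


def pearson_chi_square(table):
    """Return (statistic, df, p-value) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)


# Rows: Democrat, Independent, Republican; columns: Buncombe, Transylvania, Halifax.
party_by_county = [[221, 160, 360], [200, 291, 160], [208, 106, 316]]
statistic, df, p_value = pearson_chi_square(party_by_county)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.3g}")
```

For these counts the statistic is far above any conventional critical value on 4 df, so the hypothesis of no association between party and county is rejected.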

Page 49: HSRP 734:  Advanced Statistical Methods May 29, 2008

SAS Enterprise:

chisq.sas7bdat

Page 50: HSRP 734:  Advanced Statistical Methods May 29, 2008

Mantel-Haenszel test

• Often, there are other factors in a RxC test

• Mantel-Haenszel test (or Cochran-Mantel-Haenszel, CMH) can be used for controlling for “nuisance” factors

• Typically used for rxcx2 table

– e.g., 2x2x2 cross classification

– e.g., Association between disease status and exposure controlling for age group (strata)

Page 51: HSRP 734:  Advanced Statistical Methods May 29, 2008

Stratified Analysis

• Examples of commonly used strata

– Age group

– Gender

– Study site (hospital, country)

– Ethnic group

Page 52: HSRP 734:  Advanced Statistical Methods May 29, 2008

Stratified Analysis

• Myocardial infarction and anticoagulant use by Coronary Care Unit

                    AC use    MI     No MI    Total
Stratum 1 (CCU+)    No        43     56
                    Yes       20     90
                    Total                     209
Stratum 2 (CCU-)    No        137    437
                    Yes       32     341
                    Total                     947

Page 53: HSRP 734:  Advanced Statistical Methods May 29, 2008

Stratified Analysis

• Idea: test for an association while controlling for CCU effects

• Denote the counts from the first cell within the hth subtable as nh11,

• Construct the CMH test of association controlling for CCU

Page 54: HSRP 734:  Advanced Statistical Methods May 29, 2008

Stratified Analysis

• Test assumes the direction of effect within each table is the same

• The Cochran-Mantel-Haenszel approach partially removes the confounding influences of the explanatory variable (e.g., CCU)

• May improve power

Page 55: HSRP 734:  Advanced Statistical Methods May 29, 2008

Mantel-Haenszel Test

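• The expected value of $n_{h11}$ for h = 1,2,…,g is

$$E(n_{h11}) = \frac{n_{h1+}\, n_{h+1}}{n_h} = m_{h11}$$

and the variance of $n_{h11}$ is

$$Var(n_{h11}) = \frac{n_{h1+}\, n_{h2+}\, n_{h+1}\, n_{h+2}}{n_h^2\,(n_h - 1)}$$

This leads to the Cochran-Mantel-Haenszel test

$$\chi^2_{MH} = \frac{\left(\sum_{h=1}^{g} n_{h11} - \sum_{h=1}^{g} m_{h11}\right)^2}{\sum_{h=1}^{g} Var(n_{h11})} \;\overset{dist}{\sim}\; \chi^2_{1}$$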
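A sketch of this statistic applied to the CCU-stratified MI and anticoagulant table from Page 52 follows. It implements the formula as written on this slide, without a continuity correction; the function name is illustrative and this is not the SAS output discussed in class.

```python
import math


def cmh_test(strata):
    """CMH chi-square (1 df) for a list of 2x2 tables [[n11, n12], [n21, n22]]."""
    sum_observed = sum_expected = sum_variance = 0.0
    for (n11, n12), (n21, n22) in strata:
        n = n11 + n12 + n21 + n22
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        sum_observed += n11
        sum_expected += row1 * col1 / n                              # m_h11
        sum_variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    statistic = (sum_observed - sum_expected) ** 2 / sum_variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # chi-square upper tail, df = 1
    return statistic, p_value


# Rows: AC use No / Yes; columns: MI / No MI. Stratum 1 = CCU+, stratum 2 = CCU-.
ccu_strata = [[[43, 56], [20, 90]], [[137, 437], [32, 341]]]
statistic, p_value = cmh_test(ccu_strata)
print(f"CMH chi-square = {statistic:.2f}, p = {p_value:.3g}")
```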

Page 56: HSRP 734:  Advanced Statistical Methods May 29, 2008

Direction of effects across Strata

• Note that if directions of conditional ORs are not the same, discrepancies between observed and expected from different strata may cancel out one another

• This leads to poor power and biased results

Page 57: HSRP 734:  Advanced Statistical Methods May 29, 2008

MH “Pooled” Odds Ratio

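$$\widehat{OR}_{MH} = \frac{\sum_{h=1}^{g} \dfrac{n_{h11}\, n_{h22}}{n_h}}{\sum_{h=1}^{g} \dfrac{n_{h12}\, n_{h21}}{n_h}} = \frac{\dfrac{n_{111}\, n_{122}}{n_1} + \dfrac{n_{211}\, n_{222}}{n_2} + \cdots + \dfrac{n_{g11}\, n_{g22}}{n_g}}{\dfrac{n_{112}\, n_{121}}{n_1} + \dfrac{n_{212}\, n_{221}}{n_2} + \cdots + \dfrac{n_{g12}\, n_{g21}}{n_g}}$$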
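The pooled estimate is straightforward to compute. The sketch below applies it to the CCU-stratified table from Page 52 (an illustrative choice of data; the function names are not from the course materials).

```python
def mantel_haenszel_or(strata):
    """Pooled OR: sum_h(n11 * n22 / n_h) / sum_h(n12 * n21 / n_h)."""
    numerator = denominator = 0.0
    for (n11, n12), (n21, n22) in strata:
        n = n11 + n12 + n21 + n22
        numerator += n11 * n22 / n
        denominator += n12 * n21 / n
    return numerator / denominator


def stratum_or(table):
    """Ordinary odds ratio for a single 2x2 table."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


# Rows: AC use No / Yes; columns: MI / No MI. Stratum 1 = CCU+, stratum 2 = CCU-.
ccu_strata = [[[43, 56], [20, 90]], [[137, 437], [32, 341]]]
for index, table in enumerate(ccu_strata, start=1):
    print(f"stratum {index} OR = {stratum_or(table):.2f}")
print(f"MH pooled OR = {mantel_haenszel_or(ccu_strata):.2f}")
```

The pooled OR is a weighted average of the stratum-specific ORs, so it always lies between the smallest and largest of them.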

Page 58: HSRP 734:  Advanced Statistical Methods May 29, 2008

MH test decision list

• Z = strata of potential confounder

-> If ORc ≈ (ORZ=1 ≈ ORZ=2 ≈…) Z is not a confounder, report crude OR (ORc)

-> If ORc ≠ (ORZ=1 ≈ ORZ=2 ≈…) Z is a confounder, report adjusted OR (ORMH)

-> If ORZ=1 ≠ ORZ=2 ≠ … Z is an effect modifier, report strata specific OR’s (don’t adjust!)

Page 59: HSRP 734:  Advanced Statistical Methods May 29, 2008

Breslow Day test

• (More formal approach) Can also test for homogeneity of odds ratio across strata

• If the Breslow-Day test is significant, the odds ratios within strata are not homogeneous. Reporting ORMH would then be inappropriate!

Page 60: HSRP 734:  Advanced Statistical Methods May 29, 2008

SAS Enterprise:

cmh.sas7bdat

Page 61: HSRP 734:  Advanced Statistical Methods May 29, 2008

Results from cmh.sas7bdat

ORcrude = 3.76 (2.01, 7.05)

ORcenter1 = 4.01 (1.67, 9.66)

ORcenter2 = 4.05 (1.55, 10.60)

ORMH = 4.03 (2.11, 7.71)

Breslow-Day p-value = 0.99

MH Chi-square = 18.41, p-value < 0.0001

Page 62: HSRP 734:  Advanced Statistical Methods May 29, 2008

Take home messages

• Multinomial and the Chi-square test are the “workhorse” for testing of goodness-of-fit

• Idea is to compare expected counts (calculated from a pre-determined set of probabilities) and the observed counts

• The same idea can be applied to testing statistical assumptions such as no association

• CMH test is for testing association when a confounding effect (strata) may be present

Page 63: HSRP 734:  Advanced Statistical Methods May 29, 2008

For Next Class 6/5

• HW #1 key posted

• HW #2 will be due

• Read Kleinbaum Ch. 1,2