HSRP 734: Advanced Statistical Methods May 22, 2008

Upload: christian-kennedy

Post on 30-Dec-2015



TRANSCRIPT

Page 1: HSRP 734:  Advanced Statistical Methods May 22, 2008

HSRP 734: Advanced Statistical Methods

May 22, 2008

Page 2: HSRP 734:  Advanced Statistical Methods May 22, 2008

Course Website

• Course site in Public Health Sciences (PHS) website:

http://www.phs.wfubmc.edu/public/edu_statMeth.cfm

Page 3: HSRP 734:  Advanced Statistical Methods May 22, 2008

Course Syllabus

Page 4: HSRP 734:  Advanced Statistical Methods May 22, 2008

HSRP 734: Advanced Statistical Methods

• Categorical Data Analysis

• Logistic Regression

• Survival analysis

• Cox PH regression

Page 5: HSRP 734:  Advanced Statistical Methods May 22, 2008

What is Categorical Data Analysis?

• Statistical analysis of data that are non-continuous

• Includes dichotomous, ordinal, nominal and count outcomes

• Examples: Disease incidence, Tumor response

Page 6: HSRP 734:  Advanced Statistical Methods May 22, 2008

What is Logistic Regression?

A statistical method that uses predictor variables to model dichotomous (binary) outcomes, although it is not limited to them.

Page 7: HSRP 734:  Advanced Statistical Methods May 22, 2008

What is Logistic Regression?

• Used when the research question focuses on whether or not an event occurred, rather than when it occurred

• Time course information is not used

Page 8: HSRP 734:  Advanced Statistical Methods May 22, 2008

Logistic Regression quantifies “effects” using Odds Ratios

• Does not model the outcome directly; modeling the outcome directly would lead to effect estimates quantified by means (i.e., differences in means)

• Estimates of effect are instead quantified by “Odds Ratios”

Page 9: HSRP 734:  Advanced Statistical Methods May 22, 2008

The Logistic Regression Model

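$$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$$

• $Y$ is the dichotomous outcome

• $X_1, \ldots, X_K$ are the predictor variables

• $\ln\left(\frac{P(Y)}{1-P(Y)}\right)$ is the log(odds) of the outcome.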

Page 10: HSRP 734:  Advanced Statistical Methods May 22, 2008

The Logistic Regression Model

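$$\ln\left(\frac{P(Y)}{1-P(Y)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K$$

• $\beta_0$ is the intercept

• $\beta_1, \ldots, \beta_K$ are the model coefficients

• $\ln\left(\frac{P(Y)}{1-P(Y)}\right)$ is the log(odds) of the outcome.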
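To make the link between the log(odds) scale and odds ratios concrete, here is a minimal Python sketch. The intercept, the two coefficients, and the predictor values are hypothetical numbers chosen only for illustration; they do not come from the course.

```python
import math

def log_odds(intercept, coefs, xs):
    """Linear predictor: beta_0 + beta_1*x_1 + ... + beta_K*x_K."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

def probability(logit):
    """Invert the logit: P = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical values (assumptions for illustration only).
beta_0 = -2.0
betas = [0.7, 0.03]  # e.g. exposure (0/1) and age in years

logit_exposed = log_odds(beta_0, betas, [1, 50])
logit_unexposed = log_odds(beta_0, betas, [0, 50])

p_exposed = probability(logit_exposed)
p_unexposed = probability(logit_unexposed)

# Holding age fixed, the odds ratio for exposure is exp(beta_1).
odds_ratio = math.exp(logit_exposed - logit_unexposed)
print(round(p_exposed, 3), round(p_unexposed, 3), round(odds_ratio, 3))
```

The difference in log(odds) between two subjects who differ only in one predictor is that predictor's coefficient, which is why exponentiating a coefficient gives an odds ratio.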

Page 11: HSRP 734:  Advanced Statistical Methods May 22, 2008

A Short Review

Page 12: HSRP 734:  Advanced Statistical Methods May 22, 2008

Philosophy of Science

• Idea: We posit a paradigm and attempt to falsify that paradigm.

• Science progresses faster via attempting to falsify a paradigm than attempting to corroborate a paradigm.

(Thomas S. Kuhn. 1970. The Structure of Scientific Revolutions. University of Chicago Press.)

Page 13: HSRP 734:  Advanced Statistical Methods May 22, 2008

Philosophy of Science

• The fastest way to progress in science under this paradigm of falsification is through perturbation experiments.

• In epidemiology,

– we are often unable to do perturbation experiments

– it becomes a process of accumulating evidence

• Statistical testing provides a rigorous data-driven framework for falsifying hypotheses

Page 14: HSRP 734:  Advanced Statistical Methods May 22, 2008

The P-Value

• What is the probability of having gotten a sample mean as extreme as 4.8 if the null hypothesis were true (H0: μ = 0)?

• P-value = probability of obtaining a result as or more “extreme” than observed if H0 were true.

• Consider for the above example, if p = 0.0089 (less than a 9 out of 1,000 chance)

• What if p = 0.0501 (about a 5 out of 100 chance)?

Page 15: HSRP 734:  Advanced Statistical Methods May 22, 2008

Hypothesis Testing

1. Set up a null and alternative hypothesis

2. Calculate test statistic

3. Calculate the p-value for the test statistic

4. Based on p-value make a decision to reject or fail to reject the null hypothesis

5. Make your conclusion

Page 16: HSRP 734:  Advanced Statistical Methods May 22, 2008

Hypothesis Testing

| Your decision vs. Truth | Truth: H0 True | Truth: H0 False |
|---|---|---|
| Decision: Fail to reject H0 | Correct Decision | Incorrect Decision: Type II Error (β) |
| Decision: Reject H0 | Incorrect Decision: Type I Error (α) | Correct Decision (Power) |

Page 17: HSRP 734:  Advanced Statistical Methods May 22, 2008

Hypothesis Testing

• Type I error (α) = the probability of rejecting the null hypothesis given that H0 is true (the significance level of a test).

• Type II error (β) = the probability of not rejecting the null hypothesis given that H0 is false (not rejecting when you should have).

• Power = 1 - β

Page 18: HSRP 734:  Advanced Statistical Methods May 22, 2008

Power

• The power of a test is the probability of rejecting a false null hypothesis under certain assumed differences between the populations.

• We like a study that has “high” power (usually at least 80%).

Page 19: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Any difference can become significant if N is large enough

• Even if there is statistical significance, is there clinical significance?

Page 20: HSRP 734:  Advanced Statistical Methods May 22, 2008

Controversy around HT and p-value

“A methodological culprit responsible for spurious theoretical conclusions”

(Meehl, 1967; see Greenwald et al, 1996)

“The p-value is a measure of the credibility of the null hypothesis. The smaller the p-value is, the less likely one feels the null hypothesis can be true.”

Page 21: HSRP 734:  Advanced Statistical Methods May 22, 2008

HT and p-value

• “It cannot be denied that many journal editors and investigators use p-value < 0.05 as a yardstick for the publishability of a result.”

• “This is unfortunate because not only p-value, but also the sample size and magnitude of a physically important difference determine the quality of an experimental finding.”

Page 22: HSRP 734:  Advanced Statistical Methods May 22, 2008

HT and p-value

• Consider a new cancer drug that possibly shows significant improvements.

• Should we consider a p = 0.01 the same as a p = 0.00001 ?

Page 23: HSRP 734:  Advanced Statistical Methods May 22, 2008

HT and p-value

• “[We] endorse the reporting of estimation statistics (such as effect sizes, variabilities, and confidence intervals) for all important hypothesis tests.”

– Greenwald et al (1996)

Page 24: HSRP 734:  Advanced Statistical Methods May 22, 2008

Reporting Statistics

• Reporting I. Statistical Methods

The changes in blood pressure after oral contraceptive use were calculated for 10 women. A paired t-test was used to determine if there was a significant change in blood pressure, and a 95% confidence interval was calculated for the mean blood pressure change (after - before).

Page 25: HSRP 734:  Advanced Statistical Methods May 22, 2008

Reporting Statistics

• Reporting II. Results

Blood pressure measurements increased on average 4.8 mmHg with standard deviation of 4.57. The 95% confidence interval for the mean change was (1.53, 8.07).

There was evidence that blood pressure measurements after oral contraceptive use were significantly higher than before oral contraceptive use (p = 0.009).
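As a rough check of the reported results, the sketch below recomputes the test statistic and the 95% confidence interval from the summary statistics given above (n = 10, mean change 4.8 mmHg, standard deviation 4.57). The critical value 2.262 is the 97.5th percentile of the t distribution with 9 degrees of freedom, taken from a standard t table rather than from the slides.

```python
import math

n = 10              # number of women
mean_change = 4.8   # mean of (after - before), mmHg
sd_change = 4.57    # standard deviation of the changes, mmHg
t_crit = 2.262      # t table value for 9 df, two-sided 95% (assumed from a table)

# Standard error of the mean change
se = sd_change / math.sqrt(n)

# Paired t statistic for H0: mean change = 0
t_stat = mean_change / se

# 95% confidence interval for the mean change
ci_low = mean_change - t_crit * se
ci_high = mean_change + t_crit * se

print(round(t_stat, 2), round(ci_low, 2), round(ci_high, 2))
```

The interval reproduces the reported (1.53, 8.07), and it excludes 0, which agrees with the reported p = 0.009 being below 0.05.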

Page 26: HSRP 734:  Advanced Statistical Methods May 22, 2008

HSRP 734 Lecture 1:

Measures of Disease Occurrence and Association

Page 27: HSRP 734:  Advanced Statistical Methods May 22, 2008

Objectives:

1. Define and compute the measures of disease occurrence and association

2. Discuss differences in study design and their implications for inference

Page 28: HSRP 734:  Advanced Statistical Methods May 22, 2008

Example

CT images rated by radiologist (Rosner p.65)

Page 29: HSRP 734:  Advanced Statistical Methods May 22, 2008

| | Rated as normal | Rated as questionable | Rated as abnormal |
|---|---|---|---|
| Normal | 39 | 6 | 13 |
| Abnormal | 5 | 2 | 44 |

Page 30: HSRP 734:  Advanced Statistical Methods May 22, 2008

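Each cell shows the count with (Cell %), then Row %, then Col %.

| | Rated as normal | Rated as questionable | Rated as abnormal | Total |
|---|---|---|---|---|
| Normal | 39 (35.8%), 67%, 88.6% | 6 (5.5%), 10.3%, 75% | 13 (11.9%), 22.4%, 22.8% | 58 |
| Abnormal | 5 (4.6%), 9.8%, 11.4% | 2 (1.8%), 3.9%, 25% | 44 (40.4%), 86.3%, 77.2% | 51 |
| Total | 44 | 8 | 57 | 109 |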
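The cell, row, and column percentages in this table can be recomputed from the six counts alone. The following Python sketch shows one way to do it; the variable names are illustrative, and only the counts come from the slide.

```python
# Counts from the CT rating table: rows are the Normal / Abnormal groups,
# columns are the radiologist's rating.
counts = {
    "Normal": [39, 6, 13],
    "Abnormal": [5, 2, 44],
}
ratings = ["normal", "questionable", "abnormal"]

grand_total = sum(sum(row) for row in counts.values())
col_totals = [sum(row[j] for row in counts.values()) for j in range(len(ratings))]

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

percentages = {}
for status, row in counts.items():
    row_total = sum(row)
    for j, n in enumerate(row):
        # (cell %, row %, column %) for this cell
        percentages[(status, ratings[j])] = (
            pct(n, grand_total),
            pct(n, row_total),
            pct(n, col_totals[j]),
        )

for key, value in percentages.items():
    print(key, value)
```

Row percentages condition on the row group (for example, 67.2% of the Normal group were rated normal, shown as 67% on the slide), while column percentages condition on the rating.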

Page 31: HSRP 734:  Advanced Statistical Methods May 22, 2008

Basic Probability

• Conditional probability

– Restrict yourself to a “subspace” of the sample space

| | Male | Female |
|---|---|---|
| Young | 20% | 10% |
| Old | 35% | 35% |

Page 32: HSRP 734:  Advanced Statistical Methods May 22, 2008

Conditional probabilities

• Probability that something occurs (event B), given that event A has occurred (conditioning on A)

• Pr(B given that A is true) = Pr(B | A)

Page 33: HSRP 734:  Advanced Statistical Methods May 22, 2008

Conditional probabilities

• Categorical data analysis

• odds ratio = ratio of odds of two conditional probabilities

• Conditional probabilities in survival analysis of the form :

Pr(live till time t1+t2 | survive up till time t1)

Page 34: HSRP 734:  Advanced Statistical Methods May 22, 2008

Basic probability

• Example: automatic blood-pressure machine

• 84% of hypertensives and 23% of normotensives are classified as hypertensive

• Given 20% of adult population is hypertensive

• We now know:

Pr(machine says hypertensive | truly hypertensive)

• What is Pr(truly hypertensive| machine says hypertensive)?
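The question above can be answered with Bayes' theorem using only the three numbers given in the example. A minimal Python sketch follows; the variable names are illustrative.

```python
# Inputs from the example
p_pos_given_hyper = 0.84   # Pr(machine says hypertensive | truly hypertensive)
p_pos_given_normo = 0.23   # Pr(machine says hypertensive | normotensive)
p_hyper = 0.20             # Pr(truly hypertensive) in the adult population

# Total probability that the machine says hypertensive
p_pos = p_pos_given_hyper * p_hyper + p_pos_given_normo * (1 - p_hyper)

# Bayes' theorem: Pr(truly hypertensive | machine says hypertensive)
p_hyper_given_pos = p_pos_given_hyper * p_hyper / p_pos

print(round(p_pos, 3), round(p_hyper_given_pos, 3))
```

Under these inputs, fewer than half of the people the machine labels hypertensive are truly hypertensive (about 0.48), even though the machine detects 84% of true hypertensives.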

Page 35: HSRP 734:  Advanced Statistical Methods May 22, 2008

Basic probability

| Hypertension (H) | Machine diagnosed as hypertensive (D): Yes | Machine diagnosed as hypertensive (D): No |
|---|---|---|
| Yes | | |
| No | | |

Page 36: HSRP 734:  Advanced Statistical Methods May 22, 2008

Basic probability

• Positive predictive value — Probability that a randomly selected subject from the population actually has the disease given that the screening test is positive

• Negative predictive value — Probability that a randomly selected subject from the population is actually disease free given that the screening test is negative

Page 37: HSRP 734:  Advanced Statistical Methods May 22, 2008

Basic probability

• Sensitivity — Probability that the procedure is positive given that the person has the disease

• Specificity — Probability that the procedure is negative given that the person does not have the disease

Review examples 3.26, 3.27, and 3.28 in Rosner

Page 38: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Measures of Occurrence

– Measure using proportions (e.g., prevalence, odds)

– Rates (e.g., incidence, cumulative incidence)

• Measure of Association

– Based on odds (e.g., odds ratio)

– Based on probabilities (e.g., risk ratio)

Page 39: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

• Point prevalence = proportion of cases at a given point in time

– cross-sectional measure

• Incidence = number of new cases within a specified time interval

– prospective measure

Page 40: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

• Example:

Consider four individuals diagnosed with lung cancer

| Person | Years of Follow-up | Status |
|---|---|---|
| 1 | 3 | Dead |
| 2 | 5 | Alive |
| 3 | 2 | Alive |
| 4 | 1 | Dead |

• Proportion of death = 2/4 = 0.5

• Rate of death = 2/(3+5+2+1) = 0.18 deaths per person-year
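The difference between the proportion and the rate in this example comes down to the denominator: persons versus person-years. A short Python sketch using the four follow-up records:

```python
# (years of follow-up, died?) for the four individuals in the example
follow_up = [(3, True), (5, False), (2, False), (1, True)]

deaths = sum(1 for _, died in follow_up if died)
persons = len(follow_up)
person_years = sum(years for years, _ in follow_up)

# Proportion: deaths divided by the number of persons (unitless)
proportion_dead = deaths / persons

# Rate: deaths divided by total time at risk (per person-year)
death_rate = deaths / person_years

print(proportion_dead, round(death_rate, 2))
```

The proportion is bounded between 0 and 1, whereas the rate has units of events per person-year and depends on how long each person was followed.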

Page 41: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

• Two kinds of quantities used in measurement:

– Proportion: the numerator of a proportion is a subset of the denominator, e.g., prevalence

– Rate: # events which occur during a time interval divided by the total amount of time, e.g., incidence rate

Page 42: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

Remarks:

1) Diseases of long duration tend to have a higher prevalence

2) Incidence tends to be more informative than prevalence for causal understanding of the disease etiology

3) Incidence is more difficult to measure & more expensive

Page 43: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

4) Prevalence & incidence can be influenced by the evolution of screening procedures and diagnostic tests

5) Both incidence and prevalence rates may be age dependent

Page 44: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

• Odds = ratio of P(event occurs) to the P(event does not occur).

Example:

The probability of a disease is 0.20.

Thus, the odds are 0.20/(1-0.20) = 0.20/0.80 = 0.25 = 1:4

That is, for every one person with an event, there are 4 people without the event.

$$\text{odds} = \frac{p}{1-p}$$
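The conversion between a probability and its odds runs in both directions. The small Python sketch below uses the disease probability of 0.20 from the example; the function names are illustrative.

```python
def odds_from_probability(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)

disease_odds = odds_from_probability(0.20)
recovered_p = probability_from_odds(disease_odds)

print(round(disease_odds, 2), round(recovered_p, 2))
```

For small probabilities the odds and the probability are nearly equal, but they diverge as the probability grows (a probability of 0.5 corresponds to odds of 1).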

Page 45: HSRP 734:  Advanced Statistical Methods May 22, 2008

Absolute Measures of Disease Occurrence

• Risk of disease in time interval [t0, t1)

P(t) = Pr(developing disease in interval of length t = t1 - t0 given disease free at the start of the interval)

• Average Prevalence = Incidence x Duration

duration = average duration of disease after onset

Page 46: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Disease Association

• So far we have discussed

– Prevalence

– Incidence rate

– Cumulative incidence rate

– Risk of disease within an interval t

• All absolute measures

• Next, relative measures and associations

– Exposed ($E$) versus Unexposed ($\bar{E}$)

Page 47: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Disease Association

• Population versus sample

– Probabilities (population) are denoted by symbols such as

• $p_1$ = P(disease within the exposed population)

– Sample estimates are denoted by $\hat{p}_1$

Page 48: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Disease Association

| | Exposed ($E$) | Not Exposed ($\bar{E}$) | Total |
|---|---|---|---|
| Disease ($D$) | a | b | n1 |
| No Disease ($\bar{D}$) | c | d | n0 |
| Total | m1 | m0 | n |

Page 49: HSRP 734:  Advanced Statistical Methods May 22, 2008

Conditional distribution

| | Exposed ($E$) | Not Exposed ($\bar{E}$) | Margin |
|---|---|---|---|
| Disease ($D$) | $p_1$ | | |
| No Disease ($\bar{D}$) | $1 - p_1$ | | |
| Margin | 1 | | |

Page 50: HSRP 734:  Advanced Statistical Methods May 22, 2008

Conditional distribution

| | Exposed ($E$) | Not Exposed ($\bar{E}$) | Margin |
|---|---|---|---|
| Disease ($D$) | | $p_0$ | |
| No Disease ($\bar{D}$) | | $1 - p_0$ | |
| Margin | | 1 | |

Page 51: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Association

• Odds ratio: Odds of disease among exposed divided by odds of disease among unexposed

$$OR = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}$$

Page 52: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Association

OR > 1 implies a positive association between disease and exposure

OR < 1 implies a negative association between disease and exposure

OR for disease = OR for exposure

Page 53: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Association

• Risk ratio = ratio between P(disease for exposed) and P(disease for unexposed) , both P(.) measured within the same duration of time

$$RR = \frac{p_1}{p_0}$$

Page 54: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Association?

• Risk Difference (Excess Risk): RD = IR1 - IR0 (incidence rate among exposed minus incidence rate among unexposed)

RD is not scale free

e.g., what is the meaning of these two equal differences, both with RD = 0.009: RD = 0.010 - 0.001 vs. RD = 0.210 - 0.201?

• Attributable Risk for Exposed Persons: AR = (IR1 - IR0) / IR1 = 1 - 1 / RR

Page 55: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Measurements of risk and relative risk in different sampling designs

• Cross-sectional

• Cohort

• Case-control

Page 56: HSRP 734:  Advanced Statistical Methods May 22, 2008

Measures of Disease Association

| | Exposed ($E$) | Not Exposed ($\bar{E}$) | Total |
|---|---|---|---|
| Disease ($D$) | a | b | n1 |
| No Disease ($\bar{D}$) | c | d | n0 |
| Total | m1 | m0 | n |

Page 57: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Cross-Sectional Sampling

Randomly sample n subjects from the population at time t and determine disease and exposure status.

Important: n is fixed for this design.

1) a/m1 estimates prevalence of disease at t among exposed

2) b/m0 estimates prevalence of disease at t among unexposed

3) ad/bc estimates the OR for disease and exposure

Page 58: HSRP 734:  Advanced Statistical Methods May 22, 2008

Odds Ratio

p1 = a/m1 = disease risk among exposed

p0 = b/m0 = disease risk among unexposed

If p1 and p0 are small (rare disease) and the time interval is relatively short, it can be shown that OR ≈ RR

$$OR = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}$$
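The rare-disease approximation OR ≈ RR can be illustrated numerically. In the Python sketch below the risk ratio is held at 2 while the baseline risk p0 shrinks; the three baseline risks are hypothetical values chosen only to show the trend.

```python
def odds_ratio(p1, p0):
    """OR = [p1 / (1 - p1)] / [p0 / (1 - p0)]."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def risk_ratio(p1, p0):
    """RR = p1 / p0."""
    return p1 / p0

# Hypothetical baseline risks, from common to rare; exposed risk is twice as large.
baseline_risks = [0.2, 0.05, 0.001]
results = [
    (p0, risk_ratio(2 * p0, p0), odds_ratio(2 * p0, p0))
    for p0 in baseline_risks
]

for p0, rr, or_ in results:
    print(p0, round(rr, 3), round(or_, 3))
```

As p0 and p1 become small, the factors (1 - p1) and (1 - p0) both approach 1, so the odds ratio approaches the risk ratio.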

Page 59: HSRP 734:  Advanced Statistical Methods May 22, 2008

Cross-sectional Sampling

• Cross-sectional design not prospective

• Can only test for association between exposure and prevalence and not incidence

• Cannot test hypotheses about causality

Page 60: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Cohort Sampling

Sample n disease-free individuals from the population at time t0 and follow them until time t1.

Measure exposure history for each subject and observe which subjects develop disease in interval [t0, t1)

Important: m1, m0, and n are fixed

Page 61: HSRP 734:  Advanced Statistical Methods May 22, 2008

Cohort study: Estimates of risk

1) p1 = a/m1 estimates risk of developing disease in interval among exposed

2) p0 = b/m0 estimates risk of developing disease in interval among unexposed

3) RR ≈ p1 / p0

4) OR = ad / bc

5) IR (incidence rate): IR_i ≈ p_i / t for i = 0, 1 (and small t)

6) RD (risk difference): RD ≈ IR_1 - IR_0 ≈ (p1 - p0) / t

Page 62: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Case-Control Sampling

Sample n1 cases and n0 disease free controls from target population during interval [t0, t1)

Important: n1, n0, and n are fixed

Page 63: HSRP 734:  Advanced Statistical Methods May 22, 2008

1) a/m1 and b/m0 do not estimate population disease risks

2) a/n1 estimates Pr(prior exposure | disease incidence in [t0, t1))

3) c/n0 estimates Pr(prior exposure | no disease incidence in [t0, t1))

4) OR = ad / bc

5) RR ≈ OR for rare disease or short time intervals

6) IR (incidence rate) or disease risks cannot be estimated; RD (risk difference) cannot be estimated

Page 64: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Hypothetical example

Frequency of disease and exposure in a target population

p1 = ? p0 = ?

RR = p1 / p0 = ? OR = ?

| | Exposure | Not Exposure | Total |
|---|---|---|---|
| Disease | 8 | 32 | 40 |
| No Disease | 92 | 868 | 960 |
| Total | 100 | 900 | 1000 |

Page 65: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Hypothetical example

Frequency of disease and exposure in a target population

p1 = 8 / 100 = 0.08; p0 = 32 / 900 = 0.036

RR = p1 / p0 = 0.08 / 0.036 = 2.25

OR = (8 x 868) / (92 x 32) = 2.36

| | Exposure | Not Exposure | Total |
|---|---|---|---|
| Disease | 8 | 32 | 40 |
| No Disease | 92 | 868 | 960 |
| Total | 100 | 900 | 1000 |

Page 66: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Cohort Study

50% of exposed individuals sampled

25% of unexposed individuals sampled

p1 = 4 / 50 = 0.08; p0 = 8 / 225 = 0.036

RR = p1 / p0 = 0.08 / 0.036 = 2.25

OR = (4 x 217) / (46 x 8) = 2.36

| | Exposure | Not Exposure | Total |
|---|---|---|---|
| Disease | 4 | 8 | 12 |
| No Disease | 46 | 217 | 263 |
| Total | 50 | 225 | 275 |

Page 67: HSRP 734:  Advanced Statistical Methods May 22, 2008

• Case-Control Study

100% of diseased individuals sampled

25% of disease-free individuals sampled

p1 = 8 / 31 = 0.26 ≠ 0.08; p0 = 32 / 249 = 0.13 ≠ 0.036

RR = p1 / p0 = (8/31) / (32/249) = 2.01 ≠ 2.25

OR = (8 x 217) / (23 x 32) = 2.36

| | Exposure | Not Exposure | Total |
|---|---|---|---|
| Disease | 8 | 32 | 40 |
| No Disease | 23 | 217 | 240 |
| Total | 31 | 249 | 280 |
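The three hypothetical tables (target population, cohort sample, case-control sample) can be compared side by side. This Python sketch recomputes RR and OR for each, using the a, b, c, d cell layout from the 2x2 table defined earlier; the function and dictionary names are illustrative.

```python
def risk_ratio(a, b, c, d):
    """RR = [a / (a + c)] / [b / (b + d)], using column totals m1 and m0."""
    return (a / (a + c)) / (b / (b + d))

def odds_ratio(a, b, c, d):
    """OR = ad / bc (cross-product ratio)."""
    return (a * d) / (b * c)

# (a, b, c, d): diseased exposed, diseased unexposed,
#               disease-free exposed, disease-free unexposed
designs = {
    "target population": (8, 32, 92, 868),
    "cohort sample": (4, 8, 46, 217),
    "case-control sample": (8, 32, 23, 217),
}

summary = {
    name: (round(risk_ratio(*cells), 2), round(odds_ratio(*cells), 2))
    for name, cells in designs.items()
}

for name, (rr, or_) in summary.items():
    print(f"{name}: RR = {rr}, OR = {or_}")
```

The cross-product ratio is the same in all three tables, while the naive risk ratio computed from the case-control table differs from the population value because cases and controls were sampled at different fractions.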

Page 68: HSRP 734:  Advanced Statistical Methods May 22, 2008

Odds ratio

• The odds ratio is equally valid for retrospective, prospective, or cross-sectional sampling designs

• That is, regardless of the design it estimates the same population parameter

Page 69: HSRP 734:  Advanced Statistical Methods May 22, 2008

Take home messages

– Occurrence of disease measured by prevalence, or proportion

– Incidence measured by incidence rates, or proportion per unit time

– Risk is probability of developing disease over a specified period of time

Page 70: HSRP 734:  Advanced Statistical Methods May 22, 2008

Take home messages

– Association of disease with exposure measured by odds ratios and risk ratios

– Odds ratios are valid for cross-sectional, cohort, and case-control designs; risk ratios are not

Page 71: HSRP 734:  Advanced Statistical Methods May 22, 2008

HW #1

• Due May 29

• Can talk to others but turn in own work