choosing appropriate statistical test rss6 2104

Choosing Appropriate Statistical Test Amr Albanna, MD, MSc


Post on 21-Jan-2015


Page 1: Choosing appropriate statistical test RSS6 2104

Choosing Appropriate Statistical Test

Amr Albanna, MD, MSc

Page 2: Choosing appropriate statistical test RSS6 2104

Factors Influencing the Selection of Statistical Tests

Study Design

Type of Data

Page 3: Choosing appropriate statistical test RSS6 2104

Study Design

Page 4: Choosing appropriate statistical test RSS6 2104


Page 5: Choosing appropriate statistical test RSS6 2104

Descriptive Studies

• Prevalence

– Cross-sectional study

• Incidence

– Cohort study

Page 6: Choosing appropriate statistical test RSS6 2104

Prevalence Versus Incidence

• Prevalence can be viewed as describing a pool of disease in a population.

• Incidence describes the input flow of new cases into the pool.

• Deaths and cures reflect the output flow from the pool.

Page 7: Choosing appropriate statistical test RSS6 2104

Prevalence Versus Incidence

Prevalence at time t1 = 2/10 = 20%

Source: Silva 1999

Prevalence at time t2 = 3/8 = 38%

Incidence between t1 and t2: 4/8 = 50%
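The figures on this slide are plain proportions. A minimal sketch of the arithmetic, taking the counts quoted above at face value; the variable names are illustrative, and reading the denominator of 8 as "people at risk" is an assumption, since the original figure is not reproduced here:

```python
# Counts quoted on the slide (example attributed to Silva 1999).
cases_t1, population_t1 = 2, 10   # existing cases / people in the population at t1
cases_t2, population_t2 = 3, 8    # existing cases / people in the population at t2
new_cases, at_risk = 4, 8         # new cases between t1 and t2 / people at risk (assumed reading)

prevalence_t1 = cases_t1 / population_t1   # size of the "pool" at t1
prevalence_t2 = cases_t2 / population_t2   # size of the "pool" at t2
incidence = new_cases / at_risk            # inflow of new cases over the interval

print(f"Prevalence at t1: {prevalence_t1:.1%}")
print(f"Prevalence at t2: {prevalence_t2:.1%}")
print(f"Incidence between t1 and t2: {incidence:.1%}")
```

Prevalence uses everyone present at one point in time as the denominator, while incidence counts only new cases over an interval.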

Page 8: Choosing appropriate statistical test RSS6 2104

Descriptive Studies

• Determine the size of a health problem in the “study base” population.

• Promote public health policies.

Page 9: Choosing appropriate statistical test RSS6 2104

Analytic Studies

• Randomized-controlled trials.

• Cohort studies

• Case-control studies

• Diagnostic studies

Page 10: Choosing appropriate statistical test RSS6 2104

Analytic Studies

• To effectively practice medicine, we need evidence/knowledge on 3 fundamental types of professional knowing (“gnosis”):

– Dia-gnosis

– Etio-gnosis

– Pro-gnosis

Page 11: Choosing appropriate statistical test RSS6 2104

Analytic Studies

• Most fundamental application of clinical research: to identify causal associations between exposure(s) and outcome(s)

Exposure → ? → Outcome

Page 12: Choosing appropriate statistical test RSS6 2104

Causal Vs. Non-causal Association

A B

Accidental

No Association

Page 13: Choosing appropriate statistical test RSS6 2104

Causal Vs. Non-causal Association

A B

A causes B

Page 14: Choosing appropriate statistical test RSS6 2104

Causal Vs. Non-causal Association

A B

B causes A

Page 15: Choosing appropriate statistical test RSS6 2104

Direction of causality: does overeating cause obesity?

Taubes G. New Scientist, 2008

Page 16: Choosing appropriate statistical test RSS6 2104

Causal Vs. Non-causal Association

A (e.g. Coffee)   B (e.g. Lung cancer)

A is not causally associated with B

C (e.g. Smoking)

Page 17: Choosing appropriate statistical test RSS6 2104

A Research Scenario

• Study question: Does eating affect students' intellectual ability?

• 100 students underwent an exam after eating lunch.

• 50% failed the exam.

• You conclude that eating worsens students' intellectual ability.

Page 18: Choosing appropriate statistical test RSS6 2104

Compared to what?

• In an old movie, comedian Groucho Marx is asked: “Groucho, how’s your wife?”

• Groucho quips: “Compared to what?”

http://en.wikipedia.org

Page 19: Choosing appropriate statistical test RSS6 2104

Ideal counterfactual comparison to determine causal effects

Exposed cohort → Outcome

Counterfactual, unexposed cohort → Outcome

“Initial conditions” are identical in the exposed and unexposed groups, because they are the same population!

Maldonado & Greenland, Int J Epi 2002;31:422-29

Page 20: Choosing appropriate statistical test RSS6 2104

What happens in reality?

Exposed cohort → Outcome

Counterfactual, unexposed cohort → Outcome (the counterfactual state is not observed; it is latent)

Substitute, unexposed cohort → Outcome

A substitute will usually be a population other than the target population during the etiologic time period: INITIAL CONDITIONS MAY BE DIFFERENT

Page 21: Choosing appropriate statistical test RSS6 2104

Risk

Rate

Risk Difference

Risk Ratio

Rate Ratio

Odds Ratio

Page 22: Choosing appropriate statistical test RSS6 2104

• Measures of disease frequency

• Measures of effect

• Measures of potential impact

Page 23: Choosing appropriate statistical test RSS6 2104

Attributable Risk

Page 24: Choosing appropriate statistical test RSS6 2104

Population Attributable Risk

Page 25: Choosing appropriate statistical test RSS6 2104

How PAR depends on the prevalence of exposure

Szklo & Nieto. Epidemiology: Beyond the basics. 2nd Edition, 2007
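The original chart is not reproduced in this transcript. As a rough sketch of the same idea, the snippet below uses Levin's formula for the population attributable risk expressed as a fraction (often written PAR%): p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of exposure. The risk ratio of 3 and the prevalence values are hypothetical, chosen only for illustration.

```python
def population_attributable_fraction(prevalence_exposed: float, risk_ratio: float) -> float:
    """Levin's formula: share of disease in the population attributable to the exposure."""
    excess = prevalence_exposed * (risk_ratio - 1.0)
    return excess / (1.0 + excess)

# Hypothetical risk ratio, held fixed while the exposure prevalence varies.
RISK_RATIO = 3.0
for prevalence in (0.01, 0.10, 0.50, 0.90):
    paf = population_attributable_fraction(prevalence, RISK_RATIO)
    print(f"Exposure prevalence {prevalence:.0%}: PAR% = {paf:.1%}")
```

For a fixed risk ratio, the attributable fraction rises as the exposure becomes more common, which is the dependence the slide title refers to.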

Page 26: Choosing appropriate statistical test RSS6 2104

Randomized-controlled trials

Randomization helps to make the groups “comparable” (i.e. similar initial conditions)

Eligible patients → Randomization → Treatment → Outcomes → Incidence

Eligible patients → Randomization → Placebo → Outcomes → Incidence

Difference: “RR” or “RD”

Page 27: Choosing appropriate statistical test RSS6 2104

Observational Studies

[Figure: a study population drawn as a mix of exposed (E) and non-exposed (N) individuals]

Page 28: Choosing appropriate statistical test RSS6 2104

Cohort

[Figure: the population of E and N individuals separated into two groups labelled Exposed and Un-Exposed]

Page 29: Choosing appropriate statistical test RSS6 2104

Cohort

Study population

Exposed → Disease / No Disease → Incidence of disease in exposed

Unexposed → Disease / No Disease → Incidence of disease in unexposed

Compare the two incidences: “Risk Ratio” or “Risk Difference”
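A minimal sketch of the cohort comparison: the incidence (risk) in each exposure group, then their ratio and difference. The function name and the counts are hypothetical, not taken from the slides.

```python
def cohort_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (risk in exposed, risk in unexposed, risk ratio, risk difference)."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    risk_ratio = risk_exposed / risk_unexposed
    risk_difference = risk_exposed - risk_unexposed
    return risk_exposed, risk_unexposed, risk_ratio, risk_difference

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the disease.
risk_e, risk_u, rr, rd = cohort_measures(30, 100, 10, 100)
print(f"Incidence in exposed:   {risk_e:.2f}")
print(f"Incidence in unexposed: {risk_u:.2f}")
print(f"Risk ratio:             {rr:.2f}")
print(f"Risk difference:        {rd:.2f}")
```

The ratio expresses the relative strength of the association, while the difference expresses the absolute excess risk among the exposed.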

Page 30: Choosing appropriate statistical test RSS6 2104

Case-Control

[Figure: the population of E and N individuals separated into two groups labelled Cases and Controls]

Page 31: Choosing appropriate statistical test RSS6 2104

Case-control

Study population

Disease (cases) → Exposed / Un-exposed → Odds of being exposed

No disease (controls) → Exposed / Un-exposed → Odds of being exposed

“Odds Ratio” approximates “Risk Ratio” (when the disease is rare)
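The sketch below illustrates when the odds ratio is a good stand-in for the risk ratio. Both source populations are hypothetical, and the cell labels (a, b, c, d) are a naming convention assumed for this example.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed, using the same cell labels."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical rare disease: 30 vs 10 cases per 10,000 people.
rare = (30, 9970, 10, 9990)
# Hypothetical common disease: 600 vs 200 cases per 1,000 people.
common = (600, 400, 200, 800)

or_rare, rr_rare = odds_ratio(*rare), risk_ratio(*rare)
or_common, rr_common = odds_ratio(*common), risk_ratio(*common)

print(f"Rare disease:   OR = {or_rare:.2f}, RR = {rr_rare:.2f}")
print(f"Common disease: OR = {or_common:.2f}, RR = {rr_common:.2f}")
```

With a rare outcome the two measures nearly coincide; with a common outcome the odds ratio lies further from 1 than the risk ratio.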

Page 32: Choosing appropriate statistical test RSS6 2104

Observational Studies: Problem

Association between birth order and Down syndrome

Source: Rothman 2002 Data from Stark and Mantel (1966)

Page 33: Choosing appropriate statistical test RSS6 2104

Source: Rothman 2002

Association between maternal age and Down syndrome

Data from Stark and Mantel (1966)

Page 34: Choosing appropriate statistical test RSS6 2104

Source: Rothman 2002

Association between maternal age and Down syndrome, stratified by

birth order

Data from Stark and Mantel (1966)

Page 35: Choosing appropriate statistical test RSS6 2104

Criteria to define confounder

• A factor is a confounder if 3 criteria are met:

– a) a confounder must be causally or noncausally associated with the exposure in the source population;

– b) a confounder must be a causal risk factor (or a surrogate measure of a cause) for the disease;

– c) a confounder must not be an intermediate cause (in other words, a confounder must not be an intermediate step in the causal pathway between the exposure and the disease)

Page 36: Choosing appropriate statistical test RSS6 2104

Confounding Schematic

Exposure (E) → Disease (outcome) (D)

Confounder (C), linked to both E and D

Szklo M, Nieto JF. Epidemiology: Beyond the basics. Aspen Publishers, Inc., 2000.

Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th Edition.

Page 37: Choosing appropriate statistical test RSS6 2104

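Intermediate cause

Exposure (E) → C → Disease (D)

Here C lies on the causal pathway between exposure and disease, so it is an intermediate cause, not a confounder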

Page 38: Choosing appropriate statistical test RSS6 2104

Confounding Schematic

Exposure (E): Birth Order

Disease (D): Down Syndrome

Confounding factor (C): Maternal Age

Page 39: Choosing appropriate statistical test RSS6 2104

Association between HRT and heart disease

HRT use → Heart disease

Confounding factor: SES

Are confounding criteria met?

Page 40: Choosing appropriate statistical test RSS6 2104

Control of confounding: Outline

• Control at the design stage

– Randomization

– Restriction

– Matching

• Control at the analysis stage

– Conventional approaches

• Stratified analyses

• Multivariate analyses

– Newer approaches

• Propensity scores

Page 41: Choosing appropriate statistical test RSS6 2104

Observational Study on Vit E and Coronary Heart Disease

Fitzmaurice, 2004

Crude OR = (50 × 384) / (501 × 65) = 0.59

Are there potential confounders that can explain this crude OR?

Page 42: Choosing appropriate statistical test RSS6 2104

Vitamin E → CHD

Confounding factor: Smoking

Stratify on the confounding variable

Could reduced smoking among Vit E users partly explain the observed protective effect?

Page 43: Choosing appropriate statistical test RSS6 2104

Stratified Analyses (by smoking status)

Fitzmaurice, 2004

Stratum 1: OR (smokers) = (11 × 200) / (40 × 49) = 1.12

Stratum 2: OR (non-smokers) = (39 × 184) / (461 × 16) = 0.97
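The crude and stratum-specific odds ratios quoted on these slides can be reproduced from the cell counts in the formulas. The sketch below also pools the two strata with a Mantel-Haenszel summary odds ratio; that summary is not shown on the slides and is included here only as one common way to report a smoking-adjusted estimate. Which cell corresponds to which group is an assumption read off the order of the products.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c) of a 2 x 2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Cell counts (a, b, c, d) as they appear in the slide formulas.
crude = (50, 501, 65, 384)
smokers = (11, 40, 49, 200)
non_smokers = (39, 461, 16, 184)

crude_or = odds_ratio(*crude)
smokers_or = odds_ratio(*smokers)
non_smokers_or = odds_ratio(*non_smokers)
adjusted_or = mantel_haenszel_or([smokers, non_smokers])

print(f"Crude OR:             {crude_or:.2f}")
print(f"OR in smokers:        {smokers_or:.2f}")
print(f"OR in non-smokers:    {non_smokers_or:.2f}")
print(f"Mantel-Haenszel OR:   {adjusted_or:.2f}")
```

The two strata add up to the crude table, yet both stratum-specific odds ratios sit close to 1, which is the pattern expected when smoking confounds the crude association.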

Page 44: Choosing appropriate statistical test RSS6 2104

Multivariate Analysis

Page 45: Choosing appropriate statistical test RSS6 2104

Diagnostic Studies

• Diagnostic 2 × 2 table*:

            Disease +         Disease -
Test +      True Positive     False Positive
Test -      False Negative    True Negative

*When test results are not dichotomous, ROC curves can be used [see later]

Page 46: Choosing appropriate statistical test RSS6 2104

Sensitivity [true positive rate]

                 Disease present    Disease absent
Test positive    True positives     False positives
Test negative    False negatives    True negatives

The proportion of patients with disease who test positive = P(T+|D+) = TP / (TP + FN)

Page 47: Choosing appropriate statistical test RSS6 2104

Specificity [true negative rate]

                 Disease present    Disease absent
Test positive    True positives     False positives
Test negative    False negatives    True negatives

The proportion of patients without disease who test negative = P(T-|D-) = TN / (TN + FP)

Page 48: Choosing appropriate statistical test RSS6 2104

Predictive value of a positive test

                 Disease present    Disease absent
Test positive    True positives     False positives
Test negative    False negatives    True negatives

Proportion of patients with positive tests who have disease = P(D+|T+) = TP / (TP + FP)

Page 49: Choosing appropriate statistical test RSS6 2104

Predictive value of a negative test

                 Disease present    Disease absent
Test positive    True positives     False positives
Test negative    False negatives    True negatives

Proportion of patients with negative tests who do not have disease = P(D-|T-) = TN / (TN + FN)

Page 50: Choosing appropriate statistical test RSS6 2104

Likelihood Ratio of a Positive Test

                 Disease present    Disease absent
Test positive    True positives     False positives
Test negative    False negatives    True negatives

LR+ = Pr(T+|D+) / Pr(T+|D-) = TPR / FPR

How much more often a positive test result occurs in persons with the target condition compared to those without it

Page 51: Choosing appropriate statistical test RSS6 2104

Likelihood Ratio of a Negative Test

                 Disease present    Disease absent
Test positive    True positives     False positives
Test negative    False negatives    True negatives

LR- = Pr(T-|D+) / Pr(T-|D-) = FNR / TNR

How much less likely a negative test result is in persons with the target condition compared to those without the target condition
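All six measures on the preceding slides come from the same four cells of the 2 × 2 table. A minimal sketch, using hypothetical counts (TP = 90, FN = 10, FP = 30, TN = 170) and an illustrative function name:

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Compute the standard accuracy measures from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)          # P(T+|D+), true positive rate
    specificity = tn / (tn + fp)          # P(T-|D-), true negative rate
    ppv = tp / (tp + fp)                  # P(D+|T+)
    npv = tn / (tn + fn)                  # P(D-|T-)
    lr_positive = sensitivity / (1 - specificity)   # TPR / FPR
    lr_negative = (1 - sensitivity) / specificity   # FNR / TNR
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }

# Hypothetical study: 100 diseased and 200 non-diseased patients.
measures = diagnostic_measures(tp=90, fp=30, fn=10, tn=170)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity and the likelihood ratios are computed within the diseased and non-diseased columns, whereas the predictive values are computed within the test-positive and test-negative rows and therefore shift with disease prevalence.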

Page 52: Choosing appropriate statistical test RSS6 2104

Continuous results: Receiver operating characteristic (ROC) curve

Blood sugar level (2 hours after food), mg/100 ml    Sensitivity (%)    Specificity (%)
 70                                                   98.6                 8.8
 80                                                   97.1                25.5
 90                                                   94.3                47.6
100                                                   88.6                69.8
110                                                   85.7                84.1
120                                                   71.4                92.5
130                                                   64.3                96.9
140                                                   57.1                99.4
150                                                   50.0                99.6
160                                                   47.1                99.8
170                                                   42.9               100
180                                                   38.6               100
190                                                   34.3               100
200                                                   27.1               100

Area under the curve (AUC) can range from 0.5 (random chance, or no predictive ability; refers to the 45 degree line in the ROC plot) to 1 (perfect discrimination/accuracy).

The closer the curve follows the left-hand border and then the top-border of the ROC space, the more accurate the test. The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
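As a rough illustration, the area under the curve can be approximated from the blood sugar table with the trapezoidal rule. This is a sketch, not the method used in the source: it assumes the curve is anchored at (0, 0) and (1, 1) and joins the tabulated cut-offs with straight lines.

```python
# Sensitivity and specificity (%) at cut-offs 70, 80, ..., 200 mg/100 ml, from the table.
sensitivity = [98.6, 97.1, 94.3, 88.6, 85.7, 71.4, 64.3, 57.1, 50.0, 47.1, 42.9, 38.6, 34.3, 27.1]
specificity = [8.8, 25.5, 47.6, 69.8, 84.1, 92.5, 96.9, 99.4, 99.6, 99.8, 100, 100, 100, 100]

# ROC points are (false positive rate, true positive rate) pairs.
points = [(1 - sp / 100, se / 100) for se, sp in zip(sensitivity, specificity)]
points += [(0.0, 0.0), (1.0, 1.0)]   # assumed end points of the curve
points.sort()

# Trapezoidal rule: sum the area of each strip between neighbouring points.
auc = 0.0
for (x0, y0), (x1, y1) in zip(points, points[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2

print(f"Approximate AUC: {auc:.3f}")
```

Each cut-off contributes one point; lowering the cut-off moves along the curve toward higher sensitivity and lower specificity.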

Page 53: Choosing appropriate statistical test RSS6 2104

Systematic Review

Bates et al. Arch Intern Med 2007

Meta-analysis

Page 54: Choosing appropriate statistical test RSS6 2104

Ried K. Aus Fam Phys 2006

Page 55: Choosing appropriate statistical test RSS6 2104

Type of Data

Page 56: Choosing appropriate statistical test RSS6 2104

Continuous Variables

• Descriptive analysis

– Mean and 95% CI

– Median and IQR
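A minimal sketch of both descriptive summaries using Python's standard library. The data values are hypothetical, and the 95% CI uses the normal approximation (mean ± 1.96 × standard error); for a sample this small a t-distribution quantile would normally replace 1.96.

```python
import math
import statistics

# Hypothetical continuous measurements (e.g. systolic blood pressure, mmHg).
values = [118, 122, 125, 130, 131, 135, 138, 142, 150, 165]

n = len(values)
mean_value = statistics.mean(values)
standard_error = statistics.stdev(values) / math.sqrt(n)

# Normal-approximation 95% confidence interval for the mean (an assumption, see above).
ci_low = mean_value - 1.96 * standard_error
ci_high = mean_value + 1.96 * standard_error

median_value = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points

print(f"Mean {mean_value:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
print(f"Median {median_value:.1f} (IQR {q1:.1f} to {q3:.1f})")
```

The mean with its confidence interval suits roughly symmetric data; the median with the interquartile range is less affected by outliers such as the value of 165 here.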

Page 57: Choosing appropriate statistical test RSS6 2104

Continuous Variables

• Comparative analysis

• Two variables

– Student t test

– Paired t test (matched pairs)

– Univariate Linear Regression

• More than two variables

– ANOVA

– Multivariate Linear Regression

Page 58: Choosing appropriate statistical test RSS6 2104

Categorical Variables

• Descriptive analysis

– Proportion and 95% CI

• Comparative analysis

– Chi Square test

– Fisher's exact test

– Logistic Regression
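As an illustration of the chi-square test for a 2 × 2 table, the sketch below applies Pearson's statistic (without continuity correction) to the crude Vitamin E and CHD counts quoted earlier in the deck. The cell order (a, b, c, d) is an assumption read off the crude odds ratio formula, and the p-value uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Crude counts from the Vitamin E and CHD slide (cell layout assumed).
chi2, p = chi_square_2x2(50, 501, 65, 384)
print(f"Chi-square = {chi2:.2f}, p = {p:.4f}")
```

When expected cell counts are small, Fisher's exact test is the usual alternative; logistic regression extends the comparison to several predictors at once.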

Page 59: Choosing appropriate statistical test RSS6 2104

Incidence Risk Vs. Incidence Rate

Hypothetical cohort of 12 initially disease-free subjects followed over a 5-year period from 1990 to 1995.

Incidence risk = 5/12 = 42/100 persons

Incidence rate = 5/25 = 20/100 person-years

Kleinbaum et al. ActivEpi

Page 60: Choosing appropriate statistical test RSS6 2104
Page 61: Choosing appropriate statistical test RSS6 2104
Page 62: Choosing appropriate statistical test RSS6 2104
Page 63: Choosing appropriate statistical test RSS6 2104
Page 64: Choosing appropriate statistical test RSS6 2104

Incidence Rate

Page 65: Choosing appropriate statistical test RSS6 2104

Example

Hypothetical cohort of 12 initially disease-free subjects followed over a 5-year period from 1990 to 1995.

Kleinbaum et al. ActivEpi

Incidence risk = 5/12 = 0.42 (42 per 100 persons)

Incidence rate = 5/25 = 0.2 per person-year
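The two measures share a numerator but use different denominators. A minimal sketch with the slide's totals (5 new cases, 12 subjects, 25 person-years of follow-up); the per-subject follow-up times behind the 25 person-years are not given in the transcript, so only the totals are used:

```python
new_cases = 5
subjects_at_start = 12
person_years_at_risk = 25   # total disease-free follow-up time, as stated on the slide

# Risk: proportion of the initial cohort that develops disease over the period.
incidence_risk = new_cases / subjects_at_start
# Rate: new cases per unit of person-time actually observed.
incidence_rate = new_cases / person_years_at_risk

print(f"Incidence risk: {incidence_risk:.2f} ({incidence_risk * 100:.0f} per 100 persons over 5 years)")
print(f"Incidence rate: {incidence_rate:.2f} per person-year ({incidence_rate * 100:.0f} per 100 person-years)")
```

The risk is a unitless proportion tied to the 5-year period, while the rate carries units of inverse time and accounts for subjects who were followed for less than the full period.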

Page 66: Choosing appropriate statistical test RSS6 2104

Statistical Significance: P-Value “or” 95% Confidence Interval

Page 67: Choosing appropriate statistical test RSS6 2104

Hypothesis Testing (P-value)

• Null hypothesis: no difference.

• P-value < 0.05: reject the null hypothesis (there is a difference).

Page 68: Choosing appropriate statistical test RSS6 2104

Problems with P-values

• Does not measure the magnitude of the difference.

• Depends on the sample size.

– A very small difference can become statistically significant by increasing the sample size.

• Multiple testing will increase the chance of having a positive (significant difference) result due to random error.
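Both points can be illustrated numerically. The sketch below uses a two-sided z test for two proportions with equal group sizes; the proportions, sample sizes and number of tests are hypothetical, and the multiple-testing formula assumes the tests are independent.

```python
import math

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value from a pooled z test comparing two proportions (equal group sizes)."""
    pooled = (p1 + p2) / 2
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / standard_error
    return math.erfc(z / math.sqrt(2))

def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one significant result among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

# The same 1-percentage-point difference, tested with two different sample sizes.
p_small = two_proportion_p_value(0.50, 0.51, n_per_group=100)
p_large = two_proportion_p_value(0.50, 0.51, n_per_group=100_000)
print(f"n = 100 per group:     p = {p_small:.3f}")
print(f"n = 100,000 per group: p = {p_large:.2e}")

print(f"Chance of at least one false positive in 20 tests: {chance_of_false_positive(20):.2f}")
```

The size of the difference is identical in both comparisons; only the sample size changes, which is why a p-value alone says nothing about the magnitude of an effect.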

Page 69: Choosing appropriate statistical test RSS6 2104

Biggest problem!

• We know that the null hypothesis (difference = zero) is not true.

• We just need enough power (sample size) to reject the null hypothesis (and make our study “POSITIVE”).

• Example: 5-year mortality

Group 1 Group 2

0.0021633098649999 0.0021633098649999

Page 70: Choosing appropriate statistical test RSS6 2104

Confidence Interval

[Figure: confidence intervals drawn on an axis running from “Worse” to “Better”, with panels labelled: No difference (equivalent); Inconclusive; Better; No difference; May be better, not worse]