TRANSCRIPT
Overview of Biostatistical Methods
Randomized Clinical Trials (RCT)
Examples: Drug vs. Placebo, Drugs vs. Surgery, New Tx vs. Standard Tx
Let X = cholesterol level (mg/dL).
Experiment ~ GOLD STANDARD ~ Designed to compare two or more treatment groups for a statistically significant difference between them – i.e., beyond random chance – often measured via a “p-value” (e.g., p < .05).
Patients satisfying inclusion criteria → RANDOMIZE → Treatment Arm and Control Arm (RANDOM SAMPLES)
End of Study: T-test, F-test (ANOVA)
[Figure: possible expected distributions of X in the treatment population (mean μ1) and the control population (mean μ2)]
significant? H0: μ1 = μ2
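As a rough illustration of the T-test step, the sketch below computes the pooled 2-sample T statistic for two independent arms. The cholesterol values are invented for illustration and the helper name `pooled_t` is hypothetical; the p-value would then be read from a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t statistic, degrees of freedom) for the pooled 2-sample T-test."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / std_error, df

# Hypothetical cholesterol levels (mg/dL), invented for illustration only.
treatment = [198, 205, 187, 210, 192, 201]
control = [221, 215, 230, 208, 226, 219]

t_stat, df = pooled_t(treatment, control)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A large |t| relative to the t distribution with `df` degrees of freedom corresponds to a small p-value, i.e., evidence against H0: μ1 = μ2.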
Patients satisfying inclusion criteria → Pre-Tx Arm → Post-Tx Arm (PAIRED SAMPLES)
End of Study: Paired T-test, ANOVA F-test (“repeated measures”)
[Figure: distributions of X in the pre-Tx population and the post-Tx population]
significant? H0: μ1 = μ2 (change from baseline, on same patients)
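For the paired design, a minimal sketch of the Paired T statistic works on the within-patient differences. The pre- and post-treatment values are invented and `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for the paired T-test."""
    # Each patient serves as their own control: analyze the differences.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical cholesterol levels (mg/dL) on the same five patients.
pre_tx = [240, 225, 260, 218, 235]
post_tx = [228, 220, 245, 215, 222]

t_stat, df = paired_t(pre_tx, post_tx)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

Because the test uses only the differences, patient-to-patient variability in baseline level drops out of the comparison.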
Let T = survival time (months).
Survival function: S(t) = P(T > t)
[Figure: population survival curves, survival probability (from 1 down to 0) versus T, for S1(t) Treatment and S2(t) Control; the gap between the curves is labeled “AUC difference”]
Kaplan-Meier estimates
significant? H0: S1(t) = S2(t)
End of Study: Log-Rank Test, Cox Proportional Hazards Model
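The Kaplan-Meier estimate of S(t) can be sketched in a few lines. The follow-up times and event flags below are invented, and `kaplan_meier` is a hypothetical helper; an event flag of 1 marks an observed death and 0 marks a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(t, estimated S(t))] at each time where a death occurs."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Hypothetical survival times (months); event = 1 (death) or 0 (censored).
times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their last follow-up time but never trigger a drop in the curve.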
Case-Control studies and Cohort studies
Observational study designs that test for a statistically significant association between a disease D and exposure E to a potential risk (or protective) factor, measured via “odds ratio,” “relative risk,” etc.
Example: Lung cancer (D) / Smoking (E)
Case-Control studies (PRESENT → PAST): start from D+ vs. D–, then ask: E+ vs. E–? Relatively easy and inexpensive, but subject to faulty records, “recall bias.”
Cohort studies (PRESENT → FUTURE): start from E+ vs. E–, then ask: D+ vs. D–? Measures direct effect of E on D, but expensive, extremely lengthy… Example: Framingham, MA study.
Both types of study yield a 2 × 2 “contingency table” for binary variables D and E:
        D+       D–
E+      a        b        a + b
E–      c        d        c + d
        a + c    b + d    n
where a, b, c, d are the observed counts of individuals in each cell (D+ = cases, D– = controls; E– = reference group).
End of Study: Chi-squared Test, McNemar Test (for paired case-control study designs)
H0: No association between D and E.
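As an illustration of the 2 × 2 table, the sketch below computes the odds ratio, the relative risk, and the Pearson Chi-squared statistic (without continuity correction) from the counts a, b, c, d. The counts are invented and `two_by_two_summary` is a hypothetical helper. For 1 degree of freedom the p-value has the closed form erfc(√(χ²/2)), which is what the code uses.

```python
from math import erfc, sqrt

def two_by_two_summary(a, b, c, d):
    """Return (odds ratio, relative risk, chi-squared, p-value) for a 2 x 2 table."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Risk of disease among exposed divided by risk among unexposed.
    relative_risk = (a / (a + b)) / (c / (c + d))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, relative_risk, chi2, p_value

# Hypothetical counts: rows E+, E-; columns D+, D-.
odds_ratio, relative_risk, chi2, p_value = two_by_two_summary(30, 70, 10, 90)
print(f"OR = {odds_ratio:.2f}, RR = {relative_risk:.2f}, "
      f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The relative risk is directly interpretable in a cohort design; in a case-control design, where the numbers of cases and controls are fixed by the investigator, the odds ratio is the usual measure.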
As seen, testing for association between categorical variables – such as disease D and exposure E – can generally be done via a Chi-squared Test.
But what if the two variables – say, X and Y – are numerical measurements?
Furthermore, if sample data does suggest that one exists, what is the nature of that association, and how can it be quantified, or modeled via Y = f(X)?
[Figure: scatterplot of Y versus X] JAMA. 2003;290:1486-1493
Correlation Coefficient r measures the strength of linear association between X and Y, on a scale from –1 (negative linear correlation) through 0 to +1 (positive linear correlation).
For this example, r = –0.387 (weak, negative linear correlation).
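To make the definition of r concrete, here is a minimal sketch of the sample correlation coefficient. The five (x, y) pairs are invented for illustration and are not the data behind the slide's r = –0.387; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient between paired measurements xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired measurements, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [9, 7, 8, 4, 5]
print(f"r = {pearson_r(x, y):.3f}")
```

Values of r near –1 or +1 indicate points lying close to a straight line; values near 0 indicate little linear association, although a nonlinear relationship may still be present.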
Regression Methods
Simple Linear Regression gives the “best” line that fits the data.
Want the unique line that minimizes the sum of the squared residuals.
[Figure: scatterplot with the fitted line; the residuals are the vertical deviations of the points from the line]
Y = 8.790 – 4.733 X (p = .0055)
Simple Linear Regression gives the “least squares” regression line.
It can also be shown that the proportion of total variability in the data that is accounted for by the line is equal to r², which in this case is (–0.387)² ≈ 0.15 (15%)... very small.
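The least squares line and the r² property can be checked numerically. The sketch below uses invented data (not the slide's example) and a hypothetical helper `least_squares_line`; it fits the line, then compares the proportion of variability explained, 1 – SSE/SST, with the squared correlation coefficient.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

# Hypothetical paired measurements, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [9, 7, 8, 4, 5]

intercept, slope = least_squares_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

mean_y = sum(y) / len(y)
sse = sum(e ** 2 for e in residuals)        # variability left unexplained
sst = sum((yi - mean_y) ** 2 for yi in y)   # total variability in Y
r_squared = 1 - sse / sst

print(f"Y = {intercept:.3f} + ({slope:.3f}) X, r^2 = {r_squared:.3f}")
```

For these invented data the line explains about 70% of the variability, in contrast to the roughly 15% in the slide's example.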
Extensions of Simple Linear Regression
• Polynomial Regression – predictors X, X², X³,…
• Multilinear Regression – independent predictors X1, X2,…, w/o or w/ interaction (e.g., X5 X8)
• Logistic Regression – binary response Y (= 0 or 1)
• Transformations of data, e.g., semi-log, log-log,…
• Generalized Linear Models
• Nonlinear Models
• many more…
Summary ~ Numerical (Quantitative) data, e.g., $ Annual Income
2 POPULATIONS: H0: μ1 = μ2
[Figure: distributions of X in two populations, with means μ1, μ2 and standard deviations σ1, σ2]
Sample 1: n1, x̄1, s1
Sample 2: n2, x̄2, s2
Independent samples, e.g., RCT
• Normally distributed? (check via Q-Q plots, Shapiro-Wilk, Anderson-Darling, others…)
• If yes: Equivariance? (check via F-test, Bartlett, others…)
    • Yes: 2-sample T (w/ pooling); for > 2 populations: ANOVA F-test, Regression Methods
    • No: 2-sample T (w/o pooling), the “Approximate” T (Satterthwaite, Welch); for > 2 populations: various modifications
• If no: n1, n2 ≥ 30?
    • Yes: proceed as in the normally distributed case
    • No: “Nonparametric Tests”: Wilcoxon Rank Sum (aka Mann-Whitney U); for > 2 populations: Kruskal-Wallis
Paired (Matched) samples, e.g., Pre- vs. Post-
• Normally distributed?
    • Yes: Paired T; for > 2 populations: ANOVA F-test (w/ “repeated measures” or “blocking”)
    • No: “Nonparametric Tests”: Sign Test, Wilcoxon Signed Rank; for > 2 populations: Friedman, Kendall’s W, others…
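The two-population part of this flowchart can be written as a small lookup function. This is only a sketch of one reading of the chart: the function name and argument names are hypothetical, and treating large samples (n1, n2 ≥ 30) like the normally distributed case is an assumption about where that branch leads.

```python
def choose_two_sample_test(paired, normal, equal_variance=True, large_samples=False):
    """Suggest a test for comparing 2 population means (one reading of the flowchart)."""
    if paired:
        # Paired (matched) samples, e.g., pre- vs. post-treatment.
        return "Paired T" if normal else "Sign Test or Wilcoxon Signed Rank"
    if normal or large_samples:
        # Assumption: large samples are handled like the normal case.
        if equal_variance:
            return "2-sample T (w/ pooling)"
        return "2-sample T (w/o pooling)"
    return "Wilcoxon Rank Sum (Mann-Whitney U)"

print(choose_two_sample_test(paired=False, normal=True, equal_variance=False))
print(choose_two_sample_test(paired=False, normal=False, large_samples=False))
print(choose_two_sample_test(paired=True, normal=True))
```

In practice the yes/no inputs would themselves come from diagnostic checks such as Q-Q plots or a Shapiro-Wilk test, so the function only encodes the final branching.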
Summary ~ Categorical (Qualitative) data, e.g., Income Level: Low, Mid, High
≥ 2 CATEGORIES per each of two variables:
r × c contingency table, with rows I = 1, 2, 3, …, r and columns J = 1, 2, 3, …, c
H0: “There is no association between (the categories of) I and (the categories of) J.”
Chi-squared Tests
• Test of Independence (1 population, 2 categorical variables)
• Test of Homogeneity (2 populations, 1 categorical variable)
• “Goodness-of-Fit” Test (1 population, 1 categorical variable)
Modifications
• McNemar Test for paired 2 × 2 categorical data, to control for “confounding variables,” e.g., case-control studies
• Fisher’s Exact Test for small “expected values” (< 5), to avoid possible “spurious significance”
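For small tables, Fisher's Exact Test can be sketched directly from the hypergeometric distribution. The counts below are invented (every expected value equals 2, so the Chi-squared approximation would be doubtful), and `fisher_exact_two_sided` is a hypothetical helper. It uses one common two-sided convention: sum the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so that ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts, invented for illustration only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

Because the p-value is computed exactly rather than from a large-sample approximation, it remains valid when expected counts fall below 5.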
Introduction to Basic Statistical Methods
Part 1: Statistics in a Nutshell
UWHC Scholarly Forum, May 21, 2014
Ismor Fischer, Ph.D., UW Dept of [email protected]
Part 2: Overview of Biostatistics: “Which Test Do I Use??”
Sincere thanks to…
• Judith Payne
• Heidi Miller
• Samantha Goodrich
• Troy Lawrence
• YOU!
All slides posted at http://www.stat.wisc.edu/~ifischer/Intro_Stat/UWHC