biostat6651_lecture3_rev1
TRANSCRIPT
-
7/27/2019 Biostat6651_Lecture3_rev1
1/26
Categorical Data Analysis
Biostat 6651 Lecture 3
Fall 2013 02 Oct 2013
Dr. Lynn Eudey
-
Contingency Tables
Joint, marginal, and conditional probabilities
Independence
Differences in Proportions
Odds, Relative Risk, Odds Ratio
Sensitivity and Specificity
Types of Studies
-
Joint, marginal, and conditional probabilities
Given a table of joint probabilities (or the population proportions of the cells), the row and column totals are the respective marginal probabilities.
          Column 1     Column 2     Column 3     Marginal
Row 1     $\pi_{11}$   $\pi_{12}$   $\pi_{13}$   $\pi_{1+}$
Row 2     $\pi_{21}$   $\pi_{22}$   $\pi_{23}$   $\pi_{2+}$
Row 3     $\pi_{31}$   $\pi_{32}$   $\pi_{33}$   $\pi_{3+}$
Margin    $\pi_{+1}$   $\pi_{+2}$   $\pi_{+3}$   1.00
-
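Joint, marginal, and conditional probabilities
Conditional probabilities are found by dividing the appropriate joint probability by the appropriate marginal probability.
$P(\text{Col } j \mid \text{Row } i) = \pi_{ij} / \pi_{i+}$ and $P(\text{Row } i \mid \text{Col } j) = \pi_{ij} / \pi_{+j}$
          Column 1     Column 2     Column 3     Marginal
Row 1     $\pi_{11}$   $\pi_{12}$   $\pi_{13}$   $\pi_{1+}$
Row 2     $\pi_{21}$   $\pi_{22}$   $\pi_{23}$   $\pi_{2+}$
Row 3     $\pi_{31}$   $\pi_{32}$   $\pi_{33}$   $\pi_{3+}$
Margin    $\pi_{+1}$   $\pi_{+2}$   $\pi_{+3}$   1.00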
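The marginal and conditional calculations on this slide can be sketched in a few lines of Python. The 3 x 3 table of joint probabilities below is made up for illustration, and the function names are hypothetical, not from the lecture.

```python
# Hypothetical 3 x 3 table of joint probabilities (illustrative values only).
joint = [
    [0.10, 0.05, 0.05],
    [0.20, 0.10, 0.10],
    [0.15, 0.15, 0.10],
]

def marginals(joint):
    """Return (row_margins, col_margins) for a table of joint probabilities."""
    row_margins = [sum(row) for row in joint]
    col_margins = [sum(col) for col in zip(*joint)]
    return row_margins, col_margins

def conditional_on_rows(joint):
    """P(Col j | Row i) = joint / row margin; each row of the result sums to 1."""
    row_margins, _ = marginals(joint)
    return [[p / r for p in row] for row, r in zip(joint, row_margins)]

def conditional_on_cols(joint):
    """P(Row i | Col j) = joint / column margin; each column of the result sums to 1."""
    _, col_margins = marginals(joint)
    return [[p / c for p, c in zip(row, col_margins)] for row in joint]

row_margins, col_margins = marginals(joint)
within_rows = conditional_on_rows(joint)
within_cols = conditional_on_cols(joint)
```

Dividing by the row margin gives the within-row distribution, and dividing by the column margin gives the within-column distribution; the two are generally different tables.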
-
Independence
The two categorical variables are independent if and only if
P(col j | row i) = P(col j) for each and every row, and
P(row i | col j) = P(row i) for each and every column
Equivalently, $\pi_{ij} = \pi_{i+} \pi_{+j}$ $\forall$ (i, j) pair (ij cell)
Notation: $\forall$ means "for each and every"; $\equiv$ means "is defined as"
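As a small illustration of the product rule for independence, the sketch below checks every cell of a joint probability table against the product of its row and column margins. The tables and the tolerance are hypothetical choices made for this example.

```python
def is_independent(joint, tol=1e-9):
    """True if every joint probability equals (row margin) * (column margin)."""
    row_m = [sum(row) for row in joint]
    col_m = [sum(col) for col in zip(*joint)]
    return all(
        abs(joint[i][j] - row_m[i] * col_m[j]) <= tol
        for i in range(len(row_m))
        for j in range(len(col_m))
    )

# Built as an outer product of margins, so independence holds by construction.
rows, cols = [0.3, 0.7], [0.2, 0.5, 0.3]
independent_table = [[r * c for c in cols] for r in rows]

# Hypothetical table in which the two variables are associated.
dependent_table = [[0.4, 0.1], [0.1, 0.4]]
```

This checks population probabilities only; with sample counts, the question becomes one of testing, which the following slides take up.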
-
Tests for Independence
Large sample and cells populated
Chi-square test (see Lecture 2)
Low cell frequency or sparse table
Chi-square test does not apply (the large-sample approximation is unreliable)
If there are only a few zeros, can add 0.5 to each and every cell frequency and proceed
If there are many levels to each factor (or variable), we can combine levels to increase cell frequencies
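The 0.5 adjustment can be sketched as follows, assuming the chi-square test from Lecture 2 is the Pearson statistic (the slide does not say which form was used). The function name and the `add_half` flag are hypothetical.

```python
def pearson_chi_square(counts, add_half=False):
    """Pearson X^2 and degrees of freedom for a two-way table of counts.

    If add_half is True, 0.5 is added to each and every cell first.
    Assumes no row or column total is zero after any adjustment.
    """
    table = [[c + 0.5 if add_half else float(c) for c in row] for row in counts]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            x2 += (obs - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, df

# Hypothetical counts: a well-populated table and a table with one zero cell.
x2_full, df_full = pearson_chi_square([[10, 20], [20, 10]])
x2_adj, df_adj = pearson_chi_square([[0, 5], [5, 5]], add_half=True)
```

The statistic is referred to a chi-square distribution with the returned degrees of freedom; the sketch stops short of a p-value because the standard library has no chi-square distribution function.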
-
Tests for Independence
Small sample 2 by 2 contingency table
Fisher's exact test
Uses the hypergeometric distribution of the cell counts given fixed marginal totals
Gives an exact p-value for the given margins
Can implement in SAS, R, StatXact
-
Tests for Independence
Small sample 2 by 2 contingency table
Fisher's exact test
Only 1 degree of freedom (once you know $n_{11}$, all other cell counts are determined from the marginal totals)
p-value = P(a result as extreme or more extreme towards $H_A$ | marginal totals)
-
Tests for Independence
Small sample 2 by 2 contingency table
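Fisher's exact test
$P(X_{11} = n_{11} \mid \text{margins}) = \dfrac{\binom{n_{1+}}{n_{11}} \binom{n_{2+}}{n_{+1} - n_{11}}}{\binom{n}{n_{+1}}}$
Figure out what is more extreme with respect to $H_A$, either $> n_{11}$ or $< n_{11}$, and find the corresponding probabilities
Add these probabilities to the probability of the observed table to get the p-value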
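A minimal sketch of the one-sided calculation, using only the standard library. The function names, the `alternative` argument, and the example counts are hypothetical; SAS, R, and StatXact provide production implementations.

```python
from math import comb

def hypergeom_pmf(n11, row1, row2, col1):
    """P(X11 = n11 | margins) for a 2 x 2 table with fixed margins."""
    n = row1 + row2
    return comb(row1, n11) * comb(row2, col1 - n11) / comb(n, col1)

def fisher_one_sided(table, alternative="greater"):
    """One-sided exact p-value: sum of probabilities of tables as extreme
    or more extreme than the observed one, in the direction of H_A."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    lo = max(0, col1 - row2)  # smallest n11 compatible with the margins
    hi = min(row1, col1)      # largest n11 compatible with the margins
    if alternative == "greater":
        values = range(a, hi + 1)
    else:
        values = range(lo, a + 1)
    return sum(hypergeom_pmf(k, row1, row2, col1) for k in values)

# Illustrative counts with every margin equal to 4.
observed = [[3, 1], [1, 3]]
p_greater = fisher_one_sided(observed, "greater")  # (16 + 1) / 70
p_less = fisher_one_sided(observed, "less")
```

Because the observed table is included in both tails, the two one-sided p-values sum to more than 1.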
-
Look at 2 x 2 Tables
Table of joint probabilities and marginal probabilities
         Success       Failure
Row 1    $\pi_{11}$    $\pi_{12}$    $\pi_{1+}$
Row 2    $\pi_{21}$    $\pi_{22}$    $\pi_{2+}$
         $\pi_{+1}$    $\pi_{+2}$    1.00
Table of within-row probabilities (conditional probabilities)
         Success       Failure
Row 1    $\pi_{S|1}$   $\pi_{F|1}$   1.00
Row 2    $\pi_{S|2}$   $\pi_{F|2}$   1.00
The column totals make no sense
Often the rows are risk categories, for example Seat Belt versus No Seat Belt
-
Comparing Probabilities
Differences
Relative Risks
Odds Ratios
-
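Difference between conditional probabilities of success
Estimation of $\pi_{S|1} - \pi_{S|2}$
Use MLEs for point estimates: $\hat{\pi}_{S|i} = n_{i1} / n_{i+}$
$SE(\hat{\pi}_{S|1} - \hat{\pi}_{S|2})$: see chalkboard
For large samples, $\hat{\pi}_{S|1} - \hat{\pi}_{S|2}$ has an approximate Normal distribution
Create Wald-type confidence intervals
$H_0$: $\pi_{S|1} = \pi_{S|2}$; then use $Z = (\hat{\pi}_{S|1} - \hat{\pi}_{S|2}) / SE(\hat{\pi}_{S|1} - \hat{\pi}_{S|2})$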
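The slide leaves the standard error to the chalkboard, so the sketch below is an assumption about what was shown: it uses the usual textbook unpooled standard error for the Wald interval and the pooled standard error under $H_0$ for the Z statistic. The function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions(n11, n1, n21, n2, level=0.95):
    """Wald CI for P(S|1) - P(S|2) and a Z statistic for H0: equal proportions.

    n11, n21 are success counts; n1, n2 are the row totals.
    """
    p1, p2 = n11 / n1, n21 / n2
    diff = p1 - p2
    # Assumed (standard) unpooled SE, used for the confidence interval.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Assumed pooled SE under H0, used for the test statistic.
    pooled = (n11 + n21) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se0
    return diff, ci, z

# Hypothetical data: 30 of 100 successes in row 1, 20 of 100 in row 2.
diff, ci, z = diff_of_proportions(30, 100, 20, 100)
```

Both standard errors rely on the large-sample Normal approximation named on the slide, so neither is appropriate for sparse tables.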
-
Comparing Probabilities
Differences
Relative Risks
Odds Ratios
-
Relative Risks
Relative Risk for a 2 by 2 Table
RR(Success) = $P(S \mid 1) / P(S \mid 2)$
Gives a multiplicative comparison of the (conditional) probabilities of success
Point Estimate = $\dfrac{n_{11} / n_{1+}}{n_{21} / n_{2+}}$
Distribution is very skewed; take natural log
See Problem 2.15
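A sketch of the point estimate and a confidence interval built on the log scale. The standard error of the log relative risk is not given on the slide; the formula below is the usual large-sample one and is included as an assumption. Names and counts are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk(n11, n12, n21, n22, level=0.95):
    """Sample relative risk of success (row 1 vs row 2) with a log-scale CI."""
    n1, n2 = n11 + n12, n21 + n22
    rr = (n11 / n1) / (n21 / n2)
    # Assumed large-sample SE of ln(RR); requires n11 > 0 and n21 > 0.
    se_log = sqrt(1 / n11 - 1 / n1 + 1 / n21 - 1 / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(rr) - z_crit * se_log)
    upper = exp(log(rr) + z_crit * se_log)
    return rr, (lower, upper)

# Hypothetical counts: 30/100 successes in row 1, 20/100 in row 2.
rr, rr_ci = relative_risk(30, 70, 20, 80)
```

The interval is symmetric around ln(RR) on the log scale and therefore asymmetric around RR itself, which reflects the skewness noted on the slide.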
-
Comparing Probabilities
Differences
Relative Risks
Odds Ratios
-
Odds and Odds Ratios
Univariate binary variable
(Success or Failure)
Odds of Success = $\dfrac{P(S)}{P(F)} = \dfrac{P(S)}{1 - P(S)}$
-
Odds and Odds Ratios
In a two by two table
Odds Row 1 = $P(S \mid 1) / P(F \mid 1)$
Odds Row 2 = $P(S \mid 2) / P(F \mid 2)$
Odds Ratio = $\theta$ = Odds Row 1 / Odds Row 2
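The definitions on this slide translate directly into code. The sketch below computes the odds ratio two ways, from the within-row success probabilities and from cell counts via the cross-product ratio; function names and counts are hypothetical.

```python
def odds(p):
    """Odds of success, P(S) / (1 - P(S)), for 0 <= p < 1."""
    return p / (1 - p)

def odds_ratio_from_conditionals(p_s_row1, p_s_row2):
    """Odds in row 1 divided by odds in row 2."""
    return odds(p_s_row1) / odds(p_s_row2)

def odds_ratio_from_counts(n11, n12, n21, n22):
    """Sample odds ratio as the cross-product ratio n11*n22 / (n12*n21)."""
    return (n11 * n22) / (n12 * n21)

# Hypothetical counts: row 1 = (30, 70), row 2 = (20, 80).
theta_counts = odds_ratio_from_counts(30, 70, 20, 80)
theta_probs = odds_ratio_from_conditionals(30 / 100, 20 / 100)

# Transposing the table (rows become columns) leaves the value unchanged;
# swapping the two rows gives the reciprocal.
theta_transposed = odds_ratio_from_counts(30, 20, 70, 80)
theta_rows_swapped = odds_ratio_from_counts(20, 80, 30, 70)
```

The two routes agree because the row totals cancel when the within-row odds are formed.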
-
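Properties of $\theta$
$0 \le \theta < \infty$
$\theta = \dfrac{\pi_{11} \pi_{22}}{\pi_{12} \pi_{21}}$
$\theta = 1$ if Success/Failure is independent of row categorization
$\theta$ has the same value if rows and columns are switched (the table is transposed); reversing the order of the rows or of the columns gives $1/\theta$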
-
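Inference for $\theta$
$\hat{\theta} = \dfrac{n_{11} n_{22}}{n_{12} n_{21}}$
Distribution of $\hat{\theta}$ is very skewed; half the values are between 0 and 1, half the values are > 1
Take natural log: $\ln(\hat{\theta})$
For large samples, $\ln(\hat{\theta}) \sim N(\ln(\theta), \mathrm{Var}(\ln(\hat{\theta})))$
$SE(\ln(\hat{\theta})) = \sqrt{\dfrac{1}{n_{11}} + \dfrac{1}{n_{12}} + \dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}}$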
-
Inference for $\theta$
$\ln(\theta) = 0$ when the variables are independent
Wald-type confidence intervals for $\ln(\theta)$
Then exponentiate the endpoints to get a confidence interval for $\theta$
Use Z-score for testing independence
$Z = \dfrac{\ln(\hat{\theta}) - 0}{SE(\ln(\hat{\theta}))}$
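The inference steps on these two slides can be sketched as follows, using only the standard library. The function name, the two-sided p-value, and the counts are assumptions made for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def log_odds_ratio_inference(n11, n12, n21, n22, level=0.95):
    """Sample odds ratio, Wald CI (built on the log scale), Z, and p-value."""
    theta_hat = (n11 * n22) / (n12 * n21)
    se = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # SE of ln(theta_hat)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = log(theta_hat) - z_crit * se
    upper = log(theta_hat) + z_crit * se
    z = (log(theta_hat) - 0) / se                      # H0: ln(theta) = 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided (an assumption)
    # Exponentiate the endpoints to return to the odds-ratio scale.
    return theta_hat, (exp(lower), exp(upper)), z, p_value

# Hypothetical counts: row 1 = (30, 70), row 2 = (20, 80).
theta_hat, theta_ci, z_stat, p_val = log_odds_ratio_inference(30, 70, 20, 80)
```

A confidence interval for the odds ratio that contains 1 corresponds to a log-scale interval that contains 0, so the interval and the Z test tell the same story. The standard error is undefined when any cell count is zero.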
-
Sensitivity and Specificity
                    Disease is Present   No Disease
Test is Positive    P(D & +)             P(ND & +)    P(+)
Test is Negative    P(D & -)             P(ND & -)    P(-)
                    P(D)                 P(ND)
Sensitivity = P(+ | D)
Specificity = P(- | ND)
What do patients want to know?
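A sketch of the table's conditional probabilities. Sensitivity and specificity condition on the columns (disease status). The slide's closing question is read here, as an assumption, as pointing to the row-conditional probability P(D | +). The joint probabilities are hypothetical: 1% prevalence with 90% sensitivity and 90% specificity.

```python
def diagnostic_summary(joint):
    """joint = [[P(D & +), P(ND & +)], [P(D & -), P(ND & -)]]."""
    (d_pos, nd_pos), (d_neg, nd_neg) = joint
    p_d, p_nd = d_pos + d_neg, nd_pos + nd_neg  # column margins P(D), P(ND)
    p_pos = d_pos + nd_pos                      # row margin P(+)
    return {
        "sensitivity": d_pos / p_d,                # P(+ | D)
        "specificity": nd_neg / p_nd,              # P(- | ND)
        "p_disease_given_positive": d_pos / p_pos, # P(D | +)
    }

# Hypothetical joint probabilities (illustrative values only).
joint = [[0.009, 0.099], [0.001, 0.891]]
summary = diagnostic_summary(joint)
```

With these made-up numbers, a test that is 90% sensitive and 90% specific still leaves P(D | +) at about 8%, because the disease is rare.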
-
Types of studies
Observational Studies: both the independent and dependent variable are observed; can study association only
Experiments: the independent variable is assigned by the investigator
Optimally this is a random assignment to minimize bias
Part of an argument for causation
-
Observational Studies
Cross-sectional
One point in time; cross-section of the population
Neither set of marginal totals is fixed
Cohort studies
Follow risk groups through time to observe whether or not
indication of interest occurs
Prospective study
Risk categories often have fixed totals (rows)
Case-control studies
Match controls to cases
Retrospective study
The disease categories have fixed totals (columns)
-
Relative Risk, Odds Ratios and Types of Studies
Relative Risk can be estimated for
Cross-sectional
Cohort studies
Experiments
Odds Ratios can be estimated for
All of the above
Case-Control studies
-
HW 2 due Oct 9th
Chapter 2
2.1, 2.2, 2.6, 2.7, 2.12, 2.15, 2.16, 2.30
Read the rest of chapter 2