biostat6651_lecture3_rev1

7/27/2019 Biostat6651_Lecture3_rev1

1/26

Categorical Data Analysis

Biostat 6651 Lecture 3

Fall 2013 02 Oct 2013

Dr. Lynn Eudey


2/26

Contingency Tables

Joint, marginal, and conditional probabilities

Independence

Differences in Proportions

Odds, Relative Risk, Odds Ratio

Sensitivity and Specificity

Types of Studies


3/26

Joint, marginal, and conditional probabilities

Given a table of joint probabilities (or the population proportions of the cells), the row and column totals are the respective marginal probabilities.

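         Column 1     Column 2     Column 3     Marginal
Row 1    $\pi_{11}$   $\pi_{12}$   $\pi_{13}$   $\pi_{1+}$
Row 2    $\pi_{21}$   $\pi_{22}$   $\pi_{23}$   $\pi_{2+}$
Row 3    $\pi_{31}$   $\pi_{32}$   $\pi_{33}$   $\pi_{3+}$
Margin   $\pi_{+1}$   $\pi_{+2}$   $\pi_{+3}$   1.00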


4/26

Joint, marginal, and conditional probabilities

Conditional probabilities are found by dividing the appropriate joint probability by the appropriate marginal probability:

$$P(\text{Col } j \mid \text{Row } i) = \frac{\pi_{ij}}{\pi_{i+}} \quad\text{and}\quad P(\text{Row } i \mid \text{Col } j) = \frac{\pi_{ij}}{\pi_{+j}}$$

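         Column 1     Column 2     Column 3     Marginal
Row 1    $\pi_{11}$   $\pi_{12}$   $\pi_{13}$   $\pi_{1+}$
Row 2    $\pi_{21}$   $\pi_{22}$   $\pi_{23}$   $\pi_{2+}$
Row 3    $\pi_{31}$   $\pi_{32}$   $\pi_{33}$   $\pi_{3+}$
Margin   $\pi_{+1}$   $\pi_{+2}$   $\pi_{+3}$   1.00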
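As a minimal sketch (not part of the original slides), the marginal and conditional probabilities can be computed directly from a table of joint probabilities. The function names and the 3 by 3 example table are hypothetical; `Fraction` is used only so that the row and column sums come out exact.

```python
from fractions import Fraction as F

def margins(joint):
    """Return (row margins pi_{i+}, column margins pi_{+j}) of a joint table."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return row, col

def col_given_row(joint):
    """P(Col j | Row i) = pi_ij / pi_{i+}; each row of the result sums to 1."""
    row, _ = margins(joint)
    return [[p / row[i] for p in r] for i, r in enumerate(joint)]

def row_given_col(joint):
    """P(Row i | Col j) = pi_ij / pi_{+j}; each column of the result sums to 1."""
    _, col = margins(joint)
    return [[p / col[j] for j, p in enumerate(r)] for r in joint]

# Hypothetical joint probabilities (they sum to 1.00).
joint = [[F(1, 10), F(1, 10), F(1, 10)],
         [F(1, 10), F(2, 10), F(1, 10)],
         [F(1, 20), F(1, 20), F(2, 10)]]

row, col = margins(joint)
print([str(p) for p in row], [str(p) for p in col])
# ['3/10', '2/5', '3/10'] ['1/4', '7/20', '2/5']
```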


5/26

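Independence

The two categorical variables are independent if and only if

$P(\text{Col } j \mid \text{Row } i) = P(\text{Col } j)$ for each and every row $i$ (and every column $j$), and

$P(\text{Row } i \mid \text{Col } j) = P(\text{Row } i)$ for each and every column $j$ (and every row $i$)

In terms of the cell and marginal probabilities:

$$\pi_{ij} = \pi_{i+}\,\pi_{+j} \quad \forall\,(i,j) \text{ pair } (ij \text{ cell})$$

Notation: $\forall$ means "for each and every"; $\equiv$ means "is defined as"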
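A minimal sketch of the cell-by-cell check $\pi_{ij} = \pi_{i+}\pi_{+j}$. The function name `is_independent`, its `tol` parameter, and the example tables are hypothetical; with exact fractions no tolerance is needed, while floating-point tables need a small one.

```python
from fractions import Fraction as F

def is_independent(joint, tol=0):
    """Check pi_ij == pi_{i+} * pi_{+j} for every (i, j) cell.

    With exact Fractions tol=0 works; with floats pass a small tol."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return all(abs(joint[i][j] - row[i] * col[j]) <= tol
               for i in range(len(row)) for j in range(len(col)))

# Hypothetical tables: one built as the product of its margins,
# one with all the probability on the diagonal.
indep = [[r * c for c in (F(1, 5), F(3, 10), F(1, 2))] for r in (F(1, 2), F(1, 2))]
diag = [[F(1, 2), F(0)], [F(0), F(1, 2)]]
print(is_independent(indep), is_independent(diag))  # True False
```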


6/26

Tests for Independence

Large sample and well-populated cells:

Chi-square test (see Lecture 2)

Low cell frequencies or a sparse table:

The large-sample chi-square approximation does not apply

If there are only a few zeros, we can add 0.5 to each and every cell frequency and proceed

If each factor (or variable) has many levels, we can combine levels to increase the cell frequencies
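A minimal sketch of the Pearson statistic from Lecture 2, with an optional step that adds 0.5 to every cell as described above. The name `pearson_chi2` and the `add_half` flag are hypothetical. The function returns $X^2$, its degrees of freedom $(I-1)(J-1)$, and the expected counts $n_{i+}n_{+j}/n$, so small expected counts are easy to spot. The p-value needs a chi-square CDF (for example from SciPy) and is left out to keep the sketch dependency-free.

```python
def pearson_chi2(counts, add_half=False):
    """Pearson X^2 for an I x J table of counts.

    If add_half is True, 0.5 is added to every cell before computing."""
    table = [[c + 0.5 if add_half else c for c in row] for row in counts]
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    x2, expected = 0.0, []
    for i, row in enumerate(table):
        exp_row = []
        for j, obs in enumerate(row):
            e = row_tot[i] * col_tot[j] / n      # expected count under independence
            exp_row.append(e)
            x2 += (obs - e) ** 2 / e
        expected.append(exp_row)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, df, expected

x2, df, expected = pearson_chi2([[10, 20], [20, 10]])   # hypothetical counts
print(round(x2, 4), df)  # 6.6667 1
```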


7/26


Small-sample 2 by 2 contingency table

Fisher's exact test

Uses the hypergeometric distribution of the cell counts given fixed marginal totals

Gives an exact p-value for the given margins

Can be implemented in SAS, R, or StatXact


8/26



Fisher's exact test

Only 1 degree of freedom: once you know $n_{11}$, all other cell counts are determined from the marginal totals

p-value = P(a result as extreme as, or more extreme than, the one observed, in the direction of $H_A$ | marginal totals)


9/26



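Fisher's exact test

$$P(X_{11} = n_{11} \mid \text{margins}) = \frac{\binom{n_{1+}}{n_{11}}\binom{n_{2+}}{n_{+1}-n_{11}}}{\binom{n_{++}}{n_{+1}}}$$

Determine which outcomes are more extreme with respect to $H_A$, either $X_{11} \ge n_{11}$ or $X_{11} \le n_{11}$, and find the corresponding probabilities

Add these probabilities to get the p-value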
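A minimal one-sided version of the recipe above, using Python's `math.comb` (Python 3.8+). The names `hypergeom_pmf` and `fisher_one_sided` and the `alternative` argument are hypothetical. The feasible range is $\max(0, n_{+1}-n_{2+}) \le X_{11} \le \min(n_{1+}, n_{+1})$. A two-sided p-value needs a rule for "more extreme" in both tails, which the slide does not give, so only one-sided tails are shown. If SciPy is installed, `scipy.stats.fisher_exact` can serve as a cross-check.

```python
from math import comb

def hypergeom_pmf(k, n1p, n2p, np1):
    """P(X11 = k | margins) = C(n1+, k) C(n2+, n+1 - k) / C(n++, n+1)."""
    return comb(n1p, k) * comb(n2p, np1 - k) / comb(n1p + n2p, np1)

def fisher_one_sided(table, alternative="greater"):
    """One-sided Fisher exact p-value for [[n11, n12], [n21, n22]].

    "greater": sum of P(X11 = k) for k >= n11; "less": for k <= n11."""
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    (n11, n12), (n21, n22) = table
    n1p, n2p, np1 = n11 + n12, n21 + n22, n11 + n21
    lo, hi = max(0, np1 - n2p), min(n1p, np1)   # feasible values of X11
    ks = range(n11, hi + 1) if alternative == "greater" else range(lo, n11 + 1)
    return sum(hypergeom_pmf(k, n1p, n2p, np1) for k in ks)

# Hypothetical 2 by 2 table with every margin equal to 4.
t = [[3, 1], [1, 3]]
print(round(fisher_one_sided(t, "greater"), 4))  # 0.2429
```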


10/26

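Look at 2 x 2 Tables

Table of joint probabilities and marginal probabilities:

         Success      Failure
Row 1    $\pi_{11}$   $\pi_{12}$   $\pi_{1+}$
Row 2    $\pi_{21}$   $\pi_{22}$   $\pi_{2+}$
         $\pi_{+1}$   $\pi_{+2}$   1.00

Table of within-row probabilities (conditional probabilities), where $\pi_{S|i} = \pi_{i1}/\pi_{i+}$ and $\pi_{F|i} = \pi_{i2}/\pi_{i+}$:

         Success       Failure
Row 1    $\pi_{S|1}$   $\pi_{F|1}$   1.00
Row 2    $\pi_{S|2}$   $\pi_{F|2}$   1.00

Each row sums to 1, so column totals of this table are not meaningful.

Often the rows are risk categories, for example Seat Belt versus No Seat Belt.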


11/26

Comparing Probabilities

Differences

Relative Risks

Odds Ratios


12/26

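Difference between conditional probabilities of success

Estimation of $\pi_{S|1} - \pi_{S|2}$

Use MLEs for point estimates: $\hat\pi_{S|i} = n_{i1}/n_{i+}$

$SE(\hat\pi_{S|1} - \hat\pi_{S|2})$: see chalkboard

For large samples, $\hat\pi_{S|1} - \hat\pi_{S|2}$ has an approximate normal distribution

Create Wald-type confidence intervals

For $H_0: \pi_{S|1} = \pi_{S|2}$, use

$$Z = \frac{\hat\pi_{S|1} - \hat\pi_{S|2}}{SE(\hat\pi_{S|1} - \hat\pi_{S|2})}$$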
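A minimal sketch of the Wald interval and Z statistic for $\pi_{S|1} - \pi_{S|2}$. Because the slide leaves the standard error to the chalkboard, the code assumes the usual unpooled large-sample form $\sqrt{\hat\pi_{S|1}(1-\hat\pi_{S|1})/n_{1+} + \hat\pi_{S|2}(1-\hat\pi_{S|2})/n_{2+}}$; for testing $H_0$ some texts use a pooled estimate instead. The function name and counts are hypothetical, and `statistics.NormalDist` supplies the normal quantile.

```python
from math import sqrt
from statistics import NormalDist

def diff_props_wald(table, conf=0.95):
    """Wald CI and Z for pi_{S|1} - pi_{S|2}; table = [[n11, n12], [n21, n22]],
    rows = groups, columns = (Success, Failure)."""
    (n11, n12), (n21, n22) = table
    n1, n2 = n11 + n12, n21 + n22
    p1, p2 = n11 / n1, n21 / n2                          # MLEs of pi_{S|1}, pi_{S|2}
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # assumed unpooled SE
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)        # about 1.96 for 95%
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, ci, diff / se

diff, se, ci, z = diff_props_wald([[30, 70], [20, 80]])  # hypothetical counts
print(round(diff, 3), round(se, 4), round(z, 2))  # 0.1 0.0608 1.64
```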


13/26


Differences

Relative Risks

Odds Ratios


14/26

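Relative Risks

Relative Risk for a 2 by 2 Table

$$RR(\text{Success}) = \frac{\pi_{S|1}}{\pi_{S|2}}$$

Gives a multiplicative comparison of the (conditional) probabilities of success

Point estimate:

$$\widehat{RR} = \frac{\hat\pi_{S|1}}{\hat\pi_{S|2}} = \frac{n_{11}/n_{1+}}{n_{21}/n_{2+}}$$

The sampling distribution of $\widehat{RR}$ is very skewed; take the natural log

See Problem 2.15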
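A minimal sketch of a log-scale interval for the relative risk. The slide defers details to Problem 2.15; the code assumes the common large-sample (delta-method) standard error $SE(\ln\widehat{RR}) = \sqrt{1/n_{11} - 1/n_{1+} + 1/n_{21} - 1/n_{2+}}$. The function name and counts are hypothetical, and $n_{11}$ and $n_{21}$ must be positive.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(table, conf=0.95):
    """RR-hat = (n11/n1+) / (n21/n2+) with a Wald CI built on the log scale."""
    (n11, n12), (n21, n22) = table
    n1, n2 = n11 + n12, n21 + n22
    rr = (n11 / n1) / (n21 / n2)
    se_log = sqrt(1 / n11 - 1 / n1 + 1 / n21 - 1 / n2)  # assumed delta-method SE
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    # Interval for ln(RR), then exponentiate the endpoints.
    return rr, (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))

rr, (lo, hi) = relative_risk_ci([[30, 70], [20, 80]])   # hypothetical counts
print(round(rr, 2), round(lo, 2), round(hi, 2))
```

The interval is asymmetric around $\widehat{RR}$ on the original scale, reflecting the skewness noted above.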


15/26


Differences

Relative Risks

Odds Ratios


16/26

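Odds and Odds Ratios

Univariate binary variable (Success or Failure)

$$\text{Odds of Success} = \frac{\pi(\text{Success})}{\pi(\text{Failure})} = \frac{\pi(\text{Success})}{1 - \pi(\text{Success})}$$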


17/26

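Odds and Odds Ratios

In a two by two table:

$$\text{Odds}_{\text{Row 1}} = \frac{\pi_{S|1}}{\pi_{F|1}}, \qquad \text{Odds}_{\text{Row 2}} = \frac{\pi_{S|2}}{\pi_{F|2}}$$

$$\text{Odds Ratio} = \theta = \frac{\text{Odds}_{\text{Row 1}}}{\text{Odds}_{\text{Row 2}}}$$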


18/26

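Properties of $\theta$

$\theta \ge 0$

$\theta = \dfrac{\pi_{11}\,\pi_{22}}{\pi_{12}\,\pi_{21}}$ (the cross-product ratio), since $\pi_{S|1}/\pi_{F|1} = \pi_{11}/\pi_{12}$ and $\pi_{S|2}/\pi_{F|2} = \pi_{21}/\pi_{22}$

$\theta = 1$ if Success/Failure is independent of the row categorization

$\theta$ has the same value if rows and columns are switched; interchanging only the two rows (or only the two columns) changes $\theta$ to $1/\theta$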
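A minimal sketch, with exact fractions and a hypothetical table of counts, that computes $\theta$ both as a ratio of row odds and as the cross-product ratio, and checks that the two agree, that transposing the table leaves the value unchanged, that interchanging the two rows gives the reciprocal, and that proportional rows give 1. Applied to counts this is the sample value $\hat\theta$, but the same algebra holds for the $\pi_{ij}$. All function names are illustrative.

```python
from fractions import Fraction

def odds(p):
    """Odds of success for success probability p (0 <= p < 1)."""
    return p / (1 - p)

def odds_ratio(t):
    """theta = Odds_Row1 / Odds_Row2 from within-row success proportions."""
    (a, b), (c, d) = t
    return odds(Fraction(a, a + b)) / odds(Fraction(c, c + d))

def cross_product(t):
    """theta = (n11 * n22) / (n12 * n21)."""
    (a, b), (c, d) = t
    return Fraction(a * d, b * c)

def transpose(t):
    """Switch rows and columns."""
    return [list(col) for col in zip(*t)]

t = [[30, 70], [20, 80]]          # hypothetical counts
print(odds_ratio(t), cross_product(transpose(t)), cross_product(t[::-1]))
# 12/7 12/7 7/12
```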


19/26

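Inference for $\theta$

$$\hat\theta = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$$

The distribution of $\hat\theta$ is very skewed: associations in one direction give values in (0, 1), while equally strong associations in the other direction give values in (1, ∞) (for example, 1/4 and 4)

Take the natural log, $\ln(\hat\theta)$. For large samples, $\ln(\hat\theta)$ is approximately $N\big(\ln(\theta), \operatorname{Var}(\ln(\hat\theta))\big)$

$$SE(\ln(\hat\theta)) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$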


20/26

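Inference for $\theta$

$\ln(\theta) = 0$ when the variables are independent

Wald-type confidence intervals for $\ln(\theta)$: $\ln(\hat\theta) \pm z_{\alpha/2}\,SE(\ln(\hat\theta))$

Then exponentiate the endpoints to get a confidence interval for $\theta$

Use a Z-score for testing independence:

$$Z = \frac{\ln(\hat\theta) - 0}{SE(\ln(\hat\theta))}$$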
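A minimal sketch of the Wald interval and Z test for $\ln(\theta)$, using `statistics.NormalDist` for the normal quantile and a two-sided p-value. The function name and counts are hypothetical, and all four cells must be positive (the 0.5 adjustment mentioned earlier is one option when they are not).

```python
from math import exp, log, sqrt
from statistics import NormalDist

def log_odds_ratio_inference(table, conf=0.95):
    """Return theta-hat, a Wald CI for theta (via ln theta), Z, and a two-sided p."""
    (n11, n12), (n21, n22) = table
    theta_hat = (n11 * n22) / (n12 * n21)
    se = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)     # SE(ln theta-hat)
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    lo, hi = log(theta_hat) - z_crit * se, log(theta_hat) + z_crit * se
    z = (log(theta_hat) - 0) / se                        # H0: ln(theta) = 0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return theta_hat, (exp(lo), exp(hi)), z, p           # exponentiated endpoints

theta_hat, ci, z, p = log_odds_ratio_inference([[30, 70], [20, 80]])
print(round(theta_hat, 3), [round(x, 3) for x in ci], round(z, 2), round(p, 3))
```

Here the interval for $\theta$ contains 1 exactly when $|Z| < z_{\alpha/2}$, so the interval and the test agree.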


21/26

Sensitivity and Specificity

                    Disease Present   No Disease
Test is Positive    P(D & +)          P(ND & +)    P(+)
Test is Negative    P(D & −)          P(ND & −)    P(−)
                    P(D)              P(ND)

Sensitivity = P(+ | D)

Specificity = P(− | ND)

What do patients want to know?
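A minimal sketch computing sensitivity and specificity from counts laid out as in the table above, together with $P(D \mid +)$, the probability of disease given a positive test, which is one natural reading of the closing question. The counts are hypothetical (10% prevalence), and estimating $P(D \mid +)$ this way assumes the table's prevalence reflects the population being tested, as in a cross-sectional sample.

```python
def diagnostic_summary(n_d_pos, n_nd_pos, n_d_neg, n_nd_neg):
    """Counts of (D & +), (ND & +), (D & -), (ND & -)."""
    sensitivity = n_d_pos / (n_d_pos + n_d_neg)       # P(+ | D)
    specificity = n_nd_neg / (n_nd_neg + n_nd_pos)    # P(- | ND)
    p_d_given_pos = n_d_pos / (n_d_pos + n_nd_pos)    # P(D | +)
    return sensitivity, specificity, p_d_given_pos

# Hypothetical: 100 diseased and 900 non-diseased subjects.
sens, spec, ppv = diagnostic_summary(90, 90, 10, 810)
print(sens, spec, ppv)  # 0.9 0.9 0.5
```

Even with 90% sensitivity and specificity, only half of the positive tests in this hypothetical table come from diseased subjects.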


22/26

Types of studies

Observational studies: both the independent and dependent variables are observed; can study association only

Experiments: the independent variable is assigned by the investigator

Optimally this is a random assignment, to minimize bias

Part of an argument for causation


23/26

Observational Studies

Cross-sectional

One point in time; a cross-section of the population

Neither of the marginal totals is fixed

Cohort studies

Follow risk groups through time to observe whether or not the outcome of interest occurs

Prospective study

Risk categories (rows) often have fixed totals

Case-control studies

Match controls to cases

Retrospective study

The disease categories (columns) have fixed totals


24/26


25/26

Relative Risk, Odds Ratios and Types of Studies

Relative risk can be estimated for:

Cross-sectional studies

Cohort studies

Experiments

Odds ratios can be estimated for:

All of the above

Case-control studies
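A minimal sketch of why the relative risk is not estimable from a case-control design while the odds ratio is: fixing the disease (column) totals amounts to sampling one column at a different rate than the other, which changes $\widehat{RR}$ but leaves the cross-product ratio unchanged. The table, the factor of 3, and the function names are hypothetical; rows are risk categories as in the earlier 2 x 2 tables.

```python
from fractions import Fraction

def rr_hat(t):
    """(n11/n1+) / (n21/n2+)."""
    (a, b), (c, d) = t
    return Fraction(a, a + b) / Fraction(c, c + d)

def or_hat(t):
    """(n11 * n22) / (n12 * n21)."""
    (a, b), (c, d) = t
    return Fraction(a * d, b * c)

def scale_column(t, j, k):
    """Multiply column j by k, mimicking sampling that column at a different rate."""
    return [[x * k if col == j else x for col, x in enumerate(row)] for row in t]

cohort = [[30, 70], [20, 80]]              # hypothetical cohort counts
case_control = scale_column(cohort, 0, 3)  # cases (first column) sampled 3x as often
print(rr_hat(cohort), rr_hat(case_control))   # 3/2 21/16
print(or_hat(cohort), or_hat(case_control))   # 12/7 12/7
```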


26/26

HW 2 due Oct 9th

Chapter 2

2.1, 2.2, 2.6, 2.7, 2.12, 2.15, 2.16, 2.30

Read the rest of chapter 2
