
  • 7/27/2019 Biostat6651_Lecture3_rev1

    1/26

    Categorical Data Analysis

    Biostat 6651 Lecture 3

    Fall 2013, 02 Oct 2013

    Dr. Lynn Eudey

  • 2/26

    Contingency Tables

    Joint, marginal, and conditional probabilities

    Independence

    Differences in Proportions

    Odds, Relative Risk, Odds Ratio

    Sensitivity and Specificity

    Types of Studies

  • 3/26

    Joint, marginal, and conditional probabilities

    Given a table of joint probabilities (or the population proportions of the cells), the row and column totals are the respective marginal probabilities. A "+" subscript denotes summation over that index, e.g. π1+ = π11 + π12 + π13.

                 Column 1   Column 2   Column 3   Marginal
      Row 1      π11        π12        π13        π1+
      Row 2      π21        π22        π23        π2+
      Row 3      π31        π32        π33        π3+
      Margin     π+1        π+2        π+3        1.00

  • 4/26

    Joint, marginal, and conditional probabilities

    Conditional probabilities are found by dividing the appropriate joint probability by the appropriate marginal probability:

    $P(\text{Col } j \mid \text{Row } i) = \pi_{ij}/\pi_{i+}$ and $P(\text{Row } i \mid \text{Col } j) = \pi_{ij}/\pi_{+j}$

                 Column 1   Column 2   Column 3   Marginal
      Row 1      π11        π12        π13        π1+
      Row 2      π21        π22        π23        π2+
      Row 3      π31        π32        π33        π3+
      Margin     π+1        π+2        π+3        1.00
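    The margin and conditional formulas can be sketched in a few lines of Python. The 3 x 3 table of joint probabilities and the function names are hypothetical, chosen only for illustration. Each row of P(Col j | Row i) should sum to 1, and so should each column of P(Row i | Col j).

```python
# Illustrative sketch; the table values and function names are hypothetical.
from typing import List

Table = List[List[float]]


def row_margins(pi: Table) -> List[float]:
    """pi_{i+}: sum each row over the column index j."""
    return [sum(row) for row in pi]


def col_margins(pi: Table) -> List[float]:
    """pi_{+j}: sum each column over the row index i."""
    return [sum(col) for col in zip(*pi)]


def col_given_row(pi: Table) -> Table:
    """P(Col j | Row i) = pi_ij / pi_{i+}; each resulting row sums to 1."""
    margins = row_margins(pi)
    return [[p / m for p in row] for row, m in zip(pi, margins)]


def row_given_col(pi: Table) -> Table:
    """P(Row i | Col j) = pi_ij / pi_{+j}; each resulting column sums to 1."""
    margins = col_margins(pi)
    return [[p / margins[j] for j, p in enumerate(row)] for row in pi]


if __name__ == "__main__":
    pi = [  # hypothetical joint probabilities; all nine cells sum to 1
        [0.10, 0.05, 0.05],
        [0.10, 0.20, 0.10],
        [0.05, 0.15, 0.20],
    ]
    print("row margins:", row_margins(pi))
    print("column margins:", col_margins(pi))
    print("P(Col j | Row 1):", col_given_row(pi)[0])
```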

  • 5/26

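    Independence

    The two categorical variables are independent if and only if

    $P(\text{Col } j \mid \text{Row } i) = P(\text{Col } j)$ for each and every row, and

    $P(\text{Row } i \mid \text{Col } j) = P(\text{Row } i)$ for each and every column

    Equivalently, $\pi_{ij} = \pi_{i+}\,\pi_{+j}\ \forall\,(i, j)$ pair ($\equiv$ ij cell)

    Notation: $\forall$ means "for each and every"; $\equiv$ means "is defined as"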
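    The cell-by-cell condition $\pi_{ij} = \pi_{i+}\pi_{+j}$ can be checked directly. The minimal sketch below assumes a joint probability table given as nested lists and uses a small numerical tolerance for floating-point error. The helper names and example tables are hypothetical.

```python
# Illustrative sketch; helper names and example tables are hypothetical.
from typing import List

Table = List[List[float]]


def is_independent(pi: Table, tol: float = 1e-9) -> bool:
    """True if pi_ij == pi_{i+} * pi_{+j} (within tol) for every (i, j) cell."""
    row_m = [sum(row) for row in pi]
    col_m = [sum(col) for col in zip(*pi)]
    return all(
        abs(pi[i][j] - row_m[i] * col_m[j]) <= tol
        for i in range(len(pi))
        for j in range(len(pi[0]))
    )


def independence_table(row_m: List[float], col_m: List[float]) -> Table:
    """Joint table implied by independence: pi_ij = pi_{i+} * pi_{+j}."""
    return [[r * c for c in col_m] for r in row_m]


if __name__ == "__main__":
    indep = independence_table([0.3, 0.7], [0.4, 0.6])
    print(indep, is_independent(indep))   # built from margins, so independent
    assoc = [[0.30, 0.10], [0.10, 0.50]]  # cell (1,1) is far from 0.4 * 0.4
    print(assoc, is_independent(assoc))
```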

  • 6/26

    Tests for Independence

    Large sample and well-populated cells: chi-square test (see Lecture 2)

    Low cell frequencies or a sparse table: the chi-square test does not apply

    If there are only a few zeros, we can add 0.5 to each and every cell frequency and proceed

    If each factor (or variable) has many levels, we can combine levels to increase the cell frequencies
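    The sketch below computes the Pearson statistic with the optional "add 0.5 to every cell" adjustment. It assumes expected counts $\mu_{ij} = n_{i+}n_{+j}/n$, as in the usual Pearson test. The function names are hypothetical. A p-value is given only for 1 degree of freedom, where $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. Other degrees of freedom would need a chi-square CDF, which is not implemented here.

```python
# Illustrative sketch; function names are hypothetical.
from math import erfc, sqrt
from typing import List, Tuple


def pearson_chi_square(counts: List[List[float]], add_half: bool = False) -> Tuple[float, int]:
    """Pearson X^2 and degrees of freedom for an I x J table of counts.

    With add_half=True, 0.5 is added to every cell first. Without it, a zero
    row or column total gives a zero expected count and a ZeroDivisionError.
    """
    table = [[n + 0.5 if add_half else n for n in row] for row in counts]
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total  # mu_ij = n_{i+} n_{+j} / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


def p_value_df1(x2: float) -> float:
    """Upper-tail chi-square probability for 1 df: P(|Z| > sqrt(x2))."""
    return erfc(sqrt(x2 / 2))
```

    For example, `pearson_chi_square([[0, 5], [0, 7]])` fails because the first column total is zero. Passing `add_half=True` lets it proceed.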

  • 7/26

    Tests for Independence

    Small sample 2 by 2 contingency table

    Fisher's exact test

    Uses the hypergeometric distribution of the cell counts given fixed marginal totals

    Gives an exact p-value for the given margins

    Can be implemented in SAS, R, StatXact

  • 8/26

    Tests for Independence

    Small sample 2 by 2 contingency table

    Fisher's exact test

    Only 1 degree of freedom: once you know n11, all other cell counts are determined from the marginal totals

    p-value = P(a result as extreme as, or more extreme than, the observed one in the direction of HA | marginal totals)

  • 9/26

    Tests for Independence

    Small sample 2 by 2 contingency table

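    Fisher's exact test

    $P(X_{11} = n_{11} \mid \text{margins}) = \binom{n_{1+}}{n_{11}}\binom{n_{2+}}{n_{+1} - n_{11}} \Big/ \binom{n_{++}}{n_{+1}}$

    Determine which values are more extreme with respect to HA, either > n11 or < n11, and find the corresponding probabilities

    Add these probabilities to the probability of the observed n11 to get the p-value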
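    The one-sided procedure above can be sketched with Python's `math.comb`. The sketch sums the hypergeometric probabilities over the observed n11 and every more extreme value that the margins allow. The function names are hypothetical, and two-sided versions, which follow other conventions, are not covered. For a one-sided test where HA is positive association, R's `fisher.test(..., alternative = "greater")` should give the same value.

```python
# Illustrative sketch; function names are hypothetical.
from math import comb
from typing import List


def hypergeom_prob(n11: int, n1p: int, n2p: int, np1: int) -> float:
    """P(X11 = n11 | margins) = C(n1+, n11) C(n2+, n+1 - n11) / C(n++, n+1)."""
    return comb(n1p, n11) * comb(n2p, np1 - n11) / comb(n1p + n2p, np1)


def fisher_one_sided(table: List[List[int]], greater: bool = True) -> float:
    """One-sided Fisher exact p-value for a 2 x 2 table of counts.

    greater=True counts n11 >= observed as "as or more extreme" (positive
    association under HA); greater=False counts n11 <= observed.
    """
    (n11, n12), (n21, n22) = table
    n1p, n2p = n11 + n12, n21 + n22
    np1 = n11 + n21
    lo = max(0, np1 - n2p)  # smallest n11 compatible with the margins
    hi = min(n1p, np1)      # largest n11 compatible with the margins
    support = range(n11, hi + 1) if greater else range(lo, n11 + 1)
    return sum(hypergeom_prob(k, n1p, n2p, np1) for k in support)
```

    For the small hypothetical table [[3, 1], [1, 3]], the upper-tail p-value is (16 + 1)/70 = 17/70.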

  • 10/26

    Look at 2 x 2 Tables

    Table of joint probabilities and marginal probabilities

                 Success    Failure
      Row 1      π11        π12        π1+
      Row 2      π21        π22        π2+
                 π+1        π+2        1.00

    Table of within-row probabilities (conditional probabilities)

                 Success    Failure
      Row 1      πS|1       πF|1       1.00
      Row 2      πS|2       πF|2       1.00

    In the within-row table the column totals make no sense

    Often the rows are risk categories, for example Seat Belt versus No Seat Belt

  • 11/26

    Comparing Probabilities

    Differences

    Relative Risks

    Odds Ratios

  • 12/26

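    Difference between conditional probabilities of success

    Estimation of $\pi_{S|1} - \pi_{S|2}$

    Use MLEs for point estimates: $\hat\pi_{S|i} = n_{i1}/n_{i+}$

    $SE(\hat\pi_{S|1} - \hat\pi_{S|2})$: see chalkboard

    For large samples, $\hat\pi_{S|1} - \hat\pi_{S|2}$ has an approximate Normal distribution

    Create Wald-type confidence intervals

    To test H0: $\pi_{S|1} = \pi_{S|2}$, use $Z = (\hat\pi_{S|1} - \hat\pi_{S|2}) / SE(\hat\pi_{S|1} - \hat\pi_{S|2})$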
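    The slide leaves the standard error to the chalkboard. The sketch below assumes the usual unpooled large-sample SE, $\sqrt{\hat\pi_{S|1}(1-\hat\pi_{S|1})/n_{1+} + \hat\pi_{S|2}(1-\hat\pi_{S|2})/n_{2+}}$, for both the Wald interval and the Z statistic. A pooled SE is another common choice for the test under H0. The function name and counts are hypothetical.

```python
# Illustrative sketch; the function name and the unpooled SE are assumptions.
from math import sqrt
from typing import Tuple


def diff_wald(n11: int, n1p: int, n21: int, n2p: int,
              z_crit: float = 1.96) -> Tuple[float, float, float, float, float]:
    """Return (estimate, SE, lower, upper, Z) for pi_{S|1} - pi_{S|2}."""
    p1, p2 = n11 / n1p, n21 / n2p  # MLEs: pi_hat_{S|i} = n_i1 / n_i+
    est = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1p + p2 * (1 - p2) / n2p)  # unpooled SE (assumed)
    return est, se, est - z_crit * se, est + z_crit * se, est / se
```

    With hypothetical counts of 60/100 successes in row 1 and 40/100 in row 2, the estimate is 0.2 and the 95% Wald interval is roughly 0.2 ± 0.136.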

  • 13/26

    Comparing Probabilities

    Differences

    Relative Risks

    Odds Ratios

  • 14/26

    Relative Risks

    Relative Risk for a 2 by 2 Table

    $RR(\text{Success}) = \pi_{S|1} / \pi_{S|2}$

    Gives a multiplicative comparison of the (conditional) probabilities of success

    Point estimate: $\widehat{RR} = \dfrac{n_{11}/n_{1+}}{n_{21}/n_{2+}}$

    The distribution of the estimate is very skewed; take the natural log

    See Problem 2.15
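    The log-scale approach can be sketched as follows. The slide does not give a standard error, so this sketch assumes the common large-sample formula $SE(\ln\widehat{RR}) = \sqrt{1/n_{11} - 1/n_{1+} + 1/n_{21} - 1/n_{2+}}$. The interval is built for $\ln RR$ and its endpoints are then exponentiated. The function name is hypothetical.

```python
# Illustrative sketch; the function name and SE formula are assumptions.
from math import exp, log, sqrt
from typing import Tuple


def relative_risk_ci(n11: int, n1p: int, n21: int, n2p: int,
                     z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Return (RR_hat, lower, upper) using a Wald interval on the log scale."""
    rr = (n11 / n1p) / (n21 / n2p)
    se_log = sqrt(1 / n11 - 1 / n1p + 1 / n21 - 1 / n2p)  # assumed large-sample SE
    lower = exp(log(rr) - z_crit * se_log)  # back-transform the log-scale endpoints
    upper = exp(log(rr) + z_crit * se_log)
    return rr, lower, upper
```

    The resulting interval is symmetric around $\ln\widehat{RR}$ but not around $\widehat{RR}$ itself, which reflects the skewness noted on the slide.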

  • 15/26

    Comparing Probabilities

    Differences

    Relative Risks

    Odds Ratios

  • 16/26

    Odds and Odds Ratios

    Univariate binary variable (Success or Failure)

    $\text{Odds of Success} = \dfrac{P(S)}{P(F)} = \dfrac{P(S)}{1 - P(S)}$
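    The odds formula and its inverse fit in two small functions. The names are hypothetical. Inverting $o = p/(1-p)$ gives $p = o/(1+o)$.

```python
# Illustrative sketch; function names are hypothetical.
def odds(p: float) -> float:
    """Odds of success = P(S) / P(F) = P(S) / (1 - P(S)), for 0 <= p < 1."""
    return p / (1 - p)


def prob_from_odds(o: float) -> float:
    """Invert the odds: P(S) = odds / (1 + odds)."""
    return o / (1 + o)
```

    For example, P(S) = 0.75 gives odds of 3: success is three times as likely as failure.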

  • 17/26

    Odds and Odds Ratios

    In a two by two table

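    $\text{Odds}_{\text{Row 1}} = P(S \mid 1) / P(F \mid 1)$

    $\text{Odds}_{\text{Row 2}} = P(S \mid 2) / P(F \mid 2)$

    Odds Ratio $= \theta = \text{Odds}_{\text{Row 1}} / \text{Odds}_{\text{Row 2}}$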

  • 18/26

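    Properties of θ

    θ ≥ 0

    $\theta = \dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$

    θ = 1 if Success/Failure is independent of the row categorization

    θ has the same value if rows and columns are switched (transposed); reversing the order of the rows alone (or of the columns alone) gives 1/θ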
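    These properties can be checked numerically with the cross-product form $\theta = \pi_{11}\pi_{22}/(\pi_{12}\pi_{21})$. The helper names and the 2 x 2 probability table are hypothetical. In that table the row-1 odds are 3 and the row-2 odds are 0.5.

```python
# Illustrative sketch; helper names and table values are hypothetical.
from typing import List

Table = List[List[float]]


def odds_ratio(t: Table) -> float:
    """theta = t11 t22 / (t12 t21); works for cell probabilities or counts."""
    (a, b), (c, d) = t
    return (a * d) / (b * c)


def transpose(t: Table) -> Table:
    """Switch rows and columns."""
    return [list(col) for col in zip(*t)]


def swap_rows(t: Table) -> Table:
    """Reverse the order of the two rows."""
    return [t[1], t[0]]


if __name__ == "__main__":
    pi = [[0.30, 0.10], [0.20, 0.40]]
    print(odds_ratio(pi))                          # cross-product ratio
    print(odds_ratio(transpose(pi)))               # same value after transposing
    print(odds_ratio(swap_rows(pi)) * odds_ratio(pi))  # reversed rows give 1/theta
    print(odds_ratio([[0.12, 0.18], [0.28, 0.42]]))    # independence table: theta = 1
```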

  • 19/26

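    Inference for θ

    $\hat\theta = \dfrac{n_{11} n_{22}}{n_{12} n_{21}}$

    The distribution of $\hat\theta$ is very skewed; half the values are between 0 and 1, half the values are > 1

    Take the natural log, $\ln\hat\theta$. For large samples, $\ln\hat\theta \sim N(\ln\theta, \operatorname{Var}(\ln\hat\theta))$

    $SE(\ln\hat\theta) = \sqrt{\dfrac{1}{n_{11}} + \dfrac{1}{n_{12}} + \dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}}$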

  • 20/26

    Inference for θ

    $\ln\theta = 0$ when the variables are independent

    Wald-type confidence intervals for $\ln\theta$: then exponentiate the endpoints to get a confidence interval for θ

    Use a Z-score for testing independence:

    $Z = \dfrac{\ln\hat\theta - 0}{SE(\ln\hat\theta)}$
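    The sketch below combines the SE formula, the exponentiated Wald interval, and the Z test for $\ln\theta = 0$. It assumes all four counts are positive; with a few zeros, the 0.5 adjustment from the earlier slide could be applied first. The two-sided normal p-value uses $P(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})$. The function name and counts are hypothetical.

```python
# Illustrative sketch; function name and example counts are hypothetical.
from math import erfc, exp, log, sqrt
from typing import Dict, List


def log_odds_ratio_inference(counts: List[List[int]], z_crit: float = 1.96) -> Dict[str, float]:
    """Wald inference for theta from a 2 x 2 table of counts (all cells > 0)."""
    (n11, n12), (n21, n22) = counts
    theta_hat = (n11 * n22) / (n12 * n21)
    log_theta = log(theta_hat)
    se = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # SE(ln theta_hat)
    z = (log_theta - 0) / se                          # H0: ln(theta) = 0
    return {
        "theta_hat": theta_hat,
        "se_log": se,
        "lower": exp(log_theta - z_crit * se),        # exponentiate the endpoints
        "upper": exp(log_theta + z_crit * se),
        "z": z,
        "p_two_sided": erfc(abs(z) / sqrt(2)),        # P(|Z| >= |z|), standard normal
    }
```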

  • 21/26

    Sensitivity and Specificity

                          Disease is Present   No Disease
      Test is Positive    P(D & +)             P(ND & +)    P(+)
      Test is Negative    P(D & −)             P(ND & −)    P(−)
                          P(D)                 P(ND)

    Sensitivity = P(+ | D)

    Specificity = P(− | ND)

    What do patients want to know?
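    Sensitivity and specificity condition on the disease columns. One natural reading of the slide's closing question is that a patient with a positive result wants P(D | +), which conditions on the test row instead. The sketch below computes both kinds of conditional probability from one joint table. The function name and the probabilities (a hypothetical 1% prevalence) are illustrative only.

```python
# Illustrative sketch; function name and probabilities are hypothetical.
from typing import Dict


def screening_summary(p_d_pos: float, p_nd_pos: float,
                      p_d_neg: float, p_nd_neg: float) -> Dict[str, float]:
    """Conditional probabilities from the joint table of disease status by test result."""
    p_d = p_d_pos + p_d_neg     # P(D): column total
    p_nd = p_nd_pos + p_nd_neg  # P(ND): column total
    p_pos = p_d_pos + p_nd_pos  # P(+): row total
    p_neg = p_d_neg + p_nd_neg  # P(-): row total
    return {
        "sensitivity": p_d_pos / p_d,    # P(+ | D)
        "specificity": p_nd_neg / p_nd,  # P(- | ND)
        "P(D | +)": p_d_pos / p_pos,     # conditions on the row instead
        "P(ND | -)": p_nd_neg / p_neg,
    }
```

    With P(D & +) = 0.009, P(ND & +) = 0.099, P(D & −) = 0.001 and P(ND & −) = 0.891, sensitivity and specificity are both 0.9. P(D | +) is only 0.009/0.108, about 0.083.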

  • 22/26

    Types of studies

    Observational studies: both the independent and dependent variables are observed; they can show association only

    Experiments: the independent variable is assigned by the investigator

    Optimally this is a random assignment, to minimize bias

    Part of an argument for causation

  • 23/26

    Observational Studies

    Cross-sectional

    One point in time; cross-section of the population

    Neither set of marginal totals is fixed

    Cohort studies

    Follow risk groups through time to observe whether or not the indication of interest occurs

    Prospective study

    Risk categories often have fixed totals (rows)

    Case-control studies

    Match controls to cases

    Retrospective study

    The disease categories have fixed totals (columns)


  • 25/26

    Relative Risk, Odds Ratios, and Types of Studies

    Relative Risk can be estimated for

    Cross-sectional

    Cohort studies

    Experiments

    Odds Ratios can be estimated for

    All of the above

    Case-Control studies
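    Why case-control designs support the odds ratio but not the relative risk comes down to one step. When the disease columns are sampled at fixed (unequal) fractions, each column is multiplied by a constant. That cancels in $\theta = n_{11}n_{22}/(n_{12}n_{21})$ but distorts the row proportions that the relative risk needs. The sketch below uses hypothetical population counts and sampling fractions to show this. The helper names are hypothetical too.

```python
# Illustrative sketch; counts, sampling fractions, and helper names are hypothetical.
from typing import List

Table = List[List[float]]


def odds_ratio(t: Table) -> float:
    (a, b), (c, d) = t
    return (a * d) / (b * c)


def relative_risk(t: Table) -> float:
    """Risk of column 1 ("disease") in row 1 vs row 2: (a/(a+b)) / (c/(c+d))."""
    (a, b), (c, d) = t
    return (a / (a + b)) / (c / (c + d))


def sample_columns(t: Table, frac_cases: float, frac_controls: float) -> Table:
    """Expected counts when a fixed fraction of each disease column is sampled."""
    return [[row[0] * frac_cases, row[1] * frac_controls] for row in t]


if __name__ == "__main__":
    population = [[90, 910], [10, 990]]           # rows: exposed / unexposed
    study = sample_columns(population, 1.0, 0.1)  # all cases, 10% of controls
    print("OR:", odds_ratio(population), odds_ratio(study))        # unchanged
    print("RR:", relative_risk(population), relative_risk(study))  # distorted
```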

  • 26/26

    HW 2 due Oct 9th

    Chapter 2

    2.1, 2.2, 2.6, 2.7, 2.12, 2.15, 2.16, 2.30

    Read the rest of chapter 2