
Linear Models III
Thursday May 31, 10:15-12:00

Deborah Rosenberg, PhD
Research Associate Professor
Division of Epidemiology and Biostatistics
University of IL School of Public Health

Training Course in MCH Epidemiology

[Slide background graphics, repeated on every slide: density of Student's t with 10 d.f.; chi-square densities for 1, 2, 3, 5, and 8 d.f.; and a 2x2 table of exposure (or person, place, or time variable) by disease or other health outcome, with cells a, b, c, d, row totals n1 and n2, column totals m1 and m2, and grand total N.]

Ordinal and Nominal Outcomes

Outcomes with More than 2 Categories

Examples of outcomes that might be suited to ordinal or nominal regression:

- Ordinal or nominal BMI categories
- Nominal cause of death categories
- Ordinal or nominal severity of illness categories
- Ordinal or nominal categories of program participation


Ordinal and Nominal Outcomes

The Cumulative Logit Model

The primary motivation for using a logistic model with an ordinal outcome is to accommodate a truly ordinal variable: one that has "ceiling" and "floor" effects and whose response categories are separated by somewhat arbitrary intervals. That is, it is not a continuous variable.

Modeling an ordinal outcome as a continuous variable can yield biased results because the model can produce predicted values outside the range of the ordinal variable.


Ordinal and Nominal Outcomes

The Cumulative Logit Model

An ordered outcome may reflect an underlying continuous variable for which we have no data or for which we don't know the "real" threshold values.

For example, a Likert scale for satisfaction—very dissatisfied to very satisfied—or for agreement—strongly disagree to strongly agree—has response categories reflecting a continuous scale for which there is no data.


Modeling Ordinal Outcomes

Some other ordinal variables may reflect an underlying continuous construct that cannot be measured as such. The ordered values are intended to reflect distinct threshold values.

Examples of ordinal variables of this type:

- access to care index
- reports of experience of life stress
- assessment of overall health status
- satisfaction with care


Ordinal and Nominal Outcomes

The Cumulative Logit Model

To appropriately model an outcome as ordinal, the proportional odds assumption must hold.

The proportional odds assumption:

If an independent variable increases (or decreases) the odds of being in category 1 v. the remaining categories, then it similarly increases (or decreases) the odds of being in categories 1 and 2 combined v. the remaining categories, in categories 1, 2, and 3 combined v. the remaining categories, etc.


Ordinal and Nominal Outcomes

The Cumulative Logit Model

The null hypothesis for the proportional odds assumption is that the odds ratios for the association between a risk factor and an ordinal outcome are constant regardless of how the category boundaries are drawn.

If the proportional odds assumption holds, then the association between an independent variable and the outcome can be expressed as a single summary estimate—a common odds ratio—across all categories.


Ordinal and Nominal Outcomes

The Cumulative Logit Model

The proportional odds assumption can be tested with a chi-square statistic (a score test). A nonsignificant result means that the null hypothesis is not rejected and the cumulative logit model is considered appropriate; a significant result means that the proportional odds assumption may not hold.


Ordinal and Nominal Outcomes

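The Cumulative Logit Model: For an ordered outcome with k categories

$$
\begin{aligned}
\ln(\text{Odds}_{1}) &= \ln\left(\frac{p_1}{1 - p_1}\right) \\
\ln(\text{Odds}_{12}) &= \ln\left(\frac{p_1 + p_2}{1 - (p_1 + p_2)}\right) \\
&\;\;\vdots \\
\ln(\text{Odds}_{12 \ldots k-1}) &= \ln\left(\frac{p_1 + p_2 + \cdots + p_{k-1}}{1 - (p_1 + p_2 + \cdots + p_{k-1})}\right)
\end{aligned}
$$

Both the numerator and denominator change

http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html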


Ordinal and Nominal Outcomes

Ordinal Outcome Variable

Risk Factor    1    2    3    4    Total
Yes            a    b    c    d
No             e    f    g    h

Category 1 v. categories 2-4:      Odds among the exposed = a / (b+c+d)
Categories 1-2 v. categories 3-4:  Odds among the exposed = (a+b) / (c+d)
Categories 1-3 v. category 4:      Odds among the exposed = (a+b+c) / d
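
Using the cell labels from the ordinal outcome table above, the odds ratio at each cumulative cutpoint divides the odds among the exposed (row Yes) by the corresponding odds among the unexposed (row No). The slide gives only the exposed odds; the unexposed odds below are filled in by symmetry, so treat this as a sketch rather than the slide's own notation:

$$
\begin{aligned}
OR_{1 \text{ v. } 2\text{-}4} &= \frac{a / (b+c+d)}{e / (f+g+h)} \\
OR_{1\text{-}2 \text{ v. } 3\text{-}4} &= \frac{(a+b) / (c+d)}{(e+f) / (g+h)} \\
OR_{1\text{-}3 \text{ v. } 4} &= \frac{(a+b+c) / d}{(e+f+g) / h}
\end{aligned}
$$

If the proportional odds assumption holds, these three ratios should be approximately equal.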


Ordinal and Nominal Outcomes

The Cumulative Logit Model

Given k categories of an ordered outcome variable, a cumulative logit model yields k-1 intercept terms. Each intercept corresponds to a category combined with all adjacent lower-ordered categories.

Since proportional odds are assumed, and therefore a common odds ratio, the effect of each covariate is reflected in a single beta coefficient.


Ordinal and Nominal Outcomes

The Cumulative Logit Model

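Suppose an outcome variable has 4 categories and we are modeling one independent variable, x. The cumulative logit model has three intercepts (b0,1, b0,12, b0,123) and one slope (b1):

$$
\begin{aligned}
\ln(\text{Odds}_{1}) &= b_{0,1} + b_1 x \\
\ln(\text{Odds}_{12}) &= b_{0,12} + b_1 x \\
\ln(\text{Odds}_{123}) &= b_{0,123} + b_1 x
\end{aligned}
$$

The odds ratio is the same regardless of category:

$$
\begin{aligned}
\frac{e^{b_{0,1} + b_1(1)}}{e^{b_{0,1} + b_1(0)}} &= e^{b_1} \\
\frac{e^{b_{0,12} + b_1(1)}}{e^{b_{0,12} + b_1(0)}} &= e^{b_1} \\
\frac{e^{b_{0,123} + b_1(1)}}{e^{b_{0,123} + b_1(0)}} &= e^{b_1}
\end{aligned}
$$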


Ordinal and Nominal Outcomes

A stratified approach to mimic a cumulative logit model for a 4-category variable would mean creating new dichotomous variables, something like the following:

/* category 1 v. categories 2-4 */
if ordvar = 1 then ordvar1 = 1;
else if ordvar ^= . then ordvar1 = 0;
/* categories 1-2 v. categories 3-4 */
if 1<=ordvar<=2 then ordvar2 = 1;
else if ordvar ^= . then ordvar2 = 0;
/* categories 1-3 v. category 4 */
if 1<=ordvar<=3 then ordvar3 = 1;
else if ordvar ^= . then ordvar3 = 0;


Ordinal and Nominal Outcomes

Mimicking Cumulative Logit with Binary Logistic Models

/* By default PROC LOGISTIC models the lower ordered response value (0 here); */
/* add the DESCENDING option to model the value 1 instead.                    */
proc logistic;
  model ordvar1 = factors;
run;
proc logistic;
  model ordvar2 = factors;
run;
proc logistic;
  model ordvar3 = factors;
run;

The OR from each model will be approx. the same if the proportional odds assumption holds.

Note that all observations are used in each model.


Ordinal and Nominal Outcomes

The Cumulative Logit Model

If the proportional odds assumption does not hold, it might be because the outcome variable is nominal rather than ordinal, or it might be that we have mis-specified the categories, failing to pinpoint important thresholds on the underlying continuum.

The score test is quite sensitive—it is up to the analyst to examine the pattern of ORs for different dichotomous cutpoints and decide whether it is reasonable to use a cumulative logit model.


Ordinal and Nominal Outcomes

The Generalized Logit Model

In contrast to the cumulative logit model, in a generalized logit model, the outcome categories are like dummy variables—mutually exclusive categories compared to a common reference group.


Ordinal and Nominal Outcomes

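The Generalized Logit Model: For a nominal outcome with k categories

$$
\begin{aligned}
\ln(\text{Odds}_{1}) &= \ln\left(\frac{p_1}{1 - (p_1 + p_2 + \cdots + p_{k-1})}\right) \\
\ln(\text{Odds}_{2}) &= \ln\left(\frac{p_2}{1 - (p_1 + p_2 + \cdots + p_{k-1})}\right) \\
&\;\;\vdots \\
\ln(\text{Odds}_{k-1}) &= \ln\left(\frac{p_{k-1}}{1 - (p_1 + p_2 + \cdots + p_{k-1})}\right)
\end{aligned}
$$

Fixed denominator (reference category)

http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html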


Ordinal and Nominal Outcomes

Nominal Outcome Variable

Risk Factor    4    3    2    1    Total
Yes            a    b    c    d
No             e    f    g    h

Category 4 v. category 1 (reference):  Odds among the exposed = a / d
Category 3 v. category 1 (reference):  Odds among the exposed = b / d
Category 2 v. category 1 (reference):  Odds among the exposed = c / d
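
Using the cell labels from the nominal outcome table above, each category-specific odds ratio divides the exposed odds by the unexposed odds for the same pair of categories. The unexposed odds (e/h, f/h, g/h) are not written on the slide and are filled in here by symmetry, as a sketch:

$$
\begin{aligned}
OR_{4 \text{ v. } 1} &= \frac{a/d}{e/h} = \frac{a\,h}{d\,e} \\
OR_{3 \text{ v. } 1} &= \frac{b/d}{f/h} = \frac{b\,h}{d\,f} \\
OR_{2 \text{ v. } 1} &= \frac{c/d}{g/h} = \frac{c\,h}{d\,g}
\end{aligned}
$$

Unlike the cumulative cutpoints, each ratio uses only two columns of the table, and nothing constrains the three ratios to be equal.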


Ordinal and Nominal Outcomes

The Generalized Logit Model

Given k categories of an outcome variable, a generalized logit model yields k-1 intercept terms. Each intercept corresponds to a single category.

Since proportional odds are not assumed, odds ratios can vary across categories, and therefore the effect of each covariate is reflected in k-1 slope parameters.


Ordinal and Nominal Outcomes

The Generalized Logit Model

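Suppose an outcome variable has 4 categories and we are modeling one independent variable, x. The generalized logit model has three intercepts (b0,1, b0,2, b0,3) and three slopes (b1,1, b1,2, b1,3):

$$
\begin{aligned}
\ln(\text{Odds}_{1}) &= b_{0,1} + b_{1,1} x \\
\ln(\text{Odds}_{2}) &= b_{0,2} + b_{1,2} x \\
\ln(\text{Odds}_{3}) &= b_{0,3} + b_{1,3} x
\end{aligned}
$$

The odds ratios are distinct for each category:

$$
\begin{aligned}
\frac{e^{b_{0,1} + b_{1,1}(1) + b_{1,2}(0) + b_{1,3}(0)}}{e^{b_{0,1} + b_{1,1}(0) + b_{1,2}(0) + b_{1,3}(0)}} &= e^{b_{1,1}} \\
\frac{e^{b_{0,2} + b_{1,1}(0) + b_{1,2}(1) + b_{1,3}(0)}}{e^{b_{0,2} + b_{1,1}(0) + b_{1,2}(0) + b_{1,3}(0)}} &= e^{b_{1,2}} \\
\frac{e^{b_{0,3} + b_{1,1}(0) + b_{1,2}(0) + b_{1,3}(1)}}{e^{b_{0,3} + b_{1,1}(0) + b_{1,2}(0) + b_{1,3}(0)}} &= e^{b_{1,3}}
\end{aligned}
$$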


Ordinal and Nominal Outcomes

The Generalized Logit Model

Each slope parameter compares the odds of being in one outcome category with the odds of being in the reference category:

- Compared to those without Factor A, individuals with Factor A have ___ times the odds of having outcome category 1 (v. the reference category);
- Compared to those without Factor A, individuals with Factor A have ___ times the odds of having outcome category 2 (v. the reference category);
- Compared to those without Factor A, individuals with Factor A have ___ times the odds of having outcome category 3 (v. the reference category).


Ordinal and Nominal Outcomes

A stratified approach to mimic a generalized logit model for a 4-category variable would not require creating new variables, but would mean running models like the following:


Ordinal and Nominal Outcomes

Mimicking Generalized Logit with Binary Logistic Models

/* category 1 v. category 4 (reference) */
proc logistic;
  where ordvar in(1,4);
  model ordvar = factors;
run;
/* category 2 v. category 4 (reference) */
proc logistic;
  where ordvar in(2,4);
  model ordvar = factors;
run;
/* category 3 v. category 4 (reference) */
proc logistic;
  where ordvar in(3,4);
  model ordvar = factors;
run;

The ORs from the models will differ.

Note that different subsets of observations are used in each model.


Example 1.

The Association of Smoking and Fetal/Infant Death

in Preterm Deliveries

Crude OR=1.07

Smoking and Mortality
Dichotomous Outcome

Frequency|
Percent  |
Row Pct  |
Col Pct  |fetal or|survivor|  Total
         |neonatal|>=28    |
         |death   |days    |
---------+--------+--------+
yes      |     79 |   1135 |   1214
         |   0.87 |  12.50 |  13.37
         |   6.51 |  93.49 |
         |  14.08 |  13.32 |
---------+--------+--------+
no       |    482 |   7385 |   7867
         |   5.31 |  81.32 |  86.63
         |   6.13 |  93.87 |
         |  85.92 |  86.68 |
---------+--------+--------+
Total         561     8520     9081
             6.18    93.82   100.00
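
The crude odds ratio quoted for this table can be checked by hand from the four cell counts (smokers: 79 deaths and 1135 survivors; nonsmokers: 482 deaths and 7385 survivors). The arithmetic below is a worked check, not part of the original slide:

$$
OR = \frac{79 / 1135}{482 / 7385} = \frac{79 \times 7385}{1135 \times 482} = \frac{583415}{547070} \approx 1.07
$$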


Example 1.

The Association of Smoking and Fetal/Infant Death in Preterm Deliveries

Crude Logistic Model with Dichotomous Outcome

                             Standard        Wald
Parameter       DF  Estimate    Error   Chi-Square   Pr > ChiSq
Intercept        1   -2.7293   0.0470    3370.3800       <.0001
smoking   yes    1    0.0643   0.1255       0.2627       0.6083

           Odds Ratio Estimates
                        Point        95% Wald
Effect               Estimate   Confidence Limits
smoking yes vs no       1.066     0.834     1.364


Example 1.

Cumulative Logit: Odds of type of death among smokers and the OR for smoker v. nonsmoker

Fetal death v. neonatal death or survivor:  Odds = 46 / (33+1135) = 0.04    OR = 1.04
Fetal or neonatal death v. survivor:        Odds = (46+33) / 1135 = 0.07    OR = 1.07

Smoking and Mortality
3 Categories

Frequency |
Percent   |
Row Pct   |
Col Pct   |fetal     |neonatal  |survivor  |   Total
          |death     |death     |>=28 days |
          |>=20 wks  |0-28 days |          |
----------+----------+----------+----------+
yes       |       46 |       33 |     1135 |    1214
          |     0.51 |     0.36 |    12.50 |   13.37
          |     3.79 |     2.72 |    93.49 |
          |    13.86 |    14.41 |    13.32 |
----------+----------+----------+----------+
no        |      286 |      196 |     7385 |    7867
          |     3.15 |     2.16 |    81.32 |   86.63
          |     3.64 |     2.49 |    93.87 |
          |    86.14 |    85.59 |    86.68 |
----------+----------+----------+----------+
Total            332        229       8520      9081
                3.66       2.52      93.82    100.00
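
The two odds ratios for the 3-category smoking table can be reproduced from its cell counts. The odds among nonsmokers are not printed on the slide; they are computed here from the "no" row as a worked check:

$$
\begin{aligned}
OR_{\text{fetal death v. rest}} &= \frac{46 / (33 + 1135)}{286 / (196 + 7385)} = \frac{46 / 1168}{286 / 7581} \approx \frac{0.0394}{0.0377} \approx 1.04 \\
OR_{\text{any death v. survivor}} &= \frac{(46 + 33) / 1135}{(286 + 196) / 7385} = \frac{79 / 1135}{482 / 7385} \approx \frac{0.0696}{0.0653} \approx 1.07
\end{aligned}
$$

The second cutpoint collapses the two death categories, so it matches the crude OR from the dichotomous analysis.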


Example 1.

Cumulative Logit Model with 3 Categories

Ordered
Value     outcome5                      Frequency
1         fetal death >=20 wks                332
2         neonatal death 0-28 days            229
3         survivor >=28 days                 8520

Probabilities modeled are cumulated over the lower Ordered Values.

Score Test for the Proportional Odds Assumption

Chi-Square    DF    Pr > ChiSq
    0.0400     1        0.8414

The proportional odds assumption holds.


Example 1.

Cumulative Logit: Each intercept corresponds to a category plus all categories with lower ordered values v. the remaining categories.

The odds ratio is an ‘average’ of the cumulative logits

46 / (33+1135) = $e^{-3.2803 + 0.0635}$ = 0.04

(46+33) / 1135 = $e^{-2.7291 + 0.0635}$ = 0.07


Example 1.

Generalized Logit Model with 3 Categories

In a generalized logit model, each intercept and slope correspond to a single category.

Is 1.07 a reasonable summary of 1.047 and 1.096?
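
The Example 1 models can be refit from the published cell counts alone. The sketch below is an assumption about how one might do this, not the code behind the slides: the dataset name ex1 and the variables outcome3 and count are hypothetical, and the outcome is coded 1 = fetal death, 2 = neonatal death, 3 = survivor so that the default ordering cumulates over the lower values as in the slides.

```sas
/* Example 1 cell counts, one row per cell of the 3-category table. */
/* Dataset and variable names are hypothetical.                      */
data ex1;
  input smoking $ outcome3 count;
  /* outcome3: 1 = fetal death, 2 = neonatal death, 3 = survivor */
  datalines;
yes 1 46
yes 2 33
yes 3 1135
no 1 286
no 2 196
no 3 7385
;
run;

/* Cumulative logit (the default for a response with more than 2 levels). */
/* The output includes the score test for the proportional odds           */
/* assumption.                                                             */
proc logistic data=ex1;
  freq count;
  class smoking (ref='no') / param=ref;
  model outcome3 = smoking;
run;

/* Generalized logit with survivors as the reference category. */
proc logistic data=ex1;
  freq count;
  class smoking (ref='no') / param=ref;
  model outcome3 (ref='3') = smoking / link=glogit;
run;
```

The FREQ statement lets each row stand for count observations, so no individual-level data are needed. If the coding matches the slides, the estimates should be close to those shown for Example 1.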


Example 2.

The Association of Maternal Risk and Fetal/Infant Death in Preterm Deliveries

Matern Risk and Mortality
3 Categories

Frequency |
Percent   |
Row Pct   |
Col Pct   |fetal     |neonatal  |survivor  |   Total
          |death     |death     |>=28 days |
          |>=20 wks  |0-28 days |          |
----------+----------+----------+----------+
yes       |      153 |      129 |     3836 |    4118
          |     1.50 |     1.26 |    37.50 |   40.26
          |     3.72 |     3.13 |    93.15 |
          |    36.60 |    49.43 |    40.17 |
----------+----------+----------+----------+
no        |      265 |      132 |     5713 |    6110
          |     2.59 |     1.29 |    55.86 |   59.74
          |     4.34 |     2.16 |    93.50 |
          |    63.40 |    50.57 |    59.83 |
----------+----------+----------+----------+
Total            418        261       9549     10228
                4.09       2.55      93.36    100.00


Example 2.

The Association of Maternal Risk and Fetal/Infant Death in Preterm Deliveries

Crude Logistic Model with Dichotomous Outcome


Example 2.

Cumulative Logit Model with 3 Categories

Ordered
Value     outcome5                      Frequency
1         fetal death >=20 wks                418
2         neonatal death 0-28 days            261
3         survivor >=28 days                 9549

Probabilities modeled are cumulated over the lower Ordered Values.

Score Test for the Proportional Odds Assumption

Chi-Square    DF    Pr > ChiSq
   10.7077     1        0.0011

The proportional odds assumption does not hold.


Example 2.

Cumulative Logit Model with 3 Categories

The odds ratio is an ‘average’ of the cumulative logits

$e^{-3.1750 + 0.0473}$ = 0.04

$e^{-2.6629 + 0.0473}$ = 0.07


Example 2.

Generalized Logit Model with 3 Categories

Is 1.048 a reasonable summary of 0.86 and 1.5?
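
The odds ratios behind this question can be recomputed from the Example 2 table of maternal risk by outcome (risk yes: 153, 129, 3836; risk no: 265, 132, 5713). The slide quotes the category-specific values as 0.86 and 1.5; the cutpoint-specific values below are not on the slides and are added as a worked check:

$$
\begin{aligned}
\text{Generalized logits:}\quad
OR_{\text{fetal death v. survivor}} &= \frac{153 / 3836}{265 / 5713} \approx 0.86 \\
OR_{\text{neonatal death v. survivor}} &= \frac{129 / 3836}{132 / 5713} \approx 1.46 \\[6pt]
\text{Cumulative cutpoints:}\quad
OR_{\text{fetal death v. rest}} &= \frac{153 / (129 + 3836)}{265 / (132 + 5713)} \approx 0.85 \\
OR_{\text{any death v. survivor}} &= \frac{(153 + 129) / 3836}{(265 + 132) / 5713} \approx 1.06
\end{aligned}
$$

The estimates fall on opposite sides of 1 depending on the category or cutpoint, which is consistent with the significant score test: a single common odds ratio near 1.05 hides two associations in opposite directions.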


Example 3. LBW

Modeling a 3 category birthweight variable:

/*cumulative logit */

proc logistic order=formatted;

model bwcat = smoking late_no_pnc;

run;


Example 3. LBW

Page 37: Linear Models III Thursday May 31, 10:15-12:00 Deborah Rosenberg, PhD Research Associate Professor Division of Epidemiology and Biostatistics University

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

37

Example 3. LBW

/* mimicking cumulative logit with binary models */

proc logistic order=formatted;
  model vlbw = smoking late_no_pnc;
run;

vlbw v. mlbw and normal

proc logistic order=formatted;
  model lbw = smoking late_no_pnc;
run;

vlbw and mlbw v. normal

Both models include all observations in the sample
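The two binary outcomes above can be derived from the three-level variable, which is why neither model drops any observations. The sketch below is only illustrative: the numeric coding (0 = normal, 1 = mlbw, 2 = vlbw) is an assumption inferred from the later `where` statements, and the function name is hypothetical.

```python
# Assumed coding (not stated on the slide): 0 = normal, 1 = mlbw, 2 = vlbw.
def cumulative_indicators(bwcat):
    """Return (vlbw, lbw) 0/1 indicators for one birth."""
    vlbw = 1 if bwcat == 2 else 0   # vlbw v. mlbw and normal
    lbw = 1 if bwcat >= 1 else 0    # vlbw and mlbw v. normal
    return vlbw, lbw

births = [0, 1, 2, 0, 1]            # made-up records for illustration
indicators = [cumulative_indicators(b) for b in births]
print(indicators)
```

Every birth receives a value on both indicators, so each binary model is fit to the full sample.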


Example 3. LBW

/* generalized logit */

proc logistic order=formatted;

model bwcat(ref='normal bw') = smoking late_no_pnc

/ link=glogit;

run;


Example 3. LBW

vlbw v. normal and mlbw v. normal


Example 3. LBW

/* mimicking generalized logit with binary models*/

proc logistic order=formatted;
  where bwcat = 2 or bwcat = 0;
  model bwcat(ref='normal bw') = smoking late_no_pnc / link=glogit;
run;

proc logistic order=formatted;
  where bwcat = 1 or bwcat = 0;
  model bwcat(ref='normal bw') = smoking late_no_pnc / link=glogit;
run;


Example 3. LBW

Generalized logit approach using binary models with only a subset of observations in each model

vlbw v. normal

mlbw v. normal


Example 3. LBW

Generalized logit models can get complicated, but custom estimates can still be obtained in the usual way.

proc logistic order=formatted;
  where 2<=momage<=3;
  class parityrisk(ref='no hx preterm') / param=ref;
  model bwcat = smoking late_no_pnc matrisk momage parityrisk smoking*parityrisk / link=glogit;
  contrast 'sm-risk, hxpreterm' smoking 1 matrisk 1 smoking*parityrisk 1 0 / estimate=exp;
  contrast 'sm-risk, primips' smoking 1 matrisk 1 smoking*parityrisk 0 1 / estimate=exp;
  contrast 'sm-risk, lorisk multips' smoking 1 matrisk 1 smoking*parityrisk 0 0 / estimate=exp;
run;


Example 3. LBW

The tests for the constructs in the model are all statistically significant:


Example 3. LBW

Not all beta coefficients are statistically significant.


Example 3. LBW

Parity-specific contrasts of the joint effect of smoking and having some antepartum medical risk, adjusting for entry into prenatal care and maternal age.

Should we leave the smoking*parityrisk term in the model?


Example 4. Prenatal Care

Should we consider the categories ordinal or nominal?

Table of prevlbw by indexsum (two factor summary index)

Frequency (Row Pct)

prevlbw               No Pnc          Inadeq           Inter            Adeq              Adeq+           Total
prev lbw              736.34 (3.71)   3097.6 (15.62)   2363.3 (11.91)   5274.7 (26.59)    8364 (42.17)    19836
no hx lbw or primip   3315.8 (1.18)   19576 (6.98)     33170 (11.83)    138719 (49.46)    85667 (30.55)   280448


Example 4. Prenatal Care

The overlapping dichotomous contrasts:

• No Pnc v. Any PNC, OR = 3.2
• Inad/No v. Adeq+/Adeq/Inter, OR = 2.7
• Inter/Inad/No v. Adeq+/Adeq, OR = 1.8
• All others v. Adeq+, OR = 0.60
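The four overlapping odds ratios can be reproduced directly from the weighted frequencies in the prevlbw by indexsum table. The following is a minimal sketch; the function and variable names are illustrative and not part of the original analysis.

```python
# Weighted frequencies from the prevlbw by indexsum table, ordered
# No Pnc, Inadeq, Inter, Adeq, Adeq+.
prev_lbw = [736.34, 3097.6, 2363.3, 5274.7, 8364]
no_hx = [3315.8, 19576, 33170, 138719, 85667]

def cumulative_odds_ratios(exposed, unexposed):
    """Odds ratio at each split: categories up to the cut v. categories above it."""
    ratios = []
    for cut in range(1, len(exposed)):
        odds_exposed = sum(exposed[:cut]) / sum(exposed[cut:])
        odds_unexposed = sum(unexposed[:cut]) / sum(unexposed[cut:])
        ratios.append(odds_exposed / odds_unexposed)
    return ratios

overlapping = cumulative_odds_ratios(prev_lbw, no_hx)
print([round(r, 2) for r in overlapping])
```

Each split uses every observation, with categories collapsed on either side of the cut. The odds ratios fall from above 3 to below 1 as the cut moves, which is the pattern a single proportional-odds estimate cannot represent.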



Example 4. Prenatal Care

Non-overlapping dichotomous contrasts:

prevlbw by indexsum (two factor summary index), each category v. Adeq

Frequency (Row Pct)

prevlbw               Inadeq           Adeq
prev lbw              3097.6 (15.62)   5274.7 (26.59)
no hx lbw or primip   19576 (6.98)     138719 (49.46)

prevlbw               No Pnc           Adeq
prev lbw              736.34 (3.71)    5274.7 (26.59)
no hx lbw or primip   3315.8 (1.18)    138719 (49.46)

prevlbw               Inter            Adeq
prev lbw              2363.3 (11.91)   5274.7 (26.59)
no hx lbw or primip   33170 (11.83)    138719 (49.46)

prevlbw               Adeq+            Adeq
prev lbw              8364 (42.17)     5274.7 (26.59)
no hx lbw or primip   85667 (30.55)    138719 (49.46)
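Each non-overlapping contrast is an ordinary 2x2 odds ratio that uses only the comparison category and the Adeq reference column. The sketch below computes them from the weighted frequencies; the dictionary layout and names are assumptions made for illustration.

```python
categories = ["No Pnc", "Inadeq", "Inter", "Adeq", "Adeq+"]
prev_lbw = dict(zip(categories, [736.34, 3097.6, 2363.3, 5274.7, 8364]))
no_hx = dict(zip(categories, [3315.8, 19576, 33170, 138719, 85667]))

reference = "Adeq"

# Odds of category c v. the reference, among prev lbw, divided by the
# same odds among women with no history of lbw (or primips).
reference_or = {
    c: (prev_lbw[c] / prev_lbw[reference]) / (no_hx[c] / no_hx[reference])
    for c in categories
    if c != reference
}

for c, ratio in reference_or.items():
    print(f"{c} v. {reference}: OR = {ratio:.2f}")
```

With a single binary predictor these crude ratios coincide with the odds ratio estimates reported in the generalized logit output for this example.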


Example 4. Prenatal Care

Cumulative Logit: The null hypothesis of proportional odds is rejected.

Any association is obscured by averaging across levels of APNCU.

Score Test for the Proportional Odds Assumption

Chi-Square   DF   Pr > ChiSq
7014.0733    3    <.0001

Parameter                DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept No PNC         1    -4.2917    0.1749           601.9645          <.0001
Intercept Inadequate     1    -2.3257    0.0701           1101.3880         <.0001
Intercept Intermediate   1    -1.3409    0.0495           732.9840          <.0001
Intercept Adequate       1    0.7857     0.0423           345.3622          <.0001
prevlbw                  1    -0.00326   0.1698           0.0004            0.9847

Odds Ratio Estimates

Effect    Point Estimate   95% Wald Confidence Limits
prevlbw   0.997            0.715   1.390
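As a worked check on this output, the odds ratio and its Wald limits follow from the prevlbw estimate and standard error, using the usual normal quantile of 1.96. The degrees of freedom of the score test equal the number of predictors times the number of outcome categories minus two, here 1 × (5 − 2) = 3.

```latex
\[
\widehat{OR} = e^{-0.00326} \approx 0.997,
\qquad
95\%\ \text{CI} = e^{-0.00326 \,\pm\, 1.96\,(0.1698)} \approx (0.715,\ 1.390)
\]
```

An interval this close to symmetric around 1 on the log scale is what one would expect when odds ratios above and below 1 at different splits are averaged into a single estimate.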


Example 4. Prenatal Care

Generalized Logit

Parameter   indexsum       DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   No PNC         1    -3.7338    0.2019           342.1621          <.0001
Intercept   Inadequate     1    -1.9581    0.0842           541.3514          <.0001
Intercept   Intermediate   1    -1.4308    0.0670           455.9302          <.0001
Intercept   adequate+      1    -0.4820    0.0459           110.4236          <.0001
prevlbw     No PNC         1    1.7648     0.4114           18.4034           <.0001
prevlbw     Inadequate     1    1.4258     0.2606           29.9399           <.0001
prevlbw     Intermediate   1    0.6280     0.2691           5.4441            0.0196
prevlbw     adequate+      1    0.9430     0.1861           25.6809           <.0001

Odds Ratio Estimates

Effect    indexsum       Point Estimate   95% Wald Confidence Limits
prevlbw   No PNC         5.840            2.608   13.080
prevlbw   Inadequate     4.161            2.497   6.935
prevlbw   Intermediate   1.874            1.106   3.175
prevlbw   adequate+      2.568            1.783   3.698
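The odds ratio table is the parameter table on the exponential scale. A minimal sketch of that conversion follows, assuming a normal quantile of 1.96 for the 95% Wald limits; the function name and data layout are illustrative.

```python
import math

# (estimate, standard error) for prevlbw, each category v. adequate care,
# copied from the generalized logit output.
estimates = {
    "No PNC": (1.7648, 0.4114),
    "Inadequate": (1.4258, 0.2606),
    "Intermediate": (0.6280, 0.2691),
    "adequate+": (0.9430, 0.1861),
}

Z_95 = 1.96  # normal quantile for a two-sided 95% Wald interval

def wald_odds_ratio(estimate, std_error, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) on the odds scale."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

results = {name: wald_odds_ratio(b, se) for name, (b, se) in estimates.items()}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: OR = {odds_ratio:.3f} ({lower:.3f}, {upper:.3f})")
```

Unlike the cumulative model, each category gets its own slope, so the four odds ratios are free to differ in size.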


Example 4. Prenatal Care

Women with a prior lbw delivery had more than 4 times the odds of receiving no or inadequate prenatal care rather than adequate care compared to women with no history of lbw delivery.

Compared to women without a history of lbw delivery, however, these high risk women also had more than twice the odds of appropriately receiving care beyond what is considered adequate for most women.


Example 5.

Outcome is a 3 level rating of MCH epidemiology functioning:

• above average
• average
• below average

Cumulative Logit Model for the Associations Between Key Features Across Domains and Higher Levels of MCH Epidemiology Functioning

Key Feature                                                                 Odds Ratio   95% CI     P
Organizational Position*                                                    2.0          0.8-4.8    0.14
Agenda-Setting by Consensus                                                 6.1          1.1-34.3   0.04
Agenda-Setting by Consensus Including External Partners                     6.6          1.3-33.2   0.02
Total Key Staff with Doctoral Training                                      2.5          1.3-5.0    0.01
Additional Staff: Assignees, Fellows, or Interns                            6.4          1.3-32.1   0.03
Routine Data Sharing (internal and external) & Data Integration Occurring   4.0          0.9-18.3   0.07

* Organizational position is a three level ordinal variable: named MCH epidemiology unit; no named unit, but recognized presence; and no or diffuse effort.


Summary: Ordinal and Nominal Outcomes

Cumulative (Ordinal)

• Proportional odds assumption: assess the series of binary comparisons from collapsing categories
• k-1 intercepts
• 1 slope / 1 odds ratio

Generalized (Nominal)

• No assumption about the shape of the association
• Categories compared to a reference group
• k-1 intercepts
• k-1 slopes / k-1 odds ratios
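For a single predictor x and an outcome with k categories, the two parameter counts above correspond to the model forms sketched below. The symbols are notational assumptions, with the sign convention chosen to match the "intercept plus slope" arithmetic in Example 2.

```latex
\[
\text{Cumulative logit:}\quad
\log\frac{P(Y \le j)}{P(Y > j)} = \alpha_j + \beta x,
\qquad j = 1, \dots, k-1
\]
\[
\text{Generalized logit:}\quad
\log\frac{P(Y = j)}{P(Y = \text{ref})} = \alpha_j + \beta_j x,
\qquad j = 1, \dots, k-1
\]
```

The only structural difference is the subscript on the slope: one shared β in the cumulative model, a separate β_j per non-reference category in the generalized model.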


Summary: Ordinal and Nominal Outcomes

Issues for categorizing an outcome variable are similar to those for defining categories for independent variables:

• Conceptual meaning of the categories
• Statistical tests v. judgment about differences between categories
• Sample size and power


Summary: Ordinal and Nominal Outcomes

Model Building

Similar to beginning with examining dummy variables for an independent variable prior to deciding whether to use it in an ordinal form, sometimes it is useful to run a generalized logit model first, since it requires no assumption about the ordering of the categories, and empirically assess whether the variation in category-specific odds ratios is important or negligible.


Summary: Ordinal and Nominal Outcomes

And even if the proportional odds assumption holds, reporting separate odds ratios for each category—using generalized logit—may be important in order to emphasize the similarity of the strength of the association across categories.

In addition, the cumulative logit model will not only force the strength of association to be uniform, the predicted values will also be forced to be linear. Using generalized logit, the predicted odds and odds ratios will both more closely reflect the observed values.


Summary: Ordinal and Nominal Outcomes

Why Not Just Always Run Stratified Models for Generalized Logit?

For nominal outcomes, using a single model may be more efficient than using separate binary models

With separate binary models, need to decide whether each model should include the same independent variables or whether different final, category-specific models make sense, each including only those variables which are risk or protective factors for a particular binary comparison


Summary: Ordinal and Nominal Outcomes

Using a single multinomial model permits a unified profile of risk and protective factors across the categories—both significant and insignificant


Summary: Ordinal and Nominal Outcomes

For a variable that is actually continuous, are there reasons to use a cumulative logit model instead of a continuous outcome model?

For example, when would modeling ordinal categories of birthweight be preferable either to modeling birthweight continuously in grams or categorized into nominal groups?

Using a variable as ordinal (with fewer categories) as opposed to continuous will yield odds ratios instead of mean differences.

No assumption of normality is required.


Summary: Ordinal and Nominal Outcomes

For a variable that meets the proportional odds assumption, is it still appropriate to choose to use a generalized logit approach?

Using ordinal as opposed to nominal categories will be more efficient if there is truly an ordinal effect.

Why "waste" degrees of freedom on multiple odds ratios, if the effect is constant across categories?


Which Modeling Approach?

Choosing the form of the outcome variable:

Stressful Life Events

• Any stressful life event (y/n) = independent vars (dichotomous)

• Fin. Emot. Traum. Partner = independent vars (Nominal - No stressful life events as the reference)

• Sum of stressful life events = independent vars (continuous)

• Scale of stressful life events = independent vars (ordinal)


Which Modeling Approach?

Choosing the form of the outcome variable:

Maternal Depression

• Any depression (y/n) = independent vars

• Pre&Post Pre_Only PP_Only = independent vars (Nominal - No depression as the reference)

• Severe Moderate Mild = independent vars (Ordinal or Nominal)

• Depression Severity Scale = independent vars (ordinal)


Which Modeling Approach?

Choosing the form of the outcome variable:

Breastfeeding

• Ever Breastfed (yes v. no) = independent vars

• Exclusive BF>=2 mos. (yes v. no) = independent vars

• Exclusive >=2 mo. Exclusive BF<=2 mo. = independent vars (Never Breastfed as reference)

• BF<2 mo. BF 2-6 mo. BF > 6 mo. = independent vars (Never Breastfed as reference)

• Breastfeeding duration in weeks = independent vars
