
Upload: joseph-reeves

Post on 13-Jan-2016


A. Analysis of count data

Introduction to log-linear models

Log-linear analysis

• Contingency-table analysis
• Categorical data analysis
• Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
• Analysis of cross-classified data
• Multivariate analysis of qualitative data (Goodman, 1978)
• Count data analysis

Contrast coding

Log-linear models for two-way tables

Saturated log-linear model:

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$\mu$: overall effect (level)
$\mu_i^A$, $\mu_j^B$: main effects (marginal frequencies)
$\mu_{ij}^{AB}$: interaction effect

In the case of a 2 x 2 table: 4 observations but 9 parameters, so normalisation constraints are needed.

Survey: leaving parental home in the Netherlands

The survey (Sept. 1987 - Febr. 1988):
• Sample of 583 young adults born in 1961
• 530 left home before the survey
• 53 censored cases

Number leaving parental home, by age and sex, 1961 birth cohort

Age         Female    Male   Total
<20            135      74     209
>=20           143     178     321
Total          278     252     530
Censored        13      40      53
Total          291     292     583

Descriptive statistics (example: leaving home)

• Counts
• Percentages
• Odds of leaving home early rather than late

Percentages

            Sex (column %)              Sex (row %)
Age         Female    Male   Total      Female    Male   Total
<20           48.6    29.4    39.4        64.6    35.4   100.0
>=20          51.4    70.6    60.6        44.5    55.5   100.0
Total        100.0   100.0   100.0        52.5    47.5   100.0

Odds of leaving home early (<20) rather than late (>=20)

            Female    Male    Total
Odds        0.9441  0.4157   0.6511

Odds ratio (reference category: males): 2.271
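The percentages and odds above follow directly from the observed counts. A minimal sketch in Python, assuming only the 2 x 2 table of counts from the survey; the dictionary layout and variable names are illustrative and not part of the original slides:

```python
# Observed counts from the leaving-home survey (rows: age group, columns: sex).
counts = {
    "<20":  {"Female": 135, "Male": 74},
    ">=20": {"Female": 143, "Male": 178},
}

sexes = ["Female", "Male"]
col_totals = {s: sum(counts[a][s] for a in counts) for s in sexes}

# Column percentages: share of each sex that left home in each age group.
col_pct = {a: {s: 100 * counts[a][s] / col_totals[s] for s in sexes} for a in counts}

# Odds of leaving early (<20) rather than late (>=20), per sex.
odds = {s: counts["<20"][s] / counts[">=20"][s] for s in sexes}

# Odds ratio with males as the reference category.
odds_ratio = odds["Female"] / odds["Male"]

print(round(col_pct["<20"]["Female"], 1))
print(round(odds["Female"], 4), round(odds["Male"], 4))
print(round(odds_ratio, 3))
```

The odds ratio computed this way is the quantity that reappears, on the log scale, as the interaction parameter of the saturated log-linear model.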

Log-linear models for two-way tables: 4 models

Model 1: Null model or overall effect model

All categories are equiprobable (an observation is equally likely to fall into any cell).

$$\ln \lambda_{ij} = \mu \quad \text{for all } i \text{ and } j$$

$\mu = 4.887$ (s.e. 0.0434)

$\lambda_{ij}$ is the expected count (frequency) in cell $(i,j)$: category $i$ of variable A (row) and category $j$ of variable B (column).

$\exp(4.887) = 132.5 = 530/4$

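$$\mathrm{Var}[\ln \lambda_{ij}] = \frac{1}{\lambda_{ij}}$$

where $\lambda_{ij}$ is a cell frequency generated by a Poisson process, and $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$ where $a$ is a constant (e.g. Fingleton, 1984, p. 29).

$$\mu = \frac{1}{4}\sum_{ij} \ln \lambda_{ij}$$

$$\mathrm{Var}[\mu] = \mathrm{Var}\Big[\frac{1}{4}\sum_{ij} \ln \lambda_{ij}\Big] = \Big(\frac{1}{4}\Big)^2 \sum_{ij} \mathrm{Var}[\ln \lambda_{ij}] = \Big(\frac{1}{4}\Big)^2 \sum_{ij} \frac{1}{\lambda_{ij}}$$

$$\mathrm{Var}[\mu] = \Big(\frac{1}{4}\Big)^2 \Big[\frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50}\Big] = \frac{1}{530}$$

The standard error of $\mu$ is therefore $\sqrt{1/530} = 0.0434$.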
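The variance calculation can be checked numerically. A minimal sketch, assuming the null-model expected count of 530/4 in every cell and the approximation Var[ln λ] ≈ 1/λ stated above; variable names are illustrative:

```python
import math

# Expected count in every cell under the null model (530 observations, 4 cells).
expected = [530 / 4] * 4

# Var[ln(lambda_ij)] is approximated by 1 / lambda_ij for a Poisson count.
var_log = [1 / lam for lam in expected]

# mu is the mean of the four log counts, so Var[mu] = (1/4)^2 * sum of variances.
var_mu = (1 / 4) ** 2 * sum(var_log)
se_mu = math.sqrt(var_mu)

# Overall effect: mean of the log expected counts.
mu = sum(math.log(lam) for lam in expected) / 4

print(round(mu, 3), round(se_mu, 4))
```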

Log-linear models for two-way tables

Model 2: B null model

Categories of variable B (sex) are equiprobable within levels of variable A (age).

$$\ln \lambda_{ij} = \mu + \mu_i^A \quad \text{for all } j$$

GLIM estimates:

Parameter                   Estimate   s.e.      Exp(parameter)
Overall effect ($\mu$)        4.649    0.06914   104.5
TIME(1) ($\mu_1^A$)           0.0000
TIME(2) ($\mu_2^A$)           0.4291   0.08886   1.536

Log-linear models for two-way tables

Model 3: A null model

Categories of variable A (age) are equiprobable within levels of variable B (sex).

$$\ln \lambda_{ij} = \mu + \mu_j^B \quad \text{for all } i$$

SPSS estimates:

Parameter                   Estimate   s.e.     Exp(parameter)
Overall effect ($\mu$)        5.773    0.0558   321.5
TIME(1) ($\mu_1^A$)          -0.4283   0.0888   0.6516
TIME(2) ($\mu_2^A$)           0.0000

Log-linear models for two-way tables

Model 4: independence model (unsaturated model)

Categories of variable B (sex) are not equiprobable, but the probability is independent of the levels of variable A (time).

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

$$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$$

GLIM estimates:

Parameter                   Estimate   s.e.     Exp(parameter)
Overall effect ($\mu$)        4.697    0.0806   109.62
TIME(2) ($\mu_2^A$)           0.429    0.0889   1.536
SEX(2) ($\mu_2^B$)           -0.098    0.0870   0.906

LOG-LINEAR MODEL: predictions

Females leaving home early: 109.62
Females leaving home late: 109.62 * 1.536 = 168.37
Males leaving home early: 109.62 * 0.906 = 99.37
Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63

SPSS estimates:

No.  Parameter                  Estimate   SE
1    Overall effect ($\mu$)       5.0280   .0721
2    Time(1) ($\mu_1^A$)          -.4291   .0889
3    Time(2) ($\mu_2^A$)           .0000   .
4    Sex(1) ($\mu_1^B$)            .0982   .0870
5    Sex(2) ($\mu_2^B$)            .0000   .

Log-linear models for two-way tables

Model 5: saturated model

The values of categories of variable B (sex) depend on the levels of variable A (time).

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

GLIM estimates:

Parameter                            Estimate   s.e.
Overall effect ($\mu$)                 4.905    0.08607
TIME(2) ($\mu_2^A$)                    0.05757  0.1200
SEX(2) ($\mu_2^B$)                    -0.6012   0.1446
TIME(2).SEX(2) ($\mu_{22}^{AB}$)       0.8201   0.1831
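Because the saturated model reproduces the observed counts exactly, its parameters under first-category-zero coding can be read off as differences of log counts. A minimal sketch, assuming the observed leaving-home table and GLIM-style coding (first category of each variable set to zero); the variable names are illustrative:

```python
import math

# Observed counts: n[i][j], i = time (0: <20, 1: >=20), j = sex (0: female, 1: male).
n = [[135, 74],
     [143, 178]]

log_n = [[math.log(x) for x in row] for row in n]

# Parameters of the saturated model with the first category as reference.
mu = log_n[0][0]                                                   # overall effect
mu_time2 = log_n[1][0] - log_n[0][0]                               # TIME(2)
mu_sex2 = log_n[0][1] - log_n[0][0]                                # SEX(2)
mu_inter = log_n[1][1] - log_n[1][0] - log_n[0][1] + log_n[0][0]   # TIME(2).SEX(2)

print(round(mu, 3), round(mu_time2, 5), round(mu_sex2, 4), round(mu_inter, 4))
```

The interaction term is the log of a cross-product ratio of the four cells, which is why its exponential equals the odds ratio of about 2.271 found in the descriptive statistics.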

SPSS estimates:

No.  Parameter                               Estimate   SE
1    Overall effect ($\mu$)                    5.1846   .0748
2    Time(1) ($\mu_1^A$)                       -.8738   .1379
3    Time(2) ($\mu_2^A$)                        .0000   .
4    Sex(1) ($\mu_1^B$)                        -.2183   .1121
5    Sex(2) ($\mu_2^B$)                         .0000   .
6    Time(1) * Sex(1) ($\mu_{11}^{AB}$)         .8164   .1827
7    Time(1) * Sex(2) ($\mu_{12}^{AB}$)         .0000   .
8    Time(2) * Sex(1) ($\mu_{21}^{AB}$)         .0000   .
9    Time(2) * Sex(2) ($\mu_{22}^{AB}$)         .0000   .

LOG-LINEAR MODEL: predictions

Expected frequencies

Cell             Observed   Model 1   Model 2   Model 3   Model 4   Model 5
Fem_<20   F11       135      132.50    104.50    139.00    109.63    135.00
Mal_<20   F12        74      132.50    104.50    126.00     99.37     74.00
Fem_>20   F21       143      132.50    160.50    139.00    168.37    143.00
Mal_>20   F22       178      132.50    160.50    126.00    152.63    178.00
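For a 2 x 2 table, the expected frequencies of all five models have closed forms in terms of the marginal totals. A minimal sketch, assuming the observed leaving-home counts; the helper `table` and the model labels are illustrative names, not part of the original slides:

```python
# Observed counts: rows = age (<20, >=20), columns = sex (female, male).
obs = [[135, 74],
       [143, 178]]

n = sum(sum(row) for row in obs)
row_tot = [sum(row) for row in obs]
col_tot = [sum(obs[i][j] for i in range(2)) for j in range(2)]

def table(cell):
    """Build a 2 x 2 table of expected counts from a cell formula."""
    return [[cell(i, j) for j in range(2)] for i in range(2)]

models = {
    "Model 1 (null)":         table(lambda i, j: n / 4),
    "Model 2 (age only)":     table(lambda i, j: row_tot[i] / 2),
    "Model 3 (sex only)":     table(lambda i, j: col_tot[j] / 2),
    "Model 4 (independence)": table(lambda i, j: row_tot[i] * col_tot[j] / n),
    "Model 5 (saturated)":    table(lambda i, j: float(obs[i][j])),
}

for name, fitted in models.items():
    print(name, [[round(x, 2) for x in row] for row in fitted])
```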

Relation between the log-linear model and the Poisson regression model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2j} + \beta_3 x_{3ij}$$

where $x_{1i}$, $x_{2j}$ and $x_{3ij}$ are dummy variables (0 if $i$ or $j$ is equal to 1, and 1 if $i$ or $j$ is equal to 2), and the interaction variable is $x_{3ij} = x_{1i} \cdot x_{2j}$.
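The dummy-variable form can be illustrated with the GLIM estimates of the saturated model for the leaving-home data. A minimal sketch, assuming those four estimates are used as regression coefficients; the names `beta0` to `beta3` and `predicted_count` are illustrative:

```python
import math

# GLIM estimates of the saturated model for the leaving-home table,
# read as Poisson regression coefficients (names beta0..beta3 are illustrative).
beta0, beta1, beta2, beta3 = 4.905, 0.05757, -0.6012, 0.8201

def predicted_count(i, j):
    """Expected count for time category i and sex category j (each 1 or 2)."""
    x1 = 0 if i == 1 else 1      # time dummy
    x2 = 0 if j == 1 else 1      # sex dummy
    x3 = x1 * x2                 # interaction dummy
    return math.exp(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3)

for i in (1, 2):
    for j in (1, 2):
        print(i, j, round(predicted_count(i, j)))
```

Rounded to whole numbers, the four predictions coincide with the observed counts, as expected for a saturated model.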

Observed

Age        F        M        Total
< 20       135      74       209
>= 20      143      178      321
Total      278      252      530

Model 1: Null Model

$$\ln \lambda_{ij} = \mu$$

Age        F        M        Total
< 20       132.5    132.5    265
>= 20      132.5    132.5    265
Total      265      265      530

Model 2: B Null Model (sex equiprobable)

$$\ln \lambda_{ij} = \mu + \mu_i^A$$

Age        F        M        Total
< 20       104.5    104.5    209
>= 20      160.5    160.5    321
Total      265      265      530

Model 3: A Null Model (age equiprobable)

$$\ln \lambda_{ij} = \mu + \mu_j^B$$

Age        F        M        Total
< 20       139      126      265
>= 20      139      126      265
Total      278      252      530

Model 4: Independence Model (no interaction)

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

Age        F        M        Total
< 20       109.63   99.37    209
>= 20      168.37   152.63   321
Total      278      252      530

Model 5: Saturated Model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

Age        F        M        Total
< 20       135      74       209
>= 20      143      178      321
Total      278      252      530

Log-linear model: fit a model to a table of frequencies

Data: survey of political attitudes of British electors

OBSERVED FREQUENCIES FOR VOTE by gender

                Gender
Party           Male   Female   Total
Conservative     279      352     631
Labour           335      291     626
Total            614      643    1257

Source: Payne, C. (1977) The log-linear model for contingency tables. In: C.A. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)

The classical approach

Geometric means (Birch, 1963)

Effect coding (mean is ref. cat.)

Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233

The basic model (example: political attitudes)

Logarithm of frequencies

                Gender
Party           Male      Female    Total
Conservative    5.6312    5.8636    11.4948
Labour          5.8141    5.6733    11.4875
Total          11.4453   11.5370    22.9823

Overall effect: 22.9823/4 = 5.7456

Effect of party:
  Conservative: 11.4948/2 - 5.7456 = 0.0018
  Labour:       11.4875/2 - 5.7456 = -0.0018

Effect of gender:
  Male:   11.4453/2 - 5.7456 = -0.0229
  Female: 11.5370/2 - 5.7456 = 0.0229

Gender-Party interaction effects:
  Male conservative:   5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
  Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
  Male labour:         5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
  Female labour:       5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
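The classical (geometric-mean) calculation can be reproduced in a few lines. A minimal sketch, assuming the observed vote-by-gender table; the variable names `party`, `gender` and `inter` are illustrative:

```python
import math

# Observed votes: rows = party (Conservative, Labour), columns = gender (male, female).
votes = [[279, 352],
         [335, 291]]

logs = [[math.log(x) for x in row] for row in votes]

# Overall effect: mean of the four log frequencies (log of the geometric mean).
mu = sum(sum(row) for row in logs) / 4

# Main effects: row or column mean of the logs minus the overall effect.
party = [sum(logs[i]) / 2 - mu for i in range(2)]
gender = [(logs[0][j] + logs[1][j]) / 2 - mu for j in range(2)]

# Interaction effects: what is left of each log frequency after the main effects.
inter = [[logs[i][j] - mu - party[i] - gender[j] for j in range(2)] for i in range(2)]

print(round(mu, 4), [round(p, 4) for p in party], [round(g, 4) for g in gender])
print([[round(x, 4) for x in row] for row in inter])
```

By construction each set of effects sums to zero over its categories, which is the normalisation that effect coding imposes.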

The basic model (effect coding: mean)

Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233

Parameter                                        Estimate
Main effect ($\mu$)                                5.7456
Party effect ($\mu_i^A$)
  Conservative                                     0.0018
  Labour                                          -0.0018
Gender effect ($\mu_j^B$)
  Male                                            -0.0229
  Female                                           0.0229
Gender-Party interaction ($\mu_{ij}^{AB}$)
  Male conservative                               -0.0933
  Female conservative                              0.0933
  Male labour                                      0.0933
  Female labour                                   -0.0933

Coding: effect coding

Parameters are subject to constraints (normalisation constraints):

$$\sum_i \mu_i^A = 0, \qquad \sum_j \mu_j^B = 0, \qquad \sum_i \mu_{ij}^{AB} = \sum_j \mu_{ij}^{AB} = 0$$

Only first-order contrasts can be estimated: $\mu_1^A - \mu_2^A$

The basic model (GLIM)

Parameter                                        Estimate   S.E.
Main effect ($\mu$)                                5.6312   0.0599
Party effect ($\mu_i^A$)
  Conservative                                     0.0000   .
  Labour                                           0.1829   0.0811
Gender effect ($\mu_j^B$)
  Male                                             0.0000   .
  Female                                           0.2324   0.0802
Gender-Party interaction ($\mu_{ij}^{AB}$)
  Male conservative                                0.0000   .
  Female conservative                              0.0000   .
  Male labour                                      0.0000   .
  Female labour                                   -0.3732   0.1133

The basic model (SPSS)

                                                        Asymptotic 95% CI
Parameter                         Estimate   SE        Lower    Upper
Main effect                         5.6750   0.0586     5.56     5.79
Party effect
  Conservative                      0.1900   0.0792     0.03     0.35
  Labour                            0.0000   .          .        .
Gender effect
  Male                              0.1406   0.0801    -0.02     0.30
  Female                            0.0000   .          .        .
Gender-Party interaction
  Male conservative                -0.3726   0.1133    -0.59    -0.15
  Female conservative               0.0000   .          .        .
  Male labour                       0.0000   .          .        .
  Female labour                     0.0000   .          .        .

The basic model (1)

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
$\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
$\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
$\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734

Logarithm of frequencies

                Gender
Party           Male      Female    Total
Conservative    5.6312    5.8636    11.4948
Labour          5.8141    5.6733    11.4875
Total          11.4453   11.5370    22.9823

$$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$$


The design-matrix approach

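I. Design matrix: effect coding

Unsaturated log-linear model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_2^A \\ u_1^B \\ u_2^B \end{pmatrix}$$

$$Y = X\mu \quad \Rightarrow \quad \mu = (X'X)^{-1} X'Y$$

• The number of parameters exceeds the number of equations: additional equations are needed.
• $X'X$ is singular, so $(X'X)^{-1}$ does not exist: identify the linear dependencies.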

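I. Design matrix

Unsaturated log-linear model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

Additional equations (coding!):

$$\mu_2^A = -\mu_1^A, \qquad \mu_2^B = -\mu_1^B$$

$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}$$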

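3 unknowns, 3 equations:

$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}$$

$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix}$$

where $\lambda_{ij}$ is the frequency predicted by the model $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$.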

$$\lambda_{11} = \frac{F_{1+} F_{+1}}{F_{++}} = \frac{631 \times 614}{1257} = 308.22$$

$$\lambda_{12} = \frac{F_{1+} F_{+2}}{F_{++}} = \frac{631 \times 643}{1257} = 322.78$$

$$\lambda_{21} = \frac{F_{2+} F_{+1}}{F_{++}} = \frac{626 \times 614}{1257} = 305.78$$

$$\lambda_{22} = \frac{F_{2+} F_{+2}}{F_{++}} = \frac{626 \times 643}{1257} = 320.22$$

OBSERVED FREQUENCIES FOR VOTE by gender

                Gender
Party           Male      Female    Total
Conservative     279       352       631
Labour           335       291       626
Total            614       643      1257

PREDICTED FREQUENCIES FOR VOTE by gender

                Gender
Party           Male      Female    Total
Conservative    308.22    322.78     631.00
Labour          305.78    320.22     626.00
Total           614.00    643.00    1257.00

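$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln 308.22 \\ \ln 322.78 \\ \ln 305.78 \end{pmatrix}$$

$$= \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} 5.7308 \\ 5.7770 \\ 5.7229 \end{pmatrix} = \begin{pmatrix} 5.74995 \\ 0.00395 \\ -0.02310 \end{pmatrix}$$

$$\begin{pmatrix} \tau \\ \tau_1^A \\ \tau_1^B \end{pmatrix} = \begin{pmatrix} \exp[u] \\ \exp[u_1^A] \\ \exp[u_1^B] \end{pmatrix} = \begin{pmatrix} 314.17 \\ 1.0040 \\ 0.9772 \end{pmatrix}$$

$\lambda_{11} = \tau \, \tau_1^A \, \tau_1^B$ = 314.17 * 1.0040 * 0.9772 = 308.23

$\lambda_{21} = \tau \, [1/\tau_1^A] \, \tau_1^B$ = 314.17 * [1/1.0040] * 0.9772 = 305.78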
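The effect-coded parameters of the unsaturated model can be recomputed from the margins of the vote-by-gender table. A minimal sketch, assuming the independence-model expected counts and the 3 x 3 inverse matrix shown above (written out row by row rather than as a matrix product); all variable names are illustrative:

```python
import math

# Observed votes: rows = party (Conservative, Labour), columns = gender (male, female).
votes = [[279, 352],
         [335, 291]]
total = sum(map(sum, votes))
row = [sum(r) for r in votes]
col = [votes[0][j] + votes[1][j] for j in range(2)]

# Expected counts under independence: row total * column total / grand total.
lam = [[row[i] * col[j] / total for j in range(2)] for i in range(2)]

# Rows of the inverse of the effect-coded design matrix for cells 11, 12 and 21.
y = [math.log(lam[0][0]), math.log(lam[0][1]), math.log(lam[1][0])]
u = 0.5 * (y[1] + y[2])        # overall effect
u_a1 = 0.5 * (y[0] - y[2])     # party effect, Conservative
u_b1 = 0.5 * (y[0] - y[1])     # gender effect, male

tau, tau_a1, tau_b1 = math.exp(u), math.exp(u_a1), math.exp(u_b1)

lam11 = tau * tau_a1 * tau_b1          # male Conservative
lam21 = tau * (1 / tau_a1) * tau_b1    # male Labour
print(round(u, 4), round(u_a1, 4), round(u_b1, 4))
print(round(lam11, 2), round(lam21, 2))
```

Small differences in the last digit relative to the slide values arise because the slide works from log frequencies rounded to four decimals.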

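Design matrix

Saturated log-linear model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

Additional equations:

$$\mu_2^A = -\mu_1^A, \qquad \mu_2^B = -\mu_1^B$$

$$\mu_{12}^{AB} = -\mu_{11}^{AB}, \qquad \mu_{21}^{AB} = -\mu_{11}^{AB}, \qquad \mu_{22}^{AB} = \mu_{11}^{AB}$$

$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}$$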

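$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix}$$

$$= \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & -0.25 & -0.25 \\ 0.25 & -0.25 & 0.25 & -0.25 \\ 0.25 & -0.25 & -0.25 & 0.25 \end{pmatrix} \begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix} = \begin{pmatrix} 5.74555 \\ 0.00185 \\ -0.02290 \\ -0.09330 \end{pmatrix}$$

$\lambda_{11} = \exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]$ = exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279

$\lambda_{21} = \exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]$ = exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335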
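The matrix inversion for the saturated model needs no linear-algebra library, because the columns of the effect-coded design matrix are mutually orthogonal. A minimal sketch, assuming the 4 x 4 design matrix and the observed votes; the names `X`, `y`, `u` and `fitted` are illustrative:

```python
import math

# Effect-coded design matrix of the saturated model; rows are cells 11, 12, 21, 22,
# columns are u, u_1^A, u_1^B, u_11^AB.
X = [[1,  1,  1,  1],
     [1,  1, -1, -1],
     [1, -1,  1, -1],
     [1, -1, -1,  1]]

# Log observed frequencies in the same cell order.
y = [math.log(f) for f in (279, 352, 335, 291)]

# The columns of X are orthogonal with squared length 4, so the inverse is X' / 4.
u = [sum(X[r][c] * y[r] for r in range(4)) / 4 for c in range(4)]

# Multiplying back through X and exponentiating recovers the observed counts.
fitted = [math.exp(sum(X[r][c] * u[c] for c in range(4))) for r in range(4)]

print([round(v, 5) for v in u])
print([round(f, 2) for f in fitted])
```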

LOG-LINEAR MODEL: expected frequencies

Type of model                   Overall   Party     Gender    Unsatur.  Saturated
Cell               Observed     Model 1   Model 2   Model 3   Model 4   Model 5
Mal_Cons     F11      279       314.25    315.50    307.00    308.22    279.00
Fem_Cons     F12      352       314.25    315.50    321.50    322.78    352.00
Mal_Labour   F21      335       314.25    313.00    307.00    305.78    335.00
Fem_Labour   F22      291       314.25    313.00    321.50    320.22    291.00
---------------------------------------------------------------------------------
Chi-square                       11.58     11.54     10.9      10.89      0
Degrees of freedom                3         2         2         1         0

LOG-LINEAR MODEL: Parameters (CONTRAST CODING: first category = 0)

A. Additive model

Type of model                     Overall   Party     Gender    Unsatur.  Satur.
Main effect                       5.7502    5.7542    5.7269    5.7308    5.6312
Gender effect                     0.0000    0.0000    0.0461    0.0462    0.2324
Party effect                      0.0000   -0.0080    0.0000   -0.0080    0.1829
Gender-Party interaction effect   0.0000    0.0000    0.0000    0.0000   -0.3732

B. Multiplicative model [exp(u)]

Type of model                     Overall    Party      Gender     Unsatur.   Satur.
Main effect                       314.2500   315.5001   307.0007   308.2157   278.9967
Gender effect                       1.0000     1.0000     1.0472     1.0472     1.2616
Party effect                        1.0000     0.9920     1.0000     0.9920     1.2007
Gender-Party interaction effect     1.0000     1.0000     1.0000     1.0000     0.6885
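The chi-square row can be approximated from the expected frequencies in the table. The slides do not say which statistic is reported; the sketch below assumes it is the likelihood-ratio chi-square G² = 2 Σ O ln(O/E), which comes close to the tabulated values. The function name `g_squared` and the dictionary keys are illustrative:

```python
import math

observed = [279, 352, 335, 291]   # Mal_Cons, Fem_Cons, Mal_Labour, Fem_Labour

# Expected frequencies copied from the table of expected frequencies.
expected = {
    "Model 1 (overall)":   [314.25, 314.25, 314.25, 314.25],
    "Model 2 (party)":     [315.50, 315.50, 313.00, 313.00],
    "Model 3 (gender)":    [307.00, 321.50, 307.00, 321.50],
    "Model 4 (unsatur.)":  [308.22, 322.78, 305.78, 320.22],
    "Model 5 (saturated)": [279.00, 352.00, 335.00, 291.00],
}

def g_squared(obs, fit):
    """Likelihood-ratio chi-square: 2 * sum(O * ln(O / E))."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, fit))

g2 = {name: g_squared(observed, fit) for name, fit in expected.items()}

for name, value in g2.items():
    print(name, round(value, 2))
```

A smaller statistic relative to its degrees of freedom indicates a better fit; the saturated model fits perfectly by construction.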

Other ways of restricting

II. Design matrix: contrast coding

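III. Design matrix: other restrictions on parameters

Saturated log-linear model

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

Restrictions (SPSS):

$$\mu_2^A = 0, \qquad \mu_2^B = 0, \qquad \mu_{12}^{AB} = \mu_{21}^{AB} = \mu_{22}^{AB} = 0$$

$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}$$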


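$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} \ln 279 \\ \ln 352 \\ \ln 335 \\ \ln 291 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix}$$

$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 5.6750 \\ 0.1900 \\ 0.1406 \\ -0.3726 \end{pmatrix}$$

The SPSS estimates are computed after adding 0.5 to each observed value (for example, ln 291.5 = 5.6750), so they differ slightly from the result of applying the matrix to the log frequencies shown.

                                  Coding 2    Coding 1
Parameter                         (SPSS)      (Birch)
Main effect                        5.6750      5.7456
Party effect
  Conservative                     0.1900      0.0019
  Labour                           0.0000     -0.0019
Gender effect
  Male                             0.1406     -0.0229
  Female                           0.0000      0.0229
Gender-Party interaction
  Male conservative               -0.3726     -0.0933
  Female conservative              0.0000      0.0933
  Male labour                      0.0000      0.0933
  Female labour                    0.0000     -0.0933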
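The SPSS column can be reproduced by taking log ratios relative to the reference cell. A minimal sketch, assuming the last category of each variable is the reference and that 0.5 is added to every observed count, as the slides state for SPSS; variable names are illustrative:

```python
import math

# Observed votes with 0.5 added to every cell, as SPSS does before fitting.
f11, f12, f21, f22 = (x + 0.5 for x in (279, 352, 335, 291))

# Reference cell is 22 (female Labour); all other effects are log ratios to it.
u = math.log(f22)                            # main effect
u_a1 = math.log(f12 / f22)                   # party effect, Conservative
u_b1 = math.log(f21 / f22)                   # gender effect, male
u_ab11 = math.log(f11 * f22 / (f12 * f21))   # interaction, male Conservative

print(round(u, 4), round(u_a1, 4), round(u_b1, 4), round(u_ab11, 4))
```

Each line corresponds to one row of the inverse design matrix: the main effect picks out the reference cell, the main effects are differences from it, and the interaction is the log cross-product ratio.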

OBSERVED FREQUENCIES FOR VOTE BY SEX

                Sex
Party           Male   Female   Total
Conservative     279      352     631
Labour           335      291     626
Total            614      643    1257

Parameter estimates

                              Contrast coding     Contrast coding     Effect coding
                              (SPSS)              (GLIM)              (Birch)
Parameter                     mu       exp(mu)    mu       exp(mu)    mu       exp(mu)
Main effect                   5.6750   291.49     5.6312   279.00     5.7456   312.80
Party effect
  Conservative                0.1900   1.2092     0.0000   1.0000     0.0019   1.0019
  Labour                      0.0000   1.0000     0.1829   1.2007    -0.0019   0.9982
Gender effect
  Male                        0.1406   1.1510     0.0000   1.0000    -0.0229   0.9774
  Female                      0.0000   1.0000     0.2324   1.2616     0.0229   1.0232
Gender-Party interaction
  Male conservative          -0.3726   0.6889     0.0000   1.0000    -0.0933   0.9109
  Female conservative         0.0000   1.0000     0.0000   1.0000     0.0933   1.0978
  Male labour                 0.0000   1.0000     0.0000   1.0000     0.0933   1.0978
  Female labour               0.0000   1.0000    -0.3732   0.6885    -0.0933   0.9109

Parameter estimates and standard errors

                              Contrast coding     Contrast coding
                              (SPSS)              (GLIM)
Parameter                     Param    s.e.       Param    s.e.
Main effect                   5.6750   0.0586     5.6312   0.0599
Party effect
  Conservative                0.1900   0.0792     0.0000   .
  Labour                      0.0000   .          0.1829   0.0811
Gender effect
  Male                        0.1406   0.0801     0.0000   .
  Female                      0.0000   .          0.2324   0.0802
Gender-Party interaction
  Male conservative          -0.3726   0.1133     0.0000   .
  Female conservative         0.0000   .          0.0000   .
  Male labour                 0.0000   .          0.0000   .
  Female labour               0.0000   .         -0.3732   0.1133

Prediction of counts or frequencies

A. Effect coding
279 = 312.80 * 0.97736 * 1.00185 * 0.91092
352 = 312.80 * 1.02316 * 1.00185 * 1.09779
335 = 312.80 * 0.97736 * 0.99815 * 1.09779
291 = 312.80 * 1.02316 * 0.99815 * 0.91092

B. Contrast coding: GLIM
291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)
279 = 279 * 1 * 1 * 1 (males voting conservative = ref.cat)
352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
335 = 279 * 1 * 1.2007 * 1 (males voting labour)

C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
352.5 = 291.5 * 1 * 1.20925 * 1
291.5 = 291.5 * 1 * 1 * 1 (females voting labour = ref.cat)
335.5 = 291.5 * 1.15096 * 1 * 1