Page 1: Threads 2013

Definitions Statistics Computation Sociological Conclusions References

General-purpose tools for generalized linear mixed models

Ben Bolker

McMaster University, Mathematics & Statistics and Biology

13 September 2013

Ben Bolker

GLMMs

Page 2: Threads 2013

Outline

1 Definitions and context

2 Statistical challenges

3 Computational challenges

4 Sociological challenges

5 Conclusions

Page 4: Threads 2013

Generalized linear mixed models

GLMMs: a statistical modeling framework incorporating:

Linear combinations of categorical and continuous predictors, and interactions

Response distributions in the exponential family (binomial, Poisson, and extensions)

Any smooth, monotonic link function (e.g. logistic, exponential models)

Flexible combinations of blocking factors (clustering; random effects)

Applications in ecology, neurobiology, behaviour, epidemiology, real estate, ...

Page 8: Threads 2013

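Technical definition

\[
\underbrace{Y_i}_{\text{response}} \sim
\overbrace{\text{Distr}}^{\text{conditional distribution}}
\Big( \underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\;
\underbrace{\phi}_{\text{scale parameter}} \Big)
\]

\[
\underbrace{\eta}_{\text{linear predictor}} =
\underbrace{X\beta}_{\text{fixed effects}} +
\underbrace{Zb}_{\text{random effects}}
\]

\[
\underbrace{b}_{\text{conditional modes}} \sim
\text{MVN}\big( 0,\; \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}} \big)
\]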
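To make the definition concrete, here is a minimal simulation sketch of one special case: a Poisson response with a log link, one continuous predictor, and a single random intercept per group, so that Z is just a group indicator and Σ(θ) reduces to a scalar variance. The function names, parameter values, and the choice of Poisson/log are illustrative assumptions, not taken from the slides.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_glmm(n_groups, n_per_group, beta, sigma, seed=1):
    """Simulate Y_i ~ Poisson(exp(eta_i)) with eta = X beta + Z b.

    beta  : (intercept, slope), the fixed effects (hypothetical values)
    sigma : standard deviation of the random intercepts, b ~ N(0, sigma^2)
    """
    rng = random.Random(seed)
    b = [rng.gauss(0.0, sigma) for _ in range(n_groups)]  # random effects
    rows = []
    for g in range(n_groups):
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)           # continuous predictor
            eta = beta[0] + beta[1] * x + b[g]   # linear predictor
            mu = math.exp(eta)                   # inverse link g^{-1}(eta)
            rows.append((g, x, rpois(mu, rng)))  # conditional distribution
    return b, rows


b, rows = simulate_glmm(n_groups=5, n_per_group=4, beta=(0.5, 1.0), sigma=0.7)
```

Each row holds the group label, the predictor, and the simulated count; setting `sigma=0` collapses the model to an ordinary Poisson GLM.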

Page 10: Threads 2013

Estimation

Maximum likelihood estimation

\[
\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} =
\int \cdots \int
\underbrace{L(Y_i \mid \theta, \beta')}_{\text{data} \mid \text{random effects}}
\times
\underbrace{L(\beta' \mid \Sigma(\theta))}_{\text{random effects}}
\, d\beta'
\]

deterministic: precision vs. computational cost: penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) ...

Monte Carlo: frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)
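As a rough illustration of the precision/cost trade-off among the deterministic methods, the sketch below evaluates the integral above for a single cluster of a Poisson random-intercept model in two ways: with a Laplace approximation (one mode-finding step plus a curvature term) and with a brute-force trapezoid rule on a fine grid standing in for an accurate quadrature. The data, parameter values, and function names are hypothetical, and the one-dimensional setup is an assumption chosen to keep the example short.

```python
import math


def log_integrand(b, y, beta0, sigma):
    """log of [Poisson likelihood of y given b] x [N(0, sigma^2) density of b]."""
    eta = beta0 + b
    loglik = sum(yj * eta - math.exp(eta) - math.lgamma(yj + 1) for yj in y)
    logprior = -0.5 * math.log(2 * math.pi * sigma ** 2) - b ** 2 / (2 * sigma ** 2)
    return loglik + logprior


def curvature(b, y, beta0, sigma):
    """Second derivative of the log integrand with respect to b (always < 0)."""
    return -len(y) * math.exp(beta0 + b) - 1.0 / sigma ** 2


def laplace_marginal(y, beta0, sigma, n_iter=50):
    """Laplace approximation: Gaussian integral matched at the mode."""
    b = 0.0
    for _ in range(n_iter):  # Newton's method for the conditional mode
        grad = sum(y) - len(y) * math.exp(beta0 + b) - b / sigma ** 2
        b -= grad / curvature(b, y, beta0, sigma)
    height = math.exp(log_integrand(b, y, beta0, sigma))
    return height * math.sqrt(2 * math.pi / -curvature(b, y, beta0, sigma))


def grid_marginal(y, beta0, sigma, lo=-8.0, hi=8.0, n=4001):
    """Reference value: trapezoid rule on a fine grid (slow but accurate)."""
    h = (hi - lo) / (n - 1)
    vals = [math.exp(log_integrand(lo + i * h, y, beta0, sigma)) for i in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


y = [3, 5, 2, 4]  # hypothetical counts from one cluster
approx = laplace_marginal(y, beta0=1.0, sigma=1.0)
reference = grid_marginal(y, beta0=1.0, sigma=1.0)
rel_error = abs(approx - reference) / reference
```

The Laplace version needs only a handful of function evaluations, while the grid needs thousands; with many or multivariate random effects the grid approach quickly becomes infeasible, which is why the cheaper approximations matter.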

Page 11: Threads 2013

Estimation: example (McKeon et al., 2012)

[Figure: log-odds of predation (axis from −6 to 2) for the terms Symbiont, Crab vs. Shrimp, and Added symbiont, with estimates from GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ.]

Page 12: Threads 2013

Inference

Big problem.

Inferential tools: either asymptotic or taken from classical linear models

boundary solutions (Stram and Lee, 1994)

the great p-value/degrees of freedom debate

small numbers of clusters

solutions: computational and/or Bayesian (parametric bootstrap, MCMC)

[Figure: inferred p value against true p value (both axes 0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia.]
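The parametric bootstrap mentioned above can be sketched on a deliberately simplified stand-in problem: a likelihood-ratio test of whether two groups of Poisson counts share a common mean. This is not a GLMM (the fits here are closed-form sample means, an assumption made to keep the code self-contained), but the recipe is the same: fit the null model, simulate new data from it, recompute the test statistic, and compare. All names and data are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def lrt_stat(a, b):
    """LRT statistic: separate Poisson means for a and b vs. one common mean."""
    sa, sb = sum(a), sum(b)
    mean_a, mean_b = sa / len(a), sb / len(b)
    mean_0 = (sa + sb) / (len(a) + len(b))
    # The "-n * mean" terms cancel between the two models.
    stat = 2 * (xlogy(sa, mean_a) + xlogy(sb, mean_b) - xlogy(sa + sb, mean_0))
    return max(0.0, stat)


def bootstrap_p(a, b, n_boot=199, seed=1):
    """Parametric bootstrap p-value: simulate from the fitted null model."""
    rng = random.Random(seed)
    observed = lrt_stat(a, b)
    mean_0 = (sum(a) + sum(b)) / (len(a) + len(b))  # null-model fit
    exceed = 0
    for _ in range(n_boot):
        sim_a = [rpois(mean_0, rng) for _ in a]
        sim_b = [rpois(mean_0, rng) for _ in b]
        if lrt_stat(sim_a, sim_b) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


p_value = bootstrap_p([1, 0, 2, 1, 0, 1], [8, 10, 7, 9, 11, 8])
```

Because the reference distribution is simulated rather than assumed to be chi-squared, the approach sidesteps the degrees-of-freedom question, at the cost of refitting the model once per bootstrap replicate.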

Page 14: Threads 2013

Sparse matrix algorithms

repeated decomposition of large, sparse matrices (especially Z)

fill-reducing permutation to improve sparsity pattern

further improvements possible: better matrix representation, parallelization?
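A toy illustration of why a fill-reducing permutation matters: the sketch below counts the new nonzeros ("fill-in") created when a symmetric sparse matrix is factorized by eliminating its rows and columns in a given order, using the standard graph view of elimination. The "arrow" matrix (one dense row and column plus a diagonal) and the function name are assumptions chosen for the example; production sparse solvers use far more sophisticated ordering heuristics.

```python
def fill_in(n, edges, order):
    """Count fill-in edges created by eliminating vertices in `order`.

    The symmetric matrix is represented by its graph: an edge (i, j) means
    entry (i, j) is nonzero. Eliminating a vertex connects all of its
    remaining neighbours to each other; each new connection is one fill-in.
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    fill = 0
    for v in order:
        nbrs = sorted(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        for idx, u in enumerate(nbrs):
            for w in nbrs[idx + 1:]:
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill += 1
    return fill


n = 6
arrow = [(0, k) for k in range(1, n)]  # vertex 0 is the dense row/column

dense_first = fill_in(n, arrow, order=list(range(n)))           # 0, 1, ..., 5
dense_last = fill_in(n, arrow, order=list(reversed(range(n))))  # 5, 4, ..., 0
print(dense_first, dense_last)  # 10 0
```

Eliminating the dense vertex first makes the remaining five vertices fully connected (10 new entries), while eliminating it last creates none: the same matrix, permuted, factorizes with no loss of sparsity.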

Page 15: Threads 2013

Bounded optimization

Parameterize variance-covariance matrix Σ(θ) (Pinheiro and Bates, 1996)

Positive definite or only semi-definite?

Disadvantages of transforming to unconstrain

(Disadvantages of boundary solutions)

[Figure: deviance (y-axis, 0 to 30) in two panels, "raw" (x-axis 0 to 3) and "log" (x-axis −3 to 0).]
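The trade-off between a bounded and an unconstrained parameterization can be sketched for a 2×2 Σ(θ). In the first function below, θ holds the entries of a lower-triangular Cholesky factor, so Σ is positive semi-definite for any θ and becomes singular (a boundary solution) when a diagonal entry is exactly zero. In the second, the diagonal entries are exponentiated, which removes the bounds but makes the singular case unreachable at any finite θ. The function names and the 2×2 restriction are assumptions for illustration; both parameterizations are among those discussed by Pinheiro and Bates (1996).

```python
import math


def sigma_from_cholesky(theta):
    """Sigma = L L^T with L = [[t1, 0], [t2, t3]]; PSD for any theta."""
    t1, t2, t3 = theta
    return [[t1 * t1, t1 * t2],
            [t1 * t2, t2 * t2 + t3 * t3]]


def sigma_from_log_cholesky(theta):
    """Same construction, but the diagonal of L is exp(theta): unconstrained."""
    t1, t2, t3 = theta
    return sigma_from_cholesky((math.exp(t1), t2, math.exp(t3)))


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


interior = sigma_from_cholesky((2.0, 1.0, 3.0))   # positive definite
boundary = sigma_from_cholesky((1.0, 0.5, 0.0))   # singular: det == 0
near_edge = sigma_from_log_cholesky((-20.0, 0.0, -20.0))  # tiny but still > 0
```

With the raw Cholesky parameterization the optimizer must respect a bound (diagonal entries ≥ 0) but can land exactly on a zero variance; with the log version it never needs bounds but has to drift toward minus infinity to approximate the same fit.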

Page 17: Threads 2013

Sociological issues

The curse of neophilia

Wide user base:

As usual when software for complicated statistical inference procedures is broadly disseminated, there is potential for abuse and misinterpretation. (Breslow, 2004)

What if there is no good answer? "do no harm" vs. "better me than someone else"

Diagnostics and warning messages

End users vs. downstream developers

Page 19: Threads 2013

Next steps

Alternative platforms/languages

Flexible correlation structures: spatial, temporal, phylogenetic ...

Improved MCMC methods?

Simulation tests of inferential tools (sigh)

Page 20: Threads 2013

Is it science?

Science is what we understand well enough to explain to a computer. Art is everything else we do. (Donald Knuth)

[Figure: articles per month (axis 10 to 50) against date (2006 to 2012); key: glmm, lme4.]

Page 21: Threads 2013

Acknowledgments

lme4: Doug Bates, Martin Mächler, Steve Walker

Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF)

NSERC (Discovery)

SHARCnet

Page 22: Threads 2013

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.

Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.

McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.

Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289–296. doi:10.1007/BF00140873.

Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.

Stram, D.O. and Lee, J.W., 1994. Biometrics, 50(4):1171–1177.

Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.
