montpellier

Upload: ben-bolker

Post on 05-Jul-2015

Page 1: Montpellier

Overview New methods War stories Conclusions References

Statistical machismo and common sense

Ben Bolker, McMaster University

Departments of Mathematics & Statistics and Biology

ISEC

July 2014

Page 2: Montpellier

Outline

1 Overview
   Statistical machismo
   Why do ecological statistics?

2 New methods
   Desiderata
   Statisticians and software

3 War stories

4 Conclusions

Page 3: Montpellier

Acknowledgements

People: Steve Walker, Mollie Brooks, Mike McCoy

Support: NSERC Discovery grant

Page 5: Montpellier

Statistical machismo

blog post by Brian McGill on Dynamic Ecology

≈ "method ageism" (M. Brewer)

criticizes unnecessarily fancy statistics, e.g.

Bonferroni corrections
phylogenetic corrections
spatial regression
estimation of detectability
Bayesian methods

Also cf. Murtaugh (2007; 2009; 2014)

Page 8: Montpellier

Statistical machismo (2)

Slippery slope: what is "good enough"? (GLM vs ANOVA)

McGill: "And when the p-values are <0.0000001 and unlikely to change . . ."

Criticizing dogma, not methods per se

Does this apply to statistical ecologists?

Are we enabling bad practice?

Caveat: researcher/teacher vs. researcher/consultant niche

Page 10: Montpellier

Intellectual satisfaction

Novelty

Seek elegant/rigorous solutions

Mathematical or computational challenges

Page 11: Montpellier

Solve scientific and societal problems

Knuth: "The important thing, once you have enough to eat and a nice house, is what you can do for others, what you can contribute to the enterprise as a whole"

Basic science

Hypothesis testing (broad sense)
Links to ecological theory

Applied science

Prediction
Decision analysis
Technology transfer (where possible)

Page 13: Montpellier

Career advancement

get a job/get tenure/get funded

satisfy colleagues

be either useful or rigorous!

avoid "too much collaboration"

curse of novelty

Page 15: Montpellier

Robustness

less reliance on assumptions (Box: "all models are wrong . . .")

computational robustness

handle crappy (small, noisy, missing . . . ) data

robust to user error

examples: sandwich estimators; M-estimators; design-based statistics; permutation tests

tradeoffs: complexity, loss of interpretability, increased bias, decreased efficiency
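As one illustration of the "less reliance on assumptions" idea, a permutation test can be sketched in a few lines. This is a minimal sketch, not taken from the talk: the function name `permutation_test`, the data, and the choice of test statistic (absolute difference in means) are all assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=2000, seed=1):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value based on n_perm random relabellings.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            n_extreme += 1
    # Add 1 to numerator and denominator so the observed labelling counts
    # as one of the permutations and the p-value is never exactly zero.
    return (n_extreme + 1) / (n_perm + 1)


# Hypothetical data: two small samples with clearly different means.
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9, 5.1, 4.6]
treated = [6.2, 6.8, 5.9, 7.1, 6.5, 6.0, 6.9, 6.4]
p_value = permutation_test(control, treated)
print(f"permutation p-value: {p_value:.4f}")
```

The only assumption the test needs is exchangeability of labels under the null hypothesis; the price, as the tradeoffs line notes, is that it answers a narrower question than a fitted parametric model would.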

Page 16: Montpellier

Speed and scalability

"just get a bigger computer", or a cluster

quantitative becomes qualitative

modern methods (ensembles, resampling, MCMC . . . ) increase need for speed (but usually trivially parallelizable)

scalability (parallel implementations; computational complexity)

examples: ensemble-based approaches; kernel-based methods; sparse matrix computation; map/reduce

tradeoffs: complexity, lack of generality

Page 17: Montpellier

Interpretability (Warton and Hui, 2010)

connection to mechanistic/theoretical models

. . . or to statistical paradigms (e.g. linear models)

maybe a hard sell

examples: Bayesian statistics (P(H|D) vs. P(D|H)); GLMs

counterargument: Breiman (2001)

Page 18: Montpellier

Increased correctness (O'Hara and Kotze, 2010; Warton and Hui, 2010)

decrease bias/type I error; improve coverage

how big is bias generally? how much does it matter?

unpopular with ecologists! especially if it "lowers power"/makes effects disappear . . .

Page 19: Montpellier

Statistical power/efficiency

squeezing more out of data is always good

. . . but how much?

maybe irrelevant if there's enough data (econometrics, e.g. Angrist and Pischke (2009))

examples: GLMs vs. least-squares; random vs fixed effects

tradeoffs: complexity, model-dependence/loss of robustness

Page 20: Montpellier

Handle new kinds of data / solve new problems

Hard to argue with this one!

Most often, methods for combining different characteristics:

spatial/temporal/phylog. correlation
non-Normal data
missing data
regularization/smoothing/dimension limitation
etc. ...

Page 21: Montpellier

Ease of use

general, flexible frameworks (maybe) (or domain-specific solutions)

interfaces

examples: MaxEnt; information-theoretic approaches

counterexamples: BUGS/JAGS, AD Model Builder?

Page 23: Montpellier

User friendliness/interfaces

how hard should you try to make your methods usable? (Dynamic Ecology blog post)

what kind of interface?

equations
pseudocode
code blobs
package
GUI

Page 24: Montpellier

Technology transfer and software engineering

Technology transfer might be the most useful we can be

Boring? Unrewarding? New tech has to come from somewhere

statistical software now commoditized and often free

good statistical software takes a lot of time and effort (and few users want to pay for it)

eating dog food/developer blindness

building software exposes new issues: Knuth: "Science is what we understand well enough to explain to a computer. Art is everything else we do"

Page 26: Montpellier

Responsibilities

Do you have a responsibility to

Fix bugs?
Provide features?
Advise users?

Does the "no liability" clause of free software licenses (e.g. GPL §16) absolve us of moral responsibility?

Page 28: Montpellier

Spatial moment equations

general theoretical framework for ecological dynamics of spatial point processes (Ovaskainen et al., 2014)

used mostly to understand qualitative ecological dynamics

framework for connecting spatial theory and data?

in particular, should be able to deconvolve effects of (e.g.) habitat and habitat preference

$$N(x) = \int E(y - x)\, H(y)\, dy = (E * H)(x)$$

then

$$\tilde{H}(\omega) = \frac{\tilde{C}_{EN}}{\tilde{C}_{EE}}$$
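The Fourier-space division above can be checked numerically on synthetic data. The following is a minimal sketch under strong assumptions that are not in the talk: a one-dimensional circular transect, a white-noise environment E, a Gaussian preference kernel H, no observation noise, and the standard convolution convention E(x − y). The helper `dft` is a naive transform written out to keep the example dependency-free; in practice one would use an FFT library.

```python
import cmath
import math
import random


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for a short transect)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                    for m, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


n = 64
rng = random.Random(42)

# Hypothetical environment E: white noise along a circular transect.
env = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical preference kernel H: Gaussian in circular distance, sums to 1.
raw = [math.exp(-0.5 * (min(m, n - m) / 3.0) ** 2) for m in range(n)]
pref = [r / sum(raw) for r in raw]

# Population density N = E * H (circular convolution), noise-free.
pop = [sum(env[(x - y) % n] * pref[y] for y in range(n)) for x in range(n)]

env_f = dft(env)
pop_f = dft(pop)
cross_spec = [e.conjugate() * p for e, p in zip(env_f, pop_f)]  # plays the role of C_EN
auto_spec = [abs(e) ** 2 for e in env_f]                        # plays the role of C_EE

# Divide the cross-spectrum by the auto-spectrum, then transform back to get H.
ratio = [c / a for c, a in zip(cross_spec, auto_spec)]
pref_est = [z.real for z in dft(ratio, inverse=True)]

max_err = max(abs(a - b) for a, b in zip(pref, pref_est))
print(f"max abs error in recovered kernel: {max_err:.2e}")
```

With noisy or spatially autocorrelated data the auto-spectrum can be close to zero at some frequencies, so the plain division would need regularization; that is one of the "many details" a real method would have to handle.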

Page 29: Montpellier

Deconvolution: example

[Figure: panels "habitat preference" and "correlations & preference"; axis labels: distance, habitat, population]

so far just a gleam in my eye

many details (non-Normality, nonlinear response) . . .

Page 30: Montpellier

Estimating growth autocorrelation in unmarked individuals (Brooks et al., 2013)

uncorrelated growth: σ²(t) linear

correlated growth: σ²(t) quadratic

straightforward estimation
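The linear-versus-quadratic contrast can be seen in a small simulation. This is a sketch under simplifying assumptions that are not from the talk or the cited paper: Normal growth increments, hypothetical parameter values, and only the two extreme cases (independent increments versus a fixed individual growth rate, i.e. perfect autocorrelation).

```python
import random
from statistics import variance

rng = random.Random(2013)
n_ind, n_steps = 5000, 20
mean_growth, sd_growth = 1.0, 0.3   # hypothetical growth increments


def size_variance(correlated):
    """Variance of size among individuals at each time step 1..n_steps."""
    sizes = [0.0] * n_ind
    # Individual growth rates; used only in the perfectly correlated case.
    rates = [rng.gauss(mean_growth, sd_growth) for _ in range(n_ind)]
    var_by_time = []
    for _ in range(n_steps):
        for i in range(n_ind):
            step = rates[i] if correlated else rng.gauss(mean_growth, sd_growth)
            sizes[i] += step
        var_by_time.append(variance(sizes))
    return var_by_time


var_uncorr = size_variance(correlated=False)
var_corr = size_variance(correlated=True)

# Doubling elapsed time should roughly double the variance when growth is
# uncorrelated, and quadruple it when growth is perfectly correlated.
ratio_uncorr = var_uncorr[19] / var_uncorr[9]
ratio_corr = var_corr[19] / var_corr[9]
print(f"uncorrelated: {ratio_uncorr:.2f}, correlated: {ratio_corr:.2f}")
```

Because only the variance across individuals at each time is used, no individual needs to be tracked between time steps, which is why the approach applies to unmarked individuals.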

Page 31: Montpellier

Growth autocorrelation (cont.)

power analysis: require 500 individuals @ 25% autocorrelation, 100 individuals @ 50% autocorrelation

available data: 5–50 individuals

so far unused (cf. Lavine et al. (2002))

Page 32: Montpellier

Mixed stock estimation (Bolker et al., 2003, 2007; Okuyama and Bolker, 2005)

Bayesian "mixed stock" analysis, following Pella and Masuda (2001)

unconditional likelihood: account for sampling error in sources

better confidence intervals

many-to-many methods (Chen et al., 2010)

widely used . . .
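For readers unfamiliar with the problem, the simpler conditional version can be sketched as follows. This is not the Bayesian method described above: it is a plain EM estimate that treats the source marker frequencies as known exactly, which is precisely the sampling error the unconditional likelihood is meant to account for. The function name and all numbers are hypothetical.

```python
def estimate_contributions(source_freqs, mixed_counts, n_iter=2000):
    """EM estimate of source contributions to a mixed sample.

    source_freqs[s][h]: frequency of marker h in source s, treated as known.
    mixed_counts[h]: number of mixed-sample individuals with marker h.
    """
    n_sources = len(source_freqs)
    total = sum(mixed_counts)
    contrib = [1.0 / n_sources] * n_sources  # start from equal contributions
    for _ in range(n_iter):
        new_contrib = [0.0] * n_sources
        for h, count in enumerate(mixed_counts):
            # E-step: probability that a type-h individual came from source s
            weights = [contrib[s] * source_freqs[s][h] for s in range(n_sources)]
            norm = sum(weights)
            if norm == 0:
                continue  # marker absent from every source; skipped here
            for s in range(n_sources):
                new_contrib[s] += count * weights[s] / norm
        # M-step: contributions are the expected fractions from each source
        contrib = [c / total for c in new_contrib]
    return contrib


# Hypothetical example: three sources, three marker types.
source_freqs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
mixed_counts = [450, 310, 240]  # consistent with contributions 0.5, 0.3, 0.2
est = estimate_contributions(source_freqs, mixed_counts)
print([round(c, 3) for c in est])
```

A point estimate like this says nothing about uncertainty; with small source samples the estimated frequencies are themselves noisy, which is the motivation for the unconditional and Bayesian approaches listed above.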

[Figure: points labelled with site codes, including AS, MX, TR, BAH, BAR, FLF, NIC, BRF]

Page 33: Montpellier

A cautionary example

Mismatch between simple Gibbs sampler and equivalent WinBUGS implementation

Bug (??) in relatively widely used software; haven't found time to diagnose/fix it!

[Figure: density of estimated contribution (0 to 1) in four panels (NWFL/NWFL, SOFL/NWFL, NWFL/SOFL, SOFL/SOFL); legend: tmcmc, wbugs, wbugsL, wbugsLL]

Page 34: Montpellier

Mixed models

(generalized) linear mixed models

very useful framework: GLMs (exponential family) + random effects

linear algebra magic by D. Bates, M. Maechler

technology transfer

amazingly popular!

[Figure: citations by year, 1995 to 2015; series: GLMM paper, all others]

Page 35: Montpellier

Conclusions

Can't be too gloomy . . . trickle-down process does work

But we should at least recognize when we are amusing ourselves, and when we are doing science

Can we formalize these ideas?

Is it worth it?

Page 37: Montpellier

References

Angrist, J.D. and Pischke, J.S., 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton, 1st edition. ISBN 9780691120355.

Bolker, B., Okuyama, T., et al., 2003. Ecological Applications, 13(3):763–775.

Bolker, B.M., Okuyama, T., et al., 2007. Molecular Ecology, 16:685–695. doi:10.1111/j.1365-294X.2006.03161.x.

Breiman, L., 2001. Statistical Science, 16(3):199–215. ISSN 08834237.

Brooks, M.E., McCoy, M.W., and Bolker, B.M., 2013. PLoS ONE, 8(10):e76389. doi:10.1371/journal.pone.0076389.

Chen, Y., Smith, S.J., and Campana, S.E., 2010. Canadian Journal of Fisheries and Aquatic Sciences, 67(10):1533–1548. ISSN 0706-652X, 1205-7533. doi:10.1139/F10-078.

Lavine, M., Beckage, B., and Clark, J.S., 2002. Journal of Agricultural, Biological, and Environmental Statistics, 7(1):21–41. ISSN 1085-7117, 1537-2693. doi:10.1198/108571102317475044.

Murtaugh, P.A., 2007. Ecology, 88(1):56–62.

Murtaugh, P.A., 2009. Ecology Letters, 12(10):1061–1068. ISSN 1461023X. doi:10.1111/j.1461-0248.2009.01361.x.

Murtaugh, P.A., 2014. Ecology, 95(3):611–617. ISSN 0012-9658. doi:10.1890/13-0590.1.

O'Hara, R.B. and Kotze, D.J., 2010. Methods in Ecology and Evolution, 1(2):118–122. ISSN 2041-210X. doi:10.1111/j.2041-210X.2010.00021.x.

Okuyama, T. and Bolker, B.M., 2005. Ecological Applications, 15(1):315–325.

Ovaskainen, O., Finkelshtein, D., et al., 2014. Theoretical Ecology, 7(1):101–113. ISSN 1874-1738, 1874-1746. doi:10.1007/s12080-013-0202-8.

Pella, J. and Masuda, M., 2001. Fishery Bulletin, 99:151–167.

Warton, D.I. and Hui, F.K.C., 2010. Ecology, 92(1):3–10. ISSN 0012-9658. doi:10.1890/10-0340.1.