Estimating Microbial Diversity

John Bunge, jab18@cornell, Department of Statistical Science, Cornell University

Post on 02-Feb-2016

Page 1: John Bunge jab18@cornell Department of Statistical Science Cornell University

John Bunge
jab18@cornell

Department of Statistical Science
Cornell University

Estimating Microbial Diversity

Page 2

Thanks to:
Amy Willis
Fiona Walsh
David Mark Welch
Colleagues too numerous to mention

Bunge, J., Willis, A. and Walsh, F. (2013) Estimating the number of species in microbial diversity studies. Ann. Rev. of Statist. and its Appl. v.1. Forthcoming.

Page 3

Statisticians

Page 4

Bioinformaticists

Page 5

Statistics is not a collection of formulae, nor computer programs, but a conceptual framework, an intellectual stance, a point of view, a theory of knowledge

Fundamental idea: distinction between sample and population

Classical or frequentist statistics is fundamentally dualistic

Page 6

Plato’s Republic, VII,7

Behold! human beings living in an underground den, which has a mouth open towards the light and reaching all along the den; here they have been from their childhood […] Above and behind them a fire is blazing at a distance, […] you will see, if you look, a low wall built along the way, like the screen which marionette players have in front of them, over which they show the puppets. […] They see only their own shadows, or the shadows of one another, which the fire throws on the opposite wall of the cave […] To them, I said, the truth would be literally nothing but the shadows of the images.

Page 7

Old Testament

Ecclesiastes 1:15

What is crooked cannot be straightened; what is lacking cannot be counted.

New Testament

1 Corinthians 13:12

For now we see through a glass, darkly, but then face to face: now I know in part; but then shall I know even as also I am known.

Page 8

The knowledge problem in microbiome studies

Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples. - Wikipedia

DNA extraction bias notwithstanding, metagenomics is the most unrestricted and comprehensive approach. Our ability to interpret these data is always improving, and we stand on a precipice of unprecedented discovery […] Microbes are not the only group to benefit from these surveys; viruses exist at 10 times the abundance of microbes […]. - Gilbert, 2011

BUT: METAGENOMIC SURVEYS RECOVER ONLY A SMALL FRACTION OF THE EXTANT DIVERSITY. NONETHELESS, MANY METHODS TREAT THE OBSERVED SAMPLE AS THE POPULATION.

Page 9

MACHINES

The fundamental idea of statistics: distinction between population (or universe) and sample (or data)

Page 10

Statistical inference: Extract maximum information from sample in order to draw conclusions about population

Inductive not deductive

THE SAMPLE IS A SUBSET OF THE POPULATION

Page 11

Question: In a microbial diversity study, what is the population?

Definition: The population is what would be observed if the operative sampling and analysis protocols were carried out to infinite effort.

Page 12

How do we statistically estimate total microbial taxonomic richness?

Page 13

Cluster sequences at some % “identity,” typically 97%

{clusters} = {OTUs}

OTU = “operational taxonomic unit”

Page 14

Statistical problem:

Estimate total population diversity – number of species, classes, taxa, OTUs – based on frequency count data

Data = # of units observed exactly once in sample (singletons); # observed exactly twice (doubletons); # observed exactly three times; … .
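As a small illustration of how frequency count data arise from an OTU table, the sketch below tallies how many taxa were seen once, twice, and so on. The OTU names and read counts are hypothetical, and `frequency_counts` is a helper name chosen here, not something from the talk.

```python
from collections import Counter

def frequency_counts(abundances):
    """Return {j: f_j}: the number of taxa observed exactly j times, for j >= 1."""
    tally = Counter(a for a in abundances if a > 0)  # unobserved taxa (0) are dropped
    return dict(sorted(tally.items()))

# Hypothetical per-OTU read counts, for illustration only.
otu_abundances = {"OTU_a": 1, "OTU_b": 1, "OTU_c": 2, "OTU_d": 5, "OTU_e": 1}
freq = frequency_counts(otu_abundances.values())
print(freq)  # {1: 3, 2: 1, 5: 1}
```

Here f1 = 3 singletons, f2 = 1 doubleton, and one taxon was observed five times.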

Page 15

Frequency count data example: Microbial ecology

• Data from soil in apple orchards
• Use of antibiotics on bacterial populations in soil ecosystems
• Singletons ≈ 2x doubletons – may be 10x!
• Goal is to estimate taxonomic richness of community
• Change with respect to intervention/covariates/metadata

Fiona Walsh et al.

Walsh F, Owens S, Duffy B, Smith DP, Frey JE. 2013. Streptomycin use in apple orchards did not alter the soil bacterial communities

freq  count    freq  count
   1    317     124      1
   2    179     128      1
   3    127     133      1
   4     77     134      1
   5     66     149      1
   6     61     159      1
   7     39     170      1
   8     42     184      1
   9     29     195      1
  10     24     208      1
  11     12     232      1
  12     27
   …            262      1

Page 16

Issues:
• High diversity
• Typical of microbial data
• Singletons ~ 2x doubletons
• Data acquisition / bioinformatic issues
• Spurious singletons?
• Correct at what stage? Statistical approach?

Page 17

Statistical inference from frequency count data

STANDARD MODEL
• C classes/taxa/species in population. Each species independently contributes Poisson-distributed # of representatives to the sample.
• Counts ~ zero-truncated mixed Poisson.

$X_1 \sim \mathrm{Poisson}(\lambda_1)$
$X_2 \sim \mathrm{Poisson}(\lambda_2)$
$X_3 \sim \mathrm{Poisson}(\lambda_3)$
…
$X_C \sim \mathrm{Poisson}(\lambda_C)$

→ sample

Page 18

The mixed-Poisson model

Species (taxon) i contributes a Poisson-distributed number Xi of replicates to the sample – i.e., taxon i appears in the sample Xi times. Units appear independently in the sample.

Fundamental problem: heterogeneity, i.e., unequal Poisson means λi

• Standard approach: model λi’s as i.i.d. replicates from some mixing distribution F

• Counts Xi are then marginally i.i.d. F-mixed Poisson random variables

• Zero-truncated since zero counts Xi are unobservable
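The mixed-Poisson model can be made concrete with a small simulation. The sketch below assumes, purely for illustration, that F is an exponential distribution with mean 1.5 and that the population has C = 2000 taxa; neither value comes from the talk, and the function names are hypothetical. Under exponential mixing with mean θ the zero probability is 1/(1 + θ), so roughly 60% of taxa should be observed here.

```python
import math
import random
from collections import Counter

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's multiplication method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_frequency_counts(C, mean_intensity, seed=0):
    """Simulate C taxa with i.i.d. exponential intensities; return ({j: f_j}, # observed taxa)."""
    rng = random.Random(seed)
    lambdas = [rng.expovariate(1.0 / mean_intensity) for _ in range(C)]  # lambda_i ~ F
    counts = [poisson_draw(rng, lam) for lam in lambdas]                 # X_i ~ Poisson(lambda_i)
    observed = [x for x in counts if x > 0]                              # zero truncation
    return dict(sorted(Counter(observed).items())), len(observed)

freq_counts, n_observed = simulate_frequency_counts(C=2000, mean_intensity=1.5)
print(n_observed, "of 2000 taxa observed; singletons:", freq_counts.get(1, 0))
```

The analyst sees only `freq_counts`; the taxa with Xi = 0 are exactly what richness estimation tries to recover.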

Page 19

The mixed-Poisson model cont’d

• Mixing distribution F, i.e., distribution of sampling intensities λ, is also called species abundance distribution
• Probably a misnomer
• Mathematical treatment (marginalization) implies that each species’ contribution to the sample is independent and identically distributed
• Both assumptions are certainly wrong
• How to account for dependent or differently distributed species counts? Not in standard model.

Page 20

Mixing distributions F

Parametric, low-dimensional parameter vector
• None ≡ point mass at λ ≡ all equal species sizes
• Gamma (Fisher, 1943)
• Lognormal
• Inverse Gaussian, generalized inverse Gaussian (Sichel)
• Pareto
• Log-t
• Stable

Finite mixture of exponentials - semiparametric

Page 21

Richness estimation under the Poisson model

Diversity estimate is then

$$\hat{N}_F := \frac{\#\text{ taxa in sample}}{1 - P_F(0)}$$

where PF(0) = F-mixed Poisson probability of 0:

$$P_F(0) = \int e^{-\lambda}\, dF(\lambda) = E_F\left(e^{-\lambda}\right)$$

$\hat{N}_F$ is the Horvitz-Thompson estimator (HTE) and is uniformly minimum variance unbiased (UMVU).

Require empirical version of $\hat{N}_F$, i.e., require estimate of PF(0) (frequentist version).
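As a worked instance of the zero probability, assume for illustration that F is a single exponential distribution with mean θ (the symbol θ and this choice of F are assumptions made here, not taken from the slide). Then the integral can be evaluated in closed form:

```latex
P_F(0) = \int_0^\infty e^{-\lambda}\,\frac{1}{\theta}\,e^{-\lambda/\theta}\,d\lambda
       = \frac{1}{\theta}\cdot\frac{1}{1 + 1/\theta}
       = \frac{1}{1+\theta},
\qquad
\hat{N}_F = \frac{\#\text{ taxa in sample}}{1 - \frac{1}{1+\theta}}
          = \#\text{ taxa in sample}\cdot\frac{1+\theta}{\theta}.
```

So with θ = 1.5, for example, the number of observed taxa would be scaled up by a factor of 2.5/1.5 ≈ 1.67.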

Page 22

Richness estimation under the Poisson model, cont’d

Require empirical version of HTE

$$\hat{N}_F := \frac{\#\text{ taxa in sample}}{1 - P_F(0, \theta)}, \qquad F = F_\theta$$

Estimate θ by ML, using zero-truncated F-mixed Poisson, conditional on # of observed taxa. Final estimator:

$$\hat{N}_F := \frac{\#\text{ taxa in sample}}{1 - P_F(0, \hat{\theta})}$$

SE via Fisher information
CI via (approximation to) profile likelihood
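The plug-in recipe can be sketched for the simplest mixing distribution, a point mass at λ (the plain Poisson model), where PF(0) = exp(−λ). The zero-truncated Poisson MLE solves mean count = λ / (1 − exp(−λ)), which the sketch below finds by bisection. The function names are hypothetical, and the input is only the first twelve apple orchard frequency counts (i.e., truncated at τ = 12), so the result is illustrative and not a reproduction of any CatchAll output.

```python
import math

def zero_truncated_poisson_mle(freq_counts, tol=1e-10):
    """MLE of lambda for a zero-truncated Poisson, given {j: f_j}."""
    c = sum(freq_counts.values())                    # observed taxa
    n = sum(j * f for j, f in freq_counts.items())   # observed individuals
    mean = n / c
    if mean <= 1.0:
        raise ValueError("mean count must exceed 1 for a positive MLE")
    lo, hi = 1e-9, mean  # lambda / (1 - exp(-lambda)) is increasing and exceeds lambda
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid / (1.0 - math.exp(-mid)) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def poisson_richness(freq_counts):
    """Horvitz-Thompson-style plug-in estimate: c / (1 - P(0)) with P(0) = exp(-lambda_hat)."""
    lam = zero_truncated_poisson_mle(freq_counts)
    c = sum(freq_counts.values())
    return c / (1.0 - math.exp(-lam)), lam

apple_tau12 = {1: 317, 2: 179, 3: 127, 4: 77, 5: 66, 6: 61,
               7: 39, 8: 42, 9: 29, 10: 24, 11: 12, 12: 27}
n_hat, lam_hat = poisson_richness(apple_tau12)
print(round(n_hat, 1), round(lam_hat, 3))
```

Because the homogeneous Poisson model ignores heterogeneity, it adds very few unseen taxa here, which is one way to see why richer mixing distributions are needed.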

Page 23

CatchAll software
www.northeastern.edu/catchall
or: STAMPS!

Developed under NSF grant DEB – 0816638 by JB/LW/SC, in C# & C

Implements
o finite mixtures of 0 – 4 exponential components (F)
o weighted linear regression procedure
o all Chao-type nonparametric procedures
o model evaluation/GOF/selection/outlier assessment

Produces estimates, SEs, & CIs
Fast, efficient, platform-independent
Excel graphics (VBA) package
Summary or copious output (text files)

Bunge J, Woodard L, Böhning D, Foster JA, Connolly S, Allen HK. 2012b. Estimating population diversity with CatchAll. Bioinformatics 28:1045--47

Page 24

Partial CatchAll summary output for apple orchard data

Total Number of Observed Species = 1187

Model                            Tau  Observed Sp  Estimated Total Sp     SE  Lower CB  Upper CB    GOF0    GOF5
Best Parm Model  ThreeMixedExp   184         1183              1823.5  122.4    1625.1    2111.6  0.0118  0.6038
Parm Model 2a    ThreeMixedExp   118         1175              1854.9  158      1609.8    2242.3  0.1428  0.3632
Parm Model 2b    ThreeMixedExp   262         1187              1797.6  101.6    1628.6    2031.3  0       0.4029
Parm Model 2c    TwoMixedExp      23         1087              1865.5  141      1640.4    2202.2  0.0001  0.0208
WLRM             UnTransf         10          961              2285.8  572.7    1607.4    4058.9  0.0206
Parm Max Tau     ThreeMixedExp   262         1187              1797.6  101.6    1628.6    2031.3  0       0.4029
WLRM Max Tau     LogTransf        31         1114              1390.3   30.4    1338.9    1459.2

Page 25

CatchAll fitted models for apple orchard data

τ = 184

Page 26

Data-analytic considerations

• Problem of right cutoff point τ
  o Typically no parametric model will fit complete frequency count dataset
  o Too many right outliers – highly abundant taxa in sample – with large gaps between counts
  o Nonparametric methods do even worse with outliers, diverging to ∞ as outliers are included in data
• Data-analytic solution: remove large frequency counts, i.e., those for frequencies > some cutoff τ
  o Chao1: τ = 2
  o Chao-type coverage-based nonparametric methods: τ = 10 (arbitrary)
  o Parametric mixture models: τ selected by goodness-of-fit algorithm
  o Weighted linear regression model: τ selected by goodness-of-fit
• Further problem: model selection and outlier deletion confounded
  o Computational solution: compute all methods at every τ
  o Requires optimized code
  o Use double selection algorithm to select “best of the best”
  o Introduces simultaneous inference problem: large number of simultaneous GOF tests. Little theory exists to correct for this.
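To illustrate what a cutoff τ does to the data, the sketch below truncates a frequency count table at τ and evaluates Chao1, which uses only f1 and f2 (τ = 2). The Chao1 formula used here is the classic form, observed taxa + f1² / (2 f2); it is stated from general knowledge rather than from the slides, and the helper names are hypothetical. The counts are the first twelve apple orchard frequency counts shown earlier, with the total of 1187 observed taxa taken from the CatchAll summary.

```python
def truncate(freq_counts, tau):
    """Keep only frequency counts f_j with j <= tau."""
    return {j: f for j, f in freq_counts.items() if j <= tau}

def chao1(freq_counts, observed_total):
    """Classic Chao1 lower bound: observed taxa + f1^2 / (2 * f2)."""
    f1 = freq_counts.get(1, 0)
    f2 = freq_counts.get(2, 0)
    if f2 == 0:
        raise ValueError("classic Chao1 needs at least one doubleton")
    return observed_total + f1 * f1 / (2.0 * f2)

apple_low = {1: 317, 2: 179, 3: 127, 4: 77, 5: 66, 6: 61,
             7: 39, 8: 42, 9: 29, 10: 24, 11: 12, 12: 27}

observed_at_tau10 = sum(truncate(apple_low, 10).values())  # taxa with frequency <= 10
chao1_estimate = chao1(truncate(apple_low, 2), observed_total=1187)
print(observed_at_tau10, round(chao1_estimate, 1))
```

The truncated count at τ = 10 agrees with the "Observed Sp" value of 961 reported for the WLRM row of the summary output.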

Page 27

Statistical analysis of standard model: The bigger picture

Philosophy/approach   Parametric                                        Nonparametric
Frequentist           Maximum likelihood (Bunge et al.);                Coverage-based (Chao et al.); Zelterman;
                      Weighted linear regression                        NPMLE (Böhning et al.)
                      (Rocchetti et al. 2011)
Bayesian              Objective Bayes (Barger et al.; Quince et al.)    ??? (Tardella et al. for capture-recapture)

Page 28

Statistical analysis of standard model – Chao-type nonparametrics

• Coverage-based approaches
• Coverage = proportion of population represented in sample
• Random variable not parameter
• Can interpret 1 – PF(0) as surrogate for coverage
• Turing’s estimate of PF(0): $f_1 / n$, where n = # of individual units in sample
• Good-Turing estimate of diversity: $\dfrac{\#\text{ of taxa in sample}}{1 - f_1/n}$
• Chao’s abundance-based coverage estimators (ACE): Good-Turing + adjustment for heterogeneity

Chao, A. & J. Bunge. 2002. Estimating the number of species in a stochastic abundance model. Biometrics 58: 531–539
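The Good-Turing estimate on this slide is simple enough to compute directly from frequency counts. The sketch below is a minimal version; the function name and the toy counts are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
def good_turing_richness(freq_counts):
    """Good-Turing estimate: (# taxa in sample) / (1 - f1/n), given {j: f_j}."""
    c = sum(freq_counts.values())                    # taxa in sample
    n = sum(j * f for j, f in freq_counts.items())   # individual units in sample
    f1 = freq_counts.get(1, 0)
    if f1 >= n:
        raise ValueError("every unit is a singleton; estimated coverage is zero")
    return c / (1.0 - f1 / n)

# Toy data: 5 singletons, 3 doubletons, 2 taxa seen three times.
toy = {1: 5, 2: 3, 3: 2}
estimate = good_turing_richness(toy)
print(round(estimate, 2))  # 14.17
```

Here c = 10 and n = 17, so Turing's estimate of PF(0) is 5/17 and the diversity estimate is 10 / (12/17) ≈ 14.17.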

Page 29

Coverage-based estimators diverge to infinity as large frequency counts are included

Hence coverage-based estimators require τ ≤ 10

Page 30

Statistical analysis of standard model: general nonparametrics

• Nonparametric maximum likelihood estimation
• Leave species abundance distribution F unspecified, i.e., F varies across all possible distributions
• Mathematical implications: F is actually non-identifiable
• Nevertheless NPMLE is possible in principle.
• Computational issues: difficult numerical search, highly complex error estimation.
• Software CAMCR

Böhning D, Kuhnert R. 2009. CAMCR: Computer-Assisted Mixture model analysis for Capture-Recapture count data. AStA Adv. Stat. Anal. 93:61--71

Page 31

The Bayesian paradigm

• Rev. Thomas Bayes
• Bayesian statistics: Probabilistic & statistical statements concern degrees of belief
• Usually parametric: statements concern values of parameters, e.g., species richness. Nonparametric Bayes is possible but complex.
• Procedure:
  1. Investigator first declares existing belief about population value: this is prior distribution
  2. Collect sample data
  3. Update prior, based on data, to obtain posterior, i.e., final state of knowledge or belief about population.

Page 32

The Bayesian paradigm cont’d

Bayes’ Theorem:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

Posterior distribution:

$$P(\text{parameters} \mid \text{data}) \propto \underbrace{P(\text{data} \mid \text{parameters})}_{\text{likelihood}}\;\underbrace{P(\text{parameters})}_{\text{prior}}$$

Bayesian computation is now fairly well established

Page 33

Bayesian estimation of taxonomic richness based on the standard model

• Species abundance distribution F is parametric: F depends on a small number of parameters (typically 2-3), called θ
• Parameter of interest is total richness C
• Procedure:
  1. Establish prior distributions for θ and C
  2. Likelihood function is known (based on mixed-Poisson)
  3. Run Bayesian machinery
  4. Obtain posterior distribution, estimate, “credible interval,” etc.
• Quince et al. quasi-noninformative priors; Barger et al. formal objective priors. Active research area in statistics.

Quince C, Curtis TP, Sloan WT. 2008. The rational exploration of microbial diversity. ISME J. 2:997--1006; Barger K, Bunge J. 2011. Objective Bayesian estimation for the number of species. J. Bayesian Analysis 5:765--86
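The prior-likelihood-posterior procedure can be sketched on a grid for a deliberately oversimplified case. Everything specific below is an assumption made for illustration and is not the method of Quince et al. or Barger et al.: the detection probability p = 1 − P(0) is treated as known, the number of observed taxa c is binomial given C, and the prior on C is proportional to 1/C. Function names and numbers are hypothetical.

```python
import math

def richness_posterior(c, p_detect, c_max):
    """Grid posterior for total richness C in c..c_max (assumed prior proportional to 1/C)."""
    grid = range(c, c_max + 1)
    log_post = []
    for C in grid:
        # Binomial log-likelihood of observing c of C taxa, each detected with prob p_detect.
        log_lik = (math.lgamma(C + 1) - math.lgamma(c + 1) - math.lgamma(C - c + 1)
                   + c * math.log(p_detect) + (C - c) * math.log(1.0 - p_detect))
        log_post.append(log_lik - math.log(C))  # add log prior, log(1/C)
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # stabilized before normalizing
    total = sum(weights)
    return {C: w / total for C, w in zip(grid, weights)}

def credible_interval(posterior, level=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    tail = (1.0 - level) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for C in sorted(posterior):
        cumulative += posterior[C]
        if lower is None and cumulative >= tail:
            lower = C
        if cumulative >= 1.0 - tail:
            upper = C
            break
    return lower, upper

posterior = richness_posterior(c=50, p_detect=0.8, c_max=450)
post_mean = sum(C * w for C, w in posterior.items())
interval = credible_interval(posterior)
print(round(post_mean, 2), interval)
```

In a real analysis θ (and hence p) would carry its own prior and be integrated over, which is where the "Bayesian machinery" of step 3 comes in.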

Page 34

A New Hope

Is it possible to estimate taxonomic richness without
• a species abundance distribution
• independent species contributions to the sample
• identically distributed species contributions to the sample
?

Yes, using ratios of frequency counts.

Page 35

breakaway: Estimating taxonomic richness based on ratios of frequency counts

$$r(j) := \frac{(j+1)\, f_{j+1}}{f_j}$$

 j  count  (j+1)f_(j+1)/f_j
 1    317    1.13
 2    179    2.13
 3    127    2.43
 4     77    4.29
 5     66    5.55
 6     61    4.48
 7     39    8.62
 8     42    6.21
 9     29    8.28
10     24    5.50
11     12   27.00
12     27    7.70

Idea: ratios are ~ linear
Project line downward to obtain f0 = # of unobserved species
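The "project the line downward" idea can be sketched as follows. Since r(0) = f1 / f0, a fitted intercept at j = 0 gives f0 ≈ f1 / intercept. The sketch uses an unweighted least-squares line on the ratios for j = 1..10, which is a simplification chosen here for brevity; it is not the weighted regression of Rocchetti et al. nor the breakaway procedure, and the function names are hypothetical.

```python
def frequency_ratios(freq_counts, max_j):
    """Return [(j, (j+1) * f_{j+1} / f_j)] for j = 1..max_j."""
    return [(j, (j + 1) * freq_counts[j + 1] / freq_counts[j])
            for j in range(1, max_j + 1)]

def fit_line(points):
    """Ordinary (unweighted) least squares; returns (intercept, slope)."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

apple = {1: 317, 2: 179, 3: 127, 4: 77, 5: 66, 6: 61,
         7: 39, 8: 42, 9: 29, 10: 24, 11: 12, 12: 27}

intercept, slope = fit_line(frequency_ratios(apple, max_j=10))
f0_hat = apple[1] / intercept   # only meaningful when the intercept is positive
total_hat = 1187 + f0_hat       # 1187 observed taxa, from the CatchAll summary
print(round(intercept, 3), round(f0_hat), round(total_hat))

# Sensitivity check: adding the noisy j = 11 ratio (27.00) changes the picture.
intercept_11, _ = fit_line(frequency_ratios(apple, max_j=11))
```

With this unweighted fit, including the j = 11 ratio makes the fitted intercept negative, so the extrapolated f0 is meaningless; this is one reason weighting and cutoff choice matter.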

Page 36

breakaway: Estimating taxonomic richness based on ratios of frequency counts, cont’d

$$\frac{f_{j+1}}{f_j} \approx \frac{\beta_0 + \beta_1 j + \beta_2 j^2 + \beta_3 j^3}{1 + \alpha_1 j + \alpha_2 j^2 + \alpha_3 j^3}$$

Some issues:

•Straight-line fit may go negative!
•Can be fixed by ad hoc log-transformation (Rocchetti et al.)
•Broad generalization: represent ratio of frequency counts as ratio of polynomials
•Deep probabilistic justification; corrects negativity

Rocchetti I, Bunge J, Böhning D. 2011. Population size estimation based upon ratios of recapture probabilities. Ann. Appl. Stat. 5:1512--33; Willis A. and Bunge J. (2013) in prep.

Page 37

breakaway: Estimating taxonomic richness based on ratios of frequency counts, cont’d

################## Smoothed weights ##################
The best estimate of total diversity is 1800
with std error 256
The model employed was model_1_1
The function selected was
f_{x+1}/f_{x} ~ (beta0+beta1*(x-xbar))/(1+alpha1*(x-xbar))

        Coef estimates  Coef std errors
beta0   1.11078693      0.13241518
beta1   0.05383757      0.02916098
alpha1  0.03002143      0.03840271

Page 38

breakaway: Estimating taxonomic richness based on ratios of frequency counts, cont’d

• Nonlinear regression
• Heteroscedastic (changing variance)
• Autocorrelated: f2/f1 is correlated with f3/f2, etc.
• Collinear: parameter estimates of α’s and β’s highly correlated unless corrected
• Multiple significant numerical challenges

Statistical questions
• Model selection – degree of numerator and denominator polynomials
• Error estimation
• Underlying probability theory: what do these models imply, and what are they implied by?

Page 39

Next generation sequencing technology […] has revolutionised the study of microbial diversity as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, […] because of the large read numbers and the lack of consensus sequences it is vital to distinguish noise from true sequence diversity in this data. Otherwise this leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present.

- Quince et al. (2011)

Noise and unreliable low frequency counts

Page 40

Methods to address unreliable low frequency counts

I. Fix the data at the source!

•Example: PyroNoise and AmpliconNoise - aim at “separately removing 454 sequencing errors and PCR single base errors.” (Quince 2011)

•Direct, non-statistical approach

Page 41

Methods to address unreliable low frequency counts

Page 42

Methods to address unreliable low frequency counts

III. Deleting the high-diversity component of a mixture model

Bunge J, Böhning D, Allen H, Foster JA. 2012a. Estimating population diversity with unreliable low frequency counts. In Biocomputing 2012: Proceedings of the Pacific Symposium, pp. 203--12. Hackensack, NJ: World Sci. Publ

Page 43

Methods to address unreliable low frequency counts

IV. Bayesian approaches

•Informative or subjective: investigator specifies non-trivial downweighting or rapidly decreasing prior for higher diversity values

•Specific choice of prior?

Page 44

Numerical results from viral phage data: Lower bounds and component deletion

Method                     EstDiv    SE    LCB    UCB
Poisson                      8730   103   8535   8938
GoodTuring                  11690   346  11050  12407
ThreeMixedExp               67792  8656  53009  87195
Discounted: TwoMixedExp      1727   221   1410   2305

Page 45

Some notes on β-diversity

•Crucial to distinguish between

Statistical inference procedures that (attempt to) account for unobserved as well as observed diversity

Procedures (computational, graphical, or qualitative) that treat the observed sample as the population. UniFrac, “ordination” methods, co-inertia.

•Only the former considered here. Estimation of population parameters, possible hypothesis testing.

Page 46

Statistical inference for comparing taxonomic diversity across populations

•Simplest version: Estimate richness in each population, with associated standard errors and confidence intervals, & compare (e.g., do CI’s overlap?)
•Can be done with existing methods: parametric, nonparametric, Bayesian, etc.
•Exactly ONE known inferential procedure. Lower bound for # of shared taxa:

$$S_{12} = D_{12} + a\,\frac{f_{1+}^2}{2 f_{2+}} + b\,\frac{f_{+1}^2}{2 f_{+2}} + ab\,\frac{f_{11}^2}{4 f_{22}}$$

(D12 = observed # of shared species, fjk = # of species observed j times in sample 1 and k times in sample 2, with “+” denoting any positive count in that sample, a and b = constants)

Pan HY, Chao A, Foissner W. 2009. A nonparametric lower bound for the number of species shared by multiple communities. J. Agric. Biol. Environ. Stat. 14:452--68

Page 47

Statistical inference for β-diversity: other scenarios

•Inference for the Jaccard index, accounting for unobserved species (Chao et al.)
•Inference for “the probability of a draw from one distribution not being observed in k draws from another distribution.” (Hampton et al.)
•Statistical work in this area not extensive – very fertile area for research.

Chao A, Chazdon RL, Colwell RK, Shen T-J. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361--71; Hampton J, Lladser ME. 2012. Estimation of distribution overlap of urn models. PLoS ONE 7:e42368

Page 48

NEVER throw away data when doing statistical inference

“Not even wrong” – Wolfgang Pauli

Page 49

There is no post hoc statistical fix for
• Ill-posed research problem
• Vaguely defined population
• Statistical model not appropriate for
  o population description
  o sample generation process
• Model must compromise between detailed phenomenological description and parsimony
• “To what extent can we idealize the properties of the system and still obtain satisfactory results? The answer to this question can only be given in the end by experiment. Only the comparison of the answers provided by analysis of our model with the results of the experiment will enable us to judge whether the idealization is legitimate.” Andronov (1937) Theory of Oscillators.

Page 50

On the sociology of science

• Fact: Universities have statistics departments!
  o Cornell: www.stat.cornell.edu
  o At least 131 university stat dept’s in U.S. – random sample of 10:
    • University of California, Berkeley, Division of Biostatistics
    • Princeton University, Program in Statistics and Operations Research
    • Bowling Green State University, Department of Applied Statistics and Operations Research
    • University of Illinois, Urbana-Champaign, Department of Statistics
    • University of South Carolina, Department of Statistics
    • Columbia School of Public Health, Division of Biostatistics
    • Medical College of Georgia, Office of Biostatistics and Bioinformatics
    • Duke University, Institute of Statistics and Decision Sciences
    • Yale University, Department of Statistics
    • University of Michigan, Department of Biostatistics
• Collaboration extremely valuable in both directions (even though academic incentive structure may not immediately reward it)
• Be persistent: “Fall down seven times, get up eight”

Page 51

CatchAll
http://www.northeastern.edu/catchall/ or STAMPS!

•V.4 now available; mothur uses v.3 (?)
•Two programs: basic analysis program + Excel graphics spreadsheet (macros)
•Windows GUI, Windows command-line – .Net framework must be installed
•Mac OS/Linux command-line – mono must be installed.
•Input data file structure: *.csv (comma-separated values)

1,f1
2,f2
…
m,fm
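A file in the two-column layout shown above can be produced with a few lines of code. This is a minimal sketch assuming only what the slide states (one "frequency,count" pair per line, comma-separated); the function name is hypothetical, and whether CatchAll requires anything further (for example, a particular line ending or handling of zero counts) should be checked against its manual.

```python
import csv
import io

def write_frequency_counts_csv(freq_counts, handle):
    """Write {j: f_j} as 'j,f_j' lines in increasing order of j."""
    writer = csv.writer(handle, lineterminator="\n")
    for j in sorted(freq_counts):
        writer.writerow([j, freq_counts[j]])

# Demonstration with an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_frequency_counts_csv({1: 317, 2: 179, 3: 127}, buffer)
print(buffer.getvalue(), end="")
```

To write an actual file, the same function can be called with `open("mydata.csv", "w", newline="")` as the handle, where the file name is again just a placeholder.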

Page 52

CatchAll cont’d

•Read in data
•Go! (Can set option to omit most complex model, if too time-consuming; see manual)
•Output files appear in “Output” folder/directory

datasetname_Analysis.csv
Complete listing of all analyses

datasetname_BestModelsAnalysis.csv
Column‐formatted summary analysis output

datasetname_BestModelsFits.csv
Fitted values for the "best models" as selected by the model selection algorithm

datasetname_BubblePlot.csv
Data to generate bubble plots using Excel spreadsheet

Page 53

CatchAll cont’d: BestModelsAnalysis file

•Total number of observed species: self‐explanatory
•Model: see manual
•Tau: upper‐frequency cutoff
•Observed Sp: number of species (counts) with frequencies up to τ only
•Estimated total Sp: final estimate of the total number of species in the population
•SE: standard error of preceding estimate
•Lower CB, Upper CB: lower and upper 95% confidence bounds
•GOF0, GOF5: Pearson goodness‐of‐fit p‐values, uncorrected and corrected

Page 54

CatchAll cont’d: BestModelsAnalysis file

•Best Parm Model; Parm Model 2a, 2b, 2c: parametric models (and τ’s) selected by various goodness‐of‐fit criteria
•WLRM: weighted linear regression model
•Parm Max Tau, WLRM Max Tau: best parametric model and WLRM computed on entire dataset
•Best Discounted: best parametric model with low‐frequency/high‐diversity component deleted
•Non‐P 1: Chao1, nonparametric lower bound for total number of species
•Non‐P 2: Chao’s ACE or high‐diversity variant ACE1 (τ ≤ 10)
•Non‐P 3: Chao’s ACE (τ ≤ 10)

Page 55

CatchAll cont’d: Analysis file

•All models & procedures computed by CatchAll, including several not reported in summary analysis
•All cutoffs τ
•All supplementary/supporting information (GOF etc.)

•Question: what if no “best” parametric model selected?
  o Means no model passed most stringent GOF criteria
  o Revert to alternative models (2a-c)
  o If necessary revert to lower bounds (Chao1 etc.)