DNA Barcoding Statistics, Rasmus Nielsen, University of Copenhagen

Post on 03-Jan-2016



Statistical Approaches

Hypothesis testing problem. Test membership of specific species.

Decision theoretic/Bayesian problem. Choose assignment by weighing how desirable/undesirable false positives and false negatives are.

Species assignment and higher taxonomic assignment without population genetics.

Approach 1: Hypothesis testing

Test H_0: X ∈ S_i. In the divergence model, X ∈ S_i corresponds to T = 0.

Likelihood ratio test based on

$$2 \log \frac{\max_{T} L(T)}{L(T = 0)}$$
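To make the likelihood ratio statistic concrete, here is a minimal sketch under a toy model that is not the one in the talk. It assumes the query differs from a single reference sequence at k of n sites, that mismatches follow a Jukes-Cantor model, and that the expected distance is theta + 2T, where theta is a known within-species diversity (a hypothetical parameter introduced so that L(T = 0) is not zero). The function names are illustrative only.

```python
import math

def jc_mismatch_prob(d):
    """Per-site mismatch probability under Jukes-Cantor at distance d (subs/site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def log_lik(T, k, n, theta):
    """Binomial log-likelihood of k mismatches in n sites.

    Toy assumption: expected distance between query and reference is theta + 2T.
    """
    p = jc_mismatch_prob(theta + 2.0 * T)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def lr_statistic(k, n, theta):
    """Return (T_hat, LR) with LR = 2 * [log L(T_hat) - log L(0)] and T >= 0."""
    if theta <= 0:
        raise ValueError("theta must be positive in this toy model")
    p_hat = k / n
    if p_hat <= jc_mismatch_prob(theta):
        # The unconstrained optimum lies at or below T = 0, so the
        # constrained maximum is on the boundary and the statistic is 0.
        return 0.0, 0.0
    if p_hat >= 0.75:
        raise ValueError("sequences are saturated; distance is not estimable")
    d_hat = -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)  # JC distance estimate
    T_hat = (d_hat - theta) / 2.0
    lr = 2.0 * (log_lik(T_hat, k, n, theta) - log_lik(0.0, k, n, theta))
    return T_hat, lr

T_hat, lr = lr_statistic(k=30, n=600, theta=0.01)
print(f"T_hat = {T_hat:.4f}, LR = {lr:.2f}")
```

A large LR value counts as evidence against membership of species i. The null distribution needed to turn it into a p-value is a separate question and is not derived in this sketch.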

Distribution of LR


Approach 2: Classical (decision theoretic) assignment approach

Base assignment on Pr(X ∈ S_i | D, X)

X: query sequence

S_i: set of (mostly unobserved) sequences from species i

D: all the available DNA sequence data
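The weighing of false positives against false negatives can be sketched as an expected-loss rule. This is only an illustration: the cost parameters `cost_false_positive` and `cost_false_negative`, and the option of leaving the query unassigned, are assumptions introduced here and are not taken from the slides.

```python
def decide_assignment(posteriors, cost_false_positive, cost_false_negative):
    """Pick the action with the smallest expected loss.

    posteriors: dict mapping species name -> Pr(X in S_i | D, X), summing to 1.
    Assigning to species i is wrong with probability 1 - p_i and then costs
    cost_false_positive. Leaving the query unassigned always misses its true
    species and costs cost_false_negative.
    Returns the chosen species name, or None for "leave unassigned".
    """
    best = max(posteriors, key=posteriors.get)
    loss_assign = cost_false_positive * (1.0 - posteriors[best])
    loss_unassigned = cost_false_negative
    return best if loss_assign < loss_unassigned else None

# With false positives five times as costly as false negatives, the query
# is assigned only when the top posterior exceeds 1 - 2/10 = 0.8.
print(decide_assignment({"A": 0.9, "B": 0.1}, 10.0, 2.0))
print(decide_assignment({"A": 0.6, "B": 0.4}, 10.0, 2.0))
```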

Computation

Use MCMC under coalescence model with divergence between species and other parameters.

Calculate Pr(X ∈ S_i | D, X) from MCMC output.

Currently only implemented for two species.
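One common way to read a posterior probability off MCMC output is as a sample frequency. The sketch below assumes a hypothetical output format, one sampled species label for the query per iteration, since the slides do not describe the sampler's actual output.

```python
from collections import Counter

def assignment_probabilities(sampled_labels, burn_in=0):
    """Estimate Pr(X in S_i | D, X) as the fraction of post-burn-in samples
    in which the query was placed in species i.

    sampled_labels: list with one species label per MCMC iteration
    (hypothetical format, assumed for illustration).
    """
    kept = sampled_labels[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    return {species: count / len(kept) for species, count in counts.items()}

# Ten iterations, the first two discarded as burn-in.
chain = ["S1", "S1", "S1", "S1", "S1", "S1", "S1", "S1", "S2", "S2"]
print(assignment_probabilities(chain, burn_in=2))
```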

Skipper butterfly Astraptes fulgerator

Why not use assignment based on marginal probabilities?

What if we used

$$\Pr(X \in S_i \mid D, X) = \frac{\Pr(X \mid X \in S_i, D)\, p(X \in S_i)}{\sum_j \Pr(X \mid X \in S_j, D)\, p(X \in S_j)}$$

i.e. we can calculate posterior probabilities by assuming independence, i.e. ignoring phylogeny.
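The marginal-probability rule is Bayes' theorem applied species by species. A minimal sketch, assuming the per-species log-likelihoods log Pr(X | X ∈ S_j, D) have already been computed by some other means (the numbers below are made up for illustration):

```python
import math

def marginal_posteriors(log_likelihoods, priors):
    """Posterior Pr(X in S_i | D, X) from per-species log-likelihoods and priors.

    log_likelihoods: dict species -> log Pr(X | X in S_i, D)
    priors:          dict species -> p(X in S_i)
    Uses the log-sum-exp trick so very small likelihoods do not underflow.
    """
    log_joint = {s: log_likelihoods[s] + math.log(priors[s]) for s in log_likelihoods}
    m = max(log_joint.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {s: math.exp(v - log_norm) for s, v in log_joint.items()}

post = marginal_posteriors({"A": -10.0, "B": -12.0}, {"A": 0.5, "B": 0.5})
print(post)
```

With equal priors the posterior depends only on the likelihood ratio, so species A receives 1 / (1 + e^-2) here.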

Assignment error

Approach 3: Coalescence-Shmoalescence

Assign based on monophyly with other members of species (phylogenetic criterion).

Do not estimate the phylogeny, only the placement of the query sequence in the phylogeny.

Calculate posterior probability of assignment.

Algorithms

BLAST to identify candidate set of species.

Possible iteration to ensure a phylogenetically diverse sample.

Align and pipe to special version of MrBayes (by J. Huelsenbeck) which maintains phylogenetic constraints.

Calculate assignment probability based on MrBayes output.
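The last step can be sketched as counting, over the sampled trees, how often the query forms a clade with the known members of a species. The sketch assumes rooted trees that have already been parsed into nested tuples of leaf names. That representation and the function names are hypothetical and say nothing about the real MrBayes output format.

```python
def leaves(tree):
    """Set of leaf names under a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """Yield the leaf set of every subtree, including single leaves and the root."""
    if isinstance(tree, str):
        yield frozenset([tree])
        return
    for child in tree:
        yield from clades(child)
    yield leaves(tree)

def monophyly_probability(trees, query, members):
    """Fraction of sampled trees in which query plus members form one clade."""
    target = frozenset(members) | {query}
    hits = sum(any(c == target for c in clades(t)) for t in trees)
    return hits / len(trees)

# Three sampled placements of the query "q" among two constrained species.
sampled = [
    ((("q", "a1"), "a2"), ("b1", "b2")),
    (("a1", "a2"), (("q", "b1"), "b2")),
    ((("a1", "a2"), "q"), ("b1", "b2")),
]
print(monophyly_probability(sampled, "q", ["a1", "a2"]))
print(monophyly_probability(sampled, "q", ["b1", "b2"]))
```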

Example taxonomy summary (fig. 2)

Greenland Ice Cores Example

Neanderthal Example

Acknowledgments

Misha Matz (Coalescence based methods).

Wouter Boomsma and Kasper Munch (Phylogenetic methods).

John Huelsenbeck (MrBayes).

Eske Willerslev (Ice and DNA examples).

Jody Hey (discussion and inspiration).
