Phylogenomic tree construction
ISMB Tutorial: Computational methods for comparative regulatory genomics
Lecture 3, Siavash Mirarab
Topics in this lecture
• Phylogenomics: premise and challenges
• Species trees versus gene trees
• Causes for discordance
• Phylogenetic inference despite discordance
• Questions
• Models
• Method choices
Phylogeny
[Figure: a phylogeny of Orangutan, Gorilla, Chimpanzee, and Human]
Tree of life
[Figure: the tree of life; source: http://www.evogeneao.com/]
“Nothing in biology makes sense except in the light of evolution.” Dobzhansky, 1973
“Nothing in evolution makes sense except in the light of phylogeny.” Multiple coinage
Phylogenetic reconstruction from DNA
Input: one sequence per species (Orangutan, Gorilla, Chimpanzee, Human): ACTGCACACCG, ACTGCCCCCG, AATGCCCCCG, CTGCACACGG
Alignment D:
ACTGCACACCG
ACTGC-CCCCG
AATGC-CCCCG
-CTGCACACGG
Inferred tree T on Orangutan, Gorilla, Chimpanzee, and Human, scored by the likelihood P(D | T)
Statistical support: 81%
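The likelihood P(D | T) can be made concrete for the smallest possible case: two aligned sequences joined by a single path of length t. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which the slides do not specify, skips gap columns, and uses the first two rows of alignment D. It maximizes the likelihood over a grid of t values; for JC69 the maximum-likelihood t should agree with the closed-form JC69 distance.

```python
import math

# First two rows of alignment D ("-" marks a gap).
SEQ_1 = "ACTGCACACCG"
SEQ_2 = "ACTGC-CCCCG"


def jc69_pair_loglik(a: str, b: str, t: float) -> float:
    """Log P(a, b | t) under JC69 for two sequences separated by a path of
    length t (expected substitutions per site); gap columns are skipped."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay  # P(base unchanged along the path)
    p_diff = 0.25 - 0.25 * decay  # P(ending in one specific other base)
    loglik = 0.0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        # stationary frequency 1/4 of the first base times the transition probability
        loglik += math.log(0.25) + math.log(p_same if x == y else p_diff)
    return loglik


def jc69_distance(a: str, b: str) -> float:
    """Closed-form JC69 distance from the proportion p of differing non-gap sites."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ml_branch_length(a: str, b: str, step: float = 1e-4, t_max: float = 2.0) -> float:
    """Grid search for the t > 0 that maximizes the JC69 likelihood."""
    grid = [step * k for k in range(1, int(t_max / step) + 1)]
    return max(grid, key=lambda t: jc69_pair_loglik(a, b, t))


if __name__ == "__main__":
    print(round(jc69_distance(SEQ_1, SEQ_2), 4), round(ml_branch_length(SEQ_1, SEQ_2), 4))
```

Real tree inference extends the same idea to many taxa by summing over unobserved ancestral states (Felsenstein's pruning algorithm) and searching over topologies.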
Phylogenomics: promise
Data: alignments for many genes (gene 1, gene 2, …, gene 999, gene 1000), for example:
ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG
CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G
AGCAGCATCGTG AGCAGC-TCGTG AGCAGC-TC-TG C-TA-CACGGTG
CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT
Species tree on Orangutan, Gorilla, Chimpanzee, and Human (support 81%)
[Plot: species tree error versus data (number of genes)]
“Gene” here simply means a (recombination-free) part of the genome.
More data ⟶ better inference
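One way to see why more genes should help, at least when gene trees are known without error: for four species under the multi-species coalescent (discussed later in this lecture), each gene tree matches the unrooted species tree with probability p = 1 - (2/3)e^(-d), where d is the internal branch length in coalescent units, and shows each of the two alternatives with probability (1 - p)/2. The minimal sketch below computes the exact probability that the species tree topology is strictly the most frequent among m independent gene trees. Error-free gene trees and the "most frequent topology" estimator are simplifying assumptions, not claims from the slide.

```python
import math


def gene_tree_probs(d: float) -> tuple:
    """MSC probabilities of the three unrooted quartet gene tree topologies:
    (matches species tree, alternative 1, alternative 2)."""
    p_match = 1.0 - (2.0 / 3.0) * math.exp(-d)
    p_alt = (1.0 - p_match) / 2.0
    return p_match, p_alt, p_alt


def prob_majority_correct(d: float, m: int) -> float:
    """P(the species tree topology is strictly the most frequent of m gene trees)."""
    p, q1, q2 = gene_tree_probs(d)
    total = 0.0
    for a in range(m + 1):            # genes matching the species tree
        for b in range(m - a + 1):    # genes showing alternative 1
            c = m - a - b             # genes showing alternative 2
            if a > b and a > c:
                # multinomial probability of the counts (a, b, c)
                total += math.comb(m, a) * math.comb(m - a, b) * p**a * q1**b * q2**c
    return total


if __name__ == "__main__":
    # A short internal branch (d = 0.1) makes single gene trees unreliable.
    for m in (1, 11, 101):
        print(m, round(prob_majority_correct(0.1, m), 3))
```

With gene tree estimation error added, the picture is less rosy, which is one of the challenges discussed later.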
Phylogenomics: only a few years later
Gene tree discordance
[Figure: the species tree of Orangutan, Gorilla, Chimp, and Human, next to gene trees (gene 1, …, gene 1000) over Orang., Gorilla, Chimp, and Human]
Causes of gene tree discordance include:
– Duplication and loss
– Horizontal Gene Transfer (HGT) and Hybridization
– Incomplete Lineage Sorting (ILS)
Gene duplication and loss
• In some species (e.g., plants), gene duplication and loss are rampant.
• Recall from Colin’s talk:
– Paralogs: genes that diverged in a duplication event; e.g., 2A and 3B
– Orthologs: genes that diverged in a speciation event; e.g., 1B and 3B
• A gene tree that includes paralogous genes may differ from the species tree
– We strive to find orthologous genes, but we may fail
Picture from evolution-textbook.org (Fig 5-20), redrawn from Eisen, Genome Research, 1998
Horizontal Gene Transfer (HGT)
• Horizontal gene transfer: an organism picks up DNA from another organism or from the environment rather than from its ancestors
• It may replace a similar gene present in the target or may create a new copy
• Rampant in prokaryotes; observed in plants and other eukaryotes
• A tree may be insufficient; a phylogenetic network may be needed.
[Degnan & Rosenberg, 2009]
Hybridization
• A viable species is created as a result of hybridization between two different species
– The contributions of the parent species to the new species need not be equal.
• A tree is not sufficient; networks are needed
• What does “species” even mean?
– 17 different definitions… a matter of great debate
– Species delineation is an active area of research
[Degnan & Rosenberg, 2009]
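A deliberately simplified illustration of unequal parental contributions: if ILS is ignored (an assumption made only for this sketch), a network with a single hybridization event displays two trees, and each gene follows the first with an inheritance probability gamma and the second with 1 - gamma. The sketch samples gene tree topologies this way; the Newick strings, the taxon names, and gamma = 0.7 are hypothetical.

```python
import random
from collections import Counter


def sample_gene_trees(displayed_trees, inheritance_probs, n_genes, seed=0):
    """Draw one displayed tree per gene, weighted by the inheritance
    probabilities (no ILS in this sketch, so gene trees equal displayed trees)."""
    rng = random.Random(seed)
    draws = rng.choices(displayed_trees, weights=inheritance_probs, k=n_genes)
    return Counter(draws)


# Hypothetical network: hybrid species H inherits from lineages related to A and B.
TREES = ["((H,A),B);", "((H,B),A);"]
GAMMA = 0.7  # hypothetical inheritance probability of the first parent

if __name__ == "__main__":
    counts = sample_gene_trees(TREES, [GAMMA, 1.0 - GAMMA], n_genes=1000)
    for tree, count in counts.items():
        print(tree, count / 1000)
```

Network methods such as those listed later in the lecture also model ILS inside each branch, so real gene tree frequencies mix both effects.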
Incomplete Lineage Sorting (ILS)
[Figure: tracing alleles through generations]
• A random process related to the coalescence of alleles across various populations
• Omnipresent:
– Possible for every gene tree
– Likely for short branches or large population sizes
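The "short branches or large population sizes" point can be made quantitative. Under the multi-species coalescent, for three species with rooted species tree ((A,B),C), the lineages from A and B fail to coalesce in their shared ancestral branch with probability e^(-t), where t is that branch's length in coalescent units (generations divided by 2N for a diploid population of effective size N). If they fail, the three lineages coalesce in random order in the ancestor, and two of the three orders give a discordant gene tree, so P(discordance) = (2/3)e^(-t). A minimal sketch; the branch length and population sizes in the usage lines are hypothetical.

```python
import math


def coalescent_units(generations: float, effective_pop_size: float, ploidy: int = 2) -> float:
    """Branch length in coalescent units: generations / (ploidy * N)."""
    return generations / (ploidy * effective_pop_size)


def prob_no_coalescence(t: float) -> float:
    """Probability that two lineages do not coalesce within a branch of length t."""
    return math.exp(-t)


def prob_discordant_triplet(t: float) -> float:
    """Probability that a rooted 3-species gene tree disagrees with the species
    tree: no coalescence in the internal branch (e^-t), then 2 of the 3 equally
    likely coalescence orders in the ancestral population are discordant."""
    return (2.0 / 3.0) * prob_no_coalescence(t)


if __name__ == "__main__":
    # Hypothetical scenario: the same 20,000-generation branch, growing population size.
    for n in (1_000, 10_000, 100_000):
        t = coalescent_units(generations=20_000, effective_pop_size=n)
        print(n, round(t, 3), round(prob_discordant_triplet(t), 3))
```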
Phylogenetic inference despite discordance
So far… we learned various reasons for discordance.
Gene tree discordance has to be accounted for. How so?
Four main questions in phylogenomics
• Reconciliation:
– Map a given (i.e., known) gene tree onto a species tree
– Explain how a gene tree evolved inside the species tree
• Infer the species tree given a collection of known (or inferred) gene trees
• Infer a gene tree given a known species tree and sequence data for that gene (a.k.a. tree fixing)
• Co-estimate gene trees and the species tree given the sequence data for a collection of genes
• …and others…
Approaches to phylogenomic inference
A. Parsimony-based
B. Model-based
C. Summary-based
Parsimony-based
• Describe discordances between gene trees and the species tree using the minimum number of “events”
• Events: duplications, losses, transfers, deep coalescences
• Relies heavily on fast (linear-time) parsimonious reconciliation between gene trees and the species tree
Figure from Doyon et al., Briefings in Bioinformatics, 2011, doi:10.1093/bib/bbr045
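A common way to reconcile a gene tree with a species tree is the LCA mapping: each gene tree node maps to the lowest species tree node that contains all species below it, and a gene node that maps to the same species node as one of its children is counted as a duplication. The sketch below is a minimal, unoptimized version (not linear time) that counts duplications only and leaves out losses; the trees and the `<species>_<copy>` leaf labels are hypothetical. In the example gene tree, human_a and chimp_a are orthologs, while human_a and chimp_b are paralogs.

```python
from typing import Callable, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label, or a (left, right) pair


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def all_clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node in a rooted binary species tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    return all_clusters(tree[0]) + all_clusters(tree[1]) + [leaf_set(tree)]


def lca(species: FrozenSet[str], clusters: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species tree cluster containing all given species (their LCA).
    Clusters containing a fixed set are nested, so the smallest one is unique."""
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree,
                       species_of: Callable[[str], str]) -> int:
    clusters = all_clusters(species_tree)

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        # Returns (LCA mapping of this gene node, duplications in its subtree).
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]), clusters), 0
        left_map, left_dups = visit(node[0])
        right_map, right_dups = visit(node[1])
        here = lca(left_map | right_map, clusters)
        # Duplication: the node maps to the same species node as one of its children.
        is_dup = here in (left_map, right_map)
        return here, left_dups + right_dups + int(is_dup)

    return visit(gene_tree)[1]


SPECIES = (("human", "chimp"), "gorilla")
# Two copies (a and b) that arose before the human-chimp split.
GENE = (("human_a", "chimp_a"), ("human_b", "chimp_b"))

if __name__ == "__main__":
    print(count_duplications(GENE, SPECIES, lambda g: g.split("_")[0]))  # prints 1
```

The gene tree root maps to the human-chimp ancestor, as do both of its children, so it is inferred to be a duplication. Note that a gene tree such as ((human, gorilla), chimp), discordant only because of ILS, would also be explained here as a duplication plus losses, one reason the choice of discordance model matters.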
Model-based
A. Design a generative model of gene tree evolution
B. ML or Bayesian inference under the model
[Figure: a species tree (Orangutan, Gorilla, Chimp, Human) generates gene trees through a gene evolution model, P(G | S); each gene tree generates sequence data (alignments) through a sequence evolution model, P(D | G). Example alignments:
ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG
CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G
AGCAGCATCGTG AGCAGC-TCGTG AGCAGC-TC-TG C-TA-CACGGTG
CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT]
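The two conditional probabilities in this model combine into a likelihood for the species tree S. Assuming, as is common though not stated on the slide, that the m genes evolve independently given S, and writing the sum over gene trees G loosely (in practice it also integrates over gene tree branch lengths), ML inference maximizes the first expression over S, while Bayesian co-estimation samples from the joint posterior in the second:

```latex
P(D_1, \dots, D_m \mid S) = \prod_{i=1}^{m} \sum_{G} P(D_i \mid G)\, P(G \mid S)

P(S, G_1, \dots, G_m \mid D_1, \dots, D_m) \propto P(S) \prod_{i=1}^{m} P(D_i \mid G_i)\, P(G_i \mid S)
```

The sum over all possible gene trees is what makes exact inference expensive, which motivates the summary-based shortcuts discussed next.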
Models of gene tree evolution
• ILS: modeled by the Multi-Species Coalescent (MSC) model, an extension of Kingman’s coalescent to multiple species
• Duplication and loss: typically modeled using birth-death processes, requiring a birth rate and a death rate (often fixed)
• HGT, gene flow/hybridization: the species phylogeny is modeled as a network (a directed acyclic graph, DAG), and gene trees are stochastically embedded in the network
• Models of combined effects also exist
– DTLSR, DLCoal, ODT, Hybridization+ILS
• Caution: inference under many of these models is difficult
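A minimal sketch of the duplication-loss component, assuming a linear birth-death process in which each gene copy independently duplicates at rate `birth` and is lost at rate `death` along a single species tree branch. It uses the standard Gillespie simulation; the rates and branch length are hypothetical. The sample mean can be checked against the known expectation n0 * e^((birth - death) * t).

```python
import math
import random


def simulate_copies(n0: int, birth: float, death: float, length: float,
                    rng: random.Random) -> int:
    """Gene copy number at the end of a branch of the given length under a
    linear birth-death process (Gillespie simulation)."""
    if birth + death == 0:
        return n0
    n, t = n0, 0.0
    while n > 0:
        # waiting time to the next event among n copies
        t += rng.expovariate(n * (birth + death))
        if t > length:
            break
        # the event is a duplication with probability birth / (birth + death), else a loss
        n += 1 if rng.random() < birth / (birth + death) else -1
    return n


def mean_copies(n0, birth, death, length, replicates=20_000, seed=0):
    """Average copy number over independent simulations."""
    rng = random.Random(seed)
    return sum(simulate_copies(n0, birth, death, length, rng)
               for _ in range(replicates)) / replicates


if __name__ == "__main__":
    lam, mu, t = 0.5, 0.3, 1.0  # hypothetical rates and branch length
    print(round(mean_copies(1, lam, mu, t), 3), round(math.exp((lam - mu) * t), 3))
```

A full gene tree model would run this process down every branch of the species tree and record when each copy was created or lost.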
Summary-based methods
• Use expectations under a statistical model but avoid computing the likelihood
• Are often statistically consistent under some model of gene tree evolution
• Are often based on summary statistics or distance measures
• Usually work in two steps:
– Gene trees are independently inferred from sequence data
– Gene trees are combined to build the species tree
• Let’s see an example…
Under the MSC model of ILS
For a quartet (4 species), the unrooted species tree topology has at least 1/3 probability of appearing in gene trees (Allman, et al. 2010)
[Figure: species tree on Orang., Gorilla, Chimp, and Human with internal branch length d = 0.8; the three unrooted gene tree topologies appear with probabilities θ1 = 70% (the species tree topology), θ2 = 15%, and θ3 = 15%]
The most frequent gene tree = the most likely species tree
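The percentages in the figure follow from a closed form. A minimal sketch, assuming the standard MSC result for an unrooted quartet (stated here, not on the slide): each discordant topology has probability (1/3)e^(-d) and the species tree topology has 1 - (2/3)e^(-d), which exceeds 1/3 whenever d > 0. With d = 0.8 this reproduces the 70%/15%/15% split.

```python
import math


def quartet_topology_probs(d: float) -> tuple:
    """MSC probabilities (theta1, theta2, theta3) of the three unrooted gene tree
    topologies for four species; theta1 is the species tree topology and d is
    the internal branch length of the species tree in coalescent units."""
    theta_alt = math.exp(-d) / 3.0  # each of the two discordant topologies
    return 1.0 - 2.0 * theta_alt, theta_alt, theta_alt


if __name__ == "__main__":
    theta1, theta2, theta3 = quartet_topology_probs(0.8)
    print(f"theta1={theta1:.0%} theta2={theta2:.0%} theta3={theta3:.0%}")
    # prints theta1=70% theta2=15% theta3=15%
```

Because theta1 is always the largest of the three, counting gene tree topologies for a quartet is a consistent way to recover the species tree quartet.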
More than 4 species
[Figure: a tree on Orang., Gorilla, Chimp, Human, and Rhesus]
For >4 species, the species tree topology can be different from the most likely gene tree (the anomaly zone) (Degnan, 2013)
1. Break gene trees into $\binom{n}{4}$ quartets of species
2. Find the dominant tree for all quartets of taxa
3. Combine quartet trees
Some tools: e.g., BUCKy-p [Larget, et al., 2010]
Alternative: weight all $3\binom{n}{4}$ quartet topologies by their frequency and find the optimal tree
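Steps 1 and 2 can be sketched directly. The sketch assumes gene trees are given as nested tuples (read as unrooted trees) and uses the fact that four taxa resolve as ab|cd exactly when some node's leaf set contains two of them and not the other two. Step 3, combining the quartet trees into one species tree, is the hard part and is left out; the function names and example trees are hypothetical.

```python
from collections import Counter
from itertools import combinations


def clusters(tree):
    """Leaf sets of all nodes of a nested-tuple tree; each one defines a split."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    child_lists = [clusters(child) for child in tree]
    node = frozenset().union(*(lst[-1] for lst in child_lists))
    return [c for lst in child_lists for c in lst] + [node]


def quartet_topology(tree_clusters, quartet):
    """Induced topology of four taxa as a frozenset of two pairs, or None if the
    tree lacks one of the taxa or leaves the quartet unresolved."""
    q = frozenset(quartet)
    if not q <= tree_clusters[-1]:  # last cluster is the root (all leaves)
        return None
    for c in tree_clusters:
        side = c & q
        if len(side) == 2:
            return frozenset([side, q - side])
    return None


def dominant_quartets(gene_trees):
    """For every 4-taxon subset, count the topologies induced by the gene trees
    and keep the most frequent one as (topology, count)."""
    per_tree = [clusters(g) for g in gene_trees]
    taxa = sorted(frozenset().union(*(cl[-1] for cl in per_tree)))
    result = {}
    for quartet in combinations(taxa, 4):
        counts = Counter(t for t in (quartet_topology(cl, quartet) for cl in per_tree)
                         if t is not None)
        if counts:
            result[quartet] = counts.most_common(1)[0]  # ties broken arbitrarily
    return result
```

For n taxa this examines all $\binom{n}{4}$ quartets of every gene tree, which is why quartet-based methods need cleverer algorithms for large n.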
Maximum Quartet Support Species Tree
• Optimization problem: find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees
$$\mathrm{Score}(T) = \sum_{i=1}^{m} \lvert Q(T) \cap Q(t_i) \rvert$$
where Q(T) is the set of quartet trees induced by T and t_i is a gene tree (i = 1, …, m)
• NP-hard [Lafond & Scornavacca, 2016]
• Theorem: statistically consistent under the multi-species coalescent model when solved exactly
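A brute-force sketch of Score(T), for intuition only: it enumerates every quartet of every tree, so it costs on the order of n^4 steps per tree and makes no attempt at the (NP-hard) optimization itself. Trees are nested tuples read as unrooted trees, and Q(T) is computed from the splits defined by each node's leaf set; the function names are hypothetical.

```python
from itertools import combinations


def clusters(tree):
    """Leaf sets of all nodes of a nested-tuple tree; each one defines a split."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    child_lists = [clusters(child) for child in tree]
    node = frozenset().union(*(lst[-1] for lst in child_lists))
    return [c for lst in child_lists for c in lst] + [node]


def induced_quartets(tree):
    """Q(T): the resolved quartet trees induced by T, each stored as a
    frozenset of its two sides, e.g. {{a, b}, {c, d}} for ab|cd."""
    cl = clusters(tree)
    out = set()
    for quartet in combinations(sorted(cl[-1]), 4):
        q = frozenset(quartet)
        for c in cl:
            side = c & q
            if len(side) == 2:
                out.add(frozenset([side, q - side]))
                break
    return out


def quartet_score(species_tree, gene_trees):
    """Score(T) = sum over gene trees t_i of |Q(T) ∩ Q(t_i)|."""
    q_t = induced_quartets(species_tree)
    return sum(len(q_t & induced_quartets(g)) for g in gene_trees)
```

Scoring each candidate species tree this way and keeping the best one is the Maximum Quartet Support criterion solved by exhaustive comparison, which is only feasible for a handful of candidates.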
ASTRAL
[Mirarab, et al., Bioinformatics, 2014] [Mirarab and Warnow, Bioinformatics, 2015]
• Solves the Maximum Quartet Support problem exactly using dynamic programming
• Constrains the search space to make large datasets feasible
• The constrained version remains statistically consistent
• The running time of the constrained version grows polynomially with the input size
• Widely adopted for handling incomplete lineage sorting
So far…
Three approaches:
A. Parsimony-based
B. Model-based
C. Summary-based
Four questions:
A. Reconciliation
B. Species tree inference
C. Gene tree inference
D. Co-estimation
Three types of discordance:
A. Duplication and loss
B. HGT & Hybridization
C. ILS
*Each captured by many statistical models
Many, many tools
• Species tree inference
– ILS: ASTRAL, STAR, MP-EST, GLASS, NJst/ASTRID, DISTIQUE, STELLS, …
– Duplication and loss: DupTree, iGTP, DynaDup, MulRF, …
– Hybridization: PhyloNet, PhyloNetworks, SNaQ, …
– Agnostic: BUCKy, MRP, MRL, guenomu, …
• Gene tree correction
– TreeFix, Giga, RefineTree, SPIDIR, SPIMAP, PrIME-GSR, SYNERGY, ALE, ODT (see also Ensembl Compara)
• Reconciliation
– Notung, DLCoalRecon, PrIME, PhyloNet, TreeMap, Korak, CoRe-Pa, Jane, Mowgli, AnGST, …
• Co-estimation
– PHYLDOG, *BEAST, BEST, BBCA, PhyloNet, …
No scalable species/gene tree inference method can currently address all causes of discordance
What do you choose?
• What cause(s) of discordance do you believe to be prevalent?
• Do you need to worry about duplication and loss?
– Or perhaps you have a relatively reliable way of finding orthology? Maybe using whole-genome alignments.
• Do you expect hybridization, gene flow, or HGT for your taxa?
• Is ILS likely to be present?
– Short internal branches
– Very high population sizes
What do you choose?
• Statistical noise cannot be ignored
• Inferring gene trees from short sequences is often error-prone
• Tree-fixing methods may help
• Co-estimation is less prone to gene tree estimation error
[Figure: distribution of branch bootstrap support (x-axis: branch bootstrap support, 0% to 100%; y-axis: percentage of branches, 0% to 20%), with the median and mean marked; Jarvis, et al., Science, 2014]
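Branch support values like those in the figure come from the nonparametric bootstrap: resample alignment columns with replacement, re-infer the tree on each pseudo-replicate, and report how often each branch recurs. A minimal sketch of the idea, using the four-sequence alignment D from earlier in the lecture as a toy gene. To stay self-contained it infers the quartet with a simple four-point criterion on p-distances instead of maximum likelihood, ties are broken by listing order, and the taxon labels s1 to s4 are placeholders.

```python
import random
from collections import Counter

# Alignment D from earlier in the lecture; the labels s1..s4 are placeholders.
ALIGNMENT = {
    "s1": "ACTGCACACCG",
    "s2": "ACTGC-CCCCG",
    "s3": "AATGC-CCCCG",
    "s4": "-CTGCACACGG",
}


def p_distance(a, b, columns):
    """Proportion of differing sites among the given columns, skipping gaps."""
    pairs = [(a[i], b[i]) for i in columns if a[i] != "-" and b[i] != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def four_point_topology(alignment, columns):
    """Choose the split ab|cd that minimizes d(a,b) + d(c,d) (four-point criterion)."""
    a, b, c, d = sorted(alignment)
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def cost(split):
        return sum(p_distance(alignment[x], alignment[y], columns) for x, y in split)

    return min(splits, key=cost)


def bootstrap_support(alignment, replicates=1000, seed=0):
    """Resample columns with replacement, re-infer the quartet each time,
    and return the fraction of replicates supporting each topology."""
    rng = random.Random(seed)
    n_cols = len(next(iter(alignment.values())))
    counts = Counter()
    for _ in range(replicates):
        columns = [rng.randrange(n_cols) for _ in range(n_cols)]
        counts[four_point_topology(alignment, columns)] += 1
    return {split: k / replicates for split, k in counts.items()}
```

With so few informative columns, each pseudo-replicate can easily change which split wins, which is the kind of gene tree estimation noise this slide warns about.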
What do you choose?
• What is the dataset size?
• Methods differ vastly in terms of scalability
• Fast/scalable: parsimony-based and summary methods (+ ML gene trees)
• Slower: statistical models, especially when considering multiple causes of discordance
• Slowest: co-estimation
• Not all approaches are implemented in an optimized manner
• Some methods are easier to parallelize than others
Takeaway messages
• Causes of gene tree discordance are varied and can co-exist
• Inference under discordance is an ongoing area of research
• Dataset size matters!
• Thinking about uncertainty and error is important
References
• Maddison, Wayne P. “Gene Trees in Species Trees.” Systematic Biology 46, no. 3 (September 1, 1997): 523–36. https://doi.org/10.2307/2413694.
• Degnan, James H., and Noah A. Rosenberg. “Gene Tree Discordance, Phylogenetic Inference and the Multispecies Coalescent.” Trends in Ecology and Evolution 24, no. 6 (June 1, 2009): 332–40. https://doi.org/10.1016/j.tree.2009.01.009.
• Doyon, J.-P., Vincent Ranwez, Vincent Daubin, and V. Berry. “Models, Algorithms and Programs for Phylogeny Reconciliation.” Briefings in Bioinformatics 12, no. 5 (September 22, 2011): 392–400. https://doi.org/10.1093/bib/bbr045.
• Szöllősi, G. J., E. Tannier, Vincent Daubin, and Bastien Boussau. “The Inference of Gene Trees with Species Trees.” Systematic Biology 64, no. 1 (July 28, 2014): e42–62. https://doi.org/10.1093/sysbio/syu048.
• Warnow, Tandy. Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation. Cambridge University Press, 2017.