CSCE555 Bioinformatics
Lecture 12 Phylogenetics I
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu
HAPPY CHINESE NEW YEAR
Outline
• Introduction to Evolution
• What is phylogeny and phylogenetics
• Application of phylogenetics
• Algorithms for phylogenetic inference
How did life evolve on earth?
Courtesy of the Tree of Life project
• An international effort to understand how life evolved on earth
• Biomedical applications: drug design, protein structure and function prediction, biodiversity.
Evolution
Evolution of new organisms is driven by
• Mutations
  ◦ The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.
• Selection bias
Theory of Evolution
Basic idea
◦ Speciation events lead to creation of different species.
◦ Speciation is caused by physical separation into groups where different genetic variants become dominant.
Any two species share a (possibly distant) common ancestor
Primate evolution
A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of current day species; it is also called a phylogenetic tree.
DNA Sequence Evolution
[Figure: a tree of DNA sequences evolving along a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today)]
Sequences shown, from the ancestral sequence toward the present:
AAGACTT
AAGGCCT, TGGACTT
AGGGCAT, TAGCCCT, AGCACTT
AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
Morphological vs. Molecular
• Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
• Modern biological methods allow the use of molecular features
  ◦ Gene sequences
  ◦ Protein sequences
  ◦ Whole genome sequences, e.g. rearrangements
Morphological topology
[Figure: tree of mammalian species, from Bonobo and Chimpanzee to Opossum and Platypus, grouped into Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra]
(Based on McKenna and Bell, 1997)
Rat QEPGGLVVPPTDA
Rabbit QEPGGMVVPPTDA
Gorilla QEPGGLVVPPTDA
Cat REPGGLVVPPTEG
From sequences to a phylogenetic tree
There are many possible types of sequences to use (e.g. Mitochondrial vs Nuclear proteins).
Mitochondrial topology
[Figure: tree of mammalian species, from Donkey and Horse to Opossum and Platypus, grouped into Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia]
(Based on Pupko et al.,)
Phylogenetic trees
• Leaves - current day species (or taxa – plural of taxon)
• Internal vertices - hypothetical common ancestors
• Edge lengths - “time” from one speciation to the next
[Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
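One way to hold such a tree in memory is sketched below. This is a minimal illustration, not a structure from the lecture: the `Node` class, the topology, and the branch lengths are all hypothetical; only the five leaf names come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A vertex of a rooted phylogenetic tree (hypothetical helper class)."""
    name: Optional[str] = None      # species name for leaves, None for ancestors
    branch_length: float = 0.0      # length of the edge leading to the parent
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> List[str]:
        """Return the leaf names in left-to-right order."""
        if self.is_leaf():
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names


# Topology and branch lengths are made up for illustration.
tree = Node(children=[
    Node(branch_length=1.0, children=[Node("Aardvark", 2.0), Node("Bison", 2.0)]),
    Node(branch_length=1.5, children=[
        Node("Chimp", 1.5),
        Node(branch_length=0.5, children=[Node("Dog", 1.0), Node("Elephant", 1.0)]),
    ]),
])

print(tree.leaves())
```

Leaves carry a name, internal vertices (the hypothetical ancestors) do not, and each vertex stores the length of the edge to its parent.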
Types of Trees
A natural model to consider is that of rooted trees
[Figure: rooted tree with the root labeled "Common Ancestor"]
Types of trees
An unrooted tree represents the same phylogeny without the root node
Depending on the model, data from current day species does not distinguish between different placements of the root.
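Rooted versus unrooted trees
[Figure: three rooted trees (Tree a, Tree b, Tree c) next to one unrooted tree with branches labeled a, b, c]
The unrooted tree represents the three rooted trees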
What is phylogenetics?
Phylogenetics is the study of evolutionary relationships among and within species.
◦ Inference of trees from data
◦ Interpreting the evolutionary tree
◦ Application of evolutionary trees
[Figure: tree with leaves crocodiles, birds, lizards, snakes, rodents, primates, marsupials]
This is an example of a phylogenetic tree.
Applications of phylogenetics
• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?
• Medicine: What are the evolutionary relationships among the various prion-related diseases?
HIV case
Applications of phylogenetics
1. Forensics
Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
Phylogenetic analysis
So what do the results mean?
• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?
Applications of phylogenetics
2. Conservation
How much gene flow is there among local populations of island foxes off the coast of California?
http://bioquest.org/bedrock/
Wayne, K. R, Morin, P.A. 2004 Conservation Genetics in the New Molecular Age, Frontiers in Ecology and the Environment. 2: 89-97. (ESA publication)
Applications of phylogenetics
3. Medicine
What are the evolutionary relationships among the various prion-related diseases?
Inferring Phylogenies
Trees can be inferred from:
◦ Morphology of the organisms
◦ Sequence comparison
Example:
Orc: ACAGTGACGCCCCAAACGT
Elf: ACAGTGACGCTACAAACGT
Dwarf: CCTGTGACGTAACAAACGA
Hobbit: CCTGTGACGTAGCAAACGA
Human: CCTGTGACGTAGCAAACGA
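A simple way to compare the five example sequences is to count mismatching positions (the Hamming distance), which works here because all five have the same length. This is a sketch of one possible comparison, not necessarily the measure used in the lecture; the function name `hamming` is a hypothetical helper.

```python
from itertools import combinations

# Sequences copied from the example above.
sequences = {
    "Orc":    "ACAGTGACGCCCCAAACGT",
    "Elf":    "ACAGTGACGCTACAAACGT",
    "Dwarf":  "CCTGTGACGTAACAAACGA",
    "Hobbit": "CCTGTGACGTAGCAAACGA",
    "Human":  "CCTGTGACGTAGCAAACGA",
}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Distance for every unordered pair of species.
distances = {
    (s, t): hamming(sequences[s], sequences[t])
    for s, t in combinations(sequences, 2)
}

for (s, t), d in distances.items():
    print(f"{s:>6} vs {t:<6}: {d}")
```

Pairs with few differences (Hobbit and Human are identical, Dwarf differs from Hobbit at one position) would be expected to sit close together in the inferred tree.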
How Many Trees?
For N = 3, 4, 5, 6, 10, 30 sequences (and for general N), how many are there of each of the following?
• # pairwise distances
• Unrooted trees: # trees, # branches / tree
• Rooted trees: # trees, # branches / tree
(assuming bifurcation only)
How Many Trees?
                                   Unrooted trees                    Rooted trees
# sequences  # pairwise distances  # trees       # branches / tree  # trees       # branches / tree
3            3                     1             3                  3             4
4            6                     3             5                  15            6
5            10                    15            7                  105           8
6            15                    105           9                  945           10
10           45                    2,027,025     17                 34,459,425    18
30           435                   8.69 x 10^36  57                 4.95 x 10^38  58
N            N(N-1)/2              (see below)   2N-3               (see below)   2N-2

# unrooted trees for N sequences: (2N-5)! / (2^{N-3} (N-3)!)
# rooted trees for N sequences: (2N-3)! / (2^{N-2} (N-2)!)
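The counts in the table can be checked with a few lines of code. The sketch below evaluates the two formulas, (2N-5)! / (2^{N-3} (N-3)!) for unrooted trees and (2N-3)! / (2^{N-2} (N-2)!) for rooted trees, assuming bifurcating trees only; the function names are hypothetical.

```python
from math import factorial


def num_unrooted_trees(n: int) -> int:
    """Number of bifurcating unrooted trees on n labeled leaves (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n: int) -> int:
    """Number of bifurcating rooted trees on n labeled leaves (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_pairwise_distances(n: int) -> int:
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


for n in (3, 4, 5, 6, 10, 30):
    print(n, num_pairwise_distances(n),
          num_unrooted_trees(n), 2 * n - 3,
          num_rooted_trees(n), 2 * n - 2)
```

The number of trees grows far faster than the number of pairwise distances, which is why exhaustive search over all trees is only feasible for very small N.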
Phylogenetic Methods
Many different procedures exist. Three of the most popular:
Neighbor-joining
• Minimizes distance between nearest neighbors
Maximum parsimony
• Minimizes total evolutionary change
Maximum likelihood
• Maximizes likelihood of observed data
Comparison of Methods
Neighbor-joining
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees
Maximum parsimony
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, strong conservation)
Maximum likelihood
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods
Distance-based tree construction
• Goal: a weighted tree that realizes the distances between the objects.
• Given a set of species (leaves in a supposed tree) and distances between them, construct a phylogeny which best “fits” the distances.
Distance Matrix
Given n species, we can compute the n x n distance matrix Dij
Dij may be defined as the edit distance between a gene in species i and species j, where the gene of interest is sequenced for all n species.
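A sketch of how such a matrix could be computed, assuming the standard unit-cost edit (Levenshtein) distance; the slides do not specify the cost model, and the three gene sequences below are made up for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(genes: List[str]) -> List[List[int]]:
    """Symmetric n x n matrix D with D[i][j] = edit distance between genes i and j."""
    n = len(genes)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(genes[i], genes[j])
    return D


# Hypothetical versions of one gene in three species.
genes = ["ACGT", "ACGGT", "AGT"]
D = distance_matrix(genes)
for row in D:
    print(row)
```

Only the upper triangle is computed, since the distance is symmetric and the diagonal is zero.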
Distances in Trees
• Edges may have weights reflecting:
  ◦ Number of mutations on evolutionary path from one species to another
  ◦ Time estimate for evolution of one species into another
• In a tree T, we often compute dij(T) - the length of a path between leaves i and j
Distance in Trees: an Example
[Figure: weighted tree with leaves i and j marked]
d1,4 = 12 + 13 + 14 + 17 + 12 = 68
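The path length can be computed by walking the tree from one leaf to the other and summing edge weights. In the sketch below only the five weights on the path from leaf 1 to leaf 4 (12, 13, 14, 17, 12) come from the example; the vertex names a to d, the other leaves, and their edge weights are hypothetical.

```python
from collections import defaultdict

edges = [
    # Path from leaf 1 to leaf 4; weights taken from the example sum.
    ("1", "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17), ("d", "4", 12),
    # Hypothetical extra leaves and weights, for illustration only.
    ("2", "a", 11), ("3", "b", 16), ("5", "c", 15), ("6", "d", 10),
]


def build_adjacency(edge_list):
    """Undirected weighted adjacency map: adj[u][v] = weight of edge (u, v)."""
    adj = defaultdict(dict)
    for u, v, w in edge_list:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def path_length(adj, start, goal):
    """Length of the unique path between two vertices of a tree (iterative DFS)."""
    stack = [(start, None, 0)]              # (vertex, parent, distance so far)
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, weight in adj[node].items():
            if neighbor != parent:          # never walk back along the same edge
                stack.append((neighbor, node, dist + weight))
    raise ValueError(f"no path from {start} to {goal}")


adj = build_adjacency(edges)
print(path_length(adj, "1", "4"))  # 12 + 13 + 14 + 17 + 12 = 68
```

Because a tree has exactly one path between any two vertices, the search needs no visited set beyond remembering the parent.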
Fitting Distance Matrix
• Given n species, we can compute the n x n distance matrix Dij
• Evolution of these genes is described by a tree that we don’t know.
• We need an algorithm to construct a tree that best fits the distance matrix Dij
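"Best fits" needs a score. One common choice, assumed here and not stated in the slides, is the sum of squared differences between the observed distances Dij and the tree path lengths dij(T). The matrices below are made-up numbers for illustration.

```python
def squared_error(D, d_tree):
    """Sum over pairs i < j of (D[i][j] - d_tree[i][j])^2 (assumed fit criterion)."""
    n = len(D)
    return sum(
        (D[i][j] - d_tree[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical observed distances for three species.
D = [[0, 3, 5],
     [3, 0, 4],
     [5, 4, 0]]

# Path lengths in two hypothetical candidate trees.
d_tree_exact = [[0, 3, 5],
                [3, 0, 4],
                [5, 4, 0]]
d_tree_rough = [[0, 2, 5],
                [2, 0, 5],
                [5, 5, 0]]

print(squared_error(D, d_tree_exact))  # a tree that realizes D exactly scores 0
print(squared_error(D, d_tree_rough))
```

Under this criterion, a tree-building algorithm would prefer the candidate with the smaller score.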
Summary
• Evolution and Phylogeny
• Concepts of Phylogenetics
• Application of Phylogenetics
• Category of phylogenetic inference algorithms
Next lecture:
• Detailed algorithms for phylogenetic inference
Acknowledgement
Anonymous authors