Building and visualizing phylogeny
Henrik Lantz
Dept. of Medical Biochemistry and Microbiology, BMC, Uppsala University
Me
PhD in Plant Systematics, postdoc in Fungal Systematics
Now bioinformatician working with genomic and transcriptomic data, mostly annotation of genomes
You
How many of you have experience with inferring phylogenies?
How many of you have experience with working with sequence data on the computer?
This lecture
Basic facts about phylogenies and nomenclature used
How to infer a phylogeny from sequence data
How to visualize phylogenies
What is phylogeny?
Evolutionary relationship of organisms or genes - anything related by descent
Often visualized as a phylogenetic tree
From individual to phylogeny
Zoooooming out…
Overview
Organism based phylogeny
Gene families
A simple phylogeny
[Figure: two rooted trees with tip orders A B C D and C D A B; labels: Branch, Node, Root, Time]
Clades
[Figure: two rooted trees with tip orders A B C D and C D A B, each with its root marked]
Sister groups
A B C D E F G H
Sister-group relationships
Mouse Rabbit Chimp Human Birds Kangaroo Horse Dog
A B C D E F G H
Monophyletic
Paraphyletic
Support values
Branch lengths
Phylogeny 1-2-3
1. Sequence data, nucleotides or amino acids, in FASTA format
2. Align the sequences
3. Run the alignment in the phylogeny program
4. Visualize the results in a tree viewer
Phylogeny.fr does all of this!
Expasy
Phylogeny.fr
1. FASTA format
>CO1_species1
ACGTGTCCGA...
>CO1_species2
TCCGATGAAC...
>CO1_species3
GTGTCCGATC...
Etc.
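To make the format concrete, here is a minimal sketch of reading FASTA text in Python. The function name `parse_fasta` is made up for this illustration and is not from any particular library. It assumes each record starts with a `>` header line and that the sequence may continue over several lines.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            # sequence lines belonging to the most recent header
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# The three example records from the slide, without the trailing "..."
example = """>CO1_species1
ACGTGTCCGA
>CO1_species2
TCCGATGAAC
>CO1_species3
GTGTCCGATC
"""

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

For real projects an established parser (for example the one in Biopython) is usually a safer choice than a hand-written one.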
2. Alignment
From:
>CO1_species1
ACGTGTCCGA
>CO1_species2
TCCGATGAAC
>CO1_species3
GTGTCCGATC
To:
>CO1_species1
ACGTGTCCGA-----
>CO1_species2
-----TCCGATGAAC
>CO1_species3
--GTGTCCGATC---
2. Alignment
>CO1_species1
ACGTGTCCGA-----
>CO1_species2
-----TCCGATGAAC
>CO1_species3
--GTGTCCGATC---
>CO1_species4
ACGTGACCGATC---
>CO1_species5
-CGTGACCGATCAAC
>CO1_species6
ACGTGTCCGATGAAC
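An alignment puts every sequence on the same column coordinates, so sequences can be compared site by site. The sketch below checks that all six aligned sequences have the same length, then computes the proportion of differing sites (often called the p-distance) between two of them. The helper name `p_distance` and the choice to skip any column containing a gap are assumptions made for this illustration.

```python
# The six aligned sequences from the slide
alignment = {
    "CO1_species1": "ACGTGTCCGA-----",
    "CO1_species2": "-----TCCGATGAAC",
    "CO1_species3": "--GTGTCCGATC---",
    "CO1_species4": "ACGTGACCGATC---",
    "CO1_species5": "-CGTGACCGATCAAC",
    "CO1_species6": "ACGTGTCCGATGAAC",
}


def p_distance(a, b, gap="-"):
    """Proportion of differing sites, counting only columns without gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no overlapping ungapped columns")
    return differing / compared


# All rows of an alignment must have the same number of columns
lengths = {len(seq) for seq in alignment.values()}
print("alignment lengths:", lengths)

d = p_distance(alignment["CO1_species4"], alignment["CO1_species6"])
print("p-distance species4 vs species6:", round(d, 3))
```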
Homology and orthology
Homology - traits shared due to common ancestry, e.g., fingered forelimbs in birds and mammals
Analogy - traits of similar function, but not due to shared ancestry, e.g., wings in birds and insects
Orthology - Sequences were split due to speciation events
Paralogy - Sequences were split due to duplication events
Outgroup
Should be something outside of the study group. Used to orient the tree.
If you can, pick several taxa as outgroup
Most phylogenetic programs pick the uppermost sequence in your input file as the outgroup unless you tell the program otherwise
3. Build the phylogeny - Phylogenetic methods
UPGMA - Do not use!
Neighbor joining - Not recommended
Parsimony - Outdated
Likelihood methods - Use one of these
- Maximum Likelihood: PhyML, RAxML
- Bayesian Methods: MrBayes
Substitution models
Jukes-Cantor - Substitution rates between all nucleotides are the same
Kimura - Different rates for transitions and transversions (Purines A, G; Pyrimidines C, T)
GTR - Different rates for all substitution types. Used by Phylogeny.fr
Amino acid models become much more complex as there are 20 states rather than 4
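As a rough illustration of what the two simplest models do, the sketch below classifies each difference between two aligned sequences as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion, and then applies the standard Jukes-Cantor and Kimura two-parameter distance corrections. The function names are made up for this example, and the input is assumed to be uppercase nucleotides. GTR is not covered because it has no comparably simple closed-form distance.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_changes(a, b):
    """Return (sites, transitions, transversions) over columns where both are A/C/G/T."""
    sites = ts = tv = 0
    for x, y in zip(a, b):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, ts, tv


def jukes_cantor(a, b):
    """JC distance: one rate for every change. Undefined when p >= 0.75."""
    sites, ts, tv = count_changes(a, b)
    p = (ts + tv) / sites
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(a, b):
    """Kimura two-parameter distance: separate transition/transversion rates."""
    sites, ts, tv = count_changes(a, b)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Two aligned sequences from the alignment slide
s4 = "ACGTGACCGATC---"
s6 = "ACGTGTCCGATGAAC"
print("sites, transitions, transversions:", count_changes(s4, s6))
print("Jukes-Cantor distance:", jukes_cantor(s4, s6))
print("Kimura 2-parameter distance:", kimura_2p(s4, s6))
```

Both corrected distances come out larger than the raw proportion of differing sites, because the models account for multiple substitutions at the same site.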
4. Visualization