lecture 5 : phylogenies
![Page 1: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/1.jpg)
Lecture 5 : Phylogenies
9/16/09
![Page 2: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/2.jpg)
Translated blast = protein vs translated database
![Page 3: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/3.jpg)
Blasting Genbank - blastn
Z. bruijni = long-beaked echidna
T. aculeatus = echidna
T. rostratus = honey possum
AX8GS9DG01S
![Page 4: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/4.jpg)
Blasting Genbank - discontiguous megablast - exactly the same as blastn
Z. bruijni = long-beaked echidna
T. aculeatus = echidna
T. rostratus = honey possum
AX9N23U7014
![Page 5: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/5.jpg)
Blasting Genbank - megablast - same species but different order
Z. bruijni = long-beaked echidna
T. aculeatus = echidna
T. rostratus = honey possum
AX9TUM1G016
![Page 6: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/6.jpg)
Blasting Genbank - tblastn
AX9DYYTE01N
T. aculeatus = echidna
S. brachyurus = quokka
S. crassicaudata = fat-tailed dunnart
M. fasciatus = numbat
I. obesulus = quenda
![Page 7: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/7.jpg)
Species found by BLAST
I. obesulus = quenda = bandicoot
T. aculeatus = echidna
M. fasciatus = numbat
T. rostratus = honey possum
S. crassicaudata = fat-tailed dunnart
O. anatinus = platypus
S. brachyurus = quokka
Z. bruijni = long-beaked echidna
![Page 8: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/8.jpg)
Homologene - can be reached from NCBI home page
Scroll down - they are listed alphabetically
![Page 9: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/9.jpg)
Questions
Phylogenies - what are they?
1. How do we build them?
2. What do they tell us?
![Page 10: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/10.jpg)
Phylogeny
Evolutionary history of a group of organisms, especially as depicted in a family tree
Haeckel, 1879
![Page 11: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/11.jpg)
Things trees might tell you:
How are organisms with a particular trait related?
Did the trait evolve multiple times or only once?
What is the evolutionary pathway of organisms and of genes?
![Page 12: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/12.jpg)
Molecules can be used to learn how organisms are related
![Page 13: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/13.jpg)
To learn about vertebrate evolution: Compare >600 genes
1998
![Page 14: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/14.jpg)
Used genes to measure time
1) Time since common ancestor with human
2) Time since two groups diverged
![Page 15: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/15.jpg)
More recent version of vertebrate evolution which shows divergence times on the animal tree
Ponting 2008
![Page 16: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/16.jpg)
[Figure: vertebrate tree with tip labels Orangutan, Human, Chimp, Rhesus monkey, Mouse, Rat, Dog, Cat, Horse, Cow, Opossum, Wallaby, Platypus, Anole, Chicken, Frog, Fish (Medaka, Fugu, Tetraodon, Zebrafish), Elephant shark, Lamprey]
![Page 17: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/17.jpg)
Primates 25 MY
Mammals 100 MY
Fish 320 MY
Tetrapods 420 MY
All vertebrates 550 MY
![Page 18: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/18.jpg)
Molecular clock
Molecules change at a steady rate
We can calibrate how fast they change using fossils
The molecules then become a timepiece to measure how recently different groups split off from each other
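The calibration idea can be sketched with a little arithmetic. This is a minimal illustration, assuming a strictly constant rate and the usual relation d = 2 × rate × time (differences accumulate along both lineages since the split). The function names and every number below are invented for illustration; none of them come from the lecture.

```python
def substitution_rate(distance: float, fossil_age_my: float) -> float:
    """Rate per site per million years along one lineage.

    Assumes the distance accumulated on both lineages since the split,
    hence the factor of 2.
    """
    return distance / (2.0 * fossil_age_my)


def divergence_time(distance: float, rate: float) -> float:
    """Time since two groups split (in MY), given a calibrated rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: a fossil dates one split at 100 MY, and the
# two groups differ at 20% of sites in the gene of interest.
rate = substitution_rate(0.20, 100.0)

# Hypothetical query: another pair of groups differs at 5% of sites.
estimated_age = divergence_time(0.05, rate)
print(f"rate = {rate:.4f} per site per MY, split at about {estimated_age:.0f} MY")
```

The sketch only holds while the clock is close to linear, that is, for small distances.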
![Page 19: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/19.jpg)
Sequence conservation may be high
Gene might code for a protein which is highly constrained
Might have to interact with lots of other proteins
Selection might be quite strong
![Page 20: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/20.jpg)
Sequence conservation may be low
Not much constraint
Few sites of interaction
Selection might be weak
![Page 21: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/21.jpg)
Phylogeny steps
Align sequences so homologous amino acids (AA) can be compared
Determine the similarity between sequences
Use this to generate a relationship between sequences
![Page 22: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/22.jpg)
Clustalw2 to align sequences
![Page 23: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/23.jpg)
Put sequences in FASTA file
>TetraodonG1
MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGTPINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPLFGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVRAAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCAFFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGGAVDDETSVSASKTEVSSVS
>ZebrafishG1
MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGFPINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVMEGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPLFGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKAAAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPAFFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA
>CichlidG1
MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGTPINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAIEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPLFGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKAAAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPAFFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGGMVEDETSVSTSKTEVSSVS
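For readers who want to check a FASTA file before submitting it for alignment, here is a minimal reading sketch. It assumes the usual convention: a `>` header line followed by one or more sequence lines. The function name `parse_fasta` and the record names `seqA` and `seqB` are hypothetical, and the short sequences are just fragments for illustration.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The name is the first word after '>' (empty if none given).
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped; join them back together.
            records[name] += line
    return records


example = """>seqA
MNGTEGSNFY
IPMSNRTG
>seqB
MVWDGGIEPN
"""

records = parse_fasta(example)
for record_name, sequence in records.items():
    print(record_name, len(sequence))
```

A header must sit on its own line; a name fused directly to the sequence would be read as one long name with no sequence.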
![Page 24: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/24.jpg)
Aligned sequences .aln ; Jalview gives colored version
Funky tree .dnd (need special program to draw)
Scroll down this page for tree (use Phylogram)
![Page 25: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/25.jpg)
CLUSTAL W (1.83) multiple sequence alignment

TetraodonG1     MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGT 60
CichlidG1       MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGT 60
ZebrafishG1     --------MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGF 52
                         *****.***********:****::*.****.:*  ** **:***:**  *

TetraodonG1     PINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAV 120
CichlidG1       PINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAI 120
ZebrafishG1     PINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVM 112
                *** ***.****:***************.** **  ****::: .:: **: **.  *.:

TetraodonG1     EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPL 180
CichlidG1       EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPL 180
ZebrafishG1     EGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPL 172
                ***:*****:**************************:. ** .*: ***:** :** ***

TetraodonG1     FGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVR- 239
CichlidG1       FGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKA 240
ZebrafishG1     FGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKA 232
                ******:***** ********* * :******:***  ** :**  ********* **:

TetraodonG1     AAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCA 299
CichlidG1       AAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPA 300
ZebrafishG1     AAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPA 292
                ******:* *****::***** :***:***.**.***:*:.***:*:**:* . : *: *

TetraodonG1     FFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGG--AVDDETS-VSASKTEVSSVS-- 351
CichlidG1       FFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGG--MVEDETS-VSTSKTEVSSVS-- 352
ZebrafishG1     FFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA 349
                **:*:**::**:****:*****.***.*:  *     :**:* **:*********
![Page 26: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/26.jpg)
Alignment is key
Any other analysis that you do is only as good as your alignment
If your alignment is bad subsequent analyses will be bad
Junk in = Junk out
![Page 27: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/27.jpg)
Alignments
Tell you about sequence conservation
How much is there?
Where is it?
![Page 28: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/28.jpg)
Calculate sequence similarities
Zebrafish   M--------NGTEGSNFYIPMSNR
Trout       M------Q-NGTEGSNFYIPMSNR
Medaka      M------E-NGTEGKNFYIPMNNR
Cod         M----RMEANGTEGKNFYIPMSNR
Halibut     MVWDGGIEPNGTEGKNFYIPMSNR
Tetraodon   MVWDGGIEPNGTEGKNFYIPMSNR
Goldfish    M--------NGTEGNNFYVPLSNR
Killifish   M---GYG-PNGTEGNNFYIPMSNK
            *        *****.***:*:.*:
Pairwise comparisons
![Page 29: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/29.jpg)
Use tree to show sequence relationships
Short branches mean sequences are more similar
Long branches mean there are more differences
![Page 30: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/30.jpg)
Q3. How do we build phylogenies?
Assume the relationships involve bifurcating branches
[Figure: bifurcating tree relating the sequences ATC, ATG, ACG, CCG and CCC]
![Page 31: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/31.jpg)
Methods to determine similarities
Parsimony
Distance
Maximum likelihood
Bayesian
![Page 32: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/32.jpg)
Parsimony
The least complex explanation is the most likely to be correct (Occam's razor)
The preferred phylogenetic tree is the one that requires the fewest changes
Count up the number of changes for all possible trees
Find the shortest one
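Counting changes on a candidate tree can be sketched as follows. The lecture does not say how the counting is done; the sketch below assumes Fitch's small-parsimony algorithm, a standard way to get the minimum number of changes for a fixed tree. The taxon names `a` to `d` and the nested-tuple tree encoding are hypothetical; the four sequences are the ones used on the parsimony slides.

```python
from typing import Dict, Set, Tuple


def fitch_site(tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible ancestral states, minimum changes) for one site.

    A tree is either a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # The children disagree: one change is required on this split.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment: Dict[str, str]) -> int:
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


alignment = {"a": "ATCG", "b": "ATCG", "c": "ACCG", "d": "ACCG"}

# The three possible groupings of four taxa.
trees = {
    "((a,b),(c,d))": (("a", "b"), ("c", "d")),
    "((a,c),(b,d))": (("a", "c"), ("b", "d")),
    "((a,d),(b,c))": (("a", "d"), ("b", "c")),
}

scores = {label: parsimony_score(t, alignment) for label, t in trees.items()}
best = min(scores, key=scores.get)
for label, score in scores.items():
    print(label, score)
print("most parsimonious:", best)
```

Grouping the two identical pairs together needs a single change, while either alternative needs two. Enumerating every tree, as here, is only practical for a handful of taxa.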
![Page 33: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/33.jpg)
Trees based on parsimony
[Figure: alternative trees grouping the sequences ATCG, ATCG, ACCG and ACCG, with C-T changes marked on the branches; one tree is labeled "Most parsimonious"]
![Page 34: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/34.jpg)
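Trees based on parsimony
[Figure: the same alternative trees shown for the single variable site (T, T, C, C), with C-T changes marked on the branches; one tree is labeled "Most parsimonious"]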
![Page 35: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/35.jpg)
Can't always distinguish tree topologies
[Figure: two trees with tips T, T, C, C, each with C-T changes marked; labeled "Equally parsimonious"]
![Page 36: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/36.jpg)
Other limitations
All changes are weighted the same
C-T same as C-A
Same no matter how long it takes for the change to occur
![Page 37: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/37.jpg)
Distance methods
Calculate a numerical value for sequence differences
Do for all pairwise combinations
Build tree by joining the most similar sequences first and then the more divergent ones
![Page 38: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/38.jpg)
Distance methods
Fast
Pretty robust
Only deals with data in pairs
![Page 39: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/39.jpg)
Pairwise distances
Taxa1 AACGGTCATGGCGTTGCATT
Taxa2 AACGGTCAGGGCGTTGCATT
Taxa3 AACGGTCACGCCGCTGCATT

    1    2    3
1   0    .05  .15
2   .05  0    .15
3   .15  .15  0
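The matrix above can be reproduced with a few lines of code. This is a minimal sketch that assumes the simplest distance, the fraction of aligned sites that differ; the function name `p_distance` and the dictionary layout are illustrative choices, not from the lecture.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


taxa = {
    "1": "AACGGTCATGGCGTTGCATT",
    "2": "AACGGTCAGGGCGTTGCATT",
    "3": "AACGGTCACGCCGCTGCATT",
}

names = list(taxa)
# Full symmetric matrix: one row per taxon, one column per taxon.
matrix = [[p_distance(taxa[r], taxa[c]) for c in names] for r in names]

print("  " + "    ".join(names))
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

Taxa 1 and 2 differ at one site out of 20, and taxon 3 differs from each of them at three sites, which gives the .05 and .15 entries.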
![Page 40: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/40.jpg)
Distance, d
p is the fraction of sites that are identical between two sequences
Simplest form of distance: d = 1 - p
AACGGTCATGGCGTTGCATT
AACGGTCACGGCGTTGCATT
p = 19/20 = 0.95, so d = 0.05
![Page 41: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/41.jpg)
Tree building
Neighbor joining
Join the most similar pair of sequences
Add the more divergent sequences after

    1    2    3
1   0    .05  .15
2   .05  0    .15
3   .15  .15  0

[Figure: tree with tips 1, 2 and 3]
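The "join the most similar pair first" idea can be sketched as a simple clustering loop. Note the swap: the sketch below is average-linkage clustering (UPGMA style), not neighbor joining itself. Real neighbor joining also corrects each pairwise distance for how far each taxon is from all the others before choosing a pair. The function name `join_most_similar` and the nested-tuple output are hypothetical.

```python
from itertools import combinations


def join_most_similar(names, distances):
    """Repeatedly merge the closest pair of clusters.

    `distances` maps frozenset({a, b}) to the distance between a and b.
    Returns the tree as nested tuples.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of taxa in it
    d = dict(distances)
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Distance to the merged cluster is the size-weighted average.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


distances = {
    frozenset(("1", "2")): 0.05,
    frozenset(("1", "3")): 0.15,
    frozenset(("2", "3")): 0.15,
}

tree = join_most_similar(["1", "2", "3"], distances)
print(tree)
```

With the matrix from the slide, taxa 1 and 2 are joined first and taxon 3 is added afterwards.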
![Page 42: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/42.jpg)
How different can 2 sequences get?
At infinite time, two sequences match only by random chance
Probability a base is the same = 1/4
DNA only has 4 bases
Certain sites will start to change multiple times
Need to account for these multiple hits
![Page 43: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/43.jpg)
Random sequences
Write down 20 bases of sequence
![Page 44: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/44.jpg)
Compare your sequence to this one
AGTCCGATTACGGCTAGCAG
What fraction of sites are the same in the two sequences?
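The classroom exercise can also be tried in code. This is a minimal simulation that assumes each base is drawn independently with equal probability; the function names, the fixed seed, and the number of trials are arbitrary choices for illustration.

```python
import random


def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def random_dna(length: int, rng: random.Random) -> str:
    """Random sequence with each base drawn uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))


reference = "AGTCCGATTACGGCTAGCAG"
rng = random.Random(0)  # fixed seed so the run is repeatable

trials = [
    fraction_identical(reference, random_dna(len(reference), rng))
    for _ in range(10000)
]
mean_identity = sum(trials) / len(trials)
print(f"mean fraction identical over {len(trials)} trials: {mean_identity:.3f}")
```

Any single 20-base comparison can land well away from one quarter; the average over many comparisons settles close to 0.25.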
![Page 45: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/45.jpg)
Sequence similarity decays to 25% over long times
[Plot: sequence similarity (y-axis, 0 to 1.2) versus time (x-axis, 0 to 3.5)]
![Page 46: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/46.jpg)
Sequence difference maxes at 0.75
[Plot: sequence difference (y-axis, 0 to 1) versus time (x-axis, 0 to 3.5)]
![Page 47: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/47.jpg)
Sequence change accumulates linearly with time at beginning
[Plot: sequence difference (y-axis, 0 to 1) versus time (x-axis, 0 to 3.5)]
![Page 48: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/48.jpg)
DNA models
Use different DNA models to account for how sequences evolve with time
Allows you to apply different molecular clocks
Relate sequence change to time
Clock is not linear except for small changes and short times
Models are the same as those used in maximum likelihood methods
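The lecture does not name a specific model. As an illustration, the sketch below assumes the Jukes-Cantor model, the simplest standard DNA model, in which all four bases are equally frequent and every change is equally likely. It shows both effects described above: the observed difference levels off at 0.75, and a correction turns an observed difference back into an estimate of the actual number of changes per site.

```python
import math


def expected_difference(d: float) -> float:
    """Expected fraction of differing sites after d changes per site
    (Jukes-Cantor assumption)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jukes_cantor_distance(observed: float) -> float:
    """Estimate of changes per site from the observed fraction of
    differing sites, correcting for multiple hits."""
    if not 0.0 <= observed < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * observed / 3.0)


# Observed difference grows almost linearly at first, then saturates.
for d in (0.05, 0.5, 1.0, 3.0, 10.0):
    print(f"changes per site {d:5.2f} -> observed difference {expected_difference(d):.3f}")

# The correction is small for similar sequences and large near saturation.
for observed in (0.05, 0.30, 0.60):
    print(f"observed {observed:.2f} -> corrected {jukes_cantor_distance(observed):.3f}")
```

The corrected distance is always at least as large as the observed one, because some sites have changed more than once.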
![Page 49: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/49.jpg)
How good is your tree?
Bootstrap approach
Run the same method multiple times
Subsample data each time
Use 50% of data
See how reproducible the trees are
Count how many times a particular grouping occurs
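The resampling idea can be sketched on the three short sequences from the pairwise-distance slide. The sketch follows the slide literally and keeps a random 50% of alignment columns in each replicate; note that the standard bootstrap instead resamples columns with replacement up to the full length. The function names, the replicate count, and the rule that ties do not count as support are all illustrative assumptions.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def unique_closest_pair(seqs):
    """Return the closest pair of taxa, or None if the minimum is tied."""
    ranked = sorted(
        (p_distance(seqs[a], seqs[b]), (a, b))
        for a, b in combinations(sorted(seqs), 2)
    )
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None
    return frozenset(ranked[0][1])


def subsample_support(seqs, pair, replicates=200, fraction=0.5, seed=0):
    """Percent of replicates in which `pair` is the unique closest pair."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    keep = max(1, int(length * fraction))
    hits = 0
    for _ in range(replicates):
        columns = rng.sample(range(length), keep)  # columns kept this round
        sub = {name: "".join(seq[i] for i in columns) for name, seq in seqs.items()}
        if unique_closest_pair(sub) == pair:
            hits += 1
    return 100.0 * hits / replicates


taxa = {
    "1": "AACGGTCATGGCGTTGCATT",
    "2": "AACGGTCAGGGCGTTGCATT",
    "3": "AACGGTCACGCCGCTGCATT",
}

support = subsample_support(taxa, frozenset(("1", "2")))
print(f"support for grouping 1 with 2: {support:.0f}%")
```

Support falls below 100% here because the grouping rests on only two informative columns, and some replicates drop both of them.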
![Page 50: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/50.jpg)
Distance tree for rod and cone transducin alpha subunit
Branch lengths are proportional to sequence differences
![Page 51: Lecture 5 : Phylogenies](https://reader035.vdocument.in/reader035/viewer/2022062222/56814fcd550346895dbd8fb9/html5/thumbnails/51.jpg)
Bootstrap values are given for each node; they tell how reproducible that grouping is
[Figure: tree with bootstrap values at the nodes: 58, 100, 100, 95, 98, 72, 69, 72, 98, 86, 98, 68, 97]