RNA Structure, Evolution and Phylogenetics
Dr. P. Higgs, U. Manchester (ITP 5/15/01) 1
Supported by BBSRC
School of Biological Sciences
Paul Higgs
David Hoyle
Cendrine Hudelot
Daniel Jameson
Stephen Morgan
Nick Savill
Dept of Computer Science
Magnus Rattray
Howsun Jow
Secondary Structure: small subunit ribosomal RNA
[Figure: secondary structure diagram of a small subunit ribosomal RNA, drawn from the 5' end to the 3' end. The Domain, Kingdom and Order labels are not recoverable from the source.]
Typical transfer RNA structure
Trailer No. 2: RNA as a Disordered System
Measure barriers between ground states
[Figure: a short RNA sequence with two ground-state structures labelled A and B, with labels 1 to 10 and annotations ∆E = 3, ∆E = 5 and ∆E = 1.]
The relevant barrier height controlling kinetics at low temperature is the highest point on the lowest route
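The "highest point on the lowest route" is a minimax path problem: among all routes between two ground states, pick the one whose highest energy is smallest. The sketch below is a minimal illustration using a Dijkstra-style search. The states, energies and neighbour lists are a hypothetical toy landscape, not data from the talk.

```python
import heapq

def lowest_barrier(energy, neighbours, start, goal):
    """Smallest achievable value of the highest energy met on any route
    from start to goal (a minimax, or bottleneck, path search)."""
    best = {start: energy[start]}
    heap = [(energy[start], start)]
    while heap:
        peak, state = heapq.heappop(heap)
        if state == goal:
            return peak
        if peak > best.get(state, float("inf")):
            continue  # stale heap entry
        for nxt in neighbours[state]:
            new_peak = max(peak, energy[nxt])
            if new_peak < best.get(nxt, float("inf")):
                best[nxt] = new_peak
                heapq.heappush(heap, (new_peak, nxt))
    return None  # goal not reachable

# Hypothetical toy landscape: A and B are ground states with energy 0.
energy = {"A": 0, "B": 0, "x": 5, "y": 3, "z": 1, "w": 3}
neighbours = {
    "A": ["x", "y"],
    "x": ["A", "B"],
    "y": ["A", "z"],
    "z": ["y", "w"],
    "w": ["z", "B"],
    "B": ["x", "w"],
}

# Barrier height measured from the starting ground state.
barrier = lowest_barrier(energy, neighbours, "A", "B") - energy["A"]
print(barrier)
```

In this toy case the direct route through x climbs to 5, while the longer route through y, z and w never exceeds 3, so the longer route sets the barrier.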
Compensatory Substitutions
Two sides of the acceptor stem from a tRNA are shown.

                            1234567   7654321
                            (((((((   )))))))
Bacillus subtilis           GGCUCGG   CCGAGCC
Escherichia coli            GCCCGGA   UCCGGGC
Saccharomyces cerevisiae    GCGGAUU   AAUUCGC
Drosophila melanogaster     GCCGAAA   UUUCGGC
Homo sapiens                GCCGAAA   UUUCGGC

Because the structure is conserved, the sequences can be aligned even across widely different species.
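To make the compensatory pattern concrete, the following sketch pairs position k on the 5' side with position k on the 3' side for each of the five acceptor stems and labels every pair as Watson-Crick (WC), GU wobble (GU) or mismatch (MM). The sequences are copied from the slide; the function and label names are illustrative choices.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

# Acceptor-stem sides as listed on the slide: (5' side, 3' side).
stems = {
    "Bacillus subtilis": ("GGCUCGG", "CCGAGCC"),
    "Escherichia coli": ("GCCCGGA", "UCCGGGC"),
    "Saccharomyces cerevisiae": ("GCGGAUU", "AAUUCGC"),
    "Drosophila melanogaster": ("GCCGAAA", "UUUCGGC"),
    "Homo sapiens": ("GCCGAAA", "UUUCGGC"),
}

def classify_pairs(five_prime, three_prime):
    """Label each stem position. The 3' side is written 7..1 on the slide,
    so it is reversed before being paired with the 5' side."""
    labels = []
    for a, b in zip(five_prime, reversed(three_prime)):
        if (a, b) in WATSON_CRICK:
            labels.append("WC")
        elif (a, b) in WOBBLE:
            labels.append("GU")
        else:
            labels.append("MM")
    return labels

classified = {species: classify_pairs(*sides) for species, sides in stems.items()}
for species, labels in classified.items():
    print(f"{species:26s} {' '.join(labels)}")
```

Every position stays paired in all five species even though the bases themselves differ, which is what a history of compensatory substitutions would produce.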
Two possible mechanisms of compensatory substitutions
[Figure: two panels plotting frequency in population against time, with curves labelled GC, GU and AU. Panel titles: "Two single substitutions" and "One double substitution".]
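Population genetics model
[Figure: 4 × 4 grid of first site (A, C, G, U) against second site (A, C, G, U), marking Watson-Crick pairs, GU and UG pairs, and mismatches.]
Selection: Watson-Crick pairs have fitness 1; mismatches have fitness 1-s.
Population size N. Mutation rate u.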
Solve diffusion equation.
Higgs (1998) Genetica
Probability F(x) of observing a fraction x of AU sequences
F(x) is controlled by a parameter α = 8Nu²/s.
α >> 1: all 4 WC pairs are present simultaneously (curve a).
α << 1: the population is monomorphic but switches between the 4 WC states (curve c).
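As a quick numerical illustration, the sketch below evaluates α = 8Nu²/s and reports which limit it falls in. The cutoff factor used to decide what counts as "much greater" or "much less" than 1, and the example values of N, u and s, are arbitrary assumptions chosen only for illustration.

```python
def alpha(N, u, s):
    """Control parameter alpha = 8 N u^2 / s from the slide."""
    return 8.0 * N * u ** 2 / s

def regime(a, margin=10.0):
    """Classify alpha. 'margin' is an arbitrary illustrative cutoff for
    'much greater than' and 'much less than' 1."""
    if a >= margin:
        return "all 4 WC pairs present simultaneously"
    if a <= 1.0 / margin:
        return "monomorphic, switching between WC states"
    return "intermediate"

# Hypothetical parameter values, not taken from the talk.
a_small = alpha(N=1e4, u=1e-6, s=1e-2)
a_large = alpha(N=1e9, u=1e-4, s=1e-2)
print(a_small, regime(a_small))
print(a_large, regime(a_large))
```

Because u enters squared, α is far more sensitive to the mutation rate than to population size or selection strength.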
Analysis of RNA sequence databases

                Number of   Number     Pair frequencies (MM = mismatches)                 G+C       G+C helical
                sequences   of pairs   GC     CG     AU     UA     GU     UG     MM       average   regions
SSU rRNA        455         296        0.352  0.298  0.122  0.173  0.020  0.021  0.014    0.545     0.674
RNase P         84          80         0.385  0.296  0.117  0.104  0.050  0.022  0.026    0.594     0.730
tRNA archaea    64          21         0.473  0.320  0.057  0.077  0.031  0.020  0.022    0.636     0.829
tRNA general    754         21         0.372  0.260  0.128  0.142  0.043  0.025  0.030    0.532     0.681
tRNA mitoch.    884         21         0.266  0.121  0.257  0.233  0.046  0.030  0.046    0.339     0.448

Selection for thermodynamically stable structures
Higgs (2000) Quart. Rev. Biophysics
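The pair frequencies listed for the five data sets can be cross-checked against the reported G+C content of helical regions. GC and CG pairs are entirely G+C, GU and UG pairs are half G+C, and AU and UA pairs contribute nothing. The composition of the mismatch class is not given, so the sketch below only brackets the helical G+C between two bounds (mismatches containing no G or C, or only G and C). The numbers are read from the slide; the bounding argument is an added assumption-free check, not a calculation from the talk.

```python
# (GC, CG, AU, UA, GU, UG, MM) pair frequencies, then reported helical G+C.
rows = {
    "SSU rRNA":     ((0.352, 0.298, 0.122, 0.173, 0.020, 0.021, 0.014), 0.674),
    "RNase P":      ((0.385, 0.296, 0.117, 0.104, 0.050, 0.022, 0.026), 0.730),
    "tRNA archaea": ((0.473, 0.320, 0.057, 0.077, 0.031, 0.020, 0.022), 0.829),
    "tRNA general": ((0.372, 0.260, 0.128, 0.142, 0.043, 0.025, 0.030), 0.681),
    "tRNA mitoch.": ((0.266, 0.121, 0.257, 0.233, 0.046, 0.030, 0.046), 0.448),
}

def helix_gc_bounds(freqs):
    """Lower and upper bounds on the G+C fraction of paired positions."""
    gc, cg, au, ua, gu, ug, mm = freqs
    known = gc + cg + 0.5 * (gu + ug)  # contribution of the named pair types
    return known, known + mm           # mismatches add between 0 and mm

for name, (freqs, reported) in rows.items():
    low, high = helix_gc_bounds(freqs)
    print(f"{name:13s} {low:.3f} <= {reported:.3f} <= {high:.3f}")
```

In every data set the reported helical G+C sits inside the bracket and above the G+C average, consistent with the slide's conclusion about selection for stable structures.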
A Substitution Rate Model for RNA Stems for use with Molecular Phylogenetics
P_ij(t) = probability of being in state j at time t, given that the ancestor was in state i at time 0.
States label pairs (AU, GC etc.), not single bases.
[Figure: ancestor in state i, descendant in state j after time t.]
$$\frac{dP_{ij}}{dt} = \sum_k P_{ik}\, r_{kj}$$
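With P(0) equal to the identity, the rate equation above has the solution P(t) = exp(Rt), where R is the matrix of rates r_kj. The sketch below computes this matrix exponential in plain Python using a truncated Taylor series with scaling and squaring. This numerical method is an illustrative choice, not necessarily the one used in the talk, and the 3-state rate matrix is a hypothetical toy example.

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transition_matrix(R, t, squarings=10, terms=12):
    """P(t) = exp(R t): Taylor series on R t / 2**squarings, then repeated
    squaring to recover the full time interval."""
    n = len(R)
    scale = t / (2 ** squarings)
    M = [[R[i][j] * scale for j in range(n)] for i in range(n)]
    P = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term holds M**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = mat_mul(P, P)
    return P

# Hypothetical rates for a toy 3-state chain (each row sums to zero).
R_toy = [[-0.5, 0.4, 0.1],
         [0.3, -0.6, 0.3],
         [0.2, 0.5, -0.7]]
P_toy = transition_matrix(R_toy, 0.8)
for row in P_toy:
    print([round(x, 4) for x in row])
```

Each row of P(t) is a probability distribution over descendant states, so the rows sum to 1 for any t.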
Model 7A is a General Reversible 7-state Model
Rate matrix r_ij = α_ij π_j with α_ij = α_ji (row: state i, column: state j, * marks the diagonal):

          AU       GU       GC       UA       UG       CG       MM
1 AU      *        α12π2    α13π3    α14π4    α15π5    α16π6    α17π7
2 GU      α12π1    *        α23π3    α24π4    α25π5    α26π6    α27π7
3 GC      α13π1    α23π2    *        α34π4    α35π5    α36π6    α37π7
4 UA      α14π1    α24π2    α34π3    *        α45π5    α46π6    α47π7
5 UG      α15π1    α25π2    α35π3    α45π4    *        α56π6    α57π7
6 CG      α16π1    α26π2    α36π3    α46π4    α56π5    *        α67π7
7 MM      α17π1    α27π2    α37π3    α47π4    α57π5    α67π6    *

7 frequencies πi + 21 rate parameters αij - 2 constraints = 26 free parameters
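The parameter count and the reversibility of Model 7A can be checked with a short sketch. It builds a rate matrix with off-diagonal entries α_ij π_j from a symmetric set of α values, then verifies detailed balance, π_i r_ij = π_j r_ji. The frequencies are the SSU rRNA pair frequencies quoted earlier in the talk, reordered as AU, GU, GC, UA, UG, CG, MM; the α values are arbitrary placeholders, not fitted rates.

```python
STATES = ["AU", "GU", "GC", "UA", "UG", "CG", "MM"]

def reversible_rate_matrix(pi, alpha):
    """Off-diagonal r_ij = alpha_ij * pi_j with alpha symmetric; each
    diagonal entry is set so that its row sums to zero."""
    n = len(pi)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                R[i][j] = alpha[(min(i, j), max(i, j))] * pi[j]
        R[i][i] = -sum(R[i])
    return R

def free_parameters(n_states):
    """Frequencies plus symmetric rate parameters, minus 2 constraints."""
    n_rates = n_states * (n_states - 1) // 2
    return n_states + n_rates - 2

# SSU rRNA pair frequencies from the talk, in the order of STATES.
pi = [0.122, 0.020, 0.352, 0.173, 0.021, 0.298, 0.014]
# Placeholder alpha values (hypothetical), one per unordered pair of states.
alpha = {(i, j): 1.0 + 0.1 * (i + j) for i in range(7) for j in range(i + 1, 7)}

R = reversible_rate_matrix(pi, alpha)
print(len(alpha), "rate parameters;", free_parameters(7), "free parameters")
```

Detailed balance holds for any symmetric α, which is why the model is reversible by construction.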
[Figure: probability P_ii of remaining in the same state, for SSU rRNA sequences from Eubacteria.]
[Figure: probability P_ij of changes from CG to other pairs, for SSU rRNA from Eubacteria. Labels: AU, GC, GU, CG, UA, UG; annotations: slow, fast, fast.]
Selection against GU and UG is weaker than against mismatches.
Double transitions are faster than double transversions.
Double transitions are faster than single transitions to GU and UG states. This is explained by the theory of compensatory substitutions.
What is going on?
Analysis of RNA Substitution Rates

                Mutabilities (MM = mismatches)                   Double transitions /      Double transitions /
                GC    CG    AU    UA    GU    UG    MM           transitions to GU or UG   double transversions
SSU rRNA        0.55  0.66  1.40  0.93  3.92  4.36  7.84         2.8                       2.1
RNase P         0.65  0.60  1.46  1.09  1.72  2.84  5.24         3.6                       3.1
tRNA archaea    0.45  0.89  4.01  1.78  1.85  3.00  0.86         8.9                       2.3
tRNA general    0.49  0.83  1.46  1.24  1.96  5.01  0.99         2.0                       1.7
tRNA mitoch.    0.67  0.84  0.86  0.77  2.44  3.32  2.32         1.6                       4.7

Thermodynamic properties influence evolutionary properties
Model Selection
Savill, Hoyle & Higgs (2001) Genetics
How complex must a model be to explain the data?
How many parameters are justified?
Many other rate models have been proposed:
• Schöniger & von Haeseler
• Muse
• Rzhetsky
• Tillier & Collins
Use likelihood-based statistical tests to distinguish between models.
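For nested models, the usual likelihood-based test is the likelihood ratio test: twice the difference in maximised log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The sketch below is a minimal version in plain Python. The log-likelihood values are hypothetical placeholders, and the degrees of freedom assume a 26-parameter general model against a 9-parameter restricted model (7 frequencies + 4 rates - 2 constraints, by the same counting rule used for Model 7A).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    built up from df = 1 or 2 by the standard recurrence."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)
    else:
        k, tail = 1, math.erfc(math.sqrt(half))
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def likelihood_ratio_test(lnL_restricted, lnL_general, extra_parameters):
    """Return the test statistic and its chi-square p-value."""
    stat = 2.0 * (lnL_general - lnL_restricted)
    return stat, chi2_sf(stat, extra_parameters)

# Hypothetical log-likelihoods, for illustration only.
lnL_restricted = -10050.0   # e.g. a 9-parameter model
lnL_general = -10000.0      # e.g. a 26-parameter model
stat, p_value = likelihood_ratio_test(lnL_restricted, lnL_general, 26 - 9)
print(stat, p_value)
```

One caveat: when the restricted model fixes a rate at zero, the parameter lies on the boundary of its allowed range and the plain chi-square approximation tends to be conservative.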
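Model 7D
Also called OTRNA by Tillier & Collins (1998)
Rate matrix (row: initial state, column: final state, * marks the diagonal):

          AU      GU      GC      UA      UG      CG      MM
1 AU      *       αsπ2    αdπ3    βπ4     βπ5     βπ6     γπ7
2 GU      αsπ1    *       αsπ3    βπ4     βπ5     βπ6     γπ7
3 GC      αdπ1    αsπ2    *       βπ4     βπ5     βπ6     γπ7
4 UA      βπ1     βπ2     βπ3     *       αsπ5    αdπ6    γπ7
5 UG      βπ1     βπ2     βπ3     αsπ4    *       αsπ6    γπ7
6 CG      βπ1     βπ2     βπ3     αdπ4    αsπ5    *       γπ7
7 MM      γπ1     γπ2     γπ3     γπ4     γπ5     γπ6     *

[Figure: diagram of the seven states. Within the group AU, GU, GC the rates are αs for AU-GU and GU-GC and αd for AU-GC; the group UA, UG, CG has the same pattern. β links the two groups and γ links the pair states to MM.]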
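The structure of Model 7D can be written down compactly: every off-diagonal rate is one of four parameters times the frequency of the destination state. The sketch below assigns each ordered pair of states to a rate class and builds the matrix. The numerical rate values are hypothetical, and the frequencies are the SSU rRNA pair frequencies quoted earlier in the talk, reordered as AU, GU, GC, UA, UG, CG, MM.

```python
STATES = ["AU", "GU", "GC", "UA", "UG", "CG", "MM"]
GROUP = {"AU": 0, "GU": 0, "GC": 0, "UA": 1, "UG": 1, "CG": 1}
DOUBLE_TRANSITIONS = [{"AU", "GC"}, {"UA", "CG"}]

def rate_class(a, b):
    """Name of the 7D rate parameter governing a change from a to b."""
    if "MM" in (a, b):
        return "gamma"        # to or from the mismatch class
    if GROUP[a] != GROUP[b]:
        return "beta"         # between the two groups: double transversion
    if {a, b} in DOUBLE_TRANSITIONS:
        return "alpha_d"      # double transition
    return "alpha_s"          # single transition via a GU or UG state

def rate_matrix_7d(pi, rates):
    n = len(STATES)
    R = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            if i != j:
                R[i][j] = rates[rate_class(a, b)] * pi[j]
        R[i][i] = -sum(R[i])  # rows sum to zero
    return R

rates = {"alpha_s": 1.0, "alpha_d": 2.0, "beta": 0.5, "gamma": 0.2}  # hypothetical
pi = [0.122, 0.020, 0.352, 0.173, 0.021, 0.298, 0.014]
R7d = rate_matrix_7d(pi, rates)

# Count how many of the 42 off-diagonal entries fall in each class.
class_counts = {}
for a in STATES:
    for b in STATES:
        if a != b:
            c = rate_class(a, b)
            class_counts[c] = class_counts.get(c, 0) + 1
print(class_counts)
```

Setting all α_ij of the general reversible model equal within each class recovers this matrix, which is what makes the two models nested.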
Model 16D and related models
Proposed by Muse (1995)
Considers all 16 states: AU, GU, GC, UA, UG, CG, AA, AC, AG, CA, CC, CU, GA, GG, UC, UU.
Only single substitutions are allowed. Distinguishes between paired and mismatch states.
[Rate matrix: 16 × 16. Entries for double substitutions are 0; the non-zero entries combine the parameters α, β, λ and φ with the frequencies π. The individual entries are not recoverable from the source.]
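The restriction to single substitutions fixes which entries of a 16-state rate matrix can be non-zero. The sketch below enumerates the 16 two-base states and counts the ordered pairs that differ at exactly one position (allowed) or at both positions (forbidden). It only illustrates the zero pattern; it does not reproduce the rate parameters of Model 16D.

```python
from itertools import product

BASES = "ACGU"
STATES16 = [a + b for a, b in product(BASES, repeat=2)]

def n_differences(s, t):
    """Number of positions at which two two-base states differ."""
    return sum(x != y for x, y in zip(s, t))

# Ordered pairs reachable by one base change, and those needing two.
allowed = {(s, t) for s in STATES16 for t in STATES16 if n_differences(s, t) == 1}
forbidden = {(s, t) for s in STATES16 for t in STATES16 if n_differences(s, t) == 2}

print(len(STATES16), "states,", len(allowed), "single-substitution entries,",
      len(forbidden), "zero entries")
```

Under this restriction a change such as GC to AU must pass through an intermediate state (GU or AC), in contrast to the 7-state models, which give double substitutions their own rates.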
Families of Nested Models
[Figure: nesting diagram of the model families 6A to 6D, 7A to 7F and 16A to 16H.]
Test models using “known” phylogeny of five bacteria.
Small subunit ribosomal RNA sequences.
[Figure: tree of Bacillus subtilis, Rhodomicrobium vannielii, Sphingomonas capsulata, Pseudomonas aeruginosa and Escherichia coli, with branches labelled T1 to T7.]
Likelihood Ratio Test shows that rates of double substitutions are non-zero.
Likelihood Ratio Test shows General Reversible Model 7A is preferable to 7D.
Afrotheria / Laurasiatheria
[Figure: phylogeny of mammals with support values on the branches, from Murphy et al., Nature (2001). Four major clades are marked I to IV. Labelled groups: Afrotheria (Sirenia, Proboscidea, Hyracoidea, Tenrecidae, Tubulidentata, Macroscelidea, with Tethytheria and Paenungulata), Xenarthra, Marsupialia, Glires (Rodentia, including Caviomorpha and Hystricognathi, and Lagomorpha), Primates (Anthropoidea, Lemuriformes, Tarsiiformes), Dermoptera, Scandentia, Cetartiodactyla (including Cetacea), Perissodactyla, Carnivora, Pholidota, Chiroptera (Megachiroptera, Microchiroptera) and 'Eulipotyphla'.]
RNA Phylogeny Package
Soon to be released …
Uses a range of evolutionary models designed specifically for RNA paired regions (as well as standard single site models).
Secondary structure is specified in the alignment as bracket notation.
Exhaustive Maximum Likelihood search for small numbers of species or small numbers of clusters.
Statistical tests to distinguish between proposed trees: Kishino-Hasegawa (important to consider correlation between sites)
Markov Chain Monte Carlo methods for tree searching.
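Bracket notation marks each paired position with a matching "(" and ")". The sketch below shows one way such a string could be turned into a list of paired alignment columns, which is the information a paired-site model needs. It assumes unpaired positions are written as "." and that pairs are properly nested; the package's actual input format is not described in the talk, and the example sequence is hypothetical.

```python
def parse_brackets(structure):
    """Return sorted (i, j) index pairs for matching brackets.
    Assumes '.' (or any other character) marks an unpaired position."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def pair_states(sequence, pairs):
    """Two-letter state (e.g. 'GC') for every paired position."""
    return [sequence[i] + sequence[j] for i, j in pairs]

# Hypothetical hairpin: three pairs enclosing a three-base loop.
structure = "(((...)))"
sequence = "GCAUUUUGC"
pairs = parse_brackets(structure)
states = pair_states(sequence, pairs)
print(pairs, states)
```

The resulting list of pair states is what a 7-state or 16-state model would treat as the characters of one sequence, while the unpaired loop positions go to a standard single-site model.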
Sub-trees in the Laurasiatheria Group
Carnivora
(((Harbour Seal, Gray Seal), Dog), Cat)
Perissodactyla
((Donkey, Horse),(Indian Rhino, White Rhino))
Cetartiodactyla
((((Blue Whale, Sperm Whale), Hippo),(Cow, Sheep)), Pig)
Chiroptera
((Long-tailed Bat, Fruit-eating Bat),(Flying Fox, Flying Fox))
Insectivora
(Hedgehog, Mole)
Outgroup: Rabbit
Maximum Likelihood Cluster Arrangements
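Fixing the sub-trees and arranging only the clusters keeps an exhaustive search small. The number of unrooted bifurcating trees for n tips is (2n-5)!!, a standard counting result. The sketch below evaluates it for the five clusters plus the Rabbit outgroup (6 tips) and, for comparison, for 43 individual species.

```python
def n_unrooted_trees(n_tips):
    """Number of unrooted bifurcating trees: (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count

clusters_plus_outgroup = 5 + 1
print(n_unrooted_trees(clusters_plus_outgroup))   # arrangements of the clusters
print(n_unrooted_trees(43))                       # all trees for 43 species
```

The first count is small enough to evaluate every arrangement by maximum likelihood, whereas the second shows why a full tree search over many species needs a sampling or heuristic method.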
Preliminary Results with MCMC
43 mammals with completely sequenced mitochondrial genomes
SSU + LSU + 14 tRNAs on the same strand
Study paired regions using the 7D model
Well-defined groups: 100% support
e.g. Primates, Marsupials, Rodents, Carnivores.
Four large-scale clades well-defined
Afrotheria (100%), Laurasiatheria (97%), Primates+Glires (86%), Xenarthra (100%)
Insectivores+Bats (87%)
A few wandering species: e.g. Pig
Thoughts for the day…
Thermodynamic parameters affect evolutionary parameters.
Rate models for phylogenies need to be tailored to the sequences studied.