Research on Mitochondrial Genomes
Lectures for 4Y03
Paul Higgs
Dept. of Physics, McMaster University, Hamilton, Ontario.
Supported by
Canada Research Chairs
and BBSRC
1. Building a database for mitochondrial genomes.
2. Large scale - gene order evolution.
3. Medium scale – sequence evolution. Molecular phylogenetics.
4. Small scale – mutation and selection. Variation in base and amino acid frequencies. Codon usage.
5. Genetic code evolution
People:
1. Wenli Jia, Bin Tang, Daniel Jameson
2. Howsun Jow, Magnus Rattray, Cendrine Hudelot, Vivek Gowri-Shankar, Xiaoguang Yang
3. Wei Xu, Daniel Jameson
4. Daniel Urbina, Wenli Jia.
5. Supratim Sengupta
Mitochondria are organelles inside eukaryotic cells.
They are the site of oxidative phosphorylation and ATP synthesis.
They contain their own genome distinct from the DNA in the nucleus.
Typical animal mitochondrial genomes are short and circular (~16,000 bases).
They usually contain:
2 rRNAs
22 tRNAs
13 proteins
LOCUS       NC_001922   16646 bp   DNA   circular   VRT   20-SEP-2002
DEFINITION  Alligator mississippiensis mitochondrion, complete genome.
ACCESSION   NC_001922
VERSION     NC_001922.1  GI:5835540
KEYWORDS    .
SOURCE      mitochondrion Alligator mississippiensis (American alligator)
  ORGANISM  Alligator mississippiensis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Archosauria; Crocodylidae; Alligatorinae; Alligator.
REFERENCE   1  (bases 1 to 16646)
  AUTHORS   Janke,A. and Arnason,U.
  TITLE     The complete mitochondrial genome of Alligator mississippiensis
            and the separation between recent archosauria (birds and
            crocodiles)
  JOURNAL   Mol. Biol. Evol. 14 (12), 1266-1272 (1997)
  MEDLINE   98066357
  PUBMED    9402737
FEATURES             Location/Qualifiers
     source          1..16646
                     /organism="Alligator mississippiensis"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:8496"
                     /tissue_type="liver"
                     /dev_stage="adult"
     rRNA            1..976
                     /product="12S ribosomal RNA"
     tRNA            977..1044
                     /product="tRNA-Val"
                     /anticodon=(pos:1009..1011,aa:Val)
     rRNA            1046..2635
                     /product="16S ribosomal RNA"
     tRNA            2636..2710
                     /product="tRNA-Leu"
                     /note="codons recognized: UUR"
                     /anticodon=(pos:2672..2674,aa:Leu)
     gene            2711..3676
                     /gene="ND1"
     CDS             2711..3676
                     /gene="ND1"
        1 caacagactt agtcctggtc ttttcattag ctagtactca acttatacat gcaagcatcc
       61 gcgaaccagt gagaacaccc tacaagtctg acagacgaat ggagccggca tcaggcacat
      121 caaccgatag cccaaaacgc ctagcccagc cacaccccca agggtctcag cagtgattaa
      181 ccttaaacca taagcgaaag cttgatttag ttagagtaga tatagaggcg gtcaactctc
      241 gtgccagcaa ccgcggttag acgaaaacct caagttaatt gacaaacggc gtaaattgtg
      301 gctagaactc tatctccccc attagtgcag atacggtatc acagtagtga taaacttcat
      361 cacaccgcaa acatcaacac aaaactggcc ctaatctcaa agatgtactc gattccacga
      421 aagctgagaa acaaactggg attagatacc ccactatgct cagcccttaa cattggtgta
      481 gtacacaaca gactaccctc gccagagaat tacgagcccc gcttaaaact caaaggactt
      541 gacggcactt taaacccccc tagaggagcc tgtcctataa tcgacagtac acgttacacc
      601 cgaccacctt tagcctactc agtctgtata ccgccgtcgc aagcccgtcc catttgaggg
      661 aaacaaaacg cgcgcaacag ctcaaccgag ctaacacgtc aggtcaaggt gcagccaaca
      721 aggtggaaga gatgggctac attttctcaa catgtagaaa tattcaacgg agagccctat
      781 gaaatacagg actgtcaaag ccggatttag cagtaaactg ggaaagaata cctagttgaa
      841 gtcggtaacg aagtgcgtac acaccgcccg tcaccctcct cgaacccaac aaaatgccca
      901 aacaacaggc acaatgttgg gcaagatggg gaaagtcgta acaaggtaag cgtaccggaa
      961 ggtgcacttg gaacatcaaa atgtagctta aatttaaagc attcagttta cacctgaaaa
     1021 agtcccacca tcggaccatt ttgaaaccca tatctagccc tacctccttt caacatgctt
An example of a GenBank file
Complete mitochondrial genome of the Alligator
OGRe (= Organellar Genome Retrieval) is a relational database.
available at http://ogre.mcmaster.ca
More than 800 complete animal mitochondrial genomes.
Efficient means of storage and retrieval of information. Uses PostgreSQL
Schema defines relationships between different types of information.
Tables and their columns:
codon_usage: genome_code, codon, strand, usage
classification: group_name, parent, alternative_parent, has_children
fileindex: genome_code, offst, filename
citations: genome_code, medline_code
species: species_code, classification, group_name, latin_name, common_name
feature_location: feature_id, start, stop, strand
trna: feature_id, amino_acid, anticodon, codon
feature_descriptions: feature_name, description
feature: feature_id, genome_code, type, feature_name, notes, description, alignment_file
genome: genome_code, species_code, genome_type, ncbiac, description, ncbidate, accession_date, lastmodified, lastmodifiedby, notes, genome_length, genetic_code, a_content, c_content, g_content, t_content, final, gene_order, gene_order_notrna, rna_a_content, rna_c_content, rna_g_content, rna_t_content
The OGRe front page: http://ogre.mcmaster.ca
Sequence information for OGRe is taken from GenBank. We aim to keep up to date with publicly available animal mitochondrial genomes.
Species may be selected individually from an alphabetical list
Or taxa may be selected from a hierarchy. Here the Arthropods have been expanded and the Myriapods and Crustaceans have been selected
On the ogre web site, a visual comparison can be made of any two selected species. Colour is used to indicate conserved blocks of genes.
Alligator and Bird genomes differ by interchange of two tRNA genes (red and yellow), and by translocation of the two genes in the blue block.
Large Scale – Evolution of Gene Order in Whole Genomes
Genome reshuffling mechanisms
Inversions:
A B C D -> A -C -B D
Translocations:
A (B C) D -> A D B C
Duplications and deletions:
A B C D -> A B C B C D -> A C B D (the first B and the second C are deleted, marked "/" on the slide)
Example of an inversion
Example of a translocation
The T and –F genes are duplicated in Cordylus warreni.
If the first T and the second –P were deleted, the relative position of T and –P would change.
Sometimes things go crazy ….
Drosophila and Thrips are both insects
yet there are 30 breakpoints for only 37 genes
i.e. almost nothing in common.
OGRe contains gene orders as strings. This allows searching and comparison.
231 unique gene orders have been found in 858 species.
The standard vertebrate order is shared by 398 species (including humans). There are many other species with unique gene orders.
Some species conserve gene order over 100s of millions of years. Others get scrambled in a few million.
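A minimal sketch of how breakpoints between two gene orders could be counted. It assumes circular genomes and represents each gene as a signed integer (the sign is the strand); this encoding and the function names are illustrative and are not the string format used in OGRe.

```python
def gene_couples(order):
    """Set of adjacent gene pairs (couples) in a circular, signed gene order."""
    couples = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        # The couple (a, b) on one strand reads as (-b, -a) on the other,
        # so store one canonical form of the two.
        couples.add(min((a, b), (-b, -a)))
    return couples


def breakpoints(order1, order2):
    """Number of couples present in order1 that are absent from order2."""
    return len(gene_couples(order1) - gene_couples(order2))


ancestral = [1, 2, 3, 4]              # A B C D
inverted = [1, -3, -2, 4]             # A -C -B D
translocated = [1, 4, 2, 3]           # A D B C
print(breakpoints(ancestral, inverted))       # inversion of B C
print(breakpoints(ancestral, translocated))   # translocation of B C
```

With only four genes on a circle the counts are small, but the same function applies to the 37-gene orders discussed above.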
Still to do (new project) :
- estimate relative rates of different rearrangement processes
- predict most likely ancestral gene orders
- use gene order evidence in phylogenetics
Medium Scale – Sequence Alignments and Phylogenetics
Part of sequence alignment of Mitochondrial Small Sub-Unit rRNA
Full gene is length ~950
11 Primate species with mouse as outgroup
                     *        20         *        40         *        60         *
Mouse      : CUCACCAUCUCUUGCUAAUUCAGCCUAUAUACCGCCAUCUUCAGCAAACCCUAAAAAGG-UAUUAAAGUAAGCAAAAGA : 78
Lemur      : CUCACCACUUCUUGCUAAUUCAACUUAUAUACCGCCAUCCCCAGCAAACCCUAUUAAGGCCC-CAAAGUAAGCAAAAAC : 78
Tarsier    : CUUACCACCUCUUGCUAAUUCAGUCUAUAUACCGCCAUCUUCAGCAAACCCUAAUAAAGGUUUUAAAGUAAGCACAAGU : 79
SakiMonkey : CUUACCACCUCUUGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCUA-UAAUGACAGUAAAGUAAGCACAAGU : 76
Marmoset   : CUCACCACGUCUAGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCCU-UAAUGAUUGUAAAGUAAGCAGAAGU : 76
Baboon     : CCCACCCUCUCUUGCU----UAGUCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACGAAGUGAGCGCAAAU : 75
Gibbon     : CUCACCAUCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACAAAGGCUAUAAAGUAAGCACAAAC : 75
Orangutan  : CUCACCACCCCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCCACGAAGUAAGCGCAAAC : 75
Gorilla    : CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACGAAGGCCACAAAGUAAGCACAAGU : 75
PygmyChimp : CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Chimp      : CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Human      : CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACAAAGUAAGCGCAAGU : 75
Consensus  : CucACC cuCUuGCu cAgccUaUAUACCGCCAUCuuCAGCAAACcCu A G aAAGUaAGC AA
69 mammals with complete mitochondrial genomes.
Two models were used simultaneously (for the single sites and for the paired sites).
Total of 3571 sites
= 1637 single sites
+ 967 pairs (1637 + 2 × 967 = 3571)
Hudelot et al. 2003
Afrotheria / Laurasiatheria
Striking examples of convergent evolution
[Figure: arthropod tree, scale bar 0.1. Tip labels in slide order:]
Terebratulina, Katharina, Limulus, Heptathela, Ornithoctonus, Habronattus, Varroa, Carios, Ornithodoros moubata, Ornithodoros porcinus, Rhipicephalus, Amblyomma, Haemaphysalis, Ixodes holocyclus, Ixodes hexagonus, Ixodes persulcatus, Scutigera, Lithobius, Thyropygus, Narceus, Speleonectes, Vargula, Hutchinsoniella, Tigriopus, Armillifer, Argulus, Tetraclita, Pollicipes, Penaeus, Cherax, Portunus, Panulirus, Pagurus, Artemia, Triops, Daphnia, Tetrodontophora, Gomphiocephalus, Tricholepidion, Locusta, Aleurodicus, Triatoma, Philaenus, Thrips, Lepidopsocid, Heterodoxus, Pyrocoelia, Tribolium, Crioceris, Apis, Melipona, Ostrinia, Antheraea, Bombyx, Anopheles, Drosophila, Chrysomya
Images courtesy of University of Nebraska, Dept. of Entomology.
http://entomology.unl.edu/images/
Arthropod phylogenetics
Very difficult due to strong variation in rates of evolution between species.
tRNA tree – branch lengths optimized on fixed consensus topology
Long branch species are problematic if tree is not fixed.
protein tree – branch lengths optimized on fixed consensus topology
[Figure: arthropod tree, scale bar 0.1. Tip labels in slide order:]
Terebratulina, Katharina, Limulus, Heptathela, Ornithoctonus, Habronattus, Varroa, Carios, Ornithodoros moubata, Ornithodoros porcinus, Rhipicephalus, Amblyomma, Haemaphysalis, Ixodes holocyclus, Ixodes hexagonus, Ixodes persulcatus, Scutigera, Lithobius, Thyropygus, Narceus, Speleonectes, Vargula, Hutchinsoniella, Tigriopus, Armillifer, Argulus, Tetraclita, Pollicipes, Penaeus, Cherax, Portunus, Panulirus, Pagurus, Artemia, Triops, Daphnia, Tetrodontophora, Gomphiocephalus, Tricholepidion, Locusta, Aleurodicus, Triatoma, Philaenus, Thrips, Lepidopsocid, Heterodoxus, Pyrocoelia, Tribolium, Crioceris, Apis, Melipona, Ostrinia, Antheraea, Bombyx, Anopheles, Drosophila, Chrysomya
Same species are on long branches in proteins as in RNAs
Relative rate test for sequence evolution - Templeton
[Figure: three-species tree with tips 0, 1, 2]
Three aligned sequences with 0 known to be the outgroup. Test whether rates of evolution in branch 1 and branch 2 are equal.
m1 = number of sites where 0 and 2 are the same and 1 is different.
m2 = number of sites where 0 and 1 are the same and 2 is different.
Calculate:
$\chi_m^2 = \frac{(m_1 - m_2)^2}{m_1 + m_2}$
Should follow a chi squared distribution with one degree of freedom.
Many pairs of related species found to have different rates in the mitochondrial sequences.
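The relative rate test can be sketched in a few lines. This is an illustration only: skipping alignment columns that contain a gap character is an assumption, and the p-value uses the identity that the upper tail of a chi squared distribution with one degree of freedom is erfc(sqrt(x/2)).

```python
import math


def relative_rate_test(seq0, seq1, seq2):
    """Return (m1, m2, chi2, p) for three aligned sequences, seq0 being the outgroup."""
    if not (len(seq0) == len(seq1) == len(seq2)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq0, seq1, seq2):
        if "-" in (a, b, c):
            continue  # assumption: gapped columns are ignored
        if a == c and b != a:
            m1 += 1   # change on branch 1
        elif a == b and c != a:
            m2 += 1   # change on branch 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # chi squared tail, 1 degree of freedom
    return m1, m2, chi2, p


print(relative_rate_test("AAAAAAAAAA", "CCCCCCAAAA", "AAAAAAAAGG"))
```

In this toy alignment branch 1 has six changes and branch 2 has two, which is not enough to reject equal rates at the 5% level.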
The gene order of the ancestral arthropod is thought to be the same as that of the horseshoe crab Limulus.
Image courtesy of Marine Biology Lab, Woods Hole. www.mbl.edu/animals/Limulus
Gene Order sometimes gives evidence of phylogenetic relationships
The same translocation of tRNA-Leu is found in insects and crustaceans but not myriapods and chelicerates. Strong argument for the group Pancrustacea (= insects plus crustaceans)
Moderately rearranged
Completely scrambled
Species                       Breakpoints  Inversions  Dup/Del  tRNA  Protein
Tigriopus japonicus           35           32          0        2.15  1.34
Heterodoxus macropus          35           32          0        1.39  1.83
Thrips imaginis               32           29          1        1.34  1.32
Pollicipes polymerus          22           16          2        0.69  0.59
Cherax destructor             20           16          0        0.54  0.57
Tetraclita japonica           20           16          0        0.66  0.57
Argulus americanus            20           18          0        0.72  1.12
Speleonectes tulumensis       19           16          1        0.83  0.93
Apis mellifera                19           16          0        0.84  1.50
Hutchinsoniella macracantha   18           16          0        0.86  0.87
Pagurus longicarpus           18           12          0        0.65  0.45
Vargula hilgendorfii          17           15          0        0.79  1.41
Lepidopsocid RS-2001          17           16          0        0.60  0.59
Habronattus oregonensis       16           14          0        1.48  1.09
Ornithoctonus huwena          15           13          0        1.95  1.23
Scutigera coleoptrata         15           15          0        0.48  0.44
Melipona bicolor              14           8           2        0.93  1.66
Varroa destructor             14           12          0        0.83  1.09
Armillifer armillatus         13           12          0        0.85  1.73
Narceus annularus             9            9           0        0.63  0.58
Thyropygus sp.                9            9           0        0.49  0.46
Aleurodicus dugesii           8            5           1        1.04  1.54
Anopheles gambiae             8            6           0        0.41  0.47
Tetrodontophora bielanensis   8            6           0        0.77  0.70
Artemia franciscana           7            5           0        0.63  0.64
Rhipicephalus sanguineus      7            6           0        0.82  0.96
Amblyomma triguttatum         7            6           0        0.88  1.00
Haemaphysalis flava           7            6           0        0.82  0.96
Locusta migratoria            6            5           0        0.38  0.52
Bombyx mori                   6            5           0        0.51  0.54
Portunus trituberculatus      6            5           0        0.51  0.44
Ostrinia furnacalis           6            5           0        0.49  0.48
Tribolium castaneum           6            5           0        0.55  0.53
Antheraea pernyi              6            5           0        0.50  0.54
Chrysomya putoria             4            2           1        0.36  0.42
Tricholepidion gertschi       3            2           0        0.44  0.39
Daphnia pulex                 3            2           0        0.62  0.51
Pyrocoelia rufa               3            2           0        0.52  0.77
Drosophila melanogaster       3            2           0        0.37  0.42
Panulirus japonicus           3            2           0        0.58  0.53
Triatoma dimidiata            3            2           0        0.59  0.50
Lithobius forficatus          3            3           0        1.13  0.61
Philaenus spumarius           3            2           0        0.69  0.58
Gomphiocephalus hodgsoni      3            2           0        0.69  0.62
Penaeus monodon               3            2           0        0.34  0.32
Crioceris duodecimpunctata    3            2           0        0.55  0.58
Triops cancriformis           3            2           0        0.42  0.40
Limulus polyphemus            0            0           0        0.36  0.40
Ixodes persulcatus            0            0           0        0.72  0.82
Ixodes holocyclus             0            0           0        0.76  0.83
Ixodes hexagonus              0            0           0        0.74  0.90
Carios capensis               0            0           0        0.70  0.79
Ornithodoros porcinus         0            0           0        0.67  0.86
Heptathela hangzhouensis      0            0           0        0.76  0.87
Ornithodoros moubata          0            0           0        0.68  0.88
Very High
High
Medium
Low
Species ranked according to breakpoint distance from ancestor.
R =0.99 R =0.59
R =0.53 R =0.69
Breakpoint category   tRNA distance        protein distance
                      min   mean  max      min   mean  max
Very High             1.33  1.62  2.14     1.32  1.50  1.83
High                  0.48  0.86  1.94     0.44  0.99  1.73
Moderate              0.38  0.63  1.04     0.43  0.69  1.54
Low                   0.34  0.60  1.13     0.32  0.62  0.90

tRNA only High        0.66  1.01  1.94     0.57  1.15  1.73
tRNA only Mod/Low     0.34  0.60  1.13     0.32  0.63  1.54
Highly rearranged genomes have highly divergent sequences.
Rates of sequence evolution and genome rearrangement are correlated.
Both are very non-clocklike.
There are many species where only tRNAs have changed position.
Species with highly reshuffled tRNAs have high rates of sequence evolution in both tRNAs and proteins.
Relative rate of genome rearrangement (Xu et al 2006)
[Figure: three-species tree with tips 0, 1, 2]
Three gene orders with 0 known to be the outgroup. Test whether rates of rearrangement in branch 1 and branch 2 are equal.
n1 = number of gene couples in 0 and 2 but not in 1, i.e. new breakpoint in 1.
n2 = number of gene couples in 0 and 1 but not in 2, i.e. new breakpoint in 2.
Calculate:
$\chi_n^2 = \frac{(n_1 - n_2)^2}{n_1 + n_2}$
Should follow a chi squared distribution with one degree of freedom.
We took pairs where there was a significant difference in rearrangement rates ($\chi_n^2$ was large) and showed that there was a significant difference in substitution rates too ($\chi_m^2$ was large).
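A sketch of the gene order version of the test, under stated assumptions: genomes are circular, genes are signed integers (sign = strand), and a gene couple is an adjacent pair read on either strand. The encoding and function names are illustrative, not taken from Xu et al.

```python
import math


def gene_couples(order):
    """Adjacent gene pairs of a circular, signed gene order (strand-independent)."""
    n = len(order)
    return {min((order[i], order[(i + 1) % n]),
                (-order[(i + 1) % n], -order[i])) for i in range(n)}


def rearrangement_rate_test(order0, order1, order2):
    """Return (n1, n2, chi2, p); order0 is the outgroup gene order."""
    c0, c1, c2 = gene_couples(order0), gene_couples(order1), gene_couples(order2)
    n1 = len((c0 & c2) - c1)  # couples kept in 0 and 2, broken in 1
    n2 = len((c0 & c1) - c2)  # couples kept in 0 and 1, broken in 2
    if n1 + n2 == 0:
        return n1, n2, 0.0, 1.0
    chi2 = (n1 - n2) ** 2 / (n1 + n2)
    p = math.erfc(math.sqrt(chi2 / 2))  # chi squared tail, 1 degree of freedom
    return n1, n2, chi2, p


outgroup = [1, 2, 3, 4, 5, 6, 7, 8]
species1 = [1, -3, -2, 4, 5, 6, 7, 8]   # one inversion on branch 1
species2 = [1, 2, 3, 4, 5, 6, 7, 8]     # unchanged on branch 2
print(rearrangement_rate_test(outgroup, species1, species2))
```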
Good Guys
Credits:
Daniel Jameson/ Bin Tang – Database design and management
Daniel Urbina – Base and Amino Acid Frequencies
Wei Xu – Gene Order Analysis and Arthropod Phylogenies
Bad Guys
Gene order is sometimes a strong phylogenetic marker, but the Bad Guys are problematic in gene order analysis as well as phylogenetics.
Why does the evolutionary rate speed up in these isolated groups of species?
Why do tRNA genes move more frequently?
What are the relative rates of inversion and translocation?
Small Scale Evolution –
Variation in Frequencies of Bases and Amino Acids
G G C A A A A T
C C G T T T T A
The two strands of DNA are complementary.
Freq of A on one strand = Freq of T on the other
Freq of C on one strand = Freq of G on the other
If the two strands are subject to the same mutational processes then the freq of any base should be equal (statistically) on both strands.
This means that A = T and C = G on any one strand.
In this case base frequencies can be described by a single variable: G+C content.
BUT – mitochondrial genomes have an asymmetrical replication process. The two strands are not equivalent.
The frequencies of bases on the two strands are not equal.
On any one strand the frequencies of the four bases may vary independently.
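A small sketch of the strand argument, using the 8-base example above. The function names are illustrative. It shows that the frequency of A on one strand equals the frequency of T on the other by construction, whereas A = T within a single strand is an extra condition that this example does not satisfy.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_frequencies(seq):
    """Frequencies of A, C, G, T in one strand (other symbols are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def complement(seq):
    """Complementary strand, written in the same left-to-right order."""
    return seq.upper().translate(COMPLEMENT)


strand1 = "GGCAAAAT"
strand2 = complement(strand1)          # "CCGTTTTA"
f1, f2 = base_frequencies(strand1), base_frequencies(strand2)
print(f1["A"], f2["T"])                # equal, because the strands are complementary
print(f1["A"], f1["T"])                # unequal here: this strand is asymmetric
```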
Mitochondrial genome replication
Figure from
Faith & Pollock (2003) Genetics
Rank genes in order of increasing time spent single stranded:
COI < COII < ATP8 < ATP6 < COIII < ND3 < ND4L < ND4 < ND1 < ND5 < ND2 < Cytb
ND6 is on the other strand
The Genetic Code maps the 64 DNA codons to the 20 amino acids.
(This version applies to Vertebrate Mitochondria)

FIRST                    SECOND POSITION                      THIRD
POSITION   T          C          A           G                POSITION
T          TTT F      TCT S      TAT Y       TGT C            T
           TTC F      TCC S      TAC Y       TGC C            C
           TTA L      TCA S      TAA Stop    TGA W            A
           TTG L      TCG S      TAG Stop    TGG W            G
C          CTT L      CCT P      CAT H       CGT R            T
           CTC L      CCC P      CAC H       CGC R            C
           CTA L      CCA P      CAA Q       CGA R            A
           CTG L      CCG P      CAG Q       CGG R            G
A          ATT I      ACT T      AAT N       AGT S            T
           ATC I      ACC T      AAC N       AGC S            C
           ATA M      ACA T      AAA K       AGA Stop         A
           ATG M      ACG T      AAG K       AGG Stop         G
G          GTT V      GCT A      GAT D       GGT G            T
           GTC V      GCC A      GAC D       GGC G            C
           GTA V      GCA A      GAA E       GGA G            A
           GTG V      GCG A      GAG E       GGG G            G

Amino acid groups are numbered 1-21 on the slide: F 1, L 2, I 3, M 4, V 5, S (TCN) 6, P 7, T 8, A 9, Y 10, H 11, Q 12, N 13, K 14, D 15, E 16, C 17, W 18, R 19, S (AGY) 20, G 21.
4-codon families where the third position is synonymous
Base frequencies at FFD sites in each gene (averaged over mammals)
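A sketch of how four-fold degenerate (FFD) sites could be identified and counted, assuming the vertebrate mitochondrial code shown above and a coding sequence that is already in frame. The 64-letter string lists the amino acids in T, C, A, G order for each position; the function names are illustrative.

```python
BASES = "TCAG"
# Vertebrate mitochondrial code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {a + b + c: VERT_MITO_AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def ffd_prefixes(code=CODE):
    """First-two-base prefixes whose four codons all encode the same amino acid."""
    prefixes = [a + b for a in BASES for b in BASES]
    return sorted(p for p in prefixes if len({code[p + c] for c in BASES}) == 1)


def ffd_base_frequencies(cds, code=CODE):
    """Base frequencies at third positions of codons in 4-codon families."""
    prefixes = set(ffd_prefixes(code))
    counts = dict.fromkeys(BASES, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon[:2] in prefixes and codon[2] in counts:
            counts[codon[2]] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES} if total else counts


print(ffd_prefixes())
print(ffd_base_frequencies("GCTGCCACAATG"))  # Ala, Ala, Thr, Met
```

The final Met codon (ATG) is skipped because AT is a split family (Ile/Met), so only three sites contribute.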
Deamination: C to U and A to G on the heavy strand
Base frequencies at FFD sites are controlled by mutation.
Base frequencies at 1st and 2nd positions are influenced by mutation and selection
Model fitting (Data from Fish) – assume a fraction of fixed sites and a fraction of neutral sites.
Selection at 1st position is weaker than at 2nd
Mutation pressure is sufficient to cause change in amino acid frequencies.
[Figure: vertebrate mitochondrial code table, amino acids only, groups numbered 1-21. Each cell lists the amino acids for third position T, C, A, G; * = Stop.]

First                Second Position
Position   T       C       A       G
T          FFLL    SSSS    YY**    CCWW
C          LLLL    PPPP    HHQQ    RRRR
A          IIMM    TTTT    NNKK    SS**
G          VVVV    AAAA    DDEE    GGGG
Slopes of the amino acid freq v base freq show the response of the amino acid to mutational pressure.
Black = fish
White = mammals
Amino acids in the first two columns of the code have larger slopes.
Physical Properties of Amino Acids
        Vol.  Bulk.  Polarity  pI     Hyd.1  Hyd.2  Surface Area  Fract. Area
Ala A   67    11.50  0.00      6.00   1.8    1.6    113           0.74
Arg R   148   14.28  52.00     10.76  -4.5   -12.3  241           0.64
Asn N   96    12.28  3.38      5.41   -3.5   -4.8   158           0.63
Asp D   91    11.68  49.70     2.77   -3.5   -9.2   151           0.62
Cys C   86    13.46  1.48      5.05   2.5    2.0    140           0.91
Gln Q   114   14.45  3.53      5.65   -3.5   -4.1   189           0.62
Glu E   109   13.57  49.90     3.22   -3.5   -8.2   183           0.62
Gly G   48    3.40   0.00      5.97   -0.4   1.0    85            0.72
His H   118   13.69  51.60     7.59   -3.2   -3.0   194           0.78
[Figure: property space with axes y1, y2, y3]
Each Amino Acid is a point in 8-d space.
dij = Euclidean distance between a.a. i and j in 8-d space.
Principal Component Analysis Projects the 8-d space into the two ‘most important’ dimensions.
Big
Small
Hydrophobic Hydrophilic
Responsiveness measures how much an amino acid frequency varies in response to mutational pressure
= Root mean square of 8 slopes for each amino acid (i.e. 4 bases x 2 data sets)
Proximity measures how similar the neighbouring amino acids are in the genetic code = Mean of 1/d for accessible amino acids
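e.g.
$\mathrm{Prox}(T) = \frac{1}{24}\left(\frac{2}{d_{TI}} + \frac{2}{d_{TM}} + \frac{6}{d_{TS}} + \frac{4}{d_{TP}} + \frac{4}{d_{TA}} + \frac{2}{d_{TN}} + \frac{2}{d_{TK}} + 2 \times 0\right)$
(the two Stop neighbours count towards the 24 but contribute 0)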
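The weights in the Prox(T) example can be recovered by counting single-base changes in the code table. The sketch below assumes the vertebrate mitochondrial code and treats all codons of an amino acid together (so the two Ser blocks are pooled, which is a simplification). The `distance` argument is a hypothetical callable standing in for the 8-d Euclidean distance d_ij, which is not computed here.

```python
from collections import Counter

BASES = "TCAG"
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {a + b + c: VERT_MITO_AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def accessible_neighbours(aa, code=CODE):
    """Count non-synonymous single-base changes from codons of `aa`, by target."""
    counts = Counter()
    for codon, own in code.items():
        if own != aa:
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                target = code[codon[:pos] + b + codon[pos + 1:]]
                if target != aa:
                    counts[target] += 1
    return counts


def proximity(aa, distance, code=CODE):
    """Mean of 1/d over accessible neighbours; Stop neighbours contribute 0."""
    counts = accessible_neighbours(aa, code)
    total = sum(counts.values())
    return sum(n / distance(aa, other)
               for other, n in counts.items() if other != "*") / total


print(accessible_neighbours("T"))
```

For Thr this gives S 6, P 4, A 4 and 2 each for I, M, N, K and Stop, 24 changes in total, matching the weights in the formula above.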
[Figure: vertebrate mitochondrial code table, amino acids only. Each cell lists the amino acids for third position T, C, A, G; * = Stop.]

First                Second Position
Position   T       C       A       G
T          FFLL    SSSS    YY**    CCWW
C          LLLL    PPPP    HHQQ    RRRR
A          IIMM    TTTT    NNKK    SS**
G          VVVV    AAAA    DDEE    GGGG
An amino acid frequency responds to mutational pressure more easily if there are neighbouring amino acids with similar physical properties. Urbina et al. (2006) J. Mol. Evol.
Responsiveness and Proximity are highly correlated. R = 0.87 (p < 10^-6)
Homo sapiens   Strand = +   3624 codons
(codon, amino acid, count; * = Stop)

GUU V  22    GCU A  39    GAU D  12    GGU G  16
GUC V  45    GCC A 123    GAC D  51    GGC G  87
GUA V  61    GCA A  79    GAA E  63    GGA G  61
GUG V   8    GCG A   5    GAG E  15    GGG G  19

AUU I 112    ACU T  50    AAU N  29    AGU S  11
AUC I 196    ACC T 155    AAC N 131    AGC S  37
AUA M 165    ACA T 132    AAA K  84    AGA *   1
AUG M  32    ACG T  10    AAG K   9    AGG *   0

CUU L  65    CCU P  37    CAU H  18    CGU R   6
CUC L 167    CCC P 119    CAC H  79    CGC R  26
CUA L 276    CCA P  52    CAA Q  82    CGA R  28
CUG L  42    CCG P   7    CAG Q   8    CGG R   0

UUU F  69    UCU S  29    UAU Y  35    UGU C   5
UUC F 139    UCC S  99    UAC Y  89    UGC C  17
UUA L  65    UCA S  81    UAA *   4    UGA W  90
UUG L  11    UCG S   7    UAG *   3    UGG W   9
GG 1.654   CG 0.552   UG 1.433
GA 1.027   CA 0.906   UA 1.136
GC 1.005   CC 1.163   UC 0.743
GU 0.763   CU 1.101   UU 0.939
Mammals - 23

GG 1.891   CG 0.554   UG 1.274
GA 1.145   CA 0.938   UA 1.030
GC 0.878   CC 1.205   UC 0.756
GU 0.605   CU 0.939   UU 1.250
Fish - 23

GG 1.369   AG 1.293   CG 0.546   UG 0.856
GA 0.776   AA 0.974   CA 0.945   UA 1.206
GC 0.873   AC 0.797   CC 1.363   UC 0.994
GU 1.115   AU 0.996   CU 1.082   UU 0.855
Mammals - 31

GG 1.499   AG 1.228   CG 0.609   UG 1.049
GA 0.758   AA 1.135   CA 0.849   UA 1.096
GC 0.839   AC 0.739   CC 1.371   UC 0.918
GU 0.911   AU 0.907   CU 1.162   UU 0.933
Fish - 31

Frequency ratios
$r(X_2 Y_3) = \frac{p(X_2 Y_3)}{q(X_2)\, q(Y_3)}$
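A sketch of the frequency ratio for positions 2 and 3 within codons: the observed dinucleotide frequency divided by the product of the two single-base frequencies. The function name and the list-of-codons input are illustrative; the "31" ratios (third position of one codon with first position of the next) would need consecutive codon pairs and are not covered here.

```python
from collections import Counter


def frequency_ratios(codons, first=1, second=2):
    """r(XY) = p(XY) / (q(X) q(Y)) for two codon positions (0-based indices)."""
    pair_counts, x_counts, y_counts = Counter(), Counter(), Counter()
    for codon in codons:
        x, y = codon[first], codon[second]
        pair_counts[x + y] += 1
        x_counts[x] += 1
        y_counts[y] += 1
    n = sum(pair_counts.values())
    return {pair: (count / n) / ((x_counts[pair[0]] / n) * (y_counts[pair[1]] / n))
            for pair, count in pair_counts.items()}


# Bases at positions 2 and 3 independent: every ratio is 1.
print(frequency_ratios(["ACG", "ACA", "AUG", "AUA"]))
# Positions 2 and 3 perfectly associated: ratios above 1.
print(frequency_ratios(["ACG", "ACG", "AUA", "AUA"]))
```

A ratio below 1, such as the CG values in the tables above, means the dinucleotide is rarer than expected from the single-base frequencies.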
Codon bias seems to be a dinucleotide mutational effect in mitochondria, rather than an effect of translational selection.
CpG effect.... (increased rate of C to U mutations in CG dinucleotides. Expect high UG and CA)
DNA binding proteins....
Amino acid                   Ala          Asp  Gln      Pro          Ser-UCN      Total
Anticodon                    GGC UGC CGC  GUC  UUG CUG  GGG UGG CGG  GGA UGA CGA

Epsilon Proteobacteria
Campylobacter jejuni         1   3   0    2    1   0    0   1   0    1   1   0    42
Helicobacter pylori J99      1   1   0    1    1   0    1   1   0    1   1   0    36
Gamma Proteobacteria
Pseudomonas aeruginosa       2   4   0    4    1   0    1   1   1    1   1   1    63
Vibrio parahaemolyticus      1   4   0    6    6   0    0   3   0    1   4   0    126
Haemophilus influenzae       1   3   0    3    2   0    0   2   0    1   2   0    56
Buchnera aphidicola          1   1   0    1    1   0    0   1   0    1   1   0    32 #
Blochmannia floridanus       0   1   0    1    1   0    0   1   0    0   1   1    36 #
Wigglesworthia glossinidia   1   1   0    1    1   0    0   1   0    1   1   0    34 #
Escherichia coli K12         2   3   0    3    2   2    1   1   1    2   1   1    86
Alpha Proteobacteria
Agrobacterium tumefaciens    1   4   0    2    1   1    1   1   1    1   1   1    53
Sinorhizobium meliloti       1   3   1    2    1   1    1   1   1    1   1   1    51
Rickettsia prowazekii        1   1   0    1    1   0    0   1   0    1   1   0    33 #
Wolbachia (D. mel)           0   1   0    1    1   0    0   1   0    1   1   0    34 #
Caulobacter crescentus       1   2   1    2    1   0    1   1   1    1   1   1    51
Mitochondria
Reclinomonas americana       0   1   0    1    1   0    0   1   0    0   1   0    26
Homo sapiens                 0   1   0    1    1   0    0   1   0    0   1   0    22
Changes in tRNA content of genomes from bacteria to mitochondria
Only one type of tRNA remains for each codon family in human mitochondria. Still need 2 tRNAs for Leu and Ser. Therefore 22 in total.
# denotes intracellular parasite or endosymbiont. Small size genomes in bacteria also have reduced numbers of tRNAs.
Evolution of the Genetic Code: Before and After the LUCA
1. The genetic code evolved to its canonical form before the Last Universal Common Ancestor of Archaea, Bacteria and Eukaryotes - >3 billion years ago. It appears to be highly optimized. How did it get to be this way?
2. Numerous small changes have occurred to the canonical code since then. What is the mechanism of codon reassignment?
Codon Reassignment – The Genetic code is variable in mitochondria (and also some cases of other types of genomes)
[Figure: canonical genetic code table, amino acids only. Each cell lists the amino acids for third position U, C, A, G; * = Stop.]

First                Second Position
Position   U       C       A       G
U          FFLL    SSSS    YY**    CC*W
C          LLLL    PPPP    HHQQ    RRRR
A          IIIM    TTTT    NNKK    SSRR
G          VVVV    AAAA    DDEE    GGGG
UGA Stop to Trp
AUA Ile to Met
CUN Leu to Thr
CGN Arg to unassigned
AGR Arg to Ser to Stop/Gly
etc.....
But how can this happen? It should be disadvantageous.
Porifera
Cnidaria
Arthropoda
Nematoda
Platyhelminthes
Lophotrochozoa
Echinodermata
Hemichordata
Urochordata
Cephalochordata
Craniata
AAA : Lys -> Asn
Loss of tRNA-Ile(CAU) but AUA remains Ile
Loss of tRNA-Arg(UCU) and AGR : Arg -> Ser
AGR : Ser -> Stop
AGR : Ser -> Gly
AUA : Ile -> Met
Loss of many tRNAs + import from cytoplasm
AAA : Lys -> unassigned
Reassignments in Metazoa
Example 1: AUA was reassigned from Ile to Met during the early evolution of the mitochondrial genome.
Before
Codon  Amino acid  Anticodon
AUU    Ile         GAU
AUC    Ile         GAU
AUA    Ile         k2CAU
AUG    Met         CAU
Notes:
G in the wobble position of the tRNA-Ile can pair with U and C in the third codon position.
Bacteria and some protist mitochondria possess another tRNA-Ile with a modified base that translates AUA only.
The tRNA-Met translates AUG only.

After
Codon  Amino acid  Anticodon
AUU    Ile         GAU
AUC    Ile         GAU
AUA    Met         UAU or f5CAU
AUG    Met         UAU or f5CAU
Notes:
In animal mitochondria the k2CAU tRNA has been deleted.
There is a gain of function of the tRNA-Met by a mutation or a base modification.
Example 2: UGA was reassigned from Stop to Trp many times (12 times in mitochondria).
Before
Codon  Meaning  Anticodon
UGA    Stop     RF
UGG    Trp      CCA
Notes:
Release Factor recognizes UGA codon.
Normal tRNA-Trp translates only UGG codons.

After
Codon  Meaning  Anticodon
UGA    Trp      UCA
UGG    Trp      UCA
Notes:
In animal mitochondria (and elsewhere) there is a gain of function of the tRNA-Trp via mutation or base modification so that it translates both UGG and UGA.
The GAIN-LOSS framework
(Sengupta & Higgs, Genetics 2005)
LOSS = deletion or loss of function of a tRNA or RF
GAIN = gain of a new tRNA or a gain of function of an existing one.
[Diagram: two routes from the initial code to the new code]
Initial Code. No Problem.
  GAIN -> Ambiguous codon. Selective disadvantage. -> LOSS -> New Code.
  LOSS -> Unassigned codon. Selective disadvantage. -> GAIN -> New Code.
New Code. Selective disadvantage because codons are used in wrong places.
  Mutations in coding sequences -> New Code. Codons now used in right places. No Problem.
Note – the strength of the selective disadvantage depends on the number of times the codon is used. There is no disadvantage if the codon disappears.
Four possible mechanisms of codon reassignment.
1. Codon Disappearance - The codon disappears. The order of the gain and loss is irrelevant.
For the other three mechanisms the codon does not disappear.
2. Ambiguous Intermediate – The gain happens before the loss. There is a period when the gain is fixed in the population and translation is ambiguous.
3. Unassigned Codon – The loss happens before the gain. There is a period when the loss is fixed in the population and the codon is unassigned.
4. Compensatory Change – The gain and loss are fixed in the population simultaneously (although they do not arise at the same time). There is no intermediate period between the old and the new codes. - cf. theory of compensatory substitutions in RNA helices.
Sengupta & Higgs (2005) showed that all four mechanisms work in a population genetics simulation
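The four mechanisms differ only in whether the codon disappears and in the order in which the gain and the loss are fixed. A toy classifier makes that explicit; the function name and the use of fixation times as inputs are illustrative assumptions and are not part of the simulation by Sengupta & Higgs.

```python
def reassignment_mechanism(codon_disappears, gain_fixed_at, loss_fixed_at):
    """Classify a reassignment as CD, AI, UC or CC from the order of events."""
    if codon_disappears:
        return "CD"   # Codon Disappearance: order of gain and loss is irrelevant
    if gain_fixed_at < loss_fixed_at:
        return "AI"   # Ambiguous Intermediate: gain fixed before loss
    if loss_fixed_at < gain_fixed_at:
        return "UC"   # Unassigned Codon: loss fixed before gain
    return "CC"       # Compensatory Change: fixed simultaneously


print(reassignment_mechanism(False, gain_fixed_at=100, loss_fixed_at=250))
```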
Codon reassignment | No. of times | Can this be explained by GC -> AU mutation pressure? | Change in No. of tRNAs | Is mispairing important? | Mechanism
UAG: Stop -> Leu | 2 | G -> A at 3rd pos. | +1 | No | CD
UAG: Stop -> Ala | 1 | G -> A at 3rd pos. | +1 | No | CD
UGA: Stop -> Trp | 12 | G -> A at 2nd pos. | 0 | Possibly. C-A at 3rd pos. | CD
CUN: Leu -> Thr | 1 | C -> U at 1st pos. | 0 | No | CD
CGN: Arg -> Unass | 5 | C -> A at 1st pos. | -1 | No | CD
AUA: Ile -> Met or Unassigned | 3 / 5 | No | -1 | Yes. G-A at 3rd pos. | UC
AAA: Lys -> Asn | 2 | No | 0 | Yes. G-A at 3rd pos. | AI
AAA: Lys -> Unass | 1 | No | 0 | Possibly. G-A at 3rd pos. | UC or AI
AGR: Arg -> Ser | 1 | No | -1 | Yes. G-A at 3rd pos. | UC
AGR: Ser -> Stop | 1 | No | 0 | No | AI(b)
AGR: Ser -> Gly | 1 | No | +1 | No | AI(b)
UUA: Leu -> Stop | 1 | No | 0 | No | UC or AI
UCA: Ser -> Stop | 1 | No | 0 | No | UC or AI
Summary of Codon Reassignments in Mitochondria
CD mechanism explains disappearance of stop codons because they are rare initially. Only a few examples of CD for sense codons. UC and AI are important for sense codons.
Three examples in yeasts (Mutation pressure GC to AU)
[Figure: canonical genetic code table, amino acids only. Each cell lists the amino acids for third position U, C, A, G; * = Stop.]

First                Second Position
Position   U       C       A       G
U          FFLL    SSSS    YY**    CC*W
C          LLLL    PPPP    HHQQ    RRRR
A          IIIM    TTTT    NNKK    SSRR
G          VVVV    AAAA    DDEE    GGGG
CGN is rare (replaced by AGR)
CGN Arg codons become unassigned.
CUN is rare (replaced by UUR)
CUN Leu to Thr
AUA and AUU common and AUC is rare
Nevertheless AUA is reassigned to Met. Codon does not disappear
      Leu CUN  Leu UUR  Arg CGN  Arg AGR
S     53       192      7        33
Y.    44       618      0**      75
C     3        279      12       29
C     132      397      47       26
C     66       547      39       45
P     25       714      18       67
K     0        286      0**      48
C     11*      294      1**      45
S     33*      333      7        49
S     19*      274      0**      40
S     22*      300      0**      46
Leu and Arg codons in yeasts
Codon Disappearance causes reassignments
* CUN = Thr. Unusual tRNA-Thr present instead of tRNA-Leu
** CGN = unassigned. tRNA-Arg is deleted
Codon Usage
AUA Ile to Met in Yeasts

     AUU  AUC  AUA  AUG  AUA is  tRNA
J    133  40   32   48   Ile     K2CAU
O    161  34   0    57   Absent  none
P    113  39   49   51   Ile     K2CAU

     AUU  AUC  AUA  AUG  AUA is  tRNA
C    119  81   229  100  Ile     K2CAU
C    303  32   193  117  Ile     K2CAU
P    274  18   562  105  Ile     K2CAU
K    213  16   7    63   ?       none
C    207  21   16   73   Met     C*AU
S    239  31   60   73   Met     C*AU
S    203  7    101  56   Met     C*AU
S    218  11   95   70   Met     C*AU
codon anticodon
AUU Ile GUA
AUC Ile “
AUA Ile K2CAU
AUG Met CAU
Evolution of the canonical code - Before the LUCA
The canonical code seems to be optimized to reduce the effects of translational and mutational errors.
Neighbouring codons code for similar amino acids.
5 7 9 11 13
C LI F WMY V PT A HQSG NKR E D
Woese’s polar requirement scale
Measure difference between amino acid properties by how far apart they are on this scale.
Cost function g(a,b) for replacing amino acid a by amino acid b
e.g. difference in Polar Requirement
$E = \frac{\sum_i \sum_j r_{ij}\, g(a_i, a_j)}{\sum_i \sum_j r_{ij}}$
r_ij = rate of mistaking codon i for codon j
= 1 for single position mistakes, 0 otherwise
E = measure of error associated with a code
Generate random codes by permuting the 20 amino acids in the code table
E is smaller for the canonical code than for almost all random codes.
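[Figure: distribution p(E) of E over random codes, with E_real for the canonical code marked; f is the fraction of random codes with E below E_real.]
f ~ 10^-6
One in a million codes is better (Freeland and Hurst).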
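A sketch of the calculation, under several assumptions that are not in the slides: the cost g is taken as the squared difference in polar requirement, changes to or from Stop codons are ignored, and the polar requirement values below are the commonly quoted ones and should be checked against the original scale before use. Random codes are made by permuting the 20 amino acids among the canonical codon blocks, as described above.

```python
import random

BASES = "UCAG"
CANONICAL_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = {a + b + c: CANONICAL_AA[16 * i + 4 * j + k]
             for i, a in enumerate(BASES)
             for j, b in enumerate(BASES)
             for k, c in enumerate(BASES)}

# Assumed polar requirement values (not given in the slides; verify before use).
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
    "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
    "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def code_error(code, prop):
    """E: mean cost over all single-position mistakes between sense codons."""
    total_cost, total_rate = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                other = code[codon[:pos] + b + codon[pos + 1:]]
                if other == "*":
                    continue  # assumption: mistakes to Stop are not scored
                total_cost += (prop[aa] - prop[other]) ** 2
                total_rate += 1  # r_ij = 1 for single-position mistakes
    return total_cost / total_rate


def random_code(code, rng):
    """Permute the 20 amino acids among the codon blocks; Stops stay in place."""
    amino_acids = sorted(set(code.values()) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(amino_acids, shuffled))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in code.items()}


def fraction_better(code, prop, n_random=1000, seed=0):
    """Estimate f: the fraction of random codes with smaller E than `code`."""
    rng = random.Random(seed)
    e_real = code_error(code, prop)
    better = sum(code_error(random_code(code, rng), prop) < e_real
                 for _ in range(n_random))
    return better / n_random


print(code_error(CANONICAL, POLAR_REQUIREMENT))
print(fraction_better(CANONICAL, POLAR_REQUIREMENT, n_random=1000))
```

A sample of 1000 random codes can only resolve f down to about 10^-3; estimating a value near 10^-6 needs on the order of a million samples.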
Principal Component Analysis Projects the 8-d space into the two ‘most important’ dimensions.
Big
Small
Hydrophobic Hydrophilic
Modified codes show that the Canonical code could have changed as it evolved – not completely a frozen accident.
Possibility of competition between organisms with different codes – natural selection.
Early codes had <20 amino acids (???). Gradual increase in complexity.
Increased repertoire of amino acids gives more protein functions.
Order of addition –
Astrobiology - which amino acids were common on early Earth?
Prebiotic synthesis of amino acids
Amino acids are found in
• Meteorites
• Atmospheric chemistry experiments (Miller-Urey)
• Hydrothermal synthesis
• Icy dust grains in space
Rank amino acids in order of decreasing frequency in 12 observations. Derive mean ranking.
G A D E V S I L P T (found non-biologically - early amino acids)
K R H F Q N Y W C M (not found non-biologically – late amino acids)
Early and Late amino acids are determined by thermodynamics
Positions of early and late amino acids....
What does this mean?
[Figure: canonical genetic code table, amino acids only. Each cell lists the amino acids for third position U, C, A, G; * = Stop.]

First                Second Position
Position   U       C       A       G
U          FFLL    SSSS    YY**    CC*W
C          LLLL    PPPP    HHQQ    RRRR
A          IIIM    TTTT    NNKK    SSRR
G          VVVV    AAAA    DDEE    GGGG
Maybe only 2nd position was relevant initially.
Late amino acids took over codons previously assigned to amino acids with similar properties.
Other points –
Column structure suggests that translational errors were more important than mutational errors (tRNA structure/RNA world)
Precursor-product pairs tend to be neighbours (but doubts over statistical significance). Maybe late amino acids took over codons previously assigned to their biochemical precursors.
Direct chemical interactions between RNA motifs and amino acids (“stereochemical theory”). In vitro selection experiments suggest binding sites of aptamers preferentially contain codon and anticodon sequences.