
Research on Mitochondrial Genomes
Lectures for 4Y03

Paul Higgs

Dept. of Physics, McMaster University, Hamilton, Ontario.

Supported by Canada Research Chairs and BBSRC


1. Building a database for mitochondrial genomes.

2. Large scale - gene order evolution.

3. Medium scale – sequence evolution. Molecular phylogenetics.

4. Small scale – mutation and selection. Variation in base and amino acid frequencies. Codon usage.

5. Genetic code evolution

People:

1. Wenli Jia, Bin Tang, Daniel Jameson

2. Howsun Jow, Magnus Rattray, Cendrine Hudelot, Vivek Gowri-Shankar, Xiaoguang Yang

3. Wei Xu, Daniel Jameson

4. Daniel Urbina, Wenli Jia.

5. Supratim Sengupta


Mitochondria are organelles inside eukaryotic cells.

They are the site of oxidative phosphorylation and ATP synthesis.

They contain their own genome distinct from the DNA in the nucleus.

Typical animal mitochondrial genomes are short and circular (~16,000 bases).

They usually contain:

2 rRNAs

22 tRNAs

13 proteins

An example of a GenBank file: complete mitochondrial genome of the Alligator

LOCUS       NC_001922   16646 bp   DNA   circular   VRT   20-SEP-2002
DEFINITION  Alligator mississippiensis mitochondrion, complete genome.
ACCESSION   NC_001922
VERSION     NC_001922.1  GI:5835540
KEYWORDS    .
SOURCE      mitochondrion Alligator mississippiensis (American alligator)
  ORGANISM  Alligator mississippiensis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Archosauria; Crocodylidae; Alligatorinae; Alligator.
REFERENCE   1  (bases 1 to 16646)
  AUTHORS   Janke,A. and Arnason,U.
  TITLE     The complete mitochondrial genome of Alligator mississippiensis
            and the separation between recent archosauria (birds and
            crocodiles)
  JOURNAL   Mol. Biol. Evol. 14 (12), 1266-1272 (1997)
  MEDLINE   98066357
   PUBMED   9402737
FEATURES             Location/Qualifiers
     source          1..16646
                     /organism="Alligator mississippiensis"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:8496"
                     /tissue_type="liver"
                     /dev_stage="adult"
     rRNA            1..976
                     /product="12S ribosomal RNA"
     tRNA            977..1044
                     /product="tRNA-Val"
                     /anticodon=(pos:1009..1011,aa:Val)
     rRNA            1046..2635
                     /product="16S ribosomal RNA"
     tRNA            2636..2710
                     /product="tRNA-Leu"
                     /note="codons recognized: UUR"
                     /anticodon=(pos:2672..2674,aa:Leu)
     gene            2711..3676
                     /gene="ND1"
     CDS             2711..3676
                     /gene="ND1"

        1 caacagactt agtcctggtc ttttcattag ctagtactca acttatacat gcaagcatcc
       61 gcgaaccagt gagaacaccc tacaagtctg acagacgaat ggagccggca tcaggcacat
      121 caaccgatag cccaaaacgc ctagcccagc cacaccccca agggtctcag cagtgattaa
      181 ccttaaacca taagcgaaag cttgatttag ttagagtaga tatagaggcg gtcaactctc
      241 gtgccagcaa ccgcggttag acgaaaacct caagttaatt gacaaacggc gtaaattgtg
      301 gctagaactc tatctccccc attagtgcag atacggtatc acagtagtga taaacttcat
      361 cacaccgcaa acatcaacac aaaactggcc ctaatctcaa agatgtactc gattccacga
      421 aagctgagaa acaaactggg attagatacc ccactatgct cagcccttaa cattggtgta
      481 gtacacaaca gactaccctc gccagagaat tacgagcccc gcttaaaact caaaggactt
      541 gacggcactt taaacccccc tagaggagcc tgtcctataa tcgacagtac acgttacacc
      601 cgaccacctt tagcctactc agtctgtata ccgccgtcgc aagcccgtcc catttgaggg
      661 aaacaaaacg cgcgcaacag ctcaaccgag ctaacacgtc aggtcaaggt gcagccaaca
      721 aggtggaaga gatgggctac attttctcaa catgtagaaa tattcaacgg agagccctat
      781 gaaatacagg actgtcaaag ccggatttag cagtaaactg ggaaagaata cctagttgaa
      841 gtcggtaacg aagtgcgtac acaccgcccg tcaccctcct cgaacccaac aaaatgccca
      901 aacaacaggc acaatgttgg gcaagatggg gaaagtcgta acaaggtaag cgtaccggaa
      961 ggtgcacttg gaacatcaaa atgtagctta aatttaaagc attcagttta cacctgaaaa
     1021 agtcccacca tcggaccatt ttgaaaccca tatctagccc tacctccttt caacatgctt
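To show how the feature table in a record like this can be turned into rows suitable for a database, here is a minimal sketch in Python. It is not the OGRe loader: the function name `parse_features` and the regular expressions are assumptions made for illustration, and the sketch only handles simple `start..stop` locations (no `complement(...)` or `join(...)`).

```python
import re

# Excerpt of the FEATURES table from the alligator record (NC_001922).
FEATURES = """\
     rRNA            1..976
                     /product="12S ribosomal RNA"
     tRNA            977..1044
                     /product="tRNA-Val"
     rRNA            1046..2635
                     /product="16S ribosomal RNA"
     tRNA            2636..2710
                     /product="tRNA-Leu"
     CDS             2711..3676
                     /gene="ND1"
"""

# Feature key lines are indented by exactly five spaces (hypothetical pattern).
KEY_LINE = re.compile(r"^\s{5}(\S+)\s+(\d+)\.\.(\d+)\s*$")
NAME_LINE = re.compile(r'^\s+/(?:product|gene)="([^"]+)"')

def parse_features(text):
    """Return (type, start, stop, name) tuples for simple start..stop features."""
    features = []
    for line in text.splitlines():
        key = KEY_LINE.match(line)
        if key:
            features.append([key.group(1), int(key.group(2)), int(key.group(3)), None])
            continue
        name = NAME_LINE.match(line)
        if name and features and features[-1][3] is None:
            features[-1][3] = name.group(1)  # first /product or /gene qualifier
    return [tuple(f) for f in features]

for ftype, start, stop, name in parse_features(FEATURES):
    print(f"{ftype:5s} {start:6d} {stop:6d} {stop - start + 1:5d} bp  {name}")
```

Each tuple corresponds to what a feature row plus its location would need to hold: a type, a name, and start and stop coordinates.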


OGRe (= Organellar Genome Retrieval) is a relational database.

available at http://ogre.mcmaster.ca

More than 800 complete animal mitochondrial genomes.

Efficient means of storage and retrieval of information. Uses PostgreSQL

Schema defines relationships between different types of information.

Tables in the schema and their columns:

codon_usage: genome_code, codon, strand, usage

classification: group_name, parent, alternative_parent, has_children

fileindex: genome_code, offst, filename

citations: genome_code, medline_code

species: species_code, classification, group_name, latin_name, common_name

feature_location: feature_id, start, stop, strand

trna: feature_id, amino_acid, anticodon, codon

feature_descriptions: feature_name, description

feature: feature_id, genome_code, type, feature_name, notes, description, alignment_file

genome: genome_code, species_code, genome_type, ncbiac, description, ncbidate, accession_date, lastmodified, lastmodifiedby, notes, genome_length, genetic_code, a_content, c_content, g_content, t_content, final, gene_order, gene_order_notrna, rna_a_content, rna_c_content, rna_g_content, rna_t_content
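The relationships in the schema can be illustrated with a join between two of the tables. The sketch below is an assumption-laden toy, not the OGRe code: it uses an in-memory SQLite database instead of PostgreSQL so that it runs on its own, keeps only a few of the columns, guesses the column types and the join key (`feature_id`), and uses a made-up genome code `DEMO1` with coordinates borrowed from the alligator example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, genome_code TEXT,
                      type TEXT, feature_name TEXT);
CREATE TABLE feature_location (feature_id INTEGER, start INTEGER,
                               stop INTEGER, strand TEXT);
""")
# Hypothetical rows; "DEMO1" is not a real OGRe genome code.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", [
    (1, "DEMO1", "rRNA", "16S"),
    (2, "DEMO1", "rRNA", "12S"),
    (3, "DEMO1", "tRNA", "V"),
])
conn.executemany("INSERT INTO feature_location VALUES (?, ?, ?, ?)", [
    (1, 1046, 2635, "+"),
    (2, 1, 976, "+"),
    (3, 977, 1044, "+"),
])

def features_in_order(conn, genome_code):
    """List the features of one genome sorted by start coordinate."""
    query = """
        SELECT f.feature_name, f.type, l.start, l.stop, l.strand
        FROM feature AS f
        JOIN feature_location AS l ON l.feature_id = f.feature_id
        WHERE f.genome_code = ?
        ORDER BY l.start
    """
    return conn.execute(query, (genome_code,)).fetchall()

for row in features_in_order(conn, "DEMO1"):
    print(row)
```

Sorting the joined rows by start position is one way a gene order string could be derived from stored features.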


The OGRe front page: http://ogre.mcmaster.ca

Sequence information for OGRe is taken from GenBank. We aim to keep up to date with publicly available animal mitochondrial genomes.


Species may be selected individually from an alphabetical list

Or taxa may be selected from a hierarchy. Here the Arthropods have been expanded and the Myriapods and Crustaceans have been selected

Large Scale – Evolution of Gene Order in Whole Genomes

On the OGRe web site, a visual comparison can be made of any two selected species. Colour is used to indicate conserved blocks of genes.

Alligator and Bird genomes differ by interchange of two tRNA genes (red and yellow) and by translocation of the two genes in the blue block.

Genome reshuffling mechanisms

Inversions:

A B C D  ->  A -C -B D

Translocations:

A (B C) D  ->  A D B C

Duplications and deletions:

A B C D  ->  A B C B C D  ->  A C B D   (the first B and the second C are deleted)
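The three mechanisms can be written as operations on a signed gene order, where a leading "-" marks a gene on the opposite strand. This is a small sketch with hypothetical function names; it treats the genome as a linear list and ignores circularity.

```python
def flip(gene):
    """Switch the strand sign of a gene label."""
    return gene[1:] if gene.startswith("-") else "-" + gene

def invert(order, i, j):
    """Reverse the block order[i:j] and flip the strand of each gene in it."""
    segment = [flip(g) for g in reversed(order[i:j])]
    return order[:i] + segment + order[j:]

def translocate(order, i, j, k):
    """Cut out the block order[i:j] and reinsert it at index k of the rest."""
    block = order[i:j]
    rest = order[:i] + order[j:]
    return rest[:k] + block + rest[k:]

def duplicate(order, i, j):
    """Tandem duplication of the block order[i:j]."""
    return order[:j] + order[i:j] + order[j:]

def delete(order, indices):
    """Remove the genes at the given indices."""
    drop = set(indices)
    return [g for n, g in enumerate(order) if n not in drop]

genes = ["A", "B", "C", "D"]
print(invert(genes, 1, 3))               # inversion of B C
print(translocate(genes, 1, 3, 2))       # B C moved after D
dup = duplicate(genes, 1, 3)
print(dup, delete(dup, [1, 4]))          # duplication, then two deletions
```

The last line reproduces the duplication and deletion route: deleting the first B and the second C leaves B and C in swapped positions.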


Example of an inversion

Example of a translocation

The T and –P genes are duplicated in Cordylus warreni.

If the first T and the second –P were deleted, the relative position of T and –P would change (T –P T –P becomes –P T).


Sometimes things go crazy ….

Drosophila and Thrips are both insects

yet there are 30 breakpoints for only 37 genes

i.e. almost nothing in common.


OGRe contains gene orders as strings. This allows searching and comparison.

231 unique gene orders have been found in 858 species.

The standard vertebrate order is shared by 398 species (including humans). There are many other species with unique gene orders.

Some species conserve gene order over 100s of millions of years. Others get scrambled in a few million.

Still to do (new project) :

- estimate relative rates of different rearrangement processes

- predict most likely ancestral gene orders

- use gene order evidence in phylogenetics

Medium Scale – Sequence Alignments and Phylogenetics

Part of a sequence alignment of the mitochondrial small subunit (SSU) rRNA.

The full gene has length ~950.

11 primate species with mouse as outgroup.

                      *        20         *        40         *        60         *
Mouse      : CUCACCAUCUCUUGCUAAUUCAGCCUAUAUACCGCCAUCUUCAGCAAACCCUAAAAAGG-UAUUAAAGUAAGCAAAAGA : 78
Lemur      : CUCACCACUUCUUGCUAAUUCAACUUAUAUACCGCCAUCCCCAGCAAACCCUAUUAAGGCCC-CAAAGUAAGCAAAAAC : 78
Tarsier    : CUUACCACCUCUUGCUAAUUCAGUCUAUAUACCGCCAUCUUCAGCAAACCCUAAUAAAGGUUUUAAAGUAAGCACAAGU : 79
SakiMonkey : CUUACCACCUCUUGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCUA-UAAUGACAGUAAAGUAAGCACAAGU : 76
Marmoset   : CUCACCACGUCUAGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCCU-UAAUGAUUGUAAAGUAAGCAGAAGU : 76
Baboon     : CCCACCCUCUCUUGCU----UAGUCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACGAAGUGAGCGCAAAU : 75
Gibbon     : CUCACCAUCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACAAAGGCUAUAAAGUAAGCACAAAC : 75
Orangutan  : CUCACCACCCCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCCACGAAGUAAGCGCAAAC : 75
Gorilla    : CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACGAAGGCCACAAAGUAAGCACAAGU : 75
PygmyChimp : CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Chimp      : CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Human      : CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACAAAGUAAGCGCAAGU : 75

Consensus line (column spacing not preserved): CucACC cuCUuGCu cAgccUaUAUACCGCCAUCuuCAGCAAACcCu A G aAAGUaAGC AA

69 mammals with complete mitochondrial genomes.

Used two models simultaneously.

Total of 3571 sites = 1637 single sites + 967 pairs (1637 + 2 × 967 = 3571)

Hudelot et al. 2003


Afrotheria / Laurasiatheria

Striking examples of convergent evolution

Arthropod phylogenetics

Very difficult due to strong variation in rates of evolution between species.

[Figure: tRNA tree – branch lengths optimized on fixed consensus topology. Scale bar: 0.1. Taxa in the tree: Terebratulina, Katharina, Limulus, Heptathela, Ornithoctonus, Habronattus, Varroa, Carios, Ornithodoros moubata, Ornithodoros porcinus, Rhipicephalus, Amblyomma, Haemaphysalis, Ixodes holocyclus, Ixodes hexagonus, Ixodes persulcatus, Scutigera, Lithobius, Thyropygus, Narceus, Speleonectes, Vargula, Hutchinsoniella, Tigriopus, Armillifer, Argulus, Tetraclita, Pollicipes, Penaeus, Cherax, Portunus, Panulirus, Pagurus, Artemia, Triops, Daphnia, Tetrodontophora, Gomphiocephalus, Tricholepidion, Locusta, Aleurodicus, Triatoma, Philaenus, Thrips, Lepidopsocid, Heterodoxus, Pyrocoelia, Tribolium, Crioceris, Apis, Melipona, Ostrinia, Antheraea, Bombyx, Anopheles, Drosophila, Chrysomya.]

Long branch species are problematic if the tree is not fixed.

Images courtesy of University of Nebraska, Dept. of Entomology.

http://entomology.unl.edu/images/

[Figure: protein tree – branch lengths optimized on fixed consensus topology. Scale bar: 0.1. Same taxa as in the tRNA tree.]

The same species are on long branches in proteins as in RNAs.

Images courtesy of University of Nebraska, Dept. of Entomology.

http://entomology.unl.edu/images/

Relative rate test for sequence evolution - Templeton

Three aligned sequences (0, 1 and 2) with 0 known to be the outgroup. Test whether rates of evolution in branch 1 and branch 2 are equal.

m1 = number of sites where 0 and 2 are the same and 1 is different.
m2 = number of sites where 0 and 1 are the same and 2 is different.

Calculate:

$$\chi_m^2 = \frac{(m_1 - m_2)^2}{m_1 + m_2}$$

Should follow a chi squared distribution with one degree of freedom.

Many pairs of related species found to have different rates in the mitochondrial sequences.
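The test can be sketched directly from the definitions of m1 and m2. The function names below are hypothetical, and skipping alignment columns that contain a gap is an assumption of this sketch rather than something stated in the slides. The p-value uses the fact that, for one degree of freedom, the upper tail of the chi squared distribution equals erfc(sqrt(x/2)).

```python
import math

def relative_rate_counts(seq0, seq1, seq2):
    """Count m1 and m2 for three aligned sequences, seq0 being the outgroup."""
    m1 = m2 = 0
    for a, b, c in zip(seq0, seq1, seq2):
        if "-" in (a, b, c):      # assumption: ignore gapped columns
            continue
        if a == c and b != a:     # change on branch 1 only
            m1 += 1
        elif a == b and c != a:   # change on branch 2 only
            m2 += 1
    return m1, m2

def chi_squared(m1, m2):
    """Test statistic (m1 - m2)^2 / (m1 + m2); undefined if m1 + m2 == 0."""
    return (m1 - m2) ** 2 / (m1 + m2)

def p_value_1dof(x):
    """Upper-tail probability of chi squared with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

m1, m2 = 30, 10                   # made-up counts for illustration
x = chi_squared(m1, m2)
print(x, p_value_1dof(x))
```

With the made-up counts 30 and 10 the statistic is 10, well above the 5% critical value of about 3.84, so equal rates would be rejected.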

Gene Order sometimes gives evidence of phylogenetic relationships

The gene order of the ancestral arthropod is thought to be the same as that of the horseshoe crab Limulus.

Image courtesy of Marine Biology Lab, Woods Hole. www.mbl.edu/animals/Limulus

The same translocation of tRNA-Leu is found in insects and crustaceans but not in myriapods and chelicerates. This is a strong argument for the group Pancrustacea (= insects plus crustaceans).


Moderately rearranged

Completely scrambled

Species ranked according to breakpoint distance from ancestor.

Species                      Breakpoints  Inversions  Dup/Del  tRNA  Protein
Tigriopus japonicus          35           32          0        2.15  1.34
Heterodoxus macropus         35           32          0        1.39  1.83
Thrips imaginis              32           29          1        1.34  1.32
Pollicipes polymerus         22           16          2        0.69  0.59
Cherax destructor            20           16          0        0.54  0.57
Tetraclita japonica          20           16          0        0.66  0.57
Argulus americanus           20           18          0        0.72  1.12
Speleonectes tulumensis      19           16          1        0.83  0.93
Apis mellifera               19           16          0        0.84  1.50
Hutchinsoniella macracantha  18           16          0        0.86  0.87
Pagurus longicarpus          18           12          0        0.65  0.45
Vargula hilgendorfii         17           15          0        0.79  1.41
Lepidopsocid RS-2001         17           16          0        0.60  0.59
Habronattus oregonensis      16           14          0        1.48  1.09
Ornithoctonus huwena         15           13          0        1.95  1.23
Scutigera coleoptrata        15           15          0        0.48  0.44
Melipona bicolor             14           8           2        0.93  1.66
Varroa destructor            14           12          0        0.83  1.09
Armillifer armillatus        13           12          0        0.85  1.73
Narceus annularus            9            9           0        0.63  0.58
Thyropygus sp.               9            9           0        0.49  0.46
Aleurodicus dugesii          8            5           1        1.04  1.54
Anopheles gambiae            8            6           0        0.41  0.47
Tetrodontophora bielanensis  8            6           0        0.77  0.70
Artemia franciscana          7            5           0        0.63  0.64
Rhipicephalus sanguineus     7            6           0        0.82  0.96
Amblyomma triguttatum        7            6           0        0.88  1.00
Haemaphysalis flava          7            6           0        0.82  0.96
Locusta migratoria           6            5           0        0.38  0.52
Bombyx mori                  6            5           0        0.51  0.54
Portunus trituberculatus     6            5           0        0.51  0.44
Ostrinia furnacalis          6            5           0        0.49  0.48
Tribolium castaneum          6            5           0        0.55  0.53
Antheraea pernyi             6            5           0        0.50  0.54
Chrysomya putoria            4            2           1        0.36  0.42
Tricholepidion gertschi      3            2           0        0.44  0.39
Daphnia pulex                3            2           0        0.62  0.51
Pyrocoelia rufa              3            2           0        0.52  0.77
Drosophila melanogaster      3            2           0        0.37  0.42
Panulirus japonicus          3            2           0        0.58  0.53
Triatoma dimidiata           3            2           0        0.59  0.50
Lithobius forficatus         3            3           0        1.13  0.61
Philaenus spumarius          3            2           0        0.69  0.58
Gomphiocephalus hodgsoni     3            2           0        0.69  0.62
Penaeus monodon              3            2           0        0.34  0.32
Crioceris duodecimpunctata   3            2           0        0.55  0.58
Triops cancriformis          3            2           0        0.42  0.40
Limulus polyphemus           0            0           0        0.36  0.40
Ixodes persulcatus           0            0           0        0.72  0.82
Ixodes holocyclus            0            0           0        0.76  0.83
Ixodes hexagonus             0            0           0        0.74  0.90
Carios capensis              0            0           0        0.70  0.79
Ornithodoros porcinus        0            0           0        0.67  0.86
Heptathela hangzhouensis     0            0           0        0.76  0.87
Ornithodoros moubata         0            0           0        0.68  0.88

Rearrangement categories, from the top of the ranking down: Very High, High, Medium, Low.

[Figure: four scatter plots with correlation coefficients R = 0.99, R = 0.59, R = 0.53 and R = 0.69.]

Breakpoint category   tRNA distance        protein distance
                      min   mean  max      min   mean  max
Very High             1.33  1.62  2.14     1.32  1.50  1.83
High                  0.48  0.86  1.94     0.44  0.99  1.73
Moderate              0.38  0.63  1.04     0.43  0.69  1.54
Low                   0.34  0.60  1.13     0.32  0.62  0.90

tRNA only High        0.66  1.01  1.94     0.57  1.15  1.73
tRNA only Mod/Low     0.34  0.60  1.13     0.32  0.63  1.54

Highly rearranged genomes have highly divergent sequences.

Rates of sequence evolution and genome rearrangement are correlated.

Both are very non-clocklike.

There are many species where only tRNAs have changed position.

Species with highly reshuffled tRNAs have high rates of sequence evolution in both tRNAs and proteins.

Relative rate of genome rearrangement (Xu et al. 2006)

Three gene orders (0, 1 and 2) with 0 known to be the outgroup. Test whether rates of rearrangement in branch 1 and branch 2 are equal.

n1 = number of gene couples in 0 and 2 but not in 1, i.e. new breakpoints in 1.
n2 = number of gene couples in 0 and 1 but not in 2, i.e. new breakpoints in 2.

Calculate:

$$\chi_n^2 = \frac{(n_1 - n_2)^2}{n_1 + n_2}$$

Should follow a chi squared distribution with one degree of freedom.

We took pairs where there was a significant difference in rearrangement rates ($\chi_n^2$ was large) and showed that there was a significant difference in substitution rates too ($\chi_m^2$ was large).
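The counts n1 and n2 can be obtained by comparing sets of gene couples (adjacent gene pairs). The sketch below is one plausible reading of that definition, not the code of Xu et al.: it assumes circular genomes with identical gene content and no duplicated genes, and it treats a couple (a, b) as identical to (-b, -a), which is the same couple read from the other strand. All function names are hypothetical.

```python
def flip(gene):
    """Switch the strand sign of a gene label."""
    return gene[1:] if gene.startswith("-") else "-" + gene

def couples(order):
    """Set of gene couples in a circular signed gene order."""
    pairs = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        # Store one canonical form so that strand of reading does not matter.
        pairs.add(min((a, b), (flip(b), flip(a))))
    return pairs

def rearrangement_counts(order0, order1, order2):
    """n1, n2: couples of the outgroup kept in one lineage but lost in the other."""
    c0, c1, c2 = couples(order0), couples(order1), couples(order2)
    n1 = len((c0 & c2) - c1)
    n2 = len((c0 & c1) - c2)
    return n1, n2

def chi_squared(n1, n2):
    """Test statistic (n1 - n2)^2 / (n1 + n2); undefined if n1 + n2 == 0."""
    return (n1 - n2) ** 2 / (n1 + n2)

outgroup = list("ABCDEF")
lineage1 = ["A", "-C", "-B", "D", "E", "F"]   # one inversion of B C
lineage2 = list("ABCDEF")                      # unchanged
print(rearrangement_counts(outgroup, lineage1, lineage2))
```

A single inversion breaks two couples, so the toy example gives n1 = 2 and n2 = 0. The same `couples` sets also give a breakpoint count between two genomes as the size of their set difference.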


Good Guys

Credits:

Daniel Jameson/ Bin Tang – Database design and management

Daniel Urbina – Base and Amino Acid Frequencies

Wei Xu – Gene Order Analysis and Arthropod Phylogenies

Bad Guys

Gene order is sometimes a strong phylogenetic marker, but the Bad Guys are problematic in gene order analysis as well as phylogenetics.

Why does the evolutionary rate speed up in these isolated groups of species?
Why do tRNA genes move more frequently?
What are the relative rates of inversion and translocation?

Small Scale Evolution – Variation in Frequencies of Bases and Amino Acids

G G C A A A A T

C C G T T T T A

The two strands of DNA are complementary.

Freq of A on one strand = Freq of T on the other

Freq of C on one strand = Freq of G on the other

If the two strands are subject to the same mutational processes then the freq of any base should be equal (statistically) on both strands.

This means that A = T and C = G on any one strand.

In this case base frequencies can be described by a single variable: G+C content.

BUT – mitochondrial genomes have an asymmetrical replication process. The two strands are not equivalent.

The frequencies of bases on the two strands are not equal.

On any one strand the frequencies of the four bases may vary independently.
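The complementarity argument can be checked numerically on the eight-base example from this slide. In the sketch below the function names are hypothetical, and the AT and GC skew statistics are a common summary of strand asymmetry that is brought in here for illustration; they are not taken from the slides.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the other strand, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def base_frequencies(seq):
    """Fraction of each base on one strand."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in "ACGT"}

def skews(seq):
    """AT skew (A-T)/(A+T) and GC skew (G-C)/(G+C); both are 0 if strands are equivalent."""
    f = base_frequencies(seq)
    at = (f["A"] - f["T"]) / (f["A"] + f["T"])
    gc = (f["G"] - f["C"]) / (f["G"] + f["C"])
    return at, gc

top = "GGCAAAAT"                      # upper strand of the example
bottom = reverse_complement(top)
print(base_frequencies(top))
print(base_frequencies(bottom))
print(skews(top), skews(bottom))
```

The frequency of A on one strand equals the frequency of T on the other, and the skews of the two strands have equal size and opposite sign. Non-zero skews on a real genome indicate that the two strands are not equivalent.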


Mitochondrial genome replication

Figure from

Faith & Pollock (2003) Genetics

Rank genes in order of increasing time spent single stranded:

COI < COII < ATP8 < ATP6 < COIII < ND3 < ND4L < ND4 < ND1 < ND5 < ND2 < Cytb

ND6 is on the other strand

The Genetic Code maps the 64 DNA codons to the 20 amino acids.

(This version applies to Vertebrate Mitochondria)

FIRST     SECOND POSITION                                            THIRD
POSITION  T             C             A             G                POSITION
T         TTT F    1    TCT S    6    TAT Y    10   TGT C    17      T
          TTC F         TCC S         TAC Y         TGC C            C
          TTA L    2    TCA S         TAA Stop      TGA W    18      A
          TTG L         TCG S         TAG Stop      TGG W            G
C         CTT L         CCT P    7    CAT H    11   CGT R    19      T
          CTC L         CCC P         CAC H         CGC R            C
          CTA L         CCA P         CAA Q    12   CGA R            A
          CTG L         CCG P         CAG Q         CGG R            G
A         ATT I    3    ACT T    8    AAT N    13   AGT S    20      T
          ATC I         ACC T         AAC N         AGC S            C
          ATA M    4    ACA T         AAA K    14   AGA Stop         A
          ATG M         ACG T         AAG K         AGG Stop         G
G         GTT V    5    GCT A    9    GAT D    15   GGT G    21      T
          GTC V         GCC A         GAC D         GGC G            C
          GTA V         GCA A         GAA E    16   GGA G            A
          GTG V         GCG A         GAG E         GGG G            G

The numbers 1 to 21 label the amino acid families; family 2 (Leu) spans TTA, TTG and the four CTN codons.

4-codon families where the third position is synonymous: TCN (S), CTN (L), CCN (P), CGN (R), ACN (T), GTN (V), GCN (A), GGN (G).
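The 4-codon families can be found mechanically from the code table, and the same table can be used to count bases at the synonymous third positions of a coding sequence. The sketch below encodes the vertebrate mitochondrial code as a 64-letter string (the amino acids in the order TTT, TTC, TTA, TTG, TCT, ...); the function names and the short test sequence are made up for illustration.

```python
BASES = "TCAG"
# Vertebrate mitochondrial code; first base varies slowest, third fastest.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def four_fold_families(code=CODE):
    """Two-base prefixes whose four codons all code for the same amino acid."""
    prefixes = {codon[:2] for codon in code}
    return sorted(p for p in prefixes
                  if len({code[p + x] for x in BASES}) == 1)

def third_position_frequencies(cds, code=CODE):
    """Base frequencies at third positions of 4-codon families in a reading frame."""
    families = set(four_fold_families(code))
    counts = dict.fromkeys(BASES, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in families and codon[2] in counts:
            counts[codon[2]] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES} if total else counts

print(four_fold_families())
print(third_position_frequencies("CTAGCCTTTACG"))   # made-up sequence
```

Eight prefixes are reported, matching the eight 4-codon families in the table.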

Base frequencies at four-fold degenerate (FFD) sites in each gene (averaged over mammals)

Deamination: C to U and A to G on the heavy strand


Base frequencies at FFD sites are controlled by mutation.

Base frequencies at 1st and 2nd positions are influenced by mutation and selection

Model fitting (Data from Fish) – assume a fraction of fixed sites and a fraction of neutral sites.

Selection at 1st position is weaker than at 2nd

Mutation pressure is sufficient to cause change in amino acid frequencies.

First     Second Position                              Third
Position  T          C          A          G           Pos.
T         F    1     S    6     Y    10    C    17     T
          F          S          Y          C           C
          L    2     S          Stop       W    18     A
          L          S          Stop       W           G
C         L          P    7     H    11    R    19     T
          L          P          H          R           C
          L          P          Q    12    R           A
          L          P          Q          R           G
A         I    3     T    8     N    13    S    20     T
          I          T          N          S           C
          M    4     T          K    14    Stop        A
          M          T          K          Stop        G
G         V    5     A    9     D    15    G    21     T
          V          A          D          G           C
          V          A          E    16    G           A
          V          A          E          G           G


Slopes of the amino acid freq v base freq show the response of the amino acid to mutational pressure.

Black = fish

White = mammals

Amino acids in the first two columns of the code have larger slopes.

Physical Properties of Amino Acids

         Vol.  Bulk.  Polarity  pI     Hyd.1  Hyd.2  Surface Area  Fract. Area
Ala  A   67    11.50  0.00      6.00   1.8    1.6    113           0.74
Arg  R   148   14.28  52.00     10.76  -4.5   -12.3  241           0.64
Asn  N   96    12.28  3.38      5.41   -3.5   -4.8   158           0.63
Asp  D   91    11.68  49.70     2.77   -3.5   -9.2   151           0.62
Cys  C   86    13.46  1.48      5.05   2.5    2.0    140           0.91
Gln  Q   114   14.45  3.53      5.65   -3.5   -4.1   189           0.62
Glu  E   109   13.57  49.90     3.22   -3.5   -8.2   183           0.62
Gly  G   48    3.40   0.00      5.97   -0.4   1.0    85            0.72
His  H   118   13.69  51.60     7.59   -3.2   -3.0   194           0.78

[Figure: amino acids plotted as points, with axes labelled y1, y2 and y3.]

Each Amino Acid is a point in 8-d space.

dij = Euclidean distance between a.a. i and j in 8-d space.
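A minimal sketch of the distance calculation, using only the nine rows shown in the table. Whether and how the properties were rescaled before computing distances is not stated on the slide; the `standardize` step below (z-scores per property) is therefore an assumption, added because the raw columns have very different units and ranges. Distances computed from nine amino acids will not match values computed from all twenty.

```python
import math

# Columns: Vol., Bulk., Polarity, pI, Hyd.1, Hyd.2, Surface Area, Fract. Area
PROPERTIES = {
    "A": (67, 11.50, 0.00, 6.00, 1.8, 1.6, 113, 0.74),
    "R": (148, 14.28, 52.00, 10.76, -4.5, -12.3, 241, 0.64),
    "N": (96, 12.28, 3.38, 5.41, -3.5, -4.8, 158, 0.63),
    "D": (91, 11.68, 49.70, 2.77, -3.5, -9.2, 151, 0.62),
    "C": (86, 13.46, 1.48, 5.05, 2.5, 2.0, 140, 0.91),
    "Q": (114, 14.45, 3.53, 5.65, -3.5, -4.1, 189, 0.62),
    "E": (109, 13.57, 49.90, 3.22, -3.5, -8.2, 183, 0.62),
    "G": (48, 3.40, 0.00, 5.97, -0.4, 1.0, 85, 0.72),
    "H": (118, 13.69, 51.60, 7.59, -3.2, -3.0, 194, 0.78),
}

def standardize(props):
    """Rescale each property to zero mean and unit variance (an assumption)."""
    columns = list(zip(*props.values()))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
           for c, m in zip(columns, means)]
    return {aa: tuple((x - m) / s for x, m, s in zip(vec, means, sds))
            for aa, vec in props.items()}

def distance(props, i, j):
    """Euclidean distance d_ij between amino acids i and j in property space."""
    return math.dist(props[i], props[j])

scaled = standardize(PROPERTIES)
print(distance(PROPERTIES, "A", "G"))   # raw units, dominated by large columns
print(distance(scaled, "D", "E"), distance(scaled, "D", "A"))
```

Comparing the raw and rescaled results shows why some normalization is usually needed: in raw units the volume and surface area columns dominate the distance.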

Principal Component Analysis projects the 8-d space onto the two ‘most important’ dimensions.

[Figure: PCA projection of the amino acids. Axis labels: Small to Big, and Hydrophobic to Hydrophilic.]


Responsiveness measures how much an amino acid frequency varies in response to mutational pressure

= Root mean square of 8 slopes for each amino acid (i.e. 4 bases x 2 data sets)

Proximity measures how similar the neighbouring amino acids are in the genetic code = Mean of 1/d for accessible amino acids

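e.g.

$$\mathrm{Prox}(T) = \frac{1}{24}\left(\frac{2}{d_{TI}} + \frac{2}{d_{TM}} + \frac{6}{d_{TS}} + \frac{4}{d_{TP}} + \frac{4}{d_{TA}} + \frac{2}{d_{TN}} + \frac{2}{d_{TK}} + 2 \times 0\right)$$

Each weight is the number of single-base changes that lead from a Thr codon (ACN) to a codon for that amino acid. The last term is for the two Stop codons (AGA and AGG), which contribute zero.

[Figure: genetic code table showing amino acids by first, second and third codon position.]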
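The weights in the Prox(T) example can be reproduced by enumerating all single-base changes from the four Thr codons under the vertebrate mitochondrial code. This is a sketch with hypothetical function names; the distance function is passed in as a parameter, and the constant distance used at the end is a placeholder, not real data.

```python
from collections import Counter

BASES = "TCAG"
# Vertebrate mitochondrial code; first base varies slowest, third fastest.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def accessible_counts(aa, code=CODE):
    """Count non-synonymous single-base neighbours of all codons for `aa`."""
    counts = Counter()
    for codon, amino in code.items():
        if amino != aa:
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                neighbour = codon[:pos] + b + codon[pos + 1:]
                if code[neighbour] != aa:
                    counts[code[neighbour]] += 1   # "*" stands for Stop
    return counts

def proximity(aa, dist, code=CODE):
    """Mean of 1/d over accessible codons; Stop codons contribute zero."""
    counts = accessible_counts(aa, code)
    total = sum(counts.values())
    return sum(n / dist(aa, b) for b, n in counts.items() if b != "*") / total

print(accessible_counts("T"))
print(proximity("T", lambda a, b: 2.0))   # placeholder distance of 2 for every pair
```

For Thr the counts are S 6, P 4, A 4 and 2 each for I, M, N, K and Stop, which sum to the denominator 24.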

Responsiveness and Proximity are highly correlated: R = 0.87 (p < 10^-6).

An amino acid frequency responds to mutational pressure more easily if there are neighbouring amino acids with similar physical properties.

Urbina et al. (2006) J. Mol. Evol.

Codon usage: Homo sapiens, Strand = +, 3624 codons

First   Second position
pos.    U              C              A              G
U       UUU F   69     UCU S   29     UAU Y   35     UGU C    5
        UUC F  139     UCC S   99     UAC Y   89     UGC C   17
        UUA L   65     UCA S   81     UAA *    4     UGA W   90
        UUG L   11     UCG S    7     UAG *    3     UGG W    9
C       CUU L   65     CCU P   37     CAU H   18     CGU R    6
        CUC L  167     CCC P  119     CAC H   79     CGC R   26
        CUA L  276     CCA P   52     CAA Q   82     CGA R   28
        CUG L   42     CCG P    7     CAG Q    8     CGG R    0
A       AUU I  112     ACU T   50     AAU N   29     AGU S   11
        AUC I  196     ACC T  155     AAC N  131     AGC S   37
        AUA M  165     ACA T  132     AAA K   84     AGA *    1
        AUG M   32     ACG T   10     AAG K    9     AGG *    0
G       GUU V   22     GCU A   39     GAU D   12     GGU G   16
        GUC V   45     GCC A  123     GAC D   51     GGC G   87
        GUA V   61     GCA A   79     GAA E   63     GGA G   61
        GUG V    8     GCG A    5     GAG E   15     GGG G   19

Frequency ratios

$$r(X_2 Y_3) = \frac{p(X_2 Y_3)}{q(X_2)\, q(Y_3)}$$

Columns labelled 23 use codon positions 2 and 3, as in the formula; columns labelled 31 use positions 3 and 1.

Dinucleotide  Mammals - 23  Fish - 23  Mammals - 31  Fish - 31
UU            0.939         1.250      0.855         0.933
UC            0.743         0.756      0.994         0.918
UA            1.136         1.030      1.206         1.096
UG            1.433         1.274      0.856         1.049
CU            1.101         0.939      1.082         1.162
CC            1.163         1.205      1.363         1.371
CA            0.906         0.938      0.945         0.849
CG            0.552         0.554      0.546         0.609
AU            -             -          0.996         0.907
AC            -             -          0.797         0.739
AA            -             -          0.974         1.135
AG            -             -          1.293         1.228
GU            0.763         0.605      1.115         0.911
GC            1.005         0.878      0.873         0.839
GA            1.027         1.145      0.776         0.758
GG            1.654         1.891      1.369         1.499

Codon bias seems to be a dinucleotide mutational effect in mitochondria, rather than an effect of translational selection.

CpG effect.... (increased rate of C to U mutations in CG dinucleotides. Expect high UG and CA)

DNA binding proteins....
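The frequency ratio r(X2Y3) compares the observed frequency of a dinucleotide at codon positions 2 and 3 with the product of the single-base frequencies at those positions. Here is a minimal sketch with a hypothetical function name. The counts are the human + strand codon counts for the Ala (GCN) and Gly (GGN) families only, so the resulting ratios illustrate the calculation and are not comparable with the averaged values in the slide.

```python
from collections import Counter

def frequency_ratios(codon_counts):
    """r(X2Y3) = p(X2Y3) / (q(X2) q(Y3)) from a {codon: count} mapping."""
    pair, second, third = Counter(), Counter(), Counter()
    total = 0
    for codon, n in codon_counts.items():
        pair[codon[1:]] += n      # dinucleotide at positions 2 and 3
        second[codon[1]] += n     # base at position 2
        third[codon[2]] += n      # base at position 3
        total += n
    return {xy: (n / total) / ((second[xy[0]] / total) * (third[xy[1]] / total))
            for xy, n in pair.items() if n}

# Human + strand counts for two 4-codon families (illustrative subset).
counts = {"GCU": 39, "GCC": 123, "GCA": 79, "GCG": 5,
          "GGU": 16, "GGC": 87, "GGA": 61, "GGG": 19}
for dinucleotide, r in sorted(frequency_ratios(counts).items()):
    print(dinucleotide, round(r, 3))
```

A ratio below 1 means the dinucleotide is rarer than expected from the base frequencies alone, and a ratio above 1 means it is more common.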

Changes in tRNA content of genomes from bacteria to mitochondria

Number of tRNA genes with each anticodon:

                             Ala           Asp   Gln       Pro           Ser-UCN
Anticodon                    GGC UGC CGC   GUC   UUG CUG   GGG UGG CGG   GGA UGA CGA   Total

Epsilon Proteobacteria
Campylobacter jejuni           1   3   0     2     1   0     0   1   0     1   1   0    42
Helicobacter pylori J99        1   1   0     1     1   0     1   1   0     1   1   0    36

Gamma Proteobacteria
Pseudomonas aeruginosa         2   4   0     4     1   0     1   1   1     1   1   1    63
Vibrio parahaemolyticus        1   4   0     6     6   0     0   3   0     1   4   0   126
Haemophilus influenzae         1   3   0     3     2   0     0   2   0     1   2   0    56
Buchnera aphidicola            1   1   0     1     1   0     0   1   0     1   1   0    32 #
Blochmannia floridanus         0   1   0     1     1   0     0   1   0     0   1   1    36 #
Wigglesworthia glossinidia     1   1   0     1     1   0     0   1   0     1   1   0    34 #
Escherichia coli K12           2   3   0     3     2   2     1   1   1     2   1   1    86

Alpha Proteobacteria
Agrobacterium tumefaciens      1   4   0     2     1   1     1   1   1     1   1   1    53
Sinorhizobium meliloti         1   3   1     2     1   1     1   1   1     1   1   1    51
Rickettsia prowazekii          1   1   0     1     1   0     0   1   0     1   1   0    33 #
Wolbachia (D. mel)             0   1   0     1     1   0     0   1   0     1   1   0    34 #
Caulobacter crescentus         1   2   1     2     1   0     1   1   1     1   1   1    51

Mitochondria
Reclinomonas americana         0   1   0     1     1   0     0   1   0     0   1   0    26
Homo sapiens                   0   1   0     1     1   0     0   1   0     0   1   0    22

Only one type of tRNA remains for each codon family in human mitochondria. Two tRNAs are still needed for Leu and for Ser, giving 22 in total.

# denotes intracellular parasite or endosymbiont. Small bacterial genomes also have reduced numbers of tRNAs.

Evolution of the Genetic Code: Before and After the LUCA

1. The genetic code evolved to its canonical form before the Last Universal Common Ancestor of Archaea, Bacteria and Eukaryotes - >3 billion years ago. It appears to be highly optimized. How did it get to be this way?

2. Numerous small changes have occurred to the canonical code since then. What is the mechanism of codon reassignment?

Codon Reassignment – The genetic code is variable in mitochondria (and also in some other types of genomes).

First     Second Position                    Third
Position  U        C        A        G       Pos.
U         F        S        Y        C       U
          F        S        Y        C       C
          L        S        Stop     Stop    A
          L        S        Stop     W       G
C         L        P        H        R       U
          L        P        H        R       C
          L        P        Q        R       A
          L        P        Q        R       G
A         I        T        N        S       U
          I        T        N        S       C
          I        T        K        R       A
          M        T        K        R       G
G         V        A        D        G       U
          V        A        D        G       C
          V        A        E        G       A
          V        A        E        G       G

UGA Stop to Trp

AUA Ile to Met

CUN Leu to Thr

CGN Arg to unassigned

AGR Arg to Ser to Stop/Gly

etc.....

But how can this happen? It should be disadvantageous.

Reassignments in Metazoa

[Figure: tree of metazoan groups annotated with codon reassignment events.]

Groups shown: Porifera, Cnidaria, Arthropoda, Nematoda, Platyhelminthes, Lophotrochozoa, Echinodermata, Hemichordata, Urochordata, Cephalochordata, Craniata.

Events marked on the tree:

- AAA : Lys -> Asn
- Loss of tRNA-Ile(CAU) but AUA remains Ile
- Loss of tRNA-Arg(UCU) and AGR : Arg -> Ser
- AGR : Ser -> Stop
- AGR : Ser -> Gly
- AUA : Ile -> Met
- Loss of many tRNAs + import from cytoplasm
- AAA : Lys -> unassigned

Example 1: AUA was reassigned from Ile to Met during the early evolution of the mitochondrial genome.

Before   Codon   Anticodon
Ile      AUU     GAU
Ile      AUC     GAU
Ile      AUA     k2CAU
Met      AUG     CAU

Notes: G in the wobble position of the tRNA-Ile can pair with U and C in the third codon position. Bacteria and some protist mitochondria possess another tRNA-Ile with a modified base that translates AUA only. The tRNA-Met translates AUG only.

After    Codon   Anticodon
Ile      AUU     GAU
Ile      AUC     GAU
Met      AUA     UAU or f5CAU
Met      AUG     UAU or f5CAU

Notes: In animal mitochondria the k2CAU tRNA has been deleted. There is a gain of function of the tRNA-Met by a mutation or a base modification.

Example 2: UGA was reassigned from Stop to Trp many times (12 times in mitochondria).

Before   Codon   Anticodon
Stop     UGA     RF
Trp      UGG     CCA

Notes: Release Factor recognizes the UGA codon. Normal tRNA-Trp translates only UGG codons.

After    Codon   Anticodon
Trp      UGA     UCA
Trp      UGG     UCA

Notes: In animal mitochondria (and elsewhere) there is a gain of function of the tRNA-Trp via mutation or base modification so that it translates both UGG and UGA.

Page 46

The GAIN-LOSS framework

(Sengupta & Higgs, Genetics 2005)

LOSS = deletion or loss of function of a tRNA or RF

GAIN = gain of a new tRNA or a gain of function of an existing one.

[Diagram: two routes from the initial code to the new code]

Initial Code. No Problem.
  -> GAIN -> Ambiguous codon. Selective disadvantage. -> LOSS -> New Code.
  -> LOSS -> Unassigned codon. Selective disadvantage. -> GAIN -> New Code.

New Code. Selective disadvantage because codons are used in wrong places.
  -> Mutations in coding sequences -> New Code. Codons now used in right places. No Problem.

Note – the strength of the selective disadvantage depends on the number of times the codon is used. There is no disadvantage if the codon disappears.
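One simple way to picture this note is a multiplicative penalty per affected codon. The form (1 - s)^n below is an illustrative assumption, not a formula stated on the slide, and relative_fitness, n_affected_codons and s are hypothetical names.

```python
def relative_fitness(n_affected_codons, s):
    """Fitness relative to the unaffected genome.

    Assumes (for illustration only) that each occurrence of the ambiguous or
    unassigned codon multiplies fitness by (1 - s).
    """
    if n_affected_codons < 0:
        raise ValueError("n_affected_codons must be non-negative")
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie between 0 and 1")
    return (1.0 - s) ** n_affected_codons


for n in (0, 1, 10, 100):
    print(n, relative_fitness(n, 0.01))
```

With n = 0 the fitness is exactly 1, which is the "no disadvantage if the codon disappears" case.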

Page 47

Four possible mechanisms of codon reassignment.

1. Codon Disappearance - The codon disappears. The order of the gain and loss is irrelevant.

For the other three mechanisms the codon does not disappear.

2. Ambiguous Intermediate – The gain happens before the loss. There is a period when the gain is fixed in the population and translation is ambiguous.

3. Unassigned Codon – The loss happens before the gain. There is a period when the loss is fixed in the population and the codon is unassigned.

4. Compensatory Change – The gain and loss are fixed in the population simultaneously (although they do not arise at the same time). There is no intermediate period between the old and the new codes (cf. theory of compensatory substitutions in RNA helices).

Sengupta & Higgs (2005) showed that all four mechanisms work in a population genetics simulation.
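The four definitions can be read as a small decision rule. The sketch below is only a restatement of the list above, not the simulation of Sengupta & Higgs. The function name and its arguments (fixation times of the gain and the loss, in arbitrary units) are hypothetical.

```python
def reassignment_mechanism(codon_disappears, gain_fixed_at, loss_fixed_at):
    """Classify a reassignment by the order in which gain and loss are fixed."""
    if codon_disappears:
        # Order of gain and loss is irrelevant once the codon is absent.
        return "Codon Disappearance"
    if gain_fixed_at < loss_fixed_at:
        return "Ambiguous Intermediate"
    if loss_fixed_at < gain_fixed_at:
        return "Unassigned Codon"
    # Gain and loss fixed at the same time, with no intermediate period.
    return "Compensatory Change"


print(reassignment_mechanism(False, gain_fixed_at=10, loss_fixed_at=25))
print(reassignment_mechanism(False, gain_fixed_at=30, loss_fixed_at=12))
```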

Page 48

Summary of Codon Reassignments in Mitochondria

Codon reassignment | No. of times | Can this be explained by GC -> AU mutation pressure? | Change in No. of tRNAs | Is mispairing important? | Mechanism
UAG: Stop -> Leu | 2 | G -> A at 3rd pos. | +1 | No | CD
UAG: Stop -> Ala | 1 | G -> A at 3rd pos. | +1 | No | CD
UGA: Stop -> Trp | 12 | G -> A at 2nd pos. | 0 | Possibly. CA at 3rd pos. | CD
CUN: Leu -> Thr | 1 | C -> U at 1st pos. | 0 | No | CD
CGN: Arg -> Unass | 5 | C -> A at 1st pos. | -1 | No | CD
AUA: Ile -> Met or Unassigned | 3 / 5 | No | -1 | Yes. GA at 3rd pos. | UC
AAA: Lys -> Asn | 2 | No | 0 | Yes. GA at 3rd pos. | AI
AAA: Lys -> Unass | 1 | No | 0 | Possibly. GA at 3rd pos. | UC or AI
AGR: Arg -> Ser | 1 | No | -1 | Yes. GA at 3rd pos. | UC
AGR: Ser -> Stop | 1 | No | 0 | No | AI(b)
AGR: Ser -> Gly | 1 | No | +1 | No | AI(b)
UUA: Leu -> Stop | 1 | No | 0 | No | UC or AI
UCA: Ser -> Stop | 1 | No | 0 | No | UC or AI

Mechanisms: CD = Codon Disappearance, UC = Unassigned Codon, AI = Ambiguous Intermediate.

CD mechanism explains disappearance of stop codons because they are rare initially. Only a few examples of CD for sense codons. UC and AI are important for sense codons.

Page 49

Three examples in yeasts (Mutation pressure GC to AU)

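First Position | Second Position: U | C | A | G | Third Pos.
U | F | S | Y | C | U
U | F | S | Y | C | C
U | L | S | Stop | Stop | A
U | L | S | Stop | W | G
C | L | P | H | R | U
C | L | P | H | R | C
C | L | P | Q | R | A
C | L | P | Q | R | G
A | I | T | N | S | U
A | I | T | N | S | C
A | I | T | K | R | A
A | M | T | K | R | G
G | V | A | D | G | U
G | V | A | D | G | C
G | V | A | E | G | A
G | V | A | E | G | G

CGN is rare (replaced by AGR). CGN Arg codons become unassigned.

CUN is rare (replaced by UUR). CUN is reassigned from Leu to Thr.

AUA and AUU are common and AUC is rare. Nevertheless AUA is reassigned to Met. The codon does not disappear.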

Page 50

Leu and Arg codons in yeasts

Species | Leu CUN | Leu UUR | Arg CGN | Arg AGR
S | 53 | 192 | 7 | 33
Y. | 44 | 618 | 0** | 75
C | 3 | 279 | 12 | 29
C | 132 | 397 | 47 | 26
C | 66 | 547 | 39 | 45
P | 25 | 714 | 18 | 67
K | 0 | 286 | 0** | 48
C | 11* | 294 | 1** | 45
S | 33* | 333 | 7 | 49
S | 19* | 274 | 0** | 40
S | 22* | 300 | 0** | 46

Codon Disappearance causes reassignments

* CUN = Thr. Unusual tRNA-Thr present instead of tRNA-Leu

** CGN = unassigned. tRNA-Arg is deleted
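Counts like those in the table can be obtained by walking a coding sequence three bases at a time. The sketch below is a minimal version, assuming an in-frame coding sequence given as a plain string. The function name and the example sequence are invented for illustration.

```python
from collections import Counter


def count_leu_arg_families(cds):
    """Count Leu (CUN, UUR) and Arg (CGN, AGR) codons in an in-frame sequence."""
    rna = cds.upper().replace("T", "U")  # accept DNA or RNA letters
    counts = Counter()
    for i in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[i:i + 3]
        if codon.startswith("CU"):
            counts["Leu CUN"] += 1
        elif codon in ("UUA", "UUG"):
            counts["Leu UUR"] += 1
        elif codon.startswith("CG"):
            counts["Arg CGN"] += 1
        elif codon in ("AGA", "AGG"):
            counts["Arg AGR"] += 1
    return counts


# Invented sequence: CUA UUA UUG CGU AGA AGG AUG
print(count_leu_arg_families("CUAUUAUUGCGUAGAAGGAUG"))
```

A genome in which the "Leu CUN" or "Arg CGN" count falls to zero is a candidate for the Codon Disappearance route.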

Page 51

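Codon Usage: AUA Ile to Met in Yeasts

Species | AUU | AUC | AUA | AUG | AUA is | tRNA
J | 133 | 40 | 32 | 48 | Ile | k2CAU
O | 161 | 34 | 0 | 57 | Absent | none
P | 113 | 39 | 49 | 51 | Ile | k2CAU

Species | AUU | AUC | AUA | AUG | AUA is | tRNA
C | 119 | 81 | 229 | 100 | Ile | k2CAU
C | 303 | 32 | 193 | 117 | Ile | k2CAU
P | 274 | 18 | 562 | 105 | Ile | k2CAU
K | 213 | 16 | 7 | 63 | ? | none
C | 207 | 21 | 16 | 73 | Met | C*AU
S | 239 | 31 | 60 | 73 | Met | C*AU
S | 203 | 7 | 101 | 56 | Met | C*AU
S | 218 | 11 | 95 | 70 | Met | C*AU

codon | amino acid | anticodon
AUU | Ile | GAU
AUC | Ile | GAU
AUA | Ile | k2CAU
AUG | Met | CAU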

Page 52

Evolution of the canonical code - Before the LUCA

The canonical code seems to be optimized to reduce the effects of translational and mutational errors.

Neighbouring codons code for similar amino acids.

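[Figure: Woese’s polar requirement scale, with axis ticks at 5, 7, 9, 11 and 13. Amino acid labels from the low end to the high end of the scale: C, L, I, F, W, M, Y, V, P, T, A, H, Q, S, G, N, K, R, E, D.]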

Measure difference between amino acid properties by how far apart they are on this scale.

Page 53

Cost function g(a,b) for replacing amino acid a by amino acid b, e.g. difference in Polar Requirement.

$$E = \frac{\sum_i \sum_j r_{ij}\, g(a_i, a_j)}{\sum_i \sum_j r_{ij}}$$

a_i = amino acid assigned to codon i

r_ij = rate of mistaking codon i for codon j = 1 for single position mistakes, 0 otherwise

E = measure of error associated with a code

Generate random codes by permuting the 20 amino acids in the code table.

E is smaller for the canonical code than for almost all random codes.

[Figure: distribution p(E) of E over random codes, with the canonical-code value E_real marked. f is the fraction of random codes with E below E_real.]

f ~ 10^-6: one in a million codes is better (Freeland and Hurst)
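The calculation can be sketched in a few lines. Several choices below are assumptions rather than details from the slide: g is taken as the absolute difference in polar requirement, mistakes to or from stop codons are ignored, stop codons stay fixed when the 20 amino acids are permuted, and the polar requirement values are the commonly quoted ones from Woese's scale. All function names are hypothetical. A sample of a few hundred random codes cannot resolve f ~ 10^-6; it only shows that the canonical code sits far below the bulk of the distribution.

```python
import itertools
import random

BASES = "UCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
CANONICAL = dict(zip(CODONS, STANDARD_AA))

# Polar requirement values (assumed, as commonly quoted for Woese's scale).
PR = {"A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
      "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
      "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6}


def code_error(code):
    """E = sum_ij r_ij g(a_i, a_j) / sum_ij r_ij with r_ij = 1 for single-base changes."""
    num, den = 0.0, 0
    for ci in CODONS:
        ai = code[ci]
        if ai == "*":
            continue  # assumption: stop codons are left out of the sum
        for pos in range(3):
            for b in BASES:
                if b == ci[pos]:
                    continue
                aj = code[ci[:pos] + b + ci[pos + 1:]]
                if aj == "*":
                    continue
                num += abs(PR[ai] - PR[aj])  # g = |difference in polar requirement|
                den += 1
    return num / den


def random_code(rng):
    """Permute the 20 amino acids among the codon blocks; stops stay in place."""
    aas = sorted(PR)
    shuffled = aas[:]
    rng.shuffle(shuffled)
    swap = dict(zip(aas, shuffled))
    swap["*"] = "*"
    return {codon: swap[aa] for codon, aa in CANONICAL.items()}


def fraction_better(n_samples=1000, seed=0):
    """Return (E_real, fraction of sampled random codes with E <= E_real)."""
    rng = random.Random(seed)
    e_real = code_error(CANONICAL)
    better = sum(code_error(random_code(rng)) <= e_real for _ in range(n_samples))
    return e_real, better / n_samples


e_real, f = fraction_better()
print(f"E_real = {e_real:.3f}, sampled fraction of better codes = {f}")
```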

Page 54

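Principal Component Analysis

Projects the 8-d space into the two ‘most important’ dimensions.

[Figure: amino acids plotted on the first two principal components. One axis runs from Big to Small, the other from Hydrophobic to Hydrophilic.]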
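For readers who have not met PCA, the sketch below projects rows of an 8-column table onto its two leading principal components using plain power iteration with deflation. This is one textbook way to do it, chosen so the example needs no libraries, and it is not claimed to be the method used for the figure. The eight amino acid properties are not listed in these slides, so the data here are made-up toy numbers. pca_top2 is a hypothetical name.

```python
import math


def _matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def pca_top2(rows, n_iter=200):
    """Return (components, scores) for the two leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(2):
        v = [1.0 / (k + 1) for k in range(d)]  # arbitrary starting vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(n_iter):  # power iteration
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigval = sum(x * y for x, y in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflate: remove the component just found before looking for the next.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    scores = [[sum(x * y for x, y in zip(c, comp)) for comp in components]
              for c in centered]
    return components, scores


# Toy data only: 10 points that vary in two of the eight columns.
toy = [[3.0 * (i - 4.5), (-1.0) ** i] + [0.0] * 6 for i in range(10)]
components, scores = pca_top2(toy)
print(scores[0])
```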

Page 55

Modified codes show that the Canonical code could have changed as it evolved – not completely a frozen accident.

Possibility of competition between organisms with different codes – natural selection.

Early codes had <20 amino acids (???). Gradual increase in complexity.

Increased repertoire of amino acids gives more protein functions.

Order of addition –

Astrobiology - which amino acids were common on early Earth?

Page 56

Prebiotic synthesis of amino acids

Amino acids are found in

• Meteorites

• Atmospheric chemistry experiments (Miller-Urey)

• Hydrothermal synthesis

• Icy dust grains in space

Rank amino acids in order of decreasing frequency in 12 observations. Derive mean ranking.

G A D E V S I L P T (found non-biologically - early amino acids)

K R H F Q N Y W C M (not found non-biologically – late amino acids)
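The ranking step can be sketched as follows. The 12 observations themselves are not given in these slides, so the two small dictionaries below are invented placeholders. The handling of amino acids that are absent from an observation (they share the mean of the remaining ranks) and the alphabetical tie-break are assumptions made for this example, and mean_ranking is a hypothetical name.

```python
def mean_ranking(observations, amino_acids):
    """Rank amino acids by decreasing frequency in each observation, then average.

    Returns (amino acids ordered by mean rank, dict of mean ranks).
    """
    if not observations:
        raise ValueError("need at least one observation")
    totals = {aa: 0.0 for aa in amino_acids}
    n_total = len(amino_acids)
    for obs in observations:
        present = sorted((aa for aa in amino_acids if obs.get(aa, 0) > 0),
                         key=lambda aa: (-obs[aa], aa))
        for rank, aa in enumerate(present, start=1):
            totals[aa] += rank
        if len(present) < n_total:
            # Assumption: absent amino acids share the mean of the unused ranks.
            tied_rank = (len(present) + 1 + n_total) / 2
            for aa in amino_acids:
                if aa not in present:
                    totals[aa] += tied_rank
    means = {aa: totals[aa] / len(observations) for aa in amino_acids}
    order = sorted(amino_acids, key=lambda aa: (means[aa], aa))
    return order, means


# Invented placeholder data, not measurements.
obs = [{"G": 10, "A": 5, "D": 1}, {"G": 8, "A": 9}]
order, means = mean_ranking(obs, "GADK")
print(order, means)
```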

Page 57

Early and Late amino acids are determined by thermodynamics

Page 58

Positions of early and late amino acids....

What does this mean?

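First Position | Second Position: U | C | A | G | Third Pos.
U | F | S | Y | C | U
U | F | S | Y | C | C
U | L | S | Stop | Stop | A
U | L | S | Stop | W | G
C | L | P | H | R | U
C | L | P | H | R | C
C | L | P | Q | R | A
C | L | P | Q | R | G
A | I | T | N | S | U
A | I | T | N | S | C
A | I | T | K | R | A
A | M | T | K | R | G
G | V | A | D | G | U
G | V | A | D | G | C
G | V | A | E | G | A
G | V | A | E | G | G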

Maybe only 2nd position was relevant initially.

Late amino acids took over codons previously assigned to amino acids with similar properties.

Page 59

Other points –

Column structure suggests that translational errors were more important than mutational errors (tRNA structure/RNA world)

Precursor-product pairs tend to be neighbours (but doubts over statistical significance). Maybe late amino acids took over codons previously assigned to their biochemical precursors.

Direct chemical interactions between RNA motifs and amino acids (“stereochemical theory”). In vitro selection experiments suggest binding sites of aptamers preferentially contain codon and anticodon sequences.