Pathogenic mutations and
compensatory evolution
Evolutionary Genomics Lab
Centre for Genomic Regulation
Barcelona, Spain
Fyodor A. Kondrashov
Genotype
Phenotype (fitness)
Polymeropoulos et al., Science, 1997
Macaca mulatta
Macaca fascicularis
Erythrocebus patas
Homo sapiens
Pongo pygmaeus abelii
Saguinus labiatus
Ateles geoffroyi
Lagothrix lagotricha
Rattus norvegicus
Mus musculus
Gallus gallus
Xenopus laevis
[Figure: phylogeny of the species listed above, with node values (91, 26, 94, 77, 72, 66, 89, 30, 91) and the residue, A or T, marked along the tree]
Site 53 of alpha-synuclein: T -> A
A second substitution, X -> Z, made A better than T
[Figure: cloverleaf secondary structures of Homo sapiens tRNAAsn and Pan troglodytes (chimpanzee) tRNAAsn, 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
Kern and Kondrashov, Nat Genet 2004
[Figure: cloverleaf secondary structures of Cynocephalus variegatus (Malayan flying lemur) tRNALys and Ceratotherium simum (white rhinoceros) tRNATrp, 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[Figure: cloverleaf secondary structure of Ursus maritimus (polar bear) tRNASer(UCN), 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[Figure: cloverleaf secondary structure of Spalax ehrenbergi (Ehrenberg's mole-rat) tRNAIle, 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[Figure: cloverleaf secondary structure of Tamandua tetradactyla (southern tamandua) tRNAIle, 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[Figure: cloverleaf secondary structure of Hyperoodon ampullatus (northern bottlenose whale) tRNALeu(UUR), 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[Figure: cloverleaf secondary structure of Tachyglossus aculeatus (Australian echidna) tRNALeu(UUR), 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[Figure: cloverleaf secondary structure of Oryctolagus cuniculus (rabbit) tRNACys, 5’ to 3’]
Wittenhagen & Kelley, Nat Struct Biol (2002) and Trends Biochem. Sci. (2003)
Molecular basis of the A3243G mt disease mutation
[Figure: cloverleaf secondary structure of Canis familiaris (dog) tRNALeu(UUR), 5’ to 3’, with the acceptor stem, D-stem/loop, anticodon stem/loop, and TΨC-stem/loop labeled]
[gttaaga]tg[gcag]agcccggtaa[tcgc]a[taaaa]cttaaaa[cttta]cagtc[agagg]ttcaatt[cctct|tcttaac]a  Homo sapiens (human)
[.......]..[....].......C..[.t..].[.....].......[.....]tta..[.....].....c.[.....|.......].  Tarsius bancanus (western tarsier)
[......g]..[....]........c.[.t..]c[.....].....g.[t....]Agta.[...a.].....a.[.....|c......].  Tupaia belangeri (northern tree shrew)
[......g]..[....].......C..[.t..].[.....].......[.....]t.a..[.....].....c.[.....|c......].  Lepus europaeus (European hare)
[.c.....]..[....]..........[.t..].[C..g.]......c[.C..g]A.tc.[.....].....c.[.....|.....G.].  Jaculus jaculus (lesser Egyptian jerboa)
[a......]..[....].......C..[.t..].[...g.]t......[.C...].ta..[.....].....c.[.....|......T].  Sciurus vulgaris (Eurasian red squirrel)
[a......]..[....]..-.......[.t..].[C..g.]......c[.C..g].t...[.....]......C[.....|......T].  Echinops telfairi (small Madagascar hedgehog)
[....g..]..[....].......C..[.t..].[.....].....gc[t....]t.a..[.....].....c.[.....|..c....].  Pteropus scapulatus (little red flying fox)
[.....ag]..[....]..a.......[.t..].[.....].....g.[t....]g..c.[.....].....c.[.....|ct....T].  Pipistrellus abramus (Japanese house bat)
[....g.g]..[....]........G.[.t..].[.....]......c[.....]t.c..[.....].....a.[.....|c.c....].  Ursus maritimus (polar bear)
[....g.g].-[....]..........[.t..].[.....]......c[t....].ccc.[.....].....c.[.....|c.c...T].  Odobenus rosmarus rosmarus (Atlantic walrus)
[....g..]..[....]..........[ct..].[.....]......c[.....]t.ac.[.....].....c.[.....|..c....].  Rhinoceros unicornis (greater Indian rhinoceros)
[...gg..]..[....]..ta...C..[.t..].[.....]......c[.....]t.cc.[.....].....a.[.....|..cc...].  Monodon monoceros (narwhal)
[...g..g]..[....]..t....C..[.t.T].[.....]......c[t....]..c..[.....].....a.[.....|c.cc...].  Platanista minor (Indus River dolphin)
[a...g.g]..[....]..a.......[.t..]g[.....]......c[.....]ttac.[.....].....c.[.....|c.c...T].  Sus scrofa (pig)
[.......]..[....]..a.a.....[.t..].[...g.]......c[.....]ttac.[.....].....a.[.....|.......].  Dasypus novemcinctus (nine-banded armadillo)
[......g]..[....]..........[.t..].[.....].....gc[t....]..ac.[.....].......[.....|c......].  Orycteropus afer (aardvark)
[.......].a[...a].aatt...c.[ct..].[.....].....gc[t....].tca.[G....].....c.[.....|.......].  Elephas maximus (Asiatic elephant)
[a.....g]..[....]..-....C..[.t..].[.....]......c[.....]t.a..[.....].....a.[.....|c.....T].  Macropus robustus (wallaroo)
[a.....g]..[....]..-.a.....[.t..].[.....].....gc[.....]..ac.[.....].....aC[.....|c.....T].  Vombatus ursinus (common wombat)
[a.....g]..[a...]..a.......[.t.T]g[.....].....gc[t....]t....[.....].....a.[.....|c.....T].  Ornithorhynchus anatinus (platypus)
[a.....g]..[a...]..a....C..[.t.T]g[.....].....gc[t....]t.a..[.....].....a.[.....|c.....T].  Tachyglossus aculeatus (Australian echidna)
[actcttt]ta[gtat]aaat--a[gtac]c[gttaa]cttccaa[ttaac]tagt[tttga]c-aacat[tcaaa|aaagagt]a  Homo sapiens (human)
[.......]..[....]..G.--.[....].[.....].......[.....]....[.....].-.....[.....|.......].  Pan troglodytes (chimpanzee)
[.......]..[....]..Gc--.[....].[.....].......[.....]....[.....].-.....[.....|.......].  Pan paniscus (pygmy chimpanzee)
[.......]..[....]..t.--.[....].[.....].......[.....]c...[....g]t-.gt.c[c....|.......].  Gorilla gorilla (gorilla)
[.......]..[....]..Gc--.[....].[.....].......[.....]c...[.....].-....c[.....|.......].  Pongo pygmaeus (orangutan)
[.......]..[....]...c--.[....].[.....].......[.....]....[.....].-...Gc[c....|.......].  Pongo pygmaeus abelii (Sumatran orangutan)
[.......]..[....]..t.--.[....]a[A..g.].......[.c..t]c..c[.....].-..t..[.....|.......].  Papio hamadryas (hamadryas baboon)
[.......]..[....]..cc--.[....]a[A..g.].......[.c..t]c...[.....].-.....[.....|.......].  Macaca sylvanus (Barbary ape)
[.......]..[....]...c--.[....]t[.....].......[.....]c..c[..c..]t-...Gc[..g..|.......].  Hylobates lar (common gibbon)
[.t...c.]..[....]...c--.[....]a[A..g.].......[....t]ag.c[c....]t-..-.c[c...g|.g...a.].  Cebus albifrons (white-fronted capuchin)
[.t.....]..[....]cg.ccc.[a...]a[A..g.].......[....t]..ac[..c.g]tg..-.a[c.gg.|.....a.].  Lemur catta (ring-tailed lemur)
[g......]..[...c]..c.--.[....]a[A..g.].......[.c..t]ag.a[....g]ta..t.a[c....|.g....c].  Nycticebus coucang (slow loris)
[gt..c..]..[....]c...t-.[....]a[A..g.].......[.c..t]...c[cc.ag]tac.at.[ct.gg|..g..ac].  Tarsius bancanus (western tarsier)
Bracketed stem halves in the two alignments above, left to right: 1 2 2’ 3 3’ 4 4’ 1’
[Figure: cloverleaf secondary structure of Homo sapiens (human) tRNATrp, 5’ to 3’]
11 compensated pathogenic deviations (CPDs) in total, of 7 different types, in 10 species
Kondrashov et al. PNAS 2002
Predicted compensatory interactions
MVYPEPWCMPRM
VVYPEPWCMPRL
MVYPEPWHMPRL
MTFPEDYCMPRL
TTFPHDWCMPRL
TTFPEDWCMPRL
MVYPEPWCMPRL
MVYPEPWCMPGL
MVYPEPYCMPRL
MVYKERWHMPRL
MVYKEPWHMPRL
MVFPEDWCIPRL
MTFPEDWCIPRL
MTFPEDWCMPRL
MTFPYDWCMPRL
MTFPHDWQMPRL
MTYPHDLCMPRL
MTFPHDFCMPRL
MTFPHDLCMPRL
MMYPHDFCMPRL
Studying amino acid diversity in proteins
MVYPEPWCMPRM
VVYPEPWCMPRM
TVYPEPWCMPRM
MTYPEPWCMPRM
MVYPYPWCMPRM
MVYPEDWCMPRM
MVYPEPYCMPRM
MVYPEPLCMPRM
MVYPEPFCMPRM
MVFPEPWCMPRM
MVYPHPWCMPRM
MVYKEPWCMPRM
MVYPEPWQMPRM
MVYPEPWHMPRM
MVYPEPWCIPRM
MVYPEPWCMPGM
MVYPEPWCMPRL
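A minimal Python sketch of how per-site amino acid diversity could be tallied from the 20-sequence example alignment shown above. It assumes that "usage" means the number of distinct amino acids observed at a site, averaged over sites; the name `amino_acid_usage` is illustrative and does not come from the talk.

```python
# Example alignment from the slide: 20 sequences, 12 aligned sites.
ALIGNMENT = [
    "MVYPEPWCMPRM", "VVYPEPWCMPRL", "MVYPEPWHMPRL", "MTFPEDYCMPRL",
    "TTFPHDWCMPRL", "TTFPEDWCMPRL", "MVYPEPWCMPRL", "MVYPEPWCMPGL",
    "MVYPEPYCMPRL", "MVYKERWHMPRL", "MVYKEPWHMPRL", "MVFPEDWCIPRL",
    "MTFPEDWCIPRL", "MTFPEDWCMPRL", "MTFPYDWCMPRL", "MTFPHDWQMPRL",
    "MTYPHDLCMPRL", "MTFPHDFCMPRL", "MTFPHDLCMPRL", "MMYPHDFCMPRL",
]


def amino_acid_usage(seqs):
    """Return the number of distinct amino acids at each aligned site.

    Assumption: 'usage' is the count of distinct states per column.
    """
    return [len(set(column)) for column in zip(*seqs)]


per_site = amino_acid_usage(ALIGNMENT)
average_usage = sum(per_site) / len(per_site)
print(per_site)
print(average_usage)
```

For this toy alignment the most variable site is the seventh (W, Y, L, F), and the tenth site (P) is invariant.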
Gene                Number of sequences (species)   Average amino acid usage
AVERAGE             3538                            9.5
ATP6                3021                            10.0
ATP8                1244                            11.2
COX1                4450                            7.4
COX2                4204                            10.6
COX3                2191                            9.4
CYTB                7954                            12.0
ND1                 2056                            10.0
ND2                 5963                            11.2
ND3                 2852                            10.5
ND4                 2041                            10.2
ND4L                1785                            11.5
ND5                 949                             8.9
ND6                 1015                            10.8
Elongation factor   1743                            3.7
Histone 3           1228                            5.2
RuBisCO             13912                           9.2
Amino acid usage predicts dn/ds
An average site in a protein can accept ~8 amino acid states. Without epistasis, the expected dn/ds ratio of an average protein should be (u-1)/19, where u is the expected amino acid usage: 7/19 ≈ 0.37
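The arithmetic behind this expectation can be sketched in a few lines of Python. One reading of the formula is that, of the 19 possible alternative amino acids at a site, u - 1 are acceptable, so that fraction of nonsynonymous changes behaves as neutral. The function name `expected_dnds` is illustrative and not from the talk.

```python
def expected_dnds(usage, n_amino_acids=20):
    """Non-epistatic expectation for dn/ds given amino acid usage u.

    Assumption: each site tolerates `usage` of the 20 amino acids
    equally well, so (u - 1) of the 19 possible replacements are neutral.
    """
    return (usage - 1) / (n_amino_acids - 1)


print(round(expected_dnds(8), 3))
```

With u = 8 this gives 7/19, about 0.368.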
Short-term evolution rate
AAT CTC AAG CAT GGA
N L K H G
AGT CTA AAA TAT GGG
S L K Y G
Kn = Number of nonsynonymous substitutions/Number of nonsynonymous sites
Ks = Number of synonymous substitutions/Number of synonymous sites
Kn/Ks = (2/35) / (3/10) = 0.19
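The counts in this example can be checked with a short Python sketch. It assumes the standard genetic code, treats each differing codon as a single substitution, and counts "sites" as the possible single-nucleotide changes of the second sequence (which happens to give 35 nonsynonymous and 10 synonymous); whether the slide counted sites exactly this way is an assumption. The function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_differences(seq_a, seq_b):
    """Return (nonsynonymous, synonymous) counts of differing codons."""
    nonsyn = syn = 0
    for ca, cb in zip(seq_a.split(), seq_b.split()):
        if ca == cb:
            continue
        if CODE[ca] == CODE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return nonsyn, syn


def count_possible_changes(seq):
    """Return (nonsynonymous, synonymous) single-nucleotide changes of seq."""
    nonsyn = syn = 0
    for codon in seq.split():
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                syn += 1
            else:
                nonsyn += 1
    return nonsyn, syn


SEQ1 = "AAT CTC AAG CAT GGA"
SEQ2 = "AGT CTA AAA TAT GGG"
dn_count, ds_count = count_differences(SEQ1, SEQ2)
n_sites, s_sites = count_possible_changes(SEQ2)
kn_ks = (dn_count / n_sites) / (ds_count / s_sites)
print(dn_count, ds_count, n_sites, s_sites, round(kn_ks, 2))
```

Real analyses correct for multiple hits and for unequal transition and transversion rates; this sketch only reproduces the slide's simple counting.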
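GGG and its nine single-nucleotide neighbors: AGG, CGG, TGG (first position); GAG, GCG, GTG (second position); GGA, GGC, GGT (third position)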
[Figure: distribution of pairwise dn/ds (x-axis) against the number of pairwise comparisons (y-axis)]
Average dn/ds = 0.03
Fraction of clade-specific evolution
Gene      Corrected usage   Expected dn/ds   Observed dn/ds   Fraction of non-epistatic evolution
AVERAGE   8.387             0.389            0.059            0.15
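The AVERAGE row is consistent with taking the fraction of non-epistatic evolution as observed dn/ds divided by the non-epistatic expectation (u - 1)/19. The Python sketch below checks that arithmetic; it is a reading of the table rather than the authors' stated procedure, and the function name is illustrative.

```python
def non_epistatic_fraction(observed_dnds, usage):
    """Observed dn/ds as a fraction of the (u - 1) / 19 expectation.

    Assumption: the table's last column is observed / expected.
    """
    expected = (usage - 1) / 19
    return observed_dnds / expected


fraction = non_epistatic_fraction(0.059, 8.387)
print(round(fraction, 2))      # non-epistatic share
print(round(1 - fraction, 2))  # remaining share, attributed to epistasis
```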
[Figure: histogram of the number of genes (y-axis, 0 to 12) by fraction of epistatic evolution (x-axis bins 0.6-0.7, 0.7-0.8, 0.8-0.9, 0.9-1)]
MVYPEDWCMPRM
VVYPEDWCMPRL
MVYPEDWHMPRL
MVYPEDYCMPRL
MVYPHDWCMPRL
MVYPEDWCMPRL
MVYPEDWCMPRL
MVYPEDWCMPGL
MVYPEDYCMPRL
MVYPEDWCMPRL
MVYKEPWCMPRL
MVYPEDWCIPRL
MVYPEDWCIPRL
MVYPEDWCMPRL
MVYPYDWCMPRL
MVFPEDWQMPRL
MVYPEDWCMPRL
MTYPEDWCMPRL
MTYPEDWCMPRL
MMYPEDWCMPRL
MVYPEPWCMPRM
VVYPEPWCMPRL
MVYPEPWHMPRL
MTFPEDYCMPRL
TTFPHDWCMPRL
TTFPEDWCMPRL
MVYPEPWCMPRL
MVYPEPWCMPGL
MVYPEPYCMPRL
MVYKERWHMPRL
MVYKEPWHMPRL
MVFPEDWCIPRL
MTFPEDWCIPRL
MTFPEDWCMPRL
MTFPYDWCMPRL
MTFPHDWQMPRL
MTYPHDLCMPRL
MTFPHDFCMPRL
MTFPHDLCMPRL
MMYPHDFCMPRL
Expected protein divergence: f_{i,j} is the frequency of amino acid i at site j; L is the protein length
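The formula itself did not survive on this slide. As a sketch only, one natural quantity built from these symbols is the probability that two sequences, drawn independently from the per-site frequencies, differ at a site: 1 - (1/L) Σ_j Σ_i f_{i,j}^2. The Python below implements that assumed form; it should not be read as the talk's exact definition, and `expected_divergence` is an illustrative name.

```python
from collections import Counter


def expected_divergence(seqs):
    """Expected fraction of differing sites between two random sequences.

    Assumption: sites are independent and each sequence draws amino acid
    i at site j with frequency f[i][j] taken from the alignment.
    """
    length = len(seqs[0])
    identity = 0.0
    for column in zip(*seqs):
        n = len(column)
        identity += sum((c / n) ** 2 for c in Counter(column).values())
    return 1 - identity / length


print(expected_divergence(["AA", "AB"]))
```

In the two-sequence toy case the first site never differs and the second differs half the time, giving 0.25.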
hemoglobin subunit beta [Macaca mulatta]
Sequence ID: ref|NP_001157900.1| Length: 147 Number of Matches: 1
Score Expect Method Identities Positives Gaps
288 bits(736) 3e-97 139/147(95%) 143/147(97%) 0/147(0%)
Query 1 MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK 60
MVHLTPEEK+AVT LWGKVNVDEVGGEALGRLLVVYPWTQRFF+SFGDLS+PDAVMGNPK
Sbjct 1 MVHLTPEEKTAVTTLWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSPDAVMGNPK 60
Query 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120
VKAHGKKVLGAFSDGL HLDNLKGTFA LSELHCDKLHVDPENF+LLGNVLVCVLAHHFG
Sbjct 61 VKAHGKKVLGAFSDGLNHLDNLKGTFAQLSELHCDKLHVDPENFKLLGNVLVCVLAHHFG 120
Query 121 KEFTPPVQAAYQKVVAGVANALAHKYH 147
KEFTP VQAAYQKVVAGVANALAHKYH
Sbjct 121 KEFTPQVQAAYQKVVAGVANALAHKYH 147
beta-globin [Mus musculus]
Score Expect Method Identities Positives Gaps
164 bits(414) 2e-48 118/147(80%) 131/147(89%) 0/147(0%)
Query 1 MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK 60
MVHLT EKSAV+ LW KVN DEVGGEALGRLLVVYPWTQR+F+SFGDLS+ A+MGNPK
Sbjct 1 MVHLTDAEKSAVSCLWAKVNPDEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPK 60
Query 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120
VKAHGKKV+ AF++GL +LDNLKGTFA+LSELHCDKLHVDPENFRLLGN +V VL HH G
Sbjct 61 VKAHGKKVITAFNEGLKNLDNLKGTFASLSELHCDKLHVDPENFRLLGNAIVTVLGHHLG 120
Query 121 KEFTPPVQAAYQKVVAGVANALAHKYH 147
K+FTP QAA+QKVVAGVA ALAHKYH
Sbjct 121 KDFTPAAQAAFQKVVAGVATALAHKYH 147
beta-globin epsilon-m [Didelphis virginiana]
Score Expect Method Identities Positives Gaps
168 bits(425) 4e-50 108/147(73%) 132/147(89%) 0/147(0%)
Query 1 MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK 60
MVH TPE+K+ +T++W KV+V++VGGE+L RLLVVYPWTQRFF+SFG+LS+ AVMGNPK
Sbjct 1 MVHFTPEDKTNITSVWTKVDVEDVGGESLARLLVVYPWTQRFFDSFGNLSSASAVMGNPK 60
Query 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120
VKAHGKKVL +F +G+ ++DNLKGTFA LSELHCDKLHVDPENFRLLGNVL+ VLA FG
Sbjct 61 VKAHGKKVLTSFGEGVKNMDNLKGTFAKLSELHCDKLHVDPENFRLLGNVLIIVLASRFG 120
Query 121 KEFTPPVQAAYQKVVAGVANALAHKYH 147
KEFTP VQA++QK+V+GV++AL HKYH
Sbjct 121 KEFTPEVQASWQKLVSGVSSALGHKYH 147
hemoglobin subunit rho [Gallus gallus]
Score Expect Method Identities Positives Gaps
165 bits(418) 4e-49 97/147(66%) 125/147(85%) 0/147(0%)
Query 1 MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK 60
MVH + EEK +T++W KVNV+E G EAL RLL+VYPWTQRFF++FG+LS+P A++GNPK
Sbjct 1 MVHWSAEEKQLITSVWSKVNVEECGAEALARLLIVYPWTQRFFDNFGNLSSPTAIIGNPK 60
Query 61 VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG 120
V+AHGKKVL +F + + +LDN+K T+A LSELHC+KLHVDPENFRLLGN+L+ VLA HF
Sbjct 61 VRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHVDPENFRLLGNILIIVLAAHFT 120
Query 121 KEFTPPVQAAYQKVVAGVANALAHKYH 147
K+FTP QA +QK+V+ VA+ALA+KYH
Sbjct 121 KDFTPTCQAVWQKLVSVVAHALAYKYH 147
Sequence divergence beyond accumulation of deleterious alleles
[Figure: histogram of the number of gene families (y-axis, 0 to 500) by percent of pairwise sequence comparisons beyond the theoretical divergence (x-axis, 0 to 100)]
Expected protein divergence: f_{i,j} is the frequency of amino acid i at site j; L is the protein length
1773 gene families
Small dn/ds but large sequence divergence -> epistasis (or frequent adaptation)
Wright’s Fitness Landscape
Wright, S 1931
"Functional proteins must form a continuous network which can be traversed by unit mutational steps without passing through nonfunctional intermediates"
Evolution of sequences
WORD <-> WORE <-> GORE <-> GONE <-> GENE
- John Maynard Smith, Nature, 1970
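The word chain illustrates a path through sequence space in which every step is a single substitution. A small Python sketch can verify the unit-step property; it does not check that each intermediate is a real word, which is the "functional" part of the analogy. The helper names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_unit_step_path(words):
    """True if every consecutive pair differs at exactly one position."""
    return all(hamming(a, b) == 1 for a, b in zip(words, words[1:]))


PATH = ["WORD", "WORE", "GORE", "GONE", "GENE"]
print(is_unit_step_path(PATH))
print(hamming(PATH[0], PATH[-1]))
```

WORD and GENE differ at all four positions, yet they are connected by four single-letter steps.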
Total sequence space
Non-epistatic fitness landscape
Epistatic fitness landscape
Microevolution
Accumulation of many allele replacements
Macroevolution
MDGHTSKLRG
MD HT K RG
MDSHTVKFRG
causation
Complicated, stochastic, dynamic world with a detailed underlying theory
Observation-based insights with almost no theory, which is necessarily based in the microevolutionary world
confirmation of theory
molecular biology
Our Institute