www.sciencemag.org/cgi/content/full/325/5947/1512/DC1

Supporting Online Material for

Macroevolution of Complex Retroviruses

Aris Katzourakis,* Robert J. Gifford,* Michael Tristem, M. Thomas P. Gilbert, Oliver G. Pybus

*To whom correspondence should be addressed. E-mail: [email protected] (A.K.); [email protected] (R.J.G.)

Published 18 September 2009, Science 325, 1512 (2009). DOI: 10.1126/science.1174149

This PDF file includes: Materials and Methods, Figs. S1 to S4, Table S1, References

Supporting Online Material for:

Macroevolution of complex retroviruses

Aris Katzourakis1,2*¶, Robert J. Gifford1*¶, Michael Tristem3, M. Thomas P. Gilbert4, Oliver G. Pybus1

1Zoology Department and 2Institute for Emergent Infections, University of Oxford, UK; 3Division of Biology, Imperial College London, UK; 4Natural History Museum of Denmark, Copenhagen University, Denmark.

¶These authors contributed equally to this work. *Email: [email protected], [email protected].

This PDF file includes:

Materials and Methods

Supplementary figures 1, 2, 3 and 4

Supplementary table 1

Supplementary references

Materials and Methods

Screening & Consensus Genome Construction

Vertebrate genomes were screened for endogenous foamy viruses (FVs) using tBLASTn with FV reverse transcriptase (RT) query sequences derived from EU010385, Y07725, U04327, AJ544579, X54482, M74895, U94514, AF201902 and Y08851. Sequences highly similar to FV proteins were discovered within the Choloepus hoffmanni whole genome shotgun (WGS) assembly and aligned with the FV reference sequences listed above, generating a ~11.5 kb SloEFV consensus genome that exhibited (i) long LTRs (~1150 bp), (ii) a primer binding site that utilizes a lysine-1,2 tRNA, (iii) gag and pol genes translated from separately expressed mRNAs and (iv) glycine-arginine boxes in the N-terminus of Gag. Paralogous SloEFV elements were identified via shared host sequences flanking the 3’ LTRs of non-identical insertions. We used BLAST to ascertain the number of SloEFV elements with and without viral coding regions (the latter are “solo LTRs”). Reciprocal comparison of the flanking regions of solo LTRs determined whether they represent distinct insertions or paralogs generated by host genome duplication events.
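The screening step can be illustrated with a small post-processing sketch. It assumes that the tBLASTn hits were exported in the BLAST+ tabular format (`-outfmt 6`, twelve standard columns); the source does not state the output format or any E-value cutoff, so the `max_evalue` threshold, the function name and the example rows below are all hypothetical.

```python
"""Group tabular tBLASTn hits by subject contig (illustrative sketch only)."""
import csv
import io
from collections import defaultdict

# Standard column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_by_subject(tabular_text, max_evalue=1e-5):
    """Return {contig: [(start, end, bitscore), ...]} for hits passing the cutoff.

    max_evalue is an assumed threshold, not a value taken from the paper.
    """
    hits = defaultdict(list)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        # sstart may exceed send for reverse-strand hits, so normalise the order.
        start, end = sorted((int(rec["sstart"]), int(rec["send"])))
        hits[rec["sseqid"]].append((start, end, float(rec["bitscore"])))
    return dict(hits)


# Made-up rows with placeholder contig names, used only to exercise the parser.
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ("FV_RT_query", "contigA", "62.5", "160", "60", "0", "1", "160", "5001", "5480", "1e-50", "180"),
    ("FV_RT_query", "contigA", "58.0", "150", "63", "0", "5", "154", "9480", "9031", "2e-41", "150"),
    ("FV_RT_query", "contigB", "60.0", "156", "62", "0", "1", "156", "301", "768", "3e-45", "165"),
    ("FV_RT_query", "contigC", "31.0", "90", "62", "2", "20", "109", "120", "389", "0.5", "30"),
])

if __name__ == "__main__":
    for contig, contig_hits in hits_by_subject(EXAMPLE).items():
        print(contig, contig_hits)
```

Contigs that retain hits after filtering would be the candidates carried forward for alignment against the FV reference sequences.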

PCR & Cloning of SloEFV

DNA was extracted from tissue samples of Choloepus didactylus, Bradypus pygmaeus, B. variegatus, Myrmecophaga tridactyla and Dasypus novemcinctus using the Qiagen DNeasy extraction kit. Extracts were screened for SloEFV using the following pol gene PCR primers:

Forward: SlothF1 5’-AAAGTAGTGGAAAGGTGGAAAGG-3’
Reverse: SlothR1 5’-CTCTCCTGGACGAATTGGCC-3’
Reverse: SlothR2 5’-CTCTCCTGGATGAATTGGCC-3’
Reverse: SlothR3 5’-CTCTCCTGGACCAATTGGCC-3’
Reverse: SlothR4 5’-TCTGTACCCTCTCCTGGACG-3’
Reverse: SlothR5 5’-TCTGTACCCTCTCCTGGATG-3’

The primers amplify a fragment of approximately 340 bp. They were designed to bind to the most conserved regions of the aligned sequences, and the forward primer was paired with five slightly varying reverse primers in order to sample as much of the SloEFV variation present in the samples as possible.
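As a quick consistency check on the primer set, the following Python sketch reverse-complements the reverse primers and reports where they differ from one another. The primer sequences are copied from the text; the helper names (`reverse_complement`, `mismatch_positions`) are illustrative and not part of the original protocol.

```python
"""Compare the SloEFV pol primers listed in the Methods (illustrative sketch)."""
from typing import List

# Primer sequences copied from the text (5'->3').
PRIMERS = {
    "SlothF1": "AAAGTAGTGGAAAGGTGGAAAGG",
    "SlothR1": "CTCTCCTGGACGAATTGGCC",
    "SlothR2": "CTCTCCTGGATGAATTGGCC",
    "SlothR3": "CTCTCCTGGACCAATTGGCC",
    "SlothR4": "TCTGTACCCTCTCCTGGACG",
    "SlothR5": "TCTGTACCCTCTCCTGGATG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatch_positions(a: str, b: str) -> List[int]:
    """0-based positions at which two equal-length primers differ."""
    if len(a) != len(b):
        raise ValueError("primers must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for pair in (("SlothR1", "SlothR2"), ("SlothR1", "SlothR3"), ("SlothR4", "SlothR5")):
        a, b = (PRIMERS[name] for name in pair)
        print(pair, "differ at", mismatch_positions(a, b))
    for name in ("SlothR2", "SlothR5"):
        print(name, "binds the sense strand at", reverse_complement(PRIMERS[name]))
```

SlothR1 to SlothR3 differ from each other at single positions, as do SlothR4 and SlothR5. SlothF1 and the reverse complements of SlothR2 and SlothR5 appear to match the consensus in Supplementary Figure 4 exactly (rows 5681 to 6080), which is consistent with a product of roughly 340 bp.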

PCR was performed in 25 µl reactions containing 0.1 µl AmpliTaq Gold, 0.1 mM mixed dNTPs, 400 nM of primer, 1x PCR buffer and 2.5 mM MgCl2. Reactions were cycled under the following conditions: enzyme activation at 95°C for 5 minutes; 40 cycles of 95°C for 15 seconds, 56°C for 15 seconds and 72°C for 30 seconds; final elongation at 72°C for 7 minutes. Four of the five primer combinations amplified from all sloth species; none amplified from the anteater or the armadillo. BLAST screening of the armadillo WGS assembly with FV RT proteins confirmed the absence of endogenous FV elements in D. novemcinctus. Amplicons from the 3 sloth species were cloned using the Invitrogen TOPO TA kit; insert-containing colonies were reamplified using vector-specific primers and up to 16 clones were sequenced at Macrogen (Seoul).
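The reaction recipe and cycling programme translate into simple arithmetic, sketched below. The concentrations, volume and step times are taken from the text; the function names are illustrative, and the run time ignores thermal ramping between steps, which depends on the instrument.

```python
"""Per-reaction amounts and nominal run time for the PCR described above."""

REACTION_VOLUME_UL = 25.0


def amount_pmol(concentration_nm: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Amount of a component in picomoles.

    nM x microlitres gives femtomoles; dividing by 1000 converts to picomoles.
    """
    return concentration_nm * volume_ul / 1000.0


def cycling_time_seconds(cycles: int = 40) -> int:
    """Sum of the programmed hold times (ramping between steps is not included)."""
    activation = 5 * 60          # 95 C for 5 minutes
    per_cycle = 15 + 15 + 30     # 95 C, 56 C and 72 C steps
    final_elongation = 7 * 60    # 72 C for 7 minutes
    return activation + cycles * per_cycle + final_elongation


if __name__ == "__main__":
    print("primer per reaction (pmol):", amount_pmol(400))
    print("dNTPs per reaction (pmol):", amount_pmol(0.1e6))   # 0.1 mM = 100,000 nM
    print("MgCl2 per reaction (pmol):", amount_pmol(2.5e6))   # 2.5 mM = 2,500,000 nM
    print("programmed time (min):", cycling_time_seconds() / 60)
```

Under these figures each 25 µl reaction holds 10 pmol of primer at 400 nM, and the programmed hold times add up to 52 minutes.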

Phylogenetic Analysis

Maximum likelihood (ML) and Bayesian phylogenies estimated from a 156 amino acid alignment of the conserved RT domain established that SloEFV groups with the extant FVs with strong statistical support (using RAxML and MrBayes; Supp. Fig. 1) (S1). The main text phylogeny was estimated from concatenated alignments of FV Gag, Pol and Env proteins (172, 1075 and 714 amino acids, respectively) that included 10 FV reference sequences. Best-fitting models for each gene were Gag: rtREV+Γ+F, Pol: rtREV+Γ+F, Env: WAG+Γ+F. Phylogenetic analysis was performed using gene-specific model partitions in RAxML and MrBayes (S2, S3). We also constructed a pol gene nucleotide alignment (~340 bp) containing the PCR-derived SloEFV amplicons from B. variegatus, B. pygmaeus and C. didactylus, plus SloEFV sequences from the C. hoffmanni WGS assembly, from which we estimated an ML phylogeny under the GTR+Γ substitution model using RAxML (Supp. Fig. 1).
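Gene-specific partitions require the column range of each protein within the concatenated alignment. The sketch below derives those ranges from the alignment lengths given in the text. It assumes the proteins were concatenated in the order Gag, Pol, Env (the order in which they are listed), and it prints a generic description rather than the partition-file syntax of any particular program.

```python
"""Column ranges of gene partitions in a concatenated protein alignment."""
from typing import Iterable, List, Tuple

# (gene, alignment length in amino acids, best-fitting model) from the text.
# The concatenation order is an assumption.
GENES = [
    ("Gag", 172, "rtREV+G+F"),
    ("Pol", 1075, "rtREV+G+F"),
    ("Env", 714, "WAG+G+F"),
]


def partition_ranges(genes: Iterable[Tuple[str, int, str]]) -> List[Tuple[str, int, int, str]]:
    """Return (gene, first column, last column, model) using 1-based inclusive columns."""
    ranges = []
    start = 1
    for name, length, model in genes:
        end = start + length - 1
        ranges.append((name, start, end, model))
        start = end + 1
    return ranges


if __name__ == "__main__":
    for name, start, end, model in partition_ranges(GENES):
        print(f"{name}: columns {start}-{end}, model {model}")
```

With the stated lengths, the concatenated alignment spans 1961 columns in total.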

We estimated a neutral evolutionary rate for xenarthran nuclear genes using a previously published alignment of three nuclear genes (S4). This alignment included the ADRA2B, VWF and BRCA1 genes (totalling 5112 bp) from 13 xenarthran species. Synonymous substitution rates were estimated using a codon model-based extension (S5) of the molecular clock approach in BEAST (S6). A normal prior (mean = 65 Mya, st. dev. = 3) was used to calibrate the phylogeny root (S7) and trees were sampled under the GTR+Γ model and an uncorrelated lognormal relaxed clock. Branch lengths within the posterior set of trees were then partitioned into synonymous and nonsynonymous components using HyPhy (S8). We estimated a neutral substitution rate for xenarthran nuclear genes of 1.83 × 10⁻⁹ substitutions/site/year (95% highest posterior density interval, HPD: 1.6-2.1 × 10⁻⁹). This rate is amongst the lowest obtained for a mammal (typically ~2.2 × 10⁻⁹) but is consistent with sloths’ low metabolic rate, a hypothesized correlate of substitution rate (S9).

The resulting synonymous substitution rate was applied to a strict clock ML phylogeny of 8 paralogous SloEFV elements (see main text), estimated in PAUP using the best-fitting GTR+Γ model (S10). A likelihood ratio test failed to reject the strict clock hypothesis (p>0.05). We conclude that the earliest duplication occurred ~39 Mya (95% HPDs: 34-45 Mya). This is a conservative, minimum estimate of the origin of SloEFV, since the earliest duplication within the host genome that we examined could have occurred substantially later than the SloEFV germline invasion.
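The two calculations in this paragraph, the clock test and the conversion of a clock-tree node height into a date, can be sketched as follows. This is not the authors' procedure: the log-likelihoods are hypothetical, the degrees of freedom (number of taxa minus two) are the usual assumption for comparing a rooted clock tree with an unrooted non-clock tree, and the node height of 0.0714 substitutions/site is not reported in the source but back-calculated here from the reported rate and date.

```python
"""Strict-clock likelihood ratio test and rate-based dating (illustrative sketch)."""
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def clock_lrt(lnl_clock: float, lnl_free: float, n_taxa: int):
    """Return (statistic, df, p-value) for clock versus non-clock trees."""
    statistic = max(0.0, 2.0 * (lnl_free - lnl_clock))
    df = n_taxa - 2  # assumed difference in free branch-length parameters
    return statistic, df, chi2_sf_even_df(statistic, df)


def node_age_my(height_subs_per_site: float, rate_per_site_per_year: float) -> float:
    """Age of a node in millions of years, given its height on a strict clock tree."""
    return height_subs_per_site / rate_per_site_per_year / 1e6


if __name__ == "__main__":
    # Hypothetical log-likelihoods for the 8 paralogous elements.
    print(clock_lrt(lnl_clock=-2510.4, lnl_free=-2506.1, n_taxa=8))
    height = 0.0714  # back-calculated assumption, see lead-in
    for label, rate in (("mean", 1.83e-9), ("upper", 2.1e-9), ("lower", 1.6e-9)):
        print(label, "rate ->", round(node_age_my(height, rate), 1), "My")
```

Dividing the same node height by the upper and lower bounds of the rate gives ages close to 34 and 45 My, which is consistent with the reported interval reflecting uncertainty in the substitution rate.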

Supplementary Figure 1

(a) Phylogeny including SloEFV, exogenous foamy viruses (FV) and class III retroviral elements, estimated from reverse transcriptase protein sequences. HFV=human FV, BFV=bovine FV, EFV=equine FV, FFV=feline FV, SFV=simian FV (cpz=chimpanzee, spm=spider monkey, agm=African green monkey, mac=macaque, orang=orangutan); ERV=endogenous retrovirus, HERV=human ERV, MuERV=murine ERV; SnRV=snakehead retrovirus; SpEV=Sphenodon ERV. Other retroviruses (RV) and ERVs are labeled according to their host species. Measures of statistical support for phylogenetic groups are denoted using asterisks: * = Bayesian posterior probability >0.9 and ** = maximum likelihood bootstrap score >70% and Bayesian posterior probability >0.9. Maximum statistical support was obtained for the grouping of SloEFV with the foamy viruses.

(b) Maximum likelihood phylogeny of SloEFV elements recovered from various sloth species. The phylogeny includes both C. hoffmanni genome contigs and cloned sequences obtained from C. didactylus, B. variegatus and B. pygmaeus. Non-sloth foamy virus sequences were used as an outgroup. See (a) for sequence names and for explanation of clade support values.

Supplementary Figure 2

A plot of the significant correlation (R² = 0.74, p<0.0001) between foamy virus divergence and mammalian host divergence times. The correlation is robust to the exclusion of the sloth branch (square symbol; R² = 0.66, p<0.0004), whose long-term divergence rate may not be directly comparable with other foamy viruses. Furthermore, the correlation is also robust to the exclusion of the branches corresponding to the Old World primates (triangle symbols; R² = 0.71, p=0.002), a group within which codivergence has previously been shown to occur (S11). The discovery of SloEFV makes possible the inclusion of a far broader range of mammalian taxa in cophylogenetic analysis. In addition to SloEFV, our analysis also includes FFV, BFV and EFV, thus extending the codivergence model to five mammalian orders that share a common ancestor during the early diversification of mammals.

Supplementary Figure 3

Schematic of the consensus SloEFV genome structure. The large sizes of the SloEFV genome (11,435 bp) and LTR regions (~1150 bp) are typical of foamy viruses (S12). The following additional characteristic foamy virus features of the SloEFV genome are indicated: (i) a primer binding site that utilizes a lysine-1,2 tRNA; (ii) a characteristic and highly conserved ‘GGGTG’ motif at the PPT/U3 border (S13); (iii) gag and pol ORFs that are translated from separately expressed mRNAs; (iv) glycine-arginine (GR) ‘boxes’ in the N-terminal region of the Gag protein (and no cysteine-histidine box) (S12). Putative gene boundaries within open reading frames were determined by comparison with their known locations in other foamy viruses. LTR: long terminal repeat; PR: protease; RT: reverse transcriptase; IN: integrase; SP: signal peptide; SU: surface glycoprotein; TM: transmembrane.

Supplementary figure 4. SloEFV consensus genome sequence

1 TATCACAGGATAATATCTCTACTAAGAAATCTGATAATGAGAATGCTGATTCCTTCTATCCATGCTGCTCTCATATACCA 80

81 CCTGATGTTTTCTGTCCTGATAGTAAAGAACCAGTTATTACTCATGATATAGAAGCATATTTAACAGAACTGTTACCTAA 160

161 AAACACTCAAGTACAATTTCCAAATGACAAAAAAGTGGAAAGGCCTTCCTACATTAACTGGGACAGATGATTCATATCCT 240

241 TCAAGTCCCATATTAGAACTTGAAGGAGAAGCTAAACAGTTAAAAAAACAGTGAATATTTTCAAGAAAGCAGAAAGGCCC 320

321 TCTCAACCTTCTTACATATAGATAACATGTAATCACCTAGTCCTCTGTGAAAACAGGATGGAGCTGCAGCTGTTACTTTA 400

401 ACCATAAATAAGTCATAAATAATGTAGCAAGGTTAAGTACAGTCCTAATCAAATATATGTTTATCAATATGATGAAATGG 480

481 CTAAATAATCATAGAATGAAATGAAAAAACTATATTAATAAGTTAGAAACAAGTAACAGTAATCTCTTTTCCAATTTTTA 560

561 GTCTTTATGCAAGTAAAAGTATATTATATTGCTCTTACAGAAATCATAGCTTAATTAGTAAAAATATTATCAGTAGAGTT 640

641 TTATAGGCATTCAAGGCTAAGTCTCTGATAATGTGGATACCCTCCTTTGATCTTGACACTCATGGATCAAGATGCTTCAA 720

721 ATGAGATATTCATGTTCATGAAGCATCATTTAATGTAAAAATAGAATATAACTTAAAGTTACTCTAAATCAAGATGCTAT 800

801 AAACTGTAACTCAAACAAAAAGGAGCTCTCTCTAGTCTCAAAACCAGCTGTGTCTGGAGGAGTGGGGGCTCCCTCTCTTG 880

881 GTAATGTATCAATTAATTTTTACATTGTAAACTTGTTTCTTTTAGCTTAAGTGCTCCTTTAGTAAAAATGTGTTAAAGGT 960

961 GAATTCTTTTCTTAAGTGCTCTTTACTTTTAAACTATTGCCTGTCCCTAATACACTTAATTGAGTGGGAGGTAACTTGAT 1040

1041 GAAGCCCAAACCTATTAAATCAGGACCTGATTAGGCTCAGGCCTTTTCTAATGGAGATCATAGGCATGAGGTGTAAAGAA 1120

1121 ATGAACCCCTGACAATTGGCACCCAATGTGGGGCTCAAGACAAGAATAATTTTGATTTTGTACCTTGGTTAATTTTCCCT 1200

M A Q P Q N L D V A G L Q V L I G L

1201 AGGGACTTTAGGCACTGTCTAAAGTAATGGCTCAACCTCAAAATTTAGATGTTGCAGGATTACAAGTGTTAATAGGGCTG 1280

S G A R T P G H Q D I L T V R V D A G P W G V G S R F

1281 TCAGGAGCTAGGACTCCAGGGCATCAGGATATTCTTACTGTAAGAGTTGATGCAGGGCCATGGGGAGTAGGCAGCAGGTT 1360

A R I Q I D L Q D G N R Q P L P Q P T Y T P A P G P

1361 TGCTAGGATTCAAATTGATCTTCAGGATGGAAACAGACAGCCATTACCACAACCAACCTATACACCTGCACCTGGACCTG 1440

V D L T T D I L L N V S Y A Q L M A K L G D V P F L T

1441 TTGATTTAACTACAGATATTTTACTGAATGTTTCTTATGCACAATTAATGGCCAAACTAGGGGATGTACCATTTTTAACT 1520

G V Y R H G P L Y T G H W L P G D D L S H H F L P I T

1521 GGAGTATATAGACATGGACCATTATATACTGGTCATTGGCTGCCAGGGGATGATTTAAGCCATCATTTCCTACCCATAAC 1600

M A E L A L L N N Q Q I Q E E V A L L R E I I C R F

1601 AATGGCTGAATTAGCACTATTAAATAACCAGCAGATACAAGAAGAAGTAGCACTTCTGAGAGAAATAATATGTAGATTTC 1680

Q Q W Q A M G A P Q P Q N P P L Q P Q V I I Q P Q P G 1681 AACAGTGGCAAGCTATGGGGGCACCTCAGCCACAAAATCCTCCCCTTCAGCCACAGGTTATTATACAACCTCAACCTGGG 1760

PBS

Gag start

A L A L P M N H L R A V V G A T P N D P Q A I A L W L

1761 GCATTAGCCTTGCCTATGAATCATCTTCGAGCTGTTGTAGGAGCTACACCTAATGACCCCCAGGCCATAGCATTGTGGCT 1840

G R N V Q A I E G V M P I N N G P M R K Q V V N A L

1841 TGGACGAAATGTACAGGCTATAGAAGGTGTAATGCCTATTAACAATGGACCTATGCGAAAACAGGTTGTTAATGCTTTAT 1920

L A S H A T L H V T D Q E A Q D W N S T I A A I Y Q R

1921 TAGCTTCACATGCCACATTGCATGTCACAGACCAAGAAGCACAGGATTGGAATAGCACCATAGCTGCTATTTATCAAAGA 2000

A H G T L A L H H L P T V L K D I A N S D G V I V A F

2001 GCTCATGGAACTTTGGCATTACATCATTTGCCCACTGTGTTAAAGGATATTGCTAATTCTGATGGTGTTATAGTGGCTTT 2080

T M G M M F S N D D Y A L V S G I I R P L L P G Q A

2081 TACTATGGGGATGATGTTCTCTAATGATGACTATGCCCTAGTTAGTGGCATAATTCGACCACTGCTTCCAGGGCAAGCAG 2160

A V V A V Q A Q L D I L P D D N A K A S A F P E I V T

2161 CAGTAGTTGCTGTTCAAGCCCAATTAGATATACTTCCAGATGATAATGCTAAAGCAAGTGCTTTTCCTGAAATTGTCACT 2240

E V Y Q T L G L N I L G Q P M Q H P P G S Q Q C S N Q

2241 GAAGTATATCAAACCCTTGGACTTAACATTCTTGGACAACCTATGCAACATCCACCAGGGTCTCAACAATGTTCTAATCA 2320

Q G G N G R R Q Q Q T G N Q N K G G F R S S H P S Q

2321 ACAGGGAGGCAATGGAAGAAGACAACAACAAACAGGGAATCAAAATAAAGGAGGCTTCAGATCCTCTCATCCATCCCAAC 2400

H P P Q S P Q A P P I P R N R G N F G G N R G K Q D Q

2401 ATCCTCCACAATCTCCACAAGCACCCCCTATTCCTAGAAATAGAGGGAACTTTGGAGGTAATAGAGGAAAACAGGACCAA 2480

E G S Q G Q N H Q Q P Q N H G E D P P R Q L R H Y N L

2481 GAAGGATCACAAGGCCAAAACCACCAACAACCTCAGAATCATGGAGAAGATCCTCCCAGACAACTGAGACACTATAATCT 2560

R P N V N H P R W F N K P E G S S P N P Y R Q R E H

2561 GAGACCCAATGTAAATCATCCTCGATGGTTTAATAAACCAGAGGGAAGCTCTCCAAATCCCTATAGACAACGAGAACATC 2640

Q E Q Q Q Q D Q S K D R G V Q S N S Q P Q R S G G N N

2641 AAGAACAACAACAACAAGACCAGTCAAAAGACAGAGGAGTTCAAAGTAATTCACAGCCCCAAAGGTCAGGAGGAAATAAT 2720

R T V H L V Q Q V H S P P S P T S S H D G T S G P A Q M E L Q V Q H 2721 AGGACTGTGCACTTAGTGCAGCAAGTACATTCACCTCCCTCTCCTACTTCATCTCATGATGGAACTTCAGGTCCAGCACA 2800

G Q N I R G K I F D I M T Q W Q R F L V S Q K N F N N D N P V 2801 GGGGCAAAATATTTGACATTATGACTCAATGGCAGAGGTTTCTTGTTTCCCAAAAGAATTTTAATAATGATAATCCAGTA 2880

D Y E E I Q T I H G K Q K M P M Y Y L T F K V D G H K2881 GATTATGAGGAAATACAAACTATTCATGGTAAACAAAAAATGCCTATGTACTATTTAACCTTTAAAGTGGATGGACATAA 2960

F V G Q V I P T E L D Y A L I T A K V V P W I K L K 2961 GTTTGTAGGGCAAGTCATCCCCACAGAATTAGATTATGCTTTAATAACTGCTAAGGTTGTGCCTTGGATTAAACTAAAAA 3040

Pol start

Gag end

GR box I GR box II

GR box III

N C E V E L T I K L P L E E Y K N D I I K S S N I S E 3041 ATTGTGAAGTAGAACTTACTATTAAATTGCCACTTGAGGAATATAAAAATGACATTATTAAGAGTTCTAACATCTCTGAA 3120

Q G K D K L R L L L H K Y D S L W Q K W E K Q V S H R 3121 CAAGGGAAGGATAAACTAAGGTTACTTTTACATAAATATGATTCCTTATGGCAAAAGTGGGAAAAACAGGTCAGTCAYAG 3200

K I P P H H I A T G T V A P K P Q K Q Y H I N Y K A 3201 AAAAATTCCTCCTCATCACATTGCAACAGGAACTGTTGCACCAAAGCCACAGAAGCAGTATCACATAAATTACAAGGCAA 3280

K L A I Q T V I N D L I K Q G V L L H Q N S S M N T P 3281 AACTTGCCATACAGACTGTAATAAATGATTTAATAAAACAAGGAGTGTTACTCCATCAGAATAGTTCTATGAATACTCCC 3360

I Y P V P K T N G S W R M V L N F R A V N K V I P L I3361 ATCTATCCAGTTCCCAAAACTAATGGATCTTGGAGGATGGTTTTAAACTTTAGAGCAGTTAATAAAGTTATTCCTTTAAT 3440

A V Q N Q Y S I E I L T Q M Q R E Q Y K T T L D L S 3441 AGCAGTTCAAAACCAGTATTCAATAGAGATATTAACACAAATGCAAAGAGAACAATATAAAACTACACTGGATCTATCAA 3520

N G F W A H P I R K E S Y W L M A F T W E G K Q L V W 3521 ATGGATTCTGGGCACATCCCATTAGAAAAGAAAGTTACTGGTTAATGGCATTTACTTGGGAAGGAAAACAATTAGTGTGG 3600

T R L P Q G F I N S P A L F T A N I V D I L K E I P D3601 ACAAGGTTACCTCAAGGTTTCATTAATAGTCCAGCATTATTTACTGCTAATATAGTGGATATATTAAAAGAAATACCAGA 3680

V E V Y V N D I Y F S N V T E E Q H L I T L K Q V L 3681 TGTAGAGGTTTATGTGAATGATATATATTTTTCTAATGTAACAGAAGAACAACATTTAATTACACTAAAACAAGTGCTTA 3760

K I L L K S G Y I V S L K K S E I A K E E V T F L S F 3761 AAATCTTATTAAAGAGTGGGTATATTGTTTCTTTAAAAAAGTCTGAGATAGCAAAAGAAGAAGTTACATTTCTCAGTTTT 3840

N I T K E G H G L T A K F R E K L L N I S A P K T L K3841 AATATTACTAAGGAAGGTCATGGTTTAACTGCAAAGTTTAGAGAAAAACTTCTAAATATCTCAGCTCCTAAAACCTTAAA 3920

Q L Q S I L G L L N F A H N V I T D F A E L T K P L 3921 ACAATTACAAAGTATATTGGGGCTTTTAAACTTTGCTCACAATGTTATTACAGATTTTGCTGAATTGACTAAACCTTTGT 4000

Y L V I S R A E G Q H I Q W M E K E G M A L Q E I I K 4001 ATCTGGTTATTAGCAGAGCAGAGGGGCAACATATTCAATGGATGGAGAAAGAAGGGATGGCATTACAAGAAATAATTAAA 4080

K L N N A S Y L E N R D I Q K P L I I K L N S S P T A4081 AAATTGAATAATGCCTCTTATTTAGAAAACAGGGATATTCAGAAACCCTTAATAATTAAACTTAACAGTTCACCAACAGC 4160

G Y I R M Y N K G G K K P I Q Y V N F I F T P A E I 4161 AGGATATATTCGAATGTATAATAAAGGAGGTAAAAAACCTATTCAATATGTTAATTTTATATTTACCCCTGCTGAAATAA 4240

K F K P T E K L L T T M H K A I I K G L D L S Q G A Q4241 AGTTTAAGCCTACTGAAAAGTTACTCACAACAATGCATAAAGCAATAATTAAGGGTCTAGACTTGTCACAAGGTGCACAA 4320

V H I Y S P L A S P T H I Q K T P L P E R K G L H S Q4321 GTACATATTTACTCACCTTTAGCCTCACCCACACATATTCAAAAAACACCCTTGCCAGAAAGAAAAGGTTTGCATTCCCA 4400

W I T W M T H F K N P Q L I F H H D P T L P D I Q N 4401 ATGGATTACTTGGATGACACATTTTAAAAACCCTCAACTAATATTTCACCATGACCCTACACTTCCAGACATTCAGAATT 4480

L P Q P L S E D N M Q I P P K Y D L T S Y T A V Y Y T 4481 TACCACAGCCTTTGTCAGAAGATAATATGCAGATACCCCCAAAATATGATTTAACTTCATATACAGCAGTATATTATACT 4560

D G S A I K N P N P Q K T H S A G I G I V K G K F D P4561 GATGGGTCAGCTATTAAAAATCCTAATCCTCAGAAGACACATTCTGCTGGTATAGGAATAGTGAAAGGTAAATTTGACCC 4640

N F S I I K Q W R F P L G D H T A Q Y A E I S A L E4641 TAATTTCTCCATTATTAAACAATGGAGATTTCCCTTAGGGGATCATACTGCACAATATGCTGAAATAAGTGCCCTAGAAT 4720

RT catalytic site

F A V K K A M M D K G P I L I V T N S M Y L A K S F N 4721 TTGCTGTCAAAAAAGCTATGATGGACAAAGGGCCAATTTTAATAGTTACAAACAGCATGTATTTAGCTAAAAGCTTTAAT 4800

E E L D I W I S N G F V N N K K K P L Q H I S K W K V 4801 GAAGAATTGGATATTTGGATATCCAATGGGTTTGTTAATAATAAGAAAAAACCTCTGCAACATATAAGTAAATGGAAAGT 4880

I A N C K Q N K P S I H M V H E P G H Q K Q G T S I 4881 TATTGCTAATTGTAAGCAAAATAAGCCCAGTATCCACATGGTACATGAGCCAGGGCACCAGAAACAAGGTACTTCTATTC 4960

H T K G N L L A D Q L A V Q S S H M V G M V T K L P S 4961 ATACTAAAGGTAACCTTTTAGCAGACCAGTTGGCTGTACAAAGCAGTCATATGGTAGGAATGGTTACAAAATTACCAAGC 5040

L D K E L E Q V L D S K S P N P K G Y P V K I Y I Y I 5041 CTGGACAAGGAGCTGGAACAAGTCTTGGATTCAAAGAGTCCTAACCCTAAGGGGTATCCTGTAAAAATATATATATATAT 5120

Y L L E N G N V I I E Q D E G K R I I P P V M E R V 5121 ATATCTTTTGGAAAATGGTAATGTTATTATTGAACAAGATGAAGGGAAAAGGATTATTCCTCCAGTTATGGAAAGAGTAA 5200

K L A Q Q A H N T F G T I H G G W E A T L I K L K N K 5201 AATTGGCCCAACAGGCCCACAATACATTTGGAACTATTCATGGAGGATGGGAAGCCACCTTAATTAAACTAAAAAATAAA 5280

Y W W P N M I K T V R S V V A N C E K C Q V T N A S S 5281 TACTGGTGGCCTAATATGATTAAAACAGTCAGATCTGTTGTAGCCAATTGTGAGAAATGCCAAGTAACTAATGCTTCCTC 5360

Q I P T P P K T I I H P D K P F E K F Y M D Y I G P 5361 CCAAATTCCTACTCCTCCAAAAACAATTATTCATCCTGATAAACCTTTTGAAAAGTTTTATATGGATTATATTGGTCCTT 5440

L P S S H G H K H I L V V D D A R M G Y C W L F P T K 5441 TACCCTCATCTCATGGCCATAAACACATTCTTGTTGTTGATGATGCTAGAATGGGATACTGTTGGTTATTCCCAACCAAG 5520

A Q N A N A T V K A L N F L S G T A I P K V L H S D Q 5521 GCCCAAAATGCTAATGCAACTGTTAAAGCTCTCAACTTTTTATCAGGTACTGCAATTCCTAAGGTGCTGCATTCTGATCA 5600

G S A F T S A T L Q Q W T K D R G I Q L E F S T P Y 5601 AGGATCAGCATTCACTTCTGCCACCTTGCAACAGTGGACCAAGGACAGAGGTATACAGTTGGAATTTAGTACCCCTTACC 5680

H P Q S S G K V E R K N G K I K R V L T K L L Y G W P 5681 ACCCCCAAAGTAGTGGAAAGGTGGAAAGGAAAAATGGTAAAATAAAACGAGTCTTAACTAAACTGTTGTATGGATGGCCT 5760

Q K W Y P L I P F V Q L S I N N I P S S Q T H Q T P H 5761 CAAAAGTGGTATCCACTTATCCCTTTTGTTCAGCTTTCCATCAATAATATACCATCCTCACAGACACATCAAACACCTCA 5840

K L M F G V D S N L P F A N V D D A N L S R E E Q L 5841 TAAGTTAATGTTTGGTGTAGATTCTAATTTGCCTTTTGCAAATGTAGATGATGCTAATTTGTCTAGAGAAGAACAATTAT 5920

S L L Q E L R E E L T P A A S S S C T S G W K P F I G 5921 CTTTACTGCAGGAACTCAGAGAGGAGCTTACACCTGCTGCATCATCCTCCTGCACTTCTGGTTGGAAACCTTTCATTGGC 6000

Q F I Q E R V Q K Y T P L C P R W K K P T K I L T V F 6001 CAATTCATCCAGGAGAGGGTACAGAAGTATACTCCTCTATGTCCACGATGGAAAAAGCCTACTAAAATTCTTACAGTATT 6080

D D H T V E I L D P L G Q R R K V S I D N L K P T A 6081 TGATGATCATACTGTTGAGATATTGGACCCTCTTGGCCAACGACGAAAGGTTAGTATTGATAATTTGAAGCCCACTGCTC 6160

M E K Y T P V L N L Q D W M V W N R A K D M E F F M H Y G K I Y T S P Q P S G L D G L E Y G K R Y G I F Y 6161 ATTATGGAAAAATATACACCAGTCCTCAACCTTCAGGATTGGATGGTTTGGAATATGGCAAAAGATATGGAATTTTTTAT 6240

R K D P N I D L E D P G W E S E E I S K P T A Q V R E E R S Q Y R S R G P R M G K 6241 GAGGAAAGATCCCAATATCGATCTAGAGGACCCAGGATGGGAAAGTGAAGAAATCTCAAAACCCACTGCACAGGTGAGGT 6320

Env start

Pol end

Forward primer F1

Reverse primers R1-3

Reverse primers R4-5

F R Y F L Y T L C A T S T Q I L C W F F F G L I I I G 6321 TCAGATATTTTTTGTATACCTTGTGTGCCACATCTACACAAATACTTTGCTGGTTCTTTTTTGGACTAATAATTATTGGA 6400

L I L G F I L S A V F R L Q W K N A I H H P G P I I S 6401 CTTATCCTAGGGTTTATTCTATCAGCAGTTTTTAGATTGCAATGGAAAAATGCTATACATCATCCTGGTCCAATAATATC 6480

W N L T S V T P M T D I P V A L P Q Y H R E R R A I 6481 TTGGAATTTAACTAGTGTCACACCAATGACTGATATTCCTGTTGCTCTCCCACAATATCATCGAGAACGACGAGCCATTC 6560

H P A P R N V H L E I C G L Q Q G M F W E Q F P K P I 6561 ATCCTGCACCTAGGAATGTTCATTTAGAAATATGTGGTCTTCAACAAGGTATGTTTTGGGAACAATTCCCTAAGCCTATT 6640

I H K K R T L G I S Q I L L I D T P L V W H K I I R Y 6641 ATACACAAGAAAAGAACTCTGGGTATCTCACAAATCCTCTTGATAGATACACCTCTAGTTTGGCACAAGATAATCAGATA 6720

I P L K D K K I L T Q L I D N E F A Q L Q E I V L P 6721 TATACCACTCAAGGACAAAAAAATCTTGACTCAGTTAATTGATAATGAATTTGCCCAATTGCAAGAAATTGTCTTGCCTT 6800

F T L P L D Q P Y T Q E Q Y Q Q K G C F Q E F G H C Y 6801 TTACTTTACCTTTAGACCAGCCATACACTCAGGAACAATATCAACAAAAGGGATGTTTTCAAGAATTTGGGCATTGCTAT 6880

L V K Y N S E R I W L T S K I I Q D H C L I P P S S G 6881 TTAGTAAAATATAATTCAGAAAGAATTTGGCTGACTTCTAAAATTATCCAGGATCATTGTCTTATACCTCCATCTAGTGG 6960

I S D T T R L N A W K Y Y I Q P Q I M R P R N W T I 6961 CATCTCTGATACCACCAGACTGAATGCCTGGAAGTATTACATTCAACCACAAATAATGAGACCAAGAAATTGGACTATTG 7040

A D Q N Y A C I C A Y S K P R G N A T Y K Q P S F C S 7041 CTGATCAAAATTATGCATGTATTTGTGCATATAGTAAGCCTAGGGGTAATGCTACATATAAACAGCCTAGCTTTTGTTCC 7120

T N M Y N S G R L L E L E L N K N K F K K E I G L L K 7121 ACCAATATGTATAACAGTGGCAGGTTGCTAGAACTAGAACTGAACAAGAACAAGTTCAAAAAAGAAATTGGCCTATTGAA 7200

D C A L P S E W K Q N N I S N R E P L E R L F K S P 7201 GGATTGTGCCCTGCCTTCAGAGTGGAAACAGAATAATATTTCTAATAGAGAACCTTTAGAAAGGTTATTCAAATCTCCCA 7280

T I T Q F C N H P E L I Y F L N T T Y T T Y S L W E G 7281 CTATTACTCAATTTTGTAATCATCCAGAGTTAATATACTTCCTAAATACCACATATACTACTTATTCCTTATGGGAAGGA 7360

D C G Y F Q K N V S S I L P E C V N F I K T K N I H P 7361 GATTGTGGATATTTTCAAAAGAATGTAAGCAGCATATTACCTGAATGTGTTAACTTTATTAAAACTAAGAATATTCATCC 7440

Y T C Q F W R Q F P D P K N I Q E E K V K C Q Y E S 7441 TTATACATGTCAATTTTGGAGACAATTCCCTGACCCAAAAAATATACAAGAAGAAAAAGTTAAATGTCAATATGAATCAC 7520

Q F S P G E F C L Y Y A K Q T V S E I K R D W R Q L A 7521 AGTTTTCTCCAGGTGAATTCTGCCTTTATTATGCAAAACAGACTGTCTCTGAAATTAAAAGGGATTGGAGACAATTAGCA 7600

Y S K K F P A P I C K Q E R K I V V P K Y K V K S L Y 7601 TACTCTAAAAAGTTCCCTGCACCAATATGTAAACAGGAAAGGAAAATTGTAGTGCCTAAATATAAGGTAAAATCACTATA 7680

Q K C I A K A K K H H V E S V Q L L N E L F I T Q I 7681 TCAGAAATGCATAGCAAAAGCTAAAAAACATCATGTAGAGAGTGTGCAGCTATTAAATGAACTTTTTATAACTCAAATAG 7760

E S N T I N I K E L P T E D K R W G L V N E V N M S N 7761 AGAGTAATACTATTAATATTAAAGAACTTCCAACTGAAGATAAGAGATGGGGGTTAGTTAATGAAGTCAATATGTCTAAT 7840

I Q I K S T K G S K S L S L K M R M K K P N T Q R L E 7841 ATACAAATTAAGAGTACAAAAGGAAGTAAATCCTTGTCCCTTAAAATGAGGATGAAAAAGCCCAACACTCAAAGACTTGA 7920

K V S L M M A N S T A T V S K L S D L N E Y L F A D 7921 GAAAGTAAGCCTTATGATGGCTAATTCAACAGCAACAGTATCTAAACTTTCTGACTTAAATGAGTATCTGTTTGCTGATG 8000

G L H I L K D H V V T L L E A N M K D T Q H I D E L T 8001 GATTACATATATTAAAAGACCATGTGGTGACTCTCTTGGAAGCCAACATGAAGGACACTCAGCATATAGATGAATTAACT 8080

T A M L I L S Y I Q N F R I P S T E G R I D W R I L N 8081 ACAGCCATGTTGATACTATCTTATATACAAAATTTCAGAATACCCTCCACAGAAGGGAGAATTGACTGGAGAATATTAAA 8160

G T W I N E G L N I P H H G M Q I V K R M S C S N I 8161 TGGAACATGGATTAATGAAGGTCTTAATATTCCTCATCATGGCATGCAAATAGTTAAGCGAATGTCATGCAGCAATATAT 8240

Y D I K K T Y T S P I K S I W E I G I Y Y Q I I L P N 8241 ATGATATAAAGAAAACTTACACATCTCCAATTAAAAGTATCTGGGAAATAGGGATTTATTATCAAATAATTCTACCTAAC 8320

K V F Y T N W Q V L N I G H L V K T G T Q L T L T K I 8321 AAAGTGTTCTATACTAACTGGCAAGTATTAAACATTGGACATTTAGTAAAAACAGGAACTCAGCTTACTTTAACAAAGAT 8400

H Q P Y T H I S Q E C S E L Y Y L E P K G C E Q R D 8401 ACATCAACCTTATACACATATTAGTCAAGAATGTTCTGAGTTATATTATTTAGAGCCTAAAGGATGTGAACAAAGAGATT 8480

Y L I C E E I N L H Q T C G N K T G S K C P V T G K A 8481 ATCTTATTTGTGAAGAAATTAATCTACATCAAACTTGTGGCAATAAAACTGGAAGTAAATGTCCAGTAACTGGAAAAGCA 8560

V S S P Y L E F I P L K N G S Y V V M S Y T I D C N I 8561 GTTTCATCTCCTTACTTAGAATTTATTCCTTTAAAGAATGGCAGTTATGTAGTTATGTCATATACTATAGACTGTAATAT 8640

P P Y Q S S I F T I N D T V T C F E K I L K K H L P 8641 ACCTCCATATCAATCCTCAATTTTCACTATCAATGATACAGTAACATGCTTTGAAAAAATATTAAAGAAACATTTGCCTA 8720

K E Q T V V L G N F H I P K I Q L R L P H L V G I L A 8721 AAGAGCAAACTGTTGTTTTGGGTAACTTTCATATACCAAAAATACAACTGAGGTTACCACACCTGGTTGGAATCTTAGCT 8800

K L K K I E V K A T D T W A S I E E Q I E D T K S D L 8801 AAGCTGAAGAAGATTGAAGTCAAGGCCACTGATACATGGGCTAGCATTGAAGAGCAGATAGAAGATACAAAGTCTGACCT 8880

L R L E L H K G D T P E W I K Q L G E A L E D V W P 8881 TCTCAGACTGGAATTACATAAAGGGGACACTCCAGAGTGGATCAAGCAATTAGGAGAAGCATTGGAGGATGTCTGGCCTG 8960

A A A S A T K T I A S F V S S A T K G I F G G I I D I 8961 CTGCTGCCTCAGCCACTAAAACAATTGCCAGCTTTGTGAGTTCTGCTACCAAAGGCATCTTTGGTGGAATAATTGATATA 9040

L T Y T K P I V I L I I I T I L I V L I F R I L K W L M A 9041 TTAACTTATACTAAGCCCATAGTCATCCTAATAATTATTACCATACTTATAGTATTGATTTTCAGGATTCTGAAATGGCT 9120

P N S E K K K E Q S K F S E K E G T V S D N G E R Q G N T A V L Q H L H 9121 TCCAAATTCTCAGAAAAAGAAGGAACAGTAAGTGACAATGGAGAGAGACAAGGCAATACTGCTGTTTTACAGCATCTTCA 9200

K I L L Q G I C S E E C K W K L Q S A L L T L Y T L 9201 TAAAATCTTACTACAAGGCATTTGTAGTGAAGAATGTAAGTGGAAACTTCAATCTGCTCTATTAACTCTGTACACTCTCA 9280

I L L G M S C I I L H Y L F V A K T N K L S L K Y M C 9281 TACTGTTAGGAATGTCATGCATAATTTTGCATTATCTTTTTGTAGCTAAGACTAACAAACTTAGCCTGAAATATATGTGT 9360

M I P I Q I L S A S F Q K I C A E K F L D R K Y L F Y D T N P N S E C F L P I9361 AAAATATGTGCAGAAAAGTTCCTGGATAGAAAATATTTGTTTTATGATACCAATCCAAATTCTGAGTGCTTCCTTCCAAT 9440

* K Q N I A P R L Q F K S * R D I * * I V I M Y T L P K T E Y S T Q I T I Q E L K R H L I N C H N V H F A 9441 AAAAACAGAATATAGCACCCAGATTACAATTCAAGAGTTAAAGAGACATTTAATAAATTGTCATAATGTACACTTTGCCT 9520

C H H L F L L V R A I G D T R T A N Q A Q S F A N E H L P S P I P S S K S Y R R Y K N S Q P G T E L C K * A 9521 TGCCATCACCTATTCCTTCTAGTAAGAGCTATAGGAGATACAAGAACAGCCAACCAGGCACAGAGCTTTGCAAATGAGCA 9600

D W H L I L S M Y I P A T T L F P L M S K V P C L L * L A P D I E H V Y P S N N S V S T D E Q G T M S A E9601 TGACTGGCACCTGATATTGAGCATGTATATCCCAGCAACAACTCTGTTTCCACTGATGAGCAAGGTACCATGTCTGCTGA 9680

S I L I C Q N T H A A R K E I L Y P S P W R R S S R M H S D L P E H P C S Q E G D P V P I T M E E I F Q D 9681 GCATTCTGATCTGCCAGAACACCCATGCAGCCAGGAAGGAGATCCTGTACCCATCACCATGGAGGAGATCTTCCAGGATG 9760

T T S I T F N S P G L G D V V M I L S P H G T L I Y L D N L N N L Q L T R T W G C G Y D F E S T W N S D L S 9761 ACAACCTCAATAACCTTCAACTCACCAGGACTTGGGGATGTGGTTATGATTTTGAGTCCACATGGAACTCTGATTTATCT 9840

C C K N T T W C G G K F K S K C N P K Q C F A V Y Y L L 9841 TTGCTGTAAAAACACCACTTGGTGTGGTGGAAAGTTTAAATCTAAATGTAATCCTAAACAATGTTTTGCTGTATACTATA 9920

N Y W F W N F P K E G N K V I I W H L D F N H * I K *

9921 ATTACTGGTTTTGGAACTTCCCCAAAGAGGGGAACAAAGTAATTATATGGCATTTGGACTTTAATCACTGAATTAAATAA 10000

T S N R I L * I S F R F * S R Y K N S T V R T Y G * C

10001 ACCAGTAACAGAATACTGTAAATATCCTTTAGATTTTAATCCAGGTATAAGAATTCTACTGTCAGAACATATGGATAATG 10080

I C K P I F * I N G L L T L I E Y I P T H G Y G Q F

10081 CATATGCAAGCCTATATTTTGAATAAATGGGCTACTGACCCTAATTGAATACATTCCTACCCATGGGTATGGCCAATTTC 10160

Tas/Bel1 start

Env end

Bet/Bel2 (start uncertain)

Tas/Bel1 end

L K G Q E F Q R A M L I G P Y G M N I G E Q G E P D I

10161 TGAAAGGGCAAGAATTTCAGAGAGCAATGTTGATTGGCCCTTATGGAATGAACATTGGAGAACAAGGAGAGCCAGATATT 10240

N E I N D G M S Y I E F G E R N Q W E R V S Q D N I S

10241 AATGAAATTAATGATGGAATGAGTTATATAGAATTTGGAGAGAGAAATCAGTGGGTGAGGGTATCACAGGATAATATCTC 10320

T K K S D N E N A D S F Y P C C S H I P P D V F C P

10321 TACTAAGAAATCTGATAATGAGAATGCTGATTCCTTCTATCCATGCTGCTCTCATATACCACCTGATGTTTTCTGTCCTG 10400

D S K E P V I T H D I E A Y L T E L L P K N T Q V Q F

10401 ATAGTAAAGAACCAGTTATTACTCATGATATAGAAGCATATTTAACAGAACTGTTACCTAAAAACACTCAAGTACAATTT 10480

P N D K K V E R P S Y I N W D R

10481 CCAAATGACAAAAAAGTGGAAAGGCCTTCCTACATTAACTGGGACAGATGATTCATATCCTTCAAGTCCCATATTAGAAC 10560

10561 TTGAAGGAGAAGCTAAACAGTTAAAAAAACAGTGAATATTTTCAAGAAAGCAGAAAGGCCCTCTCAACCTTCTTACATAT 10640

10641 AGATAACATGTAATCACCTAGTCCTCTGTGAAAACAGGATGGAGCTGCAGCTGTTACTTTAACCATAAATAAGTCATAAA 10720

10721 TAATGTAGCAAGGTTAAGTACAGTCCTAATCAAATATATGTTTATCAATATGATGAAATGGCTAAATAATCATAGAATGA 10800

10801 AATGAAAAAACTATATTAATAAGTTAGAAACAAGTAACAGTAATCTCTTTTCCAATTTTTAGTCTTTATGCAAGTAAAAG 10880

10881 TATATTATATTGCTCTTACAGAAATCATAGCTTAATTAGTAAAAATATTATCAGTAGAGTTTTATAGGCATTCAAGGCTA 10960

10961 AGTCTCTGATAATGTGGATACCCTCCTTTGATCTTGACACTCATGGATCAAGATGCTTCAAATGAGATATTCATGTTCAT 11040

11041 GAAGCATCATTTAATGTAAAAATAGAATATAACTTAAAGTTACTCTAAATCAAGATGCTATAAACTGTAACTCAAACAAA 11120

11121 AAGGAGCTCTCTCTAGTCTCAAAACCAGCTGTGTCTGGAGGAGTGGGGGCTCCCTCTCTTGGTAATGTATCAATTAATTT 11200

11201 TTACATTGTAAACTTGTTTCTTTTAGCTTAAGTGCTCCTTTAGTAAAAATGTGTTAAAGGTGAATTCTTTTCTTAAGTGC 11280

11281 TCTTTACTTTTAAACTATTGCCTGTCCCTAATACACTTAATTGAGTGGGAGGTAACTTGATGAAGCCCAAACCTATTAAA 11360

11361 TCAGGACCTGATTAGGCTCAGGCCTTTTCTAATGGAGATCATAGGCATGAGGTGTAAAGAAATGAACCCCTGACA 11435

Supplementary Figure 4. Consensus SloEFV genome sequence. Locations of the proteins encoded by the gag, pol, env, bel-1 and bel-2 genes were determined via homology to representative spumaviruses, and by searches against the Pfam (S14) and GyDb (S15) databases. The inverted repeats at the ends of both LTRs and the putative promoter and polyadenylation signals are indicated in bold type. Black lines adjacent to the corresponding nucleotide sequences indicate the primer binding site (PBS) and poly-purine tract (PPT) sequences (see Supp. Figure 3 for further details). The consensus sequence contained 34 in-frame stop codons; at these positions, the inferred pre-substitution nucleotide (indicated in red) and amino acid (indicated in bold) are shown.
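In-frame stop codons of the kind counted in this legend can be located by translating a reading frame and recording where the stops fall. The sketch below is a minimal illustration using the standard genetic code; it is not the procedure used to annotate the figure, and the function names are hypothetical. The test fragment is the start of the Gag reading frame copied from the row beginning at position 1201.

```python
"""Translate a reading frame and list in-frame stop codons (illustrative sketch)."""
from typing import List

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

# First 18 codons of Gag, copied from the consensus sequence above.
GAG_START = "ATGGCTCAACCTCAAAATTTAGATGTTGCAGGATTACAAGTGTTAATAGGGCTG"


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons; codons with ambiguity codes become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def in_frame_stops(seq: str, frame: int = 0) -> List[int]:
    """0-based nucleotide offsets of stop codons in the chosen frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if CODON_TABLE.get(seq[i:i + 3]) == "*"]


if __name__ == "__main__":
    print(translate(GAG_START))
    print(in_frame_stops(GAG_START))
```

The translation of this fragment agrees with the amino acid line printed above the sequence (M A Q P Q N L D V A G L Q V L I G L).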

PPT

Bet/Bel2 end

Supplementary table 1

Contig          Size (bp)  Similarity (%)  Genomic regions present
Contig98416.1   7542       95.34           env-tas-bet-3'LTR-flank
Contig82444.3   4302       87.96           pol-env-tas-bet-3'LTR
Contig81266.2   5660       89.4            pol-env
Contig76864.4   2398       88.9            5'LTR-gag
Contig76864.2   4163       91.49           pol-env-tas-bet
Contig74930.2   6560       91.28           pol-env-tas-bet-3'LTR-flank
Contig74589.4   3594       89.03           5'LTR-gag-pol
Contig72963.3   4777       95.86           env-tas-bet-3'LTR-flank
Contig68829.2   6063       89.91           env
Contig65439.3   5292       95.95           env-tas-bet-3'LTR-flank
Contig63331.2   5387       94.20           pol-env-tas-bet-3'LTR
Contig57346.4   6841       85.34           pol-env
Contig47383.1   6470       92.61           pol-env
Contig46384.5   7157       95.09           env-tas-bet-3'LTR-flank
Contig43436.2   4311       90.38           gag-pol
Contig42555.3   7178       91.38           env-tas-bet-3'LTR-flank
Contig4216.4    9623       90.16           env
Contig36410.1   7185       94.14           env
Contig34779.1   7172       91.65           gag-pol-env-tas-bet
Contig338141.1  2199       89.99           5'LTR-gag-pol
Contig32864.3   3955       87.22           pol-env-tas-bet-3'LTR
Contig27836.3   11776      96.23           env-tas-bet-3'LTR-flank
Contig258146.1  1977       96.10           env-tas-bet
Contig25151.5   6552       92.76           pol-env
Contig24015.2   8766       96.01           env-tas-bet-3'LTR-flank
Contig215902.1  1995       95.43           env-tas-bet
Contig20260.6   3848       90.69           pol-env
Contig161454.1  1339       92.47           gag
Contig135600.2  2347       91.29           pol-env-tas-bet
Contig129349.1  1784       94.26           5'LTR-gag-pol
Contig113851.3  1904       88.67           5'LTR-gag
Contig11379.5   4670       89.21           pol-env
Contig111332.1  3175       88.45           5'LTR-gag-pol
Contig109963.2  3023       96.68           env-tas
Contig100104.1  2991       92.32           gag-pol

A table of the contigs that were used to build the consensus sequence, indicating the contig name, contig size, pairwise similarity of each contig to the assembled consensus, and viral regions contained within each contig. Contigs that have a ‘-flank’ indicated at the 3’ end of the viral genome contain a homologous flanking region, and are part of the 8 elements contained within the genomic duplication that we used to provide a minimum estimate of the age of SloEFV. The contigs were taken from the C. hoffmanni genome assembly provided by the Washington University School of Medicine Genome Sequencing Centre (WUSTL GSC).
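The statement that the ‘-flank’ contigs correspond to the 8 elements used for dating can be checked directly against the table. The sketch below transcribes the contig names and region strings from Supplementary table 1 and counts the rows that carry a flank; the parsing helper and variable names are illustrative.

```python
"""Count the contigs in Supplementary table 1 that retain a 3' flanking region."""

# Contig name and regions column, transcribed from the table above.
TABLE_S1 = """\
Contig98416.1 env-tas-bet-3'LTR-flank
Contig82444.3 pol-env-tas-bet-3'LTR
Contig81266.2 pol-env
Contig76864.4 5'LTR-gag
Contig76864.2 pol-env-tas-bet
Contig74930.2 pol-env-tas-bet-3'LTR-flank
Contig74589.4 5'LTR-gag-pol
Contig72963.3 env-tas-bet-3'LTR-flank
Contig68829.2 env
Contig65439.3 env-tas-bet-3'LTR-flank
Contig63331.2 pol-env-tas-bet-3'LTR
Contig57346.4 pol-env
Contig47383.1 pol-env
Contig46384.5 env-tas-bet-3'LTR-flank
Contig43436.2 gag-pol
Contig42555.3 env-tas-bet-3'LTR-flank
Contig4216.4 env
Contig36410.1 env
Contig34779.1 gag-pol-env-tas-bet
Contig338141.1 5'LTR-gag-pol
Contig32864.3 pol-env-tas-bet-3'LTR
Contig27836.3 env-tas-bet-3'LTR-flank
Contig258146.1 env-tas-bet
Contig25151.5 pol-env
Contig24015.2 env-tas-bet-3'LTR-flank
Contig215902.1 env-tas-bet
Contig20260.6 pol-env
Contig161454.1 gag
Contig135600.2 pol-env-tas-bet
Contig129349.1 5'LTR-gag-pol
Contig113851.3 5'LTR-gag
Contig11379.5 pol-env
Contig111332.1 5'LTR-gag-pol
Contig109963.2 env-tas
Contig100104.1 gag-pol
"""


def parse_regions(text: str) -> dict:
    """Map each contig name to the list of genomic regions it contains."""
    rows = {}
    for line in text.splitlines():
        if line.strip():
            contig, regions = line.split()
            rows[contig] = regions.split("-")
    return rows


ROWS = parse_regions(TABLE_S1)
FLANKED = sorted(name for name, regions in ROWS.items() if "flank" in regions)

if __name__ == "__main__":
    print(len(ROWS), "contigs in total;", len(FLANKED), "with a 3' flank")
    print(FLANKED)
```

Eight of the 35 listed contigs carry the flank label, in agreement with the legend.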

Supplementary References

S1. E. Herniou et al., J Virol 72, 5955 (1998).
S2. A. Stamatakis, Bioinformatics 22, 2688 (2006).
S3. F. Ronquist, J. P. Huelsenbeck, Bioinformatics 19, 1572 (2003).
S4. F. Delsuc et al., Molecular Biology and Evolution 19, 1656 (2002).
S5. P. Lemey et al., PLoS Computational Biology 3, 282 (2007).
S6. A. J. Drummond, S. Y. W. Ho, M. J. Phillips, A. Rambaut, PLoS Biology 4, 699 (2006).
S7. F. Delsuc, S. F. Vizcaino, E. J. P. Douzery, BMC Evolutionary Biology 4 (2004).
S8. S. L. K. Pond, S. D. W. Frost, S. V. Muse, Bioinformatics 21, 676 (2005).
S9. J. F. Gillooly, A. P. Allen, G. B. West, J. H. Brown, Proceedings of the National Academy of Sciences of the United States of America 102, 140 (2005).
S10. D. Posada, K. A. Crandall, Bioinformatics 14, 817 (1998).
S11. W. M. Switzer et al., Nature 434, 376 (2005).
S12. C. H. Lecellier, A. Saib, Virology 271, 1 (2000).
S13. O. Delelis, J. Lehmann-Che, A. Saib, Curr Opin Microbiol 7, 400 (2004).
S14. R. D. Finn et al., Nucleic Acids Research 34, D247 (2006).
S15. C. Llorens, R. Futami, D. Bezemer, A. Moya, Nucleic Acids Res 36, D38 (2008).