islandpath: a computational aid for identifying genomic islands



IslandPath: A computational aid for identifying genomic islands that may play a role in microbial pathogenicity

William Hsiao¹*, Nancy Price², Ivan Wan³, Steven J. Jones³, and Fiona S. L. Brinkman¹

¹Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby; ²Department of Medical Genetics, University of British Columbia, Vancouver; and ³Genome Sequence Centre, B.C. Cancer Agency, British Columbia, Canada

Abstract

As more genomes from bacterial pathogens are sequenced, it is becoming apparent that a significant proportion of virulence factors are encoded in clusters of genes termed Pathogenicity Islands (reviewed in 1). These and other genomic islands tend to have atypical guanine and cytosine content (%G+C), contain mobility genes (e.g. transposases and integrases), and are associated with tRNA sequences. We have developed a web-based computational tool, IslandPath, to aid the visualization of these features in a full genome display, in order to facilitate the identification of genes in new genome sequences that may be involved in virulence or have horizontal origins. Visualizing these features within their genomic context can facilitate better detection of genomic island borders and neighbouring genes. Atypical %G+C by itself is not indicative of the horizontal origin of a sequence; however, the predictive power increases when such regions are associated with mobile elements or direct repeats, or contain genes with similarity to known virulence factors. Therefore, we are incorporating into IslandPath algorithms to detect partial tRNAs in new genomic sequences, which are likely to be remnants of phage insertion events, and are also comparing the genomic sequences to a custom-built database of a subset of known virulence factors. Preliminary results from our investigation of IslandPath's ability to visualize known Pathogenicity Islands as distinct regions within genomes are encouraging. This computational tool also permitted us to perform a more in-depth analysis of %G+C variance in genomes and enabled us to detect correlations not previously reported. As more genome data become available, tools like IslandPath, which can be updated in an automated fashion, will become valuable for genomic research.

Acknowledgements

This project is funded by the Peter Wall Institute for Advanced Studies. We wish to thank Tatiana Tatusov of NCBI for providing helpful files for IslandPath and acknowledge the efforts of the many genome projects that have made our analysis possible.

www.pathogenomics.bc.ca/brinkman

Methods:

Core scripts written in Perl and CGI/Perl

Sequence Data: NCBI Genome FTP site

Potential mobility elements: COG analysis (2,3) plus keyword scan

RNA locations: NCBI data plus tRNAscan-SE (4)

%G+C calculated for each ORF

Mean and Std. Dev. for all ORFs in genome calculated

File containing all ORF information used to generate a graphical representation

Virulence Gene Subset (VGS) database developed through literature analysis of genes identified as virulence factors using the “Molecular Koch’s Postulates” (i.e. gene knockout affects virulence)
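The per-ORF %G+C calculation and the genome-wide mean and standard deviation listed above can be sketched as follows. This is an illustration in Python rather than the Perl scripts used for IslandPath, which are not shown on the poster. The function names, the dictionary input, and the use of the sample standard deviation are assumptions; the 300 bp length filter follows the "ORFs >300bp" column headings of the table.

```python
from statistics import mean, stdev


def gc_percent(seq: str) -> float:
    """Return the %G+C of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def genome_gc_stats(orfs: dict, min_len: int = 300) -> tuple:
    """Mean and sample S.D. of %G+C over ORFs longer than min_len bases.

    `orfs` maps a hypothetical ORF identifier to its nucleotide sequence.
    Whether IslandPath used the sample or population S.D. is not stated;
    the sample form is an assumption here.
    """
    values = [gc_percent(s) for s in orfs.values() if len(s) > min_len]
    return mean(values), stdev(values)


# Toy input; min_len=0 only so that these short sequences are kept.
toy = {"orf1": "ATGC", "orf2": "GGCC"}
genome_mean, genome_sd = genome_gc_stats(toy, min_len=0)
```

With real data, each value in `orfs` would be the nucleotide sequence of one predicted ORF taken from the genome files.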

%G+C Analysis for Complete Genome Sequences:

Bacterial Pathogens | Primary Diseases | Cellular Localization | # of ORFs | %G+C Mean (ORFs >300bp) | %G+C S.D. (ORFs >300bp)
Neisseria meningitidis serogroup B strain MC58 | meningitis | extracellular | 2025 | 52.4 | 6.9
Neisseria meningitidis serogroup A strain Z2491 | meningitis | extracellular | 2121 | 52.6 | 6.5
Xylella fastidiosa | Citrus variegated chlorosis | extracellular | 2766 | 53.4 | 5.4
Escherichia coli O157:H7 (E. coli O157:H7_EDL933) | diarrhoea | facultative intracellular | 5361 (5349) | 51.1 (51.9) | 5.3 (5.3)
Mycoplasma pneumoniae M129 | mycoplasmal pneumonia ("walking pneumonia") | extracellular | 677 | 40.3 | 4.9
Yersinia pestis strain CO92 | bubonic plague and pneumonic plague | facultative intracellular | 3885 | 48.3 | 4.7
Streptococcus pneumoniae TIGR4 (S. pneumoniae R6) | bacterial pneumonia, meningitis, sepsis, and otitis media | extracellular | 2094 (2043) | 40.3 (40.4) | 4.4 (4.3)
Treponema pallidum Nichols | syphilis | extracellular | 1031 | 51.4 | 4.2
Mycoplasma pulmonis | murine respiratory mycoplasmosis | extracellular | 782 | 27.2 | 3.8
Pseudomonas aeruginosa PAO1 | variety of mucosal infections (opportunistic) | extracellular | 5565 | 67.0 | 3.8
Rickettsia conorii Malish 7 | Mediterranean spotted fever | obligate intracellular | 1374 | 32.4 | 3.8
Ureaplasma urealyticum serovar 3 | urethritis | extracellular | 613 | 25.8 | 3.8
Vibrio cholerae N16961 | cholera | extracellular | I: 2736; II: 1092 | I: 48.1; II: 46.9 | I: 3.7; II: 4.3
Borrelia burgdorferi B31 | Lyme disease | facultative intracellular | 851 | 28.7 | 3.6
Streptococcus pyogenes | scarlet fever, toxic shock like syndrome | extracellular | 1696 | 38.9 | 3.6
Mycoplasma genitalium G37 | urethritis (opportunistic, usually HIV patients) | extracellular | 484 | 31.4 | 3.5
Campylobacter jejuni NCTC11168 | gastroenteritis | extracellular | 1654 | 30.6 | 3.5
Helicobacter pylori 26695 (H. pylori J99) | peptic ulcers and gastritis | extracellular | 1566 (1491) | 39.4 (39.7) | 3.4 (3.3)
Haemophilus influenzae Rd-KW20 | upper respiratory infection, meningitis | extracellular | 1709 | 38.5 | 3.4
Mycobacterium tuberculosis CDC1551 (M. tuberculosis H37Rv) | tuberculosis | facultative intracellular | 4187 (3918) | 65.5 (65.6) | 3.3 (3.3)
Pasteurella multocida PM70 | fowl cholera, cattle septicemia, etc. | extracellular | 2014 | 40.8 | 3.3
Rickettsia prowazekii Madrid E | epidemic typhus | obligate intracellular | 834 | 30.1 | 3.3
Staphylococcus aureus Mu50 (S. aureus N315) | food poisoning, toxic shock syndrome, necrotizing fasciitis | extracellular | 2714 (2595) | 33.3 (32.2) | 3.0 (3.0)
Mycobacterium leprae | leprosy | obligate intracellular | 2720 | 60.0 | 2.9
Agrobacterium tumefaciens C58 (Cereon) | crown gall (in plants) | extracellular | c: 2721; l: 1833 | c: 59.8; l: 59.7 | c: 2.7; l: 2.9
Chlamydophila pneumoniae AR39 (C. pneumoniae J138) [C. pneumoniae CWL029] | chlamydial pneumonia | obligate intracellular | 1110 (1070) [1052] | 41.1 (41.1) [41.1] | 2.6 (2.6) [2.6]
Chlamydia trachomatis D | chlamydia | obligate intracellular | 894 | 41.5 | 2.3
Chlamydia muridarum MoPn | chlamydia | obligate intracellular | 909 | 40.8 | 2.2

Non-pathogens | # of ORFs | %G+C Mean (ORFs >300bp) | %G+C S.D. (ORFs >300bp)
Escherichia coli K12 | 4289 | 51.3 | 4.7

Discussion:

IslandPath appears to be an effective automated tool to visualize and detect genomic islands. Previous reports have expressed concern about the use of %G+C to detect horizontal gene transfer (HGT); however, these reports examined %G+C for individual genes. We propose that %G+C analysis is effective if clusters of genes containing motifs associated with mobility elements are considered. Foreign genes with %G+C similar to that of the organism’s genome are not detected and, due to gene amelioration, only “recent” HGT can be detected. This tool represents one approach that can be complemented with others to prioritize particular genomic islands that merit further research.

Future developments:

Virulence factor homology search (based on comparison to our VGS dataset)

Alternative DNA signatures (e.g. codon usage)

Allow users to input their own sequences for analysis

%G+C Analysis General Observations:

High %G+C variance is associated with species with evidence of recent horizontal gene transfers (e.g. N. meningitidis).

Low %G+C variance is associated with highly clonal species and species with no evidence of horizontal gene transfers (e.g. Chlamydia species, which are obligate intracellular microbes thought to have been ecologically isolated from other bacteria for a longer period than other obligate intracellular bacteria).

%G+C variance is similar for strains of a single species, with the exception of the two V. cholerae chromosomes and the two E. coli strains. However, chromosome II of V. cholerae appears to have originated from a megaplasmid captured by Vibrio (5). For E. coli, the pathogenic strain O157:H7 has higher %G+C variance than K12 (S.D. 5.3 versus 4.7). This might be due to the presence of PAIs and other potentially horizontally transferred genetic elements.

Frequencies of ORF %G+C in Genomes:

Histograms of frequencies of %G+C were plotted for several organisms.

Observations:

Lowest kurtosis occurs most commonly with a mode of 33.33% for %G+C values of ORFs in a genome (e.g. M. jannaschii DSM2661). This G+C value corresponds to maximum A/T in synonymous sites for the standard codon usage table.

Long tails in the frequency plots occur more frequently downward (e.g. H. pylori J99 and N. meningitidis) than upward.

These observations likely reflect either a bias in gene identification in high G+C genomes, or selection for higher A+T content.
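The shape statistics mentioned in these observations (mode, kurtosis, and direction of the long tail) could be computed from a list of per-ORF %G+C values roughly as below. The poster does not say which definitions were used, so the population-moment formulas, the 1% bin width, and all function names are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def modal_bin(values, width=1.0):
    """Lower edge of the most populated histogram bin of the given width."""
    counts = Counter(int(v // width) for v in values)
    best_bin, _ = max(counts.items(), key=lambda kv: kv[1])
    return best_bin * width


def _central_moment(values, k):
    """k-th population central moment."""
    m = mean(values)
    return sum((v - m) ** k for v in values) / len(values)


def excess_kurtosis(values):
    """Population excess kurtosis (0 for a normal distribution)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def skewness(values):
    """Population skewness; negative means the longer tail is toward low %G+C."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5
```

A negative `skewness` for a genome would correspond to the "downward" long tails described above.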

Detection of Proposed or Potential Genomic Islands:

Escherichia coli O157:H7:

Area displayed in white rectangle is ~ 28kb in size (from 3708kbp to 3736kbp) and contains Type III Secretion proteins (Epr’s, Epa’s, and Eiv’s) and numerous hypothetical proteins with unknown functions.

Vibrio cholerae chromosome I:

Area displayed in red rectangle is ~ 34kb in size (from 1896kbp to 1930kbp) and contains a tRNA-ser in the same orientation as the phage integrase downstream of it. The ORFs contain one putative helicase, one chemotaxis protein MotB-related protein, one putative type I restriction enzyme HsdR, one putative DNA methylase, one putative N-acetylneuraminate lyase, one C4-dicarboxylate-binding periplasmic protein, and numerous hypothetical proteins and conserved hypothetical proteins.

A tRNA adjacent to an abnormal %G+C region is often observed to be in the same orientation as the stretch. This might be an artefact of phage insertion and excision events, as the 3’ ends of tRNA genes are common phage attachment (att) sites.

Horizontal Gene Transfer and Bacterial Pathogenicity:

Several types of mobile elements have been shown to carry virulence factors:

Transposons: ST enterotoxin genes in E. coli

Prophages: Shiga-like toxins in EHEC; diphtheria toxin gene; cholera toxin; botulinum toxins

Plasmids: Shigella, Salmonella, Yersinia

Pathogenicity Islands: Uro/Entero-pathogenic E. coli; Salmonella typhimurium; Yersinia spp.; Helicobacter pylori; Vibrio cholerae

References

1 Hacker J and Kaper JB, 2000, Annu Rev Microbiol. 54:641-79

2 Tatusov RL, et al., 1997, Science 278(5338):631-7

3 Tatusov RL, et al., 2001, Nucleic Acids Res. 29(1):22-8

4 Lowe TM and Eddy SR, 1997, Nucleic Acids Res. 25(5):955-64

5 Heidelberg JF, et al., 2000, Nature 406:477-84

Whole Genome (predicted) ORF Display:

Genome ORFs are displayed to allow interesting regions (rich in mobility genes, abnormal %G+C, close to structural RNAs) to be viewed in a genome context. E.g. H. pylori 26695 Genome

Several low %G+C regions can be seen in the graphic display:

= CAG island

= plasticity zone (contains different genes for J99 and 26695)

= region containing virB homologues; not present in strain J99

Detection of Known Pathogenicity Islands:

Vibrio cholerae chromosome I: VPI (toxin regulated pili)

VPI delineated as a stretch of low %G+C region flanked by mobility genes
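The idea of a stretch of atypical %G+C flanked by mobility genes can be sketched as a scan over ORFs in genome order. This is not IslandPath's algorithm, since IslandPath is described here as a visualization aid. The keyword list (taken from the two examples named in the abstract), the minimum run length, and the function names are all assumptions.

```python
from itertools import groupby

# Example keywords only; the poster's actual keyword list is not given.
MOBILITY_KEYWORDS = ("transposase", "integrase")


def is_mobility(product: str) -> bool:
    """Keyword scan of an ORF's product annotation."""
    p = product.lower()
    return any(k in p for k in MOBILITY_KEYWORDS)


def candidate_islands(orfs, genome_mean, cutoff=3.48, min_run=3):
    """Return (start, end) index pairs, end exclusive, of candidate stretches.

    `orfs` is a list of (gc_percent, product) tuples in genome order.
    A stretch is a run of at least `min_run` consecutive ORFs whose %G+C
    differs from the genome mean by more than `cutoff`, and which contains
    or directly adjoins a mobility gene.
    """
    atypical = [abs(gc - genome_mean) > cutoff for gc, _ in orfs]
    results = []
    pos = 0
    for flag, group in groupby(atypical):
        n = len(list(group))
        start, end = pos, pos + n
        pos = end
        if not flag or n < min_run:
            continue
        # Include one neighbouring ORF on each side of the run.
        window = orfs[max(0, start - 1):min(len(orfs), end + 1)]
        if any(is_mobility(product) for _, product in window):
            results.append((start, end))
    return results


# Hypothetical toy genome segment (values invented for illustration).
toy = [
    (48.0, "hypothetical protein"),
    (56.5, "receptor protein"),
    (58.8, "biosynthetic protein"),
    (60.4, "biosynthetic protein"),
    (48.5, "integrase"),
    (47.0, "hypothetical protein"),
]
stretches = candidate_islands(toy, genome_mean=47.9)
```

The scan accepts deviations in either direction; restricting it to low %G+C only, as in the VPI case, would mean replacing the absolute difference with a one-sided comparison.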

Yersinia pestis strain CO92: High Pathogenicity Island core (in red rectangle)

Mean: 47.9 STD DEV: 4.9

%G+C | S.D. | Location | Orientation | Product
56.48 | +1 | 2140840..2142861 | - | pesticin/yersiniabactin receptor protein
58.81 | +2 | 2142992..2144569 | - | yersiniabactin siderophore biosynthetic protein
58.33 | +2 | 2144573..2145376 | - | yersiniabactin biosynthetic protein YbtT
60.40 | +2 | 2145373..2146473 | - | yersiniabactin biosynthetic protein YbtU
60.79 | +2 | 2146470..2155961 | - | yersiniabactin biosynthetic protein
60.15 | +2 | 2156049..2162156 | - | yersiniabactin biosynthetic protein
56.35 | +1 | 2162347..2163306 | - | transcriptional regulator YbtA
57.29 | +1 | 2163473..2165275 | + | lipoprotein inner membrane ABC-transporter
58.62 | +2 | 2165262..2167064 | + | inner membrane ABC-transporter YbtQ
59.48 | +2 | 2167057..2168337 | + | putative signal transducer
55.25 | +1 | 2168365..2169669 | + | putative salicylate synthetase
52.65 |  | 2169863..2171125 | - | integrase
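The S.D. column of the Yersinia pestis table appears to give the number of whole standard deviations by which each ORF's %G+C exceeds the genome mean. The truncation rule below is inferred from the tabulated values (mean 47.9, S.D. 4.9) and is not stated on the poster; the function name is hypothetical.

```python
import math


def sd_units(gc: float, genome_mean: float, genome_sd: float) -> int:
    """Whole standard deviations from the genome mean, truncated toward zero."""
    return math.trunc((gc - genome_mean) / genome_sd)


# %G+C values of the twelve ORFs in the table, in table order.
hpi_gc = [56.48, 58.81, 58.33, 60.40, 60.79, 60.15,
          56.35, 57.29, 58.62, 59.48, 55.25, 52.65]
units = [sd_units(gc, 47.9, 4.9) for gc in hpi_gc]
# units reproduces the column: 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 0
```

The final ORF (the integrase, 52.65 %G+C) lies less than one S.D. above the mean, which matches its blank S.D. entry.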

IslandPath Graphical Display:

Each dot in a graphic corresponds to a predicted protein-coding ORF in the genome. Dot colours indicate whether an ORF has a higher or lower %G+C than the cutoffs you set (default settings are +/- 3.48* of the mean %G+C). You may click on a dot to view a portion of an annotation table presented below the graphic.

* 3.48 = 1.5 S.D. of the mean for Chlamydia genomes, which are proposed to have undergone no recent horizontal gene transfer (data not shown).
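The cutoff rule behind the dot colours can be illustrated with a small classifier. The default of 3.48 percentage points comes from the text above; the category labels and the function name are assumptions, since the actual colours are not named here.

```python
def dot_class(gc: float, genome_mean: float, cutoff: float = 3.48) -> str:
    """Classify an ORF's %G+C against user-set cutoffs around the genome mean."""
    if gc > genome_mean + cutoff:
        return "high"
    if gc < genome_mean - cutoff:
        return "low"
    return "typical"


# With the Y. pestis CO92 mean of 47.9, the integrase ORF at 52.65 %G+C
# exceeds the upper cutoff of 51.38.
label = dot_class(52.65, 47.9)
```

Because the default cutoff is a fixed number of percentage points rather than a multiple of each genome's own S.D., genomes with high %G+C variance will show proportionally more coloured dots.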