macro-array and bioinformatic analyses reveal ... · pdf fileproposed evolutionary scenario of...

14
Macro-array and bioinformatic analyses reveal mycobacterial ‘core’ genes, variation in the ESAT-6 gene family and new phylogenetic markers for the Mycobacterium tuberculosis complex Magali Marmiesse, 1 Priscille Brodin, 1 Carmen Buchrieser, 2 Christina Gutierrez, 3 Nathalie Simoes, 2 Veronique Vincent, 3 Philippe Glaser, 2 Stewart T. Cole 1 and Roland Brosch 1 Correspondence Roland Brosch [email protected] Unite ´ de Ge ´ne ´ tique Mole ´ culaire Bacte ´ rienne 1 , Laboratoire de Ge ´ nomique des Micro-organismes Pathoge ` nes 2 , Centre National de Re ´fe ´ rence des Mycobacte ´ ries 3 , Institut Pasteur, 25–28 rue du Docteur Roux, 75724 Paris Cedex 15, France Received 22 July 2003 Revised 30 September 2003 Accepted 2 October 2003 To better understand the biology and the virulence determinants of the two major mycobacterial human pathogens Mycobacterium tuberculosis and Mycobacterium leprae, their genome sequences have been determined recently. In silico comparisons revealed that among the 1439 genes common to both M. tuberculosis and M. leprae, 219 genes code for proteins that show no similarity with proteins from other organisms. Therefore, the latter ‘core’ genes could be specific for mycobacteria or even for the intracellular mycobacterial pathogens. To obtain more information as to whether these genes really were mycobacteria-specific, they were included in a focused macro-array, which also contained genes from previously defined regions of difference (RD) known to be absent from Mycobacterium bovis BCG relative to M. tuberculosis. Hybridization of DNA from 40 strains of the M. tuberculosis complex and in silico comparison of these genes with the near-complete genome sequences from Mycobacterium avium, Mycobacterium marinum and Mycobacterium smegmatis were undertaken to answer this question. The results showed that among the 219 conserved genes, very few were not present in all the strains tested. Some of these missing genes code for proteins of the ESAT-6 family, a group of highly immunogenic small proteins whose presence and number is variable among the genomically highly conserved members of the M. tuberculosis complex. Indeed, the results suggest that, with few exceptions, the ‘core’ genes conserved among M. tuberculosis H37Rv and M. leprae are also highly conserved among other mycobacterial strains, which makes them interesting potential targets for developing new specific anti-mycobacterial drugs. In contrast, the genes from RD regions showed great variability among certain members of the M. tuberculosis complex, and some new specific deletions in Mycobacterium canettii, Mycobacterium microti and seal isolates were identified and further characterized during this study. Together with the distribution of a particular 6 or 7 bp micro-deletion in the gene encoding the polyketide synthase pks15/1, these results confirm and further extend the revised phylogenetic model for the M. tuberculosis complex recently presented. INTRODUCTION The enormous efforts in the field of mycobacterial geno- mics in the past few years have resulted in the availability of the whole-genome sequences of Mycobacterium tuber- culosis and Mycobacterium leprae, the aetiological agents of human tuberculosis and leprosy (Cole et al., 1998, 2001). In addition, the genome sequences of several other myco- bacterial species have been accomplished recently (Garnier et al., 2003; Fleischmann et al., 2002) or are in the finishing phase (for an overview see http://www.pasteur.fr/recherche/ unites/Lgmb/mycogenomics.html). These sequences are providing a huge body of information, whose analysis can provide deep insights into the biology and evolution of mycobacterial pathogens and related organisms. One effi- cient way for extracting relevant biological information from sequence data is comparative genomics, a discipline Abbreviations: BAC, bacterial artificial chromosome; RD, region of difference. Representative sequences of the junction regions reported in this article have been deposited in the EMBL database under accession numbers AJ583832, AJ583833 and AJ583834. 0002-6662 G 2004 SGM Printed in Great Britain 483 Microbiology (2004), 150, 483–496 DOI 10.1099/mic.0.26662-0

Upload: tranthuy

Post on 17-Mar-2018

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

Macro-array and bioinformatic analyses revealmycobacterial ‘core’ genes, variation in the ESAT-6gene family and new phylogenetic markers for theMycobacterium tuberculosis complex

Magali Marmiesse,1 Priscille Brodin,1 Carmen Buchrieser,2

Christina Gutierrez,3 Nathalie Simoes,2 Veronique Vincent,3

Philippe Glaser,2 Stewart T. Cole1 and Roland Brosch1

Correspondence

Roland Brosch

[email protected]

Unite de Genetique Moleculaire Bacterienne1, Laboratoire de Genomique des Micro-organismesPathogenes2, Centre National de Reference des Mycobacteries3, Institut Pasteur, 25–28 rue

du Docteur Roux, 75724 Paris Cedex 15, France

Received 22 July 2003

Revised 30 September 2003

Accepted 2 October 2003

To better understand the biology and the virulence determinants of the two major mycobacterial

human pathogens Mycobacterium tuberculosis and Mycobacterium leprae, their genome

sequences have been determined recently. In silico comparisons revealed that among the 1439

genes common to both M. tuberculosis and M. leprae, 219 genes code for proteins that show

no similarity with proteins from other organisms. Therefore, the latter ‘core’ genes could be

specific for mycobacteria or even for the intracellular mycobacterial pathogens. To obtain more

information as to whether these genes really were mycobacteria-specific, they were included in a

focused macro-array, which also contained genes from previously defined regions of difference

(RD) known to be absent from Mycobacterium bovis BCG relative to M. tuberculosis. Hybridization

of DNA from 40 strains of the M. tuberculosis complex and in silico comparison of these genes

with the near-complete genome sequences from Mycobacterium avium, Mycobacterium marinum

and Mycobacterium smegmatis were undertaken to answer this question. The results showed

that among the 219 conserved genes, very few were not present in all the strains tested. Some

of these missing genes code for proteins of the ESAT-6 family, a group of highly immunogenic

small proteins whose presence and number is variable among the genomically highly conserved

members of the M. tuberculosis complex. Indeed, the results suggest that, with few exceptions,

the ‘core’ genes conserved among M. tuberculosis H37Rv and M. leprae are also highly

conserved among other mycobacterial strains, which makes them interesting potential targets

for developing new specific anti-mycobacterial drugs. In contrast, the genes from RD regions

showed great variability among certain members of the M. tuberculosis complex, and some new

specific deletions in Mycobacterium canettii, Mycobacterium microti and seal isolates were

identified and further characterized during this study. Together with the distribution of a

particular 6 or 7 bp micro-deletion in the gene encoding the polyketide synthase pks15/1,

these results confirm and further extend the revised phylogenetic model for the M. tuberculosis

complex recently presented.

INTRODUCTION

The enormous efforts in the field of mycobacterial geno-mics in the past few years have resulted in the availabilityof the whole-genome sequences of Mycobacterium tuber-culosis and Mycobacterium leprae, the aetiological agents of

human tuberculosis and leprosy (Cole et al., 1998, 2001). Inaddition, the genome sequences of several other myco-bacterial species have been accomplished recently (Garnieret al., 2003; Fleischmann et al., 2002) or are in the finishingphase (for an overview see http://www.pasteur.fr/recherche/unites/Lgmb/mycogenomics.html). These sequences areproviding a huge body of information, whose analysis canprovide deep insights into the biology and evolution ofmycobacterial pathogens and related organisms. One effi-cient way for extracting relevant biological informationfrom sequence data is comparative genomics, a discipline

Abbreviations: BAC, bacterial artificial chromosome; RD, region ofdifference.

Representative sequences of the junction regions reported in thisarticle have been deposited in the EMBL database under accessionnumbers AJ583832, AJ583833 and AJ583834.

0002-6662 G 2004 SGM Printed in Great Britain 483

Microbiology (2004), 150, 483–496 DOI 10.1099/mic.0.26662-0

Page 2: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

that is based on bioinformatics and various hybridizationtechniques. In the present study, we have developed andemployed a focused macro-array for the analysis of strainsbelonging to the M. tuberculosis complex, a tight-knitcomplex of slow-growing mycobacteria comprising M.tuberculosis, the causative agent in the vast majority ofhuman tuberculosis cases; Mycobacterium canettii andMycobacterium africanum, both agents of human tubercu-losis in sub-Saharan Africa;Mycobacteriummicroti, an agentof tuberculosis in voles; and Mycobacterium bovis, whichinfects a wide variety of mammalian species includinghumans. The members of the complex share great geneticsimilarity, seen by homology at the DNA level of greaterthan 99?9%, but differ by some particular phenotypiccharacteristics including different host preferences thathave led researchers to retain the traditional species namesof these bacteria (Brosch et al., 2000). For easier datahandling and because the variability of house-keeping genesis negligible, the number of genes included in the arraywas restricted to 500 specially selected genes from M.tuberculosis and M. bovis. The selection of genes was basedon several criteria, including their presence in the genomesof M. tuberculosis and M. leprae relative to genomes ofother organisms available in public databases, their potent-ial to be involved in virulence, their genomic location closeto the highly conserved IS1081 insertion elements or theiraffiliation to certain gene families. Of particular interestwere the ‘core’ mycobacterial genes, of which there are 219.These were first identified on the basis of their restrictionto M. leprae and M. tuberculosis, and their conservationby the former, despite the massive gene decay that hasoccurred, strongly suggests that these gene functions areessential for M. leprae and possibly other mycobacteria.Genes that were variable between M. tuberculosis and thehighly related vaccine strain M. bovis BCG (Mahairaset al., 1996; Gordon et al., 1999; Behr et al., 1999) werealso included. The hybridization of genomic DNAs from alarge number of mycobacterial strains to the so-designedarrays, combined with bioinformatic analyses of the se-quence data available for mycobacteria, made it possible tosimultaneously identify genes that were conserved through-out the mycobacteria, and to determine the genes thatwere missing from one or more strains. Whereas highlyconserved genes that are confined only to mycobacteriaencode potential new drug targets, variable genes maybe implicated in the virulence or host range of the myco-bacterium concerned. Furthermore, some of the newlyidentified and analysed variable regions in M. canettii, M.microti and seal isolates, as well as the occurrence of aparticular micro-deletion of 6 or 7 bp in the polyketidesynthase pks15/1 gene of certain strains, enable the recentlyproposed evolutionary scenario of the M. tuberculosis com-plex to be tested further and consolidated (Brosch et al., 2002).

METHODS

Bacterial strains. The 40 M. tuberculosis complex strains werecomposed of 18 M. tuberculosis strains and 11 M. bovis strains

isolated from different organs of humans and animals, originatingfrom different countries. Two M. bovis BCG vaccine strains(Birkhaug and Merieux), two strains that were listed as M. africa-num (001, 940946) in the collection of the Institut Pasteur as wellas four M. microti (ATCC 35782, 94/2272, 005004, OV254), twoM. canettii (14000059, 990161) and one M. canettii-like clinicalisolate (990263) were included. The strains have been extensivelycharacterized by reference typing methods, i.e. IS6110-RFLP typingand spoligotyping. Other tested mycobacterial species were Myco-bacterium avium, Mycobacterium marinum and Mycobacteriumsmegmatis. For the investigation of the micro-deletion in the pks15/1locus, 21 selected genomic DNAs were taken from a collection pre-viously used for the study of the evolution of the M. tuberculosiscomplex (Brosch et al., 2002) in addition to 15 DNAs from themacro-array hybridization study.

In silico analyses of mycobacterial species. The completegenome sequences of M. tuberculosis and M. leprae were shown todiffer extensively in size and number of genes. The genome ofM. tuberculosis comprises 4 411 532 bp and 3993 protein-codinggenes (Cole et al., 1998; Camus et al., 2002), whereas M. lepraecontains 3 268 203 bp and only 1605 genes, but numerous pseudo-genes (Cole et al., 2001). In silico comparison of the predicted pro-teins shared by M. tuberculosis and M. leprae was done employingBLAST and FASTA alignment programs against public databases andpartial genome sequences (available at web sites http://www.sanger.ac.uk/Projects/M_marinum/ and http://www.tigr.org). To be listedas a conserved mycobacterial protein, 40% identity at the proteinlevel between M. tuberculosis and M. leprae was used as the cut-offlevel (Cole, 2002). The presence or absence of these genes inM. avium, M. marinum and M. smegmatis was determined in a simi-lar manner by using 40% identity over at least 70% of the completelength of the tested protein. For selected cases, to determine ifproteins corresponded to orthologous proteins in two species, thebi-directional best-hit method was applied, by comparing a givenprotein of M. tuberculosis with the sequence of another species,e.g. M. marinum. The protein sequence from M. marinum, whichshowed the highest similarity, was then compared back to theM. tuberculosis database, and, in the case of an orthologous protein,showed its best hit with the protein with which the initial compari-son was started. This method was particularly useful when genes orproteins from multi-gene families were compared, as high scoresdue to cross-hybridization may appear.

PCR. PCR amplification was used for making probes, for confirma-tion of the absence of genes that were suggested to be absent incertain tested strains by macro-array results, and for generating theDNA fragments of junction regions of deleted regions. Accordingto the type of application, different volumes were used. For theproduction of probes, which were spotted on the macro-array,PCRs were performed in 96-well plates containing 12?5 ml of106 PCR buffer [600 mM Tris/HCl pH 8?8, 20 mM MgCl2, 170 mM(NH4)2SO4, 100 mM b-mercaptoethanol], 12?5 ml of 20 mM nucleo-tide mix, 25 ml each primer at 2 mM, 10 ng template DNA, 10%DMSO, 2 U Taq polymerase (Gibco-BRL) and sterile water to 125 ml.Amplification of junction regions and evaluation of the presence orabsence of genes that showed no or weak hybridization by macro-array experiments with genomic DNA from a given strain wereperformed in a total reaction volume of 12?5 ml, as described pre-viously (Brosch et al., 2002). Thermal cycling was performed on aPTC-100 amplifier (MJ) with an initial denaturation step of 90 s at95 uC, followed by 35 cycles of 30 s at 95 uC, 1 min at 58 uC and4 min at 72 uC.

Macro-arrays. The selection of 500 genes for the focused macro-array included 219 genes common to both M. tuberculosis andM. leprae that code for proteins that did not show any similaritywith proteins from other organisms in the public databases. Genes

484 Microbiology 150

M. Marmiesse and others

Page 3: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

that were classified in the M. tuberculosis H37Rv genome as poten-tially involved in virulence (Cole et al., 1998) as well as genes thatbelonged to certain multi-gene families (Cole et al., 1998; Tekaiaet al., 1999) were also included in the selection. Differences in thecopy number of the insertion element IS1081 in the M. tuberculosiscomplex are almost entirely restricted to M. canettii. To determine ifgenes that flank IS1081 elements in the genome of M. tuberculosisH37Rv were conserved throughout the members of the M. tuberculosiscomplex, these genes were also selected for the construction of themacro-arrays. The selection also included genes that were variablebetween M. tuberculosis and the highly related vaccine strain M.bovis BCG (Mahairas et al., 1996; Gordon et al., 1999; Behr et al.,1999). Several house-keeping genes and other genes for which oligo-nucleotides were available in the laboratory were used for controlpurposes. The sequences of selected genes from M. tuberculosisH37Rv and M. bovis BCG Pasteur were downloaded by using thecomplete genome sequence (http://genolist.pasteur.fr/TubercuList/)displayed by the ARTEMIS software (Rutherford et al., 2000) or byusing in-house databases. The design of primer pairs for the amplifi-cation of ~500 bp portions of these genes was done using thePRIMER 3 software (available via http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). The oligonucleotides used for theamplification of the probes were designed to have annealingtemperatures in the range 58–60 uC. PCRs were performed in a finalvolume of 125 ml as described above. Fifty-five microlitres of the125 ml of the PCR product were transferred from the 96-well platesin 384-well plates using a pipetting robot (Tecan). After estimationof the amount of amplification product by gel electrophoresis andethidium-bromide staining, each PCR product was then depositedin duplicate on a 22622 cm Q-filter N+222 mm membrane(Genetix) by a gridding robot (QPIX). Probes were fixed and dena-tured by putting the freshly spotted membranes on Whatman papersoaked with fixation solution (0?5 M NaOH, 1?5 M NaCl) and leav-ing them for 15 min. The reaction was stopped by using distilledwater. Membranes were stored wet at 220 uC for further use.Membranes were pre-hybridized and hybridized in 10 ml of a solu-tion containing SSPE buffer (750 mM NaCl, 50 mM NaH2PO4,5 mM EDTA) with 1% SDS, Denhardt’s reagent composed of0?01% Ficoll, 0?01% polyvinylpyrrolidone and 0?01% BSA andsonicated salmon DNA at a final concentration of 100 mg ml21. Pre-hybridization was performed during 1 h at 65 uC. Genomic DNAsfrom the various mycobacterial strains were labelled by randomincorporation of [a-33P]dCTP into the synthesized complementarystrand using the Prime-it II kit (Stratagene). Unincorporatednucleotides were removed by exclusion chromatography with theQIAquick Nucleotide Removal kit (Qiagen). Membranes were hybri-dized overnight at 65 uC with the labelled probe followed by fourwashing steps in 10 ml of 0?56 SSPE, 0?2% SDS solution. The firsttwo washes were done at room temperature for 5 min followed bytwo washes at 65 uC for 20 min. Membranes were sealed in Saranwrap, exposed to a screen for 2 days and scanned on a STORMphosphorimager (Molecular Dynamics); signals were quantified andvisualized using the IMAGE QUANT (Molecular Dynamics) software.The hybridization signals of each spot hybridized with the genomicDNAs from the various strains were compared to a control mem-brane hybridized with the reference strain M. tuberculosis H37Rvusing the ARRAY VISION software (Imaging Research). For normaliza-tion purposes, the intensities from the central and the surroundingarea of each spot were calculated. The intensity from the surround-ing area, due to non-specific background hybridization signals, wassubtracted from the spot intensity. To compare spot intensities fromdifferent membranes, a mean background intensity was calculatedfor each membrane, which was then used for establishing a correc-tion factor to normalize the spot intensities from individual mem-branes. The log10 ratios between the normalized intensities of eachspot from a tested strain compared to the reference strain M. tuber-culosis H37Rv were used to estimate whether a gene was present or

absent in a given strain. The cut-off was determined using aGaussian model involving the mean and the standard deviation. Forconfirmation purposes, the genes which were found absent by thisapproach were re-tested by PCR analysis in the correspondingstrain.

Sequencing of junction regions. For genes that were missingfrom certain strains, PCR confirmation was done as describedabove. Then, new primers that were situated in the flanking regionsof the missing gene(s) were designed. After amplification of thefragment containing the junction region of a given new deletedregion, the fragment was purified by using a QIAquick PCR purifi-cation kit. For sequencing, we used 500 ng purified amplificationproduct, 3 ml Big Dye sequencing mix (Applied Biosystems), 2 ml(2 mM) flanking primer and 3 ml of 56 buffer (5 mM MgCl2,200 mM Tris/HCl pH 8?8). Thermal cycling was performed on aPTC-100 amplifier (MJ), with an initial denaturation step of 1 minat 96 uC, followed by 35 cycles of 30 s at 96 uC, 15 s at 56 uC and4 min at 60 uC). The products were then precipitated with 80 ml of76% ethanol, centrifuged, washed with 70% ethanol and dried.Then, 2 ml of formamide/EDTA buffer were added and, after dena-turation, the samples were loaded onto 4% polyacrylamide gels(48 cm). Electrophoresis lasted for 10–12 h on a model 377 auto-mated DNA sequencer (Applied Biosystems). Obtained sequenceswere compared to the genome sequence of M. tuberculosis H37Rvusing the TubercuList server at the Institut Pasteur, allowing the sizeand exact location of deleted regions to be determined. Represent-ative sequences of the junction regions were deposited in the EMBLdatabase under accession numbers AJ583832, AJ583833 and AJ583834.

RESULTS

Bioinformatic analysis

Comparative genome analysis was carried out by screeningall proteins in public databases for similarities with puta-tive proteins from M. tuberculosis and M. leprae usingthe BLASTP and FASTA programs. Initially, this approachidentified 219 genes encoding orthologous proteins in M.tuberculosis andM. leprae, but which showed no appreciablesimilarity with other proteins (Cole, 2002). The conserva-tion of these genes by M. leprae in the face of extensivereductive evolution strongly suggests that they encodeessential functions. These genes appear to code for proteinsthat are restricted to mycobacteria and possibly to closelyrelated actinobacteria, whose sequences were not availablein the public databases at the time of writing. Many ofthese genes were classified in the original genome analysis(Cole et al., 1998) in the group for which no function couldbe predicted. However, during the re-annotation of theM. tuberculosis H37Rv genome (Camus et al., 2002), basedon similarities with new sequence data from other organ-isms or experimental data (Rosenkrands et al., 2000), themajority of these genes were re-grouped into categoriesfor which a higher level of functional information is nowavailable. Of the 219 genes specific for M. tuberculosis andM. leprae, 102 belong to the cell-wall and cell-processesclass and 97 to the class of conserved hypothetical pro-teins. Ten are PE and PPE proteins with the remainderbelonging to other classes (Fig. 1, Table 1). To test whetherthese genes were indeed conserved throughout the genus

http://mic.sgmjournals.org 485

Analyses of the M. tuberculosis complex

Page 4: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

Mycobacterium, in silico comparison of the translatedsequences with the predicted open reading frames (ORFs)from the almost-finished genome sequences ofM.marinum,M. avium and M. smegmatis were undertaken. This analy-sis showed that, with a few exceptions, the great majorityof these genes had orthologues present in M. marinum,M. avium and M. smegmatis (Table 1). Most of the geneswhich were not conserved among these species belong tothe PE/PPE families, suggesting that extensive variationin number and sequence of these genes exists among themycobacteria. M. marinum, one of the closest relativesof M. tuberculosis, as determined by 16S rRNA analysis(Springer et al., 1996), was missing only nine of the 219conserved genes, whereas M. avium lacked 20, includingthe genes of the RD1 region, which are also absent from M.bovis BCG andM. microti. In the fast-growerM. smegmatis,18 of the conserved genes did not have a counterpart.

Macro-array analysis of the M. tuberculosiscomplex

The focused macro-array contained 500 gene fragmentschosen according to criteria outlined above. To test thespecificity of the array, hybridizations were undertaken withlabelled plasmid DNA from a bacterial artificial chromo-some (BAC) clone (Mi10C12) containing a 100 kb fragmentfrom M. microti OV254 corresponding to genes Rv3802to Rv3884 in M. tuberculosis H37Rv. Analysis of the hybri-dization signals quantified by using a phosphorimager andthe ARRAY VISION software and generic file managementprograms allowed a large dataset to be established (Table 2).Absence of genes was further confirmed by PCR analysisand, where possible, by in silico comparison with finishedor unfinished genome sequences. As expected, most of the219 genes conserved between M. tuberculosis and M. lepraehybridized with the set of strains from the M. tuberculosiscomplex. Among the 102 genes presumably implicated in

cell-wall structure and cell processes, three genes embCAB(Rv3793–94–95) code for arabinosyltransferases, which aretargets for ethambutol, a front-line anti-tuberculosis drug(Belanger et al., 1996; Telenti et al., 1997). Apart from resultsof bioinformatic analyses, confirming the presence of thethree genes in M. avium, M. marinum and M. smegmatis,these three genes were also identified by the macro-arrayhybridizations as being present among all tested strains ofthe M. tuberculosis complex. This finding is encouragingand suggests that, among the other conserved mycobacterialgenes of this group, new drug targets may also be discovered.Only for seven of the 219 genes was variability detected inthe genomes of the tested strains from the M. tuberculosiscomplex and most of these genes are situated in RD regions(Table 2). Interestingly, several of these genes belong to theESAT-6 family, which has 23members on theM. tuberculosisH37Rv chromosome at 11 distinct sites (Cole et al., 1998;Tekaia et al., 1999). Some of these genes were located inpreviously defined regions of difference, such as RD1, RD5and RD8 (Brosch et al., 2002; Gordon et al., 1999; Tekaiaet al., 1999), whereas others such as esxR (Rv3019c) and esxS(Rv3020c) were identified in this study for the first time asbeing absent from several strains of the M. tuberculosiscomplex.

In silico and macro-array analyses of the RD1 region showedthat this region is of particular interest because the genecontent, as well as the gene order, at this locus is highlyconserved among several, sometimes rather distant, myco-bacterial species such as M. tuberculosis, M. marinum,M. leprae and M. smegmatis (Fig. 2), whereas portionsof this region were found to be absent from M. avium(Gey Van Pittius et al., 2001),M. bovis BCG (Mahairas et al.,1996) andM. microti. In fact, the hybridization experimentspresented here (Fig. 2), using DNA from BAC cloneMiBAC10C12 (from 4252?3 to 4367?9 kb relative to M.tuberculosis H37Rv), as well as genomic DNAs from two

Fig. 1. Classification of 219 orthologousgenes of M. tuberculosis H37Rv and M.

leprae which did not show appreciablehomology to genes of other organisms inpublic databases.

486 Microbiology 150

M. Marmiesse and others

Page 5: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

Table 1. Conservation of ‘core’ mycobacterial genes in silico

Presence or absence of the 219 orthologous genes of M. tuberculosis H37Rv (Mt) and M. leprae (Ml) in the mycobacterial species M. avium (Ma), M. marinum (Mm) and M. smegmatis

(Ms). Genes that were found to share >40% amino acid identity over >70% of the predicted protein were considered as orthologous genes and their presence is represented on the

table by the sign +, whereas absence of the gene is indicated by the sign 2. Genes of the PE/PPE families are boxed.

Gene Mt Ml Mm Ma Ms Gene Mt Ml Mm Ma Ms Gene Mt Ml Mm Ma Ms Gene Mt Ml Mm Ma Ms Gene Mt Ml Mm Ma Ms

Rv0007 + + + + + Rv0455c + + + + + Rv1324 + + + + + Rv2185c + + + + + Rv3281 + + + + +

Rv0010c + + + + + Rv0463 + + + + + Rv1342c + + + + + Rv2186c + + + + + Rv3311 + + + + +

Rv0011c + + + + + Rv0464c + + + + + Rv1343c + + + + + Rv2197c + + + + + Rv3354 + + + + +

Rv0020c + + + + ” Rv0475 + + + + + Rv1361c + + ” ” ” Rv2198c + + + + + Rv3412 + + + + +

Rv0039c + + + + + Rv0477 + + + + + Rv1386 + + + ” ” Rv2203 + + + + + Rv3415c + + + + +

Rv0040c + + + + + Rv0479c + + + + + Rv1387 + + ” ” ” Rv2229c + + + + + Rv3438 + + + + +

Rv0049 + + + + + Rv0497 + + + + + Rv1388 + + + + + Rv2235 + + + + + Rv3584 + + + + +

Rv0093c + + + + + Rv0531 + + + + + Rv1411c + + + + + Rv2342 + + + + + Rv3587c + + + + +

Rv0096 + + + ” ” Rv0543c + + + + + Rv1476 + + + + + Rv2347c + + + + ” Rv3604c + + + + +

Rv0098 + + + + ” Rv0544c + + + + + Rv1590 + + + + + Rv2376c + + + + + Rv3605c + + + + +

Rv0100 + + + ” ” Rv0556 + + + + + Rv1591 + + + + + Rv2446c + + + + + Rv3614c + + + ” +

Rv0164 + + + + + Rv0559c + + + + + Rv1610 + + + + + Rv2468c + + + + + Rv3615c + + + ” +

Rv0169 + + + + + Rv0634A + + + + + Rv1635c + + + + ” Rv2484c + + + + + Rv3616c + + + ” +

Rv0175 + + + + + Rv0779c + + + + + Rv1693 + + + + ” Rv2507 + + + + + Rv3619c + + + + ”

Rv0176 + + + + + Rv0817c + + + + + Rv1697 + + + + + Rv2520c + + + + + Rv3632 + + + + +

Rv0177 + + + + + Rv0863 + + + + + Rv1698 + + + + + Rv2525c + + + + + Rv3647c + + + + +

Rv0178 + + + + + Rv0875c + + + + + Rv1754c + + + + + Rv2536 + + + + + Rv3705c + + + + +

Rv0184 + + + + + Rv0879c + + + + + Rv1782 + + + + + Rv2695 + + + + + Rv3723 + + + + +

Rv0185 + + + + + Rv0910 + + + + + Rv1794 + + + + + Rv2700 + + + + + Rv3753c + + + + +

Rv0199 + + + + + Rv0912 + + + + + Rv1797 + + + + ” Rv2709 + + + + + Rv3779 + + + + +

Rv0200 + + + + + Rv0954 + + + + + Rv1836c + + + + + Rv2719c + + + + + Rv3792 + + + + +

Rv0201c + + + + + Rv0955 + + + + + Rv1861 + + + + + Rv2722 + + + + + Rv3793 + + + + +

Rv0207c + + + + + Rv0996 + + + + + Rv1891 + + + + + Rv2732c + + + + + Rv3794 + + + + +

Rv0216 + + + + + Rv0998 + + + + + Rv1893 + + + + ” Rv2740 + + + + + Rv3795 + + + + +

Rv0226c + + + + + Rv1016c + + + + + Rv1906c + + + + + Rv2772c + + + + + Rv3802c + + + + +

Rv0227c + + + + + Rv1024 + + + + + Rv2050 + + + + + Rv2843 + + + + + Rv3805c + + + + +

Rv0236A + + + + + Rv1081c + + + + + Rv2061c + + + + + Rv2876 + + + + + Rv3807c + + + + +

Rv0236c + + + + + Rv1083 + + + + + Rv2091c + + + + + Rv2898c + + + + + Rv3808c + + + + +

Rv0250c + + + + + Rv1100 + + + + + Rv2107 + + ” ” + Rv2901c + + + + + Rv3810 + + ” + +

Rv0256c + + + + ” Rv1109c + + + + + Rv2108 + + ” ” ” Rv2945c + + + ” ” Rv3821 + + ” + +

Rv0257 + + ” + ” Rv1111c + + + + + Rv2116 + + + + + Rv2949c + + + ” ” Rv3835 + + + + +

Rv0283 + + + + + Rv1157c + + + + + Rv2133c + + + + + Rv2980 + + + + + Rv3843c + + + + +

Rv0285 + + + + + Rv1158c + + + + + Rv2134c + + + + + Rv3013 + + + + + Rv3847 + + + + +

Rv0288 + + + + + Rv1184c + + ” + + Rv2137c + + + + + Rv3020c + + ” + + Rv3849 + + + + +

Rv0289 + + + + + Rv1209 + + + + + Rv2144c + + + + + Rv3035 + + + + + Rv3850 + + + + +

http://mic.sg

mjournals.org

48

7

Analyses

ofthe

M.tub

erculosis

complex

Page 6: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

M. microti strains, have contributed to the identification ofthe RD1mic deleted region, a segment of 14 kb comprisinggenes Rv3864–Rv3876 that was deleted from the genome ofM. microti strains (Brodin et al., 2002). The ESAT-6 family,genes esxO–esxP (Rv2346c–47c) located in the RD5 region,as well as esxV–esxW (Rv3619c–20c) from the RD8 region,were absent from all tested M. bovis and M. bovis BCGstrains (Table 2), whereas M. microti lacks the genes inthe RD8 region but has the ones from RD5 esxO–esxP(Rv2346c–47c) present. For esxR–esxS (Rv3019c–20c), wefound that three M. tuberculosis strains (950530, 950531,950532) and four M. microti strains (OV254, ATCC 35782,005004 and 94/2272) did not harbour these genes, whereasthey were present in all other tested strains. PCR ampli-fication experiments using sequences flanking esxR–esxSshowed that parts of the neighbouring genes PPE46(Rv3018c) and PPE47 (Rv3021c) were also absent from thestrains lacking esxR–esxS. Inspection of the sequence of thisregion in M. tuberculosis H37Rv showed that in this seg-ment numerous genes contain highly repetitive sequences[PPE46 (Rv3018c), PE27A (Rv3018A), PPE47 (Rv3021c),PPE48 (Rv3022c), PE29 (Rv3022A)]. It appears that inM. microti and the three M. tuberculosis strains the dele-tion of esxR–esxS was mediated independently by recom-bination between the highly similar PPE genes PPE46(Rv3018c) and PPE47 (Rv3021c) which share stretchesof 364 and 408 bp of identical sequences, removing a2?4 kb fragment (Fig. 3). This study clearly shows that thevariability is greater among the members of the ESAT-6family than for the other conserved mycobacteria-specificgenes. The strong immunogenic character of the ESAT-6proteins during host–pathogen interactions may be relatedto this greater variability.

Identification and description of deleted regions

In contrast to the conserved genes between M. tuberculosisand M. leprae, the genes from the known RD regionsshowed much more variability in the 40 isolates of the M.tuberculosis complex, confirming the finding that certainlineages of strains (e.g. M. bovis) have successively lostgenetic material during evolution. In agreement with pre-viously published studies (Brosch et al., 2001, 2002; Niobe-Eyangoh et al., 2003), the RD9 region was absent from thetested M. africanum, M. microti and M. bovis strains,whereas it was present in M. tuberculosis and M. canettiistrains. It represents a key element that defines one evolu-tionary lineage within the M. tuberculosis complex that hasseparated from the M. tuberculosis lineage and comprisesM. africanum, M. microti and M. bovis. Within this lineage,the M. bovis BCG substrains tested, Birkhaug and Merieux,showed the greatest number of deleted regions, followed byM. bovis strains and M. microti (Table 2). Some deletionswere found to be characteristic for certain subspecies, forexample, the RD1mic deletion forM. microti strains (Brodinet al., 2002).

Similarly, we identified a specific deletion, RD2seal, forstrains which were isolated from infected seals in differentT

able

1.

cont

.

Gene

Mt

Ml

Mm

Ma

Ms

Gene

Mt

Ml

Mm

Ma

Ms

Gene

Mt

Ml

Mm

Ma

Ms

Gene

Mt

Ml

Mm

Ma

Ms

Gene

Mt

Ml

Mm

Ma

Ms

Rv0292

++

++

+Rv1222

++

++

+Rv2147c

++

++

+Rv3205c

++

++

+Rv3867

++

+”

+

Rv0313

++

++

+Rv1224

++

++

+Rv2164c

++

++

+Rv3209

++

++

+Rv3869

++

+”

+

Rv0356c

++

++

+Rv1249c

++

++

+Rv2169c

++

++

+Rv3212

++

++

+Rv3873

++

+”

+

Rv0358

++

++

+Rv1251c

++

++

+Rv2170

++

++

+Rv3217c

++

++

+Rv3874

++

+”

+

Rv0361

++

++

+Rv1252c

++

++

+Rv2171

++

++

+Rv3256c

++

++

+Rv3875

++

+”

+

Rv0383c

++

++

+Rv1274

++

++

+Rv2172c

++

++

+Rv3259

++

++

+Rv3877

++

+”

+

Rv0401

++

++

+Rv1275

++

++

+Rv2175c

++

++

+Rv3269

++

++

+Rv3880c

++

+”

+

Rv0416

++

++

+Rv1278

++

++

+Rv2179c

++

++

+Rv3277

++

++

+Rv3882c

++

+”

+

Rv0451c

++

++

+Rv1303

++

++

+Rv2183c

++

++

+Rv3278c

++

++

+

488 Microbiology 150

M. Marmiesse and others

Page 7: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

Table 2. Presence or absence of the tested genes in 40 strains of the M. tuberculosis complex

Presence of an RD region is indicated by the sign +, whereas absence of the region is indicated by the sign 2. Note that for regions RD1, RD2 and RD12 in some strains junction

sequences are not identical, as outlined in the results section.

Hybridized strains Localization Host Location

of isolation

RD Other deleted genes

TbD1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

M. tuberculosis H37Rv Human USA ” + + + + + + + + + + + + + + + +

M. tuberculosis H37Ra Human USA ” + + + + + + + + + + + + + + + +

M. tuberculosis 950507 Lymphadenitis Human Morocco ” + + ” + + + + + + + + + + + + + Rv1190

M. tuberculosis 950530 Lymphadenitis Human France ” + + + + + + + + + + + + + + + + Rv3019c, Rv3020c

M. tuberculosis 950531 Pleural Human France (DOM) ” + + + + + ” + + + + + + + + + + Rv3019c, Rv3020c

M. tuberculosis 950532 Pleural Human France (DOM) ” + + + + + + + + + + + + + + + + Rv3019c, Rv3020c

M. tuberculosis 950533 Lung Human Morocco ” + + ” + + + + + + + + + + + + +

M. tuberculosis 950824 Lung Human France + + + ” + + + + + + + + + + + + +

M. tuberculosis 950534 Lung+digestive Human Spain ” + + ” + + + + + + + + + + + + +

M. tuberculosis 950983 Bone Human Mali ” + + + + ” ” + + + + + + + + + +

M. tuberculosis 950944 Pleural Human France ” + + + + + ” + + + + ” + + + + +

M. tuberculosis 950780 Lung Human France ” + + + + + ” + + + + + + + + + +

M. tuberculosis 951015 Lung Human France ” + + ” + + + + + + + + + + + + +

M. tuberculosis 22 Human Cameroon + + + ” + ” + + + + + + + + + + + Rv1762c

M. tuberculosis 79 Human Cameroon ” + + ” + + + + + + + + + + + + +

M. tuberculosis Beijing 5 Human Mongolia ” + + ” + + + + + + + + + + + + + RvD2, RvD3

M. tuberculosis Beijing 10 Human China ” + + ” + + + + + + + + + + + + + RvD2, RvD3

M. tuberculosis Beijing 83 Human South Africa ” + + ” + + + + + + + + + + + + + RvD2, RvD3

M. bovis AF2122/97 Cattle UK + + + + ” ” ” ” ” ” ” ” ” ” + + +

M. bovis MDR Lung Human Spain + + + + ” ” ” ” ” ” ” ” ” ” + + + Rv3470c, Rv3479, Rv3888c

M. bovis 6 Lymphadenitis Cattle France + + + + ” ” ” ” ” ” ” ” ” ” + + + Rv3479

M. bovis 38 Cat France + + + + ” ” ” ” ” ” ” ” ” ” + + + Rv3470c, Rv3479, Rv3888c

M. bovis 951363 Lung Human + + + + ” ” ” ” ” ” ” ” ” ” + + + Rv3479, Rv3888c

M. bovis 951366 Lung Human Algeria + + + + ” ” ” ” ” ” ” ” ” ” + + + Rv3479

M. bovis 13 Lymphadenitis Cattle + + + ” ” ” ” ” ” ” ” ” ” ” + + + Rv3479

M. bovis 15 Lymphadenitis Cattle France 47 + + + + ” ” ” ” ” ” ” ” ” ” + + + Rv3739c

M. bovis 16 Lymphadenitis Cattle France 38 + + + ” ” ” ” ” ” ” ” ” ” ” + + +

M. bovis 950643 Otitis Human France 38 + + + + ” ” ” ” ” ” ” ” ” ” + + +

M. bovis BCG Merieux France 51 + ” + ” ” ” ” ” ” ” ” ” ” ” + + +

M. bovis BCG Birkhaug + + + ” ” ” ” ” ” ” ” ” ” ” + + +

Seal isolate IP40 Lung Sea lion France + + ” ” + + + ” ” ” ” + + + + + + Rv2428

M. microti ATCC 35782 Lung + ” + ” + + + ” ” ” ” + + + + + + Rv1754c, Rv3019c, Rv3020c

M. microti 94/2272 Peritoneal Human Holland + ” + ” + + + ” ” ” ” + + + + + + Rv1754c, Rv3019c, Rv3020c

M. microti 005004 Cat Holland + ” + ” + + + ” ” ” ” ” + + + + + Rv3019c, Rv3020c

M. microti OV254 Vole UK + ” + ” + +2 + ” ” ” ” + + + + + + Rv1754c, Rv3019c, Rv3020c

http://mic.sg

mjournals.org

48

9

Analyses

ofthe

M.tub

erculosis

complex

Page 8: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

parts of the world. Hybridization results suggested thatgenes Rv1978 and Rv1979 were absent from the seal isolates.Sequence analysis of the junction region in the four testedseal isolates confirmed this finding and showed that inthese strains a 1941 bp deletion has removed parts of genesRv1978 and Rv1979 (Fig. 4a). We named this region RD2seal

as it overlaps the 10?7 kb RD2 region, which is missing fromsome but not all BCG substrains (Mahairas et al., 1996; Behret al., 1999; Gordon et al., 1999). In addition, strains isolatedfrom seals were deleted for regions RD7, RD8, RD9 andRD10, whereas regions RD4, RD5, RD6, RD11, RD12 andRD13, usually missing from M. bovis (Brosch et al., 2002;Mostowy et al., 2002), were present. As the RD2seal junctionregions in the four seal isolates were identical, but differentfrom the RD2 deletion of BCG strains, it appears thatthis deletion is a specific evolutionary marker for strainsprevalent in seals and sea lions (Fig. 4d).

The selection of strains used in this study also includedtwo isolates of M. canettii, which have been proposed torepresent the most distant phylogenetic variant presentlyknown within the M. tuberculosis complex (Brosch et al.,2002; Gutacker et al., 2002). We were particularly interestedin the unusually low copy number of IS1081 (1 copy) inthese strains, as all other members of the M. tuberculosiscomplex harbour 5–6 copies and display very homogeneousIS1081 RFLP (van Soolingen et al., 1997). In this respect,one key question was whether the different copy numberof IS1081 in the M. canettii strains is due to a low rate ofIS1081 transposition in this group of strains or if IS1081copies may have been deleted from the genome. Bioinfor-matic comparisons showed that these sites were conservedin all completely or partially sequenced strains fromthe M. tuberculosis complex (i.e. M. tuberculosis strainsH37Rv, CDC1551, Beijing 210, M. microti OV254, M. bovisAF2122/97 and M. bovis BCG) and that the genes flankingIS1081 copies were present in these strains. Furthermore,more distantly related mycobacteria, such as M. avium,M. marinum or M. smegmatis, possess orthologues of theseflanking genes, without harbouring IS1081 insertion ele-ments (data not shown). In contrast, hybridization resultsobtained with genomic DNAs from the two M. canettiistrains showed that these strains lacked most of the genesthat flank the IS1081 copies in the other members ofthe M. tuberculosis complex (Table 3), suggesting that inM. canettii strains the number of IS1081 copies is lowbecause of deletion events in this lineage. As an example,Fig. 4(b) shows the genomic region containing genesRv3113–Rv3124 from M. tuberculosis H37Rv, whereas inM. canettii 14000059 a 12 436 bp deletion was observedthat truncated genes Rv3111 (moaD) at position 3 491 865and Rv3127 at position 3 479 429 relative to M. tuberculosisH37Rv, removing the intervening genomic region thatcarries a copy of IS1081. Interestingly, this deletion in M.canettii partially overlaps deleted region RD12 from M.bovis strains, which is 2?4 kb in size and has not removedthe IS1081 copy (Fig. 4b). As for other copies of IS1081 thatare missing from M. canettii 14000059 and 990161, theT

able

2.

cont

.

Hyb

ridized

strains

Localization

Host

Location

ofisolation

RD

Other

deleted

genes

TbD1

12

34

56

78

910

11

12

13

14

15

16

M.canettii14000059

Lung

Human

France

++

+”

++

”+

++

+”

”+

++

+Rv0163,

Rv0602c,Rv1045,

Rv2088,

Rv2520c,Rv2623,Rv3190c

M.canettii990161

Abscess

Human

Djibouti

++

+”

++

”+

++

+”

”+

++

+Rv1045,

Rv1048c,Rv2665

M.canettii-like990263

Abscess

Human

Djibouti

++

+”

++

++

++

””

++

++

M.africanum

001

Lung

Human

Senegal

++

+”

++

+”

””

””

++

++

+

M.africanum

940946

Lung

Human

Madagascar

++

+”

++

+”

””

””

++

++

+

490 Microbiology 150

M. Marmiesse and others

Page 9: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

absence of many flanking genes suggests that they may havebeen removed by deletion events as well (Table 3).

In the selection of strains were three belonging to theM. tuberculosis Beijing type, showing the characteristicspoligotype and IS6110 insertion in the dnaA–N region.For these strains, macro-array results suggested that they all

lacked regions RvD2 and RvD3, which are also absent fromM. tuberculosis H37Rv, as shown previously (Gordon et al.,1999). However, for the Beijing strains, a single deletionof 15?5 kb relative to M. bovis AF2122/97 was detected thathas removed both RvD regions, situated next to each otheron the chromosome, apparently by homologous recom-bination between two copies of IS6110. This observation is

Fig. 2. (a) Macro-array hybridization results from genes in the RD1 region. (b) Representation of the genes (shown by arrows)and gene order of the RD1 region in M. marinum, M. leprae, M. tuberculosis and M. smegmatis. The genomic position of thelocus relative to the M. tuberculosis H37Rv and M. leprae genomes, as well as percentage amino acid identities among thevarious predicted proteins, are shown. Gene identification and annotation were performed using the ARTEMIS software(Rutherford et al., 2000).

lpqA

esxQ

PPE46

PE27A

esxR

esxS

PPE47

PPE48

PE29

Rv3023c

esxQ

PPE48

PE29

Rv3023c

lpqA

M. tuberculosis H37Rv

M. tuberculosis950530950531950532

M. microtiOV254ATCC 3578294/2272

∆2.4 kb

005004

Fig. 3. Representation of the genes (shownby arrows) and gene order of the genomicregion containing the ESAT-6 familymembers esxR–esxS (Rv3019c–20c), absentfrom several M. tuberculosis strains and M.

microti strains. The sequence of the junctionregion was deposited in the EMBL databaseunder accession number AJ583832 andrelates to AJ550619. Note that genesPPE46 and PPE47 share large portions ofidentical nucleotide sequences.

http://mic.sgmjournals.org 491

Analyses of the M. tuberculosis complex

Page 10: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

Fig. 4. (a) Representation of the genes (shown by arrows) and gene order of the RD2 region in M. tuberculosis, BCG andseal isolates (accession no. AJ583834). Deleted regions are delineated by red and blue lanes. (b) Representation of thegenes (represented by arrows) and gene order of the RD12 region in M. tuberculosis, M. bovis and M. canettii (accessionno. AJ583833). Deleted regions are delineated by purple and brown lanes. The positions of the locus relative to theM. tuberculosis H37Rv genome are shown. (c) Representation of the nucleotide polymorphism observed in the pks15/1 genein different members of the M. tuberculosis complex, which are specified in the evolutionary scheme shown in (d). (d) Refinedevolutionary scheme after Brosch et al. (2002), showing presence or absence of conserved RD regions in members ofthe M. tuberculosis complex, as well as the pks15/1 polymorphism. Red arrows indicate that all tested strains belonging tothe phylogenetic groups 2 and 3 (Sreevatsan et al., 1997) showed a 7 bp deletion in their pks15/1 gene, whereas alltested strains from M. africanum, M. microti and M. bovis that had RD7–RD10 deleted were characterized by a 6 bp deletion(green arrows).

492 Microbiology 150

M. Marmiesse and others

Page 11: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

in agreement with the findings of Ho and colleagues, whoshowed that deletions in the RvD2 region can be as large as20 kb (Ho et al., 2000).

Micro-deletions in gene pks15/1

In a recent study, Guilhot and colleagues showed that severalwell characterized M. tuberculosis strains (H37Rv, Erdman,CDC1551 and MT106) lack a particular phenolglycolipid(PGL) that is produced by M. tuberculosis 210 (Beijingtype) and M. canettii 14000059, and they linked thisobservation to a deletion of 7 bp that introduces a frame-shift in a gene encoding a polyketide synthase (pks15/1) inthese strains (Constant et al., 2002). Interestingly, inM. bovisAF2122/97 andM. bovis BCG, a 6 bp deletion was observedat the same locus of the pks15/1 gene. As M. bovis andM. bovis BCG both produce PGL, it seems likely that this6 bp deletion, which does not cause a frameshift in thepks15/1 gene, does not influence the enzymic activity of theresulting gene product for the synthesis of the particularPGL (Constant et al., 2002). However, as this interestingpolymorphism has direct phenotypic consequences, we

were interested to determine at what stage of the phylo-genetic diversification the deletion of 7 or 6 bp occurred.Therefore, we sequenced PCR products from the poly-morphic locus in the pks15/1 gene (Fig. 4c) in 15 strainsfrom the present study and in 21 additional strains usedpreviously (Brosch et al., 2002). The results of this approachshowed that the 7 bp deletion only occurred in a particularsubgroup of M. tuberculosis strains that show the katG463

mutation CGG and, according to the nomenclature ofSreevatsan and colleagues, belong to genetic group 2 or 3(Sreevatsan et al., 1997). All these strains had region TbD1deleted and also lacked spacers 33–36 in their spoligotype,which is a characteristic feature of these genetic groups(Brosch et al., 2002; Soini et al., 2000). In contrast, noM. tuberculosis strains of genetic group 1 (katG463 CTG),including strains of the ancestral type that have the TbD1region present as well as strains of the Beijing type cluster,which lack the TbD1 region (Brosch et al., 2002), showed adeletion in the polymorphic locus of the pks15/1 gene.

As for the 6 bp deletion previously observed for M. bovisAF2122/97 and M. bovis BCG (Constant et al., 2002), inthe present study we found the same 6 bp deletion in thepks15/1 gene of all tested M. bovis strains, seal isolates,M. microti and M. africanum lacking regions RD7–RD10.Only M. africanum strains that lack the RD9 region buthave retained regions RD7, RD8 and RD10 did not showthe 6 bp deletion, suggesting that it occurred after the RD9deletion, at about the same period as deletion of regionsRD7, RD8 and RD10 occurred in the M. africanumRM.bovis lineage (Fig. 4d). These findings fit well with the pro-posed evolutionary scenario of the M. tuberculosis complex(Brosch et al., 2002) and suggest that two independentdeletion events have occurred in the pks15/1 gene in twodistinct branches of the phylogenetic tree of the M.tuberculosis complex. The 7 bp deletion, which inactivatedthe pks15/1 gene, occurred in the branch of TbD1-deleted‘modern’ M. tuberculosis strains at about the same time-range as the katG463 mutation (CTGRCGG), whereas the6 bp deletion occurred after the RD9 and before the RD10deletion in theM. africanumRM. bovis lineage. Consideringa clonal structure (Supply et al., 2003; Fleischmann et al.,2002) of the M. tuberculosis complex, it seems that this6 bp deletion in gene pks15/1 was then inherited by theother members of this branch, and can therefore be foundin M. microti, seal isolates, M. bovis and BCG strains(Fig. 4d).

DISCUSSION

In this study, we evaluated the extent of genetic variabilityamong mycobacteria and in particular those belongingto the M. tuberculosis complex, by using bioinformaticcomparisons of newly available mycobacterial sequencestogether with the macro-array technology that allows thesimultaneous screening of many more genes than is possiblewith previously used PCR-based strategies. In this perspec-tive, it was particularly interesting to determine if genes

Table 3. Presence or absence of IS1081 flanking genes insome strains from the M. tuberculosis complex

In M. tuberculosis H37Rv, ORFs Rv1047, Rv1199, Rv2512, Rv2666,

Rv3023 and Rv3115 code for IS1081 transposases. Strain/species: 1,

M. tuberculosis CDC1551; 2, M. tuberculosis strain 210; 3, M.

microti; 4, M. bovis; 5, M. bovis BCG; 6, M. canettii.

Gene 1 2 3 4 5 6

Rv1045 + + + + + 2

Rv1046 + + + + + 2

Rv1048 + + + + + 2

Rv1049 + + + + + 2

Rv1197 + + + + + ND

Rv1198 + + + + + ND

Rv1200 + + + + + +

Rv1201 + + + + + ND

Rv2510 + + + + + +

Rv2511 + + + + ND +

Rv2513 + + + + + +

Rv2514 + + + + + +

Rv2664 + + + + + 2

Rv2665 + + + + + 2

Rv2667 + + + + + +

Rv2668 + + + + ND +

Rv3021c + ND ND + + +

Rv3022c PPE48 + + 2 + ND +

Rv3022A PE29 + + 2 + + +

Rv3024A + + + + + +

Rv3025 + + + + + +

Rv3113 + + + + + 2

Rv3114 + ND + + + 2

Rv3116 + + + + + 2

Rv3117 + + + ND + 2

ND, Not determined.

http://mic.sgmjournals.org 493

Analyses of the M. tuberculosis complex

Page 12: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

that are conserved between the two major mycobacterialpathogens M. tuberculosis and M. leprae were present in allM. tuberculosis complex members and other more distantlyrelated mycobacterial species. As shown in Results, thecombined approach of bioinformatic analyses, macro-arrayhybridizations and sequencing of selected genes has resultedin a more-refined picture of the M. tuberculosis complex,indicating that within the M. tuberculosis complex andother tested mycobacteria, a very high degree of conserva-tion exists for the genes shared by M. tuberculosis H37Rvand M. leprae.

Among the few exceptions, members of the ESAT-6 familywere most prominent. The members of this family arecharacterized by a small size (~100 aa) (Cole et al., 1998),common amino acid motifs (Tekaia et al., 1999), and severalof them are organized in genomic loci with similar organi-zation, suggesting that the neighbouring genes may havesome function in the transport of these proteins out of thebacterial cell (Cole et al., 1998; Tekaia et al., 1999; Pallen,2002). The first experimental proof for this hypothesis wasrecently obtained for the RD1 region of M. tuberculosis(Fig. 2), which is absent from BCG and M. microti (Pymet al., 2003). In the same study, it was shown that recom-binant vaccine strains that appropriately exported ESAT-6and CFP10 induced better protection against tuberculosisin animal models. This finding may be linked to the highlyimmunogenic character that was demonstrated in severalstudies for ESAT-6 and other members of this family(Skjot et al., 2002). Indeed, most ESAT-6 proteins, deletedfrom one or more strains as identified in the present study,are strongly recognized by the immune system of the host(Skjot et al., 2002). Furthermore, two additional membersof the ESAT-6 family (Rv3809c, Rv3905c) are reported to bealtered in the sequencedM. bovis AF2122/97 strain (Garnieret al., 2003). Taken together, it seems plausible that varia-tion of ESAT-6 family proteins in strains of M. tuberculosisand/or members of the M. tuberculosis complex couldcontribute to antigenic variation, eventually helping thebacteria to escape immune recognition by the host. Toelucidate the biological function of this protein family,further studies are necessary. The finding that the RD1region is highly conserved in gene content and gene orderin several pathogenic and non-pathogenic mycobacterialspecies (Fig. 2) suggests that ESAT-6 systems may play afundamental role in survival in specific environments. Thisknowledge, together with appropriate cosmid and BAClibraries from these species (Brosch et al., 1998), shouldenable now very focused studies on the role of these proteinsin the various mycobacteria.

In the tight-knit M. tuberculosis complex, where singlenucleotide substitutions do not seem to be a substantialsource of genetic diversity between strains, the presence orabsence of certain regions of difference may play importantroles in the varying phenotypes, host range and virulenceof these bacteria (Pym et al., 2002; Lewis et al., 2003).Analyses of these RD regions in well-defined strains from

the M. tuberculosis complex have allowed us to describedistinct phylogenetic lineages within the M. tuberculosiscomplex (Brosch et al., 2002) that have evolved from acommon ancestor. In this study, we describe RD regionsthat are characteristic for certain subpopulations of theM. tuberculosis complex. One of the regions (RD2seal) isrestricted to strains that were isolated from seals. In thepast, seals have been described to be susceptible to tuber-culosis, but it was not always clear if the infections in sealswere caused by M. tuberculosis and/or M. bovis (Zumarragaet al., 1999). However, by the use of macro-arrays andsequencing strategies we show here that the tested strains,which were isolated from seals in different geographicalregions (Argentina, France), lack RD7, RD8, RD9 andRD10 and the particular region RD2seal that seems to bespecific for tubercle bacilli hosted by seals. The analysis ofall available genetic markers (RDs, mmpL6 polymorphism,pks15/1 polymorphism and spoligotype) showed that theseal isolates are phylogenetically more closely related toM. bovis than to M. tuberculosis. Their position in theestablished evolutionary scheme (Fig. 4d) is somewhereclose to M. microti, which also lacks RD7–RD10, shares themmpL6 codon 551 single nucleotide polymorphism (SNP)of M. bovis (AAG) and presents a particular deletion(RD1mic) that is restricted to this subspecies. The positionof the seal isolates in the phylogenetic scheme of themembers of theM. tuberculosis complex shown in Fig. 4(d)is in good agreement with a recent SNP analysis by Musserand colleagues (Gutacker et al., 2002), who also placedthese isolates as intermediate between M. tuberculosis andM. bovis. In a very recent study, the seal isolates wereconsidered as sufficiently distant from M. bovis and M.tuberculosis to place them in a separate subspecies of theM. tuberculosis complex (Cousins et al., 2003). In thisrespect, the marker RD2seal is a valuable tool for the rapididentification of such strains.

The analysis of the sequence polymorphism in the pks15/1gene, which abolishes production of a particular phenolgly-colipid in a large group of M. tuberculosis strains (Constantet al., 2002), showed excellent agreement of the observedpolymorphism with all other evolutionary markers avail-able and confirmed the phylogenetic position of the strainsused in this study (Fig. 4c, d). These results suggest that the6 bp deletion in the pks15/1 gene in the M. africanumRM.bovis lineage arose independently from the 7 bp deletionobserved for M. tuberculosis strains of Sreevatsan’s group 2and 3. Closer inspection of the flanking sequences of thispolymorphic locus (Fig. 4c) in the pks15/1 gene showed thatthis genomic region is very GC-rich. In genes that code forPE and PPE proteins, such GC-rich regions have previouslybeen associated with increased sequence polymorphismbetween strains (Cole et al., 1998; Banu et al., 2002). Inter-estingly, the pks15/1 sequence polymorphism is not theonly example where independent deletion events haveoccurred in different evolutionary lineages of the tuberclebacilli in the same genomic regions. Other examples arethe RD1 region of BCG (9?7 kb) andM. microti (14 kb), the

494 Microbiology 150

M. Marmiesse and others

Page 13: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

RD2 region of BCG (10?7 kb) and seal strains (2 kb), or theRD12 region inM. bovis (2?7 kb) andM. canettii (12?4 kb).The size of the deletions, as well as the junction sequencesof these regions, are clearly distinct from each other,indicating that no direct phylogenetic relationship existsbetween them. This observation raises an important pointfor the interpretation of micro- and macro-array data andimplies that sequencing of the junction regions of therebyidentified deleted regions (Fig. 4a, b) is necessary beforethe presence/absence of these marker genes can be used inthe construction of evolutionary schemes. From a practicalpoint of view, the pks15/1 polymorphism may serve asan important additional marker for the identification andclassification of members of the M. tuberculosis complex,as well as for the characterization of mycobacterial DNAsamplified from mummified human remains. Recent studieshave shown that, according to their spoligotype and theirkatG463 SNP, in former human populations M. tuberculosisstrains were present that resembled TbD1-deleted M.tuberculosis strains of Sreevatsan’s genetic group 2 and 3(Zink et al., 2003; Fletcher et al., 2003). As shown in thepresent study, it appears that a strict correlation existsbetween these characteristics and the frameshift mutation(deletion of 7 bp) in the pks15/1 gene.

The situation of mycobacterial research has considerablychanged in the last few years due to the informationcontained in the whole-genome sequence of M. tuberculosisH37Rv, the paradigm strain of tuberculosis research.However, genomic variation may exist among differentstrains and, for the mycobacteria, only very few studieshave addressed this question by the use of DNA arraysand then for a limited number of strains (Behr et al., 1999;Kato-Maeda et al., 2001). We therefore evaluated the extentof the conserved gene pool relative to the flexible gene poolin a collection of strains from the M. tuberculosis complexand for some other mycobacterial species; this has led to abetter understanding of the genetic criteria that may haveplayed a role in the selection of the most successful M.tuberculosis strains during the evolution of the pathogen.This information is of importance for the developmentof new therapeutic and preventive strategies in the fightagainst tuberculosis.

ACKNOWLEDGEMENTS

We are grateful to Lionel Frangeul, Thierry Garnier and AboubakarMaitournam for help in primer design and data comparison, andStephen Gordon, Sarah Ngo Niobe-Eyangoh and Alexander Pymfor fruitful discussions. Preliminary sequence data were obtainedfrom The Institute for Genomic Research (TIGR) web site (http://www.tigr.org) and the M. marinum sequence database at theWellcome Trust Sanger Institute (http://www.sanger.ac.uk/Projects/M_marinum/). Sequencing of M. avium and M. smegmatis at TIGRwas accomplished with support from NIAID, and sequencing ofM. marinum at the Sanger Institute with support from BeowulfGenomics. This study received financial support from the InstitutPasteur (PTR 35), the Genopole Programme and the AssociationFrancaise Raoul Follereau.

REFERENCES

Banu, S., Honore, N., Saint-Joanis, B., Philpott, D., Prevost, M. C. &Cole, S. T. (2002). Are the PE-PGRS proteins of Mycobacterium

tuberculosis variable surface antigens? Mol Microbiol 44, 9–19.

Behr, M. A., Wilson, M. A., Gill, W. P., Salamon, H., Schoolnik, G. K.,Rane, S. & Small, P. M. (1999). Comparative genomics of BCG

vaccines by whole-genome DNA microarray. Science 284, 1520–1523.

Belanger, A. E., Besra, G. S., Ford, M. E., Mikusova, K., Belisle, J. T.,Brennan, P. J. & Inamine, J. M. (1996). The embAB genes of Myco-bacterium avium encode an arabinosyl transferase involved in cell

wall arabinan biosynthesis that is the target for the antimycobacterial

drug ethambutol. Proc Natl Acad Sci U S A 93, 11919–11924.

Brodin, P., Eiglmeier, K., Marmiesse, M., Billault, A., Garnier, T.,Niemann, S., Cole, S. T. & Brosch, R. (2002). Bacterial artificialchromosome-based comparative genomic analysis identifies Myco-

bacterium microti as a natural ESAT-6 deletion mutant. Infect

Immun 70, 5568–5578.

Brosch, R., Gordon, S. V., Billault, A., Garnier, T., Eiglmeier, K.,Soravito, C., Barrell, B. G. & Cole, S. T. (1998). Use of a

Mycobacterium tuberculosis H37Rv bacterial artificial chromosome

library for genome mapping, sequencing, and comparative genomics.

Infect Immun 66, 2221–2229.

Brosch, R., Gordon, S. V., Pym, A., Eiglmeier, K., Garnier, T. & Cole,S. T. (2000). Comparative genomics of the mycobacteria. Int J Med

Microbiol 290, 143–152.

Brosch, R., Pym, A. S., Gordon, S. V. & Cole, S. T. (2001). Theevolution of mycobacterial pathogenicity: clues from comparativegenomics. Trends Microbiol 9, 452–458.

Brosch, R., Gordon, S. V., Marmiesse, M. & 12 other authors(2002). A new evolutionary scenario for the Mycobacterium

tuberculosis complex. Proc Natl Acad Sci U S A 99, 3684–3689.

Camus, J. C., Pryor, M. J., Medigue, C. & Cole, S. T. (2002).Re-annotation of the genome sequence of Mycobacterium tuberculosis

H37Rv. Microbiology 148, 2967–2973.

Cole, S. T. (2002). Comparative mycobacterial genomics as a tool for

drug target and antigen discovery. Eur Respir J Suppl 36, 78–86.

Cole, S. T., Brosch, R., Parkhill, J. & 39 other authors (1998).Deciphering the biology of Mycobacterium tuberculosis from thecomplete genome sequence. Nature 393, 537–544.

Cole, S. T., Eiglmeier, K., Parkhill, J. & 41 other authors (2001).Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011.

Constant, P., Perez, E., Malaga, W., Laneelle, M. A., Saurel, O.,Daffe, M. & Guilhot, C. (2002). Role of the pks15/1 gene in thebiosynthesis of phenolglycolipids in the Mycobacterium tuberculosis

complex. Evidence that all strains synthesize glycosylated p-

hydroxybenzoic methyl esters and that strains devoid of phenol-

glycolipids harbor a frameshift mutation in the pks15/1 gene. J BiolChem 277, 38148–38158.

Cousins, D. V., Bastida, R., Cataldi, A. & 16 other authors (2003).Tuberculosis in seals caused by a novel member of the Myco-

bacterium tuberculosis complex: Mycobacterium pinnipedii sp. nov.Int J Syst Evol Microbiol 53, 1305–1314.

Fleischmann, R. D., Alland, D., Eisen, J. A. & 23 other authors(2002). Whole-genome comparison of Mycobacterium tuberculosis

clinical and laboratory strains. J Bacteriol 184, 5479–5490.

Fletcher, H. A., Donoghue, H. D., Taylor, G. M., van der Zanden,A. G. & Spigelman, M. (2003). Molecular analysis of Mycobacteriumtuberculosis DNA from a family of 18th century Hungarians.

Microbiology 149, 143–151.

Garnier, T., Eiglmeier, K., Camus, J. C. & 19 other authors (2003).The complete genome sequence of Mycobacterium bovis. Proc NatlAcad Sci U S A 100, 7877–7882.

http://mic.sgmjournals.org 495

Analyses of the M. tuberculosis complex

Page 14: Macro-array and bioinformatic analyses reveal ... · PDF fileproposed evolutionary scenario of the M. tuberculosis com-plextobetestedfurtherandconsolidated(Brosch etal.,2002). METHODS

Gey Van Pittius, N. C., Gamieldien, J., Hide, W., Brown, G. D.,Siezen, R. J. & Beyers, A. D. (2001). The ESAT-6 gene cluster ofMycobacterium tuberculosis and other high G+C Gram-positivebacteria. Genome Biol 2, RESEARCH0044.1-0044.18.

Gordon, S. V., Brosch, R., Billault, A., Garnier, T., Eiglmeier, K. &Cole, S. T. (1999). Identification of variable regions in the genomesof tubercle bacilli using bacterial artificial chromosome arrays. MolMicrobiol 32, 643–655.

Gutacker, M. M., Smoot, J. C., Migliaccio, C. A. & 7 other authors(2002). Genome-wide analysis of synonymous single nucleotidepolymorphisms in Mycobacterium tuberculosis complex organisms.Resolution of genetic relationships among closely related microbialstrains. Genetics 162, 1533–1543.

Ho, T. B., Robertson, B. D., Taylor, G. M., Shaw, R. J. & Young, D. B.(2000). Comparison of Mycobacterium tuberculosis genomes revealsfrequent deletions in a 20 kb variable region in clinical isolates. Yeast17, 272–282.

Kato-Maeda, M., Rhee, J. T., Gingeras, T. R., Salamon, H.,Drenkow, J., Smittipat, N. & Small, P. M. (2001). Comparinggenomes within the species Mycobacterium tuberculosis. Genome Res11, 547–554.

Lewis, K. N., Liao, R., Guinn, K. M., Hickey, M. J., Smith, S., Behr,M. A. & Sherman, D. R. (2003). Deletion of RD1 from Mycobacteriumtuberculosis mimics bacille Calmette-Guerin attenuation. J Infect Dis187, 117–123.

Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover,C. K. (1996). Molecular analysis of genetic differences betweenMycobacterium bovis BCG and virulent M. bovis. J Bacteriol 178,1274–1282.

Mostowy, S., Cousins, D., Brinkman, J., Aranaz, A. & Behr, M. A.(2002). Genomic deletions suggest a phylogeny for the Mycobacteriumtuberculosis complex. J Infect Dis 186, 74–80.

Niobe-Eyangoh, S. N., Kuaban, C., Sorlin, P., Cunin, P., Thonnon, J.,Sola, C., Rastogi, N., Vincent, V. & Gutierrez, M. C. (2003). Geneticbiodiversity of Mycobacterium tuberculosis complex strains frompatients with pulmonary tuberculosis in Cameroon. J Clin Microbiol41, 2547–2553.

Pallen, M. J. (2002). The ESAT-6/WXG100 superfamily – and a newGram-positive secretion system? Trends Microbiol 10, 209–212.

Pym, A. S., Brodin, P., Brosch, R., Huerre, M. & Cole, S. T. (2002).Loss of RD1 contributed to the attenuation of the live tuberculosisvaccines Mycobacterium bovis BCG and Mycobacterium microti. MolMicrobiol 46, 709–717.

Pym, A. S., Brodin, P., Majlessi, L. & 7 other authors (2003).Recombinant BCG exporting ESAT-6 confers enhanced protectionagainst tuberculosis. Nat Med 9, 533–539.

Rosenkrands, I., King, A., Weldingh, K., Moniatte, M., Moertz, E. &Andersen, P. (2000). Towards the proteome of Mycobacteriumtuberculosis. Electrophoresis 21, 3740–3756.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P.,Rajandream, M.-A. & Barrell, B. (2000). ARTEMIS: sequencevisualisation and annotation. Bioinformatics 16, 944–945.

Skjot, R. L., Brock, I., Arend, S. M., Munk, M. E., Theisen, M.,Ottenhoff, T. H. & Andersen, P. (2002). Epitope mapping of theimmunodominant antigen TB10.4 and the two homologous proteinsTB10.3 and TB12.9, which constitute a subfamily of the esat-6 genefamily. Infect Immun 70, 5446–5453.

Soini, H., Pan, X., Amin, A., Graviss, E. A., Siddiqui, A. & Musser,J. M. (2000). Characterization of Mycobacterium tuberculosis isolatesfrom patients in Houston, Texas, by spoligotyping. J Clin Microbiol38, 669–676.

Springer, B., Stockman, L., Teschner, K., Roberts, G. D. & Bottger,E. C. (1996). Two-laboratory collaborative study on identification ofmycobacteria: molecular versus phenotypic methods. J Clin Microbiol34, 296–303.

Sreevatsan, S., Pan, X., Stockbauer, K. E., Connell, N. D.,Kreiswirth, B. N., Whittam, T. S. & Musser, J. M. (1997).Restricted structural gene polymorphism in the Mycobacteriumtuberculosis complex indicates evolutionarily recent global dissemi-nation. Proc Natl Acad Sci U S A 94, 9869–9874.

Supply, P., Warren, R. M., Banuls, A. L. & 7 other authors (2003).Linkage disequilibrium between minisatellite loci supports clonalevolution of Mycobacterium tuberculosis in a high tuberculosisincidence area. Mol Microbiol 47, 529–538.

Tekaia, F., Gordon, S. V., Garnier, T., Brosch, R., Barrell, B. G. &Cole, S. T. (1999). Analysis of the proteome of Mycobacteriumtuberculosis in silico. Tuber Lung Dis 79, 329–342.

Telenti, A., Philipp, W. J., Sreevatsan, S., Bernasconi, C.,Stockbauer, K. E., Wieles, B., Musser, J. M. & Jacobs, W. R., Jr(1997). The emb operon, a gene cluster of Mycobacterium tuberculosisinvolved in resistance to ethambutol. Nat Med 3, 567–570.

van Soolingen, D., Hoogenboezem, T., de Haas, P. E. & 9 otherauthors (1997). A novel pathogenic taxon of the Mycobacteriumtuberculosis complex, Canetti: characterization of an exceptionalisolate from Africa. Int J Syst Bacteriol 47, 1236–1245.

Zink, A. R., Sola, C., Reischl, U., Grabner, W., Rastogi, N., Wolf, H. &Nerlich, A. G. (2003). Characterization of Mycobacterium tuberculosiscomplex DNAs from Egyptian mummies by spoligotyping. J ClinMicrobiol 41, 359–367.

Zumarraga, M. J., Bernardelli, A., Bastida, R. & 10 other authors(1999). Molecular characterization of mycobacteria isolated fromseals. Microbiology 145, 2519–2526.

496 Microbiology 150

M. Marmiesse and others