Introduction to polyphasic taxonomy
Peter Vandamme
EUROBIOFILMS - Third European Congress on Microbial Biofilms
Ghent, Belgium, 9 - 12 September 2013
http://www.lm.ugent.be/
Content
• The observation of diversity: phenotypic and genotypic coherence allows bacterial species to be defined
• Taxonomy and species definitions vary with technology: old and new practices
• Phenotypic and numerical taxonomy
• DNA
• Phylogeny
• Polyphasic taxonomy
• Whole genome sequences
Observation of diversity in species
Campylobacter lari whole cell protein patterns
Observation of diversity in species
Campylobacter jejuni RAPD patterns
Observation of diversity in species: AFLP
Origin of diversity: genetic drift
Evolution
• Growth, genetic drift, physical separation and periods of selection lead to evolution and variation in bacterial genomes
– Size & organization
– Content
– Sequence
Genome size and organization
• Genome size varies from 580,074 bp (Mycoplasma genitalium) to 9,105,828 bp (Bradyrhizobium japonicum)
• 1 circular chromosome (e.g. Escherichia coli 4.6 – 5.4 Mbp)
• Multiple circular chromosomes
– e.g. Ralstonia solanacearum 3.7 Mbp and 2.1 Mbp; Burkholderia cenocepacia 3.8 Mbp, 3.2 Mbp and 0.9 Mbp
• 1 linear chromosome (e.g. Borrelia burgdorferi 0.9 Mbp)
• 1 linear and 1 circular chromosome (e.g. Agrobacterium tumefaciens 2.8 and 2.1 Mbp)
Variability in gene content
• Venn diagram showing core and accessory genes for Streptococcus species. The surfaces are approximately proportional to the number of genes (Lefébure and Stanhope 2007 Genome Biol. 8: R71)
The number of genes two genomes have in common depends on their evolutionary distance
[Figure: gene content (fraction of shared genes) plotted against the average number of nucleotide substitutions per site for 16S rRNA]
The species core and pan-genome
Fig. 2. GBS core genome
Copyright ©2005 by the National Academy of Sciences
Tettelin et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955
Fig. 3. GBS pan-genome
Copyright ©2005 by the National Academy of Sciences
Tettelin et al. (2005) Proc. Natl. Acad. Sci. USA 102, 13950-13955
Lefébure et al. 2010: WGS of 96 C. coli and C. jejuni strains
The two species have a similar pan-genome size; however, C. coli has acquired a larger core genome and each species has evolved a number of species-specific core genes, possibly reflecting different adaptive strategies, in spite of their occurrence in the same niche (the gastrointestinal tract of several hosts).
Recombination within the core genome is frequent within species, rare between sister species, and extremely rare with other species.
Both species’ pan-genomes show unique and cohesive features that define their genomic identity.
Difference in sequence?
• Relative occurrence of di-, tri-, tetra- (…) nucleotides: Karlin signatures
• Genes that are shared between organisms can differ considerably in sequence. The percentage sequence divergence in orthologous genes is described by the ANI parameter (ANI: average nucleotide identity)
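The Karlin signature mentioned above compares how often each dinucleotide occurs with how often it would be expected from the single-base frequencies. A minimal Python sketch follows. It assumes a simplified single-strand calculation (the published signature also pools the reverse-complement strand), and the function names are illustrative only.

```python
from collections import Counter
from itertools import product


def dinucleotide_signature(seq):
    """Return rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides.

    Assumption: only the given strand is counted; the published
    signature also includes the reverse complement.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    signature = {}
    for x, y in product("ACGT", repeat=2):
        f_x = mono[x] / n
        f_y = mono[y] / n
        f_xy = di[x + y] / (n - 1)
        # The ratio is undefined when a base is absent; report 0.0 then.
        signature[x + y] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return signature


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between the two 16-value signatures."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / 16
```

A small distance between two genomes suggests a similar compositional bias. Real analyses would use whole genomes or long windows, not short fragments.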
Variability in gene content
• Genomes seem to be composed of a core set of genes that is conserved among strains of the same species and accessory genes that are strain specific.
• Content and size of core vary with species
• Although it is clear that mechanisms exist for abundant and widespread genetic transfer between microbial lineages, the observation of phenotypic and genotypic clustering argues for genomic stability and cohesion. LGT (lateral gene transfer) and recombination in particular are now considered cohesive rather than disruptive forces in bacterial species.*
* Konstantinidis and Tiedje. 2005. Genomic insights that advance the species definition for prokaryotes. PNAS 102:2567-2572
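The core/accessory distinction can be illustrated with plain set operations. This is a minimal sketch, assuming each strain has already been reduced to a set of gene-family identifiers. The strain names and gene labels in the usage note are hypothetical.

```python
def core_and_pan_genome(genomes):
    """Split gene families into core, pan and accessory sets.

    genomes: dict mapping strain name -> set of gene-family identifiers.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        raise ValueError("at least one genome is required")
    core = set.intersection(*gene_sets)   # present in every strain
    pan = set.union(*gene_sets)           # present in at least one strain
    accessory = pan - core                # present in some, not all, strains
    return core, pan, accessory
```

For example, with three hypothetical strains `{"s1": {"a", "b", "c"}, "s2": {"a", "b", "d"}, "s3": {"a", "e"}}`, the core genome is `{"a"}` and the pan-genome holds five gene families.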
How is this information used to define bacterial species?
“...Taxonomy is written by taxonomists for taxonomists; in this form the subject is so dull that few, if any, non-taxonomists are tempted to read it, and presumably even fewer try their hand at it. It is the most subjective branch of any biological discipline, and in many ways is more of an art than a science...”
(S. T. Cowan, 1971)
The bacterial species concept, definition & taxonomy
• There is a practical need to define bacterial species, because a name carries information.
• The approaches used to define bacterial species, past and present, reflect the state of the art in science and technology.
• The observation of phenotypic and genotypic clustering argues for genomic stability and cohesion.
• Such clusters could be called species.
The bacterial species concept, definition & taxonomy
• Progress in the field of taxonomy has been dominated by technological progress. Initially (until the 1950s), ‘conventional’ bacterial taxonomy placed heavy emphasis on analyses of phenotypic properties of the organism.
• To define and identify an organism, one must assess several of its phenotypic properties, from general to specific.
Phenotypic characterisation
Numerical taxonomy
• In the 1950s – 1960s it became evident that the analysis of large numbers of characteristics provided a more stable classification and a superior means to classify and identify bacteria.
• First generations of computers were used to analyze large data sets of biochemical and phenotypic characteristics
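As an illustration of the numerical approach, the sketch below scores pairs of strains on a table of binary test results (1 = positive, 0 = negative). It assumes two commonly used coefficients, simple matching and Jaccard. The data layout and function names are illustrative only, not a reconstruction of any particular historical program.

```python
def simple_matching(a, b):
    """Fraction of tests on which two strains agree (both + or both -)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """Like simple matching, but shared negative results are ignored."""
    if len(a) != len(b):
        raise ValueError("profiles must be of equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def similarity_matrix(profiles, coefficient=simple_matching):
    """Pairwise similarities for a dict of strain -> list of 0/1 results."""
    names = sorted(profiles)
    return {
        (p, q): coefficient(profiles[p], profiles[q])
        for p in names
        for q in names
    }
```

The resulting matrix is the usual input for a clustering step, which is where groups of similar strains become visible.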
Discovery of the secret of life
• DNA was used to classify bacteria!
• Determining the guanine plus cytosine base ratio (GC ratio) of the DNA of the organism can be part of this process.
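When a sequence is available, the GC ratio can be computed directly by counting bases. The sketch below assumes a plain nucleotide string and ignores ambiguous symbols such as N. Historically the value was measured experimentally, not counted.

```python
def gc_content(seq):
    """Return the G+C content of a DNA sequence as mol% (0-100)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols (e.g. N) are left out
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total
```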
DNA-DNA hybridisation
• Single stranded whole genomic DNA of two strains is hybridised. The thermal stability of the obtained heterologous hybrid (expressed as a percentage value) is a measure of whole genome sequence similarity.
Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics (Wayne et al. 1987 – TC [08/09/2013]: 3,261)
• The complete genome should be the reference standard to determine phylogeny and taxonomy
• Pending routine access to whole genome sequences, measuring the thermal stability between two genomes through DNA-DNA hybridization represents the best indirect assessment of the level of whole genome sequence similarity
• The phylogenetic definition of coherent phenotypic clusters, called species, generally would include strains with at least 60 - 70% DNA-DNA hybridization
What about phylogeny?
• DNA-DNA hybridisations between organisms considered closely related very often yielded low DNA-DNA hybridisation values, just like DNA-DNA hybridisations between completely different bacteria.
• Perhaps, if evolution of the whole genome cannot be measured, similarities in more conserved parts of the genome might be more accessible?
• A gene encoding a highly conserved function (chronometer) might be a good target: rRNA genes???
• DNA-rRNA hybridisations provided a framework of five rRNA superfamilies which corresponded with the five subdivisions in the Proteobacteria.
Molecular clocks (chronometers)
• Technological progress allowed ‘isolation’ and sequence analysis of conserved genes.
• The most widely used molecular clocks (‘single locus approaches’) are small subunit ribosomal RNA (SSU rRNA) genes
– Found in all domains of life (not the case with other chronometers)
• 16S rRNA in prokaryotes and 18S rRNA in eukaryotes
– Functionally constant
– Sufficiently conserved (change slowly) with variable regions (V1-V9), but too conserved to discriminate between closely related species
– Sufficient length
– Without (?) lateral gene transfer or recombination: differences should be primarily caused by point mutation, such that the number of nucleotide differences correlates with the number of changes through evolution
Ribosomal Database Project
• The Ribosomal Database Project (RDP)
• A large collection of rRNA sequences
• Provides a variety of analytical programs
• RDP Release 10, Update 32: May 14, 2013: 2,765,278 16S rRNAs
• http://rdp.cme.msu.edu/
• Phylogenetic trees reflecting similarity in ribosomal RNA sequences, but assumed to reflect organismal phylogeny, have now been prepared for all the major prokaryotic and eukaryotic groups.
'The All-Species Living Tree' Project
• Public databases accumulated poor quality and erroneously annotated sequences.
• The need for curated databases!
• http://www.arb-silva.de/projects/living-tree/
16S rRNA sequence analysis: advantages
• There are several technological and scientific advantages to using 16S rRNA gene sequences for studying the phylogeny of bacteria. The main assets are:
• The availability of a near-universal database
• The availability of highly conserved 16S rRNA primers
Limits of 16S rRNA based phylogeny
16S rRNA sequence analysis: caveats
• Often insufficient diversity to distinguish closely related species (Fox et al., 1992. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity).
• Often too much diversity within species:
– 2.5-3% (Stackebrandt and Goebel. 1994. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology)
– 4-5% in 16S rRNA genes of epsilon proteobacteria
• Tentative representation of the phylogeny of closely related bacteria
Evolutionary relationships of prokaryotes
• Chronometers: (genes of)
– ribosomal proteins and RNAs
– Cytochrome
– Fe-S proteins (e.g. ferredoxins)
– ATPase (synthesis/hydrolysis of ATP)
– recA (recombination protein)
– gyrB, groEL, rpoB...
Analysis of other chronometers to study phylogeny of bacteria?
• Pro: the less conserved nature of these genes facilitates a higher taxonomic resolution between closely related bacteria
• Con:
– Not universally present
– No universal databases
– Development of universal primers proved impossible
– Interference of recombination and lateral gene transfer
Limits of recA based phylogeny
Polyphasic taxonomy
• There is no single molecule that represents all organismal relationships adequately.
• Different molecules carry different types of information.
• A wealth of other methods was developed which were, just like the original biochemical tests, used to classify and identify bacteria. All of these methods carried some information that could be used as an indirect measure of whole genome similarity between isolates.
Chemotaxonomy - Respiratory quinones
Chemotaxonomy - Phospholipid analysis
Chemotaxonomy - Polyamine analysis
Chemotaxonomy - Whole cell fatty acids
Whole-cell protein electrophoresis: Azospirillum
[Figure: SDS-PAGE and DNA-DNA hybridisation (Azospirillum). Whole-cell protein patterns alongside a % DNA-binding matrix for strains of A. halopraeferens, A. amazonense, A. brasilense and A. lipoferum]
Comparison of MALDI-TOF MS spectral patterns
Raman spectroscopy
Genotyping - Ribotyping
[Figure: ribotyping patterns of Lactobacillus sakei and Lactobacillus curvatus]
Genotyping - AFLP - Campylobacter
Polyphasic taxonomy
• Consensus approach to bacterial taxonomy which integrates several generally accepted ideas for the classification of bacteria
• Species delineation is based on DNA-DNA hybridisation experiments
• Bacterial phylogeny can be studied through comparative sequence analysis of conserved macromolecules such as 16S rRNA
• Polyphasic taxonomy determines and acknowledges the value of other methods for the delineation of bacteria at different hierarchical levels
• The aim is to collect as much information as possible in order to define a pragmatic consensus classification that facilitates identification
Polyphasic species definition
• The bacterial species appears to be an assemblage of isolates originating from a common ancestor population in which genetic drift resulted in clones with different degrees of recombination and characterized by:
– a certain degree of phenotypic consistency
– a significant degree of DNA-DNA hybridization
– over 97% of 16S rRNA gene sequence similarity
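The 16S rRNA criterion above can be sketched as a percent-identity calculation on two sequences that have already been aligned. The sketch assumes gaps are written as "-" and that gapped columns are skipped. Other gap conventions exist and give slightly different values, so the 97% cut-off should be read with the chosen convention in mind.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap are not scored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_16s_criterion(identity, threshold=97.0):
    """True when the similarity exceeds the threshold quoted above."""
    return identity > threshold
```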
Polyphasic Genomic taxonomy
[Figure: observation 1, observation 2, observation 3]
Now that we have access to whole-genome sequences: what do they tell us?
Gene content could be used to define species …
• Venn diagram showing core and accessory genes for Streptococcus species. The surfaces are approximately proportional to the number of genes (Lefébure and Stanhope 2007 Genome Biol. 8: R71)
… and higher taxonomic units
• Venn diagram showing core and accessory genes for Streptococcus species. The surfaces are approximately proportional to the number of genes (Lefébure and Stanhope 2007 Genome Biol. 8: R71)
Average Nucleotide Identity?
• Genomes seem to be composed of a core set of genes that is conserved among strains of the same species and accessory genes that are strain specific
• Phylogenetic signal present in core genes (ANI values): 95% ANI corresponds with 70% DNA-DNA hybridisation
• ANI does not necessarily correlate with gene content
– ANI values reflect phylogeny
– Gene content reflects ecology
• Bacteria can be classified in the same species in spite of considerable differences in gene content
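The point that ANI and gene content measure different things can be made concrete with a small sketch. It assumes the orthologous gene pairs and their percent identities have already been determined (for example by a sequence search) and takes ANI as their unweighted mean. Published ANI implementations differ in how they select and weight the aligned regions, and all names and numbers here are hypothetical.

```python
def average_nucleotide_identity(ortholog_identity):
    """Unweighted mean percent identity over shared orthologous genes.

    ortholog_identity: dict mapping gene name -> percent identity (0-100).
    """
    if not ortholog_identity:
        raise ValueError("no shared orthologous genes")
    return sum(ortholog_identity.values()) / len(ortholog_identity)


def shared_gene_fraction(genes_a, genes_b):
    """Fraction of the smaller gene set that is also found in the other."""
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        raise ValueError("both genomes need at least one gene")
    return len(genes_a & genes_b) / smaller


def same_species_by_ani(ani, threshold=95.0):
    """Apply the 95% ANI level that corresponds with 70% DNA-DNA hybridisation."""
    return ani >= threshold
```

With hypothetical genomes `{"g1", "g2", "g3", "g4"}` and `{"g1", "g2", "g5", "g6", "g7", "g8"}` whose shared genes are 97% and 95% identical, ANI is 96% (same species by this criterion) although only half of the smaller gene set is shared.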
ANI based phylogeny
Figure 3
Conclusions (1)
– Whole genome sequences can become part of polyphasic taxonomy and the standard description of bacterial species.
– Whole genome sequences provide parameters for a superior reconstruction of organismal phylogeny and for the delineation of species as defined by DNA-DNA hybridization.
– Why hold on to DNA-DNA hybridization level as a standard?
Conclusions (2)
– Currently, fewer than 10,000 bacterial species have been described, representing far less than 0.1% of the existing bacterial diversity.
– The present practice of polyphasic taxonomy as requested by the editorial boards of taxonomic journals is counterproductive in light of the vast microbial diversity that remains to be described.