Center for Biological Sequence Analysis, The Technical University of Denmark (DTU)
Comparative Genomics of Five Clostridium Species
Comparative Microbial Genomics Group
A DNA-Centric View
DIAGNOSIS, EPIDEMIOLOGY AND ANTIBIOTIC RESISTANCE OF THE GENUS CLOSTRIDIUM
17-18 October, 2003 - Parma, Italy
Adv. Bioinformatics Lecture, 21 November, 2003
OVERVIEW
1. Comparison of five Clostridium genomes
2. Local View: DNA Structures and Clostridium Promoters
3. Global View: Oligomer Bias in Clostridium Genomes
Parma, 2003
Cp 100 genomes
How do you compare more than 100 bacterial genomes?
Organism                             Length (bp)  Percent AT
Clostridium acetobutylicum ATCC 824  3,940,880    69.1
Clostridium botulinum ATCC 3502      3,886,916    71.8
Clostridium difficile 630X           4,290,252    70.9
Clostridium perfringens 13           3,031,430    71.4
Clostridium tetani E88               2,799,251    71.3
Clostridium genomes, average         3,589,746    70.9
Average of 150 bacterial genomes     3,323,702    52.7
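The AT column is a plain base-composition ratio. A minimal Python sketch, assuming the percentage is taken over unambiguous A/C/G/T bases only (how the original pipeline treated N and other ambiguity codes is not stated), computes it for a sequence and checks the "Clostridium genomes, average" row against the five genomes. The names `at_percent`, `table_averages` and `CLOSTRIDIUM` are hypothetical.

```python
def at_percent(seq: str) -> float:
    """Percent A+T among the unambiguous bases (A, C, G, T) of seq."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * at / (at + gc)


# Length (bp) and percent AT, copied from the table above.
CLOSTRIDIUM = {
    "C. acetobutylicum ATCC 824": (3_940_880, 69.1),
    "C. botulinum ATCC 3502": (3_886_916, 71.8),
    "C. difficile 630X": (4_290_252, 70.9),
    "C. perfringens 13": (3_031_430, 71.4),
    "C. tetani E88": (2_799_251, 71.3),
}


def table_averages(rows: dict[str, tuple[int, float]]) -> tuple[float, float]:
    """Unweighted mean length and mean percent AT over the table rows."""
    lengths = [length for length, _ in rows.values()]
    at_values = [at for _, at in rows.values()]
    return sum(lengths) / len(lengths), sum(at_values) / len(at_values)


if __name__ == "__main__":
    print(at_percent("ATGCATTTAA"))  # 8 of 10 bases are A or T -> 80.0
    mean_len, mean_at = table_averages(CLOSTRIDIUM)
    print(round(mean_len), round(mean_at, 1))  # prints: 3589746 70.9
```

The unweighted means reproduce the "Clostridium genomes, average" row (3,589,746 bp, 70.9% AT).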
AT content
AT content in 150 Sequenced Bacterial Chromosomes
[Figure: histogram; x-axis: Percent AT (20-80); y-axis: Number of chromosomes; marked: 50% AT, C. acetobutylicum, C. botulinum, C. difficile, C. perfringens, C. tetani]
AT content
Global Direct Repeats in 150 Sequenced Bacterial Chromosomes
[Figure: histogram; x-axis: Percent of chromosome with repeats > 80% (0-18); y-axis: Number of chromosomes; marked: C. perfringens, C. tetani, C. botulinum, C. acetobutylicum, C. difficile]
Global Repeats
Global repeats (percent of chromosome with repeats > 80%)
Organism                             Global Direct (%)   Global Inverted (%)
Clostridium acetobutylicum ATCC 824  4.2                 2.3
Clostridium botulinum ATCC 3502      2.8                 1.3
Clostridium difficile 630X           6.4                 5.0
Clostridium perfringens 13           3.5                 2.8
Clostridium tetani E88               3.8                 3.1
Clostridium genomes, average         4.14                2.9
Average of 150 bacterial genomes     4.13                2.9
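The repeat columns report how much of each chromosome is covered by repeats. The slides do not spell out the alignment method or the exact meaning of the "> 80%" threshold, so this Python sketch swaps in a much simpler technique, exact k-mer matching: a position counts as a global direct repeat if it lies in a k-mer occurring at least twice anywhere in the sequence, and as a global inverted repeat if that k-mer's reverse complement starts at a different position. The function name and the default `k = 20` are hypothetical, and the results are not expected to match the table.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return s.translate(COMPLEMENT)[::-1]


def global_repeat_coverage(seq: str, k: int = 20) -> tuple[float, float]:
    """Percent of positions covered by exact direct / inverted k-mer repeats.

    Simplification (assumption): exact matches only, anywhere on the sequence.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    starts_of = defaultdict(list)  # k-mer -> list of start positions
    for i in range(n - k + 1):
        starts_of[seq[i:i + k]].append(i)

    direct = bytearray(n)    # 1 where a direct repeat covers the position
    inverted = bytearray(n)  # 1 where an inverted repeat covers the position
    block = b"\x01" * k
    for kmer, starts in starts_of.items():
        rc_starts = starts_of.get(revcomp(kmer), [])
        for i in starts:
            if len(starts) > 1:  # the same k-mer occurs elsewhere
                direct[i:i + k] = block
            if any(j != i for j in rc_starts):  # reverse complement elsewhere
                inverted[i:i + k] = block
    return 100.0 * sum(direct) / n, 100.0 * sum(inverted) / n
```

For example, `global_repeat_coverage("GAAATTTC", k=4)` returns `(0.0, 100.0)`: no 4-mer repeats directly, but every position lies in a 4-mer whose reverse complement occurs elsewhere.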
Global Inverted Repeats in 150 Sequenced Bacterial Chromosomes
[Figure: histogram; x-axis: Percent of chromosome with repeats > 80% (0-13); y-axis: Number of chromosomes; marked: C. botulinum, C. acetobutylicum, C. perfringens, C. difficile, C. tetani]
Local Inverted Repeats in 150 Sequenced Bacterial Chromosomes
[Figure: histogram; x-axis: Percent of chromosome with repeats > 80% (0-30); y-axis: Number of chromosomes; marked: C. difficile, C. tetani, C. botulinum, C. perfringens, C. acetobutylicum]
Local Repeats
Local repeats (percent of chromosome with repeats > 80%)
Organism                             Local Direct (%)    Local Inverted (%)
Clostridium acetobutylicum ATCC 824  11.4                8.0
Clostridium botulinum ATCC 3502      17.9                12.3
Clostridium difficile 630X           15.9                10.1
Clostridium perfringens 13           16.7                12.1
Clostridium tetani E88               16.1                10.1
Clostridium genomes, average         15.6                10.52
Average of 150 bacterial genomes     8.3                 6.14
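Local repeats differ from global ones in that the second copy must lie nearby. Keeping the same exact-k-mer simplification (an assumption, not the method behind the table), this Python sketch counts a position as a local direct or inverted repeat when the matching k-mer, or its reverse complement, starts within `window` bp. The function names and the defaults `k = 12` and `window = 1000` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return s.translate(COMPLEMENT)[::-1]


def _partner_nearby(starts: list[int], i: int, window: int) -> bool:
    """True if a start other than i lies in [i - window, i + window]."""
    lo = bisect_left(starts, i - window)
    hi = bisect_right(starts, i + window)
    return any(j != i for j in starts[lo:hi])


def local_repeat_coverage(seq: str, k: int = 12,
                          window: int = 1000) -> tuple[float, float]:
    """Percent of positions covered by exact direct / inverted k-mer repeats
    whose partner copy starts within `window` bp (simplification)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    starts_of = defaultdict(list)  # start lists are built in ascending order
    for i in range(n - k + 1):
        starts_of[seq[i:i + k]].append(i)

    direct = bytearray(n)
    inverted = bytearray(n)
    block = b"\x01" * k
    for i in range(n - k + 1):
        kmer = seq[i:i + k]
        if _partner_nearby(starts_of[kmer], i, window):
            direct[i:i + k] = block
        if _partner_nearby(starts_of.get(revcomp(kmer), []), i, window):
            inverted[i:i + k] = block
    return 100.0 * sum(direct) / n, 100.0 * sum(inverted) / n
```

Shrinking `window` turns repeats off: in `"AAAACCCCAAAA"` the two `AAAA` copies are 8 bp apart, so with `k=4` they count at `window=8` but not at `window=7`.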
Local Direct Repeats in 150 Sequenced Bacterial Chromosomes
[Figure: histogram; x-axis: Percent of chromosome with repeats > 80% (0-33); y-axis: Number of chromosomes; marked: C. perfringens, C. tetani, C. botulinum, C. acetobutylicum, C. difficile]
Purine stretches
Purine and YR stretches (expected value = 1.0)
Organism                             Purine stretches    YR stretches
Clostridium acetobutylicum ATCC 824  3.36                0.49
Clostridium botulinum ATCC 3502      3.84                0.48
Clostridium difficile 630X           3.12                0.58
Clostridium perfringens 13           4.23                0.39
Clostridium tetani E88               4.21                0.46
Clostridium genomes, average         3.752               0.48
Average of 150 bacterial genomes     1.86                1.18
Expected                             1.0                 1.0
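The purine-stretch column compares long purine runs with an expectation, so that 1.0 means no bias. The slides do not give the exact definition, so this Python sketch assumes one plausible reading: the number of maximal purine (A/G) runs of at least `min_len` bp, divided by the number expected if bases were independent with the sequence's own purine frequency p. Under that model a maximal run of length at least L starts at the first position with probability p^L, and at any later position i <= n - L with probability (1 - p) p^L (a pyrimidine followed by L purines), which gives E = p^L (1 + (n - L)(1 - p)). The function name is hypothetical, and the sketch assumes a sequence containing only A, C, G and T.

```python
import re

PURINE_RUN = re.compile(r"[AG]+")  # maximal runs of A/G


def purine_stretch_ratio(seq: str, min_len: int = 10) -> float:
    """Observed / expected count of maximal purine runs of length >= min_len.

    Assumes an A/C/G/T-only sequence and an i.i.d. base model for the expectation.
    """
    seq = seq.upper()
    n = len(seq)
    if n < min_len:
        raise ValueError("sequence shorter than min_len")
    observed = sum(1 for m in PURINE_RUN.finditer(seq) if len(m.group()) >= min_len)
    p = (seq.count("A") + seq.count("G")) / n
    # First position: min_len purines; later starts: one pyrimidine, then min_len purines.
    expected = p ** min_len * (1 + (n - min_len) * (1 - p))
    if expected == 0:
        raise ValueError("no purines, so the expected count is zero")
    return observed / expected
```

A sequence made only of purines, such as `"AG" * 50`, has exactly one maximal run and an expectation of one, so the ratio is 1.0; values well above 1.0, like those in the Clostridium rows, would indicate long purine runs occurring more often than composition alone predicts.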
Purine Stretches in 150 Sequenced Bacterial Chromosomes
[Figure: histogram; x-axis: Fraction of purine stretches >= 10 bp (0.5-8); y-axis: Number of chromosomes; marked: EXPECTED, C. acetobutylicum, C. difficile, C. perfringens, C. tetani, C. botulinum]
Cp. rRNAs
Organism                             # Genes   # rRNAs
Clostridium acetobutylicum ATCC 824  3652      11
Clostridium botulinum ATCC 3502      3524      10
Clostridium difficile 630X           3810      11
Clostridium perfringens 13           2693      10
Clostridium tetani E88               2688      6
Cp. protein
All paralogs
firmicutes
C.tetani plasmid
C.tet zoom
Promoter structures
Promoter Structural profile
[Figure: labels -35 and -10 (“TATA box”)]
Campy pVir zoom
Promoter structure
[Figure: promoter structure; labels: -35, -10 (TATA)]
J. Mol. Biol., 326:1361-1372 (2003).
Translation start aligned
J. Mol. Biol., 299:907-930 (2000).
Promoter structures
Promoter Structural profile
[Figure: promoter structural profile; labels: -35, -10, +1, mRNA, CDS; annotations: “DNA curvature, flexibility important here”, “melts”, “rigid”, “cruciform”]
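The profile points at the region around -35 and -10, where curvature, flexibility and melting matter. The structural scales used on the slides are not reproduced here. As a stand-in, this Python sketch uses a sliding-window AT fraction, since A·T pairs have two hydrogen bonds versus three for G·C and AT-rich DNA generally melts more easily, and averages the profile over promoter sequences of equal length aligned at a common anchor such as +1. This is a proxy, not the curvature or flexibility measure behind the figure; the function names and the 21 bp window are hypothetical.

```python
from typing import Optional


def at_window_profile(seq: str, window: int = 21) -> list[Optional[float]]:
    """AT fraction in a window centred on each position; None near the ends."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred")
    seq = seq.upper()
    half = window // 2
    profile: list[Optional[float]] = []
    for i in range(len(seq)):
        lo, hi = i - half, i + half + 1
        if lo < 0 or hi > len(seq):
            profile.append(None)  # window would run off the sequence
        else:
            profile.append(sum(b in "AT" for b in seq[lo:hi]) / window)
    return profile


def average_profile(aligned: list[str], window: int = 21) -> list[Optional[float]]:
    """Position-wise mean profile over equal-length, anchor-aligned sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal length")
    profiles = [at_window_profile(s, window) for s in aligned]
    mean: list[Optional[float]] = []
    for column in zip(*profiles):
        values = [v for v in column if v is not None]
        mean.append(sum(values) / len(values) if values else None)
    return mean
```

Aligning every input at the same anchor is what makes the position-wise average meaningful; promoters aligned at the translation start instead of +1 would give a differently shaped mean profile.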
context
Information Depends on Context…
sigmas
Organism                             # sigma70s
Clostridium acetobutylicum ATCC 824  11
Clostridium botulinum ATCC 3502      11
Clostridium difficile 630X           11
Clostridium perfringens 13           9
Clostridium tetani E88               17
Sigma70s with colour
[Figure: Clostridium’s sigma70-family proteins; labels: RpoH, RpoD, RpoS, FliA, SpoII, spoI; species: C. perfringens, C. acetobutylicum, C. tetani]
Sig70 tree
[Figure: sigma70 tree; groups: Proteobacteria, Firmicutes, Actinobacteria, Chlamydia, Cyanobacteria, “other phyla”; branches: RpoD, RpoS, “RpoD like”, sigE (sporulation initiation), RpoH, sigF (sporulation), sigG (sporulation), sigH, RpoF (flagella)]
sigmas
Environmental influence
[Figure: promoter structural profile under two conditions, 50 mM salt at 20 °C and 300 mM salt at 37 °C; labels: -35, -10, +1; annotation: “DNA curvature, flexibility important here”]
Context 2
Information Depends on Context…
C. acetobutylicum
C. botulinum
C.difficile Atlas
C.tetani genome
Toxin tetanospasmin
• Anaerobic conditions allow germination of spores and production of toxins.
• The tetanus toxin is a 150 kDa protein that is cleaved extracellularly into a 100 kDa and a 50 kDa chain, which remain linked by a disulfide bridge.
• The toxin binds in the central nervous system.
• It interferes with neurotransmitter release, blocking inhibitory impulses.
• This leads to unopposed muscle contraction and spasm.
C.perfringens
C.perf. skew
Oligomer skew in C. perfringens
[Figure: x-axis: chromosomal position (Mbp, 0-3.5); y-axis: information content (millions of bits, 0-4.5)]
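The curve appears to be a cumulative strand-skew plot. How the slide turns oligomer counts into bits is not stated, so this Python sketch swaps in the familiar count-based version (the same idea as a cumulative GC skew): walk along the sequence, add +1 for each occurrence of an oligomer and -1 for each occurrence of its reverse complement, and record the running total. On many bacterial chromosomes a cumulative G-minus-C skew reaches its minimum near the replication origin and its maximum near the terminus, but the sign flips with the choice of oligomer, so extremes should be read only as candidate positions. The function name and the defaults `oligo="G"` and `step=1000` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return s.translate(COMPLEMENT)[::-1]


def cumulative_skew(seq: str, oligo: str = "G",
                    step: int = 1000) -> list[tuple[int, int]]:
    """Running (#oligo - #revcomp(oligo)) along seq, sampled every `step` positions.

    For a palindromic oligo (its own reverse complement) the skew stays 0.
    """
    seq, oligo = seq.upper(), oligo.upper()
    rc = revcomp(oligo)
    k = len(oligo)
    n_positions = len(seq) - k + 1
    total = 0
    points: list[tuple[int, int]] = []
    for i in range(n_positions):
        word = seq[i:i + k]
        if word == oligo:
            total += 1
        if word == rc:
            total -= 1
        # sample every `step` positions, and always keep the final value
        if (i + 1) % step == 0 or i + 1 == n_positions:
            points.append((i + 1, total))
    return points
```

Given a chromosome string, `max(cumulative_skew(chrom, "G"), key=lambda p: p[1])[0]` returns the position of the highest point, a candidate terminus under the count-based assumption above.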
oligomers
LEADING STRAND, oligomer bias (up to 8-mers)
Origin position: 0
Strand difference (bits): 4105557
Oligo   Bits
G       92292
GA      81583
AG      64095
AGA     60700
GG      46169
GAG     34200
GAA     33918
A       31417
GGA     31259
AA      25862
AGG     25340
AAG     24943
AGAG    24464
AGAA    23535
AAGA    23005
GAAA    18825
GAGA    18448
ATG     18129
GAT     17623
GGG     17200

LAGGING STRAND, oligomer bias
Terminus position: 1361000
Strand difference (bits): 3977053
Oligo   Bits
C       93908
TC      82712
CT      62349
TCT     57477
CC      47113
TTC     35786
CTC     32995
TCC     32250
T       29779
CCT     24636
TT      23842
TTCT    23751
CTCT    22962
CTT     22661
TCTT    21517
TTTC    19334
CCC     18554
CAT     18537
TCTC    17833
ATC     17657
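The two lists rank oligomers by how strongly they separate the leading strand from the lagging strand, and the leading-strand favourites (G, GA, AG, ...) reappear as their reverse complements on the lagging strand (C, TC, CT, ...). The exact scoring is not given on the slide. As an assumption, this Python sketch scores each oligomer w by its contribution to a relative-entropy (Kullback-Leibler) comparison, n_lead(w) · log2(f_lead(w) / f_lag(w)), takes the lagging strand as the reverse complement of the leading-strand sequence, and adds a pseudocount to avoid division by zero. The function names and the pseudocount of 1 are hypothetical, and the sketch is not claimed to reproduce the numbers above.

```python
from collections import Counter
from math import log2

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, max_k: int) -> Counter:
    """Counts of all overlapping k-mers for k = 1..max_k."""
    counts: Counter = Counter()
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def strand_bias_bits(leading: str, max_k: int = 8,
                     pseudo: float = 1.0) -> list[tuple[str, float]]:
    """Rank oligomers by n_lead(w) * log2(f_lead(w) / f_lag(w)), highest first."""
    leading = leading.upper()
    lead = kmer_counts(leading, max_k)
    lag = kmer_counts(revcomp(leading), max_k)  # lagging strand, read 5'->3'
    scores: dict[str, float] = {}
    for k in range(1, max_k + 1):
        # frequencies are normalised separately for each oligomer length k
        words = {w for w in lead if len(w) == k} | {w for w in lag if len(w) == k}
        denom_lead = sum(lead[w] for w in words) + pseudo * len(words)
        denom_lag = sum(lag[w] for w in words) + pseudo * len(words)
        for w in words:
            f_lead = (lead[w] + pseudo) / denom_lead
            f_lag = (lag[w] + pseudo) / denom_lag
            scores[w] = lead[w] * log2(f_lead / f_lag)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Counting every k-mer up to 8 on a full chromosome is slow in pure Python; the sketch favours readability over speed.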
C. Perf skew2
Oligomer skew in C. perfringens
[Figure: x-axis: chromosome position (millions of bp, 0-3.5); y-axis: information content (millions of bits, -4 to 5)]
Organisms S/N
Organism                          Signal/Noise
Clostridium perfringens           43.792.284
Clostridium difficile             43.053.789
Clostridium perfringens           42.403.666
Clostridium botulinum             42.239.771
Clostridium acetobutylicum        39.364.732
Bacillus cereus                   31.536.141
Bacillus anthracis                30.115.576
Thermoanaerobacter tengcongensis  24.539.893
Xylella fastidiosa                21.905.088
Enterococcus faecalis             17.899.285
Staphylococcus aureus             16.446.595
Staphylococcus aureus             16.043.397
Oceanobacillus iheyensis          15.185.416
Fusobacterium nucleatum           15.164.722
Blochmannia floridanus            14.386.685
Listeria innocua                  13.880.720
Borrelia burgdorferi              13.805.045
Xylella fastidiosa                13.778.823
Bacteroides thetaiotaomicron      13.755.678
Bacillus halodurans               13.098.141
Bacteroides fragilis              12.662.606
Lactobacillus plantarum           12.521.367
Listeria monocytogenes            12.283.482
Lactococcus lactis                11.556.365
Staphylococcus epidermidis        11.211.357
Bacillus subtilis                 11.103.547
Streptococcus mutans              9.348.241
Streptococcus agalactiae          9.181.368
Photorhabdus luminescens          9.075.427
Streptococcus pneumoniae TIGR4    9.073.648
Streptococcus pneumoniae R6       8.659.665
Streptococcus agalactiae          8.652.586
Mycoplasma penetrans              8.483.056
Mycobacterium leprae              8.273.044
Campylobacter jejuni              7.619.615
Streptococcus pyogenes            7.611.987
Treponema pallidum                7.599.792
Erwinia carotovora                7.545.523
Streptococcus pyogenes            7.089.889
Mycoplasma gallisepticum          6.752.566
Shewanella oneidensis             6.710.673
Pseudomonas aeruginosa            6.699.858
Rhodopirellula baltica            6.652.206
Prochlorococcus marinus           6.465.620
Wolinella succinogenes            5.294.449
Signal/Noise plot
S/N values for 612 chromosomes/plasmids
[Figure: histogram; x-axis: Bin (S/N, 5-45); y-axis: Frequency; marked: Clostridium acetobutylicum, Clostridium x4, B. cereus, B. anthracis]
Bin      Frequency
5.00     560
10.00    24
15.00    12
20.00    5
25.00    3
30.00    0
35.00    2
40.00    1
45.00    4
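The Bin/Frequency listing looks like spreadsheet histogram output, where each bin value is an inclusive upper bound (a value is counted in the first bin it does not exceed) and values above the last bin go to an extra "More" bin. Assuming that convention, which the slide does not state, the binning can be sketched in Python as follows; the function name is hypothetical.

```python
from bisect import bisect_left


def histogram_upper_inclusive(values: list[float], edges: list[float]) -> list[int]:
    """Counts per bin: bin 0 holds v <= edges[0], bin j holds edges[j-1] < v <= edges[j].

    The final extra entry counts values greater than the last edge ("More").
    """
    edges = sorted(edges)
    counts = [0] * (len(edges) + 1)
    for v in values:
        # index of the first edge >= v, or len(edges) if v exceeds every edge
        counts[bisect_left(edges, v)] += 1
    return counts
```

With edges 5.00, 10.00, ..., 45.00, a value equal to an edge (for example exactly 40.00) lands in that edge's bin, not the next one; a half-open convention would shift such boundary values.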
Summary
1. DNA sequence → DNA structure → Function
2. Clostridium difficile is more likely to be variable.
3. Clostridium DNA is different!
Acknowledgements
• Jakob Bondo
• Jimmy Hoffmann Hansen
• Paiman Khorsand-Jamal
• Anne Egholm Pedersen
• Maria Seier Petersen
• Peter Hallin
• Hans-Henrik Stærfeldt
• Lars Juhl Jensen
• Carsten Friis
• Marie Skovgaard
• Peder Worning
• Thomas Scheritz Pontén
• Søren Brunak
The Danish Research Foundation
www.cbs.dtu.dk