Jonathan Eisen talk on the $1 genome
DESCRIPTION
Talk given by Jonathan Eisen at the ASM General Meeting 2009, in the session on "The $1 Bacterial Genome".
TRANSCRIPT
The $1 Bacterial Genome: Advances in Bioinformatics
Jonathan A. Eisen, U.C. Davis Genome Center
The $1 Bacterial Genome: Oh $^#^ - We’re $&#$
The $1 Bacterial Genome: Informatics, GEBA and Me
Outline
• GEBA - The JGI Genomic Encyclopedia of Bacteria and Archaea
• Insights into the $1 genome from the GEBA project
• Additional insights into the $1 genome
GEBA: The Genomic Encyclopedia of Bacteria and Archaea
Run by JGI, $$ from DOE, work by many
[Figure: rRNA-based tree of bacterial phyla. Labeled lineages: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
• At least 40 phyla of bacteria
As of 2002
Based on Hugenholtz, 2002
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
– NHGRI animal projects
– FGI at Whitehead
– Plant LSP
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature, conversations, etc.
• Many small projects funded to fill in some gaps
– DOE/TIGR sequencing
– Multiple CSP projects
– Multiple NSF/USDA projects
– Private projects (e.g., Integrated Genomics, Diversa)
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Badger, Wu, Wu, et al.
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa are even more poorly sampled
• Solution: use the tree to really fill the gaps
[Figure label: Well-sampled phyla]
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project Overview
• Select 200 organisms using the rRNA tree as a guide
• Develop a high-throughput pipeline for strain growth and DNA preparation
• Sequence and finish 100 genomes
• Annotate, analyze, and release data
• Assess the benefits of tree-guided sequencing
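The first step, choosing organisms with the rRNA tree as a guide, can be illustrated with a greedy phylogenetic-diversity heuristic. This is a minimal sketch, not the GEBA selection procedure: it assumes a rooted tree stored as a child-to-parent map with branch lengths, and at each step picks the candidate whose addition covers the most branch length not already covered by sequenced genomes. The tree, the taxon names (`t1` to `t4`) and all lengths are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rooted tree: child node -> (parent node, branch length).
# The root ("root") has no entry of its own.
Tree = Dict[str, Tuple[str, float]]


def path_to_root(tree: Tree, node: str) -> List[str]:
    """Nodes from `node` up to, but not including, the root.

    Each node names the branch that connects it to its parent.
    """
    path: List[str] = []
    while node in tree:
        path.append(node)
        node = tree[node][0]
    return path


def new_length(tree: Tree, taxon: str, covered: Set[str]) -> float:
    """Branch length that adding `taxon` would contribute beyond `covered`."""
    return sum(tree[b][1] for b in path_to_root(tree, taxon) if b not in covered)


def greedy_select(tree: Tree, sequenced: Set[str], candidates: Set[str], k: int) -> List[str]:
    """Greedily pick up to k candidates, each maximizing newly covered branch length."""
    covered: Set[str] = set()
    for taxon in sequenced:
        covered.update(path_to_root(tree, taxon))
    remaining = set(candidates)
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        # Sorting first makes ties resolve alphabetically, so results are reproducible.
        best = max(sorted(remaining), key=lambda t: new_length(tree, t, covered))
        chosen.append(best)
        covered.update(path_to_root(tree, best))
        remaining.remove(best)
    return chosen


# Toy tree: t1 is already sequenced; t2, t3 and t4 are candidates.
tree: Tree = {
    "A": ("root", 1.0), "t1": ("A", 0.5), "t2": ("A", 0.5),
    "B": ("root", 2.0), "t3": ("B", 0.2), "t4": ("B", 3.0),
}
print(greedy_select(tree, {"t1"}, {"t2", "t3", "t4"}, k=2))  # ['t4', 't2']
```

Greedy selection is a simple, common approximation for this kind of problem; it favors long, unsampled branches first, which is the intuition behind "use the tree to fill gaps".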
GEBA Pilot Target List
[Figure: chart of the number of genomes (y-axis, "# of Genomes", 0 to 35) per phylum (x-axis, "Phyla") on the GEBA pilot target list. Bacterial (B) groups: Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres, Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaeal (A) groups: Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei]
IMG/GEBA
http://img.jgi.doe.gov/cgi-bin/geba/main.cgi
Why Increase Taxonomic Coverage?
• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Mechanisms of diversification
• Species phylogeny and classification
Phylogenetic Metagenomics
Non-Homology Predictions: Phylogenetic Profiling
• Step 1: Search all genes in organisms of interest against all other genomes
• Ask: yes or no, is each gene found in each other species?
• Cluster genes by distribution patterns (profiles)
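The three profiling steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis shown in the talk: step 1 (the all-against-all similarity search) is assumed to have already produced, for each gene, the set of genomes in which it has a significant hit; the gene and genome names are hypothetical. Step 3 here groups genes with identical profiles, the simplest form of profile clustering; a Hamming distance is included for looser, threshold-based grouping.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def profile(gene: str, hits: Dict[str, Set[str]], genomes: List[str]) -> Profile:
    """Step 2: a yes/no (1/0) vector recording whether the gene was found in each genome."""
    return tuple(1 if g in hits[gene] else 0 for g in genomes)


def cluster_by_profile(hits: Dict[str, Set[str]], genomes: List[str]) -> List[List[str]]:
    """Step 3: group genes that share exactly the same presence/absence profile."""
    groups: Dict[Profile, List[str]] = defaultdict(list)
    for gene in sorted(hits):
        groups[profile(gene, hits, genomes)].append(gene)
    return list(groups.values())


def hamming(p: Profile, q: Profile) -> int:
    """Number of genomes in which two profiles disagree (for looser clustering)."""
    return sum(a != b for a, b in zip(p, q))


# Hypothetical outcome of step 1: genomes in which each gene had a significant hit.
genomes = ["g1", "g2", "g3", "g4"]
hits = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g3"},
    "geneD": {"g1", "g3", "g4"},
}
print(cluster_by_profile(hits, genomes))  # [['geneA', 'geneB'], ['geneC'], ['geneD']]
print(hamming(profile("geneA", hits, genomes), profile("geneD", hits, genomes)))  # 2
```

Genes with matching profiles (here `geneA` and `geneB`) become candidates for functional linkage even without sequence similarity to each other, which is what makes this a non-homology prediction.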
GEBA Lesson 1
Tree of Life is a Useful Guide
rRNA Tree of Life
GEBA Lesson 2
We have still only scratched the surface of microbial diversity
Phylogenetic Diversity: Sequenced Bacteria & Archaea
Phylogenetic Diversity with GEBA
Phylogenetic Diversity: GreenGenes
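These slides compare how much of the tree is covered by different sets of sequences. One standard way to quantify that is Faith's phylogenetic diversity (PD): the total branch length of the part of the tree connecting a set of taxa (here the rooted variant, which includes the path to the root). The sketch assumes that measure, which may differ from how the plots in the talk were computed, and uses a hypothetical tree and hypothetical taxon sets to show the gain from adding new genomes.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical rooted tree: child -> (parent, branch length). "root" has no entry.
TREE: Dict[str, Tuple[str, float]] = {
    "P1": ("root", 2.0), "P2": ("root", 4.0),
    "a": ("P1", 1.0), "b": ("P1", 1.0),
    "c": ("P2", 3.0), "d": ("P2", 1.0),
}


def faith_pd(tree: Dict[str, Tuple[str, float]], taxa: Iterable[str]) -> float:
    """Rooted Faith's PD: summed length of every branch ancestral to any taxon."""
    branches = set()
    for node in taxa:
        while node in tree:  # climb toward the root, recording each branch once
            branches.add(node)
            node = tree[node][0]
    return sum(tree[b][1] for b in branches)


before = faith_pd(TREE, ["a", "b"])           # P1 + a + b = 4.0
after = faith_pd(TREE, ["a", "b", "c", "d"])  # adds P2 + c + d -> 12.0
print(f"PD before: {before}, after: {after}, gain: {after / before - 1:.0%}")
```

In this toy case, adding two taxa from an unsampled clade triples PD, while adding more taxa next to `a` and `b` would add little.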
Viruses Too
First Bacterial Actin-Related Protein: Haliangium ochraceum DSM 14365
First found by V. Kunin; structure analysis by Patrik D. et al.
GEBA Lesson 3
Need Experiments from Across the Tree of Life Too
• At least 40 phyla of bacteria
As of 2002
Based on Hugenholtz, 2002
• Experimental studies are mostly from three phyla
• Some studies in other phyla
Need experimental studies from across the tree too
GEBA Lesson 4
The Importance of Project Management
GEBA Project Flowchart
• Project initiation: GEBA proposal; scientific and technical review (1); negotiate scope of work; receive starting material (1); OK?
• Sequencing: draft sequencing and assembly (1); finish sequencing and assembly (2)
• Annotation: draft annotation (3); shotgun genome GenBank submission (1); IMG-ER (1); finish annotation (3); complete genome GenBank submission (1); IMG (1); IMG-ER (1); Gene-QA (1); OK? checkpoints
(1) PGF, (2) LANL, (3) ORNL
David Bruce, Lynne Goodwin et al
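The flowchart's mix of steps, responsible centers, and "OK?" checkpoints can be modeled as a staged pipeline that halts at the first failed check. This is a hypothetical sketch: the stage names and center assignments come from the flowchart and its footnotes, but the strictly linear order and the pass/fail check after every stage are simplifying assumptions, and `run_project` is an illustrative name, not a JGI tool.

```python
from typing import Callable, List, Optional, Tuple

# Stage names and responsible centers follow the flowchart footnotes
# (1 = PGF, 2 = LANL, 3 = ORNL). The linear order and a check after every
# stage are simplifying assumptions for illustration.
STAGES: List[Tuple[str, str]] = [
    ("Scientific and technical review", "PGF"),
    ("Receive starting material", "PGF"),
    ("Draft sequencing and assembly", "PGF"),
    ("Finish sequencing and assembly", "LANL"),
    ("Finish annotation", "ORNL"),
    ("Complete genome GenBank submission", "PGF"),
]


def run_project(check: Callable[[str], bool]) -> Tuple[List[str], Optional[str]]:
    """Walk the stages in order, stopping at the first stage whose check fails.

    Returns (completed stage names, "stage (center)" that failed, or None).
    """
    done: List[str] = []
    for name, center in STAGES:
        if not check(name):
            return done, f"{name} ({center})"
        done.append(name)
    return done, None


# Example: the starting material fails its check (e.g. DNA of insufficient quality).
completed, failed = run_project(lambda stage: stage != "Receive starting material")
print(completed, failed)
```

Recording which center owns the failing stage is the point of the exercise: with work split across three sites, knowing where a project is stuck is much of what project management means here.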
GEBA Lesson 5
The Importance of Culture (Collections, That Is)
GEBA's Biggest Challenge: Getting DNA
• Getting quality DNA is the biggest bottleneck
• Solution: beg, borrow, and steal
• DSMZ offered to do it for free
• ATCC is doing a small number for a fee
• In discussions with PCC and other collections
Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)
Conexibacter woesei (DSM 14684T) was obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by pulsed-field gel electrophoresis (PFGE), with the bulk of the DNA at 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
Lane 1: λ-Marker, 15 ng
Lane 2: λ-Marker, 30 ng
Lane 3: λ-Marker, 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: λ-Marker, 125 ng
Lane 15: λ-Marker, 250 ng
Lane 16: λ-Marker, 500 ng
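The shipped amount stated in the DSMZ note follows from the gel-based concentration. Converting units (1 µg/ml = 1 ng/µl) puts the two estimates side by side:

```latex
\begin{aligned}
450\ \mu\text{g/ml} &= 450\ \text{ng}/\mu\text{l} \\
300\ \mu\text{l} \times 500\ \text{ng}/\mu\text{l} &= 150{,}000\ \text{ng} = 150\ \mu\text{g} \quad \text{(gel estimate)} \\
300\ \mu\text{l} \times 450\ \text{ng}/\mu\text{l} &= 135{,}000\ \text{ng} = 135\ \mu\text{g} \quad \text{(spectrophotometric estimate)}
\end{aligned}
```

So the quoted 150 µg corresponds to the gel estimate; the spectrophotometric reading implies about 10% less DNA.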
Related Lesson 1
METADATA ROCKS
SIGS
• The Genomic Standards Consortium
• The GSC is an open-membership working body which formed in September 2005.
• The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data.
• See http://gensc.org/gc_wiki/index.php/Main_Page
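A small illustration of why standardized metadata helps: once a community agrees on a checklist, completeness of a genome record can be checked mechanically. The field names below are modeled loosely on GSC-style (MIGS/MIxS) terms, but this short required list is a hypothetical subset chosen for illustration; consult the GSC specification for the actual checklist. The example record names the Conexibacter woesei strain from the DNA preparation slide and leaves out every value the talk does not give.

```python
from typing import Dict, List

# Hypothetical subset of GSC-style (MIGS/MIxS-like) field names, for illustration only.
REQUIRED_FIELDS: List[str] = [
    "project_name",
    "investigation_type",
    "geo_loc_name",
    "lat_lon",
    "collection_date",
]


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or blank, in checklist order."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


# Hypothetical record: only the organism and a project label are filled in.
record = {
    "organism": "Conexibacter woesei DSM 14684",
    "project_name": "GEBA",
}
print(missing_fields(record))
# ['investigation_type', 'geo_loc_name', 'lat_lon', 'collection_date']
```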
Related Lesson 2
Completeness Matters
Completeness
• The final quality of a genome sequence influences what one can do with the data
• Why completeness (closed, high quality) is important:
– Gene presence/absence
– Gene order
– Genome rearrangements
– Identifying genomic islands
• See “The Value of Complete Microbial Genome Sequencing (You Get What You Pay For).” Fraser et al., J. Bacteriol. 2002.
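Gene order and rearrangement analyses compare which genes are neighbors in two genomes, and a draft assembly split into contigs loses exactly those neighbor relationships at every gap. A minimal sketch, assuming each genome has already been reduced to an ordered list of shared orthologous genes: count the adjacencies of one genome that are broken in the other. The gene orders are hypothetical, and the sketch ignores gene orientation and chromosome circularity, which dedicated genome-alignment tools take into account.

```python
from typing import List, Set, Tuple


def adjacencies(order: List[str]) -> Set[Tuple[str, ...]]:
    """Unordered pairs of neighboring genes (orientation ignored)."""
    return {tuple(sorted(pair)) for pair in zip(order, order[1:])}


def breakpoints(a: List[str], b: List[str]) -> int:
    """Number of adjacencies in genome a that are absent from genome b."""
    return len(adjacencies(a) - adjacencies(b))


# Hypothetical gene orders: recF and gyrB have swapped places in genome_b.
genome_a = ["dnaA", "dnaN", "recF", "gyrB", "gyrA"]
genome_b = ["dnaA", "dnaN", "gyrB", "recF", "gyrA"]
print(breakpoints(genome_a, genome_b))  # 2
```

If genome_b were only a draft with a contig break between two genes, that missing adjacency would be indistinguishable from a real rearrangement, which is one concrete reason completeness matters.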
StrpB vs. StrpA
Mauve, Artemis
Additional Lessons
• Computational methods need to be more automated
• Need to limit analyses to subsets of all available data
• Need for people to help interpret and study data is increasing, not decreasing
• Sequence is just the beginning
• Need to train more students
MICROBES