Jonathan Eisen talk on $1 Genome

The $1 Bacterial Genome: Advances in Bioinformatics, Jonathan A. Eisen, U. C. Davis Genome Center

Upload: jonathan-eisen

Posted on 28-Jan-2015

DESCRIPTION

Talk given by Jonathan Eisen at the ASM General Meeting 2009 in the session "The $1 Bacterial Genome".

TRANSCRIPT

Page 1: Jonathan Eisen talk on $1 Genome

The $1 Bacterial Genome: Advances in Bioinformatics

Jonathan A. Eisen, U. C. Davis Genome Center

Page 2
Page 3

The $1 Bacterial Genome: Oh $^#^ - We're $&#$

Jonathan A. Eisen, U. C. Davis Genome Center

Page 4

The $1 Bacterial Genome: Informatics, GEBA and me

Jonathan A. Eisen, U. C. Davis Genome Center

Page 5

Outline

• GEBA: The JGI Genomic Encyclopedia of Bacteria and Archaea

• Insights into the $1 genome from the GEBA project

• Additional insights into the $1 genome

Page 6

GEBA: The Genomic Encyclopedia of Bacteria and Archaea

Run by JGI
$$ from DOE
Work by many

Page 7

[Tree figure, phyla labelled: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 8

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 9

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Based on Hugenholtz, 2002

Page 10

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

As of 2002

Based on Hugenholtz, 2002

Page 11

Need for Tree Guidance Well Established

• Common approach within some eukaryotic groups
  – NHGRI animal projects
  – FGI at Whitehead
  – Plant LSP

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in the literature, conversations, etc.

• Many small projects funded to fill in some gaps
  – DOE/TIGR sequencing
  – Multiple CSP projects
  – Multiple NSF/USDA projects
  – Private projects (e.g., Integrated Genomics, Diversa)

Page 12

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen, Ward, Badger, Wu, Wu, et al.

Page 13

[Same phylum tree figure as Page 7]

• At least 100 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa are even more poorly sampled

• Solution: use the tree to really fill gaps

Well-sampled phyla

Page 14

http://www.jgi.doe.gov/programs/GEBA/pilot.html

Page 15

GEBA Pilot Project Overview

• Select 200 organisms using the rRNA tree as a guide

• Develop a high-throughput pipeline for strain growth and DNA preparation

• Sequence and finish 100 genomes

• Annotate, analyze, release data

• Assess benefits of tree-guided sequencing

Page 16

GEBA Pilot Target List

[Bar chart; y-axis: # of Genomes (0-35); x-axis: Phyla. Bacterial (B:) categories: Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres (listed twice), Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaeal (A:) categories: Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]

Page 17

IMG/GEBA

http://img.jgi.doe.gov/cgi-bin/geba/main.cgi

Page 18

Why Increase Taxonomic Coverage?

• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Mechanisms of diversification
• Species phylogeny and classification

Page 19

Phylogenetic Metagenomics

Page 20

Non-Homology Predictions: Phylogenetic Profiling

• Step 1: Search all genes in organisms of interest against all other genomes

• Ask: yes or no, is each gene found in each other species?

• Cluster genes by distribution patterns (profiles)
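The three steps above can be sketched in Python. This is a minimal illustration, not the pipeline used in the talk: it assumes Step 1 has already been run (for example an all-vs-all BLAST search at some score cutoff) and reduced to a mapping from each gene to the set of genomes where a homolog was found. All gene and genome names are hypothetical. Grouping identical profiles is the simplest clustering; `hamming` gives a distance that could be used to group near-identical profiles instead.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search hits into presence/absence (1/0) profiles.

    hits: dict of gene id -> set of genome names in which a homolog was found
          (assumed output of Step 1, e.g. a BLAST search with a cutoff)
    genomes: ordered list of genome names; fixes the column order of each profile
    """
    return {
        gene: tuple(1 if genome in found else 0 for genome in genomes)
        for gene, found in hits.items()
    }


def cluster_by_profile(profiles):
    """Group genes whose presence/absence profiles are identical."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    # Sort members so the output does not depend on dict insertion order.
    return {profile: sorted(genes) for profile, genes in clusters.items()}


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical example: three genes scored against four genomes.
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
hits = {
    "geneX": {"genomeA", "genomeC"},
    "geneY": {"genomeA", "genomeC"},
    "geneZ": {"genomeB", "genomeC", "genomeD"},
}
profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
print(clusters)
```

Here geneX and geneY share the profile (1, 0, 1, 0) and fall into one cluster, which is the signal phylogenetic profiling uses to suggest a functional link; geneZ differs from them in three genomes.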

Page 21

GEBA Lesson 1

Tree of Life is a Useful Guide

Page 22
Page 23

rRNA Tree of Life

Page 24
Page 25
Page 26
Page 27

GEBA Lesson 2

We have still only scratched the surface of microbial diversity

Page 28

Phylogenetic Diversity: Sequenced Bacteria & Archaea

Page 29

Phylogenetic Diversity with GEBA

Page 30

Phylogenetic Diversity: GreenGenes
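Slides 28 to 30 compare phylogenetic diversity (PD) with and without the GEBA genomes. Assuming the measure is Faith's PD (the total branch length of the part of a rooted tree spanned by a set of taxa), the sketch below computes it on a small tree and shows why filling a deep gap adds more PD than adding another member of an already-sampled clade. The tree, node names and branch lengths are hypothetical.

```python
def faith_pd(tree, taxa):
    """Faith's phylogenetic diversity of `taxa` on a rooted tree.

    tree: dict of node -> (parent, branch_length); the root maps to (None, 0.0)
    taxa: iterable of leaf names
    Each branch on the union of leaf-to-root paths is counted exactly once.
    """
    counted = set()
    total = 0.0
    for leaf in taxa:
        node = leaf
        # Walk towards the root; stop at a node whose path to the root is already counted.
        while node is not None and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


# Hypothetical tree: clade P is shallow and already sampled, clade Q is a deep unsampled lineage.
tree = {
    "root": (None, 0.0),
    "cladeP": ("root", 1.0),
    "cladeQ": ("root", 3.0),
    "p1": ("cladeP", 0.5),
    "p2": ("cladeP", 0.5),
    "p3": ("cladeP", 0.4),
    "q1": ("cladeQ", 2.0),
}

sequenced = ["p1", "p2"]
for candidate in ["p3", "q1"]:
    gain = faith_pd(tree, sequenced + [candidate]) - faith_pd(tree, sequenced)
    print(candidate, round(gain, 3))
```

With p1 and p2 already sequenced, adding p3 raises PD by 0.4 while adding q1 raises it by 5.0, which is the intuition behind using the tree to pick targets in poorly sampled lineages.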

Page 31
Page 32

Viruses Too

Page 33
Page 34

First Bacterial Actin-Related Protein: Haliangium ochraceum DSM 14365

First found by V. Kunin; structure analysis by Patrik D. et al.

Page 35
Page 36

GEBA Lesson 3

Need Experiments from Across the Tree of Life Too

Page 37

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 38

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 39

[Same phylum tree figure as Page 7]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

• Some studies in other phyla

As of 2002

Based on Hugenholtz, 2002

Page 40

[Same phylum tree figure as Page 7]

Need experimental studies from across the tree too

Page 41

GEBA Lesson 4

The Importance of Project Management

Page 42

GEBA Project Flowchart

[Flowchart with three phases: Project Initiation, Sequencing, Annotation. Steps shown: GEBA Proposal; Scientific and Technical Review (1); Negotiate Scope of Work; Receive Starting Material (1); Draft Sequencing and Assembly (1); Finish Sequencing and Assembly (2); IMG (1); Finish Annotation (3); Complete Genome GenBank Submission (1); Draft Annotation (3); Shotgun Genome GenBank Submission (1); IMG-ER (1); Gene-QA (1); with "OK?" checkpoints between stages. Key: 1 = PGF, 2 = LANL, 3 = ORNL.]

David Bruce, Lynne Goodwin et al.

Page 43

GEBA Lesson 5

The Importance of Culture (Collections, That Is)

Page 44

GEBA Biggest Challenge: Getting DNA

• Getting quality DNA is the biggest bottleneck
• Solution: beg, borrow and steal

• DSMZ offered to do it for free
• ATCC is doing a small number for a fee
• In discussions with PCC and other collections

Page 45
Page 46

Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)

Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by pulsed-field gel electrophoresis (PFGE). The bulk of the DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
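The 150 µg shipped follows from the gel-based concentration; applying the same arithmetic to the spectrophotometric reading gives a slightly lower total. The symbols m_gel and m_spec are introduced here only for this calculation.

```latex
\begin{aligned}
m_{\text{gel}}  &= 500~\mathrm{ng}/\mu\mathrm{l} \times 300~\mu\mathrm{l} = 150{,}000~\mathrm{ng} = 150~\mu\mathrm{g} \\
m_{\text{spec}} &= 450~\mu\mathrm{g}/\mathrm{ml} \times 0.3~\mathrm{ml} = 135~\mu\mathrm{g}
\end{aligned}
```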

Lane 1: c(λ-Marker) = 15 ng
Lane 2: c(λ-Marker) = 30 ng
Lane 3: c(λ-Marker) = 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) = 125 ng
Lane 15: c(λ-Marker) = 250 ng
Lane 16: c(λ-Marker) = 500 ng
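The λ-marker lanes above (15 to 500 ng) form a dilution series, which is what makes a concentration "estimated from the gel" possible. Comparing band brightness by eye is common practice; a quantitative variant, sketched here under stated assumptions, fits a linear standard curve of band intensity against known DNA mass and reads the sample off it. The band intensities, the sample intensity and the loaded volume are hypothetical placeholders, and the linear response is an assumption that only holds within the range of the standards.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Known DNA mass (ng) in the λ-marker lanes 1-3 and 14-16.
marker_ng = [15, 30, 50, 125, 250, 500]
# HYPOTHETICAL band intensities (arbitrary units), e.g. from densitometry software.
marker_intensity = [30.0, 60.0, 100.0, 250.0, 500.0, 1000.0]


def estimate_concentration(sample_intensity, loaded_ul):
    """Estimate ng/µl of a sample band from the marker standard curve."""
    slope, intercept = fit_line(marker_ng, marker_intensity)
    ng_in_band = (sample_intensity - intercept) / slope
    return ng_in_band / loaded_ul


# Hypothetical sample band: intensity 700 with 1 µl of DNA loaded.
conc = estimate_concentration(700.0, 1.0)
print(conc)
```

With these placeholder numbers the fitted curve is intensity = 2 × ng, so the sample band corresponds to 350 ng and, with 1 µl loaded, 350 ng/µl. Real measurements would need the actual loaded volume and intensities.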

Page 47

Related Lesson 1

METADATA ROCKS

Page 48

SIGS

• The Genomic Standards Consortium
• The GSC is an open-membership working body which formed in September 2005.
• The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data.

• See http://gensc.org/gc_wiki/index.php/Main_Page

Page 49
Page 50

Related Lesson 2

Completeness Matters

Page 51
Page 52
Page 53

Completeness

• The final quality of a genome sequence influences what one can do with the data

• Why completeness (closed, high quality) is important
  – Gene presence/absence
  – Gene order
  – Genome rearrangements
  – Identifying islands

• See “The Value of Complete Microbial Genome Sequencing (You Get What You Pay For).” Fraser et al. J. Bact. 2002.

Page 54

StrpB vs. StrpA

[Plot, one data series (Series1): x-axis 0 to 2,500; y-axis 13,621,300 to 13,623,100]

Page 55

Mauve, Artemis

Page 56

Additional Lessons

• Computational methods need to be more automated

• Need to limit analyses to subsets of all available data

• The need for people to help interpret and study data is increasing, not decreasing

• Sequence is just the beginning
• Need to train more students

Page 57
Page 58

MICROBES