c a m e r a a metagenomics resource for microbial ecology
DESCRIPTION
C A M E R A A Metagenomics Resource for Microbial Ecology. Saul A. Kravitz J. Craig Venter Institute Rockville, Maryland USA KNAW Colloquium May 29, 2008. Goals. Introduce you to CAMERA Encourage you to use CAMERA What can CAMERA do for you?. Presentation Outline. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/1.jpg)
C A M E R AA Metagenomics Resource
for Microbial Ecology
Saul A. KravitzJ. Craig Venter InstituteRockville, Maryland USA
KNAW ColloquiumMay 29, 2008
![Page 2: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/2.jpg)
Goals
• Introduce you to CAMERA
• Encourage you to use CAMERA
• What can CAMERA do for you?
![Page 3: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/3.jpg)
Presentation Outline
• Introduction to Metagenomics
• Global Ocean Sampling (GOS) Expedition
• CAMERA Capabilities and Features- Compute Resources
- Data Resources
- Tools Resources
• Looking Forward
![Page 4: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/4.jpg)
• Within an environment- What biological functions are present (absent)?
- What organisms are present (absent)
• Compare data from (dis)similar environments- What are the fundamental rules of microbial ecology
• Adapting to environmental conditions?- How?
- Evidence and mechanisms for lateral transfer
• Search for novel proteins and protein families
- And diversity within known families
Metagenomic Questions
![Page 5: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/5.jpg)
• Genomics – ‘Old School’- Study of a single organism's genome - Genome sequence determined using shotgun
sequencing and assembly- >1300 microbes sequenced, first in 1995
- DNA usually obtained from pure cultures (<1%) • Metagenomics
- Application of genome sequencing methods to environmental samples (no culturing)
- Environmental shotgun sequencing is the most widely used approach
- Environmental Metadata provides key context
Genomics vs Metagenomics
![Page 6: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/6.jpg)
Complexity of Microbial Communities
• Simple (e.g., AMD, gutless worm)- Few species present (<10)
- Diverse
Variations on standard genomics techniques
• Complex (e.g., Soil or Marine)- Many species present (>10, often >1000)
- Many closely related
New techniques
![Page 7: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/7.jpg)
Global Ocean Sampling Expedition
![Page 8: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/8.jpg)
Global Ocean Sampling (GOS)• 178 Total Sampling Locations
- Phase 1: 7.7M reads, >6M proteins 3/07- Phase 2-IO: 2.2M reads 3/08- Phase 2: ~10M reads future
• Diverse Environments- Open ocean, estuary, embayment, upwelling, fringing reef, atoll…
3/08
3/07
4/04
![Page 9: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/9.jpg)
• Most sequence reads are unique- Very limited assembly- Most sequences not taxonomically anchored- Relating shotgun data to reference genomes- Annotation challenging
• New Techniques Needed- Fragment Recruitment- Extreme Assembly to find pan genomes- Sample to Sample Comparisons
GOS: Sequence Diversity in the Ocean
Rusch et al (PLoS 2007)
![Page 10: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/10.jpg)
Comparing of Dominant Ribotypes
![Page 11: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/11.jpg)
Comparison of Total Genomic Content
![Page 12: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/12.jpg)
• Novel clustering process• Sequence similarity based
• Predict proteins and group into related clusters
• Include GOS and all known proteins
• Findings• GOS proteins
• cover ~all existing prokaryotic families
• expands diversity of known protein families
• ~10% of large clusters are novel
• Many are of viral origin
• No saturation in the rate of novel protein family discovery
GOS Protein Analysis Yooseph et al (PLoS 2007)
![Page 13: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/13.jpg)
Rubisco homologs
Added Protein Family Diversity
Yooseph et al (PLoS 2007)
New Groups
GOS prokaryotes
Known eukaryotes
Known prokaryotes
![Page 14: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/14.jpg)
• Study of dsDNA viruses from shotgun data- 155k viral proteins identified from 37 GOS I sites (~2.5%)
- 59% of viral sequences were bacteriophage
• Viral acquisition and retention of host metabolic genes is common and widespread- Viruses have made these genes “their own”- Clade tightly with viral genes
• Codistribution of P-SSM4-like cyanophage and the dominant ecotype of Prochlorococcus in GOS samples.
GOS Viral Analysis(Williamson et al PLoSOne 2008)
![Page 15: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/15.jpg)
Viral acquisition of host genestalC Gene
GOS Viral
Public Viral
GOS Bacterial
Public Bacterial
Public Euk
![Page 16: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/16.jpg)
Reference Genomes
• Overview- 150+ reference marine microbes (101 released)
- Scaffold for GOS
- Sequenced, assembled, autoannotated
• Isolation Metadata- Incomplete
• Bottlenecks- Availability of DNA
- Purity of DNA
• Status and Data- https://research.venterinstitute.org/moore/
![Page 17: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/17.jpg)
• Significant investment in sequencing- Only accessible to bioinformatics elite
- Diversity of user sophistication and needs
• Bioinformatics and Computation Challenges- Assembly, annotation, comparative analysis, visualization
- Dedicated compute resources
• Importance of Metadata- Metadata required for environmental analysis
- Need to drive standards
• Compliance with Convention on Biodiversity
Motivations for CAMERA
![Page 18: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/18.jpg)
Convention on Biological Diversity
• Sample in territorial waters?- Country granted certain rights by CBD
- Sampling agreements may contain restrictions
• CAMERA users must acknowledge potential restrictions on commercial data use
• CAMERA maintains mapping of country-of-origin for all data objects
![Page 19: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/19.jpg)
CAMERA – http://camera.calit2.net
• “Convenient acronym for cumbersome name…”- Henry Nichols, PLoS Biology
• Mission- Enable Research in Marine Microbiology
• Debuted March 2007
![Page 20: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/20.jpg)
CAMERA Capabilities
• Compute Resources- 512 node compute grid + 200 Tb storage
• Data and Metadata Resources- Annotated Metagenomic and genomic data
• Tools Resources- Scalable BLAST
- Fragment Recruitment
- Metagenomic Annotation
- Text Search
![Page 21: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/21.jpg)
512 Processors ~5 Teraflops
~ 200 Terabytes Storage
CAMRA Compute and Storage Complexat UCSD/Calit2
Source: Larry Smarr, Calit2
![Page 22: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/22.jpg)
CAMERA Metagenomic Data Volume
by Project
![Page 23: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/23.jpg)
CAMERA Metagenomic Samples
![Page 24: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/24.jpg)
CAMERA Users>2000 Registered Since March
2007
![Page 25: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/25.jpg)
• Metagenomic Sequence Collection- Reads and assemblies w/associated metadata
- CAMERA-computed annotation
• Protein Clusters- Maintaining clusters from Yooseph et al (Yooseph and Li, ’08)
• Genomic Data- Viral, Fungal, pico-Eukaryotes, Microbial- Moore Marine Genomes with Metadata
• Non-redundant sequence Collection- Genbank, Refseq, Uniprot/Swissprot, PDB etc
CAMERA Data Collections
![Page 26: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/26.jpg)
• Genome Standards Consortium- Led by Dawn Field, NIEeS
- Members from EU, UK, US
• Goals are to promote- Standardization of genomic descriptions
- Exchange & Integration of genomic data
• Metadata standardization key enabler- MIMS: Min Info for Metagenomic Sample
- GCDML: Standard format
Standardizing Contextual Metadata
![Page 27: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/27.jpg)
Contextual Metadata Challenges
• Researchers Need to Collect and Submit
• Relevant metadata depends on study – MIMS- Specification of minimum metadata
• Standardize Exchange Format - GCDML
- Comprehensive and extensible
- Leverages Existing Ontologies, Validatable
And…
- Easy for a scientist to use...
• Need ongoing software support for tools
![Page 28: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/28.jpg)
CAMERA Core Metadata by Project
• Defacto Core•Lattitude and Longitude•Collection date•Habitat and Geographic Location
• Missing metadata =
![Page 29: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/29.jpg)
CAMERA Contextual Metadata
![Page 30: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/30.jpg)
CAMERA 1.3
http://camera.calit2.net
![Page 31: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/31.jpg)
Scalable BLAST with Metadata
• Large searches permitted and encouraged
• 454 FLX run vs “All Metagenomic”
• Some larger tblastx jobs have run >20 hrs
• 10kbp BLASTN vs All Metagenomic – 1 min
• BLAST XML or Tabular Export
• Searches against NRAA
• BLAST XML output feeds MEGAN
• Searches against ‘All Metagenomic’
• GUI with metdata
• Tabular with metadata
![Page 32: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/32.jpg)
Scalable BLAST with Metadata
![Page 33: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/33.jpg)
Integration of Metadata and Data
![Page 34: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/34.jpg)
Browsing Large Data Collections: Fragment Recruitment Viewer
• Microbial Communities vs Reference Genomes- Millions of sequence reads vs Thousands of genomes
• Definition: A read is recruited to a sequence if:- End-to-end blastN alignment exists
• Rapid Hypothesis Generation and Exploration- How do cultured and wildtype genomes differ?
- Insertions, deletion, translocations
- Correlation with environmental factors
• Export sequence and annotation• Credits: Doug Rusch and Michael Press
![Page 35: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/35.jpg)
Fragment Recruitment ViewerS
eque
nce
Sim
ilarit
y
Genomic Position
Doug Rusch, JCVI
![Page 36: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/36.jpg)
Seq
uenc
e S
imila
rity
Genomic Position
Annotation
Geographic Legend
![Page 37: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/37.jpg)
![Page 38: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/38.jpg)
![Page 39: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/39.jpg)
![Page 40: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/40.jpg)
Prochlorococcus marinus str. MIT 9312
• Coloring by geography • 80-95% identity cloud • = GOS Indian Ocean• Regions with no coverage
• Where?• Real?
![Page 41: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/41.jpg)
Mate Status Highlights Differences
• Paired end (mate) sequencing • Coloring by mate status• Highlights cultured vs metagenomic differences • Selective display of
- Mates by status- Reads by sample
![Page 42: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/42.jpg)
Mate Pairs Highlight Variation
![Page 43: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/43.jpg)
What Genes are Involved
![Page 44: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/44.jpg)
View
by
Sample
![Page 45: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/45.jpg)
View by Sample
Filter by mate status
![Page 46: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/46.jpg)
Annotation ofEnvironmental Shotgun Data
• Gene Finding- Using Yooseph’s Protein Clusters, and/or- Metagene
• Functional Assignment- Variation of JCVI prok annotation pipeline*- Leverages protein cluster annotation -- soon
• Quality Nearly Comparable to Prokaryotic Genomic Annotation
![Page 47: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/47.jpg)
Protein Clusters as Gene Finder
• Identification and soft mask of ncRNAs• Naïve identification of ORFs (60aa min)• Add peptides to clusters incrementally
- Yooseph and Li, 2008
• Predicted Genes based on ORFS in- Clusters of sufficient size- Clusters that satisfy additional filters
![Page 48: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/48.jpg)
Protein ClustersAdvantages and Disadvantages
• Weaknesses- Homology-based- Stateful (also a strength)- Less sensitive (for now)
• Strengths- More specific- Transitive Annotation- Learns over time- Easy to maintain
![Page 49: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/49.jpg)
Search for Dehalogenase
![Page 50: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/50.jpg)
Browse Clusters
![Page 51: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/51.jpg)
Near Future
• More extensive data collection
• Summary views of data sets by- Annotation
- Samples
- Mate Status
- Taxonomy
- Habitat and other contextual metadata
• 16S datasets?
![Page 52: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.vdocument.in/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/52.jpg)
Credits• JCVI CAMERA Team
- Leonid Kagan, Michael Press, Todd Safford, Cristian Goina, Qi Yang, Sean Murphy, Jeff Hoover, Tanja Davidsen, Ramana Madupu, Sree Nampally, Nikhat Zhafar, Prateek Kumar
- Doug Rusch, Shibu Yooseph, Aaron Halpern*, Granger Sutton, Shannon Williamson
- Marv Frazier and Bob Friedman
• Calit2 CAMERA Team- Adam Brust, Michael Chiu, Brian Fox, Adam Dunne, Kayo
Arima
- Larry Smarr and Paul Gilna
http://camera.calit2.net