SGM Meeting, Warwick, April 2006. Challenges for metagenomic data analysis and lessons from viral metagenomes [What would you do if sequencing were free?]. Rob Edwards, http://phage.sdsu.edu/~rob, San Diego State University, Fellowship for Interpretation of Genomes.
![Page 1: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/1.jpg)
Challenges for metagenomic data analysis and lessons from viral metagenomes
[What would you do if sequencing were free?]
Rob Edwards
http://phage.sdsu.edu/~rob
San Diego State University
Fellowship for Interpretation of Genomes
SGM Meeting, Warwick, April 2006
![Page 2: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/2.jpg)
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
![Page 3: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/3.jpg)
This is all 454 sequence data
• 21 libraries
– 10 microbial, 11 phage
• 597,340,328 bp total
– 20% of the human genome
– 50% of all complete and partial microbial genomes
• 5,769,035 sequences
– Average 274,716 per library
• Average read length 103.5 bp
– Average read length has not increased in 7 months
• Cost 0.04¢ per bp
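As a quick sanity check, the averages on this slide can be recomputed from the quoted totals. The sketch below also derives an implied total cost, assuming the 0.04¢ per bp rate applies uniformly to every base; the slide does not state that, and the variable names are illustrative.

```python
# Totals quoted on the slide.
total_bp = 597_340_328
total_reads = 5_769_035
libraries = 21
cents_per_bp = 0.04  # assumption: a flat rate across all libraries

avg_read_length = total_bp / total_reads          # bp per read
avg_reads_per_library = total_reads / libraries   # reads per library
implied_cost_dollars = total_bp * cents_per_bp / 100

print(f"Average read length: {avg_read_length:.1f} bp")
print(f"Average reads per library: {avg_reads_per_library:,.0f}")
print(f"Implied sequencing cost: ${implied_cost_dollars:,.0f}")
```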
![Page 4: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/4.jpg)
Sequencing is cheap and easy.
Bioinformatics is neither.
![Page 5: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/5.jpg)
The Soudan Mine, Minnesota
Red stuff: oxidized
Black stuff: reduced
![Page 6: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/6.jpg)
Red and Black Samples Are Different
Cloned and 454-sequenced 16S are indistinguishable
[Figure labels: Black stuff, Red, Cloned Red]
![Page 7: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/7.jpg)
There are different amounts of metabolism in each environment
![Page 8: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/8.jpg)
There are different amounts of substrates in each environment
[Figure labels: Black Stuff, Red Stuff]
![Page 9: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/9.jpg)
But are the differences significant?
• Sample 10,000 proteins from site 1
• Count frequency of each “subsystem”
• Repeat 20,000 times
• Repeat for sample 2
• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI
• Compare medians from sites 1 and 2 with 95% CI
Rodriguez-Brito (2006). BMC Bioinformatics
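The resampling steps above can be sketched in a few lines of Python. This is one possible reading of the slide, not the published implementation: it assumes sampling with replacement, and it assumes the 95% CI is built from the difference between two resamples of the pooled data, against which the difference of the site medians is compared. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, sample_size, repeats, rng):
    """Repeatedly draw proteins and record how often each subsystem appears."""
    subsystems = set(proteins)
    counts = {s: [] for s in subsystems}
    for _ in range(repeats):
        # Assumption: sampling with replacement.
        drawn = Counter(rng.choices(proteins, k=sample_size))
        for s in subsystems:
            counts[s].append(drawn[s])
    return counts


def compare_sites(site1, site2, sample_size=10_000, repeats=20_000, seed=0):
    """Return {subsystem: (median difference, ci_low, ci_high, significant)}."""
    rng = random.Random(seed)
    med1 = {s: median(v)
            for s, v in resample_counts(site1, sample_size, repeats, rng).items()}
    med2 = {s: median(v)
            for s, v in resample_counts(site2, sample_size, repeats, rng).items()}

    # Null distribution: differences between two resamples of the pooled data.
    pooled = site1 + site2
    null_a = resample_counts(pooled, sample_size, repeats, rng)
    null_b = resample_counts(pooled, sample_size, repeats, rng)

    results = {}
    for s in null_a:
        diffs = sorted(a - b for a, b in zip(null_a[s], null_b[s]))
        low = diffs[int(0.025 * repeats)]
        high = diffs[int(0.975 * repeats) - 1]
        observed = med1.get(s, 0) - med2.get(s, 0)
        results[s] = (observed, low, high, observed < low or observed > high)
    return results


# Toy data: each list entry is the subsystem assigned to one protein.
site1 = ["iron acquisition"] * 80 + ["other"] * 20
site2 = ["iron acquisition"] * 20 + ["other"] * 80
results = compare_sites(site1, site2, sample_size=100, repeats=200, seed=1)
print(results["iron acquisition"])
```

The toy call uses far smaller numbers than the slide's 10,000 proteins and 20,000 repeats so that it finishes quickly in pure Python; at full scale a vectorized approach would probably be needed.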
![Page 10: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/10.jpg)
Subsystem differences & metabolism
Iron acquisition
Black Stuff
• Siderophore enterobactin biosynthesis
• Ferric enterobactin transport
• ABC transporter ferrichrome
• ABC transporter heme
Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈])
Red stuff: ferric iron (goethite [FeO(OH)])
![Page 11: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/11.jpg)
Nitrification differentiates the samples
Edwards (2006). BMC Genomics
![Page 12: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/12.jpg)
The challenge is explaining the differences between samples
Red Sample
• Arg, Trp, His
• Ubiquinone
• FA oxidation
• Chemotaxis, Flagella
• Methylglyoxal metabolism
Black Sample
• Ile, Leu, Val
• Siderophores
• Glycerolipids
• NiFe hydrogenase
• Phenylpropionate degradation
![Page 13: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/13.jpg)
We can cheaply compare the important biochemistry happening in different environments
We don’t care which organisms are doing the metabolism, but we know what organisms are there
![Page 14: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/14.jpg)
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
![Page 15: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/15.jpg)
Why Phages?
• Phages are viruses that infect bacteria
– 10:1 ratio of phages:bacteria
– 10^31 phages on the planet
• Specific interactions (probably)
– one virus : one host
• Small genome size
– Higher coverage
• Horizontal gene transfer
– 10^25 to 10^28 bp DNA per year in the oceans
• Can’t do fosmids
![Page 16: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/16.jpg)
Phages In The World's Oceans
• GOM: 41 samples, 13 sites, 5 years
• SAR: 1 sample, 1 site, 1 year
• BBC: 85 samples, 38 sites, 8 years
• ARC: 56 samples, 16 sites, 1 year
• LI: 4 sites, 1 year
![Page 17: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/17.jpg)
Most Marine Phage Sequences are Novel
![Page 18: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/18.jpg)
Thanks: Mya Breitbart
Phages are specific to environments
PhageProteomicTree v. 5 (Edwards, Rohwer)
[Figure labels: ssDNA, λ-like, T7-like, T4-like]
![Page 19: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/19.jpg)
Marine Single-Stranded DNA Viruses
• 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)
• 40% of viral particles in SAR are ssDNA phage
• Several full-genome sequences were recovered via de novo assembly of these fragments
• Confirmed by PCR and sequencing
![Page 20: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/20.jpg)
SAR Aligned Against the Chlamydia phi 4 Genome
12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome
[Figure: individual sequence reads and concatenated hits along the Chlamydia phi 4 genome; coverage axis 0 to 1033; labels 3890 bp and 4490 bp]
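A coverage track like the one on this slide can be built by counting, at each genome position, how many hits overlap it. The sketch below is a minimal illustration, assuming each hit has already been reduced to a 0-based, end-exclusive interval on the reference genome; the function name and the example intervals are hypothetical and are not taken from the SAR data.

```python
def coverage(hits, genome_length):
    """Return per-base depth given (start, end) hit intervals, end exclusive."""
    # Difference array: +1 where a hit starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in hits:
        delta[start] += 1
        delta[end] -= 1

    depth = []
    running = 0
    for position in range(genome_length):
        running += delta[position]
        depth.append(running)
    return depth


# Two overlapping hits on a 6 bp toy genome.
example = coverage([(0, 3), (2, 5)], 6)
print(example)
print("Maximum depth:", max(example))
```

The difference-array approach touches each hit once and each base once, so it stays cheap even with many thousands of hits on a genome of a few kilobases.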
![Page 21: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/21.jpg)
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
![Page 22: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/22.jpg)
Phages, Reefs, and Human Disturbance
![Page 23: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/23.jpg)
Phages, Reefs, and Human Disturbance
The Northern Line Islands Expedition, 2005
[Map labels: Kingman, Palmyra, Washington, Fanning, Christmas]
![Page 24: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/24.jpg)
Christmas to Kingman Bias in No. Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
More pathogens at Christmas. More people at Christmas.
More photosynthesis at Kingman. No people at Kingman.
![Page 25: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/25.jpg)
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
![Page 26: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/26.jpg)
Phages enrich for important genes
Rios Mesquites Stromatolites
• No photosynthesis genes in phages
Pozas Azules Stromatolites
• 5 different photosynthesis genes in phages
![Page 27: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/27.jpg)
RNR is the most successful reaction in evolution
![Page 28: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/28.jpg)
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
![Page 29: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/29.jpg)
Computational Challenges
• Sequence annotations and analysis
– What is there?
– What is it doing?
– How is it doing it?
• Gene predictions in unknowns
– Lutz Krause (Bielefeld)
• Sequence comparisons
– BLAST
– Other ways to rapidly compare short sequences
– What happens when everyone is using 454 sequencing?
![Page 30: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/30.jpg)
Sequence data from 21 libraries
6 million sequences
600 million bp
• Each BLASTX search takes 1,000 CPU hours
• 21 libraries = 21,000 CPU hours or 2.4 CPU years
• Users want
– repeat runs
– TBLASTX
– more analysis
– more data
– more, more, more, more
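The CPU-time figure follows directly from the per-library cost, and the same arithmetic shows how quickly repeat runs add up. The sketch below uses the slide's numbers; the `passes` parameter is a hypothetical knob for modelling repeat runs, and it assumes every extra pass costs the same as one BLASTX search, which is likely optimistic for TBLASTX.

```python
HOURS_PER_YEAR = 24 * 365

cpu_hours_per_library = 1_000  # one BLASTX search per library, from the slide
libraries = 21


def total_cpu_years(passes=1):
    """CPU years needed to search every library `passes` times."""
    return libraries * cpu_hours_per_library * passes / HOURS_PER_YEAR


print(f"One pass: {total_cpu_years():.1f} CPU years")
print(f"Three passes: {total_cpu_years(3):.1f} CPU years")
```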
![Page 31: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/31.jpg)
SDSU: Forest Rohwer, Beltran Rodriguez-Brito
USF: Mya Breitbart
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
Stromatolites: Janet Seifert (Rice University), Valeria Souza (UNAM, Mexico)
Math Guys@SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller
MIT: Ed DeLong
FIG: Veronika Vonstein, Ross Overbeek, Annotators
ANL: Rick Stevens, Bob Olsen, CI Support
Also at SDSU: Anca Segall, Stanley Maloy
UBC: Curtis Suttle, Amy Chan
![Page 32: Rob Edwards phage.sdsu/~rob San Diego State University](https://reader033.vdocument.in/reader033/viewer/2022061603/568139d4550346895da18605/html5/thumbnails/32.jpg)