Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

An Introduction to Metagenomics Data Analysis. Metagenomics Training. Ferran Briansó. VHIR - 26/08/2013. [email protected]

Upload: vhir-vall-dhebron-institut-de-recerca

Post on 04-Jul-2015


DESCRIPTION

UEB-VHIR's Metagenomics Training. Session 1. 2013/08/26. An Introduction to Metagenomics Data Analysis. Ferran Briansó ([email protected])

TRANSCRIPT

Page 1: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

An Introduction to Metagenomics Data Analysis

Metagenomics Training

Ferran Briansó

VHIR - 26/08/2013

[email protected]

Page 2: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Outline

Introduction to Metagenomics

Basic Terminology

Computational Approaches & Tools: Whole Genome Shotgun, 16S/ITS Community Surveys

Recommended Tools: MEGAN, mothur, QIIME, AXIOME & CloVR

Page 3: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Introduction to METAGENOMICS

Page 4: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Introduction

First use of the term metagenome, referencing the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome.

Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998). "Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products". Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9. PMID 9818143

Page 5: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

“The application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.”

Chen, K.; Pachter, L. (2005). "Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities". PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024

Introduction

Page 6: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Source: US Division of Earth & Life Studies of the National Academies, http://dels-old.nas.edu/metagenomics/overview.shtml

Introduction

Page 7: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Source: US Division of Earth & Life Studies of the National Academies, http://dels-old.nas.edu/metagenomics/overview.shtml

Introduction

Page 8: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Source:

Introduction

Page 9: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Source: Feng Chen, JGI

Introduction

Performance Comparison for (some) Platforms

Page 10: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Basic TERMINOLOGY

Page 11: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Terminology

Trimming: the pre-processing step of cleaning sequence data from automated DNA sequencers (removing primers, multiplexing barcodes...) prior to sequence assembly and other downstream uses.

Binning: the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).

OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study, typically defined by a percent sequence similarity threshold that decides whether microbes are classified within the same or different OTUs.
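To make the similarity-threshold idea concrete, here is a minimal sketch of greedy OTU picking. It is only an illustration, not the algorithm used by any of the tools in this training: it assumes the sequences are already aligned to the same length, and the names `identity` and `greedy_otus` as well as the 0.97 default threshold are hypothetical choices made for the example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each sequence to the first OTU whose seed it matches at >= threshold.

    Hypothetical helper for illustration only; real tools use faster and
    more careful clustering strategies.
    """
    otus: list[list[str]] = []  # each OTU is a list; element 0 is its seed
    for seq in seqs:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])  # no seed is close enough: start a new OTU
    return otus


if __name__ == "__main__":
    seed = "A" * 100
    near = "A" * 98 + "CC"        # 98% identical to the seed
    far = "A" * 90 + "C" * 10     # 90% identical to the seed
    for i, otu in enumerate(greedy_otus([seed, near, far])):
        print(f"OTU {i}: {len(otu)} sequence(s)")
```

Note that the result of a greedy scheme like this depends on the input order, which is one reason why the choice of threshold and clustering method matters when comparing studies.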

Page 12: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Chimeras: Artificial sequences formed during PCR amplification. The majority of them are believed to arise from incomplete extension. During subsequent cycles of PCR, a partially extended strand can bind to a template derived from a different but similar sequence. This then acts as a primer that is extended to form a chimeric sequence (Smith et al. 2010, Thompson et al., 2002, Meyerhans et al., 1990, Judo et al., 1998, Odelberg, 1995). A chimeric template is created during one round, then amplified by subsequent rounds to produce chimeric amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.
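The "prefix from one template, suffix from another" structure described above can be illustrated with a toy reference-based check. This is a sketch under strong assumptions (query and both candidate parents are aligned and of equal length, and only two parents are considered); it is not the method of any published chimera detector, and the function names and the `min_gain` cutoff are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_two_parent_split(query: str, left: str, right: str) -> tuple[int, int]:
    """Best model 'query[:bp] comes from left, query[bp:] comes from right'.

    Returns (breakpoint, mismatches). Breakpoint 0 means the whole query
    is explained by `right` alone.
    """
    best = (0, mismatches(query, right))
    for bp in range(1, len(query) + 1):
        cost = mismatches(query[:bp], left[:bp]) + mismatches(query[bp:], right[bp:])
        if cost < best[1]:
            best = (bp, cost)
    return best


def looks_chimeric(query: str, parent_a: str, parent_b: str, min_gain: int = 3) -> bool:
    """Flag the query if a two-parent model beats the best single parent by >= min_gain.

    min_gain is an arbitrary illustrative cutoff, not a recommended value.
    """
    single = min(mismatches(query, parent_a), mismatches(query, parent_b))
    two_parent = min(
        best_two_parent_split(query, parent_a, parent_b)[1],
        best_two_parent_split(query, parent_b, parent_a)[1],
    )
    return single - two_parent >= min_gain


if __name__ == "__main__":
    parent_a = "A" * 20
    parent_b = "C" * 20
    chimera = parent_a[:10] + parent_b[10:]  # incomplete extension, then template switch
    print(best_two_parent_split(chimera, parent_a, parent_b))
    print(looks_chimeric(chimera, parent_a, parent_b))
```

In real data the parents are similar rather than completely different, so the gain from a two-parent model is small, which is why chimeras are hard to distinguish from genuine sequences.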

Terminology

Page 13: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Alpha diversity: the diversity within a particular area or ecosystem; expressed by the number of species (i.e., species richness) in that ecosystem, or by one or more diversity indices.

Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of species change between the ecosystems.

Gamma diversity: a measure of the overall diversity within a large region. Geographic-scale species diversity according to Hunter (2002:448).
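As a rough illustration of the three levels, the sketch below computes richness and the Shannon index (two common alpha diversity measures), Bray-Curtis dissimilarity (one of several possible beta diversity measures) and pooled richness as a simple stand-in for gamma diversity. The choice of indices and all function names are assumptions made for this example; the samples are plain dictionaries of hypothetical OTU counts.

```python
import math


def richness(counts: dict[str, int]) -> int:
    """Alpha diversity as the number of taxa observed in one sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts: dict[str, int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total


def pooled_richness(samples: list[dict[str, int]]) -> int:
    """Gamma diversity approximated as the richness of all samples pooled together."""
    pooled: dict[str, int] = {}
    for sample in samples:
        for taxon, count in sample.items():
            pooled[taxon] = pooled.get(taxon, 0) + count
    return richness(pooled)


if __name__ == "__main__":
    site_1 = {"otu1": 50, "otu2": 50}   # made-up counts for illustration
    site_2 = {"otu2": 50, "otu3": 50}
    print("alpha (richness):", richness(site_1), richness(site_2))
    print("alpha (Shannon): ", round(shannon(site_1), 4))
    print("beta (Bray-Curtis):", bray_curtis(site_1, site_2))
    print("gamma (pooled richness):", pooled_richness([site_1, site_2]))
```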

Terminology

Page 14: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Rarefaction: the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. A rarefaction curve plots the number of species as a function of the number of samples.

Terminology
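A rarefaction curve can be approximated by repeated random subsampling. The sketch below assumes the input is a dictionary of hypothetical per-OTU read counts and that each read counts as one sampled individual; `rarefaction_curve`, `n_iter` and `seed` are names chosen for this example, not part of any tool covered here.

```python
import random


def rarefaction_curve(
    counts: dict[str, int],
    depths: list[int],
    n_iter: int = 100,
    seed: int = 0,
) -> list[tuple[int, float]]:
    """Mean number of distinct OTUs seen when drawing `depth` reads without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Expand the counts into one entry per read, e.g. {"a": 2} -> ["a", "a"].
    pool = [otu for otu, count in counts.items() for _ in range(count)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError(f"depth {depth} exceeds the {len(pool)} available reads")
        observed = [len(set(rng.sample(pool, depth))) for _ in range(n_iter)]
        curve.append((depth, sum(observed) / n_iter))
    return curve


if __name__ == "__main__":
    sample = {"otu_a": 10, "otu_b": 5, "otu_c": 1}  # made-up counts
    for depth, mean_richness in rarefaction_curve(sample, [1, 4, 8, 16]):
        print(depth, round(mean_richness, 2))
```

A curve that flattens out as depth increases suggests that further sampling would reveal few additional taxa.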

Page 15: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Computational APPROACHES & TOOLS

Page 16: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Approaches & Tools

Page 17: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Approaches & Tools

Page 18: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Approaches & Tools

Page 19: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Approaches & Tools

Page 20: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Whole Genome SHOTGUN

Page 21: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Whole Genome Shotgun

Page 22: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

WGS Workflow

Page 23: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

WGS Workflow

Page 24: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

WGS Workflow

Page 25: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

WGS Workflow

Page 26: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Examples of WGS Tools

Page 27: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Examples of WGS Tools

Page 28: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Analysis of 16S/ITS Community Surveys

Page 29: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

16S/ITS community surveys

Page 30: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

16S/ITS issues

Page 31: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

16S/ITS workflow

Page 32: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

16S/ITS workflow

Page 33: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

16S/ITS workflow

Page 34: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

16S/ITS workflow

Page 35: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Some recommended Tools

Page 36: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Some (recommended) Tools

mothur

MEGAN

Page 37: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

MEGAN

2007 →

2011 →

...

...

2012 →

Page 38: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

MEGAN 4 for 16S rRNA

Page 39: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

MEGAN 4 for 16S rRNA

Page 40: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

mothur

2009 →

Page 41: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

mothur

2009 →

Page 42: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

QIIME

Page 43: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Integrative Tools/Platforms

Page 44: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

AXIOME

Page 45: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

AXIOME

Page 46: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

AXIOME

Page 47: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

http://www.edgebio.com

Page 48: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

http://www.edgebio.com

http://clovr.org

Page 49: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

Page 50: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

Page 51: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

Page 52: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

Page 53: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

CloVR

Page 54: Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

Ferran Briansó, MGTraining 26/08/2013

Thanks for your attention

[email protected]

An Introduction to Metagenomics Data Analysis

more info at http://ueb.vhir.org/MGT