esa 2014 qiime

Community Profiling via QIIME Dorota Porazinska and Zech Xu University of Colorado Boulder, CO

Upload: zech-xu

Post on 20-Jun-2015


Page 1: Esa 2014 qiime

Community Profiling via QIIME

Dorota Porazinska and Zech Xu

University of Colorado Boulder, CO

 

Page 2: Esa 2014 qiime

File Download

• View slides at:
  – http://goo.gl/4duXII
• Raw files:
  – https://app.box.com/s/kwzjd1go2g8cmic59xcd
  – Extract it: tar zxf crawford_mice.tar.gz
• View IPython Notebook
  – http://nbviewer.ipython.org/gist/RNAer/d8e7cbd7b68a273d2269
  – Also inside the downloaded files (requires IPython to open it)
• Processed file:
  – https://app.box.com/s/3a6gvuyn8crjamx7uqte
  – Run:
    mv output.tar.gz crawford_mice
    tar zxf output.tar.gz

Page 3: Esa 2014 qiime

Sequencing cost getting cheaper

http://goo.gl/rWW1Ay

Page 4: Esa 2014 qiime

Tsunami  of  sequence  data  

???

Page 5: Esa 2014 qiime

1st vs. NGS technologies

http://www.patrickwardphd.com/wp-content/uploads/2012/05/sprinkler-kids-l.jpg
http://1000awesomethings.com/2011/06/21/218-drinking-from-the-hose/

Page 11: Esa 2014 qiime

A classic microbial ecology study

Bacterial Community Variation in Human Body Habitats Across Space and Time, Costello et al., Science 2009

Modified from Hamady et al., Genome Research, 2009

Page 12: Esa 2014 qiime

Datasets with billions of sequences:

• Human Microbiome Project: largest characterization of the microbiome of healthy individuals
  – NIH sponsored, $185 million project
  – Samples from 300 adults and 18 body sites
  – Raw data: ~232 GB

Page 13: Esa 2014 qiime

 Earth  Microbiome  Project  

Page 14: Esa 2014 qiime
Page 15: Esa 2014 qiime

Coursera  Course  

https://www.coursera.org/course/microbiome

Page 16: Esa 2014 qiime

… accumulating data

[Figure labels, added one by one across slides 16 to 23: healthy individual traveling from the US to Bangladesh; relatives of Crohn's disease patients; patients; Global gut (USA, Venezuela, Malawi); HMP]

… so what can we tell from all this work?

Page 24: Esa 2014 qiime

http://qiime.org
http://forum.qiime.org
http://blog.qiime.org

Page 25: Esa 2014 qiime

Graphical  User  Interface  

Page 26: Esa 2014 qiime

Command  line  

Page 27: Esa 2014 qiime

Perform identical operations

Page 28: Esa 2014 qiime

Paths (absolute)

/Users/yoshiki/evident-data/hmp-v13_arare/alpha_div
$HOME/evident-data/hmp-v13_arare/alpha_div
~/evident-data/hmp-v13/alpha_div

A slash at the beginning of a path denotes it as an absolute path, i.e. one that starts from the base of your hard drive. $HOME and ~ both expand to your home directory, so paths that begin with them are absolute as well.

Page 29: Esa 2014 qiime

Paths (relative)

evident-data/hmp-v13_arare/alpha_div

Relative paths, on the other hand, are not preceded by a slash; they are interpreted starting from the current working directory.
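The distinction can be checked in a few lines of Python. This is only a sketch: the helper names `describe` and `resolve` are made up for illustration, and the home directory is passed in by hand (assumed here to be /Users/yoshiki, as in the slide) so the result does not depend on the machine running it.

```python
from pathlib import PurePosixPath


def describe(path_text, home="/Users/yoshiki"):
    """Return 'absolute' or 'relative' for a POSIX-style path.

    The ~ and $HOME shorthands are expanded by hand to the assumed home
    directory before the check, mimicking what a shell would do.
    """
    if path_text.startswith("~/"):
        path_text = home + path_text[1:]
    elif path_text.startswith("$HOME/"):
        path_text = home + path_text[len("$HOME"):]
    return "absolute" if PurePosixPath(path_text).is_absolute() else "relative"


def resolve(path_text, cwd):
    """Interpret a relative path starting from the working directory cwd."""
    return str(PurePosixPath(cwd) / path_text)


print(describe("/Users/yoshiki/evident-data/hmp-v13_arare/alpha_div"))
print(describe("evident-data/hmp-v13_arare/alpha_div"))
print(resolve("evident-data/hmp-v13_arare/alpha_div", "/Users/yoshiki"))
```

The same relative path points to different places depending on the directory you run a script from, which is why a "file not found" error from a QIIME script is often a path problem.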

Page 30: Esa 2014 qiime

QIIME  

Page 31: Esa 2014 qiime

QIIME Structure

● Integrates other software
● Set of scripts to perform certain functions
● Allows an easy workflow
● Keys, wallet, phone: print_qiime_config.py

 

Page 32: Esa 2014 qiime

QIIME software dependencies

[data-lanemask] [data-core] [python] [setuptools] [MySQL-python] [SQLAlchemy] [pycogent] [pynast] [numpy] [matplotlib] [mpi4py] [lxml] [sphinx] [raxml] [fasttree] [cdbtools] [chimeraslayer] [cdhit] [rdpclassifier] [blast] [muscle] [infernal] [cytoscape] [clearcut] [mothur] [uclust] [r] [ampliconnoise] [vienna] [pprospector]

Page 33: Esa 2014 qiime

Script types

• Single task
  – One step
  – Most of them
• Workflows
  – Multiple scripts in one
  – Uses a log file
  – Indicated in the script description

Page 34: Esa 2014 qiime

QIIME commands

• Get help with the index site: http://qiime.org/genindex.html
• Get help with the -h option: pick_otus.py -h
• Command names are self-explanatory
  – Filtering: filter_fasta.py, filter_otus_by_sample.py, filter_distance_matrix.py
  – Sorting: sort_otu_table.py

Page 35: Esa 2014 qiime

Getting help

http://qiime.org/genindex.html

Page 36: Esa 2014 qiime

These options are required; without them the script will not function correctly.

These arguments are optional: you can use them or not. Some default values are explained here.

Page 37: Esa 2014 qiime

QIIME  

• The code is tested (properly)
• The documentation is updated constantly based on users' suggestions
• The help in the QIIME forum has a collaborative spirit (developers & users sharing their research experiences)

Page 38: Esa 2014 qiime

print_qiime_config.py  

Page 39: Esa 2014 qiime

QIIME  

Page 40: Esa 2014 qiime

QIIME workflow overview (www.QIIME.org)

Upstream analyses (currently supported for marker-gene data only)

• Inputs
  – Sequencing output (454, Illumina, Sanger): fastq, fasta, qual, or sff/trace files
  – Metadata: mapping file
• Pre-processing: e.g., remove primer(s), demultiplex, quality filter
• Denoise 454 data: PyroNoise, Denoiser
• Pick OTUs and representative sequences
  – Reference based: BLAST, UCLUST, USEARCH
  – De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
• Assign taxonomy: BLAST, RDP Classifier
• Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
• Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
• Build 'OTU table', i.e., sample by observation matrix
• Database submission (in development)

Downstream analyses (currently supported for general sample by observation data)

• Inputs
  – OTU (or other sample by observation) table
  – Phylogenetic tree: evolutionary relationship between OTUs
• α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
• β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
• Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering

Legend: required step or input; optional step or input

Page 41: Esa 2014 qiime

QC and split libraries

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 42: Esa 2014 qiime

Building an OTU table

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 43: Esa 2014 qiime

Alpha and Beta diversity

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 44: Esa 2014 qiime

Visualizations

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 45: Esa 2014 qiime

QC and split libraries

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 46: Esa 2014 qiime

Data  

Sequences  are  in  FASTA  format  

Page 47: Esa 2014 qiime

Data  

•  Quality  scores  are  in  the  .qual  file,  similar  to  FASTA  
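As a rough illustration of how the two files mirror each other, the sketch below parses a FASTA text and a matching .qual text with the same record reader. The function names and the tiny example records are invented for this illustration; they are not QIIME code or data from the tutorial files.

```python
def parse_records(lines):
    """Yield (header, body_lines) pairs from FASTA-style text.

    A record starts at a line beginning with '>' and runs until the next one.
    """
    header, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line[1:], []
        else:
            body.append(line)
    if header is not None:
        yield header, body


def read_fasta(text):
    """Map each sequence identifier to its sequence (body lines joined)."""
    return {h.split()[0]: "".join(b) for h, b in parse_records(text.splitlines())}


def read_qual(text):
    """Map each sequence identifier to its list of integer quality scores."""
    return {
        h.split()[0]: [int(q) for row in b for q in row.split()]
        for h, b in parse_records(text.splitlines())
    }


# Made-up example: the .qual file reuses the FASTA headers, with one
# space-separated score per base instead of the bases themselves.
fasta_text = ">read1 example\nCTGGGCCGTG\nTCTCAG\n>read2\nTTGGACC\n"
qual_text = (
    ">read1 example\n37 37 36 35 30 28 25 25 24 20\n18 18 17 15 12 10\n"
    ">read2\n40 40 39 38 30 22 15\n"
)

seqs = read_fasta(fasta_text)
quals = read_qual(qual_text)
for name in seqs:
    print(name, len(seqs[name]), "bases,", len(quals[name]), "scores")
```

Each read should have exactly as many quality scores as bases; a mismatch usually means the two files do not belong together.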

Page 48: Esa 2014 qiime

Metadata  (mapping  file)  

Page 49: Esa 2014 qiime

validate_mapping_file.py  

Page 50: Esa 2014 qiime

Split libraries

• Demultiplex
• Quality trim
• Quality filter

split_libraries.py
http://qiime.org/scripts/split_libraries.html

Output files:
  – seqs.fna: demultiplexed sequences
  – histograms.txt: histogram of read lengths
  – split_library_log.txt: detailed information about the demultiplexing and quality of reads

Page 51: Esa 2014 qiime

Error-correcting codes allow multiplex sequencing

Micah Hamady, et al., Nature Methods, 2008. Error-correcting barcodes for pyrosequencing hundreds of samples in multiplex.

>GCACCTGAGGACAGGCATGAGGAA…
>GCACCTGAGGACAGGGGAGGAGGA…
>TCACATGAACCTAGGCAGGACGAA…
>CTACCGGAGGACAGGCATGAGGAT…
>TCACATGAACCTAGGCAGGAGGAA…
>GCACCTGAGGACACGCAGGACGAC…
>CTACCGGAGGACAGGCAGGAGGAA…
>CTACCGGAGGACACACAGGAGGAA…
>GAACCTTCACATAGGCAGGAGGAT…
>TCACATGAACCTAGGGGCAAGGAA…
>GCACCTGAGGACAGGCAGGAGGAA…

>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATGCGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATACTGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGCAGTTATCCCGGACACATGGGCTAGG
>PC.354_3 FLP3FBN01EEWKD
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGGAACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCGGAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCCCGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT
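The idea behind demultiplexing can be sketched as follows. This is a deliberately simplified illustration and not what split_libraries.py does internally: it assumes the barcode is the first 8 bases of each read, matches barcodes exactly (no error correction), and uses made-up sample names; the function name `demultiplex` is hypothetical.

```python
def demultiplex(reads, barcode_to_sample, barcode_length):
    """Assign reads to samples by an exact match on the leading barcode.

    Returns (labelled, unassigned): labelled is a list of (label, sequence)
    pairs with the barcode removed, where the label is the sample name plus a
    running number; unassigned counts reads whose barcode is unknown.
    """
    labelled = []
    unassigned = 0
    for read in reads:
        barcode = read[:barcode_length]
        remainder = read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned += 1
            continue
        labelled.append((f"{sample}_{len(labelled) + 1}", remainder))
    return labelled, unassigned


# Hypothetical barcode-to-sample table (in QIIME this comes from the mapping file).
barcodes = {"GCACCTGA": "sample.A", "TCACATGA": "sample.B"}
reads = [
    "GCACCTGAGGACAGGCATGAGGAA",
    "TCACATGAACCTAGGCAGGACGAA",
    "CTACCGGAGGACAGGCATGAGGAT",  # barcode not in the table
    "AAAAAAAAGGACAGG",           # barcode not in the table
]

labelled, unassigned = demultiplex(reads, barcodes, barcode_length=8)
for label, seq in labelled:
    print(f">{label}\n{seq}")
print("unassigned reads:", unassigned)
```

Error-correcting barcodes go one step further than this exact lookup: a barcode with a small number of sequencing errors can still be traced back to the sample it was meant to encode.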

Page 52: Esa 2014 qiime

split_libraries.py  

• seqs.fna: demultiplexed sequences

Page 53: Esa 2014 qiime

Building an OTU table

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 54: Esa 2014 qiime

OTU Picking - “de-novo”

• Pros
  – Vast majority of reads are clustered
  – No reference database bias
• Cons
  – Speed; not easily parallelizable
  – Erroneous reads get clustered

[Figure: experimental sequences pass through a clustering algorithm, giving clustered sequences grouped into OTUs (OTU1, OTU2, OTU3)]
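A toy version of de novo clustering is sketched below. It is a simple greedy, centroid-based scheme written for illustration only; it is not UCLUST or any other tool named in these slides, and the position-by-position `identity` function is an assumption that only works for reads of equal length.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_de_novo(seqs, threshold):
    """Greedy clustering: each read joins the first centroid it matches at or
    above the threshold, otherwise it becomes the centroid of a new OTU."""
    centroids = []
    clusters = {}
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[f"OTU{i + 1}"].append(seq)
                break
        else:
            centroids.append(seq)
            clusters[f"OTU{len(centroids)}"] = [seq]
    return clusters


# The six example reads shown on the slide.
reads = [
    "CTGGGCCGTGTCTCAGTCCCAA",
    "TTGGAAGATGTCTCAGTTCCAG",
    "TTGGGCCGTATGTCAGTCCCTA",
    "CTGGGCCGTGTCTCAGTCCCAA",
    "TTGGAAGATGTCTCAGTTCCAG",
    "TTGGGCCGTATGTCAGTCCCTA",
]

clusters = cluster_de_novo(reads, threshold=0.97)
for otu, members in clusters.items():
    print(otu, len(members))
```

Because every read has to be compared with the centroids found so far, the work depends on the order and number of reads, which hints at why de novo picking is slow and hard to parallelize.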

Page 55: Esa 2014 qiime

OTU Picking - “closed-reference”

• Pros
  – Reference database is a quality filter
  – Speed; easily parallelizable
• Cons
  – No new OTUs can be observed
  – Reference database bias

[Figure: experimental sequences are compared against reference sequences and split into sequences that hit a reference, which define the OTUs, and sequences that failed to hit]
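Closed-reference picking can be sketched in the same toy style. Again this is an illustration under simplifying assumptions (equal-length reads, position-by-position identity, a two-sequence reference with invented identifiers), not the search that QIIME runs.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_closed_reference(reads, reference, threshold):
    """Assign each read to its best-matching reference sequence.

    Returns (hits, failures): hits maps a reference identifier to the reads
    assigned to it; failures lists reads with no match at the threshold.
    """
    hits, failures = {}, []
    for read in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            hits.setdefault(best_id, []).append(read)
        else:
            failures.append(read)
    return hits, failures


# Hypothetical reference identifiers; the sequences are taken from the slide.
reference = {
    "ref1": "CTGGGCCGTGTCTCAGTCCCAA",
    "ref2": "TTGGAAGATGTCTCAGTTCCAG",
}
reads = [
    "CTGGGCCGTGTCTCAGTCCCAA",
    "TTGGAAGATGTCTCAGTTCCAG",
    "TTGGGCCGTATGTCAGTCCCTA",
    "CTGGGCCGTGTCTCAGTCCCAA",
    "TTGGAAGATGTCTCAGTTCCAG",
    "TTGGGCCGTATGTCAGTCCCTA",
]

hits, failures = pick_closed_reference(reads, reference, threshold=0.97)
print({ref_id: len(members) for ref_id, members in hits.items()})
print("failed to hit:", len(failures))
```

Each read is handled independently of the others, so the reads can be split across processors, and the third sequence, which has no close reference, never becomes an OTU. Both points match the pros and cons listed on the slide.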

Page 56: Esa 2014 qiime

Reference  database  

Page 57: Esa 2014 qiime

Percentage of reads that do not hit the reference collection, by environment type.

Page 58: Esa 2014 qiime

Other  databases  

• http://www.arb-silva.de
  http://qiime.org/home_static/dataFiles.html
• http://ssu-rrna.org

Page 59: Esa 2014 qiime

OTU Picking - “open-reference”

• Pros
  – Best of both worlds
• Cons
  – Downsides of de-novo

[Figure: experimental sequences are first compared against reference sequences; sequences that hit a reference define OTU1, OTU2, OTU3, and sequences that failed to hit go through a clustering algorithm to give OTU4, OTU5, OTU6]

Page 60: Esa 2014 qiime

pick_open_reference_otus.py

• http://qiime.org/scripts/pick_open_reference_otus.html
• Workflow script, performs all steps through building an OTU table (see the log file)
  – pick_otus.py: determine the OTU clusters
  – pick_rep_set.py: pick the representative sequence for each OTU cluster
  – align_seqs.py: align the sequences to a template or other reference alignment
  – assign_taxonomy.py: assign a taxonomy to the representative sequences
  – filter_alignment.py: remove positions that are not phylogenetically informative
  – make_phylogeny.py: construct a phylogeny from an alignment
  – make_otu_table.py: construct the actual OTU table object

Page 61: Esa 2014 qiime

QIIME parameters

• http://qiime.org/documentation/qiime_parameters_files.html
• Modify the default behavior of a workflow script.
• Blank lines and those starting with ‘#’ are ignored
• Format
  – script:parameter value
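To make the format concrete, here is a small reader for text in that shape. It is a sketch of the rules stated on the slide (blank lines and '#' lines ignored, one "script:parameter value" entry per line), not QIIME's own parser, and the two entries in the example are illustrative; check the documentation page above for the parameter names each script accepts.

```python
def parse_parameters(text):
    """Read 'script:parameter value' lines into a nested dictionary.

    Blank lines and lines starting with '#' are skipped. Values are kept as
    strings; a parameter given without a value is stored as None.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        key = fields[0]
        value = fields[1] if len(fields) > 1 else None
        script, _, parameter = key.partition(":")
        params.setdefault(script, {})[parameter] = value
    return params


# Illustrative parameters file content (assumed entries).
example = """\
# OTU picking settings
pick_otus:similarity 0.97

beta_diversity:metrics bray_curtis,unweighted_unifrac
"""

params = parse_parameters(example)
print(params)
```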

Page 62: Esa 2014 qiime

OTU Table in BIOM format

• Optimized and efficient data abstraction
• Can be used with many types of data, but to make it Excel 'readable' use: biom convert

Page 63: Esa 2014 qiime

biom convert

• http://biom-format.org
• Converts the BIOM format OTU table to an Excel readable format
• biom convert -i otu_table_mc2_w_tax.biom -o otu_table.txt -b

Page 64: Esa 2014 qiime

OTU table sample identifiers

Page 65: Esa 2014 qiime

Taxonomic  Assignment  

• Kingdom
• Phylum
• Class
• Order
• Family
• Genus
• Species

Sequence 16S gene and compare to 16S database with taxonomic assignments

Page 66: Esa 2014 qiime

Taxonomic Assignment using e.g. Uclust

[Figure: experimental sequences are matched against reference sequences]

Page 67: Esa 2014 qiime

Biom summary

• Basic statistics on the OTU table
  – Num samples, OTUs, sequences in OTUs
  – Sequences per sample
  – Useful to determine values to use in downstream analyses

Page 68: Esa 2014 qiime

Alpha and Beta diversity

[Figure: QIIME workflow overview diagram, repeated from Page 40]

Page 69: Esa 2014 qiime

How do we describe and compare diversity?

• α Diversity:
  – “How many species (taxa) are in a sample?”
    • e.g. 6 colors in A and 6 in B
    • Are polluted environments less diverse than pristine?
• β Diversity:
  – “How many species are shared between samples?”
    • e.g. 2 shared colors between A and B
    • Do the microbiota differ among different disease states?

[Figure: example samples A and B]

Page 70: Esa 2014 qiime

Qualitative vs. Quantitative measures

• Qualitative: Considers presence/absence
  – α: How many species are in a sample?
    • e.g.: 6 species (colors) in both A and B.
  – β: How many species are shared between samples?
    • e.g.: A and B are identical because the same colors are present in both.
• Quantitative: Considers abundance
  – α: Accounts for distribution:
    • e.g. in B, 6 species are evenly distributed, and thus the community is more diverse than in A, where 1 species dominates over the other 5.
  – β: Samples are considered more similar if their distributions of species are similar.
    • e.g. B and A no longer look identical because of differences in abundance.

[Figure: example samples A and B]

Page 71: Esa 2014 qiime

What is a phylogenetic diversity measure?

• α Diversity:
  – Taxon: “How many species are in a sample?”
  – Phylogenetic: “How much phylogenetic divergence is in a sample?”
    • e.g. B more diverse than A - more divergent colors
• β Diversity:
  – Taxon: “How many species are shared between samples?”
  – Phylogenetic: “How much phylogenetic distance is shared between samples?”
    • only related colors from B are in A

[Figure: example samples A and B]

Page 72: Esa 2014 qiime

UniFrac  distance  matrix  

Page 73: Esa 2014 qiime

core_diversity_analyses.py

• Workflow script
  – filter_samples_from_otu_table.py: filter samples with low sequence count from the table
  – single_rarefaction.py: sample the table at a specified sequencing depth
  – beta_diversity.py: use the sampled table for beta diversity calculation
  – principal_coordinates.py: perform PCoA analysis
  – make_emperor.py: make plots for principal coordinates
  – multiple_rarefactions.py: make multiple subsamplings/rarefactions on an OTU table at various sequencing depths
  – alpha_diversity.py and collate_alpha.py: calculate alpha diversities at those depths and collate them
  – make_rarefaction_plots.py: plot the rarefaction curves
  – summarize_taxa.py and plot_taxa_summary.py: summarize taxa and plot them
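Rarefaction, the step the workflow starts with, amounts to drawing the same number of sequences from every sample. The sketch below shows the idea for one sample at a time; the function name `rarefy`, the fixed random seed, and the example counts are assumptions for illustration, and this is not the code behind single_rarefaction.py.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` sequences without replacement.

    counts maps OTU identifiers to sequence counts. Returns a new dictionary
    with the same keys, or None when the sample has fewer than `depth`
    sequences (such samples are dropped from the rarefied table).
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {otu: 0 for otu in counts}
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


# Made-up sample counts with unequal sequencing depth (14, 9 and 8 sequences).
deep = {"OTU1": 0, "OTU2": 0, "OTU3": 7, "OTU4": 7}
medium = {"OTU1": 4, "OTU2": 4, "OTU3": 1, "OTU4": 0}
shallow = {"OTU1": 4, "OTU2": 4, "OTU3": 0, "OTU4": 0}

for name, sample in [("deep", deep), ("medium", medium), ("shallow", shallow)]:
    print(name, rarefy(sample, depth=9))
```

Choosing the depth is a trade-off: a higher depth keeps more sequences per sample but drops every sample that falls below it, which is why the sequences-per-sample summary of the OTU table is worth checking first.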

Page 74: Esa 2014 qiime

Alpha  diversity  

Basic alpha diversity measure: count the number of OTUs.

Other measures can be:
• phylogenetic (PD)
• estimators (chao1)
• other statistics (evenness)
• …
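The non-phylogenetic measures can be written down in a few lines. The sketch below computes observed OTUs, the bias-corrected form of the Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice, and Pielou's evenness (Shannon entropy divided by its maximum). The function names are chosen for this illustration, the example counts are made up, and PD is left out because it needs a phylogenetic tree.

```python
import math


def observed_otus(counts):
    """Number of OTUs with at least one sequence in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from singletons and doubletons."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon entropy (natural log) of the OTU proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, log(observed OTUs)."""
    s = observed_otus(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


sample = [5, 1, 1, 2, 0]  # made-up counts for five OTUs in one sample
print("observed OTUs:", observed_otus(sample))
print("chao1:", chao1(sample))
print("evenness:", round(pielou_evenness(sample), 3))
```

Observed OTUs is the qualitative measure, while evenness uses the abundances: two samples with the same number of OTUs can differ widely in how evenly the sequences are spread among them.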

Page 75: Esa 2014 qiime

Beta  diversity  

        orange1   orange2   blue1
OTU1    4         4         0
OTU2    4         4         0
OTU3    0         1         7
OTU4    0         0         7
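Using the small table above, two non-phylogenetic distances can be computed by hand or with the sketch below: Bray-Curtis (quantitative, uses abundances) and Jaccard (qualitative, uses presence/absence only). The function names are chosen for this illustration; UniFrac is not shown because it additionally needs a phylogenetic tree.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum of |a_i - b_i| over sum of (a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - shared OTUs / OTUs in either."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, y in enumerate(b) if y > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


# Columns of the table: counts for OTU1..OTU4 in each sample.
samples = {
    "orange1": [4, 4, 0, 0],
    "orange2": [4, 4, 1, 0],
    "blue1": [0, 0, 7, 7],
}

names = list(samples)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        bc = bray_curtis(samples[first], samples[second])
        jd = jaccard(samples[first], samples[second])
        print(f"{first} vs {second}: Bray-Curtis {bc:.3f}, Jaccard {jd:.3f}")
```

The two orange samples come out close to each other and far from blue1 under both measures, but the single OTU3 sequence in orange2 moves the presence/absence distance much more than the abundance-based one.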

Page 76: Esa 2014 qiime

Summarize  Taxa  

• Calculates proportion of taxa per sample, at different taxonomic levels
• summarize_taxa_through_plots.py

Page 77: Esa 2014 qiime
Page 78: Esa 2014 qiime

Taxa  Summarized  by  Category    

Page 79: Esa 2014 qiime

Procrustes Analysis

http://qiime.org/tutorials/procrustes_analysis.html
transform_coordinate_matrices.py
compare_3d_plots.py

Muegge, B. D. et al. Science 332, 970–974 (2011).

Page 80: Esa 2014 qiime
Page 81: Esa 2014 qiime

Statistically Different?

• group_significance.py
  – Parametric
    • G-test
    • ANOVA
    • T-test
  – Non parametric
    • Kruskal-Wallis
    • Mann-Whitney-U
    • Bootstrap Mann-Whitney-U
    • Bootstrap T-test
• compare_categories.py
• make_distance_boxplots.py
• …

 

Page 82: Esa 2014 qiime

Acknowledgements  

Rob Knight, Antonio Gonzalez, Meg Pirrung, Adam Robbins-Pianka, Luke Ursell, Tony Walters, Doug Wendel, Daniel McDonald, Yoshiki Vázquez Baeza, Will Van Treuren, Laura Wegener Parfrey, Kris Mayer

Merete Eggesbo, Jessica Metcalf, Ulla Westermann, Zhenjiang Zech Xu, Jose Navas, Chris Lauber, Matt Gebert, Greg C Humphrey, Hongwei Zhou

Rick Stevens (Argonne), Jack Gilbert (Argonne), Folker Meyer (Argonne), Janet Jansson (LBNL), Jed Fuhrman (USC), Jonathan Eisen (UC Davis), many, many sample donors.

Other collaborators: Noah Fierer (CU, EEB), Jeff Gordon (Wash U), Ruth Ley (Cornell), Peter Turnbaugh (Harvard), Maria Gloria Dominguez (UPR), Catherine Lozupone (CU) ...