microbial bioinformatics
![Page 1: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/1.jpg)
Microbial Bioinformatics
Keith A. Crandall, PhD, FAAS, FLS
Director, Computational Biology Institute
Director, GW Genomics Core
Co-Director, Informatics, Clinical and Translational Science Institute CN
Co-Director, Institute for Biomedical Sciences Genomics and Bioinformatics Program
Professor, Department of Biostatistics and Bioinformatics, GWSPH
Professor, Department of Biological Sciences, CCAS
Research Associate, Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution
![Page 2: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/2.jpg)
16S rRNA Sequencing Timeline
![Page 3: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/3.jpg)
Microbial NGS Amplicon (targeted) sequencing
![Page 4: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/4.jpg)
Microbial NGS Amplicon (targeted) sequencing
• Gold standard for bacteria and archaea (16S rRNA): variable (loops) and conserved (stems) regions
• Fungi (ITS)
• Protozoa (18S rRNA)
[Figure: 16S rRNA]
![Page 5: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/5.jpg)
Microbial NGS
From microbial taxonomic profiles to biological questions
[Figure: Phyla and Genera profiles for samples S1, S2, S3]

| Phyla | Sample1 | Sample2 | Sample3 |
|---|---|---|---|
| Actinobacteria | 18.8 | 7.9 | 8.9 |
| Firmicutes | 44.8 | 21.4 | 38.3 |
| Fusobacteria | 3.4 | 2.2 | 4.8 |
| Proteobacteria | 28.2 | 67.1 | 44.1 |
![Page 6: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/6.jpg)
Microbiome Analyses - Metagenomics
![Page 7: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/7.jpg)
16S - Metataxonomy
![Page 8: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/8.jpg)
16S – Advantages vs Disadvantages?
● Advantages
○ Cost
○ Samples
○ Ease of analysis
○ Reference databases
○ PCR based -> lower starting DNA template
● Disadvantages
○ Only a single locus
○ No functional information
○ Often not discriminatory at the species level – or even genus level
○ No strain differentiation
○ No pathogenicity inferences
○ No drug resistance inferences
![Page 9: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/9.jpg)
16S - Cost
![Page 10: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/10.jpg)
Approach
![Page 11: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/11.jpg)
What does an Illumina library need to look like?
5’ p5 - Index2 - Rd1 seq primer - 16S amplicon insert - Rd2 seq primer - Index1 - p7 3’
[complementary strand: 3’ to 5’]
![Page 12: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/12.jpg)
Making amplicon libraries: 2-step PCR edition
[Diagram: 16S gene template (5’ to 3’ on both strands) with one primer carrying the Rd1 primer overhang and the other carrying the Rd2 primer overhang]
*DNA is synthesized in the 5’ to 3’ direction
![Page 13: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/13.jpg)
Making amplicon libraries: 2-step PCR edition
[Diagram: PCR amplicon flanked by the Rd1 primer overhang and the Rd2 primer overhang, labeled "Product"]
*DNA is synthesized in the 5’ to 3’ direction
![Page 14: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/14.jpg)
Making amplicon libraries: 2-step PCR edition
[Diagram: PCR amplicon with Rd1 and Rd2 primer overhangs; second-round primers add p5 + Index2 at one end and Index1 + p7 at the other]
*DNA is synthesized in the 5’ to 3’ direction
![Page 15: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/15.jpg)
5’ p5 - Index2 - Rd1 seq primer - 16S amplicon insert - Rd2 seq primer - Index1 - p7 3’
[complementary strand: 3’ to 5’]
![Page 16: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/16.jpg)
Making amplicon libraries: 1-step PCR edition
[Diagram: 16S gene template amplified with full-length primers: p5 + Index2 + misc. on one side, misc. + Index1 + p7 on the other]
*DNA is synthesized in the 5’ to 3’ direction
![Page 17: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/17.jpg)
5’ p5 - Index2 - Misc. seqs - 16S amplicon insert - Misc. seqs - Index1 - p7 3’
[complementary strand: 3’ to 5’]
Misc. seq + gene-specific primer region used as custom sequencing primer
![Page 18: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/18.jpg)
One step PCR Primer Structure
SB501 - Forward primer option
AATGATACGGCGACCACCGAGATCTACACCTACTATATATGGTAATTGTGTGCCAGCMGCCGCGGTAA
● Adapter - Allows binding to the flow cell
● SB501 - Barcoded Primer - Different for every primer
● Pad - Boost the primer melting temperature
● Link - Anticomplementary to known sequences
● V4f - 16S V4 region forward primer
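The five parts named in the legend can be checked against the primer string with a short sketch. The segment boundaries below are an assumption: the transcript lost the slide's color coding, so the split follows the commonly published layout for this dual-index V4 primer design and should be confirmed against the original slide. The `expand_degenerate` helper is hypothetical and only handles the one IUPAC code (`M` = A or C) that appears in this primer.

```python
# Full SB501 forward primer, copied from the slide.
PRIMER = "AATGATACGGCGACCACCGAGATCTACACCTACTATATATGGTAATTGTGTGCCAGCMGCCGCGGTAA"

# ASSUMPTION: segment boundaries are not recoverable from the transcript;
# this split is an educated guess and should be checked against the slide.
SEGMENTS = [
    ("Adapter", "AATGATACGGCGACCACCGAGATCTACAC"),  # binds the flow cell
    ("SB501 index", "CTACTATA"),                   # barcode, differs per primer
    ("Pad", "TATGGTAATT"),                         # raises melting temperature
    ("Link", "GT"),                                # short linker
    ("V4f", "GTGCCAGCMGCCGCGGTAA"),                # 16S V4 forward primer
]

# Only the degenerate code used in this primer; A, C, G, T map to themselves.
IUPAC = {"M": "AC"}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate primer."""
    variants = [""]
    for base in seq:
        choices = IUPAC.get(base, base)
        variants = [v + c for v in variants for c in choices]
    return variants


# The segments must tile the primer exactly, with nothing left over.
assert "".join(seq for _, seq in SEGMENTS) == PRIMER

for name, seq in SEGMENTS:
    print(f"{name:12s} {len(seq):3d} nt  {seq}")
print(expand_degenerate(SEGMENTS[-1][1]))
```

Keeping the segments as an ordered list makes it easy to swap in a different index for another barcoded primer while reusing the other parts.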
![Page 19: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/19.jpg)
How many PCR steps?
One-step PCR
● PROS
○ Fewer steps
○ Less optimization
○ Less possibility for contamination
● CONS
○ Fewer options for optimization
○ Less sensitive
○ Expensive/less stable primers
Two-step PCR
● PROS
○ Well-established
○ Highly sensitive
○ Cheaper primers
● CONS
○ Possibility of amplicon contamination
○ Higher possibility for user error/contamination
○ More steps
○ More optimization
![Page 20: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/20.jpg)
Don’t Trust Your Data
![Page 21: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/21.jpg)
Tools & Databases
● mothur (mothur.org) – full 16S analysis suite
● QIIME (qiime.org) – full 16S analysis suite
● MG-RAST server (metagenomics.anl.gov) – 16S and WGS
● PathoScope (GitHub) – 16S and WGS
● CloVR (clovr.org) – 16S and WGS
● Animalcules (R Shiny) – downstream hypothesis testing
● DADA2 – 16S analysis suite, etc.

● Ribosomal Database Project (RDP)
● GreenGenes
● SILVA (arb-silva.de)
![Page 22: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/22.jpg)
Basic Analysis Steps
● Remove all those adapters you put on for sequencing!
● Remove unwanted reads and sequencing and PCR error
○ Read length, error score (remember fastq!)
● Assemble paired ends to make a contig
● Map contigs against a reference library
● Call taxa
● Characterize Diversity (alpha
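The read-filtering step (read length and error score from the fastq quality line) can be sketched in plain Python. This is a minimal illustration, not what any particular pipeline does: the three reads are made up, the Phred+33 encoding is assumed, and the `min_len` and `max_ee` thresholds are arbitrary values chosen for the example. "Expected errors" here means the sum of per-base error probabilities, which is one common way to summarize read quality.

```python
# Made-up reads for illustration: one good, one low quality, one too short.
EXAMPLE_FASTQ = """\
@read1
ACGTACGTACGT
+
IIIIIIIIIIII
@read2
ACGTACGTACGT
+
############
@read3
ACGT
+
IIII
"""


def parse_fastq(text):
    """Parse simple 4-line FASTQ records into (name, sequence, quality)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        records.append((header.lstrip("@"), seq, qual))
    return records


def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities (ASSUMES Phred+33 encoding)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in qual)


def keep_read(seq, qual, min_len=10, max_ee=1.0):
    """Keep reads that are long enough and have few expected errors."""
    # ASSUMPTION: both thresholds are illustrative, not recommended values.
    return len(seq) >= min_len and expected_errors(qual) <= max_ee


kept = [name for name, seq, qual in parse_fastq(EXAMPLE_FASTQ)
        if keep_read(seq, qual)]
print(kept)
```

In this toy input `read2` fails on quality and `read3` fails on length, so only `read1` passes both filters.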
![Page 23: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/23.jpg)
QIIME2 Workflow
![Page 24: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/24.jpg)
From 16S rRNA fastq files to table of microbial abundance and taxonomy

| #ASV ID | sample1 | sample2 | sample3 | sample4 | sample5 | sample6 | sample7 | sample8 | sample9 | sample10 | taxonomy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ASV1 | 23408 | 7345 | 38 | 1947 | 1066 | 82761 | 2679 | 1681 | 1135 | 1650 | Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Escherichia/Shigella |
| ASV2 | 149 | 174 | 21237 | 2619 | 2344 | 58 | 61 | 26 | 2232 | 60 | Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Klebsiella |
| ASV3 | 68 | 141 | 0 | 0 | 7 | 0 | 0 | 0 | 28 | 18 | Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Proteus |
| ASV4 | 11829 | 14760 | 1586 | 27 | 26 | 2084 | 41 | 1314 | 993 | 103 | Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus |
| ASV5 | 1395 | 0 | 551 | 2895 | 1010 | 1259 | 191 | 39 | 176 | 2003 | Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Aerococcus |
| ASV6 | 0 | 218 | 0 | 0 | 0 | 0 | 0 | 0 | 104 | 0 | Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Klebsiella |
| ASV7 | 353 | 39 | 12 | 58 | 12 | 22 | 37 | 0 | 30 | 17 | Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; Enterococcus |
| ASV8 | 0 | 0 | 2625 | 13431 | 55640 | 67 | 13 | 19 | 2414 | 502 | Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus |
| ASV9 | 0 | 0 | 0 | 5537 | 2332 | 25 | 18 | 20 | 19 | 1133 | Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Pluralibacter |
| ASV10 | 3984 | 0 | 128 | 1538 | 341 | 297 | 94 | 12 | 54 | 1170 | Actinobacteria; Actinobacteria; Actinomycetales; Actinomycetaceae; Actinotignum |
| ASV11 | 74 | 7268 | 0 | 0 | 0 | 0 | 0 | 0 | 129 | 0 | Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; Lactobacillus |
| ASV12 | 56 | 63 | 29 | 23 | 91 | 12 | 38 | 0 | 512 | 648 | Firmicutes; Bacilli; Bacillales; Staphylococcaceae; Staphylococcus |
| ASV13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Citrobacter |
| ASV14 | 0 | 0 | 0 | 17 | 0 | 8 | 7 | 0 | 46 | 0 | Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; Lactobacillus |
| ASV15 | 403 | 0 | 133 | 721 | 288 | 278 | 0 | 0 | 0 | 323 | Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Aerococcus |
| ASV16 | 409 | 0 | 20 | 101 | 0 | 50 | 0 | 0 | 52 | 445 | Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus |
| ASV17 | 374 | 17 | 0 | 0 | 0 | 28 | 17 | 0 | 114 | 16 | Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Lactococcus |
| ASV18 | 0 | 0 | 0 | 0 | 0 | 48 | 0 | 507 | 0 | 0 | Bacteroidetes; Bacteroidia; Bacteroidales; Prevotellaceae; Prevotella_7 |
| ASV19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; Gardnerella |
| ASV20 | 50 | 0 | 0 | 22 | 0 | 0 | 69 | 0 | 183 | 26 | Firmicutes; Negativicutes; Selenomonadales; Veillonellaceae; Veillonella |
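A count table like this one is usually summarized before any statistics are run. The sketch below takes five rows and three samples copied from the table, collapses ASVs that share a genus, converts counts to relative abundance, and computes the Shannon index as one example of alpha diversity. The function names are hypothetical, and using only a subset of rows is a simplification for brevity, so the resulting numbers do not describe the full samples.

```python
import math
from collections import defaultdict

SAMPLES = ["sample1", "sample2", "sample3"]

# Subset of the ASV table: (counts for sample1..sample3, genus).
ASV_TABLE = {
    "ASV1": ([23408, 7345, 38], "Escherichia/Shigella"),
    "ASV2": ([149, 174, 21237], "Klebsiella"),
    "ASV4": ([11829, 14760, 1586], "Streptococcus"),
    "ASV6": ([0, 218, 0], "Klebsiella"),
    "ASV8": ([0, 0, 2625], "Streptococcus"),
}


def collapse_by_genus(table):
    """Sum the counts of all ASVs assigned to the same genus."""
    totals = defaultdict(lambda: [0] * len(SAMPLES))
    for counts, genus in table.values():
        for i, count in enumerate(counts):
            totals[genus][i] += count
    return dict(totals)


def relative_abundance(counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def shannon(counts):
    """Shannon diversity index (natural log) for one sample's counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


genus_table = collapse_by_genus(ASV_TABLE)
for i, sample in enumerate(SAMPLES):
    column = [counts[i] for counts in genus_table.values()]
    proportions = relative_abundance(column)
    summary = ", ".join(f"{g}={p:.3f}" for g, p in zip(genus_table, proportions))
    print(f"{sample}: {summary}; Shannon={shannon(column):.3f}")
```

Working per sample (per column) matters because sequencing depth differs between samples, so raw counts are not directly comparable.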
![Page 25: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/25.jpg)
Power Considerations for Experimental Design
![Page 26: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/26.jpg)
Experimental Considerations – Sample Storage
![Page 27: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/27.jpg)
Experimental Considerations – Extraction Method
![Page 28: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/28.jpg)
Bias From Analysis Approaches
● OTUs vs ASVs (operational taxonomic units, amplicon sequence variants)
● Bioinformatics pipeline
● Reference database
● Lots to worry about!
![Page 29: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/29.jpg)
Operational Taxonomic Units
● Why no species?
● Same 16S, different genomes
● Same species, different 16S
● OTUs are clusters of sequences that are within a small x% genetic distance from one another (typically 3%)
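The idea of grouping sequences within 3% distance can be illustrated with a toy greedy clustering. This is a simplified sketch and not the algorithm any named tool uses: it assumes the sequences are already aligned and of equal length, measures identity as the fraction of matching positions, and assigns each sequence to the first cluster whose seed it matches at 97% or better. The sequences and the `mutate` helper are made up for the example.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose seed is similar enough."""
    seeds = []  # first sequence placed in each OTU
    otus = []   # members of each OTU, in input order
    for seq in seqs:
        for i, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                otus[i].append(seq)
                break
        else:
            # No existing OTU is close enough, so this sequence seeds a new one.
            seeds.append(seq)
            otus.append([seq])
    return otus


def mutate(seq, positions):
    """Hypothetical helper: substitute a different base at each position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for pos in positions:
        chars[pos] = swap[chars[pos]]
    return "".join(chars)


base = "ACGT" * 25                          # 100 nt toy sequence
near = mutate(base, [3, 50])                # 98% identical to base
far = mutate(base, [0, 10, 20, 30, 40])     # 95% identical to base

clusters = greedy_otus([base, near, far])
print([len(c) for c in clusters])
```

Because the result depends on input order and on which sequence becomes the seed, the same data can yield different OTUs between runs or tools, which is one of the sources of bias this slide points at.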
![Page 30: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/30.jpg)
mothur
● QC
● Cluster sequences with 97% identity
● Form OTUs
● Classify OTUs
● Taxonomy table output
![Page 31: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/31.jpg)
How do you classify reads?
● Align to a reference database
● SILVA is the most popular and has collected data for over 20 years
● >600 million sequences
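Matching a read against a reference database can be sketched as a best-hit lookup. The example below swaps alignment for a simpler stand-in, the fraction of the read's k-mers found in each reference, because a real aligner is too long to show here. The two reference sequences and their labels are invented, and `k` and `min_score` are arbitrary illustrative values, so this shows the shape of the lookup rather than how any specific classifier works.

```python
# ASSUMPTION: toy references with made-up sequences and labels.
REFERENCE = {
    "GenusA": "ACGTACGTTGCAAGCTTAGGCTAACGT",
    "GenusB": "TTGACCGGTATCGATCGGAATTCCGGA",
}


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read, references, k=8, min_score=0.5):
    """Return (label, score) for the reference sharing the most read k-mers."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "unclassified", 0.0
    scores = {
        label: len(read_kmers & kmers(ref, k)) / len(read_kmers)
        for label, ref in references.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        # Too little shared sequence to trust any assignment.
        return "unclassified", scores[best]
    return best, scores[best]


read = REFERENCE["GenusA"][2:22]   # an exact fragment of the first reference
print(classify(read, REFERENCE))
print(classify("C" * 16, REFERENCE))
```

The `unclassified` fallback reflects a practical point: a read that matches nothing well should not be forced into the nearest taxon, and the choice of reference database determines which labels are even possible.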
![Page 32: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/32.jpg)
DADA2 Pipeline - ASVs
● More taxonomic resolution
● ASVs are consistent
Callahan et al. Nature Methods 2016
![Page 33: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/33.jpg)
DADA2 will model sequencing error!
![Page 34: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/34.jpg)
Resolution and Accuracy
Abundance predictions in DADA2 (ASV) are more accurate than with mothur (OTUs)
![Page 35: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/35.jpg)
Summary
● 16S data are informative for a diversity of questions in microbiome research
● They have an extreme cost advantage for analyzing large numbers of samples
● One needs to take care in sample collection, storage, DNA extraction, PCR, data analyses, and reference databases to obtain accurate and replicable results
● There is a wide variety of tools available for QC and taxonomic assignment of 16S data. Then one needs to move to R for further statistical analyses.
![Page 36: Microbial Bioinformatics](https://reader031.vdocument.in/reader031/viewer/2022020917/61c181027dfb145388276415/html5/thumbnails/36.jpg)
Tutorials!!
● QIIME2
● mothur
● DADA2