© 2013 Illumina, Inc. All rights reserved.
Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium,
iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks
of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Introduction to the MiSeq: Technology and Applications
Joseph Aman
Field Applications Scientist
2
Introduction to the MiSeq
Sample preparation kits
– DNA
– RNA
16S Sequencing Overview
Illumina sequencing portfolio update
Overview
6
The aim of the sample prep step is to obtain nucleic
acid fragments with adapters attached on both ends
Sample Prep is Critical for Successful Sequencing
Dual Index Library shown
7
The flow cell
Everything except sample preparation is completed on the flow cell
• Template annealing (1 - 96 samples)
• Template amplification
• Sequencing primer hybridization
• Sequencing-by-synthesis reaction
• Generation of fluorescent signal
8
Cluster Generation
Bind single DNA molecules to surface
Amplify on surface
~1000 molecules per ~1 µm cluster
9
MiSeq: Industry Best Data Quality, Paired-End Reads, High Output, and Flexible Read Length
Read Length | Quality Scores >Q30 | Data Output
1x36 bp     | >90% bases          | 610 Mb
2x25 bp     | >90% bases          | 850 Mb
2x100 bp    | >85% bases          | 3.4 Gb
2x150 bp    | >80% bases          | 5.1 Gb
2x250 bp    | >75% bases          | 8.5 Gb
2x300 bp    | >75% bases          | 15 Gb
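The output figures on this slide can be related to read length with simple arithmetic. The sketch below assumes that output in bases is roughly clusters x reads per cluster x cycles per read; that relationship and the function name `implied_clusters_millions` are assumptions for illustration, not something stated on the slide.

```python
# Back-calculate the number of clusters (in millions) implied by each
# read configuration and data output listed on the slide.
# Assumption: output bases = clusters x reads per cluster x cycles per read.

RUNS = [
    # (reads per cluster, cycles per read, output in bases)
    (1, 36, 610e6),
    (2, 25, 850e6),
    (2, 100, 3.4e9),
    (2, 150, 5.1e9),
    (2, 250, 8.5e9),
    (2, 300, 15e9),
]


def implied_clusters_millions(reads_per_cluster, cycles, output_bases):
    """Return the clusters (in millions) implied by a given output."""
    return output_bases / (reads_per_cluster * cycles) / 1e6


for n_reads, cycles, output in RUNS:
    clusters = implied_clusters_millions(n_reads, cycles, output)
    print(f"{n_reads}x{cycles}: ~{clusters:.1f} M clusters")
```

Under that assumption, the first five configurations all imply about 17 M clusters, and the 2x300 figure implies about 25 M.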
11
Illumina Sequencing Publications
More than 5,200 publications using Illumina’s SBS technology
Greater than 85% of all SRA projects done on MiSeq
[Chart: SRA projects by platform. MiSeq, PGM, GS Jr, Proton: 86%, 8%, 5%, 0.3%. 13,473 MiSeq projects*]
*Projects in NCBI's Sequence Read Archive (SRA) as of 01-02-14
[Chart: Illumina sequencing publications per year, 2007 to 2013]
12
Illumina Sequencing
TruSeq DNA
Nextera Mate Pair
Nextera DNA
Nextera XT
TruSeq Custom Amplicon
TruSeq Amplicon Cancer Panel
Nextera Enrichment Custom/Exome
TruSeq ChIP
TruSeq RNA v2
TruSeq Targeted RNA
TruSeq Stranded mRNA / Total RNA
TruSeq small RNA
Illumina Sample Preparation Portfolio
13
What kind of organism/s am I working with?
– Is there a genome or transcriptome reference? Do I
need to make one?
– What level of ploidy does it have? 1n, 2n, 4n, 8n??
– How big is the genome/region of interest?
– Does it have a balanced base composition?
What kind of samples do I have access to?
– Fresh? Frozen? FFPE? Dried? Ancient?
Environmental?
– Quantity. How much nucleic acid starting material can I
get? Do I only get one biological replicate?
– Quality. Will my starting material be degraded or
contain potential contaminants?
How many samples do I need?
– What number of samples will I need to be confident of
my discovery? What false positive/false negative rates
will I tolerate?
– How much biological/technical replication is prudent?
What are my limitations?
– Time, budget, manpower, skills, tools, access to
instrumentation
– Do I need to operate within a regulatory framework?
Questions to ask yourself
14
Illumina DNA Sequencing Applications Portfolio: The growing family of sample prep
DNA Sample Prep
Kit                 | Summary                                  | Time   | Apps                               | Input   | Indexing      | Quality
TruSeq DNA PCR-Free | Eliminates PCR-induced bias              | ~5 h   | WGRS                               | 1 µg+   | 96            | Best!
TruSeq Nano DNA     | The new gold standard with low input     | ~6 h   | WGRS                               | 100 ng+ | 96            | Best!
Nextera Mate Pair   | Long-insert; gel-free                    | 1.5 d  | De novo, SVs                       | 1 µg+   | 48 (gel-free) | +++
Nextera             | Low-input, fast, MiSeq                   | 1.5 h  | WGRS                               | 50 ng   | 96            | +++
Nextera XT          | Lowest-input, fast, MiSeq, small genomes | 1.5 h  | Amplicons, plasmids, small genomes | 1 ng    | 96            | +++
16
Nextera and Nextera XT Sample Prep Kit Features
Highlights
• Fast sample prep, 90 min.
• Single read or paired-end compatibility
Sample input and indices
• 50 ng (Nextera) to 1 ng (Nextera XT) DNA per sample
• Index up to 96 samples
• Parallel processing of up to 96 samples
Specific considerations
• Construct completed at the PCR step
• Gel-free protocol
• Enzymatic fragmentation
Suitable for:
• Large and small whole genomes
• Now used for all enrichment workflows
• Nextera Rapid Enrichment!
17
Nextera DNA Sample Prep
[Figure: transposons carrying adapter sequences tagment genomic DNA into ~300 bp fragments; reduced-cycle PCR amplification then adds p5, p7, Index 1 and Index 2 alongside the Read 1 and Read 2 sequencing primer sites (Rd1 SP, Rd2 SP) to give a sequencing-ready fragment. Final panel label: Enrichment.]
18
TruSeq Custom Amplicon Assay Time
Go from DNA to called variants in ~2 days
Day 1: Receive custom oligos; hybridization setup
Day 1: Assay biochemistry
Day 1-2: Cluster generation and automated sequencing on MiSeq
Day 2: Real-time analysis; finished at 5:00 PM
DesignStudio for custom panels; fixed panels also available
1536 amplicons, 96 samples per plate
Simple, efficient, automatic data analysis and variant calling
19
Nextera® Mate-Pair Sample Prep Kit
Industry’s only gel-free protocol, lowest DNA input, ideal for de novo sequencing
Enables a wide range of whole-genome applications
– De novo assembly of small genomes & complex genomes (cancer)
– Genome finishing: ‘close-to-finished’ reference genomes from a single library type
– Samples with limited input DNA (metagenomics)
– Detection of structural variation
Optimized combination of Nextera & TruSeq DNA Sample Prep
Gel-free option: 3-15 kb gap size
– 1 µg input; 1.5 days; 3 hrs hands-on time (HOT)
– Gel+ option for more refined gap size, 4 µg input
20
TruSeq ChIP Seq - How does this work?
TS ChIP Sample Prep
1. Cross-link with formalin, shear
2. Remaining DNA protected by proteins
3. Immunoprecipitate with antibody against target protein of interest
4. Reverse crosslinks, use DNA as input for library generation
5. Sequence; reads indicate where protein was bound
Figure from Szalkowski, A.M. and Schmid, C.D. (2010). Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts. Briefings in Bioinformatics.
21
TruSeq RNA
TruSeq Stranded RNA
Stranded mRNA
Stranded Total RNA
TruSeq RNA v2
mRNA
TruSeq Small RNA
Small RNA
Illumina RNA Portfolio
22
Sample Prep Solutions for RNA from Discovery to Validation
Transcriptome to Targeted
Whole Transcriptome Analysis
RNA-seq for Expression Profiling
TruSeq Targeted RNA
TruSeq RNA-Seq Portfolio
TruSeq Stranded mRNA
TruSeq Stranded Total RNA w/RiboZero
• HRM
• Gold
• Plant
• Globin
TruSeq RNA
TruSeq Targeted RNA Expression
TruSeq Small RNA
24
TruSeq Total RNA Sample Prep Workflow
Total RNA
Add rRNA Removal Solution
Add rRNA Removal Beads
Remove RRB-bound rRNA
Ribosomal Depleted RNA
25
Accurate targeting of virtually the entire transcriptome for Human, Mouse, Rat
– Assay specific gene families including alternative isoforms
Individual exons and splice junctions
cSNP detection for allele specific expression
– Non-coding RNA transcripts
Validation of over 10,000 assay designs
Add custom content to Fixed Panels
TruSeq Targeted RNA
Create custom panels, select pre-validated fixed, or add-on custom
Pre-Validated Fixed Panels
Immune Response, Cardiotox, Lung Cancer, Apoptosis, Breast Cancer, Neuro Panel, Stem Cell, Prostate Cancer, P53 Pathway, Wnt Pathway, Cytochrome P450, Cell Cycle, NFκB Pathway, Hedgehog
Design Studio: Custom Panel Creation
26
Rapid Workflow = Sample to answer in 1.5 days with < 4 hrs hands-on time
Sample prep for 48-384 samples per run
Single MiSeq run equivalent to 15,000 qPCR reactions or 40 384-well plates
On-instrument analysis with MiSeq Reporter (MSR)
Modified from existing DASL Sample Prep Chemistry
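The qPCR comparison on this slide can be checked with one multiplication. The sketch below confirms that 40 plates of 384 wells is close to the quoted 15,000 reactions; the helper `assays_per_sample` and the idea of splitting reactions evenly across samples are hypothetical, added only to show how the figure scales with the 48-384 samples per run.

```python
# Check the slide's plate arithmetic: 40 plates of 384 wells each.
WELLS_PER_PLATE = 384
PLATES = 40

total_wells = WELLS_PER_PLATE * PLATES
print(total_wells)  # 15360, which the slide rounds to 15,000


def assays_per_sample(total_reactions: int, n_samples: int) -> float:
    """Hypothetical split: reactions spread evenly across samples."""
    return total_reactions / n_samples


# Sample counts taken from the "48-384 samples per run" range on the slide.
for n_samples in (48, 96, 384):
    print(n_samples, assays_per_sample(total_wells, n_samples))
```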
27
TruSeq Small RNA Workflow
Interrogate regulatory miRNAs
Starts directly from total RNA: 1.0 µg or less input
Pre-pool samples for single gel excision step
16S Metagenomics Analysis on the MiSeq System
29
Metagenomics & Microbiomics: The unexplored living world
Metagenomics is the analysis of genomic DNA from a whole community
– 16S DNA
– Whole Genome
– Targeted Gene
Survey of microorganisms in specific environments
– Soil
– Aqueous / Marine
– Medical / Ag
What can we learn
– Taxonomic diversity (‘who is there’),
– Physiology (‘what are they doing’)
– Gene discovery
> 99% of the bacteria present in nature are non-culturable
Operational Taxonomic Unit - OTU
sciencemolecular.blogspot.com/
Metagenomics
32
Biological question asked?
How many samples per run?
ROI amplicon size?
Data analysis?
Choosing a 16S protocol: Four Questions to answer
33
Nextera XT Protocol
• <96 samples
• For amplicons >400 bp
• MSR/BaseSpace Analysis
Illumina Demonstrated Protocol, 2x PCR
• <96 samples
• Targets regions <400 bp
• MSR/BaseSpace Analysis
Caporaso et al. Protocol (ISME)
• >96 samples, up to 2167 barcodes
• Targets 16S V4 region; ~300 bp
• Export FASTQ to QIIME
Available options:
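The three options above can be read as a simple decision rule on sample count and amplicon size. The sketch below is one hedged reading of the slide, not an Illumina tool: the function name `suggest_16s_protocol` is hypothetical, and the handling of the exact boundaries (96 samples, 400 bp) is an assumption, since the slide only gives "<" and ">".

```python
def suggest_16s_protocol(n_samples: int, amplicon_bp: int) -> str:
    """Suggest a 16S protocol from the slide's criteria.

    Assumptions: exactly 96 samples counts as the <96 case, and an
    amplicon of exactly 400 bp is routed to the 2x PCR protocol.
    """
    if n_samples < 1 or amplicon_bp < 1:
        raise ValueError("n_samples and amplicon_bp must be positive")
    if n_samples > 2167:
        return "None listed (exceeds the 2167 barcodes on the slide)"
    if n_samples > 96:
        # The slide lists this option for the 16S V4 region, ~300 bp.
        return "Caporaso et al. protocol (export FASTQ to QIIME)"
    if amplicon_bp > 400:
        return "Nextera XT protocol (MSR/BaseSpace analysis)"
    return "Illumina demonstrated protocol, 2x PCR (MSR/BaseSpace analysis)"


print(suggest_16s_protocol(48, 460))
print(suggest_16s_protocol(48, 300))
print(suggest_16s_protocol(500, 300))
```

The rule checks sample count first because only one of the listed options handles more than 96 samples.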
35
taxonomic path | forward primer | reverse primer | total number of sequences found | number of sequences w/o sequence data at primer start | number of matches | coverage | Start | End | ~Amplicon | Source | Mis | HVR
silva;Bacteria; | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-a-A-18 | 282323 | 56 | 236961 | 83.9 | 343 | 802 | 459 | Table 17 | 0 | V3-V4
silva;Bacteria; | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-a-A-18 | 282323 | 56 | 262442 | 93.0 | 343 | 802 | 459 | Table 18 | 1 | V3-V4
silva;Bacteria; | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-b-A-18 | 282323 | 56 | 238918 | 84.6 | 343 | 802 | 459 | Table 17 | 0 | V3-V4
silva;Bacteria; | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-b-A-18 | 282323 | 56 | 262456 | 93.0 | 343 | 802 | 459 | Table 18 | 1 | V3-V4
silva;Bacteria; | S-D-Bact-0347-a-S-19 | S-D-Bact-0787-b-A-20 | 282323 | 56 | 257513 | 91.2 | 347 | 806 | 459 | Table 18 | 1 | V3-V4
silva;Bacteria; | S-D-Bact-0371-a-S-20 | S-D-Bact-0787-b-A-20 | 282323 | 48 | 234238 | 83.0 | 371 | 806 | 435 | Table 17 | 0 | V3-V4
silva;Bacteria; | S-D-Bact-0371-a-S-20 | S-D-Bact-0787-b-A-20 | 282323 | 48 | 258210 | 91.5 | 371 | 806 | 435 | Table 18 | 1 | V3-V4
silva;Bacteria; | S-D-Bact-0371-a-S-20 | S-D-Bact-0785-a-A-21 | 282323 | 48 | 233701 | 82.8 | 371 | 805 | 434 | Table 17 | 0 | V3-V4
silva;Bacteria; | S-D-Bact-0371-a-S-20 | S-D-Bact-0785-a-A-21 | 282323 | 48 | 256593 | 90.9 | 371 | 805 | 434 | Table 18 | 1 | V3-V4
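The coverage and amplicon columns in this table appear to be derived from the other columns. The sketch below assumes coverage is matches divided by the sequences that have data at the primer start (total minus those without), and that amplicon length is End minus Start. Both formulas are inferred from the numbers rather than stated in the source, and `primer_pair_stats` is a hypothetical name.

```python
def primer_pair_stats(total, no_data, matches, start, end):
    """Return (coverage in %, approximate amplicon length).

    Assumed formulas, inferred from the table values:
      coverage = 100 * matches / (total - no_data)
      amplicon = end - start
    """
    evaluated = total - no_data
    coverage = round(100 * matches / evaluated, 1)
    return coverage, end - start


# First row of the table: 0 mismatches allowed.
print(primer_pair_stats(282323, 56, 236961, 343, 802))
# Same primer pair with 1 mismatch allowed.
print(primer_pair_stats(282323, 56, 262442, 343, 802))
```

Both calls reproduce the table's values (83.9 and 93.0 coverage, 459 amplicon), which supports the inferred formulas without confirming them.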
37
Three main steps
PCR #1
• Amplify genomic ROI
• “Tails” on PCR primers
PCR #2
• Amplify amplicons from PCR #1 using indexed adapter oligos from ILMN
• Produces barcoded amplicons ready for MiSeq
• Pool up to 96 samples
MiSeq
• Sequence 96-sample pooled library in single MiSeq run
• MSR demultiplexes data to uniquely assign reads to samples
38
Step ① 1st round PCR to amplify region-of-interest (ROI)
DNA
F
R
Locus-specific sequence
Overhang adapter sequence used in Step 2
cleanup
39
Step ② 2nd round PCR to add indices + adapters
Index adapter oligos from ILMN contain P5/P7 adapters to make the template compatible with the flow cell, plus a unique sample index
P5 | Index 1 | Insert to be sequenced | Index 2 | P7
cleanup
40
Sequencing order on MiSeq system
– Read 1 – sequence amplicons in Forward direction up to 250 nuc.
– Index 1 – read first barcode
– Index 2 – read second barcode (software can now uniquely identify the sample)
– Read 2 – sequence amplicons in Reverse direction up to 250 nuc.
Step ③ Pool and sequence on MiSeq
P5 | Index 1 | Insert to be sequenced | Index 2 | P7
Reads shown: Read 1, Index 1, Index 2, Read 2 (optional)
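The way two index reads identify a sample can be illustrated with a lookup keyed on the index pair. This is a minimal sketch with exact matching only, not how MiSeq Reporter is implemented; the sample sheet, index sequences, and the `demultiplex` function are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample sheet: (Index 1, Index 2) -> sample name.
SAMPLE_SHEET = {
    ("ATCACG", "AAGGTT"): "sample_A",
    ("ATCACG", "CCTTGG"): "sample_B",
    ("CGATGT", "AAGGTT"): "sample_C",
}


def demultiplex(reads, sample_sheet):
    """Group read sequences by exact match of their (index1, index2) pair."""
    bins = defaultdict(list)
    for index1, index2, sequence in reads:
        sample = sample_sheet.get((index1, index2), "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


reads = [
    ("ATCACG", "AAGGTT", "ACGTACGT"),
    ("ATCACG", "CCTTGG", "TTGACCAA"),
    ("CGATGT", "AAGGTT", "GGCATTCA"),
    ("CGATGT", "CCTTGG", "ACACACAC"),  # index pair not in the sample sheet
]

result = demultiplex(reads, SAMPLE_SHEET)
for sample, sequences in result.items():
    print(sample, sequences)
```

Note that sample_A and sample_B share Index 1; only the pair of indices is unique, which is why both index reads are needed.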
43
A. Nextera XT transposome with adapters is combined with template DNA
B. Tagmentation to fragment and add adapters
C. Limited-cycle PCR to add sequencing primer sequences and indices
Nextera XT design
44
Alternative methods allow increased multiplexing on MiSeq and HiSeq
However, more modifications are required…
Method for Multiplexing >96 samples
Option #3 - the Caporaso et al method (ISME 2012)
45
16S metagenomics using the Caporaso et al. (2012) ISME protocol
Libraries are prepared by direct amplification with primer sets that amplify the region (V4 in this case) and add the clustering and sequencing primer regions.
Protocols developed for both the MiSeq and HiSeq platforms, available on the Earth Microbiome Project (EMP) web site (www.earthmicrobiome.org).
Average 15M reads on the MiSeq (2x150) and 70M on the HiSeq (2x100).
Employs a 12-base index tag (Golay code).
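An error-correcting index tag lets a read with a sequencing error in its barcode still be assigned to a sample. The sketch below is a simplified stand-in that picks the nearest known barcode by Hamming distance; it is not Golay decoding. The barcodes shown are invented 12-mers rather than real Golay codewords, and `assign_barcode` and `max_mismatches` are hypothetical names.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the single known barcode within max_mismatches, else None."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Invented 12-base barcodes for illustration (not Golay codewords).
BARCODES = ["ACGTACGTACGT", "TTGGCCAATTGG", "GATCGATCGATC"]

print(assign_barcode("ACGTACGTACGT", BARCODES))  # exact match
print(assign_barcode("ACGTACGAACGT", BARCODES))  # one base miscalled
print(assign_barcode("CCCCCCCCCCCC", BARCODES))  # too far from any barcode
```

Requiring a unique hit avoids assigning a read when two barcodes are equally close, which is the situation a well-separated code set is designed to prevent.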
47
Wildlife Disease Laboratories
– Bruce Rideout (Director)
– Josephine Braun (Scientist)
– Goal is to remove disease as a roadblock to conservation
– Carry out the following tasks
Disease surveillance efforts
Diagnostic test development
Outbreak investigations for the animals at the San Diego Zoo, San Diego Zoo Safari Park, and field conservation programs
Collaboration with San Diego Zoo Global
Conservation of the Desert Tortoise
48
Desert Tortoise 1 - Healthy
Species Desert Tortoise
Individual 18919
Disease Severity 0 HEALTHY CONTROL
Sample Name DNA-732
Origin Nasal flush
Total # Sequences PF 322,916
Unclassified (Chorny & Probert) 44,769
Classified (Chorny & Probert) 278,147
Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Calothrix parietina | 106,004 | 38.11% | 38.1% | Blue-green algae on rocks.
2 | Rickettsia sp. | 15,631 | 5.62% | 43.7% | Tick, flea, lice borne - some diseases.
3 | Acinetobacter sp. | 5,764 | 2.07% | 45.8% | Soil bacteria.
4 | Symploca atlantica | 4,917 | 1.77% | 47.6% | Algae.
5 | Proteobacteria | 4,109 | 1.48% | 49.0% | Variable. Some pathogens known. Needs more detail.
6 | Thiomonas thermosulfata | 3,847 | 1.38% | 50.4% | Extremophile.
7 | Bacteria | 3,789 | 1.36% | 51.8% | ??
8 | Blautia sp. | 3,705 | 1.33% | 53.1% | Gut bacteria.
9 | Blautia coccoides | 3,330 | 1.20% | 54.3% | Gut bacteria.
10 | Pedobacter | 2,881 | 1.04% | 55.4% | Soil bacteria.
Total | | 153,977
Other Species of Note
Mycoplasma agassizii | 138 | 0.05%
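The percentage columns in this table can be reproduced from the raw counts. The sketch below assumes each percentage is the taxon count divided by the classified sequence total (278,147), which is inferred from the numbers rather than stated on the slide; `abundance_table` is a hypothetical helper, and only the first three rows are included.

```python
CLASSIFIED_TOTAL = 278_147  # "Classified" count from the slide

# First three ranked taxa and their sequence counts from the table.
TOP_TAXA = [
    ("Calothrix parietina", 106_004),
    ("Rickettsia sp.", 15_631),
    ("Acinetobacter sp.", 5_764),
]


def abundance_table(taxa, classified_total):
    """Return (name, percent of classified, cumulative percent) per taxon."""
    rows = []
    running = 0
    for name, count in taxa:
        running += count
        percent = round(100 * count / classified_total, 2)
        cumulative = round(100 * running / classified_total, 1)
        rows.append((name, percent, cumulative))
    return rows


rows = abundance_table(TOP_TAXA, CLASSIFIED_TOTAL)
for name, percent, cumulative in rows:
    print(f"{name}: {percent}% (cumulative {cumulative}%)")
```

The same denominator also gives 0.05% for the 138 Mycoplasma agassizii sequences, consistent with the "Other Species of Note" entry.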
49
Desert Tortoise 6 - Diseased
Species Desert Tortoise
Individual 16025
Severity 4 DISEASED
Sample Name DNA-1283
Origin Nasal flush
Total # Sequences 198,672
Unclassified (Chorny & Probert) 23,077
Classified (Chorny & Probert) 175,595
Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Mycoplasma agassizii | 31,924 | 18.2% | 18.2% | Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
2 | Myroides odoratus | 11,654 | 6.6% | 24.8% |
3 | Flavobacteriaceae | 11,481 | 6.5% | 31.4% |
4 | Chelonobacter sp. | 11,384 | 6.5% | 37.8% | Associated with diseased tortoises (Gregersen et al., 2009).
5 | Pedobacter sp. | 7,894 | 4.5% | 42.3% | Soil bacteria.
6 | Flavobacterium swingsii | 5,492 | 3.1% | 45.5% |
7 | Proteus penneri | 5,263 | 3.0% | 48.5% | Found intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
8 | Pseudomonas brenneri | 4,538 | 2.6% | 51.0% | Water borne. Biofilms.
9 | Pseudomonas sp. | 4,193 | 2.4% | 53.4% | Water borne. Biofilms.
10 | Pseudomonas marginalis | 4,160 | 2.4% | 55.8% | Water borne. Biofilms.
Total | | 97,983
50
Desert Tortoise 8 - Diseased
Species Desert Tortoise
Individual 13498
Severity 4 DISEASED
Sample Name DNA-1300 (S10)
Origin Nasal flush
Total # Sequences PF 271,248
Unclassified (Chorny & Probert) 25,323
Classified (Chorny & Probert) 245,925
Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Chelonobacter sp. | 141,734 | 57.6% | 57.6% | Associated with diseased tortoises (Gregersen et al., 2009).
2 | Chelonobacter oris | 37,883 | 15.4% | 73.0% | Associated with diseased tortoises (Gregersen et al., 2009).
3 | Mycoplasma agassizii | 7,639 | 3.1% | 76.1% | Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
4 | Granulicatella adiacens | 5,825 | 2.4% | 78.5% | Normal commensal human mucosal membranes.
5 | Flavobacteriaceae | 5,443 | 2.2% | 80.7% | Water borne.
6 | Myroides odoratus | 4,855 | 2.0% | 82.7% | Human nosocomial infection.
7 | Pedobacter sp. | 4,108 | 1.7% | 84.4% | Soil bacteria.
8 | Deinococcus sp. | 2,051 | 0.8% | 85.2% | World's Toughest Bacteria. Soil, water.
9 | Flavobacterium swingsii | 1,800 | 0.7% | 85.9% | Water borne.
10 | Proteus penneri | 1,625 | 0.7% | 86.6% | Found intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
Total | | 212,963
51
Mycoplasma agassizii present in majority of tortoises
– 8 of 9 diseased Desert Tortoises contain the pathogen
– In 7 of 9 diseased Desert Tortoise nasal flush samples, this pathogen is among the top ten bacteria
Tortoise-16025: number one bacterium (18.2%)
Tortoise-18963: number two bacterium (13.9%)
Tortoise-13498: number three bacterium (3.1%)
– Previously recognized as etiologic agent of URTD in Desert Tortoise
Chelonobacter sp. also present in many of the tortoises
– In 8 of 10 Desert Tortoise nasal flush samples, this bacterium is among the top ten
– Chelonobacter previously linked/associated with URTD
Healthy Control Desert Tortoise is dominated by a diverse microbiome of soil and water borne bacteria
– Points to “healthy” microbiome constituent species?
– Healthy Control Desert Tortoise also contains a trace of M. agassizii at 0.05%!
Summary
Nasal Microbiome of Desert Tortoises
52
New Illumina Sequencing Portfolio
MiSeq: Focused Power. Speed and simplicity for targeted and small genome sequencing
NextSeq 500: Flexible Power. Speed and simplicity for personal scale genomics
HiSeq 2000/2500: Production Power. Power and efficiency for large-scale genomics
HiSeq X Ten: Population Power. $1000 human genome and extreme throughput for population-scale sequencing
53
A new sequencer that combines high throughput NGS applications with the
speed, ease of use and affordability of a desktop sequencer
The most flexible applications of any desktop sequencer
– Exome, transcriptome, whole genome sequencing in a single run
– Industry-leading SBS chemistry: >75% >Q30, no homopolymer issues
Sample size flexibility
– 2 output modes: high and mid flow cells and reagents
Push-button simplicity
– Load & Go workflows
– Integrated sample-to-results solution: streamlined
informatics on-premise or in cloud
Accessible affordability
– Runs starting at $1,000 (Human Genome ~$4k)
– System list price $250K
Introducing NextSeq 500
54
Fast Applications
Human Genome: 2 x 150 bp, 30 hours
Exome | Transcriptome: 2 x 75 bp, 18 hours
NIPT | GEx: 1 x 75 bp, 12 hours