Next Generation DNA Sequencing Platforms: Evolving Tools for Cancer Research
Norma Neff
Bioengineering / Quake Lab
Sequencing Core Director
Stem Cell Institute
SIM1 G1115 / [email protected]
Emulsion PCR-based Sequencing Technologies
Sequencing Technologies
Sequencing By Synthesis
Single Molecule Sequencing Technologies
Recommended Reviews:
Michael Metzker (2010) Nature Reviews Genetics 11:31
Quail et al (2012) BMC Genomics Jul 24;13:341
Outline of Today’s Presentation:
Sequencing by Synthesis
Next Gen Sequencing Sample or Library Preps
Review of Seq Technologies
Comparisons of Different Platforms
Summary and Final Thoughts
Design of Sequencing Samples or Libraries
Adapters are ligated to the sample DNA to be sequenced = Library
Adapters are short (30-50bp) double-stranded oligos
Sequences of the adapters are specific to each seq platform
[Diagram: sample DNA flanked by adapters A1 and A2, with bar codes BC1 and BC2]
Sites for PCR primers to bind to amplify the Library
Sites for seq primers to bind to seq the sample DNA
Bar codes (6-12bp) for multiplexing libraries in a seq run
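The bar code idea above can be sketched in a few lines of Python. This is only an illustration: it assumes the bar code sits at the start of each read (on some platforms it is read separately as an index read), and the bar code sequences, sample names, and the `demultiplex` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical 6 bp bar codes mapped to sample names (illustrative only).
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}

def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each read to a sample by the bar code at its 5' end."""
    length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:length], read[length:]
        # Collect every bar code within the allowed Hamming distance.
        hits = [name for code, name in barcodes.items()
                if sum(a != b for a, b in zip(tag, code)) <= max_mismatches]
        # Unmatched or ambiguous tags go to an "undetermined" bin.
        key = hits[0] if len(hits) == 1 else "undetermined"
        bins[key].append(insert)
    return dict(bins)

reads = ["ACGTACGGATTC", "TGCATGCCAGTA", "ACGAACTTTGGA", "GGGGGGAAAAAA"]
result = demultiplex(reads, BARCODES)
```

Allowing one mismatch lets a read with a single sequencing error in its bar code still reach the right sample, which only works if the bar codes in a run differ from each other at several positions.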
Sequencing by Synthesis: Bases are added to DNA molecules at the 3’ OH end of the chain
Emulsion PCR – Library DNA is amplified in an Oil Droplet
Roche 454 (pyrosequencing):
•Beads are spun into wells on a plate
•Flows one dNTP at a time
•Detects PPi release by a coupled luciferase reaction
•Light intensity = base addition

Ion Torrent:
•Beads are spun into wells of a chip
•Flows one dNTP at a time
•Detects H+ release
•pH change = base addition
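Flowing one dNTP at a time means a run of identical bases is incorporated in a single flow, and its length has to be read from the signal strength. A minimal sketch of that idea follows; the `flowgram` function and the flow order `TACG` are assumptions for illustration, and real instruments report noisy analog signals rather than exact integer counts.

```python
def flowgram(template, flow_order="TACG", n_flows=8):
    """Return (base, incorporations) for each flow of a flow-based sequencer.

    `template` is written as the strand being synthesized, 5' to 3'.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        # Every consecutive matching base is added within the same flow.
        while pos < len(template) and template[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
    return signals

flows = flowgram("TTTACGG")
# Rebuild the sequence from the per-flow counts.
decoded = "".join(base * count for base, count in flows)
```

Here the first flow (T) yields a count of 3 and the fourth flow (G) a count of 2. Telling a signal of 3 from a signal of 4 is harder than telling 0 from 1, which is one way to see why these platforms list indels in homopolymers as their error profile.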
Roche 454 Benchtop Sequencers – 400bp Read Lengths / Reliable Chemistry
Requires the most time from library to machine loading
First technology to incorporate bar coding of libraries

Roche 454 GS FLX+ Titanium
Output = 1 million reads; 400-700 Mb
Read Length = 400 bases (700 bases)
Run Time = 8-23 hours
Error Profile = indels in homopolymers

GS Junior
Output = 70k-100k reads; 30 Mb
Read Length = 400 bases
Run Time = 10 hours
Error Profile = indels in homopolymers
Ion Torrent = Desktop Sequencers for Low and High Sequence Output
PGM
Output = 10-500M bases
Read Length = 200 bases
Run Time = 1-3 hours
Error Profile = indels in homopolymers

Ion Proton I
Output = 10G bases
Read Length = 200 bases
Run Time = 4 hours
Error Profile = indels in homopolymers

Coming soon: Proton II and III, 300-400 base reads
[Figure: dATP vs ATP. Triphosphate at the 5’ position; the sugar shown carries OH at the 3’ position and H at the 2’ position.]
[Figure: ddATP vs dATP. The sugar shown carries H at both the 3’ and 2’ positions.]
Irreversible Terminator: Sanger Sequencing
Reversible Terminators & Cleavable Fluorescent Tags
[Figure: nucleotide structures with an ON3 group and a cleavable tag, marked X.]
Solid Phase Amplification – Library DNA binds to Oligos Immobilized on Glass Flowcell Surface
V3 HiSeq
•Clusters are linearized
•Seq primer annealed
•All four dNTPs added at each cycle
•Each dNTP has a different fluorescent tag
•Intensity of different tags = base call
•Error profile = substitutions
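"Intensity of different tags = base call" can be illustrated with a toy base caller that picks the brightest of four channels at each cycle. The `call_bases` function, the channel order, and the intensity values are made up for illustration; production base callers also correct for effects such as channel crosstalk and phasing, which this sketch ignores.

```python
def call_bases(intensities, channels="ACGT"):
    """Call one base per cycle as the channel with the highest intensity."""
    calls = []
    for cycle in intensities:
        best = max(range(len(channels)), key=lambda i: cycle[i])
        calls.append(channels[best])
    return "".join(calls)

# One tuple per cycle: hypothetical (A, C, G, T) intensities for one cluster.
cycles = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.0),
    (0.0, 0.1, 0.1, 0.7),
    (0.4, 0.5, 0.1, 0.0),  # two channels are close: a low-confidence call
]
read = call_bases(cycles)
```

Because exactly one base is called per cycle, a wrong call swaps one base for another instead of adding or dropping one, which fits the substitution error profile listed for this chemistry.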
Evolution of Solexa / Illumina Sequencing Platform
GA II (2006)
Output 1 million 1x36bp reads / lane
Improved chemistry to 10 million / lane
Paired end reads to 2x150bp

HiSeq 2000 (2010), V3
Output 30-40 Gb / lane
Read Length = 100 bases SR or 2x100 PR
Accommodates dual bar codes
Run Time = 2-14 days
Error Profile = substitutions

HiSeq 2500 = 2x150 (2x250)
600 million reads / 39 hours
MiSeq – QC Libraries and 250bp Reads
[Image labels: V3 HiSeq, MiSeq]

V1 Runs
1x50bp + 1 or 2 bar codes (6 hrs)
2x150bp + bar codes (28 hrs)
10M reads = 1G bases

V2 Runs – use top and bottom of lane
2x250bp + bar codes (39 hrs)
15M reads = 7G bases

Accommodates dual bar codes

•Uses a single reagent cassette and buffer bottle
•Same paired end libraries on all Illumina sequencers
•Has additional options for the BaseSpace data storage system and alignment software
•Real time run monitoring and data sharing
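The "reads = bases" figures on these slides are read count times read length. A small sketch of that arithmetic follows; the `run_yield_bases` helper is hypothetical, and it assumes the quoted read count refers to clusters that are each read from both ends in a paired run.

```python
def run_yield_bases(n_reads, read_length, paired=False):
    """Total bases in a run: reads x length, doubled for paired-end runs."""
    ends = 2 if paired else 1
    return n_reads * read_length * ends

# MiSeq V2 figures from the slide: 15M reads, 2x250bp.
miseq_v2 = run_yield_bases(15_000_000, 250, paired=True)

# PacBio 2012 figures from the progress table: 48k mapped reads, 1914bp mean.
pacbio_2012 = run_yield_bases(48_000, 1914)
```

Under these assumptions the MiSeq V2 run comes to 7.5 billion bases, in line with the roughly 7G quoted on the slide, and the PacBio 2012 numbers come to about 92 million bases, matching the mapped bases per cell in the PacBio table.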
Single Molecule Imaging: Heavy Metal Battle Royale
Short Reads & High Output vs Long Reads & Low Output
Helicos Genetic Analysis System
Sample Preparation: dA tailing with TdT
HeliScope™ Sample Loader: oligo dT on flowcell
HeliScope™ Single Molecule Sequencer
HeliScope™ Analysis Engine
Output:
>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG
Does not use ligation or PCR amplification
600-900 million aligned bases per lane x 50 lanes
20Tb
•33bp avg reads; 1-10 Gb; 8 day run
•Uses terminal transferase to add a poly dA tail
•Flows one nucleotide at a time – Error Profile = indels
•DNA quality is not an important factor – works on ancient DNA
•Can do direct RNA sequencing – 3’ ends
•Custom seq capture flowcells
•Primarily a sequencing service company
Pacific Biosciences RS: Real Time Movies of Nucleotide-binding by DNA Polymerase
[Figure labels: Adapter, Mapped Read Length, Subreads]
PacBio technology makes base calls on how long the base stays in the active site
Output = 50k reads; 100 Mb per SMRT Cell (16 max per run)
Read Length = 2000 bases
Run Time = 90 min per SMRT Cell
Error Profile = indels
Update of PacBio Progress 2011 – 2012

Metric                           2011     2012
Mean mapped subread length       470bp    675bp
Mean mapped read length          550bp    1914bp
Mapped reads per cell            14k      48k
Mean mapped subread accuracy     85%      89%
Mapped bases per cell            5Mb      92Mb
Movies per SMRT cell             1        2
Max time per movie               30min    90min
Strobe seq                       yes      no
Cost of $equencing
Reagents
Library construction
Quality assessment
Accessory equipment and supplies
Labor to get samples on the machine
Machine maintenance / service contracts
Computational requirements
Data storage
Technology        Run Type                 Est Cost / Mb
Roche 454         Full plate, 400 bases    $30
Ion Torrent       PGM 316 chip, 200 bases  $3
Illumina HiSeq    2x100 bases              $0.01
Illumina MiSeq    2x150 bases              $2
PacBio            45,000 / 90min cell      $0.20
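The per-megabase estimates in the table can be turned into a rough cost comparison for a given amount of sequence. This sketch assumes cost scales linearly with megabases and ignores run minimums and the other cost items listed earlier (library construction, labor, storage); the function names are hypothetical.

```python
# Estimated cost per megabase, copied from the table above (USD).
COST_PER_MB = {
    "Roche 454": 30.0,
    "Ion Torrent PGM": 3.0,
    "Illumina HiSeq": 0.01,
    "Illumina MiSeq": 2.0,
    "PacBio": 0.20,
}

def estimated_cost(platform, megabases):
    """Cost estimate for `megabases` of sequence on one platform."""
    return COST_PER_MB[platform] * megabases

def cheapest_first(megabases):
    """Platforms sorted by estimated cost for the requested output."""
    return sorted((estimated_cost(p, megabases), p) for p in COST_PER_MB)

# Example: 1000 Mb (1 Gb) of sequence.
ranking = cheapest_first(1000)
```

For 1 Gb of sequence the estimates span roughly three orders of magnitude, from about $10 on the HiSeq to about $30,000 on the 454. Cost alone does not settle the choice, since the cheaper platforms in this table also have the longer run times or shorter reads.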
Two Strategies for Sequencing: Depth of Coverage vs Speed
Depth of Coverage (20-100 million reads with good quality scores) = Discovery
vs
Speed (1-24 hours run time) = Validation and Diagnosis
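Depth of coverage is commonly estimated as total sequenced bases divided by the size of the target. The sketch below uses that standard approximation; the 3 Gb genome size and the 100 base read length are assumptions chosen for the example, not figures from the slides.

```python
def mean_coverage(n_reads, read_length, target_size):
    """Average depth: total sequenced bases divided by target size."""
    return n_reads * read_length / target_size

def reads_needed(coverage, read_length, target_size):
    """Smallest whole number of reads that reaches the requested depth."""
    total_bases = coverage * target_size
    return -(-total_bases // read_length)  # integer ceiling division

GENOME = 3_000_000_000  # assumed target size of about 3 Gb

depth = mean_coverage(100_000_000, 100, GENOME)
needed = reads_needed(30, 100, GENOME)
```

With these assumptions, 100 million reads of 100 bases give only about 3x average depth over a 3 Gb target, while 30x needs 900 million such reads. The same read count gives far deeper coverage on a smaller target such as a gene panel or a bacterial genome.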
Accuracy / Seq Error Profiles / Bioinformatic Tools
Summary and Final Thoughts
Sequencing Technologies Keep Evolving
Plan your sequencing experiments based on the data set you need
Consider the size of the data set, accuracy of reads, cost and speed
Choose your platform appropriately
Work smarter – be imaginative and what seems impossible today can be the standard tomorrow