Genome Biology for Programmers Lecture Series: Illumina Sequencing
Chris Daum, JGI Illumina Group Lead
April 1, 2011
Outline
• Workflow Overview
• Process Science
– Sample Prep & qPCR quantification
– Cluster Generation
– Sequencing
• Sequencer instruments: GA & HiSeq
• Illumina Developments
• Illumina quality & continuous improvement
Illumina Workflow
Sample Preparation → Sample Quantification → Clustering → Sequencing → Analysis
Sample Preparation
Library Preparation – Main Goals:
• Prepares sample nucleic acids for sequencing
• Many library types and creation procedures exist
• However, all preparations result in the same general template structure:
– Double-stranded DNA flanked by two different adapters
– Variables include:
  • Sequencing application & starting material (e.g. gDNA, mRNA, Mate Pair, Active Chromatin, ChIP-Seq)
  • Insert size
  • Adapter type
  • Index for multiplexing
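For programmers, the shared template structure can be pictured as a small record: an insert flanked by two different adapters, plus an optional index. This is a minimal sketch under stated assumptions: `ADAPTER_A`, `ADAPTER_B` and `LibraryTemplate` are hypothetical names, the adapter and index strings are placeholders rather than real Illumina sequences, and placing the index between the insert and the second adapter is an illustrative choice.

```python
from dataclasses import dataclass

# Hypothetical placeholder sequences: real adapter and index sequences
# depend on the library kit and are not given on the slide.
ADAPTER_A = "AAAAAAAAAA"   # stand-in for the first adapter
ADAPTER_B = "CCCCCCCCCC"   # stand-in for the second, different adapter


@dataclass
class LibraryTemplate:
    insert: str             # sample-derived DNA (gDNA, cDNA from mRNA, ChIP DNA, ...)
    index: str = ""         # optional barcode used for multiplexing
    adapter_a: str = ADAPTER_A
    adapter_b: str = ADAPTER_B

    def sequence(self) -> str:
        # One strand of the double-stranded template:
        # adapter A + insert + (optional) index + adapter B.
        # The index position here is an assumption for illustration.
        return self.adapter_a + self.insert + self.index + self.adapter_b

    @property
    def insert_size(self) -> int:
        # "Insert size" is the length of the sample-derived part only.
        return len(self.insert)


t = LibraryTemplate(insert="GATTACAGATTACA", index="ACGTAC")
print(t.insert_size)   # 14
print(t.sequence())
```

Whatever the library type (gDNA, mRNA, ChIP-Seq), only the insert and the optional index change; the flanking adapters are what the downstream steps rely on.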
Example Sample Prep Workflow: TruSeq Paired-End Library
Figure: workflow diagram for DNA and RNA starting material
Library Quantification - qPCR
• Real-time qPCR allows accurate quantification of DNA templates:
– qPCR is based on detecting a fluorescent reporter whose signal increases as PCR product accumulates with each cycle of amplification
– Because the primers in the qPCR reaction are specific to the Illumina universal adapters, only cluster-forming templates in the library are amplified and quantified
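The adapter-specific primer logic above amounts to a filter: a fragment is amplified only if it carries a priming site at each end. A minimal sketch, assuming placeholder site sequences (`SITE_A`, `SITE_B` and `is_cluster_forming` are hypothetical names) and simplifying the chemistry to a literal string check on one strand:

```python
# Hypothetical primer-binding sites (placeholders, not real Illumina sequences).
SITE_A = "AAAAAAAAAA"
SITE_B = "CCCCCCCCCC"


def is_cluster_forming(fragment: str) -> bool:
    """True if the fragment has an adapter site at both ends.

    Simplification: on a real template the second site is read as its
    complement on this strand; here we just check the placeholder literally.
    """
    return fragment.startswith(SITE_A) and fragment.endswith(SITE_B)


fragments = [
    SITE_A + "GATTACA" + SITE_B,  # properly adapted: amplified and quantified
    SITE_A + "GATTACA",           # only one adapter: not amplified
    "GATTACA",                    # no adapters: not amplified
    SITE_A + SITE_B,              # adapter dimer: both sites, so it is also amplified
]
quantifiable = [f for f in fragments if is_cluster_forming(f)]
print(len(quantifiable))  # 2
```

The adapter-dimer case shows why the qPCR number tracks what will actually cluster: anything with both adapters, wanted or not, is counted.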
Library Quantification - qPCR
Take home: qPCR mimics what is happening on the surface of the flowcell during cluster generation and allows for determining optimal loading concentrations.
Phases of qPCR: Geometric phase – amplicons doubling every cycle; greatest precision & accuracy for quantitation
Cq – Cycle of Quantification
Threshold: the fluorescence level an amplicon must reach to produce a Cq
Figure: standard curve of Cycle Threshold versus log initial concentration
Plot a standard curve using the controls and determine the concentration of the library
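Because amplicons double every cycle in the geometric phase, Cq falls linearly with the log of the starting concentration, so the standard curve is a straight-line fit that can be inverted for the unknown library. A minimal sketch, assuming synthetic standards (the concentrations and Cq values below are made up for illustration, not JGI data); real protocols also multiply back by the dilution factor used for the library.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic 10-fold dilution standards (pM) with Cq values 3.32 cycles apart,
# i.e. roughly perfect doubling per cycle. Hypothetical numbers.
std_conc = [20.0, 2.0, 0.2, 0.02]
std_cq = [10.0, 13.32, 16.64, 19.96]

slope, intercept = fit_line([math.log10(c) for c in std_conc], std_cq)
efficiency = 10 ** (-1 / slope) - 1   # 1.0 means the amount doubles every cycle


def concentration_from_cq(cq):
    """Invert the standard curve Cq = slope * log10(conc) + intercept."""
    return 10 ** ((cq - intercept) / slope)


library_cq = 14.98  # hypothetical measured Cq of the diluted library
print(f"slope={slope:.2f}  efficiency={efficiency:.1%}")
print(f"library ~ {concentration_from_cq(library_cq):.3f} pM in the qPCR well")
```

A slope near -3.32 cycles per 10-fold dilution is the signature of ~100% efficiency; a fit far from that suggests the standards are outside the geometric phase.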
Cluster Generation
• The process occurs on the cBot instrument:
– Aspirates DNA samples into the flow cell
– Automates the formation of amplified clonal clusters from single DNA molecules
– Amplifies each molecule about 1000x to generate a cluster
– Hybridizes sequencing primer(s)
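The "1000x" figure is easy to relate to cycle counts under an idealized model. A minimal sketch, assuming perfect doubling per amplification cycle (`ideal_cycles_for_fold` and `copies_after` are hypothetical helper names); real bridge amplification is less efficient than this, so it needs more cycles than the ideal count.

```python
import math


def ideal_cycles_for_fold(fold: float) -> int:
    """Smallest number of perfect doublings giving at least `fold`-fold amplification."""
    return math.ceil(math.log2(fold))


def copies_after(cycles: int, efficiency: float = 1.0) -> float:
    """Copies from one molecule if each cycle multiplies the count by (1 + efficiency)."""
    return (1 + efficiency) ** cycles


n = ideal_cycles_for_fold(1000)
print(n, copies_after(n))           # 10 1024.0
print(round(copies_after(n, 0.8)))  # fewer copies if each cycle is only 80% efficient
```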
Illumina cBot
• Cluster Generation 2.0
– Automated system significantly reduces workload for generation of flowcells
– Compact design saves lab space
– Reagent cartridge reduces prep time
Flowcell
Cluster Generation Prep
• Prepare reagents, and denature & dilute the library
• The goal is the optimal cluster density for maximum yield (bp); this is achieved by loading at the concentration determined by qPCR
• Considerations:
– Density too low: fewer clusters, less sequence generated
– Density too high: overlapping clusters, which are removed by analysis filters, and poor quality
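Diluting the library to the chosen loading concentration is a C1·V1 = C2·V2 calculation. A minimal sketch, assuming the qPCR result is in nM and the loading target in pM; `dilution_volumes` is a hypothetical helper, and the 2 nM / 10 pM / 1000 µL values are illustrative, not a recommended loading concentration.

```python
def dilution_volumes(stock_nM: float, target_pM: float, final_uL: float):
    """Return (library_uL, diluent_uL) so that C1 * V1 = C2 * V2."""
    stock_pM = stock_nM * 1000.0  # 1 nM = 1000 pM
    if target_pM > stock_pM:
        raise ValueError("target concentration exceeds stock concentration")
    library_uL = target_pM * final_uL / stock_pM
    return library_uL, final_uL - library_uL


# Hypothetical values: a 2 nM library diluted to 10 pM in a final 1000 uL.
lib, diluent = dilution_volumes(stock_nM=2.0, target_pM=10.0, final_uL=1000.0)
print(f"{lib:.1f} uL library + {diluent:.1f} uL diluent")  # 5.0 uL library + 995.0 uL diluent
```

An error in the qPCR concentration propagates linearly into cluster density, which is why the quantification step matters for avoiding both failure modes listed above.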
Cluster Generation Chemistry
• Cluster generation chemistry:
– Hybridization
– Amplification
– Linearization
– Blocking
– Primer hybridization
Cluster Generation Chemistry
• Hybridize Sample fragments & extend:
Cluster Generation Chemistry
• Bridge Amplification:
Cluster Generation Chemistry
• Linearization, Blocking & Sequencing Primer Hybridization:
Sequencing
• Main Goals:
– Translate the chemical information of the nucleotides into fluorescence information which can be captured optically
– The optical information is then transformed into text, which can be searched, aligned, or otherwise mined for biologically relevant data
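Turning optical information into text is, at its simplest, picking the brightest of four fluorescence channels each cycle. A minimal sketch, assuming one channel per base in A, C, G, T order and ignoring the cross-talk and phasing corrections that real base callers apply (`call_bases` is a hypothetical name). The per-cycle ratio brightest / (brightest + second brightest) is modeled on Illumina's "chastity" idea, one kind of measure the analysis filters use to drop unreliable clusters.

```python
BASES = "ACGT"  # assumption: one fluorescence channel per base, in this order


def call_bases(intensities):
    """Convert per-cycle 4-channel intensities for one cluster into a base string.

    intensities: list of (a, c, g, t) tuples, one per cycle.
    Returns (sequence, chastity): chastity near 1.0 is a clean call,
    near 0.5 means two channels were almost equally bright.
    """
    seq, chastity = [], []
    for channels in intensities:
        ranked = sorted(range(4), key=lambda i: channels[i], reverse=True)
        best, second = channels[ranked[0]], channels[ranked[1]]
        seq.append(BASES[ranked[0]])
        total = best + second
        chastity.append(best / total if total > 0 else 0.0)
    return "".join(seq), chastity


cycles = [(900, 50, 40, 30), (20, 30, 800, 60), (400, 380, 10, 5)]
seq, ch = call_bases(cycles)
print(seq)                        # AGA
print([round(x, 2) for x in ch])  # [0.95, 0.93, 0.51]
```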
Sequencing Workflow
HiSeq Run Type    Approx. Run Days
1x50              2
2x100             9
2x150             13
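Run types follow the usual N×L notation: N reads of L cycles each. A minimal sketch, assuming that reading (`parse_run_type` and `sequencing_cycles` are hypothetical helper names; index cycles are an optional extra parameter), computes how many imaging cycles a run needs. In the table above, run time grows roughly in proportion to this count.

```python
def parse_run_type(run_type: str):
    """Parse notation such as '2x100' into (number_of_reads, cycles_per_read)."""
    reads, length = run_type.lower().split("x")
    return int(reads), int(length)


def sequencing_cycles(run_type: str, index_cycles: int = 0) -> int:
    """Total imaging cycles: sample-read cycles plus any index-read cycles."""
    reads, length = parse_run_type(run_type)
    return reads * length + index_cycles


for rt in ("1x50", "2x100", "2x150"):
    print(rt, sequencing_cycles(rt))  # 50, 200, 300
```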
Sequencing by Synthesis
• Clustered Flowcell is loaded on Illumina sequencer:
Sequencing Chemistry: First Cycle Base Incorporation
• To initiate the first sequencing cycle, add all 4 fluorescently labeled reversible terminators and DNA polymerase enzyme to the flowcell.
• The complementary nucleotide will be added to the first position of each cluster.
• A laser is then used to excite the attached fluorophore.
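The cycle described above can be modeled as a loop that adds exactly one complementary base per cycle. A minimal, idealized sketch (no phasing, no errors; `sequence_by_synthesis` is a hypothetical name). The cleavage step is not shown on the slide; it is implied by the terminators being "reversible" and is only noted in a comment.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Idealized SBS read-out for one cluster.

    `template` is given in the order the polymerase reads it, starting at
    the base next to the sequencing primer.
    """
    read = []
    for position in range(min(cycles, len(template))):
        # 1. All four labeled reversible terminators plus polymerase are added;
        #    only the complementary base is incorporated, and its terminator
        #    blocks further extension within this cycle.
        incorporated = COMPLEMENT[template[position]]
        # 2. Laser excitation and imaging reveal which labeled base was added.
        read.append(incorporated)
        # 3. Dye and terminator are then removed (not modeled), allowing the next cycle.
    return "".join(read)


print(sequence_by_synthesis("TACGGT", 4))  # ATGC
```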
Sequencing Chemistry: First Cycle Imaging
Sequencing Chemistry: Cycle 2 and so on…
Sequencing Read 2
• Resynthesis of second strand for Read 2 occurs on sequencer without removing flowcell:
Paired-end sequencing: in a paired-end run, after the initial cycles (Read 1), an additional round of cluster generation is performed on the sequencer, and the template is then sequenced in the opposite direction, as depicted in the slide figures.
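For downstream code, "sequenced in the opposite direction" means Read 2 comes from the other end of the insert on the other strand, so it appears reverse-complemented relative to Read 1. A minimal sketch under that idealization (`reverse_complement` and `paired_reads` are hypothetical helper names; adapters and errors are ignored):

```python
COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(COMP)[::-1]


def paired_reads(insert: str, read_len: int):
    """Idealized paired-end reads from one insert given 5'->3' on the forward strand.

    Read 1 starts at the left end on the forward strand; Read 2 starts at the
    right end on the reverse strand, so it is reverse-complemented.
    """
    read1 = insert[:read_len]
    read2 = reverse_complement(insert)[:read_len]
    return read1, read2


r1, r2 = paired_reads("ACGTTTGACCA", 4)
print(r1, r2)  # ACGT TGGT
```

Reverse-complementing Read 2 recovers the forward-strand sequence at the right end of the insert, which is how aligners place both mates on the same coordinate system.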
Index for Multiplex Sequencing
• Sample multiplexing involves 3 reads:
– A: Sample Read 1 is sequenced
– B: Read 1 product is removed and the Index Read is sequenced
– C: Template strand is used to generate the complementary strand, and sample Read 2 is sequenced
• Analysis software identifies the index sequence from each cluster so that sample Reads 1 & 2 can be assigned to a single sample
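Assigning clusters to samples by index read is a nearest-match lookup. A minimal sketch, assuming a tolerance of one mismatch and rejecting ties (both common choices in demultiplexing tools, but assumptions here); `hamming`, `assign_sample` and the sample-to-index table are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Mismatches between two sequences (length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read: str, sample_indexes: dict, max_mismatches: int = 1):
    """Return the sample whose index is uniquely closest within max_mismatches, else None."""
    hits = sorted((hamming(index_read, idx), name) for name, idx in sample_indexes.items())
    best_dist, best_name = hits[0]
    if best_dist > max_mismatches:
        return None  # too far from every known index: leave undetermined
    if len(hits) > 1 and hits[1][0] == best_dist:
        return None  # ambiguous: two samples are equally close
    return best_name


samples = {"sampleA": "ACAGTG", "sampleB": "GCCAAT", "sampleC": "CTTGTA"}  # hypothetical
print(assign_sample("ACAGTG", samples))  # sampleA (exact match)
print(assign_sample("ACAGTC", samples))  # sampleA (1 mismatch)
print(assign_sample("TTTTTT", samples))  # None
```

Index sets are usually chosen so that any two indexes differ at several positions; that spacing is what makes a one-mismatch tolerance safe.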
Illumina HiSeq2000 Sequencer
Nifty Lights
HiSeq2000 Reagents
1 HiSeq = 2 GAs
HiSeq2000 Fluidics
Fluidics were the Achilles' heel of the GA, and the HiSeq has 2X the fluidics
FY11 Service Metrics: Pareto
Figure: Pareto chart of FY11 service requests (incidents by service request category, with cumulative percent)
HiSeq: Temperature control
• 3 mechanisms:
– Heat extraction via liquid coolant
– Flow cell temperature control via Peltier
– Reagent temperature maintained via a cooled compartment
The flow cell sits on Peltier blocks and is water-cooled (heat extraction from underneath)
Reagent Chiller:
• All reagents cooled to 4 °C
• Condensation pump runs every 4 min for 30 sec
HiSeq Flowcell Loading
HiSeq Imaging
HiSeq Optics
HiSeq Lasers
HiSeq Software Interface
HiSeq – Real Time Metrics
HiSeq vs GA
Cost & Throughput Comparison
Platform                GAIIx    GAIIx    GAIIx    GAIIx    HiSeq    HiSeq    HiSeq    HiSeq
Run Type                1x36     2x36     2x76     2x150    1x50     2x50     2x100    2x150
Seq Prep Reagents       $2,292   $4,012   $4,012   $4,012   $2,442   $3,747   $3,747   $3,737
Seq Reagents            $864     $1,728   $3,456   $6,912   $1,436   $2,872   $5,175   $6,611
Seq Prep & Seq Total    $3,156   $5,740   $7,468   $10,924  $3,878   $6,619   $8,922   $10,348
Avg. Bases (Gb)         8.0      19.1     35.9     70.4     20.8     41.6     83.3     124.9
Avg. Reads (Millions)   222.2    265.0    236.3    234.6    416.0    416.0    416.4    416.3
Cost per Lane           $451     $820     $1,067   $1,561   $554     $946     $1,275   $1,478
Cost per 1 Gb           $395     $301     $208     $155     $186     $159     $107     $83
Cost per Million Reads  $14      $22      $32      $47      $9       $16      $21      $25
Notes:
• Throughput metrics are averages from runs performed in FY11 for each run type to date
• HiSeq bases & reads throughput metrics shown in italics on the original slide are estimates based on the 2x100 run type, since we have limited data on other run types
• Only vendor reagent costs are shown; library creation and overhead costs are not included, but they are roughly equal across run types and mostly independent of run type
• Cost per million reads goes up with longer run types, but read length increases as well, which makes each read more valuable for some assembly applications
• The HiSeq 2x150 run type is not yet supported, and the current HiSeq chemistry has worse quality than the GA beyond 80-100 bases
• The HiSeq platform is still new and we are experiencing more hardware failures than on the GA; Illumina replaces reagents for failed runs, and we rerun failed flowcells immediately whenever possible.
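The derived rows of the cost table can be recomputed from the reagent totals, which makes the relationships explicit. A minimal sketch under two assumptions that are inferred from the numbers, not stated on the slide: cost per lane equals the flowcell total divided by 7 (presumably 8 lanes with one used as a control), and "reads" counts clusters, so bases = reads × number of reads × read length. All helper names are hypothetical.

```python
import math

LANES_PER_FLOWCELL = 7  # assumption: the slide's per-lane costs equal total / 7


def round_half_up(x: float) -> int:
    """Round .5 upward, as the slide does (394.5 -> 395); Python's round() gives 394."""
    return math.floor(x + 0.5)


def derived_metrics(total_usd, gb, million_reads, lanes=LANES_PER_FLOWCELL):
    """Return (cost per lane, cost per Gb, cost per million reads) in whole dollars."""
    return (round_half_up(total_usd / lanes),
            round_half_up(total_usd / gb),
            round_half_up(total_usd / million_reads))


def expected_gb(million_reads, run_type):
    """Bases = clusters x number of reads x read length, e.g. run type '2x100'."""
    n_reads, length = (int(v) for v in run_type.split("x"))
    return million_reads * n_reads * length / 1000


# (platform, run type, total $, avg Gb, avg M reads, slide's (lane, Gb, M reads) costs)
ROWS = [
    ("GAIIx", "1x36", 3156, 8.0, 222.2, (451, 395, 14)),
    ("GAIIx", "2x36", 5740, 19.1, 265.0, (820, 301, 22)),
    ("GAIIx", "2x76", 7468, 35.9, 236.3, (1067, 208, 32)),
    ("GAIIx", "2x150", 10924, 70.4, 234.6, (1561, 155, 47)),
    ("HiSeq", "1x50", 3878, 20.8, 416.0, (554, 186, 9)),
    ("HiSeq", "2x50", 6619, 41.6, 416.0, (946, 159, 16)),
    ("HiSeq", "2x100", 8922, 83.3, 416.4, (1275, 107, 21)),
    ("HiSeq", "2x150", 10348, 124.9, 416.3, (1478, 83, 25)),
]

for platform, rt, total, gb, mreads, slide in ROWS:
    ours = derived_metrics(total, gb, mreads)
    status = "matches slide" if ours == slide else f"slide says {slide}"
    print(platform, rt, ours, status, f"~{expected_gb(mreads, rt):.1f} Gb expected")
```

Every row reproduces the slide's per-lane, per-Gb and per-million-read costs, and reads × read structure reproduces the average bases, which supports reading "Avg. Reads" as clusters (read pairs for 2xN runs).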
HiSeq Development
Coming in early Summer:
Introducing MiSeq
MiSeq: all-in-one
MiSeq: Fast, low throughput
Providing Quality Sequence
Figure: Illumina process metrics by week, cluster and run failure rates (% of attempts that failed, 3-May to 2-Aug)
Figure: FY10 Q3 Illumina utilization by instrument (Illumina01 to Illumina12; Illumina01 and Illumina05 marked with *)
Problematic instruments with multiple run failures; 06 is being replaced & 07 had significant service work
Incident Reporting & Resolution (JIRA)
Instrument Status & real-time run monitoring
Instrument Utilization & Efficiency
Throughput Goals & Metrics
Failure Tracking & SPC Charts; RQC
Troubleshooting Procedures
Figure: FY11 cumulative flowcells vs. flowcell goal (data labels: 163 and 489)
Figure: FY11 cumulative bases (Gb) vs. base goal (data labels: 7,993.547 and 39,696)
Continuous Improvement - Lean Six Sigma
LLNL – Six Sigma Training
• Tools and methodologies to:
– Improve work quality
– Improve process efficiencies & eliminate waste
– Improve employee and customer satisfaction
• Lean Six Sigma is about:
– Eliminating waste and improving process flow
– Focusing on reducing variation and improving process yield by following a problem-solving approach using statistical tools
What is Six Sigma?
• A Six Sigma process is, literally, one that is statistically 99.99966% successful.
• This is not always cost-effective to achieve, so as a methodology it is about gaining control of a process and implementing improvements.
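The 99.99966% figure comes from the normal distribution together with the conventional Six Sigma assumption that a process mean drifts 1.5 standard deviations over the long term. A minimal sketch using Python's standard-library `statistics.NormalDist` (`defects_per_million` is a hypothetical helper name):

```python
from statistics import NormalDist


def defects_per_million(sigma_level: float, shift: float = 1.5) -> float:
    """One-sided defects per million opportunities (DPMO).

    With the conventional 1.5-sigma long-term shift, a "6 sigma" process has
    its specification limit 6 - 1.5 = 4.5 standard deviations from the mean.
    """
    # Upper-tail probability taken as the cdf at the negative distance,
    # which avoids the precision loss of computing 1 - cdf(4.5).
    return NormalDist().cdf(shift - sigma_level) * 1_000_000


dpmo = defects_per_million(6)
success_pct = 100 - dpmo / 10_000  # 1 defect per million = 0.0001 %
print(f"{dpmo:.1f} DPMO -> {success_pct:.5f}% success")  # 3.4 DPMO -> 99.99966% success
```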
What is Six Sigma?
• Six Sigma is a data-driven problem-solving approach where process inputs (Xs) are identified and optimized to impact the output (Y)
• The output is a function of the inputs and the process:
– Y: output
– f: function
– X: variables that must be controlled to consistently predict Y
Y = f(X)