Post on 19-Dec-2015
Next Generation Sequencing Platforms
– Sequencing by synthesis (SBS)
  • 454/Pyrosequencing
  • Illumina/Solexa
  • Helicos
  • PacBio
  • (Charge-based detection system, Now-sequencing)
– Sequencing by hybridization and ligation
  • SOLiD
Pyrosequencing: Sequencing-By-Synthesis
CSB2008 August 2008 UCSC Sequencing Center
454 Sequencing System
• Chemistry & Platform
• Sequencing-by-synthesis
• Library Preparation
• Emulsion Based Clonal Amplification
SOLiD Bioanalyzer
• Chemistry & Platform
• Library Construction
• Emulsion PCR
Applications
• Fragment Sequencing
• Transcriptome Studies
• Paired-End Sequencing
• Targeted sequencing
• Small RNA
Overview of the 454 Sequencing System
1) Prepare adapter-ligated ssDNA library (A-[insert]-B)
2) emPCR: clonal amplification on 28 µm beads, followed by enrichment
3) Load beads and enzymes into the PicoTiter Plate™
4) Perform sequencing-by-synthesis on the 454 Sequencer
Emulsion Based Clonal Amplification
1) Mix DNA library & capture beads (limited dilution); the library DNA carries adapters A and B
2) Create “water-in-oil” emulsion (+ PCR reagents, + emulsion oil) to form micro-reactors
3) Perform emulsion PCR
4) “Break micro-reactors” and isolate DNA-containing beads
• Generation of millions of clonally amplified sequencing templates on each bead
• No cloning and colony picking
Depositing DNA Beads into the PicoTiter™Plate
• Load beads into the PicoTiter™Plate (44 μm wells)
• Centrifuge step
• Load enzyme beads
Sequencing By Synthesis
• DNA capture bead containing millions of copies of a single clonal fragment
• Sequence shown: A A T C G G C A T G C T A A A A G T C A
• Anneal primer
• Simultaneous sequencing of the entire genome in hundreds of thousands of picoliter-size wells
Pyrophosphate signal generation
• Incorporation of a nucleotide (T in the figure) releases PPi
• Sulfurylase converts PPi and APS to ATP
• Luciferase uses ATP and luciferin to produce light + oxyluciferin
Sequencing Workflow Overview
• Sample input: genomic DNA, BACs, amplicons, cDNA
• Generation of small DNA fragments via nebulization
• Ligation of A/B adaptors flanking single-stranded DNA fragments
• Emulsification of beads and fragments in water-in-oil microreactors
• Clonal amplification of fragments bound to beads in microreactors
• Sequencing and base calling
One fragment, one bead, one read
400,000 reads per run
Sequencing Workflow: Library Preparation
Timeline: DNA library preparation and titration (4.5 h and 10.5 h), emPCR (8.0 h), sequencing (7.5 h)
• Genome fragmented by nebulization
• sstDNA library created with adaptors
• A/B fragments selected using streptavidin-biotin purification
Sequencing Workflow: Emulsion PCR
Emulsion-based clonal amplification
• Anneal sstDNA to an excess of DNA Capture Beads
• Emulsify beads and PCR reagents in water-in-oil microreactors
• Clonal amplification occurs inside microreactors
• Break microreactors, enrich for DNA-positive beads
Sequencing Workflow: Loading of PicoTiterPlate Device
Depositing DNA beads into the PicoTiterPlate device
• Well diameter: average of 44 µm
• > 400,000 reads obtained in parallel
• A single clonally amplified sstDNA bead is deposited per well
[Figure labels: Amplified sstDNA library beads; Quality filtered bases]
Sequencing Workflow: Sequencing by Synthesis
• Bases (TACG) are flowed sequentially, always in the same order, across the PicoTiterPlate device during a sequencing run (100 times for a large GS FLX run).
• Incorporation of a nucleotide complementary to the template strand generates a light signal.
• The light signal is recorded by the CCD camera.
• The signal strength is proportional to the number of nucleotides incorporated.
[Figure: flowgram with key sequence]
GS FLX Data Analysis: Flowgram Generation
• Key sequence = TCAG, used for signal calibration
• Flow order: TACG
• Signal levels in the flowgram: 1-mer, 2-mer, 3-mer, 4-mer
• Example read: TTCTGCGAA
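The relation between a read and its flowgram can be sketched in a few lines. This is a minimal illustration that assumes an idealized, noise-free signal equal to the homopolymer length incorporated in each flow; the function name `flowgram` and the `max_cycles` parameter are hypothetical and do not correspond to the GS FLX software.

```python
def flowgram(read, flow_order="TACG", max_cycles=100):
    """Return idealized (base, signal) pairs for a synthesized read.

    Each flow offers one nucleotide. The signal is the number of
    consecutive matching bases incorporated in that flow (0 if none).
    """
    signals = []
    pos = 0
    for _ in range(max_cycles):
        for base in flow_order:
            run = 0
            # Extend through the whole homopolymer stretch in one flow.
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
            if pos == len(read):
                return signals
    raise ValueError("read not fully sequenced within max_cycles")


# Key sequence TCAG followed by the example read TTCTGCGAA.
for base, signal in flowgram("TCAG" + "TTCTGCGAA"):
    print(base, signal)
```

The four key bases each give a 1-mer signal, which is what makes the key usable as a calibration reference; the TT and AA stretches in the example read give 2-mer signals.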
GS FLX Data Analysis: Overview
• Image capture
• Image processing
• Signal processing
• GS Run Browser
• GS De Novo Assembler
• GS Reference Mapper
• GS Amplicon Variant Analyzer
GS FLX System Performance: Read Length
The Genome Contains Repeat Regions
• Depending upon the specific genome characteristics, microreads (~25 bp) cover only a portion of the genome
  – In human, 25 bp reads can be mapped uniquely to only 80% of the genome
• Short reads are limiting in known genomes. What about unknown genomes?
  – Mapping versus de novo assemblies
  – Mapping will miss genome rearrangements
  – Mapping is only as good as the reference
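The effect of read length on unique mappability can be illustrated on a toy sequence. The sketch below is an assumption-laden simplification: `unique_fraction` is a hypothetical helper, it considers exact matches on one strand only, and the toy string stands in for a genome with a repeat.

```python
from collections import Counter


def unique_fraction(genome, read_len):
    """Fraction of read start positions whose read occurs exactly once."""
    n = len(genome) - read_len + 1
    if n <= 0:
        raise ValueError("read_len exceeds genome length")
    reads = [genome[i:i + read_len] for i in range(n)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / n


toy = "ACGTACGA"  # contains the repeated 3-mer ACG
print(unique_fraction(toy, 3))  # short reads: some positions are ambiguous
print(unique_fraction(toy, 5))  # longer reads span the repeat
```

On this toy sequence the 3-base reads that fall inside the repeat cannot be placed uniquely, while every 5-base read can.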
Why Does Length Matter?
• Longer sequencing reads mean more applications
  – Identify and characterize small and short RNAs
  – Full-length cDNA sequencing for expression levels and variations
  – Amplicon resequencing for genetic variation, including somatic mutations
  – Sequencing of micro-organisms in a single instrument run
  – Sequencing of complex genomes: mammalian & plant
  – Sequencing of complex samples: metagenomics, ancient DNA
Longer sequencing reads mean more applications
– HIV Studies (3)
– ChIP-Sequencing (8)
  • Boyle et al., Cell: Mixed technologies for mapping open chromatin
– Metagenomics (12)
  • Palacios et al., New England Journal of Medicine: Pathogenic Virus Detection
– Whole Genome Sequencing (30)
  • Velasco et al., PLoS: Pinot Noir Genome
– Paired-End sequencing
  • Detecting Structural Variations across two human genomes
– Technology and Bioinformatics (11)
  • Meyer et al., NAR: Using Picogram quantities of sample
– Transcriptome studies – cDNA (17)
– Small RNA (32)
– Amplicon and Methylation Studies (9)
Applications of Whole Genome, Ultra Broad and Ultra Deep HT-Sequencing
HT-Sequencing Technology Applications
• Whole Genome Sequencing
  – Virus
  – Bacteria
  – Fungus
  – Higher Eukaryotes
  – Human
• Ultra Deep Sequencing
  – Population Biology
  – HIV
  – Bacterial 16S
  – Resistance
  – Tropism
  – Amplicons
• Ultra Broad Sequencing
  – Small RNA
  – SAGE/CAGE
  – Metagenomics
  – Expression
  – Novel strain ID
  – Transcriptome
  – HLA Typing
The Power of Metagenomics
• Identifying an environment based upon the microbial organisms that are present
  – Microbial Population Structures in the Deep Marine Biosphere
    • Huber et al., Science, 318, p97, 2007
• Determining the state of an environment based upon the presence and mixture of microbial organisms
  – The interdependence of coral and its microbial environment
    • Wegley et al., Environmental Microbiology, 9, p2707, 2007
• Detecting viral pathogens quickly and accurately
  – Less than 12 months from first identification of affected hives to possible pathogen
    • Cox-Foster et al., Science, 2007
  – Transplant victims from Australia
    • Palacios et al., New England Journal of Medicine, 2008
Transcriptome Analysis: Workflow Comparison
GS FLX (clonal sequencing ensured through emPCR)
• cDNA libraries (short tag library, EST library)
• emPCR
• Sequencing
• Time: Days
Sanger (E. coli cloning, often concatemerization)
• cDNA libraries (short tag library, EST library)
• Concatemerization, insert fragments into vectors and clone into bacteria
• Grow, pick colonies
• Template generation
• Sequencing
• Time: Weeks
• Sequencing of approximately 400,000 small RNAs from C. elegans
• 18 previously unknown miRNA genes were detected
• Thousands of endogenous siRNAs were identified that act preferentially on transcripts associated with spermatogenesis and transposons
• A new class of small RNAs was identified: 21U-RNAs. They all begin with a U and are precisely 21 nt long.
Multiplex Identifier (MID) Basics
• What is it?
  – Two new kits, each with 6 different library adapters (total of 12 adapters)
  – Each MID library adapter has an added, specially encoded 10-base region
  – Used to “bar-code” up to 12 different genomic library samples to be run in the same region of a single sequencing run
• Standard library: Primer A (40 bases), Key (4 bases), Library fragment, Primer B
• MID library: Primer A (15 bases), Key (4 bases), MID 1 … MID n (10 bases), Library fragment, Primer B
• Both layouts show the sequencing primer and the read
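Sorting reads by their MID can be sketched as follows. This is an illustrative outline only: it assumes each read begins with the 4-base key (taken here to be TCAG) followed directly by the 10-base MID, requires an exact match, and uses made-up placeholder MID sequences rather than the sequences shipped in the kits. The function name `demultiplex` is hypothetical.

```python
KEY = "TCAG"   # assumed 4-base key at the start of every read
MID_LEN = 10   # length of the encoded MID region


def demultiplex(reads, mids):
    """Group reads by the MID that follows the key.

    mids maps a 10-base MID sequence to a sample name. Reads without the
    key or with an unknown MID are collected under "unassigned".
    """
    bins = {name: [] for name in mids.values()}
    bins["unassigned"] = []
    for read in reads:
        tag = read[len(KEY):len(KEY) + MID_LEN]
        if read.startswith(KEY) and tag in mids:
            # Store only the library fragment, without key and MID.
            bins[mids[tag]].append(read[len(KEY) + MID_LEN:])
        else:
            bins["unassigned"].append(read)
    return bins


# Placeholder MIDs for illustration only.
example_mids = {"AAAAACCCCC": "sample_1", "GGGGGTTTTT": "sample_2"}
example_reads = [
    "TCAG" + "AAAAACCCCC" + "ACGTACGT",
    "TCAG" + "GGGGGTTTTT" + "TTGACA",
    "TCAG" + "ACACACACAC" + "GGGG",
]
print(demultiplex(example_reads, example_mids))
```

A production tool would normally tolerate a small number of mismatches in the MID; exact matching is used here to keep the sketch short.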
Paired-End Applications
• ~100 bp sequencing tags separated by 3 kb spacing
• Use for de novo assembly
  – Order contigs
• Use for structural variation studies
  – Inversions, deletions, insertions…
  – High-resolution detection: 3 kb spacing vs 10 to 40 kb
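How paired tags point to structural variation can be sketched by comparing the mapped distance of a pair with the expected 3 kb spacing. The function `classify_pair` and the 500 bp tolerance are hypothetical choices for illustration; inversions would additionally require tag orientation, which is not modeled here.

```python
def classify_pair(left_pos, right_pos, expected=3000, tolerance=500):
    """Classify a tag pair by its mapped distance on the reference.

    A pair that maps farther apart than expected suggests sequence missing
    from the sample (deletion); a pair that maps closer together suggests
    extra sequence in the sample (insertion).
    """
    distance = right_pos - left_pos
    if distance < 0:
        raise ValueError("right_pos must not precede left_pos")
    if distance > expected + tolerance:
        return "possible deletion"
    if distance < expected - tolerance:
        return "possible insertion"
    return "concordant"


print(classify_pair(10_000, 13_100))
print(classify_pair(10_000, 18_000))
print(classify_pair(10_000, 11_000))
```

The smaller the library spacing and its spread, the smaller the events that stand out from the concordant pairs, which is the resolution argument made above for 3 kb versus 10 to 40 kb spacing.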
Paired-Ends workflow
Targeted Enrichment of Human gDNA
• gDNA containing Exon 1, Exon 2, Exon 3, Exon 4, Exon 5
• Fragment and hybridize to NimbleGen capture array
• Elute
• HT-Sequencing
• Analyze exon sequences
Sequencing all the known exons from the human genome
• “Direct selection of human genomic loci by microarray hybridization”
  – Albert et al., Nature Methods, 4(11), 903-905, 2007
• ~6,700 gDNA loci selected
• BRCA1 region
• 2 Mb region
Another Sequence-Capture Example
• 19 kb region from Chromosome 4
[Figure tracks: GS FLX Seq Reads; Sequencing Coverage; Seq-Cap Array Probes; Targeted Exons]
SOLiD Library Preparation
The SOLiD system uses either a fragment library or a mate-paired library depending on the user’s desired information or application.
Emulsion PCR and Bead Enrichment
PCR takes place in water-in-oil microreactors. Post-PCR, templated beads are separated from non-templated beads and modified at the 3’ end to allow covalent linkage to the SOLiD sequencing slide.
Bead Deposition
Beads are deposited into 1, 2, 4, or 8 segmented chambers on a slide.
Sequencing By Ligation and Data Analysis
Primers hybridize to the adaptors, and a set of 4 dye-labeled probes competes for ligation to the primer. Probe specificity is determined by interrogating the 4th and 5th bases in each ligation, and each ligation series runs for 5-7 cycles.
After each ligation series, a new primer offset by 1 base is hybridized for a further series of ligations. Reads of 25-35 bp are generated through 5 sequential primer resets and ligation series.
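The way offset primers fill in the gaps left by each ligation series can be sketched by listing the template positions interrogated in every cycle. The sketch assumes, as stated above, that each ligation interrogates the 4th and 5th bases of a 5-base step and that each new primer is shifted by one base; the function name and the position numbering (position 1 is the first template base after the adaptor, position 0 the last adaptor base) are hypothetical conventions chosen for illustration.

```python
def interrogated_positions(ligation_cycles=7, primer_rounds=5):
    """Map each position to the (primer round, ligation cycle) pairs reading it."""
    coverage = {}
    for round_idx in range(primer_rounds):       # primer offsets 0, -1, ..., -4
        for cycle in range(ligation_cycles):
            # 4th base of this ligation step, shifted by the primer offset.
            first = 5 * cycle + 4 - round_idx
            for pos in (first, first + 1):       # 4th and 5th base
                coverage.setdefault(pos, []).append((round_idx + 1, cycle + 1))
    return coverage


cov = interrogated_positions()
for pos in sorted(cov):
    print(pos, cov[pos])
```

Under these assumptions, 7 cycles per primer reach position 35 and 5 cycles reach position 25, and every position except the last is interrogated in two different primer rounds.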
Library Construction
Two different libraries can be constructed, depending on the application and desired information.
Fragment Library
DNA is fragmented and PCR primer adaptors are ligated to the DNA
Mate-Pair Library
DNA is sheared, selected for a desired input size, and circularized around an internal adaptor.
Mate-Pair Library (cont.)
The circularized DNA is enzymatically cleaved to yield 2 DNA fragments separated by an internal adaptor. PCR primer adaptors are ligated onto the ends of this piece of DNA.
Emulsion PCR (ePCR)
PCR takes place in water-in-oil microreactors containing P1-coupled beads, templates, primers, and all required PCR reaction components.
ePCR
Emulsion PCR yields both monoclonal beads (a unique template) and polyclonal beads (multiple templates), as well as some non-templated beads.
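The trade-off between the three bead classes can be estimated with a simple model. The sketch below assumes that the number of template molecules sharing a microreactor with a bead follows a Poisson distribution; that model, the function name `bead_fractions`, and the example loading values are assumptions for illustration, not figures from the SOLiD protocol.

```python
import math


def bead_fractions(mean_templates_per_bead):
    """Expected bead classes under an assumed Poisson template loading."""
    lam = mean_templates_per_bead
    non_templated = math.exp(-lam)        # P(0 templates)
    monoclonal = lam * math.exp(-lam)     # P(exactly 1 template)
    polyclonal = 1.0 - non_templated - monoclonal
    return {
        "non-templated": non_templated,
        "monoclonal": monoclonal,
        "polyclonal": polyclonal,
    }


for lam in (0.1, 0.5, 1.0):
    print(lam, bead_fractions(lam))
```

Under this model, a dilute template input keeps polyclonal beads rare at the cost of many non-templated beads, which is consistent with the need for a bead enrichment step.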
Clonal Amplification
Post-Emulsion and ePCR
Bead Enrichment
Templated beads are separated from non-templated beads via polystyrene beads
Pre- and Post-Bead Enrichment P2-hybridization
Bead Deposition
Templated beads are modified at their 3’-end and covalently attached to a glass slide.
Slide Configurations
SOLiD Sequencing Chemistry
4-color Ligation Reaction
A complementary dye-labeled probe hybridizes and is ligated to the universal sequencing primer.
De-phosphorylation
A phosphatase cleaves the 3’-phosphate from the universal sequencing primer, preventing ligation to templates where no probe was ligated.
Visualization
The fluorescently labeled dyes attached to each probe are visualized.
Cleavage
The fluorescent dye-labeled nucleotides are cleaved from the hybridized probe.
2nd Ligation Cycle
The next set of labeled probes hybridizes and is ligated to the previously hybridized probe.
2nd Cycle Visualization
2nd cycle cleavage
Every 5th base is interrogated
Reset
The extended primer is stripped from the template.
1st cycle after reset
A new set of universal sequencing primers is hybridized, offset by one base (n-1).
2nd round
Sequential Rounds of Sequencing with Fragment Library
In each round of sequencing …
Sequential Rounds of Sequencing with Mate-Paired Library
In each round of sequencing …
Di-Base Encoding
Advantages of di-Base Encoding
Advantages of di-Base Encoding: Real SNPs
Advantages of di-Base Encoding: Miscall
Only Certain Transitions are Allowed for Real SNPs
Only Allowed Transitions
Two color changes not allowed
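The difference between a real SNP and a miscall in color space can be sketched with the usual two-bit representation of di-base encoding, in which each color is the XOR of the codes of two adjacent bases (A=0, C=1, G=2, T=3). The helper names and the example sequences are illustrative assumptions; a real pipeline would work on aligned color-space reads.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as colors; each color covers two adjacent bases."""
    codes = [BASE_TO_BITS[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def changed_positions(ref_colors, obs_colors):
    """Indices at which the observed colors differ from the reference colors."""
    return [i for i, (r, o) in enumerate(zip(ref_colors, obs_colors)) if r != o]


def consistent_with_snp(ref_pair, obs_pair):
    """Two adjacent color changes fit a single-base change only if the
    combined (XOR) value of the color pair is preserved."""
    both_changed = ref_pair[0] != obs_pair[0] and ref_pair[1] != obs_pair[1]
    return both_changed and (ref_pair[0] ^ ref_pair[1]) == (obs_pair[0] ^ obs_pair[1])


ref = to_colors("ACGGTCA")
snp = to_colors("ACGATCA")           # single base change G -> A
print(ref, snp, changed_positions(ref, snp))
print(consistent_with_snp(ref[2:4], snp[2:4]))
```

A single base change alters two adjacent colors in a constrained way, so an isolated single-color difference, or a pair of changes that breaks the constraint, is more likely a measurement error than a real SNP.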
Benefits of SOLiD
Thank you for your attention