Post on 11-Sep-2018
Cost-Effective Options using SureSelectXT System in Exome Sequencing and Targeted Applications
Josh Wang, Ph.D.
Field Application Scientist
Feb 8th, 2011
Cost-effective Options using SureSelectXT System
• Introduction to SureSelectXT Target Enrichment System
• SureSelectXT All Exon Kits and Multiplexing
• SureSelectXT Targeted Kits and Multiplexing
• Custom capture kits
• Human Kinome kits (DNA capture & RNA capture)
• Bravo Automation for Library Preparation and Capture
SureSelectXT Target Enrichment System
Addresses a major workflow bottleneck with a complete offering
• gDNA Extraction
• SureSelectXT Library Prep
  - low DNA input (<3 µg)
  - high-fidelity Herculase
  - automatable
• SureSelect Capture (24 hours), baits:
  - cRNA probes
  - long (120 bases)
  - biotin labeled
  - favorable RNA-DNA hyb
  - fast 24 hr hyb
  - automatable
• qPCR/BioA QC
• Protocols for Bravo automation
[Workflow diagram labels: <3 µg, 0.5 µg, 24 hours]
SureSelectXT Target Enrichment System
Sequencers shown: Illumina GAIIx, Illumina HiSeq, SOLiD 4, SOLiD 5500XL, Roche 454
 | Illumina | ABI SOLiD | Roche 454
Exome and Targeted SureSelect | yes | yes | yes
multiplexing | yes | yes | no
SureSelectXT Target Enrichment Kit Configurations
Product (catalog number) | Target amount | Definition
Human All Exon v1 | 38 Mb | CCDS Sept. 2008
Human All Exon v2 | 44 Mb | CCDS Sept. 2009 + additional RefSeq
Human All Exon 50Mb | 50 Mb | GENCODE content – most comprehensive coverage
Human All Exon Plus | 38-50 Mb + up to 6.8 Mb of custom content | add custom content to All Exon catalog content
Mouse All Exon | 50 Mb | mm9 build, Ensembl + RefSeq
Kinome | 3.2 Mb | kinases and kinase related genes
RNA capture Kinome | 3.4 Mb | transcripts for the kinome
Indexed custom content | <0.2 Mb, 0.2-0.49 Mb, 0.5-1.49 Mb, 1.5-2.9 Mb, 3-6.8 Mb | cost-saving custom offering, up to 20 Mb content in one custom library on request
Comparison: Target Enrichment System Applications
Application: Agilent SureSelectXT offering, followed by the Brand N / Brand I entries where the slide lists them
• Exome Sequencing: All Exon, All Exon v2, All Exon 50Mb, All Exon Plus; mouse All Exon. Brand N: Exome v1, Exome v2. Brand I: Exome.
• GWAS follow-up: custom SureSelectXT. Competitor entry: chip format only.
• Cancer Biomarker Discovery/Profiling: Kinome DNA and RNA capture
• Targeted transcriptome: Kinome RNA capture, custom RNA SureSelect
• Clinical Research: custom SureSelectXT (DNA, RNA)
• Microbial/viral seq discovery: custom SureSelectXT
• All Agilent SureSelectXT kits are in-solution based and automation-friendly
SureSelect eArray Web for Target Enrichment Applications
Cost-effective Options using SureSelectXT System
• Introduction to SureSelectXT Target Enrichment System
• SureSelectXT All Exon Kits and Multiplexing
• SureSelectXT Targeted Kits and Multiplexing
• Custom capture kits
• Human Kinome kits (DNA capture & RNA capture)
• Bravo Automation for Library Preparation and Capture
SureSelect™ Human All Exon Kit Comparison
 | All Exon v1 | All Exon v2 | All Exon 50Mb
Content | CCDS Sept. 2008 | CCDS Sept. 2008 + additional RefSeq content including CCDS Sept. 2009 | GENCODE and Sanger exons (includes CCDS and Broad defined v2 content as well)
Target Size | 38 Mb | 44 Mb | 50 Mb
CCDS (Nov. 2010) | 89.6% | 98.2% | 99.5%
RefSeq Genes (Nov. 2010) | 85.9% | 96.9% | 99.0%
miRNA (miRBase v14) | 90.0% | 90.0% | 92.8%
Ensembl (Aug. 2010) | 79.9% | 90.9% | 96.2%
GENCODE v4 | 80.9% | 92.9% | 97.0%
Addition of Custom Exome (<6.8 Mb) | Yes | Yes | Yes
Developed with | Broad | Broad | Sanger
• exome is ~1.5% of the genome and is primarily protein-coding regions
• Mendelian and complex disease research can benefit from exome sequencing
All Exon Kit Performance: Accuracy and Repeatability
Human All Exon v1, 2x76 bp, ~4 Gb
• ~70% on-target
• ~90% targeted bases at >10X
• ~80% targeted bases at >20X
[Figure: average coverage depth per interval, sample 1 vs. sample 2, both axes 0 to 180; R2 = 0.956]
DNA Sample | NA18507 Hom | NA18507 Het | NA10831 Hom | NA10831 Het
Total SNPs | 49,556 | 10,195 | 50,174 | 9,637
% Covered (> 1 read) | 99.30 | 99.22 | 99.37 | 99.19
% Called | 95.33 | 81.90 | 96.00 | 83.36
% Concordant Calls | 99.56 | 99.94 | 99.58 | 99.96
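The coverage figures on this slide can be sanity-checked with a back-of-envelope estimate: mean depth is roughly (sequence output × on-target fraction) / target size. The sketch below is only that approximation; the function name is illustrative, and it assumes every sequenced base maps and ignores duplicates, so real depth will be lower.

```python
def estimated_mean_depth(output_gb, on_target_fraction, target_mb):
    """Rough mean depth over the target.

    Assumption (not from the slide): all sequenced bases map and none
    are duplicates, so this is an upper-bound style estimate.
    """
    on_target_bases = output_gb * 1e9 * on_target_fraction
    return on_target_bases / (target_mb * 1e6)


# Slide values: ~4 Gb of 2x76 bp data, ~70% on-target, 38 Mb All Exon v1 target.
depth = estimated_mean_depth(4.0, 0.70, 38.0)
print(f"estimated mean depth: {depth:.1f}x")
```

An estimate in this range is consistent with the depth axis of the coverage plot and with most targeted bases exceeding 10X and 20X.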
Human All Exon 50Mb – 5Gb coverage
Most comprehensive Human All Exon content available
Sequencing capacity:
• 0.5-1 sample / lane GAIIx
• 1-3 samples / lane HiSeq
• 5-10 samples / flowcell SOLiD 4
Chemistry recommended:
• PE 2x76 bp Illumina
• PE 50+25 SOLiD
Multiplexing:
• Illumina
• SOLiD
Comparison of SNP calls with HapMap
[Figure: two bar charts, Genotype Concordance vs. HapMap and Genotype Sensitivity vs. HapMap, each comparing Human All Exon v2 and Human All Exon 50Mb. Categories: GT is REF, GT is variant HOM, GT is variant HET, OVERALL. Y axes run from 70% to 100%. Bar labels legible in the transcript: 99.1%, 99.2%, 98.4%, 98.0%, 95.7%, 94.9%, 99.8%, 99.7%, 98.2%, 98.1%, 98.5%, 98.3%, 99.4%, 99.3%]
De novo mutations of SETBP1 cause Schinzel-Giedion syndrome
• Exome sequencing used to elucidate the causative mutations for a dominant Mendelian disorder
• Exomes of 4 affected individuals were enriched using SureSelect All Exon kit and subjected to SOLiD sequencing
• Achieved 65-72% on target and ~85% of targeted bases at 10x depth (average depth at 43x)
• A number of prioritization steps resulted in 12 unknown variants in 2 genes
• One gene, CTBP2, was determined to be a false positive
• The second candidate, SETBP1, contained 4 different variants in an 11 bp region (affects 3 out of 4 consecutive amino acids)
• Sequencing in another 11 clinical samples confirms SETBP1 mutations in all samples
Nature Genetics 2010
SureSelect All Exon Publications
… and many more to come
SureSelect Human All Exon 50Mb
most comprehensive coverage in the market
 | SureSelect All Exon 50Mb | Brand N | Brand I
CCDS (11/2010) | 99.5% | 98.4% | 95%
RefSeq (11/2010) | 99.0% | 98.2% | 91%
GENCODE v4 | 97.0% | ? | ?
target regions size | 50 Mb | 44 Mb | 62 Mb (?)
bait (probe) length | 120 | 80-85 | 95
Insert size | 150-200 bp | 150-250 bp | 300-400 bp
Hyb. Time | 24 hrs | 72 hrs | 24 hrs + 24 hrs
Seq. output required | 4-5 Gb | ~5 Gb | 10-12 Gb
Advantages of SureSelect All Exon Sequencing
• Average human exon size is ~150 bp
• SureSelect protocol recommends DNA shearing size to be 150-200 bp, and sequencing reads at 2x76 bp, matching nicely with average exon size
• An exome DNA library with 300-400 bp shearing size results in off-target sequencing and additional sequencing cost
• SureSelect All Exon Kits require only ~5 Gb sequencing, allowing more exomes per run and reducing sequencing cost accordingly, as sequencing is the most expensive part of the workflow
[Figure: 2x76 bp and 2x150 bp read pairs drawn against a ~150 bp exon]
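The fragment-size argument on this slide can be illustrated with a deliberately simplified geometric model: if a captured fragment is centered on a single isolated exon, the share of the fragment that lies on the exon is min(exon, fragment) / fragment. This is an assumption made for illustration only (real fragments are not centered, exons vary in size, and reads may not span the whole fragment); the function name is hypothetical.

```python
def on_exon_fraction(fragment_bp, exon_bp=150):
    """Share of a fragment that overlaps an exon it is centered on.

    Simplifying assumption: one isolated exon, fragment centered on it.
    """
    return min(fragment_bp, exon_bp) / fragment_bp


# Shearing sizes quoted on the slide: 150-200 bp versus 300-400 bp.
for fragment in (150, 200, 300, 400):
    print(f"{fragment} bp fragment: {on_exon_fraction(fragment):.0%} on exon")
```

Under this toy model a 200 bp fragment is 75% exonic while a 400 bp fragment is under 40% exonic, which is the direction of the effect the slide describes.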
Expanding Exome Design into Animal Models
Mouse All Exon
• Exon definition derived from Ensembl + RefSeq
• Designed against mm9 reference from UCSC
• Number of genes covered: 24,306
• Number of exons: 221,784
• Total size of design: 50 Mb
[Figure: Performance of Mouse All Exon Design with 5 Gb sequencing. Metrics: % Reads On-Target +/- 200 bp, Uniformity (3/4 mean with upper tail), % Bases 1X Coverage, % Bases 10X Coverage, % Bases 20X Coverage. Samples: Promega, C3H, DBA, PWK, 15NIH36a, 15NIH49a]
Multiplexing Option for SureSelect All Exon Kit
 | HiSeq2000 | GAIIx | SOLiD 5500xl (nano) | SOLiD 5500xl (micro) | SOLiD 4
Output, PE (2x76 bp) | < 150 Gb | < 45 Gb | - | - | -
Output, PE (75/35 or 50/25 bp) | - | - | < 270 Gb | < 160 Gb | < 80 Gb
exomes/run | ~30 | ~9 | ~54 | ~32 | ~16
lanes | 16 | 8 | 12 | 12 | 2 slides
throughput (Gb/day) | < 25 | < 7.5 | < 40 | < 23 | < 7
exomes/day | ~5 | ~1.5 | ~8 | ~4.5 | ~1.4
multiplexing supported | 12 | 12 | 16 | 16 | 16
• assuming ~5 Gb sequence output needed per exome
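The exomes/run and exomes/day rows follow from dividing each platform's output by the ~5 Gb per exome assumed on the slide. A minimal sketch of that arithmetic, using the slide's upper-bound output figures (the dictionary layout and function names are illustrative):

```python
import math

GB_PER_EXOME = 5.0  # the slide's stated assumption

# Upper-bound figures copied from the table above.
PLATFORMS = {
    "HiSeq2000": {"output_gb": 150, "gb_per_day": 25},
    "GAIIx": {"output_gb": 45, "gb_per_day": 7.5},
    "SOLiD 5500xl (nano)": {"output_gb": 270, "gb_per_day": 40},
    "SOLiD 5500xl (micro)": {"output_gb": 160, "gb_per_day": 23},
    "SOLiD 4": {"output_gb": 80, "gb_per_day": 7},
}


def exomes_per_run(output_gb, gb_per_exome=GB_PER_EXOME):
    """Whole exomes that fit in one run at the given output."""
    return math.floor(output_gb / gb_per_exome)


def exomes_per_day(gb_per_day, gb_per_exome=GB_PER_EXOME):
    """Average exomes per day at the given daily throughput."""
    return gb_per_day / gb_per_exome


for name, spec in PLATFORMS.items():
    per_run = exomes_per_run(spec["output_gb"])
    per_day = exomes_per_day(spec["gb_per_day"])
    print(f"{name}: ~{per_run} exomes/run, ~{per_day:.1f} exomes/day")
```

Because the outputs are quoted as upper bounds, the results are best read as ceilings rather than guaranteed throughput.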
Cost-effective Options using SureSelectXT System
• Introduction to SureSelectXT Target Enrichment System
• SureSelectXT All Exon Kits and Multiplexing
• SureSelectXT Targeted Kits and Multiplexing
• Custom capture kits
• Human Kinome kits (DNA capture & RNA capture)
• Bravo Automation for Library Preparation and Capture
eArray web/eArray XD for SureSelect Custom Design
• eArray/eArray XD are free tools to design and order custom Microarrays and SureSelect libraries
• customer is the owner of a custom design
• SureSelect enrichment on eArray web features:
  - search existing designs/baits
  - download design files
  - share designs
  - upload custom bait designs
  - add supplemental baits to catalogue content
• eArray XD is a desktop version with extra flexibility
  - support bait design on custom genome
  - support multiple genome builds of same species
  - has companion SureSelect Quality Analyzer
SureSelect Custom Design in eArray Website
Search by gene accession number, chromosomal location
SureSelect Custom Design in eArray Website
1. Enter a name for the design job.
2. Design Strategy: To use the parameters previously optimized for general bait tiling, leave the checkmark on this option. To change the parameters, uncheck this option.
3. Species: Select the species for which the target intervals were designed. The Genome Build will then automatically be populated by eArray. (Callout: list of species eArray supports.)
4. Genomic Target Intervals: Either type in or upload the genomic intervals for the targets to be enriched. If using exon or interval finders, they show up automatically.
5. Genomic Avoid Intervals:
• Choose to avoid the standard repeat masked regions by leaving a checkmark next to this option (based on the UCSC RepeatMasker track)
• Add additional intervals to avoid with the baits by typing those intervals in or uploading them in the same format as the target intervals.
6. Submit: Select Submit when Options and Details are completed.
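Before entering target intervals in step 4, it can help to merge overlapping intervals and total their size, since the target size determines which custom kit size applies. The sketch below assumes intervals written as "chr:start-end" with 1-based inclusive coordinates; that text format and all function names are assumptions for illustration, not eArray's documented upload format.

```python
def parse_interval(text):
    """Parse 'chr17:100-200' into (chrom, start, end).

    Assumed format: chrom:start-end, 1-based, inclusive ends.
    """
    chrom, span = text.strip().split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_target_bp(intervals):
    """Total number of bases covered after merging."""
    return sum(end - start + 1 for _, start, end in merge_intervals(intervals))


# Hypothetical targets, not taken from the slides.
targets = [parse_interval(t) for t in ("chr17:100-200", "chr17:150-300", "chr13:1-50")]
print(merge_intervals(targets))
print(total_target_bp(targets), "bp")
```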
Detecting inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing
• a genomic assay using custom SureSelect to capture and detect 21 genes, including BRCA1 and BRCA2, with inherited mutations predisposing to breast or ovarian cancer
• zero false-positive calls of nonsense, frameshift mutations or genomic rearrangements for any gene in any test sample, while detecting all classes of mutation including large deletions and duplications in the range of 160 bp to 101 kb
• will enable widespread genetic testing and personalized risk assessment for breast and ovarian cancer
PNAS 2010
SureSelect Custom Design: Targeted Next-Generation Sequencing of a Cancer Transcriptome
• designed bait for 467 cancer-related genes
• constructed K-562 CML cDNA library
• SureSelect resulted in huge increase in specificity, 98% reads mapping to target transcripts vs 5% without hyb selection
• improved detection of SNPs, splicing variants and novel fusion transcripts
• suitable for large-scale tumor-profiling studies in clinical or research settings
Genome Biology 2009
Multiplexing Option for Custom/Targeted Kits
• Pay only for the region you capture for cost-saving
  - < 0.2 Mb
  - 0.2 - 0.5 Mb
  - 0.5 - 1.5 Mb
  - 1.5 - 3 Mb
  - 3 - 6.8 Mb
• For optimal performance
  - Capture
  - Index
  - Pool (qPCR and BioAnalyzer quantitation)
  - Sequence indexed samples
• Combine multiple samples per sequencing lane/quad
• Pre-capture Indexing
  - pre-capture indexing shown on pooling of 2 samples with potential to pool up to 6 (no real data to back it up)
  - did not show data on tracking rare heterozygous SNPs or InDels
  - pre-capture indexing requires much more sequencing to maintain sensitivity and allelic balance, increasing sequencing cost
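The "pay only for the region you capture" tiers above can be expressed as a simple lookup from target size to size band. This is a sketch: the slide does not say which band a value exactly on a boundary belongs to, so treating each upper bound as exclusive (except 6.8 Mb) is an assumption, and the function name is illustrative.

```python
# Upper bound in Mb and label for each custom size band listed on the slide.
TIERS_MB = [
    (0.2, "< 0.2 Mb"),
    (0.5, "0.2 - 0.5 Mb"),
    (1.5, "0.5 - 1.5 Mb"),
    (3.0, "1.5 - 3 Mb"),
    (6.8, "3 - 6.8 Mb"),
]


def custom_tier(target_mb):
    """Return the size band for a custom target.

    Assumption: upper bounds are exclusive, except that exactly 6.8 Mb
    still falls in the largest band.
    """
    if target_mb <= 0:
        raise ValueError("target size must be positive")
    for upper, label in TIERS_MB:
        if target_mb < upper:
            return label
    if target_mb <= 6.8:
        return TIERS_MB[-1][1]
    return "above 6.8 Mb (outside the listed bands)"


for size in (0.1, 1.0, 6.8):
    print(size, "Mb ->", custom_tier(size))
```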
Multiplexing Option for Custom/Targeted Kits
 | HiSeq2000 | GAIIx | SOLiD 5500xl (nano) | SOLiD 5500xl (micro) | SOLiD 4
Output, PE (2x76 bp) | < 150 Gb | < 45 Gb | - | - | -
Output, PE (75/35 or 50/25 bp) | - | - | < 270 Gb | < 160 Gb | < 80 Gb
lanes | 16 | 8 | 12 | 12 | 2 slides
throughput (Gb per lane) | < 9.5 | < 5.5 | < 22.5 | < 13.3 | < 10/quad
coverage for targeted SureSelect | 100x (<680 Mb) | 100x (<680 Mb) | 100x (<680 Mb) | 100x (<680 Mb) | 100x (<680 Mb)
multiplexing supported | 12, 48, 96 | 12, 48, 96 | 16, 32, 96 | 16, 32, 96 | 16, 32, 96
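The "100x (<680 Mb)" row reads as 100x coverage of a target of up to 6.8 Mb, which needs at most 680 Mb of sequence per sample. Dividing per-lane throughput by that figure gives a rough ceiling on samples per lane. The sketch below uses the table's per-lane upper bounds and, like the slide, ignores off-target reads and uneven index representation; function names are illustrative.

```python
import math

# Per-lane (or per-quad) upper bounds in Gb, copied from the table above.
LANE_GB = {
    "HiSeq2000": 9.5,
    "GAIIx": 5.5,
    "SOLiD 5500xl (nano)": 22.5,
    "SOLiD 5500xl (micro)": 13.3,
    "SOLiD 4 (per quad)": 10.0,
}


def gb_per_sample(target_mb, depth=100):
    """Sequence needed for one sample, in Gb (target size in Mb times depth)."""
    return target_mb * depth / 1000.0


def samples_per_lane(lane_gb, target_mb, depth=100):
    """Rough ceiling on samples per lane, assuming every read is on target."""
    return math.floor(lane_gb / gb_per_sample(target_mb, depth))


for platform, lane_gb in LANE_GB.items():
    print(f"{platform}: up to {samples_per_lane(lane_gb, 6.8)} samples of a 6.8 Mb target at 100x")
```

Smaller targets raise this ceiling quickly, which is where the higher supported index counts become useful.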
Illumina Indexing - Uniform Representation of Reads
0.2 Mb library - 12 indexes in 1 Illumina lane (~2 Gb)
[Figure: 0.2 Mb Capture Library (with 12 indexes per lane). Panels: percentage of reads or bases (% On-Target) per index, and Mb Coverage by Index with fold-representation labels between 0.91 and 1.32. Index sequences: ATCACG, CGATGT, TTAGGC, TGACCA, ACAGTG, GCCAAT, CAGATC, ACTTGA, GATCAG, TAGCTT, GGCTAC, CTTGTA. Mb values legible in the transcript: 238, 197, 198, 215, 185, 207, 161, 191, 252]
• Shown are 12 indexed samples captured with a 0.2 Mb SureSelect library
• Performing individual SureSelect captures for each sample before addition of index tags results in uniform representation in sequencing data
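Index uniformity of the kind plotted on this slide is commonly summarized as each index's coverage divided by the median across indexes. A minimal sketch of that calculation follows; the coverage values are hypothetical placeholders, not the slide's data, and the function name is illustrative.

```python
from statistics import median


def fold_representation(coverage_by_index):
    """Coverage of each index relative to the median across all indexes."""
    mid = median(coverage_by_index.values())
    return {index: round(value / mid, 2) for index, value in coverage_by_index.items()}


# Hypothetical Mb-of-coverage values keyed by index sequence.
example = {"ATCACG": 160.0, "CGATGT": 200.0, "TTAGGC": 250.0}
print(fold_representation(example))
```

Values close to 1.00 for every index indicate that pooled samples received similar shares of the lane.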
SOLiD Barcoding – Even Barcode Distribution
7 barcodes of 3.0 Mb Capture in 1 SOLiD Quad, 50M reads
[Figure: two panels for Index A to Index G. Panel 1, 7 barcodes of 3.0 Mb Capture in 1 SOLiD Quad: percentage reads in targeted regions, in regions +/- 100 bp, and in regions +/- 200 bp. Panel 2, Barcode Representation in Single SOLiD Quad: Fold Representation Relative to the Median, with labels between 0.86 and 1.10]
SureSelect™ RNA Target Enrichment
analyzing a subset group of transcripts
• start with 0.1-0.5 µg RNA
• construct a cDNA NGS library
• 24 hours
• protocols available for Illumina and SOLiD, for individual or multiplexed samples:
  - RNA Capture - Illumina
  - RNA Capture - SOLiD
  - RNA Capture – Illumina (Multiplexing)
  - RNA Capture – SOLiD (Multiplexing)
SureSelect™ RNA Target Enrichment
• study entire biochemical pathways or gene families in one experiment
• get deeper coverage per transcript with less cumbersome data analysis
• discover splice junctions, fusion transcripts, SNPs and allelic-specific expression
• analyze more samples per run with cost-effective multiplexing and automation
• quantify gene expression with sensitivity similar to qPCR
[Figure: % On-Target. >90% on-target reads; deeper coverage with enriched library]
SureSelect™ RNA Target Enrichment
[Figure: Gene Expression Levels, two scatter plots. Enriched library vs. unenriched library, R2 = 0.885. Enriched library 1 vs. enriched library 2, R2 = 0.998]
• high correlation to unenriched RNA library provides reliable gene expression results
• high reproducibility provides reliable gene expression results
• even index distribution is critical for multiplexing cost-saving
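An R2 value like those quoted on this slide can be computed as the squared Pearson correlation between per-gene expression values from two libraries. The slide does not state how its R2 was calculated or whether values were log-transformed, so the sketch below is one plausible approach with hypothetical inputs.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-gene expression values for two libraries.
library_1 = [1.0, 2.0, 3.0, 4.0]
library_2 = [1.1, 1.9, 3.2, 3.9]
print(f"R2 = {r_squared(library_1, library_2):.3f}")
```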
SureSelect Kinome Kit and Kinome RNA Kit: biomarker discovery/profiling in disease and/or drug response
• Kinome Kit (3.2 Mb) targets exons and UTRs
• Kinome RNA kit (3.4 Mb) targets transcripts of the same genes
• 518 putative kinases
• 12 PI3K domain-containing genes
• 9 inositol polyphosphate kinases
• 28 genes frequently mutated in human cancer
• 19 genes known to be mutated in breast cancer
• total of 612 genes
• stratification of patient populations in clinical trials to better predict drug efficacy
• save time and money in biomarker discovery related to disease state and/or drug response
Science 298, 1912 (2002)
Kinome DNA Kit Performance: 3-5 samples per GAIIx lane / SOLiD quad
[Figure: three panels for Kinome Index 1 to Kinome Index 5. Even Index Representation Across Single Lane (fold labels legible in the transcript: 1.15, 0.84, 1.00, 0.98). Reproducible Performance Across Indexes (% on target +/- 200 bp, % Bases 1X Coverage, % Bases 10X Coverage, % Bases 20X Coverage). Uniform Read Depth Distribution]
Kinome RNA Kit Performance
• HeLa cells and spleen tissue
Cost-effective Options using SureSelectXT System
• Introduction to SureSelectXT Target Enrichment System
• SureSelectXT All Exon Kits and Multiplexing
• SureSelectXT Targeted Kits and Multiplexing
• Custom capture kits
• Human Kinome kits (DNA capture & RNA capture)
• Bravo Automation for Library Preparation and Capture
Bravo Automation Configuration
[Figure: instrument configuration. Labels: BRAVO, 2R, 4R, Mini-Hub, Plate-loc, V-spin]
Bravo Automation for SureSelectXT System
[Figure: workflow diagram with steps marked "Bravo Automation"]
SureSelectXT Library Prep
• Shear Genomic DNA
• Repair Ends
• 3'-dA Addition
• Adapter Ligation
• PCR
• Prepped Library
SureSelectXT Target Enrichment
• SureSelect Oligo Capture Library
• Library Hybridization
• Automated Bead Capture
• PCR Enrichment
• QA
• Illumina & SOLiD Sequencer
Automated Library Prep Yields are More Consistent and Generate Consistent Sequencing Performance
[Figure: Automated Prepped Library Yields, nanograms DNA by column position 1 to 12 (Row H). %CV = 11.12%]
[Figure: Manual Prepped Library Yields, nanograms DNA for samples Manual 1 to Manual 6. Manual %CV = 18.93%]
[Figure: Performance Results of a 1.0 Mb Capture for samples H1 to H12 and Manual. Metrics: % reads in regions +/- 200 bp, % Regions Covered at 1x, % Bases Covered at 10x, % Bases Covered at 20x, Uniformity (1/2 mean with upper tail)]
Total Number of Mapped Bases (Mb)
Sample | H1 | H2 | H3 | H4 | H5 | H6 | H7 | H8 | H9 | H10 | H11 | H12 | Manual
Mapped bases (Mb) | 202 | 164 | 195 | 139 | 129 | 208 | 143 | 229 | 158 | 236 | 175 | 146 | 124
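The %CV figures quoted for automated and manual yields are coefficients of variation: standard deviation divided by mean, times 100. A minimal sketch follows; the yield values are hypothetical, and using the sample standard deviation (rather than the population form) is an assumption, since the slide does not say which was used.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical library yields in nanograms, not the slide's data.
yields_ng = [900.0, 1000.0, 1100.0]
print(f"%CV = {percent_cv(yields_ng):.2f}%")
```

A lower %CV means yields are more consistent from well to well, which is the comparison the slide draws between automated and manual prep.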
Summary
More Resources and Support
Resources
• Agilent Free Design Day: Hopkins, NIH, Houston, Stanford, Philly, Boston, etc.
• live and recorded eSeminar series
• genomics.agilent.com
Support
• local Agilent sales and FAS
• sureselect.support@agilent.com
• 1-800-227-9770 options 3x4x4
Questions?