Massive parallel sequencing applications in the
pharma environment
Peter Verhasselt
Translational Genomics & Genetics,
Johnson & Johnson Pharmaceutical Research & Development, Belgium
June 16th 2010
Illumina user meeting, Brussels
Global R&D Pharma structure @ Johnson & Johnson
Neuroscience
Cardiovascular & Metabolism
Infectious Diseases
Immunology
Oncology
Research Capabilities Organisation (RCO)
Biotechnology Center of Excellence
External Innovation
Global Development Organisation
Global Medical Organisation
Cross-Pharma Global Regulatory Affairs
Virco-Tibotec @ Mechelen
Janssen Pharmaceutica @ Beerse
History of Next-Generation sequencing at J&J (1)
• 454 Life Sciences
• First commercial customer (October 2003)
• Identification of mechanism of resistance against antimicrobial compound
• Identification of novel drug target
Andries K. et al., Science 307, 223-227. Published online 9 December 2004
History of Next-Generation sequencing at J&J (2)
• March 2008
– Installation of Roche GS-FLX in Virco-Tibotec (Mechelen)
– Study minority variants in viral populations in patient plasma
– Throughout 2009: 25 runs for 24 projects / 416 samples
• Summer 2008
– Installation of SOLiD in Pharmacogenomics group (US)
• December 2009
– Installation of Illumina Genome Analyzer IIx in Beerse
– 3 Runs executed so far
Presentation overview
• Who we are and where we fit in the organisation
• History of next-gen sequencing in Johnson & Johnson
• Experiences with the Illumina platform
– miRNA sequencing vs. µ-array analysis
– shRNA library sequencing
– Whole Exome enrichment on SureSelect platform
– Targeted resequencing of subset of oncogenes after enrichment by hybridization
Run 1 : Whole Exome Sure Select enrichment
• Paired-end sequencing sample preparation kit
• Paired-end cluster generation on cBot
• 2 × 76 bases paired-end read
Lane  Sample          Yield (Gb)  % PF clusters
1     2 pM PhiX       1.3         89
2     2 pM NCI-H522   1.1         91
3     2 pM A549       0.9         91
4     8 pM PhiX       1.8         80
5     4 pM NCI-H522   1.6         84
6     4 pM A549       1.5         88
7     8 pM NCI-H522   1.7         70
8     8 pM A549       1.7         73
Total                 11.6
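As a quick arithmetic check on the Run 1 lane table, the per-lane yields can be summed against the reported run total and grouped by loading concentration. This is only an illustrative sketch: the variable names are ours, and the values are copied from the table as they appear on the slide.

```python
# Run 1 lanes, copied from the slide table:
# (lane, loading concentration, sample, yield in Gb, % clusters passing filter)
run1 = [
    (1, "2 pM", "PhiX", 1.3, 89),
    (2, "2 pM", "NCI-H522", 1.1, 91),
    (3, "2 pM", "A549", 0.9, 91),
    (4, "8 pM", "PhiX", 1.8, 80),
    (5, "4 pM", "NCI-H522", 1.6, 84),
    (6, "4 pM", "A549", 1.5, 88),
    (7, "8 pM", "NCI-H522", 1.7, 70),
    (8, "8 pM", "A549", 1.7, 73),
]

# Run total and mean pass-filter percentage over all eight lanes.
total_gb = round(sum(gb for _, _, _, gb, _ in run1), 1)
mean_pf = sum(pf for *_, pf in run1) / len(run1)

# Group lanes by loading concentration.
by_conc = {}
for _, conc, _, gb, pf in run1:
    by_conc.setdefault(conc, []).append((gb, pf))

# Mean yield and mean % PF per loading concentration.
summary = {
    conc: (
        sum(gb for gb, _ in lanes) / len(lanes),
        sum(pf for _, pf in lanes) / len(lanes),
    )
    for conc, lanes in by_conc.items()
}

print(f"Total yield: {total_gb} Gb, mean % PF: {mean_pf:.2f}")
for conc in sorted(summary):
    mean_gb, mean_conc_pf = summary[conc]
    print(f"{conc}: mean yield {mean_gb:.2f} Gb, mean % PF {mean_conc_pf:.1f}")
```

Within these eight lanes, the higher loading concentrations go together with a higher yield per lane and a lower percentage of clusters passing filter.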
Coverage plots
[Figure: coverage plots for NCI-H522 and A549; 100x coverage marked]
Gene density across chromosomes
[Figure: gene density per chromosome (1-22, X, Y)]
Coverage plots versus gene density
[Figure: coverage for NCI-H522 and A549 against gene density per chromosome (1-22, X, Y)]
Coverage across genes
[Figure: coverage across BRCA1 and BRCA2]
Run 2 : miRNA + sh-library PCR pool
• Small RNA sequencing sample preparation
• Single-end cluster generation on cBot
• 1 × 36 bases single-end read
– Tubestrip hybridization protocol
Lane  Sample (6 pM)    Primer           Yield (Gb)  % PF clusters
1     miRNA 1          smRNA/GEX-DpnII  0.8         76
2     miRNA 2          smRNA/GEX-DpnII  0.7         80
3     miRNA 3          smRNA/GEX-DpnII  0.8         77
4     PhiX             genomic          0.7         79
5     miRNA 4          smRNA/GEX-DpnII  0.8         80
6     miRNA 5          smRNA/GEX-DpnII  0.7         82
7     miRNA 6          smRNA/GEX-DpnII  0.9         77
8     Sh-library PCR   custom           0.2         37
Total                                   5.6
Short-hairpin RNA library PCR pool sequencing
• shRNA as a tool for RNA interference
• Library pool of 250 shRNA-containing plasmids
• Equimolar mixing of plasmid DNA
• Propagation in bacteria
• Transfection into viruses
• Infection of tumor cell-lines
• Injection in mice to generate xenografts
• Impact of drugs on cell-lines or xenografts
• Barcoding
[Figure: shRNA amplicon diagram; labels: P5, custom sequencing primer, shRNA, 3 nt barcode (bc), P7]
Barcodes: AGG (WT), AAG, CCG, GGG, TTG, ACG, CGG, GTG, TAG
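One property of the nine 3 nt barcodes that can be checked directly is how many positions separate any two of them. The sketch below computes all pairwise Hamming distances; the barcode list is taken from the slide, while the helper name `hamming` and the other variable names are ours.

```python
from itertools import combinations

# The nine 3 nt barcodes listed on the slide (AGG is the wild-type sequence).
barcodes = ["AGG", "AAG", "CCG", "GGG", "TTG", "ACG", "CGG", "GTG", "TAG"]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Distance for every unordered pair of barcodes.
distances = {(a, b): hamming(a, b) for a, b in combinations(barcodes, 2)}

min_distance = min(distances.values())
closest_pairs = [pair for pair, d in distances.items() if d == min_distance]

print(f"{len(distances)} pairs, minimum distance {min_distance}")
print("Pairs at minimum distance:", closest_pairs)
```

Because some pairs (for example AGG and AAG) differ at a single position, one miscalled base can turn one valid barcode into another, so a demultiplexing step for this design would probably have to require exact barcode matches rather than attempt error correction.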
Short hairpin representation across barcodes
Effect of barcode on amplification/sequencing efficiency?
Equimolar mixing of PCR pools?
Short hairpin representation within sample
Representation is independent of barcode
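The two questions raised for this analysis can be separated by normalising counts within each barcoded sample: the share of total reads per barcode reflects how evenly the PCR pools were mixed, while the within-sample fraction of each shRNA is unaffected by that mixing. The sketch below illustrates the idea; the read counts, the shRNA names and the function name `within_sample_fractions` are hypothetical and are not data from the study.

```python
# Hypothetical read counts per shRNA for three barcoded PCR pools.
# These numbers are invented for illustration only.
counts = {
    "AGG": {"sh1": 1000, "sh2": 500, "sh3": 250},
    "AAG": {"sh1": 2000, "sh2": 1000, "sh3": 500},
    "CCG": {"sh1": 400, "sh2": 200, "sh3": 100},
}

def within_sample_fractions(sample_counts):
    """Convert raw counts to fractions of the sample's total reads."""
    total = sum(sample_counts.values())
    return {name: n / total for name, n in sample_counts.items()}

# Share of all reads per barcode: reflects how evenly the pools were mixed.
grand_total = sum(sum(c.values()) for c in counts.values())
pool_share = {bc: sum(c.values()) / grand_total for bc, c in counts.items()}

# Within-sample representation: independent of how much of each pool was loaded.
fractions = {bc: within_sample_fractions(c) for bc, c in counts.items()}

# Largest difference in any shRNA's fraction relative to the wild-type barcode.
reference = fractions["AGG"]
max_diff = max(
    abs(fractions[bc][name] - reference[name])
    for bc in fractions
    for name in reference
)

print("Pool share per barcode:", {bc: round(s, 3) for bc, s in pool_share.items()})
print(f"Largest within-sample difference vs AGG: {max_diff:.4f}")
```

In this invented example the pools are clearly not equimolar, yet the within-sample fractions are identical, which is the pattern one would expect if representation does not depend on the barcode.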
miRNA sequencing on Illumina platform
• Small RNAs (22–30 nt)
• Each miRNA can regulate multiple genes
• miRNAs identified as biomarkers
• miRBase 14 contains 721 human entries
• µArray-based analysis platforms
– Exiqon, Illumina, Affymetrix, Asuragen, Illumina bead array
Comparison overview
Technologies across cell-lines
Run 3 : Custom + whole exome SureSelect
• (Multiplexed) paired-end sequencing sample preparation
• Paired-end cluster generation on cBot
• 2 × 76 bases paired-end read + 7 bases index read
– Read 2 primer from multiplexed paired-end kit added to read 2 primer from standard paired-end sequencing kit
Lane  Sample (6 pM)    Yield (Gb)  % PF clusters
1     Pool 1           0.04        16
2     Pool 2           0.04        20
3     Pool 3           1.7         85
4     Multiplex PhiX   1.0         54
5     Sample 3         1.7         87
6     Sample 3         1.7         87
7     Sample 4         2.3         82
8     Sample 4         1.3         76
Total                  9.8
POP study : Targeted re-sequencing of human genes
• Goal: simultaneously analyse a subset of the human genome for the presence of oncology-related mutations in cell-lines or tumour biopsies
• Technology requirements:
– Specificity
– Uniformity
– Flexibility
– Fit with downstream sequencing technology
– Cost
• Feasibility study to select best combination
Target enrichment: Hybridization in solution
• 47 genes
• 558 targets
• 89,698 bp
Performance of SureSelect + Illumina
• 8 samples on 1 GAIIx lane
• 5,118,034 reads per sample, 1390x coverage
[Figure values without recoverable labels: 69.7, 99.9, 49.8, 99.5, 99.3, 99.0, 55.5]
PCR versus hybridization based enrichment
                     PCR + 454 (24)     Hyb + Illumina (8)
# reads per sample   92,868             5,118,034
Length               300 b              76 b
Coverage             92.5x              1390x
Completeness         98.7 %             99.9 %
Specificity          91.5 % - 73.3 %    69.7 % - 49.8 %
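To put the reported coverage figures in context, a naive upper bound on depth can be computed as reads × read length / target size. The sketch below does this for both approaches. It rests on assumptions that the slides do not confirm: that the 89,698 bp target size applies to both methods, that the listed lengths are the effective read lengths, and that "# reads per sample" counts single reads. The function and variable names are ours.

```python
# Target size from the enrichment design (assumed here to apply to both methods).
TARGET_BP = 89_698

# Values copied from the comparison table.
platforms = {
    "PCR + 454": {"reads": 92_868, "read_len": 300, "reported_cov": 92.5},
    "Hyb + Illumina": {"reads": 5_118_034, "read_len": 76, "reported_cov": 1390.0},
}

def naive_depth(reads: int, read_len: int, target_bp: int) -> float:
    """Depth ceiling if every sequenced base fell on the target region."""
    return reads * read_len / target_bp

results = {}
for name, p in platforms.items():
    ceiling = naive_depth(p["reads"], p["read_len"], TARGET_BP)
    results[name] = {
        "ceiling": ceiling,
        # Fraction of the ceiling that the reported coverage represents.
        "fraction_of_ceiling": p["reported_cov"] / ceiling,
    }

for name, r in results.items():
    print(f"{name}: ceiling {r['ceiling']:.0f}x, "
          f"reported/ceiling {r['fraction_of_ceiling']:.2f}")

# Ratio of reads per sample between the two approaches.
read_ratio = platforms["Hyb + Illumina"]["reads"] / platforms["PCR + 454"]["reads"]
print(f"Reads per sample, Illumina vs 454: {read_ratio:.1f}-fold")
```

Under these assumptions the reported coverage sits well below the ceiling for both approaches, as would be expected when some reads fall off target or are filtered out. The calculation is a sanity check only and does not reproduce how the coverage values on the slide were derived.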
Acknowledgements
• Translational Genomics & Genetics
– Jeroen Aerssens
– Ina Vandenbroucke
– Carl Van Hove
– Amy Axel
• Bioinformatics
– Herwig Van Marck
• Informatics
– Steven Osselaer
• Functional Genomics
– Kurt Van Baelen
• Illumina
– John Baeten
– Herman Blok
• Agilent
– Winfried Van Eyndhoven
– Emily Le Proust