
Page 1

Whole genome sequencing in drug discovery research: a one fits all solution?

Marc Sultan, September 24th, 2015

Biomarker Development, Translational Medicine, Novartis

On behalf of the BMD WGS pilot team:

Robert Bruccoleri, Stine Buechmann-Moller, Nicole Cheung, Anita Fernandez, Nicole Hartmann, Yunsheng He, Xiaoyu Jiang, Li Lei, Bolan Linghu, Thomas Morgan, Nirmala Nanguneri, Thomas Schlitt, Kevin Sloan, Jill Somers, Marc Sultan, Frank Staedtler, Joseph Szustakowski, Marie Waldvogel, Daniela Wieser, Fan Yang, Xiaojun Zhao

Page 2

Outline

| MipTec | Marc Sultan | September 24th 2015 | WGS Pilot | Business Use Only 2

• Introduction
• WGS pilot study results
• Summary & challenges

Page 3

Introduction

| MipTec | Marc Sultan | September, 10th 2015 | WGS Pilot | Business Use Only

Page 4

Why Whole Genome Sequencing?

Illustration by Pete Ellis/www.drawgood.com

Page 5

WGS Principle

Page 6

Finding The Differences

Page 7

• On the market since January 2014
• Designed and marketed for population-scale sequencing projects
• HiSeq X Ten system: 160 genomes in 3 days

Provided by Illumina
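To put "160 genomes in 3 days" into population-scale terms, the sketch below extrapolates it to a yearly figure. It is only an upper bound: it assumes, hypothetically, back-to-back runs with no downtime, maintenance or failed runs, so real-world capacity would be lower.

```python
# Upper-bound yearly throughput of a HiSeq X Ten system, extrapolated from the
# slide's figure of 160 genomes per 3-day run. Assumes (hypothetically)
# continuous back-to-back runs with no downtime.
GENOMES_PER_RUN = 160
RUN_DAYS = 3


def genomes_per_year(days_per_year: float = 365.0) -> float:
    """Return the number of genomes sequenced per year at full utilization."""
    runs_per_year = days_per_year / RUN_DAYS
    return runs_per_year * GENOMES_PER_RUN


print(f"{genomes_per_year():,.0f} genomes per year (upper bound)")
```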

Page 8

Patterned Flowcell

Nanowell substrate
• Billions of ordered wells
• Defined feature size
• Optimal cluster spacing

Exclusion amplification
• Delivers a single template per well
• Simultaneous seeding and amplification

Page 9

WGS, WES, OmniExome chip

Approximate cost per sample:
• WES: $1000
• WGS: $1500-2500
• Omni chip: $500

[Figure: schematic comparing the genomic regions interrogated by WES (exons only), WGS and the Omni chip]

Page 10

WGS pilot goals: assessing the utility of WGS

Can WGS replace other profiling platforms?

Can small and large structural variants be accurately called?

Can variants be accurately called in HLA and ADME genes?

Page 11

WGS Pilot

Study | Questions to be answered | N
NIST "Genome in a Bottle" (GiB) | Basic proficiency; comparison to highly curated variant calls | 3
Clinical study | Comparison to candidate genotyping, to SNP chips (Omni Bead Chip) and to exome sequencing; can we detect a large structural variant in a particular gene? | 77
Academic collaboration | Confirmation of a particular deletion | 2
Clinical study | Comparison to ADME chip data (DMET Plus chip) | 13
ADME reference | Evaluation of challenging ADME genes | 8
HLA reference panel | Evaluation of the quality of HLA calls from X10 data | 10

113 DNA samples from 6 sub-projects

30x genome-wide coverage with ~1 billion short DNA reads per sample

Page 12

WGS Pilot Workflow

Sample logistics → WGS → Variant calling (@ Broad Institute, 12-16 weeks)

Analysis @ Novartis:
• Candidate genes/pathways analysis (2-3 weeks)
• Rare disease pedigree analysis (1-2 months)
• Exploratory genome analysis (> 2 months)

Page 13

Statistics

Key statistics for raw data:

Metric | Value
# samples | 113
Raw data size from Broad | 23 TB
Paired-end read length | 151 bases
Avg. total reads per sample | 1.06 billion
% of reads retained after filtering | 60%
Avg. coverage per sample | 32x
Coverage on Agilent exome targeted region | 38x (99.5% of target > 10x; 82% > 30x)
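The coverage figures in the statistics table can be cross-checked with the usual rule of thumb that mean depth is the number of usable bases divided by the genome size. A minimal sketch, assuming a ~3.0 Gb human genome and assuming that "total reads" counts each mate of a pair as one read; neither assumption is stated on the slide.

```python
# Mean depth ~= (reads retained after filtering x read length) / genome size.
GENOME_SIZE_BP = 3.0e9  # approximate human genome size (our assumption)


def mean_coverage(total_reads: float, read_length_bp: int, fraction_retained: float,
                  genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Average sequencing depth contributed by the reads kept after filtering."""
    usable_bases = total_reads * fraction_retained * read_length_bp
    return usable_bases / genome_size_bp


# Values from the table; each mate of a pair is assumed to count as one read.
cov = mean_coverage(total_reads=1.06e9, read_length_bp=151, fraction_retained=0.60)
print(f"estimated mean coverage: {cov:.1f}x")
```

Under these assumptions the estimate lands at about 32x, in line with the table; using 3.1 Gb as the genome size instead gives about 31x.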

Page 14

WGS Pilot Results

Page 15

NIST “Genome in a Bottle”

| IDD | Marc Sultan | June 10th 2015 | WGS Pilot | Business Use Only 15

• 99.9% concordance with GiB in high-confidence regions
• WGS data quality from the Broad Institute is higher than from Macrogen
• 0.1% error rate in variant calls, as estimated from Mendelian inheritance errors
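The slide does not say exactly how the Mendelian error rate was computed. One common approach, sketched here under the assumption that the three GiB samples include a parent-offspring trio, is to count the sites where the child's genotype cannot be formed from one maternal and one paternal allele. Genotypes that are wrong but still Mendelian-consistent go undetected, so this captures only part of the true error rate. The site data below are hypothetical.

```python
from itertools import product

Genotype = tuple[str, str]  # two alleles at one site, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def mendelian_error_rate(trio_sites: list[tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of sites whose (child, mother, father) genotypes violate Mendelian inheritance."""
    if not trio_sites:
        raise ValueError("no sites to evaluate")
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in trio_sites)
    return errors / len(trio_sites)


# Hypothetical calls at three sites: (child, mother, father)
sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # error: mother carries no G allele
    (("A", "A"), ("A", "G"), ("A", "T")),  # consistent
]
print(mendelian_error_rate(sites))  # 1 of the 3 sites is a Mendelian error
```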

Page 16

Comparison of WGS with OmniExome5 Chip

• 99.7% concordance for common genotypes
• ~97% of variants called by the Omni chip are also found in WGS

[Figure: Venn diagram of variant calls. WGS only: ~12,000,000; both platforms: ~3,000,000; Omni chip only: ~96,000]

Page 17

Comparison of WGS with Exome-seq (WES)

• 99.6% concordance for common genotypes
• ~94% of variants found in WES are also found in WGS

[Figure: Venn diagram of variant calls. WGS only: ~13,000,000; both platforms: ~2,000,000; WES only: ~140,000]
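Both comparisons above rest on two numbers: genotype concordance at sites called by both platforms, and the fraction of the other platform's variants that WGS also recovers. A minimal sketch of both, assuming each call set is a mapping from (chromosome, position) to a genotype string with alleles in a fixed order; the toy call sets are hypothetical. The last two lines reproduce the recovery fractions from the approximate Venn counts on the slides.

```python
Site = tuple[str, int]  # (chromosome, position)


def compare_call_sets(wgs: dict[Site, str], other: dict[Site, str]) -> tuple[float, float]:
    """Compare WGS genotypes with another platform's genotypes.

    Returns (concordance at sites called by both, fraction of the other
    platform's sites also called by WGS). Genotypes are strings with alleles
    in a fixed order, e.g. "A/G".
    """
    shared = wgs.keys() & other.keys()
    if not shared:
        raise ValueError("the call sets share no sites")
    concordant = sum(wgs[site] == other[site] for site in shared)
    return concordant / len(shared), len(shared) / len(other)


# Hypothetical toy call sets
wgs_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "G/T", ("chr3", 10): "A/A"}
chip_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr2", 50): "G/T", ("chr5", 5): "T/T"}
print(compare_call_sets(wgs_calls, chip_calls))  # 2 of 3 shared sites agree; 3 of 4 chip sites recovered

# Recovery fraction from the rounded Venn counts on the slides
print(3_000_000 / (3_000_000 + 96_000))   # WGS vs Omni chip, about 0.97
print(2_000_000 / (2_000_000 + 140_000))  # WGS vs WES, about 0.935
```

The WES figure comes out slightly below the slide's ~94%, which is expected given that the Venn counts are rounded.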

Page 18

WGS vs. DMET Chip

Drug Metabolizing Enzymes and Transporters

• 99.5% concordance for ~12,000 (959 × 13) common genotypes
• In some cases WGS was more accurate or specific (owing to DMET probe design)
• Main advantage of the DMET platform: software for star-allele and phenotype prediction (a highly complex task)
• Main advantage of WGS: additional sites are easily accessible

Page 19

Example Mismatch between DMET and WGS

Probe AM_10799 (CYP1A2) was designed to detect the *4 allele: A/T (I386F)

• DMET genotype: A/A
• WGS genotype: A/G (I386V)
• No star allele definition exists for the A/G genotype
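The mismatch can be illustrated with a toy model in which a probe reports only the alleles it was designed for (here A and T), so the G allele seen by WGS is invisible to it. This is an assumption for illustration, not a description of the actual DMET assay chemistry, and the single-site interpretation table is hypothetical: real star-allele calling combines many sites.

```python
from typing import Optional

# Toy model: a probe reports only the alleles it was designed for.
PROBE_ALLELES = {"AM_10799": {"A", "T"}}  # designed alleles (from the slide)

# Hypothetical single-site interpretation table (illustrative only)
STAR_ALLELE_TABLE = {
    ("AM_10799", "A/A"): "no *4 allele",
    ("AM_10799", "A/T"): "*4 carrier",
    ("AM_10799", "T/T"): "*4 homozygous",
}


def toy_probe_call(probe: str, true_alleles: tuple[str, str]) -> Optional[str]:
    """Genotype as 'seen' by a probe that ignores alleles it was not designed for."""
    seen = sorted(a for a in true_alleles if a in PROBE_ALLELES[probe])
    if not seen:
        return None          # neither allele is detectable: no call
    if len(seen) == 1:
        seen = seen * 2      # assumed: one detectable allele reads as homozygous
    return "/".join(seen)


def interpret(probe: str, genotype: str) -> str:
    """Look up a genotype in the hypothetical interpretation table."""
    return STAR_ALLELE_TABLE.get((probe, genotype), "no star allele definition")


true_genotype = ("A", "G")                      # what WGS reports (I386V carrier)
chip = toy_probe_call("AM_10799", true_genotype)
print(chip, "->", interpret("AM_10799", chip))  # A/A -> no *4 allele
print("A/G ->", interpret("AM_10799", "A/G"))   # A/G -> no star allele definition
```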

Page 20

HLA alleles called with WGS data

• WGS can be used to genotype classical HLA genes with > 95% accuracy compared with results generated by conventional methods.
• Typing accuracy can depend on the software used.
• Caveat: the sample size in this evaluation is small, given the large number of polymorphisms in HLA genes.

Each cell: fraction of correct allele calls (accuracy, %)

HLA gene | OptiType | Omixon | Athlates
HLA-A | 20/20 (100%) | 19/20 (95%) | 20/20 (100%)
HLA-B | 20/20 (100%) | 20/20 (100%) | 19/20 (95%)
HLA-C | 20/20 (100%) | 18/20 (90%) | 19/20 (95%)
HLA-DQA1 | NA | 20/20 (100%) | NA
HLA-DQB1 | NA | 20/20 (100%) | 19/20 (95%)
HLA-DPB1 | NA | 20/20 (100%) | NA
HLA-DRB1 | NA | 20/20 (100%) | 19/20 (90%)

OptiType: open source; Omixon: license currently held; Athlates: evaluation license
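The "> 95%" headline can be checked by pooling the allele counts per tool across the genes it typed. A minimal sketch, assuming every gene was typed for 20 alleles (10 samples × 2), which matches the x/20 fractions and the N = 10 HLA reference panel; genes marked NA are left out, and the counts (not the percentage column) are used.

```python
# Correct allele calls per HLA gene (out of 20 alleles each), taken from the
# table; genes a tool did not type ("NA") are omitted.
RESULTS = {
    "OptiType": {"HLA-A": 20, "HLA-B": 20, "HLA-C": 20},
    "Omixon": {"HLA-A": 19, "HLA-B": 20, "HLA-C": 18, "HLA-DQA1": 20,
               "HLA-DQB1": 20, "HLA-DPB1": 20, "HLA-DRB1": 20},
    "Athlates": {"HLA-A": 20, "HLA-B": 19, "HLA-C": 19, "HLA-DQB1": 19,
                 "HLA-DRB1": 19},
}
ALLELES_PER_GENE = 20  # assumed: 10 samples x 2 alleles


def pooled_accuracy(correct_by_gene: dict[str, int]) -> float:
    """Correct allele calls divided by all alleles typed, across genes."""
    total = ALLELES_PER_GENE * len(correct_by_gene)
    return sum(correct_by_gene.values()) / total


for tool, calls in RESULTS.items():
    print(f"{tool}: {pooled_accuracy(calls):.1%}")
```

Pooled this way, each tool stays at or above 96%, consistent with the "> 95%" summary.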

Page 21

Confirmation of known large structural variants

• ARMS2 3’UTR deletion in clinical study
• Deletion of the CYP2D6 gene in 3 ADME reference samples
• 1364-bp deletion in KRT77 gene in collaboration study
• 240-bp tandem duplication in KIAA1109 gene in collaboration study

Page 22

Duplication and deletion of CYP2D6

• Samples 1-3 have 2 copies (depth: 86x, 56x, 47x), samples 4 and 5 have > 2 copies (depth: 67x, 82x), and samples 6-8 have one copy (depth: 32x, 29x, 28x)

[Figure: CYP2D6 copy-number status per sample (2X, >2X, 2X, 2X, 1X, >2X, 1X, 1X), with the CYP2D6 deletion marked]
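The raw depths on this slide cannot be compared across samples directly (sample 1 at 86x carries 2 copies, while sample 4 at 67x carries more than 2), so copy number has to be judged relative to each sample's own baseline. A minimal sketch, assuming a sample's genome-wide mean depth corresponds to two copies; the depths below are hypothetical. Real CYP2D6 calling is harder than this, partly because reads from the highly similar CYP2D7 pseudogene can mis-map.

```python
def estimate_copy_number(gene_depth: float, genome_depth: float, ploidy: int = 2) -> int:
    """Estimate an integer copy number for a gene from read depth.

    Assumes the sample's genome-wide mean depth corresponds to `ploidy`
    copies, so copy number scales linearly with depth over the gene.
    """
    if genome_depth <= 0:
        raise ValueError("genome_depth must be positive")
    # Note: Python's round() rounds exact .5 ties to the nearest even integer.
    return round(ploidy * gene_depth / genome_depth)


# Hypothetical depths for a sample sequenced at ~32x genome-wide
for gene_depth in (16.0, 31.0, 49.0):
    print(gene_depth, "->", estimate_copy_number(gene_depth, genome_depth=32.0))
```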

Page 23

Summary & Challenges

Page 24

Summary of results

Can WGS replace other profiling platforms?
• WES, Omni, targeted genotyping: yes
• DMET chips: WGS needs better software support

Can small and large structural variants be accurately called?
• Small variants: yes
• Large variants: challenging but promising

Can variants be accurately called in HLA and ADME genes?
• HLA: yes
• ADME: yes for small variants, challenging for large structural variants

Page 25

Summary

WGS is technically successful
• High cross-platform concordance of WGS data (> 99%)
• WGS discovers more variants than other technologies (coding/non-coding variants, structural variants)
• Analysis algorithms are rapidly improving

WGS is valuable for generating and testing new hypotheses in clinical studies
• Standardized experimental procedure that enables retrospective analyses (dictionary approach)
• The key to interpreting “big WGS” data is filtering it and integrating diverse data sources
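The filtering-and-integration step mentioned in the last bullet can be sketched as combining each variant's annotation with external sources, here a population allele frequency and a candidate-gene list. The record fields, gene names, consequence categories and the 1% frequency cutoff are all hypothetical choices for illustration, not the pilot's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal annotated variant record."""
    gene: str
    consequence: str        # e.g. "missense", "synonymous", "intergenic"
    population_af: float    # allele frequency in a reference population


DAMAGING = frozenset({"missense", "stop_gained", "frameshift"})  # illustrative choice


def filter_variants(variants: list[Variant], candidate_genes: set[str],
                    max_af: float = 0.01,
                    consequences: frozenset[str] = DAMAGING) -> list[Variant]:
    """Keep rare, potentially damaging variants in candidate genes."""
    return [v for v in variants
            if v.gene in candidate_genes
            and v.population_af <= max_af
            and v.consequence in consequences]


variants = [
    Variant("GENE1", "missense", 0.001),
    Variant("GENE1", "synonymous", 0.002),   # not a damaging consequence
    Variant("GENE2", "stop_gained", 0.0005),
    Variant("GENE3", "missense", 0.0001),    # not a candidate gene
    Variant("GENE1", "missense", 0.30),      # too common
]
kept = filter_variants(variants, candidate_genes={"GENE1", "GENE2"})
print([(v.gene, v.consequence) for v in kept])
```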

Page 26

WGS opportunities in clinical studies

• Familial genetic studies
• High-priority or competitive programs requiring quick interrogation of genetic data in response to new discoveries
• High-priority studies with a priori genetic hypotheses: candidate genes and pathways
• Strategic disease indications where heritability is moderate to high (asthma, COPD), as part of a Pan-Omic strategy

Page 27

Interpretation of results: too much data? Interpreting numerous mutations in small samples is challenging

• Pinpointing a small subset of disease-causal variants is a non-trivial task.
• The large number of variants per individual makes association tests infeasible for «typical» sample sizes.
• Limited scope for «hypothesis-free» approaches.
• Huge data: an efficient strategy is required to store, organize, and query the data.

[Figure: > 4 million variants per patient → ? → a small subset of candidate disease-causal mutations]
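Why so many variants undermine association tests can be made concrete with a Bonferroni correction. This is a simplified sketch, assuming one independent test per variant. In reality variants are correlated through linkage disequilibrium, so the effective number of tests is smaller, but the order of magnitude of the problem is the same.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff keeping the family-wise error rate at alpha."""
    return alpha / n_tests


N_VARIANTS = 4_000_000  # "> 4 million variants per patient", treated as one test each
ALPHA = 0.05

print(f"per-variant threshold: {bonferroni_threshold(ALPHA, N_VARIANTS):.2e}")
# Without correction, about alpha * N variants would pass p < 0.05 by chance alone
print(f"expected chance hits at p < {ALPHA}: {ALPHA * N_VARIANTS:,.0f}")
```

Reaching a per-variant threshold around 1e-8 typically requires far larger cohorts than a single clinical study provides, which is why the slides favour hypothesis-driven filtering.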

Page 28

Challenges

• Data volume: large amounts of data are generated (WGS pilot: 23 TB of raw data, plus data generated during analysis)
  • Long-term storage costs
  • File transfer times are considerable
• Analysis is not yet standardized; best practices are changing rapidly
• Data generation and analysis take longer
• Ethics/legal concerns: incidental findings, consent, cloud-based storage?
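To give a sense of scale for the data-volume bullets, the sketch below estimates raw data per sample and the time to move the 23 TB pilot dataset over a network link. The link speeds and the 50% efficiency figure are hypothetical, and 1 TB is taken as 10^12 bytes.

```python
def transfer_hours(size_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes (1 TB = 1e12 bytes) over a network link.

    `efficiency` is the fraction of the nominal link speed actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


RAW_TB = 23      # WGS pilot raw data (from the slide)
SAMPLES = 113

print(f"raw data per sample: {RAW_TB * 1000 / SAMPLES:.0f} GB")
print(f"23 TB over an ideal 1 Gbit/s link: {transfer_hours(RAW_TB, 1.0):.0f} h")
print(f"23 TB at 50% of 1 Gbit/s: {transfer_hours(RAW_TB, 1.0, efficiency=0.5):.0f} h")
```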

Page 29

WGS pilot team

Robert Bruccoleri

Stine Buechmann-Moller

Nicole Cheung

Anita Fernandez

Nicole Hartmann

Yunsheng He

Xiaoyu Jiang

Li Lei

Bolan Linghu

Thomas Morgan

Nirmala Nanguneri

Thomas Schlitt

Kevin Sloan

Jill Somers

Marc Sultan

Frank Staedtler

Joseph Szustakowski

Marie Waldvogel

Daniela Wieser

Fan Yang

Xiaojun Zhao