sequence analysis in the regulated domain - a pistoia alliance debates webinar - 2019-09-21
TRANSCRIPT
September 20, 2016
Sequence Analysis in the Regulated Domain
A Pistoia Alliance Debates Webinar
Chaired by Etzard Stolte - Roche
This webinar is being recorded
© Pistoia Alliance
Etzard Stolte - Global Head Knowledge Management - Roche
Etzard leads the global Knowledge Management effort in Pharma Technical Development. He has worked at the interface of the life and computer sciences for more than 20 years. Before joining Roche, Etzard was CIO at the Jackson Lab and CTO for the Life Sciences at HP. Etzard holds academic degrees in both Biology and Informatics, with a PhD in Computer Science from ETH Zurich. Most of his relatives are medical doctors, but that did not stop Etzard from doing something useful with his life :).
George Asimenos – Vice President - DNAnexus
George Asimenos has an undergraduate degree in Electrical and Computer Engineering from the National Technical University of Athens and a PhD in Computer Science from Stanford University. He is currently VP at DNAnexus and has led the precisionFDA development effort. Outside of computational biology, George is a cryptography and security enthusiast and has received the DEFCON Black Badge award.
Les Mara – Co-Founder - Databiology
Les has 35 years of international business leadership experience in the global technology and IT services industry, having held several executive leadership positions at both Capgemini and HP. With an equally broad knowledge of the life sciences sector gained over that time, Les and his co-founder Georges Heiter set up Databiology in 2013 as a specialist global software company addressing the biomedical big data needs of the global life sciences enterprise market.
Dr. Martijn Brugman – Process Research, Cell and Gene Therapy, GSK
Martijn has a long-standing interest in, and 15 years of academic research experience with, immunohematology and gene therapy. As part of his postdoctoral gene therapy research, he was involved in setting up deep sequencing assays to determine vector integration profiles and in analysing vector integrations in patients treated with gene therapy, which is also his current focus in the Cell and Gene Therapy unit at GSK.
Agenda
• The challenges – a 10,000 foot view - Etzard
• Pharma use case – Martijn
• Industry perspectives - George & Les
• Q&A - All
The challenge
A 10,000 foot view
Etzard Stolte
Sequence Analysis in the Regulated Domain
• Sequence analysis is maturing, but issues remain
  – Data quality -> from reference genomes to sharing
  – Evolving algorithms -> unstable code base
  – Massive compute requirements -> require tradeoffs
• There are old and new regulatory challenges
  – Old: e.g. sample management & preservation
  – New: e.g. privacy & code to ensure anonymity
• A unique sequence analysis challenge
  – The scale-out fiction -> underlying stochastic methods
  – Reproducibility -> research versus clinic
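The "scale-out fiction" point can be made concrete with a small sketch. Assume, hypothetically, a step that breaks ties at random, such as choosing among equally good placements of a multi-mapping read (some aligners do this). If each worker draws from its own seeded generator, a read's outcome depends on how the input was ordered and split across workers; deriving the randomness from the read itself removes that dependence. The read set and all function names below are illustrative only, not taken from any particular tool.

```python
import hashlib
import random

# Hypothetical input: every read maps equally well to two loci.
READS = {f"read{i}": [("chr1", 1000 + i), ("chr7", 5000 + i)] for i in range(10)}


def place_with_global_rng(read_ids, seed=42):
    """Break ties with one RNG shared by all reads a worker processes.

    A read's outcome depends on how many draws were made before it, i.e. on
    the order of the input and on how the work was split across workers."""
    rng = random.Random(seed)
    return {rid: rng.choice(READS[rid]) for rid in read_ids}


def place_with_per_read_seed(read_ids, seed=42):
    """Break ties with an RNG seeded from (run seed, read ID).

    A read's outcome depends only on the read itself, so it is the same
    however the input is ordered or sharded."""
    placements = {}
    for rid in read_ids:
        digest = hashlib.sha256(f"{seed}:{rid}".encode()).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        placements[rid] = rng.choice(READS[rid])
    return placements


def run_sharded(placer, n_shards):
    """Split reads round-robin into shards, process each independently, merge."""
    ids = sorted(READS)
    merged = {}
    for k in range(n_shards):
        merged.update(placer(ids[k::n_shards]))
    return merged


if __name__ == "__main__":
    for placer in (place_with_global_rng, place_with_per_read_seed):
        same = run_sharded(placer, 1) == run_sharded(placer, 4)
        print(f"{placer.__name__}: identical for 1 vs 4 shards? {same}")
```

Only the per-read variant is guaranteed to give identical output for any shard count; the per-worker variant generally does not, which is one way the gap between a research run and a reproducible clinical run can open up.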
Example FDA: Use of Standards in FDA Regulatory Oversight of Next-Generation Sequencing (NGS)-Based In Vitro Diagnostics (IVDs) Used for Diagnosing Germline Diseases – Recommendations, July 6, 2016
• Accuracy - Demonstrate accuracy by measuring positive percent agreement (“PPA”), negative percent agreement (“NPA”), technical positive predictive value (“TPPV”), and the rate of “no calls” or “invalid calls.” FDA recommends that PPA, NPA, and TPPV be set at no less than a point estimate of 99.9 percent, with a lower bound of the 95 percent confidence interval (“CI”) of 99 percent, for all variant types.
• Precision - Evaluate precision (reproducibility and repeatability) for variant and wild type calls, using a threshold of 95 percent for the lower bound of the 95 percent CI per variant type.
• Limit of Detection (“LoD”) - Establish and document the minimum and maximum amount of DNA that will enable the test to provide expected results in 95 percent of test runs with an acceptable level of invalid or “no calls.”
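As an illustration of how the accuracy and precision criteria could be checked, the sketch below computes PPA, NPA and TPPV from concordance counts against a reference and tests both a point estimate and the lower bound of a two-sided 95% confidence interval. The draft guidance quoted above does not prescribe an interval method; the Wilson score interval used here is an assumption, and other choices (e.g. Clopper-Pearson) give somewhat different bounds. The `Confusion` counts and all function names are hypothetical, and no-call rates are left out.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_lower_bound(successes: int, n: int, z: float = Z_95) -> float:
    """Lower limit of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, (centre - margin) / denom)


@dataclass
class Confusion:
    """Calls of the test compared with a reference, for one variant type."""
    tp: int  # reference variant, called as variant
    fn: int  # reference variant, missed (called wild type)
    tn: int  # reference wild type, called wild type
    fp: int  # reference wild type, called as variant


def accuracy_report(c: Confusion, point_min: float = 0.999, lower_min: float = 0.99):
    """Return {metric: (point_estimate, ci_lower_bound, passes)} for PPA, NPA, TPPV."""
    fractions = {
        "PPA": (c.tp, c.tp + c.fn),
        "NPA": (c.tn, c.tn + c.fp),
        "TPPV": (c.tp, c.tp + c.fp),
    }
    report = {}
    for name, (k, n) in fractions.items():
        point = k / n if n else 0.0
        lower = wilson_lower_bound(k, n)
        report[name] = (point, lower, point >= point_min and lower >= lower_min)
    return report


def precision_passes(concordant: int, comparisons: int, lower_min: float = 0.95) -> bool:
    """Reproducibility/repeatability: concordant replicate calls out of all
    comparisons, judged on the lower bound of the 95% CI."""
    return wilson_lower_bound(concordant, comparisons) >= lower_min
```

A practical consequence under this interval choice: with zero observed errors the lower bound is n / (n + z²), which only reaches 99% at about n = 381, so a small validation set cannot meet the accuracy criterion whatever its point estimate.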
Sequence Analysis in the Regulated Domain
• As sequence analysis is maturing & entering the clinic, real world compute issues and their regulatory implications become important
• Agencies are aware of these challenges -> some new guidance
• E.g. FDA (drafts)
– Use of Standards in FDA Regulatory Oversight of Next Generation Sequencing (NGS)-Based In Vitro Diagnostics (IVDs) Used for Diagnosing Germline Diseases
– Use of Public Human Genetic Variant Databases to Support Clinical Validity for Next Generation Sequencing (NGS)-Based In Vitro Diagnostics
• E.g. ICH - INTEGRATED ADDENDUM TO ICH E6(R1): GUIDELINE FOR GOOD CLINICAL PRACTICE
• We asked the US, European and Swiss regulatory agencies to join us for this webinar, but for different reasons they declined
• Remainder of the webinar after this introduction / motivation:
  – A practical showcase from GSK (Martijn Brugman)
  – Some typical compute challenges from Databiology (Les Mara)
  – A playful approach by the FDA, presented by DNAnexus (George Asimenos)
  – Then Q&A and discussion (All)
• If you would like to continue to be involved in this highly dynamic topic, please join our interest group on IP3
Poll Question 1: Within your organization, who understands the regulatory obligations in sequence analysis in the regulated domain?
A. Only I personally and/or some of my colleagues
B. I do personally, as well as my company
C. Neither I nor my company understand
D. I don’t know
Martijn Brugman, PhD
CMC, Cell and Gene Therapy
Gene Addition Gene Therapy Use Case
Ex vivo gene therapy
[Figure: integrated vector within the host genome]
The gene therapy vector adds a functional copy of a gene that is dysfunctional in the patient to the genome of a patient cell.
Vector integrations are clonal markers... the wet lab part.
[Figure: integrated vector in the host genome, with the therapeutic gene and the vector-genome junction labelled]
• A library of vector-genome junctions is generated and analysed using deep sequencing.
• From these sequences, vector integration locations and frequencies are determined.
• Samples are collected over time. Patient datasets should be updated when new data arrives.
Bioinformatics part
– Processing
  – Trim linker, primer and gene therapy vector sequence
  – Align the remaining sequence to the genome
  – Remedy alignment collisions and non-aligned sequences
– Annotate vector insertion sites
  – Keep track of the number of reads and the number of different fragments or linker cassette barcodes
– Report
  – Prominent clones in each sample
  – Recurring clones
  – Quantities of clones and their development over time
[Figure: workflow Trim → Align → Annotate → Report]
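A minimal sketch of two of the steps above, assuming reads have already been demultiplexed: trimming the vector and linker sequence from each read and, after alignment (not shown), summarising each insertion site by its number of reads and its number of distinct fragments (shear sites or linker cassette barcodes). The vector and linker sequences are hypothetical placeholders, and collision handling and annotation are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# HYPOTHETICAL placeholder sequences; real ones depend on the vector and library protocol.
VECTOR_END = "TTAGCCAGCA"  # last vector bases before the vector-genome junction
LINKER = "GATCGGAAGA"      # first linker bases, reached when a fragment is short


def trim(read: str, min_len: int = 20) -> Optional[str]:
    """Return the host-genome part of a read, or None if it should be discarded.

    Reads without the vector end do not span a vector-genome junction and are
    dropped. Everything up to and including the vector end is removed, and the
    read is cut where the linker starts (if present)."""
    i = read.find(VECTOR_END)
    if i < 0:
        return None
    genomic = read[i + len(VECTOR_END):]
    j = genomic.find(LINKER)
    if j >= 0:
        genomic = genomic[:j]
    return genomic if len(genomic) >= min_len else None


# An insertion site: (chromosome, junction position, strand).
Site = Tuple[str, int, str]


def summarise_sites(aligned: Iterable[Tuple[str, int, str, str]]) -> Dict[Site, Tuple[int, int]]:
    """Summarise aligned, trimmed reads per insertion site.

    Each input record is (chrom, junction_pos, strand, fragment_id), where
    fragment_id identifies the physical DNA fragment (e.g. its shear-site
    position or a linker cassette barcode). Returns
    {site: (number_of_reads, number_of_distinct_fragments)}."""
    reads: Dict[Site, int] = defaultdict(int)
    fragments: Dict[Site, Set[str]] = defaultdict(set)
    for chrom, pos, strand, fragment_id in aligned:
        site = (chrom, pos, strand)
        reads[site] += 1
        fragments[site].add(fragment_id)
    return {site: (reads[site], len(fragments[site])) for site in reads}
```

Distinct fragments are counted separately from reads because PCR amplification can inflate read counts for some fragments; the distinct-fragment count is therefore often preferred as the abundance measure.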
Situation
• Need to identify virus insertion sites after transplantation.
• An insertion site can be used as a marker for a hematopoietic clone.
• Determine insertion site frequency to track clones over time.
• Currently many different tools are brought together manually, which makes the setup time-consuming to maintain and track.
Target
• A standardised, reliable, easy-to-use workflow.
• Contains the required bioinformatics components.
• Compliant with a regulated environment.
Proposal
• Investigate how these methods can be developed.
• Ensure the required bioinformatics components are in place.
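The goal of tracking clones over time can be sketched as follows, assuming per-site abundances (for example, distinct fragment counts) are already available for each sample. The 5% cut-off for a "prominent" clone, the site labels and the function names are hypothetical choices for illustration, not values from the GSK workflow.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # site label -> abundance in one sample


def relative_abundance(counts: Counts) -> Dict[str, float]:
    """Convert per-site abundances into fractions of the sample total."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()} if total else {}


def clone_report(
    timecourse: Dict[str, Counts], prominent_at: float = 0.05
) -> Tuple[Dict[str, List[float]], Dict[str, List[str]], List[str]]:
    """Summarise clones across samples given in chronological order.

    Returns (trajectories, prominent, recurring):
      trajectories: site -> relative abundance in each sample (0.0 if absent)
      prominent:    sample -> sites at or above `prominent_at` in that sample
      recurring:    sites detected in more than one sample
    """
    freqs = {t: relative_abundance(c) for t, c in timecourse.items()}
    prominent = {
        t: sorted(s for s, f in fr.items() if f >= prominent_at) for t, fr in freqs.items()
    }
    seen_in: Dict[str, List[str]] = {}
    for t, fr in freqs.items():
        for s in fr:
            seen_in.setdefault(s, []).append(t)
    recurring = sorted(s for s, samples in seen_in.items() if len(samples) > 1)
    trajectories = {s: [freqs[t].get(s, 0.0) for t in timecourse] for s in seen_in}
    return trajectories, prominent, recurring
```

Because the report is recomputed from the whole timecourse, updating a patient dataset when a new sample arrives amounts to adding one entry to `timecourse` and re-running it.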
Poll Question 2: Within your organization, how pervasive is large-scale genomic computing under a regulatory (GxP) regime?
A. This is how we do business
B. We are testing the waters
C. We are constrained by resources
D. Not pervasive at all
Databiology point-of-view on globally federated and distributed biomedical data analysis
Les Mara September 20th 2016
Pistoia panel discussion
Copyright ©2016. All Rights Reserved. Confidential Databiology Ltd.
What you are doing every day…
• Capture and catalog data: so you have something to work with…
• Build and run analysis: doing the work…
• Manage technology: to make it work…
EXABYTES OF DATA: Omics, Chemistry, Imaging, Functional, Phenotypic, Behavioral, Medical Records, Wearables, Literature, …
DATA CUSTODIANS: Internal / External Organizational Topology: CROs, Hospitals, Labs, Providers, Payers, Public / Commercial Reference Archives, Consortia, …
VARYING COMPUTING INFRASTRUCTURES
10,000+ ANALYSIS TOOLS
Architectural Considerations
CENTRALIZED BIG DATA ANALYTICS (BIG Data Warehouse) VS DISTRIBUTED BIG DATA ANALYTICS (DISTRIBUTED BIG DATA)
Biomedical data is: dynamic and interdependent
Community Considerations
STANDARDISATION, BUT….
• Usually more than one standard
• Adoption can be a significant investment
• Technology moves faster than standards can be agreed
• Specification vs. adoption
Technology Considerations
• Infrastructure abundance: storage and network costs keep falling. This also encourages us to become increasingly ambitious.
• Virtualisation can make everything portable (containers, volumes, orchestration). Great, but it requires a shift in organisation and approach.
• We now have an API economy. True, but: standards, volatility, lots of custom development and maintenance.
Databiology Philosophy
• Federated search with centralised catalog
  − Propagate events
  − Independence of syntax and semantics
• Bring the analysis to the data
  − Cataloguing distributed data rather than centralizing data
  − Move analysis stack to the data when possible
  − Just in time transport for non-local data
• Centralised application registry
  − Ease of maintenance and consistency
  − Orchestrate and distribute to infrastructure of choice (regulatory compliance, proximity to data)
• Data projection
  − Provide data in the form expected by the application
[Figure: apps, data and infrastructure layers]
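The "bring the analysis to the data" principle can be illustrated with a toy placement rule. In this sketch a central catalog records where each dataset lives and whether its custodian allows it to move; a job is sent to a site that meets the application's compliance requirements, preferring the site that minimises the volume of non-local data that would need just-in-time transport. All class names, fields and compliance labels are hypothetical and do not describe Databiology's actual product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Site:
    name: str
    compliance: Set[str]  # hypothetical labels, e.g. {"GxP"}
    runs_containers: bool = True


@dataclass
class Dataset:
    id: str
    site: str             # where the data physically lives
    size_gb: float
    may_leave_site: bool  # governance flag set by the data custodian


@dataclass
class Catalog:
    """Central catalog: records where data lives, but does not hold the data."""
    sites: Dict[str, Site] = field(default_factory=dict)
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def register(self, d: Dataset) -> None:
        self.datasets[d.id] = d

    def search(self, prefix: str) -> List[Dataset]:
        return [d for d in self.datasets.values() if d.id.startswith(prefix)]


def place(app_requires: Set[str], inputs: List[Dataset], catalog: Catalog) -> Optional[str]:
    """Pick a site to run an app on, or None if no site is acceptable.

    A site qualifies if it offers every compliance label the app needs and every
    input not already stored there is allowed to move. Among qualifying sites,
    the one needing the least just-in-time data transfer wins (first one on ties)."""
    best, best_gb = None, float("inf")
    for site in catalog.sites.values():
        if not site.runs_containers or not app_requires <= site.compliance:
            continue
        remote = [d for d in inputs if d.site != site.name]
        if any(not d.may_leave_site for d in remote):
            continue  # governance forbids moving some input to this site
        transfer_gb = sum(d.size_gb for d in remote)
        if transfer_gb < best_gb:
            best, best_gb = site.name, transfer_gb
    return best
```

Keeping placement as a pure function of catalog metadata means the decision can be logged and replayed, which matters when the choice of infrastructure is itself part of the regulatory record.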
© 2016 Databiology Inc. All Rights Reserved
The Global Network For Genomics™
PrecisionFDA
A community platform built on top of DNAnexus technology, for NGS assay evaluation and regulatory science exploration.
GEORGE ASIMENOS, VP
DNAnexus Compliance
• Compliant infrastructure (CSPs, i.e. cloud service providers)
• Governance, risk management, and compliance framework (DNAnexus)
• Compliant applications (Customer)
Poll Question 3: How likely are you/your organisation to participate in one of the precisionFDA challenges?
A. We already have
B. We are planning to
C. Not sure, but worth considering
D. No, we are not
Audience Q&A
Please use the Question function in GoToMeeting
The next Pistoia Alliance Debates Webinar will be on Wednesday, November 16, 2016, on the topic of CRISPR, moderated by Alvis Brazma of EMBL-EBI.
MARK YOUR CALENDARS
[email protected] @pistoiaalliance www.pistoiaalliance.org
Many thanks for your attendance and engagement