Big Data in Translational Science Albert Wang Associate Director, Translational R&D IT Bristol-Myers Squibb 2015 AAPS Annual Meeting


Big Data in Translational Science

Albert Wang

Associate Director, Translational R&D IT

Bristol-Myers Squibb

2015 AAPS Annual Meeting

Agenda

• Perspectives on Big Data

• Big Data in Translational R&D

• Selected Initiatives at Bristol-Myers Squibb

Why are we talking about Big Data?


#1: Because our capability for generating data is growing - everywhere.

The Five ‘V’s of Big Data


IBM Institute for Business Value Analytics: The real-world use of big data

e.g., sentiment, social media, weather conditions, etc.

Value: Deriving value from data

Until we can turn data into value, it is useless.

Why are we talking about Big Data?


#2: Because new technologies have proven applicable to big data problems across a variety of domains.

MapReduce

Virtualized/Distributed Storage

Artificial Intelligence (machine learning, NLP, etc.)

NoSQL Databases
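
To make the first item in this list concrete, here is a minimal, single-process sketch of the MapReduce pattern (map, shuffle, reduce). It is an illustration only, not anything shown in the talk: the record layout, the `assays` field, and all function names are hypothetical, and a real deployment would run these steps on a distributed framework rather than in one Python process.

```python
from collections import defaultdict


def map_record(record):
    # Map step: emit one (assay, 1) pair per assay run on this sample.
    return [(assay, 1) for assay in record["assays"]]


def shuffle(pairs):
    # Shuffle step: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    # Reduce step: collapse each key's list of values into one total.
    return {key: sum(values) for key, values in grouped.items()}


def count_assays(records):
    # Run the three steps in sequence over all records.
    pairs = [pair for record in records for pair in map_record(record)]
    return reduce_counts(shuffle(pairs))


# Hypothetical sample records, invented for illustration.
records = [
    {"sample_id": "S1", "assays": ["IHC", "Gene Expression"]},
    {"sample_id": "S2", "assays": ["IHC", "Proteomics"]},
    {"sample_id": "S3", "assays": ["Gene Expression"]},
]
assay_counts = count_assays(records)
```

Because the map step looks at one record at a time and the reduce step looks at one key at a time, both can be spread across many machines, which is what makes the pattern suitable for large datasets.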

Why are we talking about Big Data?


#3: Because there are still significant problems to solve.

Data Agility (Real-Time)

Data privacy/security

Usability/Accessibility

Development Discovery

What does this mean for Translational R&D?

Translational Research

Translational Medicine

Bench

Bedside

Translational R&D leverages a variety of data types

Pharmacology data

Clinical Trial Data

Epidemiological data

Genomic Data

Molecular Data

Medical/Hospital Data

Tissue Data

Real World Data

Patient Genotype

TR&D Analytics

Complex Interdependency of Data

Increased Need for Data Sharing Across Partners

• Efficient information sharing across multiple partners

• Easy access to the data

• Transparency with collaborators


Institute

Academia

Hospital

Foundation

CRO Partner

6 Areas of Pharma and Healthcare Related Big Data

Life Science Data

Owner: Academic, Government

Example datasets:

• NGS data

• Imaging data

• Literature and conference text

• Signalling Pathway Data / Models

• Epidemiology

Business Intelligence Data

Owner: Pharma / Biotech / Academia

Example datasets:

• News, Blogs, Social Media, Patents, Literature, Web pages, Financial Reports

Time

Data Entry and storage

Improved Query and Navigation

Enhanced Analysis and Visualization

Benefits

Data Access Retrieval

Doing What We Can Now - Building for the Future

Data Integration

How do we grow an infrastructure to support TR&D Big Data?

Data Mining, Analysis, Modeling, and Interpretation

Decision Support

Future State Infrastructure

Data Sources

Integration

Data feeds

Outputs

Sample properties, availability, location,…

Clinical: Subject demographics, treatment, response….

Biomarker data

Real World data, EMR, Claims

Discovery data

Users

Informatician/Data Scientist

Clinician

External Collaborator

Biomarker Scientist

Project team

Discovery Scientist

Data curation, standardization, integration

analysis, mining, modeling, interpretation

Decision Support: query, analysis, knowledge capture

Data

Information

Knowledge

Insights

Decisions

Structured and unstructured data

Analysis


Case Study: Sensor data in atrial fibrillation

INFORMED STUDY DESIGN

BMS used real-world pacemaker data analysis to revise the required # of subjects and observation period for this study

• Subjects needed before querying sensor dataset: 160

• Subjects needed after querying sensor dataset: 80

• Observation period reduced from 4 weeks to 2 weeks of continuous monitoring while on therapy. ANALYSIS = W.I.P.

ENABLED STUDY EXECUTION

Medtronic SEEQ patch...

• Wear each patch up to one week

• Patches replaced by subjects at home

• Mobile base station transmits ECG data to cloud in real time

• Alerts if safety event detected or device not reapplied correctly

… vs. Holter monitor

• Wear up to two days

• Replaced by ECG technician at clinical site

• Base station stores data; needs to be docked

• No real time alerts

CHALLENGED DATA MANAGEMENT

• Patients carry wireless transmitters

• ECG data streamed to the cloud

• Reports generated

• Clinical DB: clinical database could not accept reports; conversion process required
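
The two study-design figures on this slide (160 vs. 80 subjects, 4 vs. 2 weeks) can be combined into one rough measure of monitoring burden. The sketch below uses subject-weeks for that purpose. This metric is an assumption introduced here for illustration and is not a quantity reported in the talk; only the four input numbers come from the slide.

```python
def subject_weeks(subjects, weeks):
    # Total monitoring burden: number of subjects times weeks observed.
    return subjects * weeks


# Inputs taken from the slide.
before = subject_weeks(subjects=160, weeks=4)
after = subject_weeks(subjects=80, weeks=2)

# Fractional reduction in subject-weeks (illustrative metric, see lead-in).
reduction = 1 - after / before

print(f"before: {before} subject-weeks, after: {after} subject-weeks")
print(f"reduction: {reduction:.0%}")
```

Halving both the subject count and the observation period cuts the product to a quarter: 640 subject-weeks becomes 160, a 75% reduction under this simple measure.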


Biosensor Data Consumption Data Processing Scheme

Common Sandbox Distributed Sandbox


Biomarker-driven Translational Research

TR Biology

GBS

GCR

DM Physician Clinical Biomarker

Biomarker Tech Matrix Bix

DWG Lead

DM Physician

Translational Research Scientist

Clinical Biomarker

Biomarker Tech Matrix

Discovery Teams

Clinical Teams

Purchased Tissue Collections

Specimen Repositories

Clinical Trials

IHC

Flow Cytometry

Gene Expression

Genetics

Cytokine Profiling

Viral Sequencing

Metabolomics

Proteomics

Other assays…

Patient Information

Sample Information

Patient Information


Biomarker Repository: Conceptual Architecture

Biomarker Repository

Biomarker File Repository

IHC

Flow

Genomics

Protein

Clinical

TR&D Data Hub (Hadoop)

Samples

Specimen Biorepository

Clinical Data (CDMS, SAS environment, etc.)

Visualization Analytics

Ad hoc biomarker data

Genomic Data

Virtual Systems Pharmacology
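
The architecture above brings sample, clinical, and assay data together in one hub. As a minimal sketch of what that integration step might look like, the code below joins three small record sets on shared identifiers. All field names (`subject_id`, `sample_id`, `tissue`, and so on) and all values are hypothetical, and the join runs in plain Python for clarity; it does not represent the actual BMS repository or its Hadoop implementation.

```python
def integrate(samples, clinical, assay_results):
    # Index clinical rows by subject and sample rows by sample for lookup.
    clinical_by_subject = {row["subject_id"]: row for row in clinical}
    sample_by_id = {row["sample_id"]: row for row in samples}

    merged = []
    for result in assay_results:
        sample = sample_by_id.get(result["sample_id"])
        if sample is None:
            continue  # Skip assay results with no registered sample.
        subject = clinical_by_subject.get(sample["subject_id"], {})
        # Combine clinical, sample, and assay fields into one flat record.
        merged.append({**subject, **sample, **result})
    return merged


# Hypothetical records, invented for illustration.
clinical = [{"subject_id": "P1", "treatment": "drug A", "response": "partial"}]
samples = [{"sample_id": "S1", "subject_id": "P1", "tissue": "tumor"}]
assay_results = [
    {"sample_id": "S1", "assay": "IHC", "value": 0.8},
    {"sample_id": "S1", "assay": "Gene Expression", "value": 12.5},
    {"sample_id": "S404", "assay": "IHC", "value": 0.1},  # Unknown sample.
]

integrated = integrate(samples, clinical, assay_results)
```

Each output record carries the assay value alongside the sample and subject context, which is the kind of flat view that visualization and analytics tools can query directly.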


Acknowledgements

Translational R&D IT

Anastasia Christianson

Kaushal Desai

Peter McDonald

Marko Miladinov

Chris Nalbone

Xiao Shao

David Witt

Other Groups

Alex Simmonds

Fraz Ismat

Jeff Guss

Tarek Leil

Sergey Ermakov

Russ Towell

Brian Wong