transmart community meeting 5-7 nov 13 - session 5: the accelerated cure project ms repository...

The ACP MS Repository A Case Study tranSMART Community Meeting, Nov. 2013 Stephen Wicks, Ph.D.

Upload: david-peyruc

Post on 19-Jan-2015

DESCRIPTION

tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Project MS Repository Dataset as a Case Study. Stephen Wicks, Rancho BioSciences.

The Accelerated Cure Project for Multiple Sclerosis (ACP) is a non-profit focused on accelerating research for a cure for MS. One of their major projects over the last decade has been the generation of the ACP Repository, a collection of biological samples and associated clinical data from approximately 3200 case or control participants. More than 75 studies are underway or have been completed, in both industry and academic settings, using samples from the ACP Repository. Rancho BioSciences has partnered with ACP through Orion Bionetworks to curate and load these datasets and associated clinical CRFs into tranSMART. In this talk, we will describe the rich ACP dataset and discuss our experiences in preparing the data for analysis in tranSMART.

TRANSCRIPT

Page 1: tranSMART Community Meeting 5-7 Nov 13 - Session 5: The Accelerated Cure Project MS Repository Dataset as a Case Study

The ACP MS Repository: A Case Study

tranSMART Community Meeting, Nov. 2013 Stephen Wicks, Ph.D.

Page 2:

The ACP Repository: a Case Study

What is Multiple Sclerosis?

• Chronic inflammatory/demyelinating disorder affecting the CNS (about 0.1%).
• Leading cause of neurological disability in young adults.
• Symptoms are variable and significant. They include vision, cognition, locomotion, pain, disorientation, dexterity, mood, bowel/bladder control, and others.
• Generally progressive, but progression is idiosyncratic (CIS → RRMS → SPMS, vs. CIS → PPMS, etc.).
• Complex etiology.

Page 3:

What is the cost of MS?

• Difficult and costly to diagnose (MRI; symptom variability leads to extensive differential diagnosis).
• Treatments can slow progression, but are expensive.
• Many different drugs exist, but patient stratification for maximum efficacy and minimum side effects is non-existent: “Roll the dice”.
• Often strikes early in life, and is a life-long disability.
• Average age at diagnosis is about 30; 5% are diagnosed before 16.

Page 4:

Orion Bionetworks

• ACP is a founding member of Orion.
• Orion seeks to cure MS by harnessing the power of computational modeling of disease progression.
• ACP will provide its data to Orion in tranSMART to facilitate this goal.
• Rancho BioSciences will curate and harmonize the ACP data for Orion.

Page 5:

ACP and the MS Repository

• ACP was founded in 2001 by an MIT entrepreneur with MS.
• ACP MS Repository started in 2006. The goal was to identify the cause of MS.
• ACP MS Repository enrollment shut down this year. Approximately 3200 participants enrolled.
• Biosamples, demographics, medical history, etc.
• Research data OPT-UP

Page 6:

Repository Enrollment Status (6/21/2013)

• 3,220 subjects enrolled; 467 longitudinal visits completed
• DNA, RNA, plasma, serum, PBMCs + data from 52-page CRF

Page 7:

The ACP Engine (diagram)

• ACP Repository: 3200+ participants; biosamples & datasets; $13 million invested
• MS researchers worldwide (academia & industry): 77 sets of biosamples + data
• (b,m)illions of datapoints, from 36 studies so far
• “Matchmaker” database graphical user interface: allowing MS researchers worldwide to explore the ACP Repository database
• MS Discovery Forum: reviewing developments in the MS field; communicating with MS researchers
• Insights and results: mechanisms, diagnostics, causes, treatments

Page 8:

ACP MS Repository

• Open-access collection of highly annotated blood-derived samples plus data from MS, related diseases, & control subjects gathered from 2006-2013.
• Requirement for research data derived from samples to be deposited (with a provision for IP protection).
• Contributes to MS+ research in many ways:
  • Enables studies that might not be conducted otherwise (academic & commercial).
  • Creates a common results database for studies from multiple bio-analytical techniques on overlapping sets of subjects.
• Approximately 3200 participants.

“Working with them (ACP) allowed us to obtain critical samples and confirm our results for only $20,000. If I had to obtain these samples from scratch, it would have cost $1 million and added 5 years to the project.”
- Thomas M. Aune, PhD, Molecular Biology, Vanderbilt University School of Medicine (from Scientific American)

Page 9:

Case Report Form (CRF) Curation Challenges

Page 10:

ACP Case Report Form

• 48-page (first visit) and 38-page (second visit) complete clinical workup.
• Form completed with the assistance of a clinical research associate over a several-hour interview (with sample draw and lab workup).
• Broad data: 80 distinct tables in an SQL database.
• Deep data: in flat data files, more than 20 million cells.

Page 11:

CRF Sample Fields

Illustrates some of the problems associated with curating this dataset.

• 103 distinct textual responses: “Betseron”, beta-seron, betaseron, BETASERON, etc.
• Study drugs (“CS-0777”) or drug trial enrollment (“BG00012 (FUMARATE) OR PLACEBO”).
• Inappropriate (sometimes lethally so) drug units.
• No consistent measure of frequency.
• “First Drug”, “Second Drug”, etc.: the ordinal order was meaningless.

Page 12:

DMD Curation Solutions

• We applied drug ontologies and mapping vocabularies where needed.
• We repaired and consolidated dose, frequency, etc. to a single measure with 3 values (high, standard, low).
• We re-formatted the data to eliminate the ambiguous ordinal ordering of reporting.
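The name mapping and the removal of the "First Drug"/"Second Drug" ordering described on this slide might look roughly like the sketch below. It is only an illustration under stated assumptions: the synonym table, the canonical label "Betaseron", and the function names are hypothetical, and the talk does not say how the actual mapping was implemented.

```python
import re

# Hypothetical synonym table: normalized spelling -> canonical drug name.
# The spellings come from the CRF slide; the table itself is an assumption.
CANONICAL = {
    "BETASERON": "Betaseron",
    "BETSERON": "Betaseron",
}


def normalize(raw):
    """Uppercase the response and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def to_long(subject_id, ordered_responses):
    """Convert 'First Drug', 'Second Drug', ... slots into unordered rows.

    Returns (rows, unmapped): rows is a sorted list of (subject, drug)
    pairs with duplicates removed; unmapped holds the raw responses that
    have no entry in the synonym table and need manual review.
    """
    rows, unmapped = set(), []
    for raw in ordered_responses:
        if not raw or not raw.strip():
            continue  # empty slot on the form
        name = CANONICAL.get(normalize(raw))
        if name is None:
            unmapped.append(raw)
        else:
            rows.add((subject_id, name))
    return sorted(rows), unmapped


rows, unmapped = to_long("S1", ["Betseron", "BETASERON", "", "CS-0777"])
print(rows)      # mapped (subject, drug) pairs
print(unmapped)  # responses left for manual curation
```

Keeping the unmapped responses separate, rather than dropping them, leaves study drugs and trial-enrollment entries visible for a curator to resolve by hand.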

Page 13:

CRF Sample Fields

Multiple Drugs (Observations) were addressed with…

Page 14:

VISIT_NAME application

Page 15:

Controlled Vocabularies (sports)

• ~5000 responses
• 779 distinct sports reported
• When filtered by “ski”, 29; “gym”, 45; “walk”, 30; “jog”, 17; “run”, 40
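The keyword filtering summarized on this slide can be approximated with a simple substring search over the distinct responses. The sketch below is an assumption about how such counts could be produced, not the method used in the talk; the sample responses and the function name are made up for illustration.

```python
def distinct_matches(responses, keyword):
    """Return the distinct responses (case- and space-normalized) containing keyword."""
    seen = {r.strip().lower() for r in responses}
    return sorted(s for s in seen if keyword in s)


# Hypothetical free-text answers to the sports question.
sample = ["Skiing", "skiing ", "Water skiing", "Jogging", "Running", "run", "Walking"]

for keyword in ("ski", "jog", "run", "walk"):
    hits = distinct_matches(sample, keyword)
    print(keyword, len(hits), hits)
```

Many distinct strings surviving under a single keyword is the signal that a free-text field needs a controlled vocabulary. Substring filters also over-match (for example, "gym" would catch "gymnastics"), so the hits still need review.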

Page 16:

Controlled Vocabularies (sports)

All sports mapped to a 29-term vocabulary.

Page 17:

Controlled Vocabularies (pets)

• ~6500 pets reported
• 600 distinct pets reported
• When filtered by “dog”, 112. However, this misses misspellings (“diog”, “dot”, “pubs”), dog-like pets (“wolf”, “half-wolf”, “mutt”), and breeds (“poddle”, “poodle”, “Afghan Hound”, etc.).
• 59 additional dog-like entries

Page 18:

Controlled Vocabularies (pets)

All pets mapped to a 31-category controlled vocabulary.
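One way to catch the entries that a plain "dog" filter misses is a hand-curated alias table. The sketch below uses only the examples quoted on the pets slide. The category label "Dog", the "Unmapped" fallback, and the function names are assumptions, and the talk does not describe the actual mapping procedure.

```python
# Alias table built from the examples on the slide; a real table would be
# much longer and curated by hand. The category name "Dog" is an assumption.
DOG_ALIASES = {
    "diog", "dot", "pubs",               # misspellings
    "wolf", "half-wolf", "mutt",         # dog-like pets
    "poddle", "poodle", "afghan hound",  # breeds, one of them misspelled
}


def pet_category(response):
    """Map one free-text pet response to a controlled term."""
    text = " ".join(response.lower().split())  # lowercase, collapse whitespace
    if "dog" in text or text in DOG_ALIASES:
        return "Dog"
    return "Unmapped"  # left for manual review


def split_by_filter(responses):
    """Separate what a plain 'dog' filter finds from what only aliases catch."""
    by_filter = [r for r in responses if "dog" in r.lower()]
    by_alias = [r for r in responses
                if "dog" not in r.lower() and pet_category(r) == "Dog"]
    return by_filter, by_alias


found, extra = split_by_filter(["Dog", "diog", "cat", "sheepdog", "Poddle"])
print(found)  # caught by the substring filter
print(extra)  # caught only through the alias table
```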

Page 19:

Medication Curation Challenges

• >10,000 medications listed.
• 2703 distinct medications listed.
• Mapped these to 614 real medications (e.g. Amitriptyline).
• This was split into two tables: Continuing Medications (541 entities) and Stopped Medications (317 entities).
• VISIT_NAME was used to represent distinct observations across the whole study.
• Truly longitudinal measures were reified in the tree hierarchy in the data mapping file.

Spellings reported for Amitriptyline: Amitriptaline, Amitriptylin, Amitriptyline, Amitriptyline HCL, Amitroptyline, Amitryetyline, Amitrypatiline, Amitryptailine, Amitryptaline, Amitryptilin, Amitryptiline, Amitryptilline, Amitryptylene, Amitryptyline
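Spelling variants like the Amitriptyline entries above are a natural fit for approximate string matching. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in technique; the talk does not say which tool was used. The three-entry medication list, the 0.8 cutoff, and the function name are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical excerpt of a canonical medication list (lowercase).
CANONICAL_MEDS = ["amitriptyline", "gabapentin", "baclofen"]


def match_medication(raw, cutoff=0.8):
    """Return the closest canonical name, or None if nothing is similar enough."""
    candidates = get_close_matches(
        raw.strip().lower(), CANONICAL_MEDS, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


for spelling in ["Amitryptaline", "Amitryetyline", "Amitriptyline HCL", "aspirin"]:
    print(spelling, "->", match_medication(spelling))
```

Fuzzy matches are best treated as suggestions for a curator to confirm, since distinct drugs can have similar names and a permissive cutoff will merge them.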

Page 20:

ACP Repository Tree

Page 21:

Date and Time Coding

• All dates converted to periods (months, years, or days) prior to the relevant blood draw date.
• Dates were represented by International Standard ISO 8601, i.e. YYYY-MM-DD (e.g. 2001-12-15).
• Dates in multiple formats:
  • 15/12/2001
  • 15/Dec/2001
  • Dec-2001
  • 2001
  • Dec./2001
  • 12/2001
  • --/--/----
  • --/2001
  • -----------
  • --/12/2001
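The date handling on this slide (mixed, partly blank formats converted to ISO 8601 and to periods before the blood draw) could be sketched as follows. This is an illustration under assumptions: day/month/year ordering, the function names, and the blood-draw date in the usage lines are hypothetical, and the actual conversion code is not shown in the talk.

```python
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_partial(raw):
    """Return (year, month, day); unknown parts are None.

    Assumes day/month/year order with the year always last, and dash-only
    fields (such as '--') meaning 'not recorded'.
    """
    text = raw.strip()
    parts = text.split("/") if "/" in text else text.split("-")
    tokens = [p.strip(" .-").lower() or None for p in parts]
    if not any(tokens):
        return (None, None, None)  # fully blank entry
    if len(tokens) > 3:
        raise ValueError(f"unrecognized date format: {raw!r}")
    day, month, year = [None] * (3 - len(tokens)) + tokens
    month_num = None
    if month is not None:
        month_num = MONTHS[month[:3]] if month[:3] in MONTHS else int(month)
    return (int(year) if year else None, month_num, int(day) if day else None)


def to_iso(parsed):
    """Format as ISO 8601 at the available precision (YYYY, YYYY-MM, YYYY-MM-DD)."""
    year, month, day = parsed
    if year is None:
        return None
    text = f"{year:04d}"
    if month is not None:
        text += f"-{month:02d}"
        if day is not None:
            text += f"-{day:02d}"
    return text


def months_before(draw, parsed):
    """Whole calendar months between a partial date and the blood draw date."""
    year, month, _ = parsed
    if year is None or month is None:
        return None
    return (draw.year - year) * 12 + (draw.month - month)


draw_date = date(2006, 3, 1)  # hypothetical blood draw date
for raw in ["15/12/2001", "Dec./2001", "--/12/2001", "2001", "--/--/----"]:
    parsed = parse_partial(raw)
    print(raw, to_iso(parsed), months_before(draw_date, parsed))
```

Entries with only a year could be reported in years instead of months, which may be why the slide lists several period units.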

Page 22:

Repository Usage

• 77 studies ongoing or completed
• 36 studies have returned data to ACP
• Data types:
  • Low-D biomarker (antibodies, metabolites, serum markers of inflammation, etc.)
  • Low-D genotype data
  • High-D SNP/GWAS data
  • Gene-expression studies
  • Whole-genome sequencing (2 distinct studies)
• Study types:
  • Etiology
  • Diagnostics
  • Disease activity biomarkers

Page 23:

Repository Usage

Page 24:

Research Data Curation Challenges

• Few guidelines provided to researchers for data formatting or treatment.
• Often little or no documentation describing how the data was generated or handled (e.g. raw vs. normalized, transformations).
• Load study metadata (contact info, description, etc.) at the node level.

Page 25:

Sample Study Results

Biogen gene expression study: Designed to identify gene-expression profiles that discriminate progressive forms of MS from relapsing-remitting forms of the disease.

Page 26:

Future Directions

• Rancho BioSciences is providing guidance to ACP for data-collection practices going forward (e.g. OPT-UP).
• We loaded the clinical data and 6 sample study datasets into an Oracle-based tranSMART instance that we host in-house for QC purposes.
• The full dataset is slated to be loaded into a PostgreSQL-based tranSMART 1.1 instance (hosted by Recombinant by Deloitte for Orion).
• This and other data sources (Inst. for Neuroscience at B&W) will be analyzed and modeled by Orion.

Page 27:

Thanks for your time! Questions?