bioshare: opal and mica: a software suite for data harmonization and federation - vincent ferretti -...

Upload: lisette-giepmans

Post on 16-Apr-2017

TRANSCRIPT

Page 1: BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research

A SOFTWARE SUITE FOR DATA HARMONIZATION AND FEDERATION

Vincent Ferretti
Ontario Institute for Cancer Research

Page 2

The Maelstrom Research Software Suite

Software development started in 2007
$3,800,000 CAD of investment so far

Onyx: Collection
Opal: Storage, Management, Harmonization
Mica: Publication
DataSHIELD: Analysis

Page 3

Some User Stories

Name | Type | Activities | Tools
The Canadian Longitudinal Study on Aging (CLSA) | Single study, 50,000 participants | Collection, management, portal | Onyx, Opal, Mica
The Canadian Partnership for Tomorrow Project (CPTP) | Study consortium, 5 studies, 300,000 participants | Collection, harmonization, portal | Onyx, Opal, Mica
BBMRI-LPC | Network, >30 studies | Cataloguing | Mica
Maelstrom Research | Research project | Cataloguing, harmonization | Opal, Mica
Interconnect | Network | Cataloguing, (harmonization, federated data analysis) | Mica
BioSHaRE | Network | Cataloguing, harmonization, federated data analysis | Opal, Mica, DataSHIELD

Page 4

1 - Data Harmonization with Opal
The Canadian Partnership for Tomorrow Project (CPTP)

5 cohorts with baseline data on ~300,000 participants
• 5 different legislations, questionnaires, data access policies, languages, etc.

Project’s objectives
• To create harmonized datasets across the 5 cohorts
• To create a data portal to browse harmonized datasets and request access to them

Phase 1
The baseline Health and Risk Factor questionnaire (CoreQx)
• 716 harmonized variables

Page 5

Opal Software
A database application for integrating and storing data from multiple and heterogeneous sources

• Used by studies to create central data repositories

Page 6

Metadata in Opal
Projects -> tables -> variables
Tables are defined by customizable dictionaries in Excel format
Variables are annotated with an arbitrary number of attributes
Controlled vocabularies - Taxonomies - (e.g. ICD-10)
Maelstrom Research variable classification
More than 130 terms in 17 classes (e.g. Reproduction, Physical Measures)

Variable Name | Attribute Name | Attribute Value
Cancer_type | Diseases | Neoplasm
Asthma_ever | Diseases | Respiratory system (J00-J99)
Ever_smoke | Question label | [EN] Have you ever smoked? [FR] Avez-vous déjà fumé?
Ever_smoke | Health behaviors | Tobacco
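The annotation model above (variables carrying any number of named attributes) can be pictured with a small data structure. This is a minimal Python sketch, not Opal's actual storage format; the `Variable` class and the `find_by_attribute` helper are hypothetical names introduced for illustration, and only the example values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A variable with free-form (attribute name -> value) annotations."""
    name: str
    attributes: dict = field(default_factory=dict)


def find_by_attribute(variables, attr_name, attr_value=None):
    """Return names of variables carrying the attribute (and value, if given)."""
    return [
        v.name
        for v in variables
        if attr_name in v.attributes
        and (attr_value is None or v.attributes[attr_name] == attr_value)
    ]


# Example annotations taken from the slide's table.
variables = [
    Variable("Cancer_type", {"Diseases": "Neoplasm"}),
    Variable("Asthma_ever", {"Diseases": "Respiratory system (J00-J99)"}),
    Variable("Ever_smoke", {
        "Question label [EN]": "Have you ever smoked?",
        "Question label [FR]": "Avez-vous déjà fumé?",
        "Health behaviors": "Tobacco",
    }),
]

print(find_by_attribute(variables, "Diseases"))
```

Keeping annotations as open key-value pairs is what lets a taxonomy such as ICD-10 or the Maelstrom classification be attached without changing the table definition.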

Page 7
Page 8

Data Derivation
Opal derives new variables by executing custom JavaScript code

Useful for data validation, curation and harmonisation

User-friendly interfaces for recoding variables

JavaScript API for more advanced derivation
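The recoding idea can be sketched as follows. Opal itself runs JavaScript for derivations; the Python below is only a language-neutral illustration of the logic, and the study names, source codes, and function names are all hypothetical.

```python
# Hypothetical source codings: each study encodes smoking status differently.
RECODE_RULES = {
    "study_a": {"1": "never", "2": "former", "3": "current"},
    "study_b": {"N": "never", "Y": "current"},
}


def harmonize_smoking(study, raw_value):
    """Map a study-specific code to the common category, or None if unmapped."""
    if raw_value is None:
        return None
    return RECODE_RULES[study].get(str(raw_value).strip())


def ever_smoked(status):
    """Derive a yes/no variable from the harmonized status; None stays missing."""
    if status is None:
        return None
    return status in ("former", "current")


print(harmonize_smoking("study_a", 2), ever_smoked("former"))
```

Unmapped codes are returned as missing rather than guessed, which keeps curation problems visible instead of silently folding them into a category.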

Page 9

JavaScript code executed by Opal when needed

Derived data is not persisted – Views or Virtual tables
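A view that computes derived values only when they are read might look roughly like this. It is a sketch of the general "virtual table" idea under simplifying assumptions, not Opal's implementation; the `View` class and the BMI derivation are hypothetical.

```python
class View:
    """A virtual table: stores derivation functions, not derived values."""

    def __init__(self, source_rows, derivations):
        self._source_rows = source_rows      # list of dicts (the stored table)
        self._derivations = derivations      # {derived name: function(row)}

    def rows(self):
        """Compute derived rows on demand from the current source data."""
        for row in self._source_rows:
            yield {name: fn(row) for name, fn in self._derivations.items()}


source = [{"height_cm": 180, "weight_kg": 81.0}]
view = View(source, {
    # Hypothetical derived variable: body mass index, rounded to one decimal.
    "bmi": lambda r: round(r["weight_kg"] / (r["height_cm"] / 100) ** 2, 1),
})

print(list(view.rows()))
```

Because nothing derived is stored, a correction to the source table or to a derivation script shows up the next time the view is read.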

Page 10

Deriving the CoreQx datasets with Opal

Page 11

Deriving the CoreQx datasets with Opal

Page 12

Deriving the CoreQx datasets with Opal

How to query and access these harmonized datasets?

Page 13

The Mica Software
Software to create web data portals for individual studies or for study consortia

Study catalogue
• MR standard description of longitudinal studies
• Publication workflow

Datasets
• Data dictionaries, data harmonization
• Database federation

Data Access
• Online forms, requests management workflow with roles

Mica2: new client-server architecture
[Architecture diagram: Mica Server, Opal Server, Data Persistence (MongoDB)]

Page 14

The CPTP Data Portal

Page 15

Study Catalogue

Page 16
Page 17

Querying Opal Servers for Metadata and Aggregated Data

Page 18

Dictionary Faceted Search

Page 19

Variable Page

Real time summary statistics
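One way to picture summary statistics computed on request is a function that returns aggregates only. This is a hedged sketch: the slides do not say which statistics the portal reports, so the `summarize` function and its output fields are assumptions.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values, categorical=False):
    """Return aggregate statistics only; individual values are not returned."""
    present = [v for v in values if v is not None]
    summary = {"n": len(present), "missing": len(values) - len(present)}
    if categorical:
        summary["frequencies"] = dict(Counter(present))
    elif present:
        summary.update(min=min(present), max=max(present), mean=mean(present))
        if len(present) > 1:
            summary["stdev"] = stdev(present)
    return summary


print(summarize([1, 2, 3, None]))
print(summarize(["Y", "N", "Y"], categorical=True))
```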

Page 20

Harmonization Result

Page 21

Data Access Requests

Researcher account registration
Customized application form
Application review workflow
Email notifications
Multi-language

Page 22

2 - Advanced Cataloguing with Mica
Maelstrom-research.org

The Maelstrom Research web site is powered by Mica
Includes a catalogue of international networks and studies with annotated dictionaries

Current version
• 6 networks
• 129 studies
• 222 datasets
• 182,622 variables

Page 23

Search Harmonisation Potential

Page 24

Multi-dimensional Search Tool

Page 25

3 - Data Analysis
The BioSHaRE Healthy Obese Project

10 studies from 7 European countries
200,000 subjects
The HOP dataset - 103 harmonized variables

How to analyze these datasets
» without pooling data
» without accessing individual-level data?

Page 26

A Federated Approach
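The principle of analysing data without pooling it can be illustrated with a pooled mean and variance built from per-site sums. This is a generic sketch of the federated idea, not DataSHIELD's actual protocol or its disclosure controls; the function names and sample values are hypothetical, and at least two observations in total are assumed.

```python
def local_aggregates(values):
    """Runs at each study site: return only non-identifying sums."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return {"n": n, "sum": total, "sum_sq": total_sq}


def combine(aggregates):
    """Runs at the coordinator: pooled mean and variance from site aggregates."""
    n = sum(a["n"] for a in aggregates)
    total = sum(a["sum"] for a in aggregates)
    total_sq = sum(a["sum_sq"] for a in aggregates)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return {"n": n, "mean": mean, "variance": variance}


# Two hypothetical sites; the raw lists never leave local_aggregates().
sites = [[1.0, 2.0], [3.0, 4.0, 5.0]]
result = combine([local_aggregates(values) for values in sites])
print(result)
```

Only three numbers per site cross the network, yet the combined result matches what a pooled analysis of all five values would give.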

Page 27

Real Time Cross Tabulation on Harmonized Data
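A cross tabulation over several studies can be assembled by adding per-study cell counts, which works because the variables share harmonized categories. The sketch below assumes each study returns its own counts; the variable names and records are hypothetical.

```python
from collections import Counter


def local_crosstab(rows, row_var, col_var):
    """Runs at each study: count participants per (row, column) category pair."""
    return Counter((r[row_var], r[col_var]) for r in rows)


def merge_crosstabs(tables):
    """Add cell counts from every study into one combined table."""
    combined = Counter()
    for table in tables:
        combined.update(table)
    return combined


study_a = [{"sex": "F", "smoker": "yes"}, {"sex": "M", "smoker": "no"}]
study_b = [{"sex": "F", "smoker": "yes"}]
merged = merge_crosstabs(
    local_crosstab(rows, "sex", "smoker") for rows in (study_a, study_b)
)
print(dict(merged))
```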

Page 28

New Improved Version

Page 29

Real Time Advanced Queries on Harmonized Data

Page 30

More Advanced Analyses with R

Page 31

RStudio Web Console
rstudio.bioshare.eu

Page 32

More Information

www.maelstrom-research.org
www.obiba.org
Code available at github.com/obiba

Let us know, and acknowledge Maelstrom Research, if you are using our software; it is important for our funding and our ability to provide support.

Page 33

Acknowledgement

Yannick Marcon and his software developer team
The Maelstrom Research scientific team

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union - BioSHaRE-EU)