tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives

TranSMART Core From tool to ecosystem Kees van Bochove tranSMART Workshop Amsterdam June 17, 2013

Upload: david-peyruc

Post on 10-May-2015


Category:

Health & Medicine



DESCRIPTION

tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives Sherry Cao and Keith Elliston

TRANSCRIPT

Page 1

TranSMART Core
From tool to ecosystem

Kees van Bochove
tranSMART Workshop Amsterdam

June 17, 2013

Page 2

Today, we have a chance to write history.

Page 3

• Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data

$$$$$$$$$$$$

• Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Page 4

There has to be a better way.

Page 5

costs $ 0!

No-brainer!

Ehm.. wait a minute…

Page 6

Let’s have a look at how these scientists in academia are doing.

They love to collaborate, right?!

Page 7

In 2003… (Ancient history; before Facebook)

Page 8

Yet Another ‘New’ Web-based Solution for the Management of Microarray Data ?!

Page 9

Not Invented Here Syndrome

Image from Rob Hooft, CTO Netherlands Bioinformatics Centre
http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html

Page 10

What about all these great FP6, FP7, IMI, … projects?

Page 11

Source code of major projects is readily available on GitHub

Page 12

But… I’m afraid it’s still up to you and me to put the pieces together.

Page 13

Phenotype Database

Written in Grails, supports several types of omics data, provides data integration and visualization, has R, Groovy and PHP APIs. Sounds familiar?

http://phenotypefoundation.org

Page 14

share

reuse

specialize

Page 15

Writing good software is hard.

Page 16

So far…

• TranSMART has a huge business potential. It’s no silver bullet though.

• Scientists sometimes have trouble re-using each other’s work. Especially when it comes to open source software.

Page 17

Do they?

Time to look at some success stories.

Page 18

R and Bioconductor

Who doesn’t love R?

Page 19

Website looks as if it dates from the Stone Age. Must be those LaTeX-loving physicists.

Page 20

Very active community, and…lots of packages.

Page 21

Governance of R community

Brian Ripley: “The R Project is governed by a self-perpetuating oligarchy, a group with a lot of power. R was principally developed for the benefit of the core team.”

As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-on-the-r-development-process.html

Page 22

Galaxy

Page 23

Galaxy is the most widely used open source bioinformatics web interface AFAIK.

Probably in no small part thanks to their continuous dedication to improving the UI.

But there’s something else.

Page 24

Galaxy Toolshed

Page 25

Plone

• An open source CMS (Content Management System) written in Python, nowadays backing thousands of production-grade websites

• Started by 2 developers in 2000, now an active open source project with hundreds of active developers

• In 2004, the Plone Foundation was formed to formalize IP and secure the future of Plone

• Plone Collective has hundreds of plugins

Page 26
Page 27

What do all these success stories have in common?

Bioconductor Packages
Galaxy Toolshed
Plone Collective
Drupal Modules

Page 28

Lessons for tranSMART

TranSMART needs a marketplace and a thriving community to survive.

To get to a functioning marketplace, we need a well-designed core.

Page 29

There is also another reason.

Page 30

TranSMART Contributions - Pharma

• Janssen
  – Initial version of tranSMART
  – Genomics viewer using IGV and GenePattern
  – Faceted Search interface (results browsing)

• Millennium
  – Loading TCGA and many GEO studies
  – R interface for interacting with data directly in R
  – Several R analyses available directly in GUI

Page 31

TranSMART Contributions - Pharma

• Sanofi
  – Cleaner user interface
  – Added metadata layer for all concepts
  – Study/Program categorization & file management

• Pfizer
  – GWAS upload (VCF), data storage and analysis
  – Enhanced data export capabilities

Page 32
Page 33

This is a mess.

Another reason why we need that core.

Page 34

Start the Core: I2B2 Refactoring

1. I2B2 was integrated with tranSMART, but the I2B2 API abstractions were leaked all over the place in the tranSMART application.

2. We agreed in the London meeting that all parties would set aside some time for working on the core.

3. Combined, it made sense to start working on the clinical data API, properly using the I2B2 API where possible, and re-implementing all I2B2 functionality in a new ‘core-db’ plugin.
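
As a rough illustration of point 3, the sketch below shows one way application code can depend on a small clinical data API instead of reaching into the storage layer directly. All names here (ClinicalDataApi, InMemoryCoreDb, Observation, mean_value and the concept paths) are hypothetical and are not taken from tranSMART or i2b2; the in-memory store merely stands in for a real ‘core-db’ style implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One clinical fact: a numeric value for a patient under a concept path."""
    patient_id: int
    concept_path: str
    value: float


class ClinicalDataApi(ABC):
    """Hypothetical API boundary; callers never see storage details."""

    @abstractmethod
    def observations(self, concept_path: str) -> list[Observation]:
        """Return all observations at or below the given concept path."""


class InMemoryCoreDb(ClinicalDataApi):
    """Stand-in for a 'core-db' style implementation (an assumption)."""

    def __init__(self, rows: list[Observation]) -> None:
        self._rows = list(rows)

    def observations(self, concept_path: str) -> list[Observation]:
        # Prefix match: the concept itself plus everything beneath it.
        return [r for r in self._rows if r.concept_path.startswith(concept_path)]


def mean_value(api: ClinicalDataApi, concept_path: str) -> float:
    """Application code written only against the API boundary."""
    values = [o.value for o in api.observations(concept_path)]
    if not values:
        raise ValueError(f"no observations under {concept_path!r}")
    return sum(values) / len(values)


# Made-up example data, for illustration only.
db = InMemoryCoreDb([
    Observation(1, "\\Study A\\Vitals\\Heart Rate\\", 60.0),
    Observation(2, "\\Study A\\Vitals\\Heart Rate\\", 80.0),
    Observation(1, "\\Study A\\Labs\\Glucose\\", 5.5),
])
```

Because mean_value only knows about ClinicalDataApi, the backing implementation can be swapped or tested in isolation without touching the calling code.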

Page 35

The first version of core-integration was completed in mid-April.

By then, all webservice calls to what was formerly an outdated version of the I2B2 Ontology and CRC cells were handled by the newly implemented core-db plugin.

Also, a set of tests was written in the process and API documentation was generated.

Page 36

In the long run, I believe forming a good distributed working group on the core API is a more important deliverable of this workshop than crunching out a stable 1.1 version.

That’s how we write that history.

Page 37
Page 38

Kees van Bochove - The Hyve

Current tranSMART Architecture

Page 39

TranSMART’s Strong Points

• Powerful, ready to go user interface for common analyses (survival analysis, gene expression heatmaps etc.)

• Leverages i2b2 data model for clinical data and offers unified view over different studies

• Uses a lot of good open source technology under the hood (Grails, R, SOLR, Pentaho) leveraging existing community developments

Page 40

TranSMART Building Blocks

• R: open source statistics package with CRAN, an active repository in which many algorithms and statistical packages are published

• Grails: a rapid application development framework in Groovy leveraging Java technology such as Hibernate, Spring, Quartz

• i2b2: domain-specific open source package for storing and querying clinical data

• GenePattern, maybe soon: Galaxy, KNIME?

Page 41

TranSMART’s Weaknesses

• Large monolithic codebase with little modularization beyond the standard Grails MVC setup

• Code quality is problematic, especially JavaScript

• Test coverage is low: no functional / web tests and few unit and integration tests

• No clear internal APIs, only a service level that does the plumbing

• i2b2 integration violates i2b2 abstractions

Page 42

tranSMART Plans

• Use a clearly modularized architecture with separation of clinical, high dimensional, search and metadata storage; workflow execution engines and knowledge repository

• Define clear API and rewrite current implementations with good test coverage

• Use i2b2 data model, re-harmonize with latest i2b2 APIs, and don’t use i2b2 binaries directly

• Separate analysis definitions and abstract from workflow execution engine

http://prezi.com/t6twshyctdsk/transmart-core-refactoring
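
The last plan item above can be sketched roughly as follows: an analysis is described as plain data, and any engine that implements a small interface can run it. The names (AnalysisDefinition, ExecutionEngine, LocalEngine) and the two toy analyses are hypothetical illustrations, not part of tranSMART; a real engine might hand the same definition to R or to another workflow system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Protocol


@dataclass(frozen=True)
class AnalysisDefinition:
    """Declarative description of an analysis; contains no execution logic."""
    name: str
    description: str = ""


class ExecutionEngine(Protocol):
    """Anything that can run an analysis definition on numeric input."""

    def run(self, definition: AnalysisDefinition, data: list[float]) -> float:
        ...


class LocalEngine:
    """Toy in-process engine (an assumption, for illustration only)."""

    def __init__(self) -> None:
        # Map analysis names to the functions that implement them.
        self._functions: dict[str, Callable[[list[float]], float]] = {
            "mean": mean,
            "max": max,
        }

    def run(self, definition: AnalysisDefinition, data: list[float]) -> float:
        try:
            function = self._functions[definition.name]
        except KeyError:
            raise ValueError(f"unsupported analysis: {definition.name}") from None
        return function(data)


def execute(engine: ExecutionEngine, definition: AnalysisDefinition,
            data: list[float]) -> float:
    """Caller code depends only on the ExecutionEngine interface."""
    return engine.run(definition, data)
```

With this split, the same AnalysisDefinition could be passed to a different engine without changing the definition or the calling code.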

Page 43

Kees van Bochove - The Hyve

Target tranSMART Architecture

Page 44

Further reading

• Description of core API efforts: http://thehyve.nl/rewiring-transmart

• In depth description of i2b2 refactoring: http://thehyve.nl/inital-work-on-transmarts-core

• Overview of tranSMART Core API so far: http://thehyve.github.io/transmart-core-api/

• Example of continuous integration test suite (of core-db): https://ci.ctmmtrait.nl/browse/TM-COREDB-JOB1-51/test