
An exemplar for data integration in the biomedical domain driven by the ISA framework

Shannan Ho Sui

AMIA, March 19, 2013

http://stemcellcommons.org

This is a story about collaboration...


Disparate Stem Cell Resources


• Inconsistent data formats, experimental descriptions and results

The Stem Cell Commons

• A shared data and analytical resource

• Bioinformatics support for research at the HSCI

• A community

Data repository

Analysis system

Support/consults

Susanna-Assunta Sansone, isacommons.org

user community

General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats used by public repositories, e.g.

• MAGE-Tab
• SRA-xml
• SOFT
• Pride-xml
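As a rough illustration of what a tab-delimited, configurable metadata format looks like in practice, the sketch below parses a small study-level table written in the ISA-Tab column style ("Source Name", "Characteristics[...]", "Protocol REF", "Sample Name"). The table contents and the helper names read_study_table and characteristics are assumptions made up for this example; it uses only the Python standard library and is not the ISA tools' own parser.

```python
import csv
import io

# Hypothetical study-level table in the tab-delimited ISA-Tab style.
# All sample values below are invented for illustration only.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[cell type]"
    "\tProtocol REF\tSample Name\n"
    "donor1\tHomo sapiens\thematopoietic stem cell\tcell isolation\tsample1\n"
    "donor2\tMus musculus\tembryonic stem cell\tcell isolation\tsample2\n"
)


def read_study_table(text):
    """Parse a tab-delimited table into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def characteristics(row):
    """Collect the 'Characteristics[...]' columns of one row as {name: value}."""
    prefix, suffix = "Characteristics[", "]"
    return {
        key[len(prefix):-len(suffix)]: value
        for key, value in row.items()
        if key.startswith(prefix) and key.endswith(suffix)
    }


rows = read_study_table(STUDY_TABLE)
for row in rows:
    # Print each sample name with its explicit, machine-readable annotations.
    print(row["Sample Name"], characteristics(row))
```

Because every annotation lives in a named column, the same rows can be re-serialized into another repository format by mapping column headers, which is the kind of conversion the slide refers to.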

Rationale for developing ISA

Capture all salient features of the experimental workflow

Make annotation explicit and discoverable

Support data provenance tracking

Use community standards


Manual merging process

[Figure: 53 studies (1098 assays) and 87 studies (1179 assays) → Curator → 148 studies (2356 assays)]

Conversion driven by ISA-Tab

[Figure: 53 studies (1098 assays) and 87 studies (1179 assays) → ISA-Tab → 148 studies (2356 assays)]

Data uploads and annotation

Current Data Statistics

Filtering data using metadata as search facets
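One way to picture metadata-driven facets is the minimal sketch below: each sample carries a few annotation fields, facet counts are tallied per field, and a selection keeps only the samples that match every chosen facet. The sample records, field names and the functions facet_counts and filter_by_facets are hypothetical and are not taken from the Stem Cell Commons code base.

```python
from collections import Counter

# Hypothetical sample annotations; field names and values are invented.
SAMPLES = [
    {"organism": "Homo sapiens", "cell type": "hematopoietic stem cell",
     "technology": "microarray"},
    {"organism": "Mus musculus", "cell type": "embryonic stem cell",
     "technology": "microarray"},
    {"organism": "Mus musculus", "cell type": "hematopoietic stem cell",
     "technology": "sequencing"},
]


def facet_counts(samples, field):
    """Count how many samples carry each value of one metadata field."""
    return Counter(s[field] for s in samples if field in s)


def filter_by_facets(samples, selected):
    """Keep samples matching every selected facet.

    `selected` maps a field name to a set of accepted values: values within
    one facet are OR-ed, different facets are AND-ed.
    """
    return [
        s for s in samples
        if all(s.get(field) in values for field, values in selected.items())
    ]


hits = filter_by_facets(SAMPLES, {"organism": {"Mus musculus"}})
counts = facet_counts(hits, "technology")
print(len(hits), dict(counts))
```

Recomputing the counts on the filtered set, as in the last lines, is what lets a search interface show how many results remain under each remaining facet value.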

Experiment description

Experimental protocols and data downloads

ISA-Tab metadata downloads and export

Linking data to the Galaxy workflow engine

Refinery: An analysis and visualization framework

In development

Viewing and selecting samples in list view

Viewing and selecting samples in matrix view

Initiating workflows

Monitoring progress

Integration with the IGV genome browser

Challenges

• Changing research culture(s) to recognize the value of data sharing

• Manually curating the data for consistency and completeness

• Managing large volumes of data

• Standardizing workflows

• Ensuring interoperability when integrating multiple systems and tools

• Technical complexity of software development effort

Refinery

• Psalm Haseley
• Nils Gehlenborg
• Richard Park
• Ilya Sytchev
• Peter Park
• Shannan Ho Sui

ISA Commons

• Philippe Rocca-Serra
• Eamonn Maguire
• Susanna Sansone

Oxford e-Research Centre

A growing community that uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of datasets.

WikiPathways

Meet the Team

Center for Stem Cell Bioinformatics

• Winston Hide, Program Leader
• Shannan Ho Sui, Analytics
• Oliver Hofmann, Core services
• Ilya Sytchev, Bioinformatics Developer
• John Hutchinson, HSCI Analyst
• Sudeshna Das, Repository
• Stéphane Corlosquet, Bioinformatics Engineer
• Emily Merrill, Bioinformatics Analyst

Collaborators

• Nils Gehlenborg
• Richard Park
• Psalm Haseley
• Peter Park
• Eamonn Maguire
• Philippe Rocca-Serra
• Susanna Sansone
