An exemplar for data integration in the biomedical domain driven by the ISA framework Shannan Ho Sui AMIA, March 19, 2013 http://stemcellcommons.org

Page 1: An exemplar for data integration in the biomedical domain ...gehlenborg.com/wp-content/uploads/AMIA2013.pdf · Disparate Stem Cell Resources. Disparate Stem Cell Resources • Inconsistent

An exemplar for data integration in the biomedical domain driven by the ISA framework

Shannan Ho Sui

AMIA, March 19, 2013

http://stemcellcommons.org

This is a story about collaboration...

ISA

Disparate Stem Cell Resources

• Inconsistent data formats, experimental descriptions and results

The Stem Cell Commons

• A shared data and analytical resource

• Bioinformatics support for research at the HSCI

• A community

Data repository

Analysis system

Support/consults

Susanna-Assunta Sansone, isacommons.org

user community

General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats used by public repositories, e.g.

• MAGE-Tab

• SRA-xml

• SOFT

• Pride-xml
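
As a rough illustration of how a tab-delimited, ISA-Tab-style study table can be read by a program, the sketch below parses a small hand-written table with Python's standard csv module. The sample rows, the characteristic values, and the helper names `read_study_table` and `characteristics` are assumptions made for this example; they are not taken from the talk or from the Stem Cell Commons.

```python
import csv
import io

# Hypothetical, hand-written study table in the ISA-Tab style:
# one row per sample, tab-separated, with bracketed characteristic columns.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[cell type]\tSample Name\n"
    "donor1\tHomo sapiens\thematopoietic stem cell\tsample1\n"
    "donor2\tMus musculus\tembryonic stem cell\tsample2\n"
)


def read_study_table(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def characteristics(row):
    """Collect the 'Characteristics[...]' columns of one row into a plain dict."""
    prefix = "Characteristics["
    return {
        key[len(prefix):-1]: value
        for key, value in row.items()
        if key.startswith(prefix) and key.endswith("]")
    }


rows = read_study_table(STUDY_TABLE)
samples = {row["Sample Name"]: characteristics(row) for row in rows}

for name, attrs in samples.items():
    print(name, attrs)
```

Keeping the annotation in explicit, named columns is what lets a generic reader like this one recover sample attributes without knowing anything about the experiment in advance.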

Rationale for developing ISA

Capture all salient features of the experimental workflow

Make annotation explicit and discoverable

Support data provenance tracking

Use community standards

Susanna-Assunta Sansone, isacommons.org

Manual merging process

[Diagram labels: ISA; 53 studies, 1098 assays; 87 studies, 1179 assays; Curator; 148 studies, 2356 assays]

Conversion driven by ISA-Tab

[Diagram labels: ISA; 53 studies, 1098 assays; 87 studies, 1179 assays; ISA-Tab; 148 studies, 2356 assays]

Data uploads and annotation

Current Data Statistics

Filtering data using metadata as search facets
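
The idea of using metadata fields as search facets can be sketched in a few lines. This is a minimal illustration only: the sample records, the field names, and the functions `facet_counts` and `filter_by_facets` are hypothetical, and the slides do not describe how the repository implements its search.

```python
from collections import Counter

# Hypothetical sample records; each metadata field can serve as a facet.
SAMPLES = [
    {"name": "s1", "organism": "Homo sapiens", "technology": "microarray"},
    {"name": "s2", "organism": "Mus musculus", "technology": "microarray"},
    {"name": "s3", "organism": "Homo sapiens", "technology": "sequencing"},
]


def facet_counts(records, field):
    """Count how many records carry each value of one metadata field."""
    return Counter(r[field] for r in records if field in r)


def filter_by_facets(records, selected):
    """Keep records matching every selected facet.

    `selected` maps a field name to a set of accepted values: values within
    one field are alternatives (OR), different fields must all match (AND).
    """
    return [
        r for r in records
        if all(r.get(field) in values for field, values in selected.items())
    ]


hits = filter_by_facets(SAMPLES, {"organism": {"Homo sapiens"}})
counts = facet_counts(hits, "technology")

print([r["name"] for r in hits])
print(dict(counts))
```

Recomputing the counts on the filtered records is what lets an interface show how many results each remaining facet value would return.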

Experiment description

Experimental protocols and data downloads

ISA-Tab metadata downloads and export

Linking data to the Galaxy workflow engine

Refinery: An analysis and visualization framework

In development

Viewing and selecting samples in list view

Viewing and selecting samples in matrix view

Initiating workflows

Monitoring progress

Integration with the IGV genome browser

Challenges

• Changing research culture(s) to recognize the value of data sharing

• Manually curating the data for consistency and completeness

• Managing large volumes of data

• Standardizing workflows

• Ensuring interoperability when integrating multiple systems and tools

• Technical complexity of software development effort

Refinery

Psalm Haseley, Nils Gehlenborg, Richard Park, Ilya Sytchev, Peter Park, Shannan Ho Sui

ISA Commons

Philippe Rocca-Serra

Eamonn Maguire

Susanna Sansone

Oxford e-Research Centre

A growing community that uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of datasets.

WikiPathways

Meet the Team

Center for Stem Cell Bioinformatics

Winston Hide, Program Leader

Shannan Ho Sui, Analytics

Oliver Hofmann, Core services

Ilya Sytchev, Bioinformatics Developer

John Hutchinson, HSCI Analyst

Sudeshna Das, Repository

Stéphane Corlosquet, Bioinformatics Engineer

Emily Merrill, Bioinformatics Analyst

Collaborators

• Nils Gehlenborg

• Richard Park

• Psalm Haseley

• Peter Park

• Eamonn Maguire

• Philippe Rocca-Serra

• Susanna Sansone