transmart isa-june2012


Page 1: TranSMART ISA-june2012

Managing Experimental Metadata using ISA data structures

TranSMART-ISA Teleconference, June 19th, 2012

Philippe Rocca-Serra Ph.D

on behalf of the ISA Team, University of Oxford

http://www.isa-tools.org; http://github.com/ISA-tools

http://isacommons.org/

[email protected]

Tuesday, 19 June 2012

Page 2: TranSMART ISA-june2012

Why ISA format and Tools?

• Capture all salient features of the experimental workflow
• Make annotation explicit and discoverable
• Structure the descriptions for consistency, tracking independent variables and dependent variables
• Use cross references and resolvable identifiers

Page 3: TranSMART ISA-june2012

Why ISA format and Tools?

– Supporting data provenance tracking
– Node/Edge underlying concept
– Tabular as a compromise: a presentation layer inspired by Object model (FuGE, MAGE-OM)
– A Generic representation, applied to:
  • microarray based experiments (MAGE)
  • sequencing based experiments (SRA)
  • flow cytometry based experiments (FuGE-Flow Cyt)
  • mass spectrometry and NMR spectroscopy experiments

Page 4: TranSMART ISA-june2012


Why ISA format and Tools?

investigation: high level concept to link related studies

study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.

assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)

data: external files in native or other formats, referenced through pointers to data file names/location

[Figure: an investigation links studies, each study has assay(s), and each assay points to data. Example values shown: sources H1 and H2 (H. Sapiens; ages 35 and 33 years); samples H1.sample1, H1.sample2, H2.sample1; a Labeling step; labeled extracts H1.sample1.labeled, H2.sample1.labeled; data files h1-s1.cel, h1-s2.cel, h2-s1.cel. Related formats shown: MAGE-Tab, Pride-xml, SRA-xml.]

ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas

Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG and the Toxbank Consortium)
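The investigation, study and assay levels on this slide can be sketched as nested records. This is a minimal Python illustration, not the ISA tools' own object model: the class and field names, the study identifier and the measurement label are assumptions, while H1, H2 and the .cel file names come from the slide's example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test performed on material from the subject; points to data files."""
    measurement: str  # hypothetical label, not taken from the slide
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: subjects under study and the associated assays."""
    identifier: str
    sources: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        # Walk investigation -> study -> assay -> data file pointers.
        return [f for s in self.studies for a in s.assays for f in a.data_files]


assay = Assay("transcription profiling", ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])
study = Study("study-1", sources=["H1", "H2"], assays=[assay])
investigation = Investigation("example investigation", [study])
print(investigation.all_data_files())
```

The data files stay external; the records only hold pointers to them, as the slide describes.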


Page 5: TranSMART ISA-june2012

ISA syntax and Table definition

• Material Transformations:
  – Inputs and Outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).

[Diagram: Material Node → Protocol REF → Material Node]

• Each Material Node carries: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
• Each Protocol REF carries: Parameter Value[…], Date (day effect), Performer (operator effect)
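The node/edge reading of the table columns can be sketched as follows. This is an illustrative Python example rather than the ISA parser: it assumes that every attribute column qualifies the nearest node or Protocol REF to its left, and the bracketed names (organism, age, amount, dose) are made-up examples.

```python
import re

MATERIAL_NODES = {"Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"}
# Bracketed attribute columns, e.g. Characteristics[organism].
BRACKETED = re.compile(r"^(Characteristics|Factor Value|Parameter Value|Comment)\[.*\]$")
PLAIN_ATTRIBUTES = {"Material Type", "Date", "Performer"}


def classify(header: str) -> str:
    """Label a column header as a material node, a protocol, or an attribute."""
    if header in MATERIAL_NODES:
        return "material node"
    if header == "Protocol REF":
        return "protocol"
    if BRACKETED.match(header) or header in PLAIN_ATTRIBUTES:
        return "attribute"
    return "unknown"


def group_columns(headers):
    """Attach each attribute column to the node or protocol that precedes it."""
    groups = []
    for h in headers:
        kind = classify(h)
        if kind in ("material node", "protocol"):
            groups.append({"name": h, "kind": kind, "attributes": []})
        elif kind == "attribute" and groups:
            groups[-1]["attributes"].append(h)
        else:
            raise ValueError(f"unexpected column: {h!r}")
    return groups


headers = [
    "Source Name", "Characteristics[organism]", "Characteristics[age]",
    "Protocol REF", "Parameter Value[amount]", "Date", "Performer",
    "Sample Name", "Factor Value[dose]",
]
groups = group_columns(headers)
for g in groups:
    print(g["kind"], "|", g["name"], "|", g["attributes"])
```

Reading left to right, the groups alternate node, protocol, node, which is the Material Node → Protocol REF → Material Node pattern of the slide.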


Page 6: TranSMART ISA-june2012

ISA syntax and Table definition

• Data Acquisition & Data Transformations:
  – Inputs are Materials or Data and Outputs are Data Nodes (Raw Data File, Derived Data File, Derived Array Data Matrix File)

[Diagram: Material Node → Protocol REF → Data File Node]

• Material Node: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
• Protocol REF: Parameter Value[…], Date (day effect), Performer (operator effect)
• Data File Node: Comment[…]
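Because each row runs from a material node through protocols to a data node, provenance for a data file can be read straight off its row. The sketch below uses Python's standard csv module on a small tab-delimited table; the values echo the H1/H2 example from slide 4, but the column layout is a simplified assumption and not a complete ISA-Tab assay file.

```python
import csv
import io

# Simplified, assumed layout: one row per path from source to raw data file.
TABLE = (
    "Source Name\tSample Name\tProtocol REF\tLabeled Extract Name\tRaw Data File\n"
    "H1\tH1.sample1\tLabeling\tH1.sample1.labeled\th1-s1.cel\n"
    "H2\tH2.sample1\tLabeling\tH2.sample1.labeled\th2-s1.cel\n"
)


def provenance(table_text: str, data_file: str):
    """Return the (column, value) chain of the row that produced data_file."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    headers = next(reader)
    for row in reader:
        pairs = list(zip(headers, row))
        if ("Raw Data File", data_file) in pairs:
            return pairs
    raise KeyError(data_file)


chain = provenance(TABLE, "h2-s1.cel")
print(" -> ".join(value for _, value in chain))
```

A plain csv.reader is used instead of DictReader because real tables repeat headers such as Protocol REF, which a dictionary keyed by header would collapse.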


Page 7: TranSMART ISA-june2012

Who uses ISA format and Tools?

A growing ecosystem of over 30 public and internal resources uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular signatures

[Logo panels: "Some of the public groups/resources" and "Some of the internal projects"; one logo reads "Nanotechnology Informatics Working Group"]

Page 8: TranSMART ISA-june2012

Towards interoperable bioscience data

Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.

Feb 2012. doi:10.1038/ng.1054

www.biosharing.org; www.isacommons.org

[Figure: Development timeline, 2007 to 2012, with tracks for "Community involvement and uptake", "Core developments" and "Publications". Milestones shown: Strawman ISA-Tab spec; 1st, 2nd and 3rd ISA-Tab workshops; Final ISA-Tab spec; ISA software v1; Database instance at EBI; 1st public instance: Harvard Stem Cell Discovery Engine; Conversions to Pride-XML/SRA-XML/MAGE-Tab and more; RDF format starts; Links to analysis tools starts; User workshops/visits start; Other tools implement ISA-Tab; Growing number of systems starts to adopt ISA-Tab. Publications shown: Workshop reports; 'Omics data sharing (Science); ISA-Tab and ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA Commons (Nature Genetics).]

Page 9: TranSMART ISA-june2012

The ISA tools... modular with a suite of supporting tools

• Configure: Curator creates template
• Create (isacreator): Experimentalist uses editor to report investigation.
• Validate: Check adherence to template. Requires Configuration XML.
• Convert to ISA (converter): Convert from MAGE-Tab to ISA-Tab. More formats coming soon...
• Load: Curator stores metadata in database using BII data management tool
• Browse: Users browse investigations, query and view experimental metadata, and access associated data files
• Convert from ISA (converter): Convert to MAGE-TAB, PRIDE-XML, SRA-XML for submission to international public repositories
• Analyze: Perform analysis of data in context with the metadata using the Galaxy or R analysis engines.
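The Validate step checks a table against a template. A minimal Python sketch of that idea follows, assuming the template has already been reduced to a list of required column headers. The function name `check_adherence` and the header names are illustrative only; the slide notes that configuration XML is required, which this sketch does not parse.

```python
def check_adherence(required_fields, table_headers):
    """Return the template fields missing from a table, in template order."""
    present = {h.strip().lower() for h in table_headers}
    return [f for f in required_fields if f.lower() not in present]


# Hypothetical template and table headers, for illustration only.
template = ["Sample Name", "Protocol REF", "Extract Name", "Label"]
headers = ["Sample Name", "Protocol REF", "Extract Name"]

missing = check_adherence(template, headers)
print(missing or "table adheres to template")
```

Returning the missing fields, rather than a bare pass/fail flag, gives a curator something concrete to fix before loading.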


Page 10: TranSMART ISA-june2012

Create configuration xml files


Page 11: TranSMART ISA-june2012

The ISAconfigurator...


Page 12: TranSMART ISA-june2012

The ISAconfigurator...


Page 13: TranSMART ISA-june2012

Use of the configuration xml

In technical terms, the configuration XML schema (XSD) is consumed by an XMLBeans goal in Maven, which generates Java stubs; these are then used to load the XML files into memory.

The configuration is also used to define the form view using a similar mechanism...

<xml><field>sample</field><field>protocol ref</field><field>extract name</field><field>label</field>...</xml>

[Diagram: XML definition(s) → Java Object (TableReferenceObject) → spreadsheet]

1. Import the XML definition(s) into the Java Object Model using classes created by XMLBeans.
2. Construct the spreadsheet model: columns, rows, etc.
3. Assign cell editors: ontology terms are given the ontology selection tool as a cell editor, file fields are given a file chooser, etc.
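The load, build and assign-editors sequence can be sketched in a few lines. This is a Python stand-in using the standard xml.etree.ElementTree module, not the Java/XMLBeans code the slide describes. The `type` attribute on each field is an assumption added for illustration, since the slide's XML snippet shows only field names, and the "plain text editor" fallback is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Modelled on the slide's <xml><field>...</field></xml> snippet; the "type"
# attribute is hypothetical and is used here only to pick a cell editor.
CONFIG = """
<xml>
  <field type="text">sample</field>
  <field type="text">protocol ref</field>
  <field type="ontology">label</field>
  <field type="file">raw data file</field>
</xml>
"""

# Editors named on the slide: ontology selection tool and file chooser.
EDITORS = {"ontology": "ontology selection tool", "file": "file chooser"}


def build_columns(config_xml: str):
    """Load the definition, then build one column per field with its editor."""
    root = ET.fromstring(config_xml.strip())
    columns = []
    for field in root.iter("field"):
        kind = field.get("type", "text")
        columns.append({
            "header": field.text.strip(),
            "editor": EDITORS.get(kind, "plain text editor"),
        })
    return columns


columns = build_columns(CONFIG)
for column in columns:
    print(column["header"], "->", column["editor"])
```

Keeping the editor choice in a lookup table mirrors the slide's point that the configuration, not the code, decides how each column is edited.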


Page 14: TranSMART ISA-june2012

isacreator: Create & Edit ISA-Tab

Page 15: TranSMART ISA-june2012

Data Reporting Scenarios

1. Starting from scratch: spreadsheet function
2. Mapping from 3rd-party tab data: mapping/ETL tool
3. Templating based on study design information: wizard (*)

(*) "early intervention is best"

Page 16: TranSMART ISA-june2012

The ISAcreator...

Developed to be a user friendly way to enter standards-compliant metadata: it has lots of features...

But these are just some of them... we also have a data entry wizard and an import utility...

Page 17: TranSMART ISA-june2012

Ontologies in ISAcreator

We use the NCBO BioPortal and the EBI’s OLS to search and browse ontologies.

Ontology Resource Manager: the resource manager provides seamless searching of ontology resources, regardless of their origins, their underlying data schema or the mechanism (REST, SOAP or local file store) through which they are accessed.

[Diagram: NCBO BioPortal, Ontology Lookup Service (OLS) and Plugin are the sources; Search, Hierarchy and Annotator services are shown alongside them; the features supported are ontology browsing & searching, ontology tagging and ontology field restriction.]

ISAcreator manages ontology metadata such as version information as well as individual term accessions, source, URI and so forth.

Ontology search code is usable outside of ISAcreator. In fact, the ISAconfigurator imports ISAcreator as a Maven dependency and reuses its components to do ontology restriction... plugins can also make use of our ontology search and browse functionalities.
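The resource manager idea, one search call over sources that differ in origin and access mechanism, can be sketched with a small shared interface. The Python below is an illustration only: the class names, the in-memory stand-in sources and the accessions are all made up, and no real BioPortal or OLS call is involved.

```python
from typing import Dict, List, Protocol


class OntologySource(Protocol):
    """Anything that can be searched by term, whatever its access mechanism."""
    name: str

    def search(self, term: str) -> List[Dict[str, str]]: ...


class InMemorySource:
    """Stand-in for a REST, SOAP or local-file ontology source."""

    def __init__(self, name: str, terms: Dict[str, str]):
        self.name = name
        self._terms = terms  # accession -> label

    def search(self, term: str) -> List[Dict[str, str]]:
        needle = term.lower()
        return [
            {"accession": acc, "label": label, "source": self.name}
            for acc, label in self._terms.items()
            if needle in label.lower()
        ]


class ResourceManager:
    """Fans one query out to every source and merges the hits."""

    def __init__(self, sources: List[OntologySource]):
        self._sources = sources

    def search(self, term: str) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for source in self._sources:
            results.extend(source.search(term))
        return results


# Hypothetical sources and accessions, for illustration only.
manager = ResourceManager([
    InMemorySource("source-a", {"A:0001": "liver"}),
    InMemorySource("source-b", {"B:0042": "liver tissue", "B:0043": "kidney"}),
])
hits = manager.search("liver")
for hit in hits:
    print(hit["source"], hit["accession"], hit["label"])
```

Each hit keeps its source alongside the accession, in line with the slide's note that term source and accession are recorded with every annotation.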


Page 18: TranSMART ISA-june2012

Plugins in ISAcreator

• Plugins can be developed for 3 different purposes:
  – Search (adds extra search space for ontology tool)
  – Custom cell editors (for spreadsheet)
  – Extra general functionality (which appears in a plugin menu)

In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.

• 2 Examples of ISA plugins:
  – Access to local metadata stores: Novartis Plugin to Ontology Widget
  – Annotation of findings: Metabolite Identification Plugin (Metabolights Repository contribution to ISA project).
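The three plugin purposes can be pictured as three extension points in a registry. The sketch below is plain Python and deliberately not OSGi or Apache Felix: the registry class, the kind names and the toy search plugin are assumptions made to illustrate the idea of typed extension points.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# One kind per purpose listed on the slide; the identifiers are hypothetical.
PLUGIN_KINDS = ("search", "cell_editor", "menu")


class PluginRegistry:
    """Keeps registered plugins grouped by the extension point they serve."""

    def __init__(self) -> None:
        self._plugins: Dict[str, List[Callable]] = defaultdict(list)

    def register(self, kind: str, plugin: Callable) -> None:
        if kind not in PLUGIN_KINDS:
            raise ValueError(f"unknown plugin kind: {kind!r}")
        self._plugins[kind].append(plugin)

    def plugins(self, kind: str) -> List[Callable]:
        return list(self._plugins[kind])


registry = PluginRegistry()
# A toy search plugin that adds an extra search space to the ontology tool.
registry.register("search", lambda term: [f"local-store:{term}"])

results = [hit for plugin in registry.plugins("search") for hit in plugin("aspirin")]
print(results)
```

In the real tool, discovery and lifecycle are handled by the OSGi framework; the registry here only shows how the host can ask for all plugins of one kind.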


Page 19: TranSMART ISA-june2012

Plugins... example: Novartis Metastore Search

Search function on the Novartis Metastore: integrates search results from the metastore into the Ontology search tool.

So, with the Novartis plugin in your Plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording term source, etc.

Page 20: TranSMART ISA-june2012

ISAcreator - Metabolite Identification plugin

Credits: Kenneth Haug (Metabolights)

Page 21: TranSMART ISA-june2012

Summary

• All Open Source, Open Access Project (https://github.com/ISA-tools)
• OSGi Plugin Architecture: Apache Felix
• Ontology Support: Select, Browse, Tag from public or private metadata stores
• Annotation of Molecular findings: Metabolite Identification Plugin for ISAcreator
• Several libraries (Java, Python, Perl, R) for parsing ISA files
• Integration with R: R-ISATAB package

Page 22: TranSMART ISA-june2012

Summary: TransMART - ISA

• ISA Study maps to TranSMART:
  – Samples and Timepoint
  – Study Groups
  – Subject Demographics
• ISA assays map to TranSMART Biomarkers
• ISA already has configurations supporting OMICS data:
  – microarray
  – NGS: RNA-Seq, ChIP-Seq, MeDIP-Seq
  – microbial diversity
  – protein/metabolite profiling using Mass spectrometry

Page 23: TranSMART ISA-june2012

Why integrate ISA with tranSMART?

• Susie Stephens (J&J): "A use case: someone was viewing results of analyses in TranSMART, and then wanted to go back to the raw or processed data and the experimental information in the ISA system. Or where results make a scientist curious to know whether a different/similar data set exists"

• Michael R. Barnes (Director of Bioinformatics, Queen Mary University of London): "We are now quite bought in to TranSMART as we will be running it for a large funded MRC collaboration. The benefit of interoperability between TranSMART and ISA tools would be self evident. The fewer different standards used in a workflow the better, although TranSMART might be able to integrate diverse data sources, if the sources don't all contain the same fields then combined analysis is reduced to the common denominator fields between data sets. ISA-Tab could be a 'standard of choice' for TranSMART, although it could not be an exclusive standard."


Page 24: TranSMART ISA-june2012

Preparing for Linked Open Data

✴ ISA2RDF (Toxbank collaboration): a contribution to an ecosystem of software tools supporting the ISA syntax

✴ reliance on internet-resolvable identifiers

✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)

✴ TODO: specify comparator groups, analysis methods, and the resulting measurements and statistical measures
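To make the RDF direction concrete, one path from source to data file can be written out as N-Triples lines. This is a hedged Python sketch and not ISA2RDF itself: the `http://example.org/isa/` namespace and the `derivesFrom` and `fileName` predicates are placeholders, since the slides do not show the vocabulary ISA2RDF uses. The identifiers H1, H1.sample1 and h1-s1.cel come from the slide 4 example.

```python
BASE = "http://example.org/isa/"  # placeholder namespace, not a real ISA vocabulary


def iri(local: str) -> str:
    """Wrap a local name as an absolute, resolvable-style IRI."""
    return f"<{BASE}{local}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(source: str, sample: str, data_file: str):
    """Emit one N-Triples line per link in source -> sample -> data file."""
    return [
        f"{iri(sample)} {iri('derivesFrom')} {iri(source)} .",
        f"{iri(data_file)} {iri('derivesFrom')} {iri(sample)} .",
        f"{iri(data_file)} {iri('fileName')} {literal(data_file)} .",
    ]


triples = row_to_ntriples("H1", "H1.sample1", "h1-s1.cel")
print("\n".join(triples))
```

Every node gets an IRI rather than a bare string, which is the point of the "internet-resolvable identifiers" bullet: other datasets can then link to the same sample or file.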



Page 27: TranSMART ISA-june2012


Our next steps...as a community

[Figure: workflow comparison across datasets. Each dataset is drawn as a chain built from the step labels SAMP, EX, LABEL, TRANS, HYB and SCAN, for tissues including liver, kidney, blood serum and blood plasma, with a "low dose aspirin" treatment, "x5" labels and an Analysis stage. A full chain is annotated "well described process from sample to data file."; a short chain (SAMP, TRANS, SCAN) is annotated "missing protocols and no information about what was being measured."]

Making visual comparisons is straightforward using this approach. The longest path is constructed based on all other known datasets in the pool of workflows being compared.

RDF export & Visualisation; Further adoption
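The workflow comparison on this slide can be approximated in a few lines: take a reference path from the pool, then mark which of its steps each workflow has. In this Python sketch the reference is simply the longest workflow in the pool, a simplification of the slide's "constructed" longest path, and the step order (SAMP, EX, LABEL, TRANS, HYB, SCAN) is an assumption, since only the labels survive in the slide text.

```python
def reference_path(workflows):
    """Pick the longest workflow in the pool as the reference path."""
    return max(workflows.values(), key=len)


def align(workflow, reference):
    """Mark each reference step as present (True) or missing (False)."""
    return [(step, step in workflow) for step in reference]


# Assumed step order; dataset names are illustrative.
pool = {
    "well described": ["SAMP", "EX", "LABEL", "TRANS", "HYB", "SCAN"],
    "sparse": ["SAMP", "TRANS", "SCAN"],
}

reference = reference_path(pool)
for name, steps in pool.items():
    cells = [step if present else "-" * len(step)
             for step, present in align(steps, reference)]
    print(f"{name:>14}: {' '.join(cells)}")
```

Printing the aligned rows one under the other makes the gaps in the sparse workflow visible at a glance, which is the effect the slide's diagram aims for.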


Page 28: TranSMART ISA-june2012


Questions?

You can email [email protected]

View our blog: http://isatools.wordpress.com

Follow us on Twitter: @isatools

View our website: http://www.isa-tools.org

View our Git repo & contribute: http://github.com/ISA-tools

Thanks for listening...