eagle bioinformatics symposium: 4. philippe rocca-serra: don't forget the small data:...

Post on 27-Jan-2015

Category:

Healthcare

DESCRIPTION

Reporting experimental plans should not be an afterthought. Instrument output without relevant and accurate descriptors is of little benefit to the community. The ISA infrastructure is a suite of tools geared towards facilitating good dataset stewardship by providing the necessary means for data managers to annotate, curate, report and ultimately publish their scientific results. In this presentation, we will highlight key features, ongoing development and collaborations to demonstrate the value and flexibility of the resources.

TRANSCRIPT

Philippe Rocca-Serra Ph.D

University of Oxford e-Research Centre, UK

Don’t forget the “little data”: Context and Provenance are essential

philippe.rocca-serra@oerc.ox.ac.uk

Eagle’s 4th Symposium, UK, March 27th, 2014

Babraham Research Campus, Cambridge

Provenance

MAIN THEME: PROVENANCE

It is all about structuring experimental information to make it available to computers and software agents, to enable traceability, which:

• relates to the notion of planning, assessment and evaluation
• relates to the notion of accountability, reliability, trust and evidence
• relates to the notion of conservation, preservation, storage, archiving and mining

But let’s proceed gradually…

• Notes in lab books (information for humans)
• Spreadsheets and tables (the compromise)
• Facts as RDF statements (information for machines)

Contextual Data & Experimental Metadata?

• “Data about the data”
  – description of the data (descriptive metadata)

• How much metadata is needed?
  – CNL_MOA1_C2_LD_TP1_EWR.fastq.gz
  – the “it is all in the file name” approach

• Is this enough to understand what this experiment is about... 5 years from now?
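To make the limits of the “it is all in the file name” approach concrete, here is a minimal Python sketch that splits the example file name into its tokens. The function name and the empty `token_meaning` key are assumptions made for illustration: the slide does not say what any token stands for, and that is the point.

```python
from pathlib import PurePosixPath


def split_filename_tokens(filename: str) -> list[str]:
    """Split a data file name into its underscore-separated tokens."""
    name = PurePosixPath(filename).name
    # Drop every extension: ".fastq.gz" carries two suffixes.
    stem = name.split(".", 1)[0]
    return stem.split("_")


tokens = split_filename_tokens("CNL_MOA1_C2_LD_TP1_EWR.fastq.gz")

# Hypothetical key: only the original experimenter could fill this in.
# It is left empty to show what a reader recovers from the name alone.
token_meaning: dict[str, str] = {}

undocumented = [t for t in tokens if t not in token_meaning]
print(tokens)
print(f"{len(undocumented)} of {len(tokens)} tokens have no documented meaning")
```

Without an accompanying key, every token is opaque, which is the gap that explicit experimental metadata is meant to close.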

isacommons

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:

• Stem Cell Commons
• Nanotechnology Informatics Working Group

Users and publications: http://isacommons.org

ISA users

• Novartis, Jansen

• Carcinogenomics Project (EU-FP6 IP)

• Dixa project (EU-FP7 IP)

• ToxBank Project (FP7-HEALTH-2010-Alternative-Testing-Strategies-TAB format + ISA2RDF tool)

• Metabolights Repository (EMBL-EBI)

• ISA-TAB nano for nanoparticle characterisation (NCI caNano)

• Long-standing relationship with NCTR FDA (Little Rock)

• Scientific Data NPG

Why ISA format and Tools?

[Figure: investigation → study → assay(s) → data. The data are external files in native or other formats; the assay tables hold pointers to data file names/location.]

• investigation: high-level concept to link related studies
• study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
• assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
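The investigation / study / assay hierarchy can be sketched as nested data structures. This is only an illustration of the three levels described above, not the ISA tools' own object model; all class and field names are assumptions, and the `.cel` file names are borrowed from the slide's example.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    """A test on study material; it points to data files, it does not embed them."""
    measurement: str
    data_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: subjects, their characteristics, and associated assays."""
    identifier: str
    subjects: list[str] = field(default_factory=list)
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept linking related studies."""
    title: str
    studies: list[Study] = field(default_factory=list)

    def all_data_files(self) -> list[str]:
        """Collect every data file pointer reachable from this investigation."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical content, loosely following the example on the slide.
assay = Assay("transcription profiling", ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])
study = Study("study-1", subjects=["H1", "H2"], assays=[assay])
investigation = Investigation("example investigation", studies=[study])

print(investigation.all_data_files())
```

Keeping only file pointers at the assay level mirrors the idea that data stay in their native formats while the metadata describe and locate them.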

[Figure: worked example. Sources H1 (H. sapiens, 35 years) and H2 (H. sapiens, 33 years) yield samples H1.sample1, H1.sample2 and H2.sample1; a Labeling step produces H1.sample1.labeled and H2.sample1.labeled; the resulting data files are h1-s1.cel, h1-s2.cel and h2-s1.cel. Existing formats shown alongside: MAGE-Tab, Pride-xml, SRA-xml.]

ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas

Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG and the Toxbank Consortium)

Essentials about ISA syntax

• 3 types of files

• Investigation file: at most 1 (think executive summary)
  – Why? General study description
  – How? Methods / protocol declaration
  – How? Variable declarations (factors and response variables)
  – Who? Contact and affiliation information

• Study file: true table (think sorting, filtering)
  – What? Listing all biological materials collected over the study course

• Assay file: true table (think sorting, filtering)
  – Results! Listing all data files collected by a given assay
  – n files, as many as there are assay types declared
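Because study and assay files are true tables, they can be read, filtered and sorted with ordinary tabular tooling. The sketch below uses only the Python standard library on a small in-memory table; the column headers follow the ISA-Tab style, but the rows are illustrative values and the helper name `read_table` is an assumption.

```python
import csv
import io

# A tiny tab-delimited study table (illustrative values only).
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[age]\tSample Name\n"
    "H1\tH. sapiens\t35\tH1.sample1\n"
    "H1\tH. sapiens\t35\tH1.sample2\n"
    "H2\tH. sapiens\t33\tH2.sample1\n"
)


def read_table(text: str) -> list[dict[str, str]]:
    """Parse a tab-delimited table into one dict per row, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_table(STUDY_TABLE)

# Filtering: every sample derived from source H1.
h1_samples = [r["Sample Name"] for r in rows if r["Source Name"] == "H1"]

# Sorting: rows ordered by the age characteristic.
by_age = sorted(rows, key=lambda r: int(r["Characteristics[age]"]))

print(h1_samples)
print([r["Source Name"] for r in by_age])
```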

Features of ISA model

• generic constructs to describe inputs and outputs for processes (material processing or data processing)
  – overall, description of experimental workflow

• extensible:
  – allows support of new assays while reusing existing components
  – need for more semantic support for assay descriptions
    • resources such as OBI, BAO and SIO to define endpoints and techniques
    • gaps in semantics remain and need to be tackled
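Describing every step as a process with inputs and outputs is what makes provenance traceable: any data file can be followed back to its source. The sketch below assumes one input per process and an acyclic chain; the `Process` class, the `trace_back` helper and the "sample collection" and "scanning" step names are hypothetical (only "labeling" and the material names come from the slides).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """One workflow step: a named transformation from inputs to outputs."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


PROCESSES = [
    Process("sample collection", ("H1",), ("H1.sample1",)),
    Process("labeling", ("H1.sample1",), ("H1.sample1.labeled",)),
    Process("scanning", ("H1.sample1.labeled",), ("h1-s1.cel",)),
]


def trace_back(node: str, processes: list[Process]) -> list[tuple[str, str]]:
    """Walk from a node to its origin, returning (process name, input) steps."""
    lineage = []
    current = node
    while True:
        producer = next((p for p in processes if current in p.outputs), None)
        if producer is None:
            return lineage  # nothing produced this node: it is a source
        # Simplifying assumption: follow the first input only.
        current = producer.inputs[0]
        lineage.append((producer.name, current))


for step, material in trace_back("h1-s1.cel", PROCESSES):
    print(f"{step} <- {material}")
```

The same construct covers both material processing (collection, labeling) and data processing, since each is just another process node in the chain.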

ISA configurations

Available from: https://github.com/ISA-tools/Configuration-Files

• Assembling workflow archetypes
• Setting annotation requirements
  – for compliance with database schemas (SRA, MAGE, PRIDE)
  – for compliance with community-based requirements (MIAME, MIAPE, MIMS...)
• Guiding users
  – provide preassembled templates
  – specify vocabulary support

ISAconfigurator: supporting tool
https://github.com/ISA-tools/ISAconfigurator
http://isatab.sourceforge.net/assets/img/tools/tools-table-images/configurator.png
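The idea of a configuration setting annotation requirements can be illustrated with a simple header check. This is a sketch only: real ISA configurations are files maintained with ISAconfigurator, whereas the dictionary, the assay type name and the `missing_columns` helper below are stand-ins invented for this example.

```python
# Hypothetical stand-in for a configuration: required columns per assay type.
REQUIRED_COLUMNS: dict[str, list[str]] = {
    "hypothetical_microarray_assay": [
        "Sample Name",
        "Protocol REF",
        "Labeled Extract Name",
        "Array Data File",
    ],
}


def missing_columns(assay_type: str, header: list[str]) -> list[str]:
    """Return the required columns that are absent from a table header."""
    required = REQUIRED_COLUMNS[assay_type]
    return [column for column in required if column not in header]


header = ["Sample Name", "Protocol REF", "Array Data File"]
print(missing_columns("hypothetical_microarray_assay", header))
```

A check of this kind is what lets a template enforce a community checklist or a database schema before submission rather than after.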

Rely on Biosharing to survey the landscape of community requirements

ISAconfigurator Tables

Tools for creating ISA-Tab documents: ISAcreator

The ISAcreator...

Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features...

But these are just some of them... we also have a data entry wizard and an import utility...

Select and Annotate in ISAcreator

ISAcreator Wizard: automatic template generation

Prerequisites and conditions of use:

• supports factorial design experiments, meaning sets of discrete factor levels combined together to define a treatment
  – 2x2 factorial design, as in 2 compounds and 2 time points
  – 2x2x3 factorial design, as in 2 compounds, 2 time points and 3 doses
• assumes one sample collection event (all samples collected at sacrifice time)
• supports some but not all currently available assay types
• supports fractional factorial design
• supports unbalanced factor group population sizes (ethical considerations for high-dose toxic exposures)
• automatically generates sample identifiers, human-readable and meaningful labels and, if requested, barcodes
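The factorial design logic behind automatic template generation can be sketched in a few lines of Python. This is not the ISAcreator wizard's code: the function names, factor levels and identifier pattern are assumptions, and the example takes a 2x2x3 design to mean 2 compounds, 2 time points and 3 dose levels. Unbalanced group sizes are modelled by a callable that returns the number of replicates per treatment group.

```python
from itertools import product
from typing import Callable


def treatment_groups(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Full factorial design: one treatment group per combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def sample_identifiers(
    groups: list[dict[str, str]],
    group_size: Callable[[dict[str, str]], int],
) -> list[str]:
    """Generate one human-readable identifier per replicate in each group."""
    ids = []
    for group in groups:
        label = "_".join(group.values())
        for replicate in range(1, group_size(group) + 1):
            ids.append(f"{label}_r{replicate}")
    return ids


# Hypothetical factor levels for a 2x2x3 design.
factors = {
    "compound": ["cmpdA", "cmpdB"],
    "time point": ["4h", "24h"],
    "dose": ["low", "mid", "high"],
}
groups = treatment_groups(factors)


def smaller_high_dose_groups(group: dict[str, str]) -> int:
    """Unbalanced design: fewer replicates at the high dose."""
    return 2 if group["dose"] == "high" else 3


ids = sample_identifiers(groups, smaller_high_dose_groups)
print(len(groups), "treatment groups,", len(ids), "samples")
print(ids[:3])
```

A fractional factorial design would simply keep a chosen subset of `groups` before generating identifiers.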

ISAcreator features: automatic template generation

ISAcreator features: mapping to third party table (ETL function)

Extending ISAcreator: The Plugin Architecture

How do ISA tools access Ontology servers?

Plugins in ISAcreator

• Plugins can be developed for 3 different purposes:
  – Search (adds extra search space for ontology tool)
  – Custom cell editors (for spreadsheet)
  – Extra general functionality (which appears in a plugin menu)

In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.

• 2 examples of ISA plugins:
  – Access to local metadata stores: Novartis Plugin to Ontology Widget
  – Annotation of findings: Metabolite Identification Plugin (Metabolights Repository contribution to ISA project)

Plugins... Example 1: Novartis Metastore Search

Search function on the Novartis Metastore... integrates search results on the metastore in the Ontology search tool.

So, with the Novartis plugin in your Plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording term source, etc.

Plugins... Example 2: Metabolite Identification plugin

Credits: Kenneth Haug, Metabolights

ISAcreator features: visualizing experimental workflows

Work completed during investigation of a new approach for creation of glyphs with use of taxonomy for guidance. See Maguire et al, Taxonomy-Based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, 2012

Making the most of Experimental Plan

• Working prospectively: programmatic creation of ISA tables

• ISAWizard to quickly create ISA tables
  – a component of ISAcreator
  – to be expanded to accommodate more advanced study designs

• Use ISAcreator API to manipulate / create ISA tables
  – more information on GitHub:
    • https://github.com/ISA-tools/ISAcreator/wiki/API

Communication with Instrumentation

• Survey existing software API
  – understand input and output
  – are there XML messages that can be harnessed?
  – is it possible to have an instrument read an ISA table?
  – is it possible to have an instrument write to an ISA table?

• Lemnatec instruments
  – include barcode/QR code reader
    • harness ISAtools' ability to create barcodes/QR codes
  – devise workflows
    • identify key nodes (objects)
    • identify key data types
    • agreement on patterns

This bit of code indicates that you need to invoke an ISA configuration, which defines the expected table layout, in order to proceed.

• https://github.com/ISA-tools/ISAcreator/wiki/API

OntoMaton: Searching and Tagging

• R package available since BioConductor 2.11: http://www.bioconductor.org/packages/release/bioc/html/Risa.html

• Functionality for parsing ISA-Tab datasets into R objects, saving and updating them.

• It bridges the ISA-Tab metadata to analysis pipelines of specific assay types, by building objects for use in other R packages downstream
  – currently considering mass spectrometry (xcms package, xcmsSet) and DNA microarray (Biobase package, ExpressionSet)

[Figure: numbered workflow steps including Experiment Design, Collect Samples, Run Assays and Analysis. Example shown: Arabidopsis thaliana with treatment groups 70%, 90% and 100%; samples SAMPLE1 to SAMPLE11 are mapped to data files FILE 1 onwards.]
Ongoing Work

• New open-access, online-only publication for descriptions of scientifically valuable datasets
• Only content type: Data Descriptor, narrative + structured parts
• Initially focused on the life, environmental and biomedical sciences
• Data Descriptor will be complementary to traditional research journals and data repositories
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery

www.nature.com/scientificdata

Data Descriptors served by Scientific Data

Narrative Section
A brief article-like document with:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Usage Notes
• Figures & Tables
• References

Structured Section
Detailed descriptions of the experimental procedures used to produce the data
• Following community-defined minimum information requirements
  – for a level of detail sufficient to reproduce the experiments
• Using ontologies & controlled vocabularies
  – to maximise consistency of the descriptions

ISA2OWL: mapping in the OBO Foundry space and SIO

• Make ISA semantics explicit and serialize ISA representation as Linked Data
• Maximize annotation markups and ontology terms
• Augment ISA semantics with new constructs (study groups and their size) allowing further exploration
• Semantic validation

• Make the semantics of ISA-Tab explicit by converting ISA-Tab files into Linked Data (using web standards to connect related data)
• Triples of <subject, predicate, object> with identifiable entities
  – e.g. <lipoprotein> <participates_in> <inflammatory response>
         <PRO:212342352> <BFO_0000056> <GO:0006954>
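The triple pattern can be illustrated without any RDF library. The sketch below stores the slide's example as a plain tuple of identifiers and adds a wildcard match; the `LABELS` lookup, `match` and `readable` are hypothetical helpers written for this example, not part of ISA2OWL or of any RDF toolkit.

```python
from typing import Optional

Triple = tuple[str, str, str]

# The example from the slide, stored as identifiers (subject, predicate, object).
TRIPLES: list[Triple] = [
    ("PRO:212342352", "BFO_0000056", "GO:0006954"),
]

# Human-readable labels for the identifiers, as given on the slide.
LABELS = {
    "PRO:212342352": "lipoprotein",
    "BFO_0000056": "participates_in",
    "GO:0006954": "inflammatory response",
}


def match(
    triples: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def readable(triple: Triple) -> str:
    """Render a triple with labels where known, identifiers otherwise."""
    return " ".join(f"<{LABELS.get(term, term)}>" for term in triple)


for t in match(TRIPLES, predicate="BFO_0000056"):
    print(readable(t))
```

Because every term is an identifier rather than free text, statements from different datasets that reuse the same identifiers can be connected and queried together.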

New graph based web application

Questions?

• You can email us: isatools@googlegroups.com
• View our blog: http://isatools.wordpress.com
• Follow us on Twitter: @isatools
• View our website: http://www.isa-tools.org
• View our Git repo & contribute: http://github.com/ISA-tools

Thanks for listening...
