CSHALS 2013
Alejandra Gonzalez-Beltran, University of Oxford e-Research Centre, UK
The ISA Infrastructure for the biosciences: from data curation at source to the linked data cloud
Conference on Semantics in Healthcare and Life Sciences (CSHALS)
Boston, USA Feb 27- Mar 1 2013
• The infrastructure: a metadata tracking framework in the biosciences: the format, a set of open source software tools and the user community
• The syntax and its implicit semantics
• The component of the infrastructure for mapping the syntax to ontologies
• A couple of mappings, architecture, conversion
Outline
Contextual information (metadata):
• Sample characteristics
• Technology and measurement types
• Instrument parameters
• …
Need for a generic representation, applied to:
• microarray based experiments (MAGE)
• sequencing based experiments (SRA)
• flow cytometry based experiments (FuGE-Flow Cyt)
• mass spectrometry and NMR spectroscopy experiments (MetaboLights and PRIDE)
• Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking
• Deal with high-throughput studies using one or a combination of omics and other technologies
• Empower users to take up community-defined checklists and ontologies
• Facilitate data sharing, re-use, comparison and reproducibility of experiments, and submission to international public repositories
infrastructure
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Rocca-Serra et al, 2010, Bioinformatics
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
Towards interoperable bioscience data. Sansone et al, 2012, Nature Genetics
syntax (and its implicit semantics)
Protocol Process
Characteristics[…] Factor Value[…] (independent variables) Material Type Comment[…]
Date (day effect)
Performer (operator effect)
Parameter Value […]
Derived Data File
Raw Data File
Data File Node
DATA
Material
Material Node
Sample Name | Material Type | Hybridization Assay Name | Array Design REF | Array Data File | Protocol REF | Derived Array Data File
sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
sample2 | genomic DNA | assay2 | A-AFFY-107 | assay2.cel | data normalization | assay2.txt
sample3 | genomic DNA | assay3 | A-AFFY-107 | assay3.cel | data normalization | assay3.txt
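The assay table above reads left to right as a chain from material (sample) through the raw data file to the derived data file. A minimal Python sketch of walking that chain follows. It assumes a tab-delimited table with unique column headers, which real ISA-TAB files do not guarantee (headers such as Protocol REF may repeat); the function names are hypothetical and this is not the ISA tools parser.

```python
import csv
import io

# Hypothetical tab-delimited assay table built from the rows on the slide.
ASSAY_TSV = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tArray Design REF\t"
    "Array Data File\tProtocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tA-AFFY-107\tassay1.cel\tdata normalization\tassay1.txt\n"
    "sample2\tgenomic DNA\tassay2\tA-AFFY-107\tassay2.cel\tdata normalization\tassay2.txt\n"
    "sample3\tgenomic DNA\tassay3\tA-AFFY-107\tassay3.cel\tdata normalization\tassay3.txt\n"
)


def read_assay_rows(text):
    """Parse the table into one dict per row, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def provenance_chain(row):
    """Return the material -> raw data -> derived data path for one row."""
    return [
        row["Sample Name"],
        row["Array Data File"],
        row["Derived Array Data File"],
    ]


rows = read_assay_rows(ASSAY_TSV)
chains = [provenance_chain(r) for r in rows]
for chain in chains:
    print(" -> ".join(chain))
```

Reading by header name rather than by position keeps the sketch short, at the cost of not handling repeated headers.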
Material transformations...
Material
DATA
Tagging: from free text to ontology-based
• single intervention representation, free text annotation
• single intervention, ontology-based annotation
Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
individual1 | human | aspirin | high dose | 12 weeks
Source Name | Characteristics[organism (OBI:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
(continued)
Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
low dose | LNC | LP30872-3 | 12 | week | UO | "0000034"
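In the ontology-based form, each annotated column is followed by Term Source REF and Term Accession Number columns (and optionally Unit) that qualify the value to their left. The Python sketch below groups them that way. The helper name is hypothetical, and the headers are simplified by leaving out the ontology identifiers shown inside the brackets on the slide.

```python
# Columns that qualify the value column immediately to their left.
QUALIFIERS = {"Unit", "Term Source REF", "Term Accession Number"}


def annotated_values(headers, values):
    """Attach each qualifier column to the nearest preceding value column."""
    result = []
    current = None
    for header, value in zip(headers, values):
        if header in QUALIFIERS:
            if current is not None:
                current[header] = value
        else:
            current = {"field": header, "value": value}
            result.append(current)
    return result


# Simplified headers and the row values from the slide.
HEADERS = [
    "Source Name",
    "Characteristics[organism]", "Term Source REF", "Term Accession Number",
    "Factor Value[chemical compound]", "Term Source REF", "Term Accession Number",
]
VALUES = ["individual1", "Homo sapiens", "NCBITax", "9606", "aspirin", "CHEBI", "1231354"]

annotations = annotated_values(HEADERS, VALUES)
for item in annotations:
    print(item)
```

Working on parallel header and value lists, rather than a dict per row, is what allows the repeated Term Source REF headers to be handled.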
Kohonen et al. The ToxBank Data Warehouse: a research cluster of 7
EU FP7 Health systems toxicology and toxicogenomics projects.
Health Care & Life Sciences Interest Group
ToxBank effort developed by Nina Jeliazkova
• Make the semantics of ISA-TAB explicit, including materials & data entities & processes & their relationships
• Provide incentives for provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations
• Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design
• Facilitate data integration & knowledge discovery/reasoning
architecture
ISA-TAB parser isa2owl mapping
parser graph
analysis
Configuration file
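The architecture pairs a parser with a mapping that is driven by a configuration file. One way to picture that idea, as a rough Python sketch only: a dictionary stands in for the mapping configuration and turns one parsed row into subject-predicate-object triples. Every IRI under http://example.org/, the derivesFrom property and the function name are hypothetical placeholders; the actual isa2owl mappings target ontologies such as OBI and SIO, and none of this reflects its real API.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DERIVES_FROM = EX + "derivesFrom"  # hypothetical property

# Hypothetical mapping configuration: ISA-TAB column -> class IRI.
# Insertion order encodes the left-to-right flow of the table.
MAPPING = {
    "Sample Name": EX + "Material",
    "Array Data File": EX + "RawDataFile",
    "Derived Array Data File": EX + "DerivedDataFile",
}


def row_to_triples(row, mapping, base=EX + "node/"):
    """Type each mapped node and link it to the node before it in the row."""
    triples = []
    previous = None
    for column, class_iri in mapping.items():
        value = row.get(column)
        if not value:
            continue
        node = base + value
        triples.append((node, RDF_TYPE, class_iri))
        if previous is not None:
            triples.append((node, DERIVES_FROM, previous))
        previous = node
    return triples


EXAMPLE_ROW = {
    "Sample Name": "sample1",
    "Array Data File": "assay1.cel",
    "Derived Array Data File": "assay1.txt",
}
triples = row_to_triples(EXAMPLE_ROW, MAPPING)
for triple in triples:
    print(triple)
```

Keeping the column-to-class mapping in data rather than in code is what would let the same converter serve more than one target ontology.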
• Ontology search and automated tagging (relying on NCBO BioPortal services) on Google Spreadsheets
• Collaborative annotation; support for distributed users
• Version control & history
OntoMaton: a BioPortal powered Ontology widget for Google Spreadsheets. Maguire et al, 2013, Bioinformatics
Experimental domain
Biomolecular domain
Chemical domain
Information domain
vocabularies
OBI
GO ChEBI IAO
Open Biological and Biomedical Ontologies
(OBO) Foundry BFO
ISA-OBI mapping
ISA-SIO mapping
Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
faahKO dataset, available in Bioconductor (with ISA-TAB metadata)
Global metabolite profiling
• support different conversion modes (different levels of granularity)
• querying for ISA-TAB datasets, across multiple experiment types
• reasoning exploiting ontology annotations
• semantic validation of ISA-TAB datasets
• augmented annotation over native ISA syntax
• identification of gaps in ontological representations
• feedback of findings to community ontologies
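As a much weaker stand-in for the semantic validation listed above, the Python sketch below only checks that each annotated value carries both a Term Source REF and a Term Accession Number. The record layout and the function name are assumptions made for illustration; real semantic validation would reason over the ontologies themselves.

```python
REQUIRED = ("Term Source REF", "Term Accession Number")


def missing_ontology_annotations(records):
    """List (field, missing key) pairs for values that lack an ontology term."""
    problems = []
    for record in records:
        for key in REQUIRED:
            if not record.get(key):
                problems.append((record["field"], key))
    return problems


# Hypothetical records: one ontology-annotated value, one free-text value.
RECORDS = [
    {
        "field": "Characteristics[organism]",
        "value": "Homo sapiens",
        "Term Source REF": "NCBITax",
        "Term Accession Number": "9606",
    },
    {"field": "Factor Value[dose]", "value": "high dose"},
]

problems = missing_ontology_annotations(RECORDS)
for field, key in problems:
    print(f"{field}: missing {key}")
```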
Increasing level of structure for experimental metadata
Notes in Lab books
Spreadsheets & Tables (ISAtab metadata)
Facts as RDF statements
@isatools @biosharing
isa-tools.org isacommons.org biosharing.org