The Electronic Notebook Ontology
Stuart J. Chalk, Department of Chemistry, University of North
VIVO 2015 – August 2015
Motivation
Inspiration
Electronic Scientific Notebooks
The Experiment Markup Language
VIVO-ISF Ontology
HCLS Community Profiles
Analysis
Important Questions
Ontology
Conclusion
Outline
There’s something missing from the big data landscape in science…
VIVO captures data about scientists (faculty), but not about the data they produce
HCLS Community Profile outlines metadata for describing datasets but does not mention laboratory notebooks
Electronic laboratory notebooks are set to become the standard way scientists capture data
How do we link these together?
Motivation
Scientists need to move to digital notebooks…
...and record not just the data but the flow and context
Traditional Laboratory Notebooks
How science is done is important for searching, aggregation, meta-analysis
Developed out of Laboratory Information Management Systems (LIMS)
Content Management System for Scientists
Storage of
  Research data
  Research resources (instruments, samples, scientists)
  The story of the scientific endeavor
Link to external resources
Display chemical structures
Allow aggregation, processing of data
Be compliant with industry standard record keeping
Electronic Laboratory Notebooks
A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)
Many datatypes (will expand…)
Experiment Markup Language (ExptML)
Annotation, Api, Calculation, Chemical, Citation, Communication, Customer, Data, Dataset, Definition,
Element, Equipment, Event, Experiment, Group, Project, Protocol, Quote, Report, Result,
Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor
ExptML Ontology
VIVO-ISF Ontology
https://wiki.duraspace.org/download/attachments/51052811/PeopleOrgsRolesGrants.2014-03-14.png
The Healthcare and Life Science (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
Data Descriptions: HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/
Describes three levels for description of datasets
Summary Level
  Type declaration (rdf:type = dctypes:Dataset)
  Title (dct:title = rdf:langString)
  Description (dct:description = rdf:langString)
  Publisher (dct:publisher = IRI)
Version Level
  Type declaration (rdf:type = dctypes:Dataset)
  Title (dct:title = rdf:langString)
  Description (dct:description = rdf:langString)
  Creator (dct:creator = IRI)
  Publisher (dct:publisher = IRI)
  Version identifier (pav:version = xsd:string)
  Version linking (dct:isVersionOf = IRI)
Distribution Level
  Type declaration (rdf:type = void:Dataset OR dcat:Distribution)
  Title (dct:title = rdf:langString)
  Description (dct:description = rdf:langString)
  Creator (dct:creator = IRI)
  Publisher (dct:publisher = IRI)
  License (dct:license = IRI)
Data Descriptions: HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels
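A minimal sketch of how an ELN might check a description against the three levels. It is not part of the talk or of the HCLS Note. It covers only the properties named on the slide, assumes the license is carried by dct:license, and represents a description as a plain Python dict keyed by prefixed property names. The function names and the example record are hypothetical.

```python
# Properties named on the slide for each HCLS description level.
# rdf:type is checked separately because its allowed values differ by level.
REQUIRED = {
    "summary": {"dct:title", "dct:description", "dct:publisher"},
    "version": {"dct:title", "dct:description", "dct:creator",
                "dct:publisher", "pav:version", "dct:isVersionOf"},
    "distribution": {"dct:title", "dct:description", "dct:creator",
                     "dct:publisher", "dct:license"},  # dct:license is an assumption
}
ALLOWED_TYPES = {
    "summary": {"dctypes:Dataset"},
    "version": {"dctypes:Dataset"},
    "distribution": {"void:Dataset", "dcat:Distribution"},
}


def missing_properties(description, level):
    """Return a sorted list of properties that `description` lacks for `level`."""
    missing = {p for p in REQUIRED[level] if not description.get(p)}
    if description.get("rdf:type") not in ALLOWED_TYPES[level]:
        missing.add("rdf:type")
    return sorted(missing)


def satisfied_levels(description):
    """Return the levels whose slide-listed properties are all present."""
    return [level for level in REQUIRED
            if not missing_properties(description, level)]


# Hypothetical notebook record with summary-level metadata only.
notebook_page = {
    "rdf:type": "dctypes:Dataset",
    "dct:title": "Titration run 12",
    "dct:description": "pH readings recorded during an acid-base titration",
    "dct:publisher": "https://example.org/org/chemistry",
}

print(satisfied_levels(notebook_page))
print(missing_properties(notebook_page, "version"))
```

Reporting the missing properties, rather than a bare yes/no, would let a notebook prompt the scientist for exactly the metadata needed to reach the next level.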
Goal: Automated identification of datasets that could be made searchable and/or distributable
When an ELN functions, what does it do?
  Orchestrates access to the system (authentication)
  Supplies GUI to allow information to be
    Displayed
    Entered
    Processed
  Processes files to bring them into the system
  Sends requests to internal/external servers to get data
Analysis
Is this information a dataset?
Does the dataset belong to this author?
Is the dataset available?
Is there appropriate metadata?
At what HCLS levels can this dataset be made available?
What mechanism is used to make the dataset available?
Important Questions
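The yes/no questions above can be read as an ordered gate that a candidate must pass before it is exposed. The sketch below is one possible reading and not the author's implementation. The Candidate class and its field names are hypothetical, and the last two questions (HCLS levels and mechanism) are left out because they are not yes/no checks.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of what an ELN knows about one piece of information."""
    is_dataset: bool
    belongs_to_author: bool
    is_available: bool
    has_metadata: bool


# The four yes/no questions from the slide, in the order they are asked.
QUESTIONS = [
    ("is_dataset", "Is this information a dataset?"),
    ("belongs_to_author", "Does the dataset belong to this author?"),
    ("is_available", "Is the dataset available?"),
    ("has_metadata", "Is there appropriate metadata?"),
]


def first_unmet_question(candidate):
    """Return the first question answered 'no', or None if all are answered 'yes'."""
    for attribute, question in QUESTIONS:
        if not getattr(candidate, attribute):
            return question
    return None


blocked = Candidate(is_dataset=True, belongs_to_author=True,
                    is_available=False, has_metadata=True)
print(first_unmet_question(blocked))
```

Returning the failing question, rather than a bare False, keeps the reason a dataset was held back visible to the notebook user.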
Actions that deal with datasets
  Software actions
  User actions
Clues that something is research data (not metadata or someone else’s data)
Collection of metadata for annotation of datasets
Inference that a HCLS dataset has been created
Dataset Identification
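As an illustration of the identification steps above, the sketch below scans a log of ELN actions, keeps those that deal with datasets, merges their metadata, and flags targets that look like the notebook owner's own research data. The action kinds, the Action class, and the rule that imported files and recorded results count as own data are assumptions made for this example and are not taken from ENO.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    actor: str                  # "software" or "user"
    kind: str                   # hypothetical action name
    target: str                 # identifier of the item acted on
    metadata: dict = field(default_factory=dict)


# Hypothetical action kinds that deal with datasets.
DATASET_ACTIONS = {"import_file", "record_result", "fetch_external"}
# Assumed clue: only these kinds indicate the owner's own research data.
OWN_DATA_ACTIONS = {"import_file", "record_result"}


def infer_datasets(actions):
    """Group dataset actions by target, merge metadata, flag own research data."""
    found = {}
    for action in actions:
        if action.kind not in DATASET_ACTIONS:
            continue  # e.g. authentication or display actions
        entry = found.setdefault(
            action.target, {"research_data": False, "metadata": {}})
        entry["metadata"].update(action.metadata)
        if action.kind in OWN_DATA_ACTIONS:
            entry["research_data"] = True
    return found


log = [
    Action("user", "login", "session-1"),
    Action("user", "import_file", "run12.csv", {"dct:title": "Titration run 12"}),
    Action("software", "fetch_external", "reference-record-1"),
]

for target, entry in infer_datasets(log).items():
    print(target, entry)
```

In this example the login action is ignored, the imported file is flagged as research data with its title attached, and the externally fetched record is tracked but not flagged.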
Electronic Notebook Ontology (ENO)
ENO
Providing a mechanism to link research data to VIVO profiles would
  Add value to VIVO
  Provide faculty with a resource for their data management plans
  Create opportunities for automatic aggregation of research data into institutional repositories
Needs to be implemented in a test ELN…
Take Home
[email protected]
Phone: 904-620-1938
Skype: stuartchalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?