Data models for preserving and publishing digital research material beyond the PDF
DESCRIPTION
Slides for the Technology Track of ISMB/ECCB 2013 in Berlin on digital publishing, highlighting the Research Object model, Nanopublications, and ISA as means to capture methods and results when research is carried out digitally. This work was supported by the EU Workflow Forever project (http://wf4ever-project.org).
TRANSCRIPT
Data models for digital preservation and publishing beyond the PDF
Jun Zhao, Mark Thompson, Kristina Hettne, Stian Soiland, Susana Garcia, Marco Roos
Acknowledging Harish Dharuri, Susanna Sansone, Philippe Rocca-Serra,
Alejandra Gonzalez-Beltran, Albert Mons, Arie Baak, Erik Schultes, Carole Goble, Barend Mons
The Workflow Forever project (EU FP7 nr. 270192), Digital Libraries and Digital Preservation (ICT-2009.4.1)
Recording your computational steps…
Bioinformaticians have no lab books!
And no training in digital note-keeping
http://graemefielder.wordpress.com/2010/09/17/lab-books-evolution-required/
State of the art study capture?
How then?
Workflows encapsulate in silico analysis
http://ap27-cgla.blogspot.nl/ http://openi.nlm.nih.gov/detailedresult.php?img=2743669_1471-2105-10-252-2&req=4
Components to understand an experiment
Is a workflow enough?
Research Question: Genome Wide Association Studies (GWAS). In 1000+ people: which gene mutations are associated with metabolic syndrome, and why?
Download data
- External DB
- Existing Knowledge
Hypothesis: Genes involved in inflammation pathways are involved in the onset of metabolic syndrome.
Workflow: Which biological pathways explain the associations?
Interpret results (Interaction pathways in the cell)
Preserve all five components: research question, downloaded data, hypothesis, workflow, interpreted results
Research Object
Types of resources
- Data
- Method/Experimental protocol
- Findings
Data Models
- ISA-TAB/ISA2OWL
- Nanopublication
- Wfdesc
Capture more than workflows
Research Object Model
Preservation for understanding
Preserve at least the:
– Hypothesis
– A workflow-like sketch
– One or more workflows
– Input data
– Workflow runs
– Results
– Conclusion
My Research Book
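The "preserve at least" list can be read as a completeness checklist. Below is a minimal sketch in Python, assuming a Research Object is held as a plain dictionary. The key names, the file names and the `missing_items` helper are hypothetical, chosen for illustration; this is not the Wf4Ever checklist service or its API.

```python
# Hypothetical checklist: one key per item on the "preserve at least" list.
REQUIRED_ITEMS = [
    "hypothesis",
    "workflow_sketch",
    "workflows",
    "input_data",
    "workflow_runs",
    "results",
    "conclusion",
]


def missing_items(research_object: dict) -> list:
    """Return the required items that are absent or empty, in checklist order."""
    return [item for item in REQUIRED_ITEMS if not research_object.get(item)]


# Illustrative Research Object; the file names are made up for this example.
example_ro = {
    "hypothesis": "Genes involved in inflammation pathways are involved "
                  "in the onset of metabolic syndrome.",
    "workflow_sketch": "sketch.png",
    "workflows": ["gwas_pathways.t2flow"],
    "input_data": ["gwas_associations.csv"],
    "workflow_runs": [],  # no run recorded yet, so this counts as missing
    "results": ["pathways.csv"],
}

print(missing_items(example_ro))  # ['workflow_runs', 'conclusion']
```

An empty list from `missing_items` would mean every item on the slide's list is present in this simplified representation.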
Fame and Glory
It was me, me, me!
What I found
How I found it
HDAC1 interacts with Parvb
Discovered by: me
Nanopublication
- Assertion
- Provenance of Assertion
- Metadata of nanopublication
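The three parts of a nanopublication can be pictured as a small immutable record. This is only a sketch: the class and field names are illustrative, and pairing "what I found", "how I found it" and "it was me" with assertion, provenance and metadata is one reading of the slide, not a definition from the nanopublication model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopublication:
    """Three-part layout from the slide; field names are illustrative."""
    assertion: str         # read here as "what I found"
    provenance: str        # read here as "how I found it"
    publication_info: str  # read here as "it was me": metadata of the nanopublication


example = Nanopublication(
    assertion="HDAC1 interacts with Parvb",
    provenance="(method or evidence behind the assertion goes here)",
    publication_info="Discovered by: me",
)

print(example.assertion)
```

The record is frozen so that, in this sketch at least, a published claim cannot be edited in place.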
Prototyping the models
• Create: myExperiment
• Better: Checklist service
• Evolution: Digital Library software
• Curation: Quality Monitoring Service
• Credit original assertions: LandMark Tool
• Applications by private partners
Prototyping the Research Object Data Model in
myExperiment - create Research Objects
Prototyping the Research Object Data Model in
Checklist service - make better Research Objects
RELEASE!
http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API
Prototyping the Research Object Data Model in
Digital Library software - evolution of a Research Object
Research Object ‘under construction’
Snapshots to record intermediate states
Full copy ‘Ready for Release’
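The three stages (under construction, snapshot, release) can be sketched as copies of an in-memory object. The `ResearchObject` class, the status strings and the file names below are hypothetical; this is not the Digital Library software or the Wf4Ever evolution model, only an illustration of why snapshots must be independent copies.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    """Hypothetical in-memory Research Object used only for this sketch."""
    resources: list = field(default_factory=list)
    status: str = "live"  # 'under construction'


def snapshot(ro: ResearchObject) -> ResearchObject:
    """Record an intermediate state as an independent copy."""
    frozen = copy.deepcopy(ro)
    frozen.status = "snapshot"
    return frozen


def release(ro: ResearchObject) -> ResearchObject:
    """Make a full copy marked as ready for release."""
    frozen = copy.deepcopy(ro)
    frozen.status = "release"
    return frozen


live = ResearchObject(resources=["workflow.t2flow"])
first = snapshot(live)
live.resources.append("results.csv")  # work continues on the live object
final = release(live)

print(first.resources)  # ['workflow.t2flow']
print(final.resources)  # ['workflow.t2flow', 'results.csv']
```

Because each copy is deep, later edits to the live object do not leak into earlier snapshots.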
Prototyping the Research Object Data Model in
Quality Monitoring Service - Long term curation
Prototyping the Nanopublication Model
Landmark Claim Tool - mark and credit the first discovery
Core data
Attribution
Qualification
Prototyping the Nanopublication Model
Applications from private partners - Robust tools for business stakeholders
Nanopublication applications
Euretos Company
Copyright Euretos b.v. 2013
Releases planned for 2014
Some gory detail
Data models ‘under the hood’
Research Object Model at a glance
Research Object
- ore:isDescribedBy: Manifest
- ore:aggregates: Resource (one or more)
- ore:aggregates: Annotation (one or more)
Annotation
- oa:hasTarget: Resource
- oa:hasBody: Annotation graph
Extensions: for more information and extensions (Evolution model, MINIM) see http://wf4ever-project.org/
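The relations in this diagram can be written down as subject/predicate/object triples. The sketch below uses plain Python tuples rather than an RDF library. All `ex:` identifiers and both helper functions are hypothetical; only the four property names come from the slide.

```python
# Triples are plain (subject, predicate, object) tuples.
# Every "ex:" identifier is a made-up example.
RO = "ex:my-research-object"

triples = [
    (RO, "ore:isDescribedBy", "ex:manifest"),
    (RO, "ore:aggregates", "ex:workflow"),
    (RO, "ore:aggregates", "ex:input-data"),
    (RO, "ore:aggregates", "ex:annotation-1"),
    ("ex:annotation-1", "oa:hasTarget", "ex:workflow"),
    ("ex:annotation-1", "oa:hasBody", "ex:annotation-graph-1"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def annotation_bodies(resource):
    """Annotation graphs whose annotation targets the given resource."""
    annotations = [s for s, p, o in triples
                   if p == "oa:hasTarget" and o == resource]
    return [body for a in annotations for body in objects(a, "oa:hasBody")]


print(objects(RO, "ore:aggregates"))
print(annotation_bodies("ex:workflow"))  # ['ex:annotation-graph-1']
```

The second lookup shows the point of the annotation layer: metadata about a resource lives in a separate graph that is reached through the annotation, not inside the resource itself.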
Wf4Ever architecture
Semantic REST API
RDF triple store (RO structure, Annotations)
RO index
Uploaded files
Portal
Checklist service
Command line
Workflow runner
...
Nanopublication Data Model
Nanopublication URL
Assertion
- association a sio:statisticalAssociation
- association sio:has-measurementValue Association_1_p_value
- Association_1_p_value a sio:probability-value
- Association_1_p_value sio:has-value 6.56e-5^^xsd:float
- association sio:refers-to http://bio2rdf.org/omim:210600
- association sio:refers-to …http://bio2rdf.org/geneid:55835
Provenance
- assertion opm:wasDerivedFrom http://rdf.biosemantics.org/…profiles_matching_1980_2010
- opm:wasGeneratedBy
PublicationInfo
- thisnanopub dcterms:created 2012-03-28T11:32^^xsd:dateTime
- thisnanopub pav:authoredBy researcherid.com/rB-6035-2012
- thisnanopub dcterms:DOI http://dx.doi.org/….
Integrity Key
Assertion: an individual association between concepts
• statement or declaration
• measurement
• hypothetical inference
• quantitative or qualitative
Integrity Key: guarantee immutability after publication
Nanopublication URL: unique, persistent and resolvable identifier
Provenance: how this assertion came to be, methods, evidence, context, etc.
PublicationInfo
• Detailed attribution for authors, institutions, lab technicians, curators
• License info
• Publication date
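The slides do not say how the integrity key is computed. One way to picture "guarantee immutability after publication" is a digest over the three parts, so that any later change produces a different key. The sketch below assumes SHA-256 over a simplified, sorted text serialization. The `integrity_key` function, the serialization and the `ex:source-dataset` identifier are assumptions made for illustration, not part of the nanopublication model.

```python
import hashlib


def integrity_key(assertion, provenance, publication_info):
    """Hex SHA-256 digest over the three parts of a nanopublication.

    Each part is a list of statement strings. Statements are sorted within
    a part so their order does not change the digest (an assumed,
    simplified canonical form).
    """
    digest = hashlib.sha256()
    parts = (("assertion", assertion),
             ("provenance", provenance),
             ("publicationInfo", publication_info))
    for name, statements in parts:
        digest.update(name.encode("utf-8"))
        for statement in sorted(statements):
            digest.update(b"\n" + statement.encode("utf-8"))
        digest.update(b"\n\n")
    return digest.hexdigest()


assertion = [
    "association a sio:statisticalAssociation",
    "Association_1_p_value sio:has-value 6.56e-5",
]
provenance = ["assertion opm:wasDerivedFrom ex:source-dataset"]  # hypothetical source
publication_info = ["thisnanopub dcterms:created 2012-03-28T11:32"]

key = integrity_key(assertion, provenance, publication_info)

# Changing the published p-value changes the key, so tampering is detectable.
tampered = integrity_key(
    ["association a sio:statisticalAssociation",
     "Association_1_p_value sio:has-value 0.5"],
    provenance,
    publication_info,
)
print(key != tampered)  # True
```

A consumer who stored the key at publication time could recompute it later and compare the two values.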
Nanopublication: http://www.store.net/mynanopub.rdf
- Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over SoapDenovo 1
- Provenance
- Publication-Info: pav:authoredBy, dc:rights, dc:created
Research Object (ro:aggregates)
- A Galaxy workflow
- results
- slides
- hypothesis
Research object can link to a nanopub as an experimental result (ro:aggregates)
Nanopublication gains detailed workflow provenance by linking to RO (rdf:describedBy)
Extend your provenance!
E.g. link the claim to the original data elements from which it was derived (rdf:describedBy)
?rdf:describedBy
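The two-way linking between a Research Object and a nanopublication can be sketched with plain tuples. The property names and the nanopublication URL are taken from the slides. The `ex:research-object` identifier, the `provenance_resources` helper and the direction given to `rdf:describedBy` (from nanopublication to Research Object) are assumptions based on one reading of the diagram.

```python
# Links as (subject, predicate, object) tuples.
NANOPUB = "http://www.store.net/mynanopub.rdf"  # example URL from the slides
RO = "ex:research-object"                       # hypothetical identifier

links = [
    (RO, "ro:aggregates", "A Galaxy workflow"),
    (RO, "ro:aggregates", "results"),
    (RO, "ro:aggregates", "slides"),
    (RO, "ro:aggregates", "hypothesis"),
    (RO, "ro:aggregates", NANOPUB),    # RO links to the nanopub as a result
    (NANOPUB, "rdf:describedBy", RO),  # nanopub links back for its provenance
]


def provenance_resources(nanopub):
    """Resources aggregated by any Research Object that describes the nanopub."""
    described_by = [o for s, p, o in links
                    if s == nanopub and p == "rdf:describedBy"]
    return [o for s, p, o in links
            if s in described_by and p == "ro:aggregates" and o != nanopub]


print(provenance_resources(NANOPUB))
# ['A Galaxy workflow', 'results', 'slides', 'hypothesis']
```

Starting from the claim alone, a reader can reach the workflow, results, slides and hypothesis that stand behind it.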
Community effort
• Research Objects: http://researchobjects.org/ http://wf4ever-project.org/
• Nanopublication: http://Nanopub.org/
• ISA-tools: http://www.isa-tools.org/
• Research Objects Community Group at W3C: http://w3.org/community/rosc
Conclusions (1/2)
• Applications of RO and Nanopublication data models to capture the bioinformatics research process ‘beyond the PDF’
• Data models: ISA, Research Objects, Nanopublications
Conclusions (2/2)
• Reference implementations / first to adopt: myExperiment, DLibra, Checklist service, Curation/monitoring, Landmark tool
• Private partners developing stable nanopublication applications
• Prevent perfectionism of the developers: get involved now!
THANK YOU FOR YOUR ATTENTION
http://researchobject.org/ http://nanopub.org/ http://isa-tools.org/ Research Object Community group at W3C: http://w3.org/community/rosc