how open is open? an evaluation rubric for public knowledgebases
HOW OPEN IS OPEN?
AN EVALUATION RUBRIC FOR PUBLIC KNOWLEDGEBASES
MELISSA HAENDEL
MARCH 28TH, 2017
@ontowonka
THERE ARE OVER 1500 PUBLIC DATABASES IN THE NUCLEIC ACIDS RESEARCH DATABASE COLLECTION
https://doi.org/10.1093/nar/gkw1188
HOW MANY OF THESE ARE TRULY OPEN?
OPENNESS IS AN NAR REQUIREMENT, BUT …
WHY ARE WE STILL FAILING?
OPEN DATA IS FAIR DATA
http://www.nature.com/articles/sdata201618
Findable Accessible Interoperable Reusable
ANATOMY OF FAIR: FINDABLE
persistent identifier
rich metadata
registered or indexed in a searchable resource
McMurry et al., Identifiers for the 21st century
bit.ly/identifiers-2017
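A minimal sketch of what a persistent, resolvable identifier can look like in practice: a compact identifier of the form PREFIX:LOCAL_ID expanded to a full URL through a prefix map. The `EX` prefix, the example.org URL, and the function name `expand_curie` are hypothetical placeholders for illustration, not taken from the talk or from any real resource.

```python
# Hypothetical prefix map; a real resource would publish and maintain its own.
PREFIX_MAP = {
    "EX": "https://example.org/record/",
}


def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a compact identifier 'PREFIX:LOCAL_ID' into a full URL."""
    prefix, sep, local_id = curie.partition(":")
    # Reject strings that lack a prefix, a colon, or a local ID.
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a compact identifier: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("EX:0001", PREFIX_MAP))
```

Keeping the prefix map separate from the records means the URL pattern can change without breaking the identifiers that others have already cited.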
ANATOMY OF FAIR: ACCESSIBLE
(Meta)data are openly retrievable by their
identifier using a standardized
communications protocol
Metadata are accessible, even when the data
are no longer available
http://api.monarchinitiative.org/api/
ANATOMY OF FAIR: INTEROPERABLE
Use a formal, accessible, shared, and broadly
applicable language for knowledge
representation
Define semantics of all relationships, including
cross references (hint: use the Relations
Ontology!)
ANATOMY OF FAIR: INTEROPERABLE
Picking on the Personal Genome Project (thanks Sasha!)
Do you have a severe genetic disease or rare genetic trait? If so, you can add a description for your public profile.
1. Extreme susceptibility to motion sickness. - answers pertain to this trait
2. Pyloric stenosis
3. Unusually small feet for my height
ANATOMY OF FAIR: REUSABLE
(Meta)data are described with a plurality of
accurate and relevant attributes
Detailed provenance and use of community
standards
www.obofoundry.org
https://www.w3.org/TR/hcls-dataset/
https://peerj.com/articles/2331.pdf
Findable Accessible Interoperable Reusable
FAIR-TLC
Traceable Licensed Connected
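One way the seven FAIR-TLC criteria could be recorded for a single resource is sketched below. The pass/fail scoring, the class name `FairTlcAssessment`, and the example values are assumptions made for illustration; the rubric in the talk may grade each criterion differently.

```python
from dataclasses import dataclass, fields


@dataclass
class FairTlcAssessment:
    """Pass/fail record for one resource (hypothetical scoring scheme)."""
    findable: bool
    accessible: bool
    interoperable: bool
    reusable: bool
    traceable: bool
    licensed: bool
    connected: bool

    def score(self) -> int:
        # Number of criteria met, out of seven.
        return sum(bool(getattr(self, f.name)) for f in fields(self))

    def gaps(self) -> list[str]:
        # Names of the criteria that are not met, in rubric order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example resource, not a real evaluation.
example = FairTlcAssessment(
    findable=True, accessible=True, interoperable=False, reusable=True,
    traceable=False, licensed=False, connected=True,
)
print(example.score(), example.gaps())
```

Listing the gaps alongside the score keeps the assessment actionable: it says which criteria to work on, not only how many were missed.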
FAIR-TLC: TRACEABILITY
Provenance is documented and attributed
Contributions to the content (data, tools,
algorithms, sources, etc.) are declared
Documentation on how to cite a record from a
source or the whole resource
FAIR-TLC: LICENSURE
http://peterdesmet.com/posts/analyzing-gbif-data-licenses.html
Not all data resources are free to use, derive, and
redistribute, even if they are publicly funded and
seemingly publicly available.
FAIR-TLC: LICENSURE
http://peterdesmet.com/posts/analyzing-gbif-data-licenses.html
Standard license: 171
Non-standard license: 1069
No license: 10734
NON-STANDARD LICENSES BURDEN SCIENCE
bit.ly/reusabledata-forum
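To put the three license counts on this slide in proportion, the short calculation below converts them to percentages of the total. The counts (171, 1069, 10734) are those shown on the slide; the variable names are illustrative, and what exactly is being counted should be checked against the cited GBIF analysis.

```python
# Counts as shown on the slide.
counts = {
    "standard license": 171,
    "non-standard license": 1069,
    "no license": 10734,
}

total = sum(counts.values())

# Share of each category as a percentage, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Under these numbers, fewer than 2% of the counted items carry a standard license and roughly nine in ten carry no license at all.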
FAIR-TLC: CONNECTED
BECAUSE AGGREGATED != INTEGRATED
192K datasets … probably more than 38 are relevant to diabetes
FAIR-TLC: CONNECTED
BECAUSE AGGREGATED != INTEGRATED
Similarly, clouds do not integrate data.
http://stonebond.com/wp-content/uploads/2015/05/cloud-data-bullet-points-img.jpg
EVALUATING THE OPEN SCIENCE CANDIDATES
Room for improvement
bit.ly/open-science-prize
Open imaging
DISCUSSION: HOW DO WE DO BETTER?
Make the right thing the easy thing:
- Carrots:
  - Tenure & promotion cycles
  - Dedicated funding for increasing FAIR-TLC
- Sticks:
  - Publication requirements
  - Funding requirements
- Tools:
  - Tracking tools
  - Documentation tools
  - Social tools
ARE JOURNAL DATA SHARING POLICIES HITTING THE MARK?
Vasilevsky et al.
https://doi.org/10.7287/peerj.preprints.2588v1
TOO TINY A STICK?
Vasilevsky et al.
https://doi.org/10.7287/peerj.preprints.2588v1
REUSABLEDATA.ORG
Curate, evaluate, and provide guidance on
legal and effective data reuse and redistribution
Wanna help? Join the Google group at:
bit.ly/reusabledata-forum
Seth Carbon
THANKS TO:
JULIE MCMURRY
ANDREW SU
SETH CARBON