FAIR data in a UM Law study
Large-scale analysis of EU court decisions
Kody Moodley, Pedro Hernandez-Serrano, Marcel Schaper, Michel Dumontier, Gijs van Dijck
Lambin et al. Radiother Oncol. 2013. 109(1):159-64. doi: 10.1016/j.radonc.2013.07.007
We need to build a social, ethical and technological infrastructure that
facilitates the discovery and reuse of digital resources
for people and machines
@micheldumontier::IDS-TRAINING:2018-10-30
An international, bottom-up paradigm for the discovery and reuse of digital content
by and for people and machines
Improving the FAIRness of digital
resources will increase their quality
and their potential and ease for
reuse.
Give unique names for ‘things’ in your data:
Globally unique: not just unique in your dataset
Persistent: don’t keep changing these names
Resolvable: make ‘things’ in your data discoverable on
the Web (e.g. a webpage with more information about it)
Make machine-readable descriptions of your data
so we can use machines to index, search and filter it
Provide metadata describing your data that is accessible beyond its lifetime
Clearly define and communicate access and security protocols for your data
(FAIR != Open)
Represent your data and metadata using machine interpretable formats
Use common vocabularies for representing your data
Link your data to other related datasets
License: who can reuse your data, under what conditions, for what purpose?
Provenance: who generated the data? when and how did they do this?
Community-standards: use the same data sharing, publishing platforms and
data formats, as your peers
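A machine-readable description covering the licence and provenance points above could look like the following minimal sketch. It assumes Dublin Core terms as the vocabulary; the function names, the list of required fields and all values passed in are hypothetical placeholders, not the project's actual metadata.

```python
import json

# Dublin Core terms that this sketch treats as mandatory (an assumption,
# not a rule taken from the FAIR principles themselves).
REQUIRED_FIELDS = ("identifier", "title", "license", "creator", "created", "format")


def build_metadata(identifier, title, license_url, creator, created, data_format):
    """Return a dataset description keyed by Dublin Core terms."""
    return {
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        "dcterms:identifier": identifier,   # globally unique, persistent name
        "dcterms:title": title,
        "dcterms:license": license_url,     # who may reuse the data, and how
        "dcterms:creator": creator,         # provenance: who generated it
        "dcterms:created": created,         # provenance: when it was generated
        "dcterms:format": data_format,      # machine-interpretable format
    }


def missing_fields(record):
    """List the required Dublin Core terms that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not record.get(f"dcterms:{f}")]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = build_metadata(
        identifier="https://example.org/dataset/placeholder",
        title="Placeholder dataset title",
        license_url="https://example.org/licence/placeholder",
        creator="Placeholder creator",
        created="2000-01-01",
        data_format="text/csv",
    )
    print(json.dumps(record, indent=2))
    print("Missing:", missing_fields(record))
```

Checking for missing fields before publishing is one simple way to keep licence and provenance information from being dropped by accident.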
(A CDDI pilot study)
Large-scale analysis of EU court decisions
Community for Data-Driven Insights (CDDI)
CDDI investigates how Maastricht
University can become the first FAIR
university (2025) by implementing
eScience, Technology, Expertise, and
Services.
Team for this pilot study
Prof. Michel Dumontier, IDS @ UM
Project partner
Prof. Gijs van Dijck, Faculty of Law
Project director
Dr. Kody Moodley, IDS @ UM / Faculty of Law
Project manager
Pedro Hernandez-Serrano, IDS @ UM
Lead Data Scientist
Prof. Marcel Schaper, Faculty of Law
Court decision expert
Elden van Delft, Faculty of Law
Court decision expert
Marion Meyers, DKE / Faculty of Law
Data Scientist
Bogdan Covrig, Faculty of Law
Data Scientist
Andreea Grigoriu, IDS @ UM / Faculty of Law
Data Scientist
Goal
Long term
To build a FAIR data infrastructure that supports empirical legal research
at the Faculty of Law, and makes this kind of research accessible for legal
scholars with limited data science expertise.
Short term
To build a (FAIR) software platform to analyse court decisions
Data sources
2.6 million court decisions
Updated daily, weekly and monthly with new decisions
Access via download links on website & API calls
Data
Metadata
Citations
Case code
Cited laws
Cited cases
Publication date
Court
Data extraction
Data extraction & cleaning scripts
Metadata
Citations
Tested scripts on a sample of the 2.6 million decisions
Plan to scale the full data extraction in the cloud
Data representation
Properties?
Entities?
Relations?
Data representation (common terms)
HCLS Dataset Descriptions
Bioschemas.org
PROV-O
Dublin Core
PAV ontology
Ontologies / Controlled Vocabulary (Community maintained)
EU Vocabularies (EuroVoc)
Common Data Model (CDM) ontology
Data representation (global identifiers)
62014CJ0587 ?
IW/2 1968/2 ?
Case C-16/18 ?
Identifiers for cases can change based on organisation (court) or database
ECLI:NL:CRVB:2014:952
ECLI (European Case Law Identifier) : Country : Court : Year : ID
Adopt the ECLI convention (uniquely identifies cases on EU level across organisations and databases)
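To illustrate how an ECLI splits into the country, court, year and ID parts shown above, here is a minimal parsing sketch. The regular expression is a simplified approximation of the ECLI format, and the names `Ecli` and `parse_ecli` are hypothetical; this is not the project's own code.

```python
import re
from typing import NamedTuple, Optional

# Simplified approximation of the ECLI layout:
# ECLI : two-letter country code : court code : four-digit year : ordinal ID
ECLI_PATTERN = re.compile(
    r"^ECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z0-9]{1,7})"
    r":(?P<year>\d{4}):(?P<ordinal>[A-Z0-9.]{1,25})$"
)


class Ecli(NamedTuple):
    country: str
    court: str
    year: int
    ordinal: str


def parse_ecli(identifier: str) -> Optional[Ecli]:
    """Split an ECLI string into its parts, or return None if it does not match."""
    match = ECLI_PATTERN.match(identifier.strip().upper())
    if match is None:
        return None
    return Ecli(match["country"], match["court"], int(match["year"]), match["ordinal"])


if __name__ == "__main__":
    print(parse_ecli("ECLI:NL:CRVB:2014:952"))
    # Database-specific identifiers such as a CELEX number do not parse as ECLI.
    print(parse_ecli("62014CJ0587"))
```

A check like this can flag records that still carry a court-specific or database-specific identifier, so they can be mapped to an ECLI before publication.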
Data representation (multiple formats)
Diagram labels: entity, attribute, relation, type, instance
Publish our data in both relational AND graph database formats
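As a rough illustration of how the same record can live in both a relational and a graph form, the sketch below turns one table row into subject-predicate-object triples. The column names, the `ex:` predicates and every value other than the ECLI from the earlier slide are made-up placeholders, not the project's schema.

```python
def row_to_triples(row):
    """Convert one relational row (a dict) into (subject, predicate, object) triples."""
    subject = row["ecli"]  # the globally unique identifier becomes the graph node
    triples = [(subject, "rdf:type", "ex:CourtDecision")]

    # Single-valued columns become attribute triples.
    for column in ("court", "publication_date"):
        if row.get(column) is not None:
            triples.append((subject, f"ex:{column}", row[column]))

    # A multi-valued column becomes one relation triple per cited case.
    for cited in row.get("cited_cases", []):
        triples.append((subject, "ex:cites", cited))
    return triples


if __name__ == "__main__":
    # Placeholder row for illustration only.
    row = {
        "ecli": "ECLI:NL:CRVB:2014:952",
        "court": "placeholder court",
        "publication_date": "2000-01-01",
        "cited_cases": ["placeholder-case-1", "placeholder-case-2"],
    }
    for triple in row_to_triples(row):
        print(triple)
```

In the relational form, citations usually need a separate link table; in the graph form, each citation is simply another edge, which is what makes citation-network analysis convenient.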
Legal Knowledge Graph (long term vision)
Findability & Accessibility
Repositories vary according to the kinds of data they accept, how much free
storage they offer and some added features
Next steps
● Extract all citations & metadata for 2.6 million court decisions
● Convert information to graph (RDF) format - Data2Services pipeline
● Publish data in FAIR supporting repositories (Zenodo and OSF)
Lessons learned
● FAIR is not binary (your data is not either FAIR or not FAIR)
● FAIR != open
● A little FAIRness goes a long way
● Findability and accessibility were easier for us
● Interoperability and reusability can be a challenge when there
are few standards in your community
● Steps for making data FAIR may vary depending on the nature
of the project and the data
Thank you!
@MoodleyKody