Evolution Management for Preservation
PRELIDA Consolidation Workshop, 17.10.2014
Giorgos Flouris (FORTH)
[email protected]
Evolution Management Problem
Preservation ↔ Evolution
Change Detection
• Change detection for evolution management
  – Identifying changes between versions
• Challenges (in DIACHRON)
  1. Diverse data models
  2. Dynamic datasets
  3. Recoverable versions
  4. Changes as first-class citizens
  5. Cross-snapshot queries
Evolution in DIACHRON
[Figure: Pilot dataset and DIACHRON, Version 1 and Version 2]
Change Types: Motivation
What a naïve diff will report
Add (Rec, diachron:subject, EFO_001927)
Add (Rec, diachron:hasRecordAttribute, rAtt1)
Add (rAtt1, diachron:predicate, rdfs:subClassOf)
Add (rAtt1, diachron:object, ObsoleteClass)
What the pilot expects
Add_SuperClass (EFO_001927, ObsoleteClass)
Change Hierarchy: Low-level (1/3)
• Low-level changes
  – DIACHRON model, for internal use
  – Fixed: Add, Delete
  – Just additions and deletions of triples
  – Simple set difference
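
The "simple set difference" above can be sketched in a few lines of Python. This is only an illustration, not the DIACHRON implementation: the `Triple` type, the `low_level_delta` function, and the `rdfs:label` example triple are assumptions made for the example.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A hypothetical in-memory triple (not a DIACHRON class)."""
    subject: str
    predicate: str
    obj: str


def low_level_delta(
    old: Iterable[Triple], new: Iterable[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted) triples between two versions.

    Added triples are those present only in the new version,
    deleted triples are those present only in the old version.
    """
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


# Hypothetical example data: version 2 gains one triple.
v1 = {Triple("EFO_001927", "rdfs:label", "example label")}
v2 = v1 | {Triple("EFO_001927", "rdfs:subClassOf", "ObsoleteClass")}
added, deleted = low_level_delta(v1, v2)
```

Because the delta is a pure set operation, it does not depend on any vocabulary, which fits the slide's point that low-level changes are fixed.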
Change Hierarchy: Simple (2/3)
• Pilot terminology:
  – Add_SuperClass, Add_Dimension
• Fixed, pre-defined
• Composed of low-level changes
• Partitioning is perfect
  – Complete and unambiguous
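
As a rough sketch of how a simple change groups low-level changes, the following Python matches the four `Add` triples from the motivation slide and reports one `Add_SuperClass`. The slides state that detection is done with SPARQL queries; this in-memory matcher is a stand-in, and every name in it (`Triple`, `AddSuperClass`, `detect_add_superclass`) is hypothetical. It also assumes each subject carries at most one triple per predicate, which holds for the example but not necessarily in general.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class AddSuperClass(NamedTuple):
    subclass: str
    superclass: str


def detect_add_superclass(
    added: Iterable[Triple],
) -> Tuple[List[AddSuperClass], Set[Triple]]:
    """Group added triples into Add_SuperClass changes.

    Returns the detected changes and the added triples left unconsumed.
    An empty remainder means the low-level additions were fully
    partitioned by the detected simple changes.
    """
    added = set(added)
    # Toy index: assumes one added triple per (subject, predicate).
    index = {(t.s, t.p): t for t in added}
    changes: List[AddSuperClass] = []
    consumed: Set[Triple] = set()
    for link in sorted(added):
        if link.p != "diachron:hasRecordAttribute":
            continue
        subj = index.get((link.s, "diachron:subject"))
        pred = index.get((link.o, "diachron:predicate"))
        obj = index.get((link.o, "diachron:object"))
        if subj is None or pred is None or obj is None:
            continue
        if pred.o != "rdfs:subClassOf":
            continue
        changes.append(AddSuperClass(subj.o, obj.o))
        consumed |= {link, subj, pred, obj}
    return changes, added - consumed


# The four additions a naïve diff reports in the motivation slide.
naive_diff = [
    Triple("Rec", "diachron:subject", "EFO_001927"),
    Triple("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    Triple("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    Triple("rAtt1", "diachron:object", "ObsoleteClass"),
]
simple_changes, leftover = detect_add_superclass(naive_diff)
```

Returning the unconsumed remainder makes the completeness half of "perfect partitioning" checkable: after all simple-change detectors have run, nothing should be left over.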
Change Hierarchy: Complex (3/3)
• Pilot terminology:
  – Add_Synonym, Mark_As_Obsolete
• Totally custom, pilot-specific (defined at run-time)
Using Changes for Evolution Management
• DIACHRON data model contains all versions
• Detection based on SPARQL queries
  – Provided at deployment time (for simple)
  – Generated at creation time (for complex)
• Recoverability
  – Allows moving back and forth between versions
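
One way to picture recoverability is to apply a low-level delta forwards or backwards. The sketch below is an assumption about how that could work on plain Python sets of triples; the function names are hypothetical and the slides do not say DIACHRON does it this way.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def delta(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Low-level delta as (added, deleted) triples."""
    return new - old, old - new


def roll_forward(
    version: Set[Triple], added: Set[Triple], deleted: Set[Triple]
) -> Set[Triple]:
    """Move from the old version to the new one."""
    return (version - deleted) | added


def roll_back(
    version: Set[Triple], added: Set[Triple], deleted: Set[Triple]
) -> Set[Triple]:
    """Move from the new version back to the old one."""
    return (version - added) | deleted
```

With a delta computed by set difference, `roll_forward(v1, *delta(v1, v2))` yields `v2` and `roll_back(v2, *delta(v1, v2))` yields `v1`, so storing one version plus the deltas is enough to reach the other.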
Representation Requirements
• Interesting queries
  – Return the simple changes that dataset X underwent between versions V1 and V2
  – Return the changes that resource X underwent in the first semester of 2014
  – Give me all resources of type X that underwent change Y
  – Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2
• Access to both the changes and the data is required
  – Changes are first-class citizens
  – Allowing preservation
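
To show why such queries need both the changes and the data, here is a minimal Python sketch in which changes are stored as records next to the data. It is a plain in-memory stand-in, not SPARQL and not the DIACHRON model; `ChangeRecord`, both functions, and the `owl:Class` type assignments are assumptions made for the example.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ChangeRecord(NamedTuple):
    """A hypothetical first-class change record."""
    change_type: str
    old_version: str
    new_version: str
    params: Tuple[str, ...]


def changes_between(
    changes: Iterable[ChangeRecord], old_version: str, new_version: str
) -> List[ChangeRecord]:
    """Changes recorded between two given versions."""
    return [
        c for c in changes
        if c.old_version == old_version and c.new_version == new_version
    ]


def resources_with_change(
    changes: Iterable[ChangeRecord],
    resource_types: Dict[str, str],
    wanted_type: str,
    wanted_change: str,
) -> List[str]:
    """Resources of a given type that appear in a given kind of change.

    The change records supply the "underwent change Y" part and the
    data (here a resource -> type mapping) supplies the "of type X" part.
    """
    hits = set()
    for c in changes:
        if c.change_type != wanted_change:
            continue
        for resource in c.params:
            if resource_types.get(resource) == wanted_type:
                hits.add(resource)
    return sorted(hits)


# Hypothetical example data.
changes = [
    ChangeRecord("Add_SuperClass", "V1", "V2", ("EFO_001927", "ObsoleteClass")),
]
resource_types = {"EFO_001927": "owl:Class", "ObsoleteClass": "owl:Class"}
```

Neither input alone answers the third query in the list: the type filter needs the data and the change filter needs the change records.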
[Figure: DIACHRON data and the Changes Ontology, drawn at a schema level and a data level. Labels: Change, Simple_Change, Complex_Change, Add_SuperClass, Add_Synonym, prov:Activity, diachron:Entity, old_version, new_version, C1, V1, V2, asc_p1, asc_p2, EFO_001927, ObsoleteClass]
Conclusion
• Main DIACHRON message
  – (Linked) data preservation is related to evolution management
• DIACHRON challenges
  1. Diverse data models
  2. Dynamic datasets
  3. Recoverable versions
  4. Changes as first-class citizens
  5. Cross-snapshot queries
• Solutions
  – DIACHRON data model (#1)
  – Appropriate change definition and detection (#2, #3)
  – Changes and data represented at the same level (#4, #5)