Evolution Management for Preservation
PRELIDA Consolidation Workshop 17.10.2014
Giorgos Flouris (FORTH) fgeo@ics.forth.gr
Evolution Management Problem

Preservation ↔ Evolution

Change Detection

• Change detection for evolution management
  – Identifying changes between versions

• Challenges (in DIACHRON)
  1. Diverse data models
  2. Dynamic datasets
  3. Recoverable versions
  4. Changes as first-class citizens
  5. Cross-snapshot queries

Evolution in DIACHRON

[Figure: pilot dataset and DIACHRON, Version 1 and Version 2]

Change Types: Motivation

What a naïve diff will report

Add (Rec, diachron:subject, EFO_001927)
Add (Rec, diachron:hasRecordAttribute, rAtt1)
Add (rAtt1, diachron:predicate, rdfs:subClassOf)
Add (rAtt1, diachron:object, ObsoleteClass)

What the pilot expects

Add_SuperClass (EFO_001927, ObsoleteClass)
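The gap between the two views can be made concrete with a small sketch that folds the four low-level additions above back into the single change the pilot expects. It assumes triples are plain Python tuples of strings and that an added superclass appears in the DIACHRON model exactly as the record pattern above (a record whose attribute has predicate rdfs:subClassOf). The helper name `detect_add_superclass` is hypothetical and not part of DIACHRON.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def detect_add_superclass(added: Set[Triple]) -> List[Triple]:
    """Fold low-level additions into Add_SuperClass changes (hypothetical helper).

    Assumed pattern: rec diachron:subject S . rec diachron:hasRecordAttribute a .
    a diachron:predicate rdfs:subClassOf . a diachron:object O .
    """
    changes: List[Triple] = []
    for rec, pred, subj in added:
        if pred != "diachron:subject":
            continue
        # Record attributes attached to this record in the same diff
        attrs = [o for (s, p, o) in added
                 if s == rec and p == "diachron:hasRecordAttribute"]
        for attr in attrs:
            # Only attributes encoding an rdfs:subClassOf statement qualify
            if (attr, "diachron:predicate", "rdfs:subClassOf") not in added:
                continue
            for (s, p, obj) in added:
                if s == attr and p == "diachron:object":
                    changes.append(("Add_SuperClass", subj, obj))
    return changes


naive_diff = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}
print(detect_add_superclass(naive_diff))
# [('Add_SuperClass', 'EFO_001927', 'ObsoleteClass')]
```

Four triples in, one domain-level change out: this is the translation a naïve diff leaves to the user.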

Change Hierarchy: Low-level (1/3)

• Low-level changes
  – DIACHRON model, for internal use
  – Fixed: Add, Delete
  – Just additions and deletions of triples
  – Simple set difference
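Since low-level changes are just a set difference, they can be sketched in a few lines. This assumes each version is held as an in-memory set of (subject, predicate, object) tuples; the function name `low_level_delta` and the toy triples are illustrative, not DIACHRON code or pilot data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def low_level_delta(v1: Set[Triple], v2: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted): the low-level Add and Delete changes from v1 to v2."""
    added = v2 - v1    # triples only in the new version -> Add
    deleted = v1 - v2  # triples only in the old version -> Delete
    return added, deleted


# Toy versions (illustrative triples, not pilot data)
v1 = {("EFO_001927", "rdfs:label", "old label"),
      ("EFO_001927", "rdf:type", "owl:Class")}
v2 = {("EFO_001927", "rdfs:label", "new label"),
      ("EFO_001927", "rdf:type", "owl:Class")}

added, deleted = low_level_delta(v1, v2)
print(added)    # {('EFO_001927', 'rdfs:label', 'new label')}
print(deleted)  # {('EFO_001927', 'rdfs:label', 'old label')}
```

Because the output is only two sets of triples, a single label edit surfaces as an unrelated Delete and Add; grouping them is left to the higher levels of the hierarchy.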

Change Hierarchy: Simple (2/3)

• Pilot terminology:
  – Add_SuperClass
  – Add_Dimension

• Fixed, pre-defined

• Composed of low-level changes

• Partitioning is perfect
  – Complete and unambiguous (every low-level change belongs to exactly one simple change)
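The "perfect partitioning" property can be phrased as a check over detection results. This sketch assumes each detected simple change records the set of low-level changes it consumed. Completeness then means every low-level change is consumed by some simple change, and unambiguity means none is consumed twice. All identifiers (`check_partition`, `add_t1`, `sc1`, ...) are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def check_partition(low_level: Set[str],
                    detected: Dict[str, FrozenSet[str]]) -> Tuple[bool, bool]:
    """Return (complete, unambiguous) for a set of detected simple changes.

    complete:    every low-level change is consumed by some simple change
    unambiguous: no low-level change is consumed by more than one simple change
    """
    uses = Counter(c for consumed in detected.values() for c in consumed)
    complete = all(uses[c] >= 1 for c in low_level)
    unambiguous = all(n == 1 for n in uses.values())
    return complete, unambiguous


# Hypothetical low-level change IDs and simple-change IDs
low = {"add_t1", "add_t2", "del_t3"}
good = {"sc1": frozenset({"add_t1", "add_t2"}), "sc2": frozenset({"del_t3"})}
overlap = {"sc1": frozenset({"add_t1", "add_t2"}), "sc2": frozenset({"add_t2", "del_t3"})}
missing = {"sc1": frozenset({"add_t1", "add_t2"})}

print(check_partition(low, good))     # (True, True)
print(check_partition(low, overlap))  # (True, False)
print(check_partition(low, missing))  # (False, True)
```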

Change Hierarchy: Complex (3/3)

• Pilot terminology:
  – Add_Synonym
  – Mark_As_Obsolete

• Fully custom, pilot-specific (defined at run-time)
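One way to picture "defined at run-time" is a registry of rules over already-detected simple changes, filled in by the pilot after deployment. This is a sketch under assumptions: the registry, `register`, `detect_complex`, and especially the definition of Mark_As_Obsolete as Add_SuperClass(x, ObsoleteClass) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Change = Tuple[str, ...]  # e.g. ("Add_SuperClass", "EFO_001927", "ObsoleteClass")
Rule = Callable[[List[Change]], List[Change]]

# Registry populated at run time (hypothetical, not the DIACHRON API)
COMPLEX_RULES: Dict[str, Rule] = {}


def register(name: str, rule: Rule) -> None:
    """Add a user-defined complex change type."""
    COMPLEX_RULES[name] = rule


def detect_complex(simple: List[Change]) -> List[Change]:
    """Run every registered rule over the detected simple changes."""
    found: List[Change] = []
    for rule in COMPLEX_RULES.values():
        found.extend(rule(simple))
    return found


def mark_as_obsolete(simple: List[Change]) -> List[Change]:
    # Assumed definition, for illustration only:
    # Mark_As_Obsolete(x) <=> Add_SuperClass(x, ObsoleteClass)
    return [("Mark_As_Obsolete", c[1]) for c in simple
            if c[0] == "Add_SuperClass" and c[2] == "ObsoleteClass"]


register("Mark_As_Obsolete", mark_as_obsolete)
print(detect_complex([("Add_SuperClass", "EFO_001927", "ObsoleteClass"),
                      ("Add_SuperClass", "ClassA", "ClassB")]))
# [('Mark_As_Obsolete', 'EFO_001927')]
```

Keeping rules as data lets a pilot add change types without touching the detector. According to the slides, DIACHRON itself instead generates a SPARQL detection query when a complex change is created.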

Using Changes for Evolution Management

• DIACHRON data model contains all versions

• Detection based on SPARQL queries
  – Provided at deployment time (for simple changes)
  – Generated at creation time (for complex changes)

• Recoverability
  – Allows moving back and forth between versions
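Recoverability can be illustrated with the low-level delta alone: if `added = V2 - V1` and `deleted = V1 - V2`, then applying the delta rebuilds V2 from V1, and inverting it rebuilds V1 from V2. This is a minimal sketch assuming versions are sets of triples. It shows the set algebra, not how DIACHRON actually stores its versions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def forward(v1: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Rebuild the newer version from the older one and the delta."""
    return (v1 - deleted) | added


def backward(v2: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Rebuild the older version from the newer one by inverting the delta."""
    return (v2 - added) | deleted


# Toy versions (illustrative triples)
v1 = {("A", "rdfs:subClassOf", "B"), ("A", "rdfs:label", "a")}
v2 = {("A", "rdfs:subClassOf", "B"), ("A", "rdfs:subClassOf", "ObsoleteClass")}
added, deleted = v2 - v1, v1 - v2  # low-level Add / Delete sets

print(forward(v1, added, deleted) == v2)   # True
print(backward(v2, added, deleted) == v1)  # True
```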

Representation Requirements

• Interesting queries
  – Return the simple changes that dataset X underwent between versions V1 and V2
  – Return the changes that resource X underwent in the first semester of 2014
  – Return all resources of type X that underwent change Y
  – Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2

• Access to both the changes and the data is required
  – Changes are first-class citizens
  – This enables preservation
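Queries such as "all resources of type X that underwent change Y" need a join between changes and data, which becomes straightforward once both are stored at the same level. The sketch below keeps both in one in-memory triple set and answers two of the queries above. The property names reuse labels from the DIACHRON ontology slide (old_version, new_version, asc_p1, asc_p2), but their exact semantics, and the choice of asc_p1 as the changed resource, are assumptions. All function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# One store holding both data and change records (toy content). Property names
# reuse labels from the DIACHRON ontology slide; their semantics are assumed.
store: Set[Triple] = {
    ("EFO_001927", "rdf:type", "owl:Class"),
    ("C1", "rdf:type", "Add_SuperClass"),
    ("C1", "old_version", "V1"),
    ("C1", "new_version", "V2"),
    ("C1", "asc_p1", "EFO_001927"),
    ("C1", "asc_p2", "ObsoleteClass"),
}


def objects(subject: str, prop: str) -> Set[str]:
    """All o such that (subject, prop, o) is in the store."""
    return {o for (s, p, o) in store if s == subject and p == prop}


def changes_between(old: str, new: str) -> Set[str]:
    """Change records linking version `old` to version `new`."""
    return {s for (s, p, o) in store
            if p == "old_version" and o == old and new in objects(s, "new_version")}


def resources_with_change(rtype: str, ctype: str, param: str = "asc_p1") -> Set[str]:
    """Resources of type `rtype` that appear as `param` of a change of type `ctype`."""
    changes = {s for (s, p, o) in store if p == "rdf:type" and o == ctype}
    touched = {r for c in changes for r in objects(c, param)}
    # Join with the data: keep only resources of the requested type
    return {r for r in touched if rtype in objects(r, "rdf:type")}


print(changes_between("V1", "V2"))                           # {'C1'}
print(resources_with_change("owl:Class", "Add_SuperClass"))  # {'EFO_001927'}
```

In a triple store, the same joins would be expressed as SPARQL graph patterns over a single dataset.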

DIACHRON

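[Figure: DIACHRON data and changes ontology, split into a schema level and a data level. Change classes: prov:Activity, Change, Simple_Change, Complex_Change, Add_SuperClass, Add_Synonym; also diachron:Entity. Example: change C1 (Add_SuperClass) with parameters asc_p1 and asc_p2 (EFO_001927, ObsoleteClass) and properties old_version and new_version (V1, V2). …]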

Conclusion

• Main DIACHRON message
  – (Linked) data preservation is related to evolution management

• DIACHRON challenges
  1. Diverse data models
  2. Dynamic datasets
  3. Recoverable versions
  4. Changes as first-class citizens
  5. Cross-snapshot queries

• Solutions
  – DIACHRON data model (#1)
  – Appropriate change definition and detection (#2, #3)
  – Changes and data represented at the same level (#4, #5)