23/02/2009 Giorgos Flouris 1
TAPP-09
On Explicit Provenance Management in RDF/S Graphs
Institute of Computer Science Foundation for Research and Technology – Hellas
Heraklion, Greece
Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides
{pped, fgeo, fundul, christop}@ics.forth.gr
Provenance Management in RDF/S
Provenance management problem
  - Mostly addressed in the database context
  - We are dealing with why provenance in RDF/S graphs
      - Why provenance: identifying the source data that had some influence on the existence of the target data
Three main characteristics (peculiarities of RDF/S)
  - Triple-based representation
      - Use quadruples to talk about triples' provenance
  - Inference
      - Assign provenance information to implicit data
  - Coherence semantics (in updates)
      - Implicit data is a first-class citizen and should be retained during change, along with its provenance information
Characteristic #1: Triple-based Representation
RDF Graphs
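Figure: example RDF graph with the individuals Paper10 and Giorgos, the classes PaperTAPP, Paper, Author and Person, and the property writes; edges are labelled instance (rdf:type) and subclass (rdfs:subClassOf).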
RDF graph = set of RDF triples
Define classes
  [Paper rdf:type rdfs:Class]
  [PaperTAPP rdf:type rdfs:Class]
  [Person rdf:type rdfs:Class]
  [Author rdf:type rdfs:Class]
Define properties
  [writes rdf:type rdf:Property]
  [writes rdfs:domain Author]
  [writes rdfs:range Paper]
Instantiate (and define) individuals
  [Paper10 rdf:type PaperTAPP]
  [Giorgos rdf:type Author]
  [Giorgos writes Paper10]
Define hierarchies
  [PaperTAPP rdfs:subClassOf Paper]
  [Author rdfs:subClassOf Person]
And other stuff…
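
The "RDF graph = set of RDF triples" view maps directly onto a set of tuples. The following is a minimal Python sketch, not part of the original slides; the helper name instances_of is hypothetical, and URIs are abbreviated to plain strings for readability.

```python
# An RDF graph modelled as a set of (subject, predicate, object) tuples.
# The triples are the ones listed on the slide; strings stand in for URIs.
graph = {
    # classes
    ("Paper", "rdf:type", "rdfs:Class"),
    ("PaperTAPP", "rdf:type", "rdfs:Class"),
    ("Person", "rdf:type", "rdfs:Class"),
    ("Author", "rdf:type", "rdfs:Class"),
    # properties
    ("writes", "rdf:type", "rdf:Property"),
    ("writes", "rdfs:domain", "Author"),
    ("writes", "rdfs:range", "Paper"),
    # individuals
    ("Paper10", "rdf:type", "PaperTAPP"),
    ("Giorgos", "rdf:type", "Author"),
    ("Giorgos", "writes", "Paper10"),
    # hierarchies
    ("PaperTAPP", "rdfs:subClassOf", "Paper"),
    ("Author", "rdfs:subClassOf", "Person"),
}


def instances_of(graph, cls):
    """Return the subjects explicitly typed as cls (hypothetical helper, no inference)."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


authors = instances_of(graph, "Author")
```

Because the graph is a set, adding a triple twice has no effect, which matches the set-based definition on the slide.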
Provenance in RDF Graphs
Figure: the example RDF graph split into the Publications Graph (PUB) and the TAPP Graph (TAPP).
PUB: [Paper rdf:type rdfs:Class]
TAPP: [PaperTAPP rdf:type rdfs:Class]
PUB: [Person rdf:type rdfs:Class]
PUB: [Author rdf:type rdfs:Class]
PUB: [writes rdf:type rdf:Property]
PUB: [writes rdfs:domain Author]
PUB: [writes rdfs:range Paper]
TAPP: [Paper10 rdf:type PaperTAPP]
TAPP: [Giorgos rdf:type Author]
TAPP: [Giorgos writes Paper10]
TAPP: [PaperTAPP rdfs:subClassOf Paper]
PUB: [Author rdfs:subClassOf Person]
Named Graphs and Provenance
Create two named graphs and assign an ID (URI) to each
  - Publications graph (URI: PUB)
  - TAPP graph (URI: TAPP)
Each named graph corresponds to a different source
Need some method to associate named graphs with triples
  - Triples become quadruples
  - Fourth element is the URI of the named graph (origin)
Quadruples for Provenance
[Paper rdf:type rdfs:Class PUB]
[PaperTAPP rdf:type rdfs:Class TAPP]
[Person rdf:type rdfs:Class PUB]
[Author rdf:type rdfs:Class PUB]
[writes rdf:type rdf:Property PUB]
[writes rdfs:domain Author PUB]
[writes rdfs:range Paper PUB]
[Paper10 rdf:type PaperTAPP TAPP]
[Giorgos rdf:type Author TAPP]
[Giorgos writes Paper10 TAPP]
[PaperTAPP rdfs:subClassOf Paper TAPP]
[Author rdfs:subClassOf Person PUB]
All quadruples of the form [s p o PUB] originate from named graph PUB (Publications graph)
All quadruples of the form [s p o TAPP] originate from named graph TAPP (TAPP graph)
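
As an illustration of how the fourth element recovers the origin, here is a small Python sketch (an assumption of this transcript, not the authors' implementation) that stores the quadruples above as 4-tuples and groups their triples by named graph. The function name triples_by_graph is hypothetical.

```python
from collections import defaultdict

# The quadruples from the slide as (subject, predicate, object, named_graph).
quads = [
    ("Paper", "rdf:type", "rdfs:Class", "PUB"),
    ("PaperTAPP", "rdf:type", "rdfs:Class", "TAPP"),
    ("Person", "rdf:type", "rdfs:Class", "PUB"),
    ("Author", "rdf:type", "rdfs:Class", "PUB"),
    ("writes", "rdf:type", "rdf:Property", "PUB"),
    ("writes", "rdfs:domain", "Author", "PUB"),
    ("writes", "rdfs:range", "Paper", "PUB"),
    ("Paper10", "rdf:type", "PaperTAPP", "TAPP"),
    ("Giorgos", "rdf:type", "Author", "TAPP"),
    ("Giorgos", "writes", "Paper10", "TAPP"),
    ("PaperTAPP", "rdfs:subClassOf", "Paper", "TAPP"),
    ("Author", "rdfs:subClassOf", "Person", "PUB"),
]


def triples_by_graph(quads):
    """Group triples by the named graph (origin) in the fourth position."""
    grouped = defaultdict(set)
    for s, p, o, g in quads:
        grouped[g].add((s, p, o))
    return dict(grouped)


by_graph = triples_by_graph(quads)
```

Dropping the fourth element of every quadruple gives back the plain RDF graph; grouping on it gives back the two sources.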
Properties of Named Graphs
The named graph URI can be used to refer to the named graph
  - Can be used for assignment of metadata
      - [TAPP hasAuthor JamesCheney G]
Granularity of provenance
  - A triple is the smallest bit of information
  - The granularity of provenance achieved by named graphs is at the triple level
  - Flexible
      - A named graph can contain 0, 1, or many triples
      - A triple can belong to 0, 1, or many named graphs
Characteristic #2: Inference
RDF/S Graphs
RDF Schema: add-on to RDF
RDFS adds inference semantics
  - Transitivity of subclass/subproperty
  - Implicit instantiations
Example
  - [Giorgos rdf:type Author]
  - [Author rdfs:subClassOf Person]
  - Inference: [Giorgos rdf:type Person]
Inferred knowledge is implicit
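
The two inference patterns named above can be sketched as a small fixpoint computation in Python. This is a simplified illustration covering only subclass transitivity and type inheritance, not the full RDFS entailment regime; the function name rdfs_closure is hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Repeatedly apply two rules until nothing new is derived:
    (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    (x type a), (a subClassOf b)       => (x type b)
    """
    closure = set(triples)
    while True:
        sub = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        typ = [(s, o) for (s, p, o) in closure if p == TYPE]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for x, c in typ:
            for a, b in sub:
                if c == a:
                    new.add((x, TYPE, b))
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new


explicit = {
    ("Giorgos", TYPE, "Author"),
    ("Author", SUBCLASS, "Person"),
    ("Paper10", TYPE, "PaperTAPP"),
    ("PaperTAPP", SUBCLASS, "Paper"),
}
implicit = rdfs_closure(explicit) - explicit
```

Here implicit holds exactly the knowledge that is inferred but never stated, such as [Giorgos rdf:type Person].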
Provenance and Inference
Quadruples:
  - [Giorgos rdf:type Author TAPP]
  - [Author rdfs:subClassOf Person PUB]
  - [Giorgos rdf:type Person ???]
Needs:
  - Shared ownership
  - A more sophisticated, compound structure
  - Keeping the connection with the components
  - Composition operator (PT=PUB●TAPP)
      - [Giorgos rdf:type Person PT]
      - Ok, but see characteristic #3
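
One way to picture the composition operator is as a union of origins. The slides do not define ● formally, so the Python sketch below is an assumption: origins are modelled as frozensets of named graph URIs, compose is a hypothetical stand-in for ●, and the origins follow the quadruple listing on the "Quadruples for Provenance" slide. Only the type-inheritance rule is applied, in a single step.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def compose(*origins):
    """Hypothetical stand-in for the composition operator: union of origins."""
    return frozenset().union(*origins)


# Explicit triples mapped to their origin (a set of named graph URIs).
explicit = {
    ("Giorgos", TYPE, "Author"): frozenset({"TAPP"}),
    ("Author", SUBCLASS, "Person"): frozenset({"PUB"}),
}


def infer_types(quads):
    """Derive (x type b) from (x type a) and (a subClassOf b),
    giving the result the composed origin of both premises."""
    inferred = {}
    for (x, p, c), g1 in quads.items():
        if p != TYPE:
            continue
        for (a, q, b), g2 in quads.items():
            if q == SUBCLASS and a == c:
                inferred[(x, TYPE, b)] = compose(g1, g2)
    return inferred


inferred = infer_types(explicit)
```

In this model the inferred triple [Giorgos rdf:type Person] carries the shared origin {PUB, TAPP} and keeps the connection with both components.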
Characteristic #3: Coherence Semantics (in Updates)
Foundational Semantics
Foundational viewpoint (pyramid):
  - Knowledge consists of the explicitly represented knowledge
  - Only explicit knowledge can be changed
  - Implicit knowledge is affected indirectly, through the changes in the explicit knowledge (so that the resulting “pyramid” is “stable”)
  - Explicit knowledge is more important than implicit knowledge
Figure: pyramid with the labels Basic Knowledge, Supported Knowledge, Explicit Knowledge and Implicit Knowledge.
Coherence Semantics
Coherence viewpoint (raft):
  - No discrimination between explicit and implicit knowledge
  - Both explicit and implicit knowledge can be changed
  - Changes should be made coherently in order for the resulting knowledge to make sense (so that the “raft” is “stable”)
  - Explicit and implicit knowledge are of the same value
Figure: raft labelled Knowledge (includes both implicit and explicit knowledge).
Deletes
Under coherence semantics
  - Inferred knowledge needs to be made explicit (when in danger of being lost)
  - Explicit assignment of shared origin to triples
Explicit shared origin assignment
  - Cannot use any composition operator
  - Must be a first-class construct (autonomous)
  - Retain the connection with its constituents
A need, but also a useful feature
RDF/S Graphsets
Graphsets are like named graphs
  - Have IDs (URIs)
  - Used in quadruples
      - Association of triples with graphsets: [Giorgos rdf:type Person PT]
      - Can be referred to (metadata): [PT rdf:type Confidential G]
Encode origin or shared origin
  - [Giorgos rdf:type Person PT]
  - URI association (via skolem function)
      - PT is the URI of {PUB, TAPP}
      - PUB is the URI of {PUB}
  - A named graph is a graphset
      - PUB corresponds to {PUB}
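
The URI association can be pictured as a deterministic function from a set of named graphs to an identifier. The Python sketch below is an illustration under assumptions: the class GraphsetRegistry, its methods and the "gs:" naming scheme are all hypothetical (the slides simply use PT for {PUB, TAPP}). A singleton set maps to the named graph's own URI, reflecting that a named graph is a graphset.

```python
class GraphsetRegistry:
    """Hypothetical registry mapping sets of named graphs to graphset URIs."""

    def __init__(self):
        self._members = {}  # graphset URI -> frozenset of named graph URIs

    def uri_for(self, named_graphs):
        """Skolem-style function: the same set always yields the same URI."""
        members = frozenset(named_graphs)
        if not members:
            raise ValueError("a graphset needs at least one named graph")
        if len(members) == 1:
            # A named graph is itself a graphset: {PUB} is identified by PUB.
            uri = next(iter(members))
        else:
            # Assumed naming scheme; sorting makes the URI order-independent.
            uri = "gs:" + "+".join(sorted(members))
        self._members[uri] = members
        return uri

    def members_of(self, uri):
        """Retain the connection between a graphset and its constituents."""
        return self._members[uri]


registry = GraphsetRegistry()
pt = registry.uri_for(["TAPP", "PUB"])
pub = registry.uri_for(["PUB"])
```

Unlike a composition operator evaluated on demand, the graphset URI is an autonomous identifier that can appear in quadruples and be the subject of metadata, while members_of keeps the link to its constituents.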
Figure: the example RDF graph, annotated with graphset PT.
Querying With RDF/S Graphsets
Standard queries (original RQL)
  - Give me the Persons: [Giorgos]
Provenance queries (extended RQL)
  - Give me the Persons per {PUB}: [ ]
  - Give me the Persons per {TAPP, PUB}: [Giorgos]
  - Give me the sources per which Author is a subclass of Person: [{PUB}]
  - Give me all the individual sources: [{TAPP}, {PUB}]
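
The intent of these queries can be mimicked in plain Python over a small quadruple store. This is not RQL syntax and not the authors' implementation; the function names are hypothetical, graphsets are modelled as frozensets, and the inferred triple [Giorgos rdf:type Person] is assumed to be already present with graphset {PUB, TAPP}.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

PUB = frozenset({"PUB"})
TAPP = frozenset({"TAPP"})
PT = PUB | TAPP  # graphset {PUB, TAPP}

# (subject, predicate, object, graphset)
quads = {
    ("Giorgos", TYPE, "Author", TAPP),
    ("Author", SUBCLASS, "Person", PUB),
    ("Giorgos", TYPE, "Person", PT),
}


def instances_of(cls, per=None):
    """Instances of cls; if per is given, only those recorded with that graphset."""
    return sorted({s for (s, p, o, g) in quads
                   if p == TYPE and o == cls and (per is None or g == per)})


def sources_of(s, p, o):
    """Graphsets per which the triple (s, p, o) holds."""
    return [g for (s2, p2, o2, g) in quads if (s2, p2, o2) == (s, p, o)]


def individual_sources():
    """All named graphs mentioned in any graphset, each as a singleton graphset."""
    return {frozenset({m}) for (_, _, _, g) in quads for m in g}


persons = instances_of("Person")
persons_per_pub = instances_of("Person", per=PUB)
persons_per_pt = instances_of("Person", per=PT)
```

The per-graphset filter matches the graphset exactly, which is why the Persons per {PUB} come back empty while the Persons per {TAPP, PUB} include Giorgos.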
Validity and Redundancy Elimination
Two invariants for RDF/S graphs
  - Valid (per some validity rules)
  - Redundancy-free (space considerations)
The invariants allow optimized execution of queries
These invariants are imposed during change
  - Improve query speed, but make updates more difficult
  - Trade-off between having query overhead or update overhead
Updating With RDF/S Graphsets
Updates supported through an extended version of RUL
  - INSERT and DELETE
  - Only for data (class and property instances)
  - Implicit or explicit knowledge
  - Take into account and update graphset (provenance) information
Main considerations
  - Apply the change (INSERT or DELETE)
  - Respect invariants
      - Non-redundancy (INSERT) and validity (DELETE)
  - Make minimal changes (under coherence viewpoint)
      - No unnecessary loss of information
  - Take into account and preserve graphset (provenance) information
      - Applicable upon quadruples
Conclusion
Objective: assign provenance information to RDF/S graphs to capture why provenance
  - Triple-based representation
      - Turned triples into quadruples and used named graphs to record the origin
  - Inference (per RDFS)
      - Composed named graphs
  - Coherence semantics in updates (deletes)
      - Used graphsets for composed named graphs (cannot use an operator)
Proposed query and update languages for graphsets
  - Based on RQL, RUL
  - Can be used to query/update provenance information
  - Provided syntax and semantics, as well as an implementation
      - Demo at: http://139.91.183.30:3026/RULdemo/named_graph_demo/
EXTRA SLIDES
RDF/S Graphset Properties
Three types of triples in a graphset:
  - Explicitly assigned triples
  - Implicitly assigned triples (from the constituent named graphs)
  - Implications of the above (per RDFS)
Figure: the example RDF graph, annotated with graphset PT.
Inserts and Deletes: General Process
INSERT
  - Validity respected
  - Must verify non-redundancy
  - Process
      - If INSERT is redundant, ignore it
      - Remove all redundant information (after insert)
DELETE
  - Must verify validity
  - Non-redundancy respected
  - Issues with inference and the coherence viewpoint
  - Process
      - If DELETE is void, ignore it
      - Make explicit all originally redundant information that will be lost otherwise
      - Restore validity by removing property instances if necessary
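
The first two DELETE steps can be sketched in Python as follows. This is a simplified illustration, not the RUL semantics of the paper: it handles only the deletion of an explicit triple, applies only one step of type inheritance, models graphsets as frozensets, and omits the validity-restoration step. All function names are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def implied_types(store):
    """One-step type inheritance; each implied triple gets the union
    of the graphsets of its two premises."""
    implied = {}
    for (x, p, c), g1 in store.items():
        if p != TYPE:
            continue
        for (a, q, b), g2 in store.items():
            if q == SUBCLASS and a == c:
                implied[(x, TYPE, b)] = g1 | g2
    return implied


def delete(store, triple):
    """Delete an explicit triple under the coherence viewpoint (sketch)."""
    if triple not in store:
        return dict(store)  # void delete: nothing to do
    before = implied_types(store)
    remaining = {t: g for t, g in store.items() if t != triple}
    after = implied_types(remaining)
    for t, g in before.items():
        # Implicit knowledge that would be lost is made explicit,
        # keeping the shared origin it had before the deletion.
        if t != triple and t not in after and t not in remaining:
            remaining[t] = g
    return remaining


store = {
    ("Giorgos", TYPE, "Author"): frozenset({"TAPP"}),
    ("Author", SUBCLASS, "Person"): frozenset({"PUB"}),
}
result = delete(store, ("Giorgos", TYPE, "Author"))
```

After the deletion, [Giorgos rdf:type Person] survives as an explicit quadruple with graphset {PUB, TAPP}, which is the situation that a composition operator evaluated on demand could not represent once one of its premises is gone.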