sydney talk-6-2015

63
ovenance and its abstraction – Missier 2015 The PROV data model and simple abstraction over provenance Paolo Missier School of Computing Science Newcastle University, UK University of Sydney June 2015

Upload: paolo-missier

Post on 11-Aug-2015

269 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5

The PROV data model and simple abstraction over provenance

Paolo Missier

School of Computing Science

Newcastle University, UK

University of Sydney

June 2015

Page 2: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5

Part I: The PROV data model(in a nutshell)

Page 3: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Role of provenance

Provenance refers to the sources of information, including entities and processes, involving in producing or delivering an artifact (*)

Provenance is a description of how things came to be, and how they came to be in the state they are in today (*)

• Provenance is evidence in support of clinical diagnosis1. Why do these variants appear in the output list?

2. Why have you concluded they are disease-causing?

• Requires ability to trace variants through workflow execution

• Workflow managers provide this

“Why are these variants included in the results?”

“Why do these two results differ?”

Page 4: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Why does provenance matter?

• To establish quality, relevance, trust

• To track information attribution through complex transformations

• To describe one’s experiment to others, for understanding / reuse

• To provide evidence in support of scientific claims

• To enable process analysis for debugging, improvement,

evolution

See also:

ACM Journal of Data and Information Quality (JDIQ) - Special Issue on Provenance, Data and Information Quality, Paolo Missier, Paolo Papotti, Eds. Volume 5 Issue 3, February 2015DOI: 10.1145/2692312http://dl.acm.org/citation.cfm?id=2700413

Page 5: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5The W3C Working Group on Provenance:

timeline

5

W3CIncubator groupon provenance

Chair: Yolanda Gil, ISI, USC

W3Cworking groupapproved

Chairs: Luc Moreau,Paul Groth

2009-2010

Main output:“Provenance XG Final Report”http://www.w3.org/2005/Incubator/prov/XGR-prov/- provides an overview of the various existing approaches, vocabularies- proposes the creation of a dedicated W3C Working

Group

April, 2011 April, 2013

ProposedRecommendationsfinalised

prov-dm: Data Modelprov-o: OWL ontology, RDF encodingprov-n: prov notationprov-constraints

...plus a number of non-prescriptive Notes

http://www.w3.org/2011/prov/wiki/

Page 6: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5PROV: scope and structure

6

source: http://www.w3.org/TR/prov-overview/

Recommendationtrack

See also:

Moreau, Luc, and Paul Groth. “Provenance: An Introduction to PROV.” Synthesis Lectures on the Semantic Web: Theory and Technology 3, no. 4 (September 15, 2013): 1–129. doi:10.2200/S00528ED1V01Y201308WBE007.

Page 7: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5PROV Core Elements (graph depiction)

7

An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.

An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, ..., using, or generating entities.

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.

Page 8: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Generation, Usage

8

Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.

Usage is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity

PROV is based on a notion of instantaneous events, that mark transitions in the world

- generation, usage (and others)

Ordering constraints amongst events:

“generation of e must precede each of usages”

“a can only use / generate e after it has started and before it has ended”

Page 9: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Concepts and relations

9

Generation of “draft v1” expressed as relation:

wasGeneratedBy(“draft v1”, ...)

Usage of “draft v1” by “commenting” expressed as relation:

used(“commenting, “draft v1”,...)

Page 10: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5PROV notation

10

document

prefix prov <http://www.w3.org/ns/prov#>prefix ex <http://www.example.com/>

entity(ex:draftComments)entity(ex:draftV1, [ ex:distr='internal', ex:status = "draft"])entity(ex:paper1)entity(ex:paper2)

activity(ex:commenting)activity(ex:drafting)wasGeneratedBy(ex:draftComments, ex:commenting, 2013-03-18T11:10:00)used(ex:commenting, ex:draftV1, -)wasGeneratedBy(ex:draftV1, ex:drafting, -)used(ex:drafting, ex:paper1, -)used(ex:drafting, ex:paper2, -)

endDocument

Page 11: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Same example — PROV-O notation (RDF/N3)

11

:draftComments a prov:Entity ; :distr "internal"^^xsd:string ; prov:wasGeneratedBy :commenting .

:commenting a prov:Activity ; prov:used :draftV1 .

:draftV1 a prov:Entity ; :distr "internal"^^xsd:string ; :status "draft"^^xsd:string ; :version "0.1"^^xsd:string ; prov:wasGeneratedBy :drafting .

:drafting a prov:Activity ; prov:used :paper1, :paper2 .

:paper1 a prov:Entity, "reference"^^xsd:string .

:paper2 a prov:Entity, "reference"^^xsd:string .

Page 12: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Association, Attribution, Delegation: who did what?

12

An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity.

Attribution is the ascribing of an entity to an agent.

entity(ex:draftComments, [ ex:distr='internal' ])activity(ex:commenting)agent(ex:Bob, [prov:type = "mainEditor"] )agent(ex:Alice, [prov:type = "srEditor"])

wasAssociatedWith(ex:commenting, Bob, -, [prov:role = "editor"])actedOnBehalfOf(Bob, Alice)wasAttributedTo(ex:draftComments, ex:Bob)

Page 13: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Same example — PROV-O notation (RDF/N3)

13

:Alice a prov:Agent, "ex:chiefEditor"; :firstName "Alice"; :lastName "Cooper".

:Bob a prov:Agent, "ex:seniorEditor"; :firstName "Robert"; :lastName "Thompson"^; prov:actedOnBehalfOf :Alice .

:draftComments prov:wasAttributedTo :Bob .:drafting a prov:Activity ; prov:wasAssociatedWith :Bob .

Page 14: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Association and Attribution

14

Q.: what is the relationship between attribution and association?

This is defined as an inference rule in the PROV-CONSTR document

entity(e)agent(Ag)activity(a)

wasAttributedTo(e, Ag)wasGeneratedBy(e, a) wasAssociatedWith(a, Ag)

Page 15: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Communication amongst activities

15

Communication is the exchange of some unspecified entity by two activities, one activity using some entity generated by the other.

activity(ex:commenting)activity(ex:drafting)

wasInformedBy(ex:commenting, ex:drafting)

:drafting a prov:Activity .

:commenting a prov:Activity ; prov:wasInformedBy :drafting .

Page 16: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Communication, generation, usage

16

activity(ex:commenting)activity(ex:drafting)entity(e)wasInformedBy(ex:commenting, ex:drafting)wasGeneratedBy(e,ex:drafting)used(ex:commenting, e)

Q.: what is the relationship between communication, generation, and usage?

This are inference rules 5 and 6 in the PROV-CONSTR document

Page 17: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Summary of the PROV Core model

17

Page 18: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Derivation amongst entities

18

A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

entity(ex:draftV1)entity(ex:draftComments)wasDerivedFrom(ex:draftComments, ex:draftV1)

Q.: what is the relationship between derivation, generation, and usage?

:draftComments a prov:Entity ; prov:wasDerivedFrom :draftV1 .

:draftV1 a prov:Entity .

Page 19: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Relations may be given identifiers

19

entity(ex:draftComments)entity(ex:draftV1)activity(ex:commenting)wasGeneratedBy(gen1; ex:draftComments, ex:commenting, -)used(use1; ex:commenting, ex:draftV1, -)

gen1 denotes a generation event

use1 denotes a usage event

wasDerivedFrom(id; e2, e1, a, g2, u1, attrs)

General derivation relation:

Relation IDs make it possible to refer to relations in other relations

Page 20: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Rendering N-ary relations in PROV-O

20

RDF is for binary relations —- N-ary relations require reification

entity(ex:draftComments)entity(ex:draftV1)activity(ex:commenting)wasGeneratedBy(gen1; ex:draftComments, ex:commenting, 2013-03-18T10:00:01)used(use1; ex:commenting, ex:draftV1, -)

:draftComments a prov:Entity ; prov:qualifiedGeneration :gen1 .

:gen1 a prov:Generation ; prov:activity :commenting; prov:atTime “2013-03-18T10:00:01+09:00".

:commenting a prov:Activity ; prov:qualifiedUsage :use1 .

:use1 a prov:Usage ; :note "found comments useful"; prov:atTime "2013-03-21T10:00:01+09:00"; prov:entity :draftV1.

Page 21: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5“Qualified relation” RDF pattern

21

:draftComments a prov:Entity ; prov:qualifiedGeneration :gen1 .

:gen1 a prov:Generation ; prov:activity :commenting; prov:atTime “2013-03-18T10:00:01+09:00".

:commenting a prov:Activity ; prov:qualifiedUsage :use1 .

:use1 a prov:Usage ; :note "found comments useful"; prov:atTime "2013-03-21T10:00:01+09:00"; prov:entity :draftV1.

Page 22: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Plans — why was something done?

22

Most relation types have two arguments which are { Entity, Activity, Agent}

Derivation is one exception:

wasDerivedFrom(id; e2, e1, a, g2, u1, attrs)

Two other notable exceptions: - Associations with a plan- Delegation with an activity scope

wasAssociatedWith(id; a, ag, pl, attrs)

A plan is an entity that represents a set of actions or steps intended by one or more agents to achieve some goal

Page 23: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Association with a plan

23

A plan plays a role in an association

Page 24: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Plans are typed entities

24

activity(ex:_aProgramExecution, [ex:execTime="22.5sec"])agent(ex:_aJVM, [prov:type = “JVM-6.0”])entity(ex:myCleverProgram, [prov:type='prov:Plan', ex:label="Program 1"])

wasAssociatedWith(ex:_aProgramExecution, ex:_aJVM, ex:myCleverProgram, [prov:role="defaultRuntime", ex:accessPath="webapp" ])

A plan is an entity having prov:type = “prov:plan”

Page 25: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Plan pattern as PROV-O

25

:_aProgramExecution a prov:Activity ; :execTime "22.5sec; prov:qualifiedAssociation [ a prov:Association ; :accessPath "webapp"; prov:agent :_aJVM ; prov:hadPlan :myCleverProgram ; prov:hadRole "defaultRuntime"] .

:_aJVM a prov:Agent, “Java-6.0".

:myCleverProgram a prov:Entity, prov:Plan.

activity(ex:_aProgramExecution, [ex:execTime="22.5sec"])agent(ex:_aJVM, [prov:type = “JVM-6.0”])entity(ex:myCleverProgram, [prov:type='prov:Plan', ex:label="Program 1"])

wasAssociatedWith(ex:_aProgramExecution, ex:_aJVM, ex:myCleverProgram, [prov:role="defaultRuntime", ex:accessPath="webapp" ])

Page 26: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Plan pattern as PROV-O

26

:_aProgramExecution a prov:Activity ; :execTime "22.5sec; prov:qualifiedAssociation [ a prov:Association ; :accessPath "webapp"; prov:agent :_aJVM ; prov:hadPlan :myCleverProgram ; prov:hadRole "defaultRuntime"] .

:_aJVM a prov:Agent, “Java-6.0".

:myCleverProgram a prov:Entity, prov:Plan.

Page 27: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Delegation within an activity scope

27

Page 28: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Real-world artifacts vs provenance entities

28

ref: http://www.w3.org/2001/sw/wiki/PROV-FAQ#Examples_of_Provenance

“What do I know about the car I see in this Cambridge street today?”

•It was bought by Joe in 2011

•Joe drove it to Boston on March 16th, 2013. The car has now got 10,000 miles on it

•Joe drove it to Cambridge on March 18th, 2013.

“Same” car, but different provenance at each stage of its evolution

To Core Elements

Page 29: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Alternate-specialization pattern

29

Two alternate entities present aspects of the same thing. These aspects may be the same or different, and the alternate entities may or may not overlap in time.

An entity that is a specialization of another shares all aspects of the latter, and additionally presents more specific aspects of the same thing as the latter.

...But, this is still that car!

Semantic notes:1. Specialization implies alternate: IF specializationOf(e1,e2) THEN alternateOf(e1,e2).2. Alternate is symmetric: IF alternateOf(e1,e2) THEN alternateOf(e2,e1)

3. Specialization is transitive: IF specializationOf(e1,e2) and specializationOf(e2,e3) THEN specializationOf(e1,e3).

To Core Elements

differing in their location

same owner, added location

Page 30: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Reserved attributes and types

30

A small set of reserved attributes, with some usage restrictions

Page 31: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Bundles, provenance of provenance

31

A bundle is a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed.

bundle pm:bundle1

entity(ex:draftComments)entity(ex:draftV1)

activity(ex:commenting)wasGeneratedBy(ex:draftComments, ex:commenting,-) used(ex:commenting, ex:draftV1, -)endBundle...entity(pm:bundle1, [ prov:type='prov:Bundle' ])wasGeneratedBy(pm:bundle1, -, 2013-03-20T10:30:00)wasAttributedTo(pm:bundle1, ex:Bob)

Page 32: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5

Page 33: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Bundles in PROV-O

33

Bundle definition (an RDF named graph):

ex:bundle1 { :draftComments a prov:Entity ; :status “blah"; prov:wasGeneratedBy :commenting .

:commenting a prov:Activity ; prov:used :draftV1 .

:draftV1 a prov:Entity .}

Bundle usage:

ex:bundle1 a prov:Entity, "prov:Bundle"; prov:qualifiedGeneration [ a prov:Generation ; prov:atTime “2013-03-20T10:30:00+09:00" ]; prov:wasAttributedTo :Bob .

Page 34: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Time, Events

34

wasStartedBy(id; a2, e, a1, t, attrs)

wasEndedBy(id; a2, e, a1, t, attrs)

Instead, the PROV data model is implicitly based on a notion of instantaneous events, that mark transitions in the world (*)

(*) PROV-CONSTR http://www.w3.org/TR/prov-constraints/#events (non-normative)

Events:

- activity start, activity end,

- entity generation , entity usage, entity invalidation

- Provenance statements are combined by different systems

- An application may not be able to align the times involved to a single global timeline

Therefore, PROV minimizes assumptions about time

Page 35: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5From “scruffy” provenance to “valid” provenance

35

- Are all possible temporal partial ordering of events equally acceptable?- How can we specify the set of all valid orderings?

More generally, how do we formally define what it means for a set of provenance statements to be valid?

PROV defines a set of temporal constraints that ensure consistency of a provenance graph

Page 36: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5

Part II – ProvAbs:model and policy for abstracting PROV graphs

Initial paper:

Missier, Paolo, Jeremy Bryans, Carl Gamble, Vasa Curcin, and Roxana Danger. ProvAbs: Model, Policy, and Tooling for Abstracting PROV Graphs. In Procs. IPAW 2014 (Int’l conference on Provenance and Annotations). Koln, Germany: Springer, 2014.

Page 37: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Initial project motivation: provenance of intelligence information

Consumer: • Motivated to acquire and act upon analysis But: expect support evidence, mitigate risk of acting upon inaccurate information

Provider:• Motivated to provide accurate analysis to Public Agencies • Enhance communication using provenance metadata for evidenceBut: cannot fully disclose sources, analysis methods, etc.

Page 38: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Provenance-enhanced information exchange

Page 39: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Provenance obfuscation

Objective:to provide Release Authorizers with tools for achieving a balance between a their sensitivity requirements and the receiver’s utility requirements

Ground rule for obfuscation: information can be removed, but no false evidence can be introduced

Page 40: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Provenance abstraction using ProvAbs

What:• Abstraction model for PROV• Policy model and language to drive the abstraction• Implementation: the ProvAbs tool

Why: • To enable data exchanges with partial disclosure of the data

provenance• To simplify understanding of provenance traces by humans

How:• Graph rewriting, from valid PROV to valid PROV

• A node grouping operator

Page 41: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5History of an analyst’s report

• Delegation• Roles• Clearance levels as

node properties

Entities: documents, reports, datasets. “Things”

Activities performed on Entities: human actions, or programsconsume, produce

Page 42: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5A closer look

Page 43: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

51 – Define policy to assign sensitivity to graph nodes

list classifications[protect, restricted, confidential, secret, topSecret];

for all (activity used data) where (data.Status > confidential in classifications)

setSensitivity(activity, 7);for all (activity used data) where (data.Status <= confidential in classifications) setSensitivity(activity, 5);

Page 44: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

52- Node selection

Select nodes for abstraction based on the receiver’s clearance level

7 7 7

5✔

︎ ✗︎ ✗ ︎ ✗ ︎ ✗

Page 45: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5 3- Abstraction

Apply abstraction operator

7 7 7

5✔

︎ ✗︎ ✗ ︎ ✗ ︎ ✗

Page 46: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Abstracting over sets of nodes

General abstraction idea: replace a group of (possibly non-contiguous) nodes with a new node

Page 47: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Abstraction must preserve provenance validity

Possible policy goals:• remove the activities for which David was possibly indirectly responsible”• e.g. “collapse all data derivation chains that do/do not include activity ‘analytics2’ ”• e.g. “remove all references to activities “consolidate-*”

If these are implemented simply as “cut and paste” from the graph, the result may not be a valid provenance graph in general

Page 48: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Intuition for correct abstraction

Page 49: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Incorrect abstraction -- example

• Cycles• Incorrect edge types

Problem: “non-convexity”

Page 50: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Criteria for abstraction

1. No new generation-usage cycles

2. No new dependencies

3. Satisfy type constraints on relationships

but: ok to remove some dependencies

Convexity by closure

Replacement, rewiring

Page 51: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Convexity by path closure

Page 52: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Replacement , rewiring

Page 53: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Extension – restore type correctness

Page 54: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5t-grouping

Nodes in the grouping set can be a mix of Entities or Activities

• When all boundary nodes are of the same type: grouping creates a node of that type

• e-grouping: new Entity node• a-grouping: new Activity node

• Boundary nodes of mixed types: grouping can introduce a node of either type

t-grouping: creates new node of type t { En, Act }∈

Note:Grouping is commutative and closed wrt composition

Page 55: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5t-grouping

a-grouping e-grouping

Page 56: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Mapping concrete to abstract events

Abstract graph nodes should be characterised by abstract events

• Generation is the completion of production of a new entity (PROV-DM Sec. 5.1.3)• Usage is the beginning of utilizing an entity (PROV-DM Sec. 5.1.4).

g’ = max { g1, g2 } u’ = min { u3, u4 }

Page 57: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Usage-follows-generation

Abstract graphs with abstract usage-generation events correspond to a specific class of base graphs with pattern:

<all generations> -- <all usages>

All generation events for all ei must precede all usage events for all ei.

Given a grouping set of entities{e1…en}

such that:

ei wasGeneratedBy aor

a used ei:

Page 58: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5The ProvAbs tool

• A tool to let a policy designer explore partial disclosure options• by experimenting with policy settings and clearance thresholds.

• Accepts graphs in PROV-N format• Policy specified interactively, or loaded from file

Page 59: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Related work: Views over (workflow) provenance

Specific setting: e-science workflows.Retrospective provenance: the provenance graph is a trace of dataflow executionProspective provenance: the workflow specification itself!

Biton, O, S Cohen-Boulakia, and S B Davidson. “Zoom*UserViews: Querying Relevant Provenance in Workflow Systems.” In VLDB, 1366–1369, 2007 (Demo paper)

Biton, Olivier, S Cohen-Boulakia, S B Davidson, C S Hara, C, S Cohen Boulakia, S B Davidson, and C S Hara. “Querying and Managing Provenance through User Views in Scientific Workflows.” In ICDE, 1072–1081, 2008. doi:http://dx.doi.org/10.1109/ICDE.2008.4497516.

Relies heavily on assumption of available prospective provenance

Page 60: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Views over workflows Views over provenance

Page 61: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5ProPub: provenance editing for publication

Dey, Saumen, Daniel Zinn, and Bertram Ludäscher. “ProPub: Towards a Declarative Approach for Publishing Customized, Policy-Aware Provenance.” In Scientific and Statistical Database Management, edited by Judith Bayard Cushing, James French, and Shawn Bowers, 6809:225–243. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-22351-8_13.

Page 62: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5Provenance redaction

Cadenhead, Tyrone, Vaibhav Khadilkar, Murat Kantarcioglu, and Bhavani Thuraisingham. “Transforming Provenance Using Redaction.” In Proceedings of the 16th ACM Symposium on Access Control Models and Technologies, 93–102. SACMAT ’11. New York, NY, USA: ACM, 2011. doi:10.1145/1998441.1998456.

Graph redaction approach based on graph grammars

Approach:

• Apply§ a set of redaction policies, which involves a series of graph transformation steps:

• At each step, a policy specifies how to replace a sensitive subset of the graph with another subgraph

• Graph operations: vertex contraction, edge contraction, path contraction, node relabeling

Possible limitation: no proof of validity preservation

Page 63: Sydney talk-6-2015

Pro

vena

nce

and

its a

bstr

actio

n –

Mis

sier

201

5

TAPP'15

7th International Workshop on Theory and Practice of Provenance Edinburgh, Scotland July 8-9, 2015 In cooperation with USENIX and ACM SIGPLAN and SIGMOD