why & where - technionkanza/dbseminar/2011/whywhere.… · olap (online analytical processing)...
Why & Where: A Characterization of Data Provenance
Authors: Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan
Presented by: Tamra Reutlinger
Example
Sources: (Name, Id, address) and (Id, Telephone)
View (Name, Telephone): ("John Doe", 1234)   Not valid!
Example
Sources: D1(Name, Id, address) and D2(Id, Telephone)
View (Name, Telephone): ("John Doe", 1234)
Traced back through (Id, Telephone) in D1 and D2: Valid
The Two Meanings Of Provenance
Why: why is the tuple in our view?
Where: where did the data "1234" come from? (What path did it take to get here?)
Importance of Finding Provenance
Sources of differing quality
Scientific databases
On-line monitoring
OLAP (online analytical processing)
Provenance = Lineage = source, origin, pedigree, family tree
Goal
Computing provenance
A syntactic approach
A general data model
Outline:
Introduction to data provenance
A deterministic model
Syntax & operations
Encoding relations
A query language
Why provenance
Where provenance
Conclusion
Example
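Sources: D1(Name, Id, address) and D2(Id, Telephone); view (Name, Telephone): ("John Doe", 1234), Valid
[Figure: the same kind of data as a semi-structured tree, with edges such as {Id:1}, {Id:2}, {Id:3}, a, b, c, num and leaves such as 3, "a", 109, queried with "where … collect …"]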
Edge-Labeled Tree Models for Semi-structured Data
The labels on the edges leaving each node are distinct.
[Figure: semi-structured data as an edge-labeled tree, with edges such as {Id:1}, {Id:2}, {Id:3}, a, b, c, num and leaves such as 3, "a", 109]
Syntax & Operations
{x1:y1, x2:y2} denotes a node with two outgoing edges, labeled x1 and x2, leading to the values y1 and y2.
Paths: x1.x2. … .xn
Example: the path {Id:1} identifies the value {Name:"Kim", Rate:50}, and the path {Id:1}.Rate identifies the value 50.
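As a rough illustration, here is a minimal Python sketch that stores a tree as nested dicts (edge labels as keys) and follows a path edge by edge. The frozenset encoding of the record label {Id:1} and the helper name `lookup` are assumptions made for this example, not notation from the paper.

```python
from typing import Any, Hashable, Sequence


def lookup(value: Any, path: Sequence[Hashable]) -> Any:
    """Follow the path x1.x2. ... .xn from the root of `value`.

    Raises KeyError if an edge on the path does not exist.
    """
    for label in path:
        if not isinstance(value, dict) or label not in value:
            raise KeyError(label)
        value = value[label]
    return value


# Assumed encoding: the record label {Id:1} becomes a hashable frozenset of pairs.
ID1 = frozenset({("Id", 1)})
db = {ID1: {"Name": "Kim", "Rate": 50}}

print(lookup(db, [ID1]))          # {'Name': 'Kim', 'Rate': 50}
print(lookup(db, [ID1, "Rate"]))  # 50
```

Because labels of a node are distinct, each step of the path selects at most one child, so a path identifies at most one value.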
Path representation of v
The set of all paths to the constants at the terminal nodes, each paired with its constant. For example:
{a:{1:c, 3:d}} => {(a.1, c), (a.3, d)}
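A minimal sketch of the path representation, under the same assumed nested-dict encoding: every root-to-leaf path is paired with the constant at its end. Treating an empty dict as contributing no pairs is a simplification of this sketch.

```python
from typing import Any, FrozenSet, Hashable, Tuple

Path = Tuple[Hashable, ...]


def path_rep(v: Any, prefix: Path = ()) -> FrozenSet[Tuple[Path, Any]]:
    """Return the set of (path, constant) pairs, one per terminal node of v.

    Non-dict values are treated as constants; an empty dict contributes no pairs.
    """
    if not isinstance(v, dict):
        return frozenset({(prefix, v)})
    return frozenset(
        pair
        for label, child in v.items()
        for pair in path_rep(child, prefix + (label,))
    )


v = {"a": {1: "c", 3: "d"}}
print(sorted(path_rep(v)))  # [(('a', 1), 'c'), (('a', 3), 'd')]
```

This reproduces the slide's example {a:{1:c, 3:d}} => {(a.1, c), (a.3, d)}, with a path written as a tuple of labels.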
Substructure
a:{1:c, 3:d} ⊑ a:{1:c, 2:b, 3:d}, but a:{1:c, 3:d} ⋢ b.a:{1:c, 3:d}
[Figure: trees w and v, with w a substructure of v]
w ⊑ v: the path representation of w is a subset of the path representation of v.
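Because substructure is defined through path representations, the check reduces to a subset test. A minimal sketch, reusing the assumed nested-dict encoding; `is_substructure` is a hypothetical name. It reproduces the two comparisons on this slide.

```python
from typing import Any


def path_rep(v: Any, prefix: tuple = ()) -> frozenset:
    """(path, constant) pairs for every terminal node of v (non-dicts are constants)."""
    if not isinstance(v, dict):
        return frozenset({(prefix, v)})
    return frozenset(p for lab, child in v.items() for p in path_rep(child, prefix + (lab,)))


def is_substructure(w: Any, v: Any) -> bool:
    """w ⊑ v iff the path representation of w is a subset of that of v."""
    return path_rep(w) <= path_rep(v)


w = {"a": {1: "c", 3: "d"}}
print(is_substructure(w, {"a": {1: "c", 2: "b", 3: "d"}}))  # True
print(is_substructure(w, {"b": {"a": {1: "c", 3: "d"}}}))   # False: paths start with b, not a
```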
Deep Union
v1 ⊔ v2 is the union of the path representations of v1 and v2.
[Figure: trees v1 and v2 and their deep union v1 ⊔ v2]
Deep Union
The result may not be a partial function in which case the deep union is undefined.
[Figure: v1 and v2 give the same path two different constants (c:2 or c:3?), so v1 ⊔ v2 is undefined]
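A minimal sketch of deep union as a recursive merge, which for trees without empty subtrees gives the same result as taking the union of the path representations and rebuilding the tree. The trees below are hypothetical stand-ins, since the slide's figures did not survive; the failing case mirrors the slide's "c:2 or c:3?" conflict. `UndefinedDeepUnion` is an invented exception name.

```python
from typing import Any


class UndefinedDeepUnion(Exception):
    """Raised when v1 ⊔ v2 would not be a partial function."""


def deep_union(v1: Any, v2: Any) -> Any:
    """Merge two trees edge by edge.

    Raises UndefinedDeepUnion when the same path leads to two different constants,
    or to a constant in one tree and a subtree in the other.
    """
    if isinstance(v1, dict) and isinstance(v2, dict):
        merged = dict(v1)
        for label, child in v2.items():
            merged[label] = deep_union(merged[label], child) if label in merged else child
        return merged
    if v1 == v2:  # the same constant on both sides is fine
        return v1
    raise UndefinedDeepUnion(f"conflict: {v1!r} vs {v2!r}")


v1 = {"a": 1, "b": {"c": 2, "d": 4}}
v2 = {"b": {"e": 5}}
print(deep_union(v1, v2))  # {'a': 1, 'b': {'c': 2, 'd': 4, 'e': 5}}
```

Calling `deep_union({"b": {"c": 2}}, {"b": {"c": 3}})` raises `UndefinedDeepUnion`, because the path b.c would map to both 2 and 3.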
Outline:
Introduction to data provenance
A deterministic model
Syntax & operations
Encoding relations
A query language
Why provenance
Where provenance
Conclusion
Encoding of Relations
[Figure: the Composers and Works relations as an edge-labeled tree]
Composers.{name:"J.S. Bach"}.{born:1685, period:"baroque"}
Composers.{name:"G.F. Handel"}.{born:1685, period:"baroque"}
Composers.{name:"W.A. Mozart"}.{born:1756, period:"classical"}
Works.{{name:"J.S. Bach"}.opus:"BMV82"}.title:"I have enough"
Works.{{name:"J.S. Bach"}.opus:"BMV552"}.title:"-"
Works.{{name:"G.F. Handel"}.opus:"HMV19"}.title:"Art thou troubled?"
Encoding of Relations
Relation.{Key}.{Tuple}: each tuple becomes a path whose first edge is the relation name, whose second edge is labeled with the tuple's key, and whose subtree holds the remaining attributes, e.g. Composers.{name:"J.S. Bach"}.{born:1685, period:"baroque"}.
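A minimal sketch of the Relation.{Key}.{Tuple} encoding in Python, assuming rows arrive as dicts and the key attributes are known. Representing the key label as a frozenset of (attribute, value) pairs is an assumption of this sketch; the rows are taken from the Composers example.

```python
from typing import Any, Dict, Iterable, Sequence


def encode_relation(name: str, rows: Iterable[Dict[str, Any]],
                    key: Sequence[str]) -> Dict[str, Any]:
    """Encode a keyed relation as the tree Relation.{Key}.{Tuple}.

    The key label {k1:v1, ...} is a frozenset of (attribute, value) pairs so it can
    be a dict key; the non-key attributes form the subtree below it.
    """
    table: Dict[frozenset, Dict[str, Any]] = {}
    for row in rows:
        label = frozenset((a, row[a]) for a in key)
        if label in table:  # labels of a node must be distinct
            raise ValueError(f"duplicate key {sorted(label)}")
        table[label] = {a: v for a, v in row.items() if a not in key}
    return {name: table}


composers = encode_relation(
    "Composers",
    [{"name": "J.S. Bach", "born": 1685, "period": "baroque"},
     {"name": "G.F. Handel", "born": 1685, "period": "baroque"},
     {"name": "W.A. Mozart", "born": 1756, "period": "classical"}],
    key=["name"],
)
bach = frozenset({("name", "J.S. Bach")})
print(composers["Composers"][bach])  # {'born': 1685, 'period': 'baroque'}
```

The duplicate-key check reflects the requirement that the labels of each node are distinct: two tuples with the same key would otherwise collide on one edge.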
Outline:
Introduction to data provenance
A deterministic model
Syntax & operations
Encoding relations
A query language
Why provenance
Where provenance
Conclusion
Example
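Sources: D1(Name, Id, address) and D2(Id, Telephone); view (Name, Telephone): ("John Doe", 1234), Valid
[Figure: the same kind of data as a semi-structured tree, with edges such as {Id:1}, {Id:2}, {Id:3}, a, b, c, num and leaves such as 3, "a", 109]
?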
A Query Language
A general syntactic form:
where (p1 ∈ e1, …, pn ∈ en, condition) collect e
Each pi is a pattern matched against the value of the expression ei.
Example
where Composers.x.born:u ∈ D, u < 1700
collect {year:u}:C
Result: {year:1685}:C (produced twice, once for J.S. Bach and once for G.F. Handel)
Example
Q = where (Composers.{name:x}.{born:u, period:v} ∈ D,
           Works.{{name:x}.opus:w}.title:y ∈ D)
    collect {name:x}.{born:u, {opus:w}:y}
Q(D) = {
  {name:"J.S. Bach"}.{born:1685, {opus:"BMV82"}:"I have enough"},
  {name:"J.S. Bach"}.{born:1685, {opus:"BMV552"}:"-"},
  {name:"G.F. Handel"}.{born:1685, {opus:"HMV19"}:"Art thou troubled?"}
}
Example
[Figure: the Composers/Works input tree and the tree of the output Q(D)]
"Collect": How?
For each assignment of the variables in p1, …, pn under which every pattern pi matches ei, evaluate the condition.
If it is true, add the value of e to the output.
Combine the output values with deep union.
Output expressions: e ::= c | x | {e : e} | e ⊔ e | (where … collect e), where c ranges over constants and x over variables.
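The evaluation steps above can be sketched in Python. Assumptions of this sketch: values are nested dicts; a pattern matches when its instantiation is a substructure of the source; record labels such as {name:x} are flattened to the bare key (so Composers.x.born:u binds x to the label "J.S. Bach" directly); and the output label {year:u} is written as the tuple ("year", u). It runs the query "where Composers.x.born:u ∈ D, u < 1700 collect {year:u}:C"; the comparison operator was lost in the transcript and is assumed to be <.

```python
from typing import Any, Callable, Dict, Iterator, List

Env = Dict[str, Any]


class Var:
    """A query variable; hashable, so it may also stand for an edge label."""

    def __init__(self, name: str) -> None:
        self.name = name


def match(pat: Any, val: Any, env: Env) -> Iterator[Env]:
    """Yield each extension of env under which pat, instantiated, is a substructure of val."""
    if isinstance(pat, Var):
        if pat.name not in env:
            yield {**env, pat.name: val}
        elif env[pat.name] == val:
            yield env
    elif isinstance(pat, dict):
        if isinstance(val, dict):
            yield from match_edges(list(pat.items()), val, env)
    elif pat == val:
        yield env


def match_edges(edges: list, val: dict, env: Env) -> Iterator[Env]:
    """Match each (label pattern, subpattern) edge against some edge of val."""
    if not edges:
        yield env
        return
    (lab, sub), rest = edges[0], edges[1:]
    for label, child in val.items():
        for env1 in match(lab, label, env):
            for env2 in match(sub, child, env1):
                yield from match_edges(rest, val, env2)


def subst(expr: Any, env: Env) -> Any:
    """Instantiate the variables of an output expression."""
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, tuple):
        return tuple(subst(part, env) for part in expr)
    if isinstance(expr, dict):
        return {subst(k, env): subst(c, env) for k, c in expr.items()}
    return expr


def deep_union(v1: Any, v2: Any) -> Any:
    if isinstance(v1, dict) and isinstance(v2, dict):
        out = dict(v1)
        for k, c in v2.items():
            out[k] = deep_union(out[k], c) if k in out else c
        return out
    if v1 == v2:
        return v1
    raise ValueError("deep union undefined")


def evaluate(generators: list, condition: Callable[[Env], bool], output: Any) -> Any:
    """where (p1 ∈ e1, ..., pn ∈ en, condition) collect output"""
    envs: List[Env] = [{}]
    for pattern, source in generators:      # every pi must match its ei
        envs = [e2 for e in envs for e2 in match(pattern, source, e)]
    result: Any = {}
    for env in envs:
        if condition(env):                  # true: add the value of e to the output
            result = deep_union(result, subst(output, env))
    return result


x, u = Var("x"), Var("u")
D = {"Composers": {"J.S. Bach": {"born": 1685, "period": "baroque"},
                   "G.F. Handel": {"born": 1685, "period": "baroque"},
                   "W.A. Mozart": {"born": 1756, "period": "classical"}}}
result = evaluate([({"Composers": {x: {"born": u}}}, D)],
                  lambda env: env["u"] < 1700,
                  {("year", u): "C"})
print(result)  # {('year', 1685): 'C'}
```

Two assignments (Bach and Handel) both produce {year:1685}:C; deep union keeps a single copy, and Mozart is filtered out by the condition.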
Example
?
where Emps.{Id:x}.salary:y ∈ D, Emps.{Id:y}.bonus:z ∈ D
collect {Id:x}.new_salary:y
Well-Formed Queries
Q is well-formed if:
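a) no pi is a single variable;
b) each ei is either a (nested) query or an expression that does not involve a query;
c) each comparison is between variables, or between variables and constants, only.
These restrictions are needed for the soundness of the rewrite rules.
[Slide examples, only partly recoverable: a pattern that is a single variable, x ∈ D, violates (a); a further example involves the patterns R.x.y:z and S.t.u, a view V, and a comparison of u against 1700.]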
Well-Defined Queries
A query may be undefined on a certain input, for example when the deep union of its output values is undefined.
Q is well-defined if it is defined on every input.
- For the rest of the presentation, we consider only queries that are both well-formed and well-defined.
Singular Expression
A single path terminated by a constant or a variable; that is, $e \neq e_1 \sqcup e_2$ for any non-empty, distinct expressions $e_1$ and $e_2$.
[Figure: the example tree with edges {Id:1}, {Id:2}, {Id:3}, a, b, c, num and leaves 3, "a", 109]
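A minimal sketch of the singularity test, assuming expressions use the nested-dict encoding with variables written as plain strings: an expression is singular exactly when its path representation has one element, since any split into two non-empty, distinct parts needs at least two paths.

```python
from typing import Any


def path_rep(e: Any, prefix: tuple = ()) -> frozenset:
    """(path, leaf) pairs; a leaf is a constant or, in this sketch, a variable name."""
    if not isinstance(e, dict):
        return frozenset({(prefix, e)})
    return frozenset(p for lab, c in e.items() for p in path_rep(c, prefix + (lab,)))


def is_singular(e: Any) -> bool:
    """Exactly one root-to-leaf path, so e cannot be written as e1 ⊔ e2
    with e1, e2 non-empty and distinct."""
    return len(path_rep(e)) == 1


print(is_singular({"Emps": {"x": {"salary": "y"}}}))               # True
print(is_singular({"Emps": {"x": {"salary": "y", "bonus": "z"}}}))  # False
```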
Normal Form
Q = Q1 ∪ … ∪ Qm, where each Qi has the form
where (sp1 ∈ D1, …, spn ∈ Dn, condition) collect se
sp1, …, spn and se: singular patterns and a singular expression, respectively.
D1, …, Dn: database constants.
condition: a Boolean predicate on the variables of the query.
Strong Normalization
The rewrite system R is strongly normalizing.
Therefore, for a well-formed query, any sequence of applications of the rewrite rules reaches normal form in a finite number of steps.
Outline:
Introduction to data provenance
A deterministic model
Why provenance (syntactic characterization and invariance under query rewriting)
Where provenance
Conclusion
Why Is The Tuple In Our View?
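Sources: D1(Name, Id, address) and D2(Id, Telephone); view (Name, Telephone): ("John Doe", 1234), Valid
[Figure: the same kind of data as a semi-structured tree, with edges such as {Id:1}, {Id:2}, {Id:3}, a, b, c, num and leaves such as 3, "a", 109, queried with "where … collect …"]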
Witnesses
A witness is a collection of values taken from D that proves an output.
s is a witness for t with respect to Q and D if $t \sqsubseteq Q(s)$ and $s \sqsubseteq D$.
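A minimal sketch of the witness definition as a check, assuming a query can be modeled as a Python function from a tree to a tree. `born_query` is a hypothetical stand-in for a query like "where Composers.x.born:u ∈ D collect x.born:u", with the record label {name:x} flattened to the bare name.

```python
from typing import Any, Callable


def path_rep(v: Any, prefix: tuple = ()) -> frozenset:
    """(path, constant) pairs for every terminal node of v."""
    if not isinstance(v, dict):
        return frozenset({(prefix, v)})
    return frozenset(p for lab, c in v.items() for p in path_rep(c, prefix + (lab,)))


def leq(a: Any, b: Any) -> bool:
    """a ⊑ b: every (path, constant) pair of a also occurs in b."""
    return path_rep(a) <= path_rep(b)


def is_witness(s: Any, t: Any, query: Callable[[Any], Any], db: Any) -> bool:
    """s is a witness for t w.r.t. Q and D iff t ⊑ Q(s) and s ⊑ D."""
    return leq(t, query(s)) and leq(s, db)


def born_query(d: Any) -> dict:
    """Hypothetical stand-in for: where Composers.x.born:u ∈ D collect x.born:u"""
    composers = d.get("Composers") if isinstance(d, dict) else None
    if not isinstance(composers, dict):
        return {}
    return {name: {"born": row["born"]}
            for name, row in composers.items()
            if isinstance(row, dict) and "born" in row}


D = {"Composers": {"J.S. Bach": {"born": 1685, "period": "baroque"},
                   "G.F. Handel": {"born": 1685, "period": "baroque"}}}
t = {"G.F. Handel": {"born": 1685}}
print(is_witness({"Composers": {"G.F. Handel": {"born": 1685}}}, t, born_query, D))  # True
print(is_witness({"Composers": {"J.S. Bach": {"born": 1685}}}, t, born_query, D))    # False
```

The second candidate is a substructure of D but does not prove t, so it fails the first condition.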
Example
Q = where (Composers.{name:x}.{born:u, period:v} ∈ D,
           Works.{{name:x}.opus:w}.title:y ∈ D)
    collect {name:x}.{born:u, {opus:w}:y}
Q(D) = {
  {name:"J.S. Bach"}.{born:1685, {opus:"BMV82"}:"I have enough"},
  {name:"J.S. Bach"}.{born:1685, {opus:"BMV552"}:"-"},
  {name:"G.F. Handel"}.{born:1685, {opus:"HMV19"}:"Art thou troubled?"}
}
Example
[Figure: the Composers/Works input tree and the tree of the output Q(D)]
Example: {name:"G.F. Handel"}.born:1685
[Figure: the Composers/Works input tree and the tree of the output Q(D)]
Witnesses
{Composers.{name:"G.F. Handel"}.{born:1685, period:"baroque"},
 Works.{{name:"G.F. Handel"}.opus:"HMV19"}.title:"Art thou troubled?"}
is a witness for {name:"G.F. Handel"}.born:1685.
Example – Witness Basis
[Figure: the Composers/Works tree and, beside it, the witness subtree made of Composers.{name:"G.F. Handel"}.{born:1685, period:"baroque"} and Works.{{name:"G.F. Handel"}.opus:"HMV19"}.title:"Art thou troubled?"]
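Witness Basis: $W_{Q,D}(t)$
$W_{Q,D}(t)$ is the set of all witnesses for a value t in Q(D).
If $t = t_1 \sqcup t_2$: $W_{Q,D}(t) = \{\, w_1 \sqcup w_2 \mid w_1 \in W_{Q,D}(t_1),\ w_2 \in W_{Q,D}(t_2) \,\}$
If $Q = Q_1 \cup Q_2$, so that $Q(D) = Q_1(D) \sqcup Q_2(D)$: $W_{Q,D}(t) = W_{Q_1,D}(t) \cup W_{Q_2,D}(t)$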
Witness Basis
Lemma 1: If $Q \rightsquigarrow Q'$ via the rewrite system R, then for any value t in the output of Q(D), $W_{Q,D}(t) = W_{Q',D}(t)$.
In particular, rewriting a well-formed query Q into its normal form Q' preserves both the output, $Q(D) = Q'(D)$, and the witness basis.
Algorithm: Why(t,Qi,D)
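Notation: t is the output value and D the database.
Given a component of Q in normal form,
$Q_i = \text{where } (p_1 \in e_1, \ldots, p_n \in e_n, \textit{condition}) \text{ collect } e_i$,
construct
$Q_i' = \text{where } (p_1 \in e_1, \ldots, p_n \in e_n, \textit{condition}) \text{ collect } \{p_1 \sqcup \cdots \sqcup p_n\} : C$,
keeping only the assignments $\theta$ with $t \sqsubseteq e_i(\theta)$; each such assignment contributes the witness $p_1(\theta) \sqcup \cdots \sqcup p_n(\theta)$.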
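A minimal sketch of computing the witness basis for one normal-form component. Rather than building the query Q_i' literally, it enumerates the assignments θ of the where clause directly, keeps those with t ⊑ e_i(θ), and records p_1(θ) ⊔ … ⊔ p_n(θ) as a path representation. Assumptions: the nested-dict encoding, record labels flattened to bare keys, and Works nested as name, then opus, then title (as drawn in the Composers/Works figure); the query is Q from the Composers/Works example.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterator, List, Set

Env = Dict[str, Any]


class Var:
    """A query variable; hashable, so it may also stand for an edge label."""

    def __init__(self, name: str) -> None:
        self.name = name


def match(pat: Any, val: Any, env: Env) -> Iterator[Env]:
    """Yield each extension of env under which pat, instantiated, is a substructure of val."""
    if isinstance(pat, Var):
        if pat.name not in env:
            yield {**env, pat.name: val}
        elif env[pat.name] == val:
            yield env
    elif isinstance(pat, dict):
        if isinstance(val, dict):
            yield from match_edges(list(pat.items()), val, env)
    elif pat == val:
        yield env


def match_edges(edges: list, val: dict, env: Env) -> Iterator[Env]:
    if not edges:
        yield env
        return
    (lab, sub), rest = edges[0], edges[1:]
    for label, child in val.items():
        for env1 in match(lab, label, env):
            for env2 in match(sub, child, env1):
                yield from match_edges(rest, val, env2)


def subst(expr: Any, env: Env) -> Any:
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, dict):
        return {subst(k, env): subst(c, env) for k, c in expr.items()}
    return expr


def path_rep(v: Any, prefix: tuple = ()) -> FrozenSet:
    if not isinstance(v, dict):
        return frozenset({(prefix, v)})
    return frozenset(p for lab, c in v.items() for p in path_rep(c, prefix + (lab,)))


def why(t: Any, patterns: list, condition: Callable[[Env], bool], output: Any,
        db: Any) -> Set[FrozenSet]:
    """Witness basis of t for one component: where (p_i ∈ db, condition) collect output."""
    envs: List[Env] = [{}]
    for p in patterns:
        envs = [e2 for e in envs for e2 in match(p, db, e)]
    target = path_rep(t)
    basis: Set[FrozenSet] = set()
    for env in envs:
        if condition(env) and target <= path_rep(subst(output, env)):
            # witness p1(θ) ⊔ ... ⊔ pn(θ), stored as the union of path representations
            basis.add(frozenset().union(*(path_rep(subst(p, env)) for p in patterns)))
    return basis


def no_condition(env: Env) -> bool:
    return True  # Q has no condition clause


x, u, v, w, y = (Var(n) for n in "xuvwy")
D = {
    "Composers": {"J.S. Bach": {"born": 1685, "period": "baroque"},
                  "G.F. Handel": {"born": 1685, "period": "baroque"},
                  "W.A. Mozart": {"born": 1756, "period": "classical"}},
    "Works": {"J.S. Bach": {"BMV82": {"title": "I have enough"},
                            "BMV552": {"title": "-"}},
              "G.F. Handel": {"HMV19": {"title": "Art thou troubled?"}}},
}
patterns = [{"Composers": {x: {"born": u, "period": v}}},
            {"Works": {x: {w: {"title": y}}}}]
output = {x: {"born": u, w: y}}  # {name:x}.{born:u, {opus:w}:y}
handel = why({"G.F. Handel": {"born": 1685}}, patterns, no_condition, output, D)
bach = why({"J.S. Bach": {"born": 1685}}, patterns, no_condition, output, D)
print(len(handel), len(bach))  # 1 2
```

Handel's birth year has one witness (his Composers entry plus HMV19), while Bach's has two, one per work, because each of his works yields a separate assignment.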
Minimal Witness Basis
A witness for a value is invariant under all equivalent queries, but the witness basis is not.
The minimal witness basis is invariant under equivalent well-formed queries.
Minimal Witness, Minimal Witness Basis
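s is a minimal witness for t if it is a witness for t and $\forall s' \sqsubset s:\ t \not\sqsubseteq Q(s')$.
$M_{Q,D}(t)$, the minimal witness basis for t, is a maximal subset of $W_{Q,D}(t)$ such that $\forall m \in M_{Q,D}(t),\ \forall w \in W_{Q,D}(t):\ w \not\sqsubset m$.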
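Computing the minimal witness basis from a witness basis amounts to a filter: drop every witness that has a proper sub-witness in the basis. A minimal sketch, assuming witnesses are stored as frozensets of (path, constant) pairs, so that ⊏ on witnesses is Python's proper-subset operator `<`; the two witnesses are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Witness = FrozenSet[Tuple[tuple, object]]  # a witness as its path representation


def minimal_basis(basis: Iterable[Witness]) -> Set[Witness]:
    """Keep each m such that no w in the basis satisfies w ⊏ m (proper substructure)."""
    basis = set(basis)
    return {m for m in basis if not any(w < m for w in basis)}


# Hypothetical witnesses: the second contains the first, so it is not minimal.
a = (("R", 1, "name"), "Kim")
b = (("S", 1, "rate"), 50)
W = {frozenset({a}), frozenset({a, b})}
print(minimal_basis(W) == {frozenset({a})})  # True
```

Since the condition is checked element by element, the set of all non-dominated witnesses is itself the maximal subset the definition asks for.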
Example: 1685
[Figure: the Composers/Works tree and the witness subtree for 1685: Composers.{name:"G.F. Handel"}.{born:1685, period:"baroque"} together with Works.{{name:"G.F. Handel"}.opus:"HMV19"}.title:"Art thou troubled?"]
Example: 1685
[Figure: the same trees, with one candidate subtree marked "Not a proof tree for the value!"]
Invariance of the Minimal Witness Basis under Equivalent Queries
Let Q and Q' be two equivalent well-formed queries, and let t be in Q(D) = Q'(D). Then $M_{Q,D}(t) = M_{Q',D}(t)$.
Cascaded Witnesses (Query Composition)
Unnesting of witnesses: let $D = D_1 \cup \cdots \cup D_n$ and $V = V(D)$. For a value t in Q(D, V),
$W_{Q',D}(t) = \{\, w \sqcup w' \mid w \sqcup v' \in W_{Q,\{D,V(D)\}}(t),\ v' \text{ is the value taken from view } V(D),\ w' \in W_{V,D}(v') \,\}$,
where Q' is the query rewritten via our rewrite system R, in which view V has been "composed out".
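A minimal sketch of the unnesting formula, assuming witnesses are frozensets of (path, constant) pairs and that the pairs read from the view are exactly those whose path starts with the view name (here "V"). The data are hypothetical; passing through a witness that never reads the view is an extra assumption beyond the formula.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = Tuple[tuple, object]   # (path, constant)
Witness = FrozenSet[Pair]     # a witness, stored as its path representation


def compose_witnesses(w_q: Iterable[Witness],
                      w_view: Dict[Witness, Set[Witness]],
                      view_label: str = "V") -> Set[Witness]:
    """W_{Q',D}(t) = { w ⊔ w' | w ⊔ v' ∈ W_{Q,{D,V(D)}}(t), w' ∈ W_{V,D}(v') }.

    The pairs whose path starts with `view_label` form v', the part read from the
    view; v' is looked up in w_view with that first label stripped.
    """
    result: Set[Witness] = set()
    for wit in w_q:
        base = frozenset(p for p in wit if p[0][:1] != (view_label,))
        v_prime = frozenset((path[1:], c) for path, c in wit if path[:1] == (view_label,))
        if not v_prime:
            result.add(base)  # assumption: a witness that never reads the view is kept as-is
            continue
        for w_prime in w_view.get(v_prime, set()):
            result.add(base | w_prime)
    return result


# Hypothetical example: t needs R.a:1 from D and V.b:2 from the view,
# and the view value b:2 has two witnesses in D (S.b:2 and T.b:2).
w_q = {frozenset({(("R", "a"), 1), (("V", "b"), 2)})}
w_view = {frozenset({(("b",), 2)}): {frozenset({(("S", "b"), 2)}),
                                      frozenset({(("T", "b"), 2)})}}
composed = compose_witnesses(w_q, w_view)
print(len(composed))  # 2
```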
Outline:
Introduction to data provenance
A deterministic model
Why provenance
Where provenance (difficulties in defining it; invariance under query rewriting)
Conclusion
Reminder:
So far we have looked at what pieces of input data validate the existence of an output value. (why provenance)
We now focus on identifying what pieces of input data helped create values that appear in the output. (where provenance)
Example: 1685
[Figure: for the value 1685, the witness basis (the Composers and Works subtrees for G.F. Handel) set against its where-provenance (only the path Composers.{name:"G.F. Handel"}.born)]
There are many difficulties involved in formalizing this.
Invariance Over Equivalent Queries
Looking for employees with a salary of $50:
Q1: where Emps.{Id:x}.salary:$50 ∈ D collect {Id:x}.salary:$50
Q2: where Emps.{Id:x}.salary:y ∈ D, y = $50 collect {Id:x}.salary:y
Substituting y = $50 rewrites Q2 into Q1, so the two queries are equivalent.
What is the where-provenance of $50? In Q2 the output value is copied from the salary field in D; in Q1 it is a constant written in the query itself.
Multiple Pieces of Data
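Q1: where Emps.{Id:x}.salary:y ∈ D, Emps.{Id:x}.salary:z ∈ D, Emps.{Id:x}.bonus:z ∈ D
    collect {Id:x}.new_salary:y
Q2: where Emps.{Id:x}.salary:y ∈ D, Emps.{Id:x}.bonus:y ∈ D
    collect {Id:x}.new_salary:y
In Q2, new_salary is tracked by y. In Q1, is it tracked by y alone, or by y and z? (Each employee has a single salary, so y = z and the two queries are equivalent.)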
Nested Queries
Q1: where R.x.y:z ∈ D, S.x.y:z ∈ D collect x.y:z
Q2: where R.x.y:z ∈ D, S.t.u ∈ D, t:u ∈ (where R.x.y:z ∈ D collect x.y:z)
    collect {x.y:z, t:u}
D = {R.1.2:3, S.1.2:3}; output: 1.2:3
Q1: where provenance {R.1:2, S.1:2}
Q2: the nested query yields t:u = 1.2:3, so u = y:z and the output is {1.2:3, 1.2:3} = 1.2:3; where provenance {R.1:2, S.1:2}
Traceable Queries
A restricted class of queries, for which where-provenance is preserved under rewriting.
Example: {name:"G.F. Handel"}.born:1685
[Figure: the witness basis of this value (the Composers and Works subtrees for G.F. Handel) set against its where-provenance, the path Composers.{name:"G.F. Handel"}.born]
Derivation Basis (Where Provenance)
The derivation basis for l:v finds a variable x in the output expression that will generate v.
[Figure: the Composers tree]
where Composers.x.born:u ∈ D, u < 1700
collect {year:u}:C
Result: {year:1685}:C (produced twice)
Where(l:v,Q,D)
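Computes the derivation basis of l:v.
The "collect" clause of the new query returns two things: the patterns, and the paths in the "where" clause of Q that point to x.
$\mathit{Where}(l{:}v, Q, D) = \Gamma_{Q,D}(l{:}v) = \{([\![p_1]\!], \ldots, [\![p_n]\!], S)\}$
Here $[\![p_i]\!]$ is the pattern $p_i$ instantiated by an assignment that produces l:v, and S is the set of paths pointing to x.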
Derivation Basis
Emps.{Id:1}.{salary:2000, bonus:100}
Emps.{Id:2}.{salary:1700, bonus:17}
Emps.{Id:3}.{salary:1900, bonus:300}
where Emps.{Id:x}.salary:y ∈ D,   (p1)
      Emps.{Id:x}.bonus:y ∈ D     (p2)
collect {Id:x}.new_salary:y
Output: {Id:1}.new_salary:2100, {Id:2}.new_salary:1717, {Id:3}.new_salary:2200
$p_1(\theta) = \mathit{Emps}.\{\mathit{Id}{:}1\}.\mathit{salary}{:}2000 \in D$
$\Gamma_{Q,D}(l{:}v) = \{(p_1(\theta),\ p_2(\theta),\ \mathit{Emps}.\{\mathit{Id}{:}1\}.\{\mathit{salary}, \mathit{bonus}\})\}$
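A minimal sketch of the derivation basis for a single output l:v, following the description of Where(l:v, Q, D): for each assignment that produces l:v, it finds the variable at path l in the output expression and returns the instantiated patterns together with the source paths that bind that variable. Assumptions: the nested-dict encoding with {Id:x} flattened to x, no condition clause, and a hypothetical employee 4 whose salary equals the bonus, since the query binds both fields to y and, under that reading, none of the three employees on the slide qualifies.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Env = Dict[str, Any]


class Var:
    """A query variable; hashable, so it may also appear as an edge label."""

    def __init__(self, name: str) -> None:
        self.name = name


def match(pat: Any, val: Any, env: Env) -> Iterator[Env]:
    """Yield each extension of env under which pat, instantiated, is a substructure of val."""
    if isinstance(pat, Var):
        if pat.name not in env:
            yield {**env, pat.name: val}
        elif env[pat.name] == val:
            yield env
    elif isinstance(pat, dict):
        if isinstance(val, dict):
            yield from match_edges(list(pat.items()), val, env)
    elif pat == val:
        yield env


def match_edges(edges: list, val: dict, env: Env) -> Iterator[Env]:
    if not edges:
        yield env
        return
    (lab, sub), rest = edges[0], edges[1:]
    for label, child in val.items():
        for env1 in match(lab, label, env):
            for env2 in match(sub, child, env1):
                yield from match_edges(rest, val, env2)


def label_of(lab: Any, env: Env) -> Any:
    return env[lab.name] if isinstance(lab, Var) else lab


def subst(expr: Any, env: Env) -> Any:
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, dict):
        return {label_of(k, env): subst(c, env) for k, c in expr.items()}
    return expr


def var_paths(pat: Any, name: str, env: Env, prefix: tuple = ()) -> List[tuple]:
    """Instantiated paths in a where-pattern that lead to the variable `name`."""
    if isinstance(pat, Var):
        return [prefix] if pat.name == name else []
    if not isinstance(pat, dict):
        return []
    return [p for lab, sub in pat.items()
            for p in var_paths(sub, name, env, prefix + (label_of(lab, env),))]


def leaf_var(expr: Any, path: tuple, env: Env) -> Optional[Var]:
    """Follow `path` through the output expression; return the variable at its end, if any."""
    for label in path:
        if not isinstance(expr, dict):
            return None
        nexts = [sub for lab, sub in expr.items() if label_of(lab, env) == label]
        if not nexts:
            return None
        expr = nexts[0]
    return expr if isinstance(expr, Var) else None


def derivation_basis(path: tuple, value: Any, patterns: list, output: Any,
                     db: Any) -> List[Tuple[list, frozenset]]:
    """For each assignment producing path:value, the instantiated patterns and
    the source paths that bind the output variable."""
    envs: List[Env] = [{}]
    for p in patterns:
        envs = [e2 for e in envs for e2 in match(p, db, e)]
    basis = []
    for env in envs:
        var = leaf_var(output, path, env)
        if var is not None and env[var.name] == value:
            srcs = frozenset(sp for p in patterns for sp in var_paths(p, var.name, env))
            basis.append(([subst(p, env) for p in patterns], srcs))
    return basis


x, y = Var("x"), Var("y")
D = {"Emps": {1: {"salary": 2000, "bonus": 100},
              2: {"salary": 1700, "bonus": 17},
              3: {"salary": 1900, "bonus": 300},
              4: {"salary": 500, "bonus": 500}}}   # employee 4 is hypothetical
patterns = [{"Emps": {x: {"salary": y}}},   # p1
            {"Emps": {x: {"bonus": y}}}]    # p2
output = {x: {"new_salary": y}}             # {Id:x}.new_salary:y
basis = derivation_basis((4, "new_salary"), 500, patterns, output, D)
for instantiated, sources in basis:
    print(instantiated, sorted(sources))
```

As on the slide, both the salary and the bonus path end up in the basis, because the output variable y is bound in two places of the where clause.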
Derivation Basis: $\Gamma_{Q,D}(l{:}v)$
For an atomic value v: if $Q = Q_1 \cup Q_2$, so that $Q(D) = Q_1(D) \sqcup Q_2(D)$, then $\Gamma_{Q,D}(l{:}v) = \Gamma_{Q_1,D}(l{:}v) \cup \Gamma_{Q_2,D}(l{:}v)$.
Q Is Traceable If:
1) each pi in the query matches either against some database constant or against a sub-query
2) every sub-query is a view which does not share any variables with the outer scope
3) only a singular pattern is allowed to match against a sub-query
4) the pattern and output expression of the sub-query consist of a sequence of distinct variables and have the same length.
Propositions
Proposition 1: If Q is traceable and $Q \rightsquigarrow Q'$, then Q' is traceable.
Proposition 2: If Q is traceable and $Q \rightsquigarrow Q'$, then for any l:v in the output of Q(D), $\Gamma_{Q,D}(l{:}v) = \Gamma_{Q',D}(l{:}v)$.
Outline:
Introduction to data provenance
A deterministic model
Why provenance
Where provenance
Conclusion
Why Is The Tuple In Our View?
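Sources: D1(Name, Id, address) and D2(Id, Telephone); view (Name, Telephone): ("John Doe", 1234), Valid
[Figure: the same kind of data as a semi-structured tree, with edges such as {Id:1}, {Id:2}, {Id:3}, a, b, c, num and leaves such as 3, "a", 109, queried with "where … collect …"]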
Conclusions
o Describing and understanding the provenance of data
o Two perspectives: Why is a piece of data in the output? Where did a piece of data come from?
o A system of rewrite rules under which why-provenance is preserved over the class of well-defined queries and where-provenance is preserved over the class of traceable queries.
Thank you for listening!