(Gems of PODS and Test-of-Time talk)
The Semiring Framework for Database Provenance (: hindsight is great! :)
5/15/2017 PODS 2017
Val Tannen, University of Pennsylvania
Collaborators
Test-of-Time award: TJ Green (LogicBlox), Grigoris Karvounarakis (LogicBlox)
Gems of PODS paper: TJ
ORCHESTRA: Zack Ives (University of Pennsylvania), TJ, Grigoris
Other core papers: Nate Foster (Cornell University), Yael Amsterdamer (Bar-Ilan University), Daniel Deutch (Tel Aviv University), Tova Milo (Tel Aviv University), Sudeepa Roy (Duke University), Yuval Moskovitch (Tel Aviv University)
Recent work: Erich Grädel (RWTH Aachen)
Much gratitude: Peter Buneman (University of Edinburgh)
Binary trust
Sue's notes *:
  cat    mouse
  cat    rat
Val's notes *:
  mouse  gray
  mouse  red
  rat    gray
Zack ** computation (food color):
  cat    gray
  cat    red
* Sue and Val are noted zoologists. ** Zack is a noted computational zoologist.
Binary trust
Sue's notes *:
  cat    mouse   Yes
  cat    rat     Yes
Val's notes *:
  mouse  gray    Yes
  mouse  red     No
  rat    gray    Yes
Zack ** computation:
  cat    gray    Yes
  cat    red     No
Further labels on the slide (placement not recoverable): No, No, Yes
Access control
Pub < Conf < Sec < TSec
Sue's notes:
  cat    mouse   Pub
  cat    rat     Pub
Val's notes:
  mouse  gray    TSec
  mouse  red     TSec
  rat    gray    Conf
Zack computation:
  cat    gray    Conf
  cat    red     TSec
Confidence scores (non-binary trust)
Sue's notes:
  cat    mouse   0.9
  cat    rat     0.9
Val's notes:
  mouse  gray    0.6
  mouse  red     0.1
  rat    gray    0.8
Zack computation:
  cat    gray    0.72
  cat    red     0.09
0.72 = max(0.9 × 0.8, 0.9 × 0.6)   0.09 = 0.9 × 0.1
A simple model for data pricing
Sue's notes:
  cat    mouse   $10
  cat    rat     $10
Val's notes:
  mouse  gray    $6
  mouse  red     $1
  rat    gray    $8
Zack computation:
  cat    gray    $16
  cat    red     $11
16 = min(10 + 8, 10 + 6)   11 = 10 + 1
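The four examples so far run the same computation with different pairs of operations. A minimal Python sketch of that observation follows; the function names `zack` and `annotate` and the integer encoding of the clearance levels are assumptions made for this illustration, not part of the talk.

```python
# Input tuples from the slides; annotations are attached per application below.
SUE = [("cat", "mouse"), ("cat", "rat")]
VAL = [("mouse", "gray"), ("mouse", "red"), ("rat", "gray")]


def zack(sue, val, plus, times):
    """Zack(x, z) :- Sue(x, y), Val(y, z) over annotated tables.

    `sue` and `val` map tuples to annotations. `times` combines the
    annotations of tuples used jointly; `plus` combines alternative
    derivations of the same output tuple.
    """
    out = {}
    for (x, y), a in sue.items():
        for (y2, z), b in val.items():
            if y == y2:
                c = times(a, b)
                out[(x, z)] = plus(out[(x, z)], c) if (x, z) in out else c
    return out


def annotate(tuples, values):
    """Pair each tuple with its annotation (hypothetical helper)."""
    return dict(zip(tuples, values))


# Binary trust: "or" over alternatives, "and" over joint use.
trust = zack(annotate(SUE, [True, True]), annotate(VAL, [True, False, True]),
             plus=lambda a, b: a or b, times=lambda a, b: a and b)

# Access control: levels encoded as 0..3 for Pub < Conf < Sec < TSec.
access = zack(annotate(SUE, [0, 0]), annotate(VAL, [3, 3, 1]),
              plus=min, times=max)

# Confidence scores: max over alternatives, product over joint use.
confidence = zack(annotate(SUE, [0.9, 0.9]), annotate(VAL, [0.6, 0.1, 0.8]),
                  plus=max, times=lambda a, b: a * b)

# Data pricing: cheapest alternative, sum over joint use.
price = zack(annotate(SUE, [10, 10]), annotate(VAL, [6, 1, 8]),
             plus=min, times=lambda a, b: a + b)

print(trust, access, confidence, price, sep="\n")
```

Only the two operations change between the four runs, which is the motivation for tracking provenance once and specializing it afterwards.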
Do it once and use it repeatedly: provenance
Label (annotate) input items abstractly with provenance tokens.
Provenance tracking: propagate expressions (involving tokens)
(to annotate intermediate data and, finally, outputs)
Track two distinct ways of using data items by computation primitives:
• jointly (this alone is basically like keeping a log)
• alternatively (doing both is essential; think trust)
Input-output compositional; modular (in the primitives)
Later, we want to evaluate the provenance expressions to obtain
binary trust, access control,
confidence scores, data prices, etc.
Algebraic interpretation for RDB
Set X of provenance tokens. Space of annotations, provenance expressions Prov(X)
Prov(X)-relations: every tuple is annotated with some element from Prov(X).
Binary operations on Prov(X):
· corresponds to joint use (join, cartesian product), + corresponds to alternative use (union and projection).
Special annotations:
"Absent" tuples are annotated with 0. 1 is a "neutral" annotation (data we do not track).
K-Relational algebra
Algebraic laws of (Prov(X), +, ·, 0, 1)? More generally, for annotations
from a structure (K, +, ·, 0, 1)?
K-relations. Generalize RA+ to (positive) K-relational algebra.
Desired optimization equivalences of K-relational algebra iff
(K, +, ·, 0, 1) is a commutative semiring.
Generalizes SPJU or UCQ or non-rec. Datalog
set semantics (B, ∨, ∧, ⊥, ⊤)   bag semantics (N, +, ·, 0, 1)
c-table semantics [IL84] (BoolExp(X), ∨, ∧, ⊥, ⊤)   event table semantics [FR97, Z97] (P(Ω), ∪, ∩, ∅, Ω)
What is a commutative semiring?
An algebraic structure (K, +, ·, 0, 1) where:
• K is the domain
• + is associative, commutative, with 0 identity
• · is associative, with 1 identity
• · distributes over +
• a·0 = 0·a = 0
(the conditions so far define a semiring)
• · is also commutative
Unlike ring, no requirement for inverses to +
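The laws above can be spot-checked mechanically. The sketch below is only an illustration: the `Semiring` class and `satisfies_laws` are hypothetical names, and testing the laws on a few sample elements is a sanity check, not a proof.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


def satisfies_laws(k: Semiring, samples) -> bool:
    """Check the commutative-semiring laws on all triples of sample elements."""
    p, t = k.plus, k.times
    for a, b, c in product(samples, repeat=3):
        ok = (
            p(p(a, b), c) == p(a, p(b, c))            # + associative
            and p(a, b) == p(b, a)                    # + commutative
            and p(a, k.zero) == a                     # 0 identity for +
            and t(t(a, b), c) == t(a, t(b, c))        # · associative
            and t(a, b) == t(b, a)                    # · commutative
            and t(a, k.one) == a                      # 1 identity for ·
            and t(a, p(b, c)) == p(t(a, b), t(a, c))  # · distributes over +
            and t(a, k.zero) == k.zero                # 0 annihilates
        )
        if not ok:
            return False
    return True


BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0)
# Deliberately broken: subtraction is neither associative nor commutative.
BROKEN = Semiring(lambda a, b: a - b, lambda a, b: a * b, 0, 1)

print(satisfies_laws(BOOL, [False, True]))
print(satisfies_laws(NAT, [0, 1, 2, 3]))
print(satisfies_laws(TROPICAL, [0, 1, 3, float("inf")]))
print(satisfies_laws(BROKEN, [0, 1, 2]))
```

The first three calls are expected to succeed and the last one to fail, since none of the first three structures requires additive inverses while the broken one violates commutativity of +.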
Provenance: abstract semiring annotation
Sue's notes:
  cat    mouse   p
  cat    rat     q
Val's notes:
  mouse  gray    r
  mouse  red     s
  rat    gray    t
Zack: Zack(x,z) :- Sue(x,y), Val(y,z)
  cat    gray    p·r + q·t
  cat    red     p·s
Keep X = {p, q, r, s, t} abstract. Diagnostic for wrong answers; deletion propagation. E.g., r = s = 0
Provenance polynomials: the semiring (N[X], +, ·, 0, 1)
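As an illustration of the slide above, the following sketch computes the provenance polynomials of Zack's query and then propagates the deletion r = s = 0. The representation (a `Counter` from monomials to coefficients) and the helper names `var`, `padd`, `pmul`, `delete`, and `show` are choices made for this example only.

```python
from collections import Counter


def var(name):
    """The polynomial consisting of the single token `name`."""
    return Counter({(name,): 1})


def padd(f, g):
    """Sum in N[X]: add the coefficients of equal monomials."""
    return Counter(f) + Counter(g)


def pmul(f, g):
    """Product in N[X]: multiply out and merge the tokens of each monomial pair."""
    out = Counter()
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


def delete(f, tokens):
    """Set the given tokens to 0: every monomial mentioning one of them vanishes."""
    return Counter({m: c for m, c in f.items() if not set(m) & set(tokens)})


def show(f):
    def term(m, c):
        body = "·".join(m)
        if not body:
            return str(c)
        return body if c == 1 else f"{c}·{body}"
    return " + ".join(term(m, c) for m, c in sorted(f.items())) or "0"


SUE = {("cat", "mouse"): var("p"), ("cat", "rat"): var("q")}
VAL = {("mouse", "gray"): var("r"), ("mouse", "red"): var("s"),
       ("rat", "gray"): var("t")}


def zack(sue, val):
    """Zack(x, z) :- Sue(x, y), Val(y, z) over N[X]-relations."""
    out = {}
    for (x, y), a in sue.items():
        for (y2, z), b in val.items():
            if y == y2:
                out[(x, z)] = padd(out.get((x, z), Counter()), pmul(a, b))
    return out


result = zack(SUE, VAL)
for tup, poly in result.items():
    print(tup, show(poly), "| after r = s = 0:", show(delete(poly, {"r", "s"})))
```

After the deletion, (cat, gray) keeps the derivation through the rat, while (cat, red) is annotated with 0 and so becomes absent.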
Provenance polynomials
(N[X], +, ·, 0, 1) is the commutative semiring freely generated by X (universality property involving homomorphisms)
Provenance polynomials are PTIME-computable (data complexity). (Query complexity depends on language and representation.)
ORCHESTRA provenance (graph representation): about 30% overhead
Monomials correspond to logical derivations (proof trees in non-rec. Datalog)
Provenance reading of polynomials:
output tuple has provenance 2r² + rs: three derivations of the tuple
- two of them use r, twice
- the third uses r and s, once each
Specialize provenance for access control
Sue's notes:
  cat    mouse   p          Pub
  cat    rat     q          Pub
Val's notes:
  mouse  gray    r          TSec
  mouse  red     s          TSec
  rat    gray    t          Conf
Zack: Zack(x,z) :- Sue(x,y), Val(y,z)
  cat    gray    pr + qt    Conf
  cat    red     ps         TSec
(A, min, max, 0, Pub) where A = Pub < Conf < Sec < TSec < 0
f: X → A   f(p) = f(q) = Pub   f(r) = f(s) = TSec   f(t) = Conf
eval(f): N[X] → A   eval(f)(pr + qt) = Conf   eval(f)(ps) = TSec
Specialize provenance for confidence scores
Sue's notes:
  cat    mouse   p          0.9
  cat    rat     q          0.9
Val's notes:
  mouse  gray    r          0.6
  mouse  red     s          0.1
  rat    gray    t          0.8
Zack: Zack(x,z) :- Sue(x,y), Val(y,z)
  cat    gray    pr + qt    0.72
  cat    red     ps         0.09
V = ([0,1], max, ·, 0, 1) the Viterbi semiring
f: X → [0,1]   f(p) = f(q) = 0.9   f(r) = 0.6   f(s) = 0.1   f(t) = 0.8
eval(f): N[X] → V   eval(f)(pr + qt) = 0.72   eval(f)(ps) = 0.09
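The two specialization slides apply the same recipe: choose a valuation f of the tokens and evaluate the polynomial in the target semiring. A minimal sketch follows; the function name `evaluate`, the tuple-based monomial encoding, and the list-index encoding of the access levels are assumptions of this example.

```python
from functools import reduce


def evaluate(poly, f, plus, times, zero, one):
    """Evaluate a polynomial of N[X] in the semiring (plus, times, zero, one).

    `poly` maps monomials (tuples of tokens, repeated for exponents) to
    natural-number coefficients; `f` maps tokens to semiring elements.
    """
    total = zero
    for monomial, coeff in poly.items():
        value = reduce(times, (f[x] for x in monomial), one)
        for _ in range(coeff):  # a coefficient n means "added n times"
            total = plus(total, value)
    return total


CAT_GRAY = {("p", "r"): 1, ("q", "t"): 1}  # pr + qt
CAT_RED = {("p", "s"): 1}                  # ps

# Access control: levels as list indexes, with the extra top element "0".
LEVELS = ["Pub", "Conf", "Sec", "TSec", "0"]
f_access = {"p": 0, "q": 0, "r": 3, "s": 3, "t": 1}
gray_level = LEVELS[evaluate(CAT_GRAY, f_access, min, max, 4, 0)]
red_level = LEVELS[evaluate(CAT_RED, f_access, min, max, 4, 0)]

# Viterbi semiring: max over alternatives, product over joint use.
f_conf = {"p": 0.9, "q": 0.9, "r": 0.6, "s": 0.1, "t": 0.8}
mul = lambda a, b: a * b
gray_conf = evaluate(CAT_GRAY, f_conf, max, mul, 0.0, 1.0)
red_conf = evaluate(CAT_RED, f_conf, max, mul, 0.0, 1.0)

# Counting derivations: evaluate 2r^2 + rs in N with every token mapped to 1.
two_r2_plus_rs = {("r", "r"): 2, ("r", "s"): 1}
derivations = evaluate(two_r2_plus_rs, {"r": 1, "s": 1},
                       lambda a, b: a + b, mul, 0, 1)

print(gray_level, red_level, gray_conf, red_conf, derivations)
```

The last call recovers the "three derivations" reading of 2r² + rs from the provenance polynomials slide.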
Some application semirings
(B, ∨, ∧, ⊥, ⊤) binary trust
(N, +, ·, 0, 1) multiplicity (number of derivations)
(A, min, max, 0, Pub) access control
V = ([0,1], max, ·, 0, 1) Viterbi semiring (MPE) confidence scores
T = ([0,∞], min, +, ∞, 0) tropical semiring (shortest paths) data pricing
F = ([0,1], max, min, 0, 1) "fuzzy logic" semiring
Two kinds of semirings in this framework
Provenance semirings, e.g.,
(N[X], +, ·, 0, 1) provenance polynomials [GKT07]
(Why(X), ∪, ⋓, ∅, {∅}) witness why-provenance [BKT01]
Application semirings, e.g.,
(A, min, max, 0, Pub) access control [FGT08]
V = ([0,1], max, ·, 0, 1) Viterbi semiring (MPE) [GKIT07]
Provenance specialization relies on
- Provenance semirings are freely generated by provenance tokens
- Query commutation with semiring homomorphisms
Query commutation with homomorphisms
query in QL, homomorphism h: K1 → K2

K1-Rel --query--> K1-Rel
  |                 |
  h                 h
  v                 v
K2-Rel --query--> K2-Rel

QL = RA+, Datalog [GKT07] and extensions [FGT08, GP10, ADT11a, T13, DMT15, GUKFC16, T17]
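The commuting square can be tried out on a small case. The sketch below takes K1 = N (bag semantics) and K2 = B (set semantics), with h(n) = (n > 0); the multiplicities and the names `query`, `apply_hom`, and `commutes` are made up for this illustration. A map that is not a homomorphism (parity) is included to show the square failing.

```python
def query(sue, val, plus, times, zero):
    """Zack(x, z) :- Sue(x, y), Val(y, z) over K-relations (dicts tuple -> annotation)."""
    out = {}
    for (x, y), a in sue.items():
        for (y2, z), b in val.items():
            if y == y2:
                out[(x, z)] = plus(out.get((x, z), zero), times(a, b))
    return out


def apply_hom(h, rel):
    """Lift h: K1 -> K2 to relations by applying it to every annotation."""
    return {tup: h(a) for tup, a in rel.items()}


def commutes(h, sue, val):
    """Compare 'query in N, then h' with 'h, then query in B'."""
    in_n = query(sue, val, lambda a, b: a + b, lambda a, b: a * b, 0)
    in_b = query(apply_hom(h, sue), apply_hom(h, val),
                 lambda a, b: a or b, lambda a, b: a and b, False)
    return apply_hom(h, in_n) == in_b


# Bag (multiplicity) annotations, chosen arbitrarily for this illustration.
SUE = {("cat", "mouse"): 3, ("cat", "rat"): 1}
VAL = {("mouse", "gray"): 1, ("mouse", "red"): 0, ("rat", "gray"): 1}

positive = lambda n: n > 0   # a semiring homomorphism N -> B
odd = lambda n: n % 2 == 1   # not a homomorphism: odd(1 + 1) differs from odd(1) or odd(1)

print(commutes(positive, SUE, VAL))
print(commutes(odd, SUE, VAL))
```

With `positive` both paths around the square agree; with `odd` they disagree on (cat, gray), whose multiplicity 4 is even although both of its derivations use only odd multiplicities.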
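A Hierarchy of Provenance Semirings [G09, DMRT14]
Example: 2x²y + xy + 5y² + xz
N[X] (most informative): 2x²y + xy + 5y² + xz
B[X]: x²y + xy + y² + xz
Trio(X): 3xy + 5y + xz
Sorp(X): xy + y² + xz
Why(X): xy + y + xz
PosBool(X): y + xz
Which(X) (least informative): xyz
Arrows: surjective semiring homomorphism, identity on X
N[X] → B[X] (+ idemp.)   N[X] → Trio(X) (· idemp.)
B[X] → Sorp(X) (absorption)   B[X] → Why(X) (· idemp.)
Trio(X) → Why(X) (+ idemp.)
Sorp(X) → PosBool(X) (· idemp.)
Why(X) → PosBool(X) (absorption, ab + a = a)   Why(X) → Which(X) (+ = ·)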
A Hierarchy of Provenance Semirings [G09, DMRT14]
Same diagram: N[X]; B[X], Trio(X); Sorp(X), Why(X); Which(X), PosBool(X)
Application semirings added to the diagram: N; T, V; A; B
A menagerie of provenance semirings
(Which(X), ∪, ∪*, ∅, ∅*) sets of contributing tuples: "Lineage" (1) [CWW00]
(Why(X), ∪, ⋓, ∅, {∅}) sets of sets of …: witness why-provenance [BKT01]
(PosBool(X), ∨, ∧, ⊥, ⊤) minimal sets of sets of …: minimal witness why-provenance [BKT01], also "Lineage" (2) used in probabilistic dbs [SORK11]
(Trio(X), +, ·, 0, 1) bags of sets of …: "Lineage" (3) [BDHTW08, G09]
(B[X], +, ·, 0, 1) sets of bags of …: Boolean coeff. polynomials [G09]
(Sorp(X), +, ·, 0, 1) minimal sets of bags of …: absorptive polynomials [DMRT14]
(N[X], +, ·, 0, 1) bags of bags of …: universal provenance polynomials [GKT07]
From RA+ to Datalog
Immediate consequence operator F of a Datalog program. Incorporates the edb predicates, maps idb predicates to idb predicates.
It's expressible in RA+. E.g., transitive closure F(T) = E ∪ π_{1,3}(E ⋈ T)
Generalize to F: (K-Rel)^n → (K-Rel)^n (n = # of idb predicates)
Solve certain (systems) of least fixed point equations over K-relations. T = F(T)
Equivalently:
- introduce unknowns Z for the annotations of idb tuples
- solve system of fixed point equations over K; right-hand sides are polynomials in K[Z].
Additional structure on K for these to have (unique) solutions?
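To make the fixed point equation T = F(T) concrete, here is a sketch that iterates the immediate consequence operator of transitive closure from the empty relation. The example graph, its annotations, and the `max_rounds` cutoff are assumptions of this illustration. In the tropical semiring the iteration stabilizes (shortest path costs); with multiplicities in N it keeps growing on a graph with a loop, which is the issue the closing question raises.

```python
INF = float("inf")


def immediate_consequence(E, T, plus, times, zero):
    """F(T) = E ∪ π_{1,3}(E ⋈ T) over K-relations stored as dicts (x, y) -> annotation."""
    out = dict(E)
    for (x, y), a in E.items():
        for (y2, z), b in T.items():
            if y == y2:
                out[(x, z)] = plus(out.get((x, z), zero), times(a, b))
    return out


def least_fixpoint(E, plus, times, zero, max_rounds=1000):
    """Iterate T_0 = empty, T_{n+1} = F(T_n) until nothing changes."""
    T = {}
    for _ in range(max_rounds):
        nxt = immediate_consequence(E, T, plus, times, zero)
        if nxt == T:
            return T
        T = nxt
    raise RuntimeError("no fixpoint reached within max_rounds")


# A graph with a loop a -> b -> a; annotations are edge costs.
COSTS = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 5, ("a", "c"): 10}
shortest = least_fixpoint(COSTS, min, lambda a, b: a + b, INF)
print(shortest[("a", "c")])

# The same graph with multiplicity 1 on every edge: derivation counts never stabilize.
ONES = {edge: 1 for edge in COSTS}
try:
    least_fixpoint(ONES, lambda a, b: a + b, lambda a, b: a * b, 0, max_rounds=50)
except RuntimeError as err:
    print("N:", err)
```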
ω-continuous semirings
Semirings K such that the immediate consequence operator of any Datalog program has a least fixpoint on K-relations.
Naturally ordered when x ≤ y iff there exists z s.t. x+z = y is an order relation (all semirings seen here are naturally ordered)
ω-complete also x0 ≤ x1 ≤ … ≤ xn ≤ … have l.u.b.’s (sup’s)
ω-continuous moreover + and · preserve those l.u.b.’s
Among our examples
Many of the semirings that interest us (B, T, V, A, F) are already ω-continuous.
(N, +, ·, 0, 1) is not, but its "completion" (N^∞ = N ∪ {∞}, +, ·, 0, 1) is.
For provenance, the completion of N[X] is not N^∞[X]. Instead of (finite) polynomials we need (possibly infinite) formal power series.
They form an ω-continuous semiring N^∞[[X]]. Monomials still correspond to derivation trees. (Even transitive closure has infinitely many derivation trees if E has loops.)
The completion of B[X] is B[[X]].
Absorptive polynomials
Most informative provenance semiring for Datalog: (N^∞[[X]], +, ·, 0, 1) (Infinite power series have finite representations as systems of polynomial equations.)
Absorption: a + a·b = a
Absorptive polynomials Sorp(X): boolean coefficients but only minimal degree monomials
x²y + xy + y² + xz → xy + y² + xz
Absorptive power series same as absorptive polynomials!
Why? Order monomials by degree of each variable. In this infinite poset all antichains are finite! (Dickson's Lemma)
Sorp(X) is already ω-continuous: provides provenance polynomials for Datalog.
So is PosBool(X), but Sorp(X) provenance also supports tropical and Viterbi semiring applications
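The claim that Sorp(X) already yields finite provenance for recursive queries can be illustrated on transitive closure over a graph with a loop. The sketch below is an assumption-laden toy: absorptive polynomials are encoded as frozensets of minimal monomials, monomials as sorted tuples of (token, exponent) pairs, and the edge tokens x, y, z are invented for the example.

```python
ZERO = frozenset()  # the empty absorptive polynomial


def token(x):
    """The absorptive polynomial consisting of the single token x."""
    return frozenset({((x, 1),)})


def mono_mul(m1, m2):
    """Multiply two monomials by adding exponents."""
    exps = dict(m1)
    for v, e in m2:
        exps[v] = exps.get(v, 0) + e
    return tuple(sorted(exps.items()))


def divides(m1, m2):
    e2 = dict(m2)
    return all(e2.get(v, 0) >= e for v, e in m1)


def minimize(monos):
    """Absorption a + a·b = a: drop every monomial divisible by another one."""
    monos = set(monos)
    return frozenset(m for m in monos
                     if not any(n != m and divides(n, m) for n in monos))


def plus(f, g):
    return minimize(f | g)


def times(f, g):
    return minimize(mono_mul(a, b) for a in f for b in g)


def transitive_closure(E, max_rounds=100):
    """Iterate T = E ∪ π_{1,3}(E ⋈ T) over Sorp(X)-relations until stable."""
    T = {}
    for _ in range(max_rounds):
        nxt = dict(E)
        for (x, y), a in E.items():
            for (y2, z), b in T.items():
                if y == y2:
                    nxt[(x, z)] = plus(nxt.get((x, z), ZERO), times(a, b))
        if nxt == T:
            return T
        T = nxt
    raise RuntimeError("no fixpoint reached")


# Loop a -> b -> a plus an exit edge b -> c, annotated with tokens x, y, z.
E = {("a", "b"): token("x"), ("b", "a"): token("y"), ("b", "c"): token("z")}
T = transitive_closure(E)
print(T[("a", "c")])
```

Derivations that go around the loop contribute monomials such as x²yz, which are absorbed by xz, so the iteration stops after a few rounds even though the number of derivation trees is infinite.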
Further aspects of the framework
Extension to tree data (Nested Relational Calculus, structural recursion on trees, unordered XQuery) [FGT08]
Study of CQ/UCQ on provenance-annotated relations [G09]
Extension to aggregates (poly-size overhead) [ADT11a]
Poly-size provenance for Datalog (circuits; PosBool(X), Sorp(X) …) [DMRT14]
Extension to data-dependent finite state processes [DMT15]
Connections
- to semiring monad [FGT08, T13]
- to semimodules [ADT11a]
- to tensor products [ADT11a, DMT15]
Negative information; non-monotone operations (difference)
Boolean expressions [IL84]. Limited.
Add a binary operation corresponding to difference
- m-semirings (common gen. of set and bag difference) [GP10]
- spm-semirings (OPTIONAL in SPARQL) [GUKFC16]
Encode difference by aggregation [ADT11a]
Different equational theories, different algebraic optimizations [ADT11b]
Still not clear how to track negative information. Useful: non-answers (why not?), insertion propagation.
Logical model checking ("provenance of … truth?")
- negation as duality (NNFs), logical games
- ongoing work with Grädel and Ives [T16, T17]
Current targets
ANALYTICS COMPUTATIONS
"Fine-grained provenance for linear algebra operators", Yan, Tannen, Ives, TaPP 16
DISTRIBUTED SYSTEMS / NETWORK PROVENANCE
"Time-aware provenance for distributed systems", Zhou, Ding, Haeberlen, Ives, Loo, TaPP 11
"Diagnosing missing events in distributed systems with negative provenance", Wu, Zhao, Haeberlen, Zhou, Loo, SIGCOMM 14
STATIC ANALYSIS OF SOFTWARE
"On abstraction refinement for program analyses in Datalog", Zhang, Mangal, Grigore, Naik, PLDI 14
Framework references (I) *
[GKT07] "Provenance semirings", Green, Karvounarakis, Tannen, PODS 07.
[GKIT07] "Update exchange with mappings and provenance", Green, Karvounarakis, Ives, Tannen, VLDB 07.
[FGT08] "Annotated XML: queries and provenance", Foster, Green, Tannen, PODS 08.
[G09] "Containment of conjunctive queries on annotated relations", Green, ICDT 09.
[GP10] "On database query languages for K-relations", Geerts, Poggi, J. Appl. Logic 2010.
* See also companion paper in PODS 2017 proceedings.
Framework references (II)
[ADT11a] "Provenance for aggregate queries", Amsterdamer, Deutch, Tannen, PODS 11.
[ADT11b] "On the limitations of provenance for queries with difference", Amsterdamer, Deutch, Tannen, TaPP 11.
[T13] "Provenance propagation in complex queries", Tannen, Buneman Festschrift 2013.
[DMRT14] "Circuits for Datalog provenance", Deutch, Milo, Roy, Tannen, ICDT 14.
[DMT15] "Provenance-based analysis of data-centric processes", Deutch, Moskovitch, Tannen, VLDB J. 2015.
Framework references (III)
[GUKFC16] "Algebraic structures for capturing the provenance of SPARQL queries", Geerts, Unger, Karvounarakis, Fundulaki, Christophides, JACM 2016.
[T16] "About the provenance of truth", Tannen, Simons Inst. website 16, https://simons.berkeley.edu/talks/val-tannen-2016-12-09
[T17] "Provenance analysis for FOL model checking", Tannen, SIGLOG News 2017.
Other references
[IL84] "Incomplete information in relational databases", Imieliński, Lipski, JACM 1984.
[FR97] "A probabilistic relational algebra", Fuhr, Rölleke, TOIS 1997.
[Z97] "Query evaluation in probabilistic relational databases", Zimányi, DDS 1997.
[CWW00] "Tracing the lineage of view data in a warehousing environment", Cui, Widom, Wiener, TODS 2000.
[BKT01] "Why and where: a characterization of data provenance", Buneman, Khanna, Tan, ICDT 2001.
[BDHTW08] "Databases with uncertainty and lineage", Benjelloun, Das Sarma, Halevy, Theobald, Widom, VLDB J. 2008.
[SORK11] "Probabilistic databases", Suciu, Olteanu, Ré, Koch, SLDM 2011.
Thank you!