© 2017 A. Alawini, S. Davidson Datalog Susan B. Davidson CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309 http://www.cis.upenn.edu/~susan/cis700/homepage.html



Homework for this week

• Sign up to present a paper (the Google doc link was sent on Friday).
• Class schedule is being updated based on this.

University of Pennsylvania

First paper summary due 2/5

• First summary is on the following paper:
  • “Big Data Analytics with Datalog Queries on Spark”, SIGMOD 2016
• What is a summary (print and bring to class)?
  • Short paragraph describing the paper
  • 1-3 “strengths”, 1-3 “weaknesses”
  • At least one question you have about the paper.

Last time: Datalog

• Facts (EDB) and rules (IDB)
• Safe queries
• Negation can be tricky…

The Bachelor problem

Suppose we have an EDB relation married(x,y) and want to calculate the bachelors.

Not correct (and not safe):

bachelor(y) :- NOT married(x,y)

Also not correct (but safe):

bachelor(y) :- person(x), person(y), NOT married(x,y)

Correct (and safe):

notBachelor(y) :- married(x,y)
notBachelor(x) :- married(x,y)
bachelor(y) :- person(y), NOT notBachelor(y)
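To see why the safe-but-incorrect rule fails, the two safe programs can be run over a tiny database. This is only a sketch in Python: the person and married facts below are hypothetical sample data, not from the slides.

```python
# Hypothetical sample facts (not from the slides).
person = {"ann", "bob", "cat", "dan"}
married = {("ann", "bob")}  # married(x, y)

# Safe but incorrect:
#   bachelor(y) :- person(x), person(y), NOT married(x,y)
# y qualifies as soon as SOME person x is not married to y.
wrong_bachelor = {y for x in person for y in person if (x, y) not in married}

# Correct and safe: first compute notBachelor, then negate it.
#   notBachelor(y) :- married(x,y)
#   notBachelor(x) :- married(x,y)
#   bachelor(y)    :- person(y), NOT notBachelor(y)
not_bachelor = {y for (_, y) in married} | {x for (x, _) in married}
bachelor = {y for y in person if y not in not_bachelor}

print(sorted(wrong_bachelor))  # every person, including the married ones
print(sorted(bachelor))        # only the unmarried people
```

With this data the incorrect rule returns every person, because for each y there is always some x (for instance y itself) with no married(x,y) fact.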

This time: Datalog+ with Recursion

• A simple recursive program and naïve evaluation
• Evaluating Datalog+ programs
• Negation can still be tricky…

Datalog versus SQL

• Non-recursive Datalog with negation is a cleaned-up core of SQL
• Unions of conjunctive queries
  • Forms the core of query optimization, what we know how to reason over easily
• You can translate easily between non-recursive Datalog with negation and SQL.
  • Take the join of the nonnegated, relational subgoals and select/delete from there.
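As an illustration of the translation, the bachelor program can be written as a single SQL query: the nonnegated subgoal person becomes the FROM clause and the negated subgoal becomes a NOT EXISTS filter. This is a sketch using Python's standard sqlite3 module; the table layout person(name), married(x, y) and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database with hypothetical tables and rows (not from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person(name TEXT);
    CREATE TABLE married(x TEXT, y TEXT);
    INSERT INTO person VALUES ('ann'), ('bob'), ('cat');
    INSERT INTO married VALUES ('ann', 'bob');
""")

# bachelor(y) :- person(y), NOT notBachelor(y)
# notBachelor is inlined as the NOT EXISTS subquery over married.
rows = conn.execute("""
    SELECT DISTINCT p.name
    FROM person AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM married AS m
        WHERE m.x = p.name OR m.y = p.name
    )
    ORDER BY p.name
""").fetchall()

bachelors = [name for (name,) in rows]
print(bachelors)
conn.close()
```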

Why Datalog?

• Recursion
• Rules express things that go on in both FROM and WHERE clauses, and let us state some general principles (e.g., containment of rules) that are almost impossible to state correctly in SQL.

Simple recursive Datalog program

R encodes a graph.

R = {(1,2), (2,1), (2,3), (1,4), (3,4), (4,5)}

T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)

What does T compute?

Naïve Evaluation

T = {}
WHILE (changes to T) DO
    T = T ∪ R(x,y) ∪ π_{x,y}(R(x,z) ⋈ T(z,y))
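The loop above can be sketched in Python with relations as sets of pairs. The function name naive_tc is made up for this example; R is the edge relation from the slide.

```python
# Edge relation R from the slide.
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}

def naive_tc(R):
    """Naive evaluation of
         T(x,y) :- R(x,y)
         T(x,y) :- R(x,z), T(z,y)
    Every round re-applies both rules to the whole current T."""
    T = set()
    while True:
        joined = {(x, y) for (x, z) in R for (z2, y) in T if z == z2}
        new_T = T | R | joined
        if new_T == T:        # no changes to T: fixpoint reached
            return T
        T = new_T

T = naive_tc(R)
print(len(T), sorted(T))
```

T is the transitive closure of R: it contains (x,y) exactly when there is a path of length at least one from x to y.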

Simple recursive Datalog program

R encodes a graph.

R = {(1,2), (2,1), (2,3), (1,4), (3,4), (4,5)}

Right linear:

T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)

Alternate ways to compute transitive closure:

Left linear:

T(x,y) :- R(x,y)
T(x,y) :- T(x,z), R(z,y)

Non-linear:

T(x,y) :- R(x,y)
T(x,y) :- T(x,z), T(z,y)
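A quick way to check that the three formulations agree is to run each to a fixpoint on the same R. The helpers join and fixpoint below are hypothetical names introduced for this sketch.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}

def join(A, B):
    """A(x,z) joined with B(z,y), projected onto (x,y)."""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

def fixpoint(step):
    """Naively iterate T = T ∪ step(T) until nothing changes."""
    T = set()
    while True:
        new_T = T | step(T)
        if new_T == T:
            return T
        T = new_T

right_linear = fixpoint(lambda T: R | join(R, T))  # T :- R, T
left_linear = fixpoint(lambda T: R | join(T, R))   # T :- T, R
non_linear = fixpoint(lambda T: R | join(T, T))    # T :- T, T

print(right_linear == left_linear == non_linear)
```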

Another Interesting Program

Non 2-colorability: R encodes a graph.

R = {(1,2), (2,1), (2,3), (1,4), (3,4), (4,5)}

ODD(x,y) :- R(x,y)
ODD(x,y) :- R(x,z), EVEN(z,y)
EVEN(x,y) :- R(x,z), ODD(z,y)
Q :- ODD(x,x)
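The program can be evaluated naively with two mutually recursive relations. In this Python sketch the function name has_odd_cycle is made up, and reading Q as "not 2-colorable" assumes the graph is undirected, i.e. R is stored symmetrically; the symmetric closure and the triangle below are added for illustration only.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}

def join(A, B):
    """A(x,z) joined with B(z,y), projected onto (x,y)."""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

def has_odd_cycle(R):
    """Naive evaluation of ODD/EVEN; returns the truth value of Q :- ODD(x,x)."""
    odd, even = set(), set()
    while True:
        new_odd = odd | R | join(R, even)   # odd-length walks
        new_even = even | join(R, odd)      # even-length walks
        if new_odd == odd and new_even == even:
            return any(x == y for (x, y) in odd)
        odd, even = new_odd, new_even

# Assumption: treat the slide's graph as undirected by adding reverse edges.
undirected = R | {(y, x) for (x, y) in R}
# Hypothetical extra example: a directed triangle, which has an odd cycle.
triangle = {(1, 2), (2, 3), (3, 1)}

print(has_odd_cycle(undirected), has_odd_cycle(triangle))
```

Under that assumption the slide's graph is 2-colorable (colour {1,3,5} against {2,4}), so Q is false, while the triangle makes Q true.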

Evaluating Datalog+ Programs

1. Nonrecursive programs.
2. Naïve evaluation of recursive programs without negation.
3. Semi-naïve evaluation of recursive programs without negation.
   § Eliminates some redundant computation.

Nonrecursive Evaluation

• If (and only if!) a Datalog program is not recursive, then we can order the IDB predicates so that in any rule for p (i.e., p is the head predicate), the only IDB predicates in the body precede p.

Why?

• Consider the dependency graph with:
  • Nodes = IDB predicates.
  • Arc p → q iff there is a rule for p with q in the body.
• Cycle involving node p means p is recursive.
• No cycles: use topological order to evaluate predicates.
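The dependency-graph test can be sketched with Python's standard graphlib module (Python 3.9+). The function evaluation_order and the two dependency maps are illustrative: the first describes the bachelor program, the second the ODD/EVEN program.

```python
from graphlib import CycleError, TopologicalSorter

def evaluation_order(deps):
    """deps maps each IDB predicate p to the IDB predicates q in the bodies
    of its rules (one arc p -> q each). Returns an evaluation order in which
    every q precedes p, or None if the graph has a cycle (recursion)."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None

# bachelor depends on notBachelor; notBachelor uses only EDB relations.
bachelor_deps = {"notBachelor": set(), "bachelor": {"notBachelor"}}
# ODD and EVEN depend on each other, so the program is recursive.
odd_even_deps = {"ODD": {"EVEN"}, "EVEN": {"ODD"}}

print(evaluation_order(bachelor_deps))
print(evaluation_order(odd_even_deps))
```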

Applying Rules

To evaluate an IDB predicate p:

1. Apply each rule for p to the current relations corresponding to its subgoals.
   § “Apply” = If an assignment of values to variables makes the body true, insert the tuple that the head becomes into the relation for p (no duplicates).
   § Also think of the “product” of the relations corresponding to the subgoals with selection/join conditions.
2. Take the union of the result for each p-rule.

Example

P(x,y) :- Q(x,z), R(z,y), y<10

Q = {(1,2), (3,4)}
R = {(2,5), (4,9), (4,10), (6,7)}

• Assignments making the body true: (x,y,z) = (1,5,2), (3,9,4)
• So P = {(1,5), (3,9)}.
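The same rule application can be written as a join with a selection, here as a Python sketch over the relations from the example:

```python
Q = {(1, 2), (3, 4)}
R = {(2, 5), (4, 9), (4, 10), (6, 7)}

# All assignments (x, y, z) that make the body Q(x,z), R(z,y), y<10 true.
assignments = {
    (x, y, z)
    for (x, z) in Q
    for (z2, y) in R
    if z == z2 and y < 10
}

# The head P(x,y) keeps only x and y; a set removes duplicates.
P = {(x, y) for (x, y, _) in assignments}

print(sorted(assignments))
print(sorted(P))
```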

Algorithm for Nonrecursive

FOR (each predicate P in topological order) DO
    Apply the rules for P to previously computed relations to compute relation P;
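A minimal sketch of this loop in Python, using the bachelor program. The facts, the rules dictionary, and the deps map are all assumptions made for the example; graphlib requires Python 3.9+.

```python
from graphlib import TopologicalSorter

# Hypothetical EDB facts (not from the slides).
db = {
    "person": {"ann", "bob", "cat", "dan"},
    "married": {("ann", "bob")},
}

# One function per IDB predicate; each reads only relations computed earlier.
rules = {
    "notBachelor": lambda rel: {x for (x, _) in rel["married"]}
                               | {y for (_, y) in rel["married"]},
    "bachelor": lambda rel: {p for p in rel["person"]
                             if p not in rel["notBachelor"]},
}

# IDB predicate -> IDB predicates used in the bodies of its rules.
deps = {"notBachelor": set(), "bachelor": {"notBachelor"}}

# Evaluate each IDB predicate once, in topological order.
for pred in TopologicalSorter(deps).static_order():
    db[pred] = rules[pred](db)

print(sorted(db["bachelor"]))
```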

Naïve Evaluation for Recursive

make all IDB relations empty;
WHILE (changes to IDB) DO
    FOR (each IDB predicate P) DO
        Evaluate P using current values of all relations;

Important Points

• As long as there is no negation of IDB subgoals, then each IDB relation “grows,” i.e., on each round it contains at least what it used to contain.
  • monotonicity
• Since relations are finite, the loop must eventually terminate.
• Result is the least fixed point (minimal model) of rules.

Problem with Naïve Evaluation

• The same facts are discovered over and over again.
• The semi-naïve algorithm tries to reduce the number of facts discovered multiple times.
  • There is a similarity to incremental view maintenance

Background: Incremental View Maintenance

V(x,y) :- R(x,z), S(z,y)

If R ← R ∪ ΔR then what is ΔV(x,y)?

ΔV(x,y) :- ΔR(x,z), S(z,y)

If R ← R ∪ ΔR and S ← S ∪ ΔS then what is ΔV(x,y)?

ΔV(x,y) :- ΔR(x,z), S(z,y)
ΔV(x,y) :- R(x,z), ΔS(z,y)
ΔV(x,y) :- ΔR(x,z), ΔS(z,y)
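The three delta rules can be checked on a small instance: the recomputed view should equal the old view plus ΔV. The relations and deltas in this Python sketch are hypothetical sample data, and join is a helper introduced for the example.

```python
def join(A, B):
    """A(x,z) joined with B(z,y), projected onto (x,y)."""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

# Hypothetical base relations and insertions (not from the slides).
R, S = {(1, 2)}, {(2, 3)}
dR, dS = {(4, 2)}, {(2, 5)}

V_old = join(R, S)

# ΔV is the union of the three delta rules.
dV = join(dR, S) | join(R, dS) | join(dR, dS)

# Recomputing from scratch gives the same result as V_old ∪ ΔV.
V_new = join(R | dR, S | dS)
print(V_new == V_old | dV)
```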

Background: Incremental View Maintenance

V(x,y) :- T(x,z), T(z,y)

If T ← T ∪ ΔT then what is ΔV(x,y)?

ΔV(x,y) :- ΔT(x,z), T(z,y)
ΔV(x,y) :- T(x,z), ΔT(z,y)
ΔV(x,y) :- ΔT(x,z), ΔT(z,y)

Semi-naïve Evaluation

• Key idea: to get a new tuple for relation P on one round, the evaluation must use some tuple for some relation of the body that was obtained on the previous round.
• Maintain ΔP = new tuples added to P on previous round.
• “Differentiate” rule bodies to be union of bodies with one IDB subgoal made “Δ.”

Semi-naïve Evaluation

• Separate the Datalog program into the non-recursive, and the recursive part.
• Each Pi defined by non-recursive-SPJUi and (recursive-)SPJUi, where SPJU is a select-project-join-union query.

P1 = ΔP1 = non-recursive-SPJU1, P2 = ΔP2 = non-recursive-SPJU2, …
Loop
    ΔP1 = ΔSPJU1 - P1; ΔP2 = ΔSPJU2 - P2; …
    if (ΔP1 = ∅ and ΔP2 = ∅ and …) then break
    P1 = P1 ∪ ΔP1; P2 = P2 ∪ ΔP2; …
Endloop

Semi-naïve Evaluation

T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)

T(x,y) = ΔT(x,y) = ? (non-recursive rule)
Loop
    ΔT(x,y) = ? (recursive Δ-rule)
    if (ΔT = ∅) then break
    T = T ∪ ΔT
Endloop

Semi-naïve Evaluation

T(x,y) :- R(x,y)
T(x,y) :- R(x,z), T(z,y)

T(x,y) = ΔT(x,y) = R(x,y)
Loop
    ΔT(x,y) = (R(x,z), ΔT(z,y)) - T(x,y)
    if (ΔT = ∅) then break
    T = T ∪ ΔT
Endloop
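A Python sketch of semi-naïve evaluation for this program follows; the function name semi_naive_tc is made up. Only the tuples found in the previous round are joined with R, and everything already in T is subtracted, which is what makes the loop stop on a cyclic graph such as the slide's R.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}

def semi_naive_tc(R):
    """Semi-naive evaluation of
         T(x,y) :- R(x,y)
         T(x,y) :- R(x,z), T(z,y)"""
    T = set(R)          # result of the non-recursive rule
    delta = set(R)      # tuples that were new in the previous round
    while delta:
        # Recursive Δ-rule: join R with the previous round's new tuples only.
        derived = {(x, y) for (x, z) in R for (z2, y) in delta if z == z2}
        delta = derived - T   # keep only tuples not seen before
        T |= delta
    return T

T = semi_naive_tc(R)
print(len(T))
```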

Discussion of Semi-Naïve Algorithm

• Avoids recomputing some (but not all) tuples
• Easy to implement, no disadvantage over naïve
• A rule is called linear if its body contains only one recursive IDB predicate:
  • A linear rule always results in a single incremental rule
  • A non-linear rule may result in multiple incremental rules
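For the non-linear rule T(x,y) :- T(x,z), T(z,y), differentiating the body gives more than one incremental rule. The Python sketch below uses two of them, ΔT ⋈ T and T ⋈ ΔT; because T already contains ΔT when the joins run, the ΔT ⋈ ΔT case is covered by either one. The function and helper names are made up for this example.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4), (4, 5)}

def join(A, B):
    """A(x,z) joined with B(z,y), projected onto (x,y)."""
    return {(x, y) for (x, z) in A for (z2, y) in B if z == z2}

def semi_naive_nonlinear(R):
    """Semi-naive evaluation of
         T(x,y) :- R(x,y)
         T(x,y) :- T(x,z), T(z,y)"""
    T = set(R)
    delta = set(R)
    while delta:
        # Two incremental rules, one per recursive subgoal made "Δ".
        derived = join(delta, T) | join(T, delta)
        delta = derived - T
        T |= delta
    return T

T_nonlinear = semi_naive_nonlinear(R)
print(len(T_nonlinear))
```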

Recursion and Negation Don’t Like Each Other

• When rules have negated IDB subgoals, there can be several minimal models.
• Recall: model = set of IDB facts, plus the given EDB facts, that make the rules true for every assignment of values to variables.
  • Rule is true unless body is true and head is false.

S(x) :- R(x), not T(x)
T(x) :- R(x), not S(x)

Suppose R(a). What are S and T?
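One way to explore the question is to enumerate every candidate pair (S, T) over the single constant a, keep those that satisfy both rules, and then keep the minimal ones. This brute-force Python sketch is only meant for this tiny instance; the helper names are made up.

```python
from itertools import product

R = {"a"}  # the EDB fact R(a)

def is_model(S, T):
    """A rule holds unless its body is true and its head is false."""
    rule1 = all(x in S for x in R if x not in T)  # S(x) :- R(x), not T(x)
    rule2 = all(x in T for x in R if x not in S)  # T(x) :- R(x), not S(x)
    return rule1 and rule2

candidates = [frozenset(), frozenset({"a"})]
models = [(S, T) for S, T in product(candidates, repeat=2) if is_model(S, T)]

# A model is minimal if no different model is contained in it.
minimal = [
    (S, T) for (S, T) in models
    if not any(S2 <= S and T2 <= T and (S2, T2) != (S, T)
               for (S2, T2) in models)
]

for S, T in minimal:
    print("S =", set(S), "T =", set(T))
```

The enumeration finds two incomparable minimal models, one with S = {a} and T empty and one with T = {a} and S empty, so there is no single least model to pick.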

Next time: Datalog-
