relational model - homepages.inf.ed.ac.ukhomepages.inf.ed.ac.uk/apieris/courses/atfd2020/relational...
TRANSCRIPT
RelationalModel
AdvancedTopicsinFoundationsofDatabases,UniversityofEdinburgh,2019/20
RelationalModel
• Manyadhocmodelsbefore1970
− Hardtoworkwith
− Hardtoreasonabout
• 1970:RelationalModelbyEdgarFrankCodd
− Dataarestoredinrelations (ortables)
− Queriedusingadeclarativelanguage
− DBMSconvertsdeclarativequeriesintoproceduralqueries thatare
optimizedandexecuted
• KeyAdvantages
− Simpleandcleanmathematicalmodel(basedonlogic)
− Separationofdeclarativeandprocedural
RelationalDatabases
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Constants
VIE,LHR,…
BA,U2,…
Vienna,London,…
RelationalDatabases
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Relations
Constants
VIE,LHR,…
BA,U2,…
Vienna,London,…
RelationalDatabases
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Constants
VIE,LHR,…
BA,U2,…
Vienna,London,…
Relations
Tuples
Relationalatoms
Flight(LHR,EDI,BA)
Airport(LGW,London)
Querying:RelationalAlgebra
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Listalltheairlines
Querying:RelationalAlgebra
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Listalltheairlines
πairline Flight
{BA,U2,OS}
Querying:RelationalAlgebra
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListthecodesoftheairportsinLondon
Querying:RelationalAlgebra
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListthecodesoftheairportsinLondon
πcode (σcity=‘London’Airport)
{LHR,LGW}
Querying:RelationalAlgebra
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListtheairlinesthatflydirectlyfromLondontoGlasgow
Querying:RelationalAlgebra
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListtheairlinesthatflydirectlyfromLondontoGlasgow
πairline ((Flight⋈origin=code(σcity=‘London’ Airport))⋈destination=code(σcity=‘Glasgow’Airport))
Querying:RelationalAlgebra
Aux origin destination airline code city code city
LGW GLA U2 LGW London GLA Glasgow
ListtheairlinesthatflydirectlyfromLondontoGlasgow
πairline ((Flight⋈origin=code(σcity=‘London’ Airport))⋈destination=code(σcity=‘Glasgow’Airport))
{U2}
definestheauxiliaryrelationAux
• Selection:σ
• Projection:π
• Crossproduct:×
• Naturaljoin:⋈
• Rename:ρ
• Difference:∖
• Union:∪
• Intersection:∩
RelationalAlgebra
inboldaretheprimitiveoperators
Formaldefinitionscanbefoundinanydatabasetextbook
Querying:DomainRelationalCalculus
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Listalltheairlines
{z|∃x∃yFlight(x,y,z)}
Querying:DomainRelationalCalculus
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
Listalltheairlines
{BA,U2,OS}
Querying:DomainRelationalCalculus
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListthecodesoftheairportsinLondon
{x|∃yAirport(x,y) ∧ y=London}
Querying:DomainRelationalCalculus
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListthecodesoftheairportsinLondon
{LHR,LGW}
Querying:DomainRelationalCalculus
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListtheairlinesthatflydirectlyfromLondontoGlasgow
{z|∃x∃y∃u∃v Airport(x,u) ∧ u=London∧ Airport(x,u) ∧ u=London}∧ Flight(x,y,z)}
Querying:DomainRelationalCalculus
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
ListtheairlinesthatflydirectlyfromLondontoGlasgow
{U2}
{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}
DomainRelationalCalculus
{x1,…,xk |φ}
first-orderformulawith
freevariables{x1,…,xk}
But,wecanexpress“problematic”queries,i.e.,dependonthedomain
DomainRelationalCalculus
domain ={1,2,3}
D ={R(1,1),R(1,2)}Ans={}
{x1,…,xk |φ}
first-orderformulawith
freevariables{x1,…,xk}
But,wecanexpress“problematic”queries,i.e.,dependonthedomain
{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}
DomainRelationalCalculus
domain ={1,2}
D ={R(1,1),R(1,2)}Ans={1}
{x1,…,xk |φ}
first-orderformulawith
freevariables{x1,…,xk}
But,wecanexpress“problematic”queries,i.e.,dependonthedomain
{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}
DomainRelationalCalculus
…thus,weadopttheactivedomainsemantics - quantifiedvariablesrangeover
theactivedomain,i.e.,theconstantsoccurringintheinputdatabase
{x1,…,xk |φ}
first-orderformulawith
freevariables{x1,…,xk}
But,wecanexpress“problematic”queries,i.e.,dependonthedomain
{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}
Algebra=Calculus
Afundamentaltheorem(assumingtheactivedomainsemantics):
Theorem: Thefollowingquerylangaugesareequallyexpressive
• RelationalAlgebra(RA)
• DomainRelationalCalculus(DRC)
• TupleRelationalCalculus(TRC)
Note:Tuplerelationalcalculus isthedeclarativelanguageintroducebyCodd.Domainrelational
calculushasbeenintroducedlaterasaformalismclosertofirst-orderlogic
{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧
Flight(x,z,w)∧ Flight(z,y,v)
Quiz!
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
IsGlasgowreachablefromVienna?
London
Vienna
Larnaca
Glasgow
Edinburgh
YES
Quiz!
London
Vienna
Larnaca
Glasgow
Edinburgh
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
IsGlasgowreachablefromVienna?
NO
{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧
Flight(x,z,w)∧ Flight(z,y,v)
{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧
Flight(x,z,w)∧ Flight(z,y,v)
Quiz!
London
Vienna
Larnaca
Glasgow
Edinburgh
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
IsGlasgowreachablefromVienna?
YES∧ Flight(z,z1,w1) ∧
∃z1∃w1
Quiz!
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
IsGlasgowreachablefromVienna?
NO
London
Vienna
Larnaca
Glasgow
Edinburgh
{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧
Flight(x,z,w)∧ Flight(z,y,v)
∧ Flight(z,z1,w1) ∧
∃z1∃w1
Quiz!
Flight origin destination airline
VIE LHR BA
LHR EDI BA
LGW GLA U2
LCA VIE OS
Airport code city
VIE Vienna
LHR London
LGW London
LCA Larnaca
GLA Glasgow
EDI Edinburgh
IsGlasgowreachablefromVienna?
London
Vienna
Larnaca
Glasgow
Edinburgh
Recursivequery- notexpressibleinRA/DRC/TRC
(unlessweboundthenumberofintermediatestops)
ComplexityofQueryLanguages
• Thegoalistounderstandthecomplexityofevaluatingaqueryoveradatabase
• Ourmaintechnicaltooliscomplexitytheory
• Whattomeasure?Queriesmayhavealargeoutput,anditwouldbeunfairto
counttheoutputas“complexity”
• Wethereforeconsiderthefollowingdecisionproblems:
− QueryOutputTuple(QOT)
− BooleanQueryEvaluation(BQE)
AFewWordsonComplexityTheory
detailscanbefoundinthestandardtextbooks
seealsonotesonthewebpageofthecourse
ComplexityClasses
TIME(f(n)) = {Π |ΠisdecidedbysomeDTMintimeO(f(n))}
NTIME(f(n)) = {Π |ΠisdecidedbysomeNTMintimeO(f(n))}
SPACE(f(n)) = {Π |ΠisdecidedbysomeDTMusingspaceO(f(n))}
NSPACE(f(n)) = {Π |ΠisdecidedbysomeNTMusingspaceO(f(n))}
PTIME = ⋃k>0 TIME(nk)
NP = ⋃k>0 NTIME(nk)
EXPTIME = ⋃k>0 TIME(2nk)
NEXPTIME = ⋃k>0 NTIME(2nk)
LOGSPACE = SPACE(logn)
NLOGSPACE = NSPACE(logn)
PSPACE = ⋃k>0 SPACE(nk)
EXPSPACE = ⋃k>0 SPACE(2nk)
ComplexityClasses
• Wecannowrecallthestandardtimeandspacecomplexityclasses:
thesedefinitionsarerelyingontwo-
tapeTuringmachineswitharead-
onlyandaread/writetape
• ForeverycomplexityclassCwecandefineitscomplementaryclasscoC
RelationshipAmongComplexityClasses
LOGSPACE⊆ NLOGSPACE⊆ PTIME⊆ NP,coNP ⊆
PSPACE⊆ EXPTIME⊆ NEXPTIME,coNEXPTIME ⊆⋯
Someusefulnotes:
• ForadeterministiccomplexityclassC,coC =C
• coNLOGSPACE =NLOGSPACE
• ItisgenerallybelievedthatPTIME≠ NP,butwedon’tknow
• PTIME⊂ EXPTIME⇒ atleastonecontainmentbetweenthemisstrict
• PSPACE=NPSPACE,EXPSPACE=NEXPSPACE,etc.
• But,wedon’tknowwhetherLOGSPACE=NLOGSPACE
CompleteProblems
• Thesearethehardestproblemsinacomplexityclass
• AproblemthatiscompleteforaclassC,itisunlikelytobelonginalowerclass
• AproblemΠiscomplete foracomplexityclassC,orsimplyC-complete,if:
1. Π ∈ C
2. ΠisC-hard,i.e.,everyproblemΠ’∈ Ccanbeefficientlyreduced toΠ
• ToshowthatΠisC-harditsufficestoreducesomeC-hardproblemΠ’toit
thereexistsalogspace algorithmthatcomputesafunctionfsuchthat
w ∈ Π’ iff f(w) ∈ Π - inthiscasewewriteΠ’≤L Π
SomeCompleteProblems
• NP-complete
‒ SAT(satisfiabilityofpropositionalformulas)
‒ Manygraph-theoreticproblems(e.g.,3-colorability)
‒ Travelingsalesman
‒ etc.
• PSPACE-complete
‒ QuantifiedSAT(orsimplyQSAT)
‒ Equivalenceoftworegularexpressions
‒ Manygames(e.g.,Geography)
‒ etc.
BacktoQueryLanguages
ComplexityofQueryLanguages
• Thegoalistounderstandthecomplexityofevaluatingaqueryoveradatabase
• Ourmaintechnicaltooliscomplexitytheory
• Whattomeasure?Queriesmayhavealargeoutput,anditwouldbeunfairto
counttheoutputas“complexity”
• Wethereforeconsiderthefollowingdecisionproblems:
− QueryOutputTuple(QOT)
− BooleanQueryEvaluation(BQE)
ComplexityofQueryLanguages
QOT(L)
Input:adatabaseD,aqueryQ/k ∈ L,atupleofconstantst∈ adom(D)k
Question: t ∈ Q(D)?
Someusefulnotation:
• GivenadatabaseD,andaqueryQ,Q(D)istheanswer toQ overD
• adom(D)istheactivedomain ofD - theconstantsoccurringinD
• WewriteQ/k forthefactthatthearity ofQ isk≥ 0
L issomequerylanguage;forexample,RA,DRC,etc. - wewillseeseveralquerylanguages
ComplexityofQueryLanguages
BQE(L)
Input:adatabaseD,aBooleanqueryQ ∈ L
Question: isQ(D)non-empty?
Someusefulnotation:
• GivenadatabaseD,andaqueryQ,Q(D)istheanswer toQ overD
• adom(D)istheactivedomain ofD - theconstantsoccurringinD
• WewriteQ/k forthefactthatthearity ofQ isk≥ 0
L issomequerylanguage;forexample,RA,DRC,etc. - wewillseeseveralquerylanguages
(≡Lmeanslogspace-equivalent)
Theorem: QOT(L)≡L BQE(L),whereL ∈ {RA,DRC,TRC}
BQE(L)
Input:adatabaseD,aBooleanqueryQ ∈ L
Question: isQ(D)non-empty?
QOT(L)
Input:adatabaseD,aqueryQ/k ∈ L,atupleofconstantst∈ adom(D)k
Question: t ∈ Q(D)?
ComplexityofQueryLanguages
Theorem: QOT(DRC)≡L BQE(DRC)
Proof: (≤L)ConsideradatabaseD,ak-ary queryQ ={x1,…,xk |φ},andatuple(t1,…,tk)
LetQbool ={|∃x1⋯∃xk (φ ∧ x1=t1 ∧ x2=t2 ∧⋯ ∧ xk=tk)}
Clearly,(t1,…,tk) ∈ Q(D) iff Qbool (D)isnon-empty
(≥L)Trivial- aBooleandomainRCqueryisadomainRCquery
ComplexityofQueryLanguages
(letusshowthisfordomainrelationalcalculus)
BQE[D](L)
Input:aBooleanqueryQ ∈ L
Question: isQ(D)non-empty?
ComplexityMeasures
• Combinedcomplexity - bothD andQ arepartoftheinput
• Querycomplexity - fixedD,inputQ
• Datacomplexity - inputD,fixedQ
BQE[Q](L)
Input:adatabaseD
Question: isQ(D)non-empty?
Theorem: ForL ∈ {RA,DRC,TRC}thefollowinghold:
• BQE(L)isPSPACE-complete(combinedcomplexity)
• BQE[D](L)isPSPACE-complete,forafixeddatabaseD(querycomplexity)
• BQE[Q](L)isinLOGSPACE,forafixedqueryQ ∈ L (datacomplexity)
ComplexityofRA,DRC,TRC
Proof hints:
• Recursive algorithm that uses polynomial space in Q and logarithmic space in D
• Reduction from QSAT (a standardPSPACE-hardproblem)
Evaluating(Boolean)DRCQueries
Eval(D,φ) - forbrevitywewriteφ insteadof{|φ}
• Ifφ= R(t1,…,tk), then YESiff R(t1,…,tk) ∈ D
• Ifφ= ψ1∧ψ2,then YESiff Eval(D,ψ1) =YESandEval(D,ψ2) =YES
• Ifφ= ¬ψ,thenNOiff Eval(D,ψ) =YES
• Ifφ= ∃xψ(x),then YESiff forsomet∈ adom(D),Eval(D,ψ(t)) =YES
Lemma: It holds that
• Eval(D,φ) always terminates - this is trivial
• Eval(D,φ) = YES iff Q(D) is non-empty, whereQ = { | φ}
• Eval(D,φ) uses O(|φ| ⋅ log |φ| + |φ|2 ⋅ log |D|) space
Theorem: ForL ∈ {RA,DRC,TRC}thefollowinghold:
• BQE(L)isPSPACE-complete(combinedcomplexity)
• BQE[D](L)isPSPACE-complete,forafixeddatabaseD(querycomplexity)
• BQE[Q](L)isinLOGSPACE,forafixedqueryQ ∈ L (datacomplexity)
ComplexityofRA,DRC,TRC
Proof hints:
• Recursive algorithm that uses polynomial space in Q and logarithmic space in D
• Reduction from QSAT (a standardPSPACE-hardproblem)
EQUIV(L)
Input:twoqueriesQ1 ∈ Land Q2 ∈ L
Question:Q1≡Q2? orQ1(D)=Q2(D)forevery(finite)databaseD?
SAT(L)
Input:aqueryQ ∈ L
Question: istherea(finite)databaseD suchthatQ(D)isnon-empty?
OtherImportantAlgorithmicProblems
CONT(L)
Input:twoqueriesQ1 ∈ Land Q2 ∈ L
Question:Q1⊆Q2? orQ1(D)⊆Q2(D)forevery(finite)databaseD?
EQUIV(L)
Input:twoqueriesQ1 ∈ Land Q2 ∈ L
Question:Q1≡Q2? orQ1(D)=Q2(D)forevery(finite)databaseD?
SAT(L)
Input:aqueryQ∈ L
Question: istherea(finite)databaseDsuchthatQ(D)isnon-empty?
OtherImportantAlgorithmicProblems
CONT(L)
Input:twoqueriesQ1 ∈ Land Q2 ∈ L
Question:Q1⊆Q2? orQ1(D)⊆Q2(D)forevery(finite)databaseD?
theseproblemsareimportant
foroptimizationpurposes
SAT(L)
Input:aqueryQ ∈ L
Question: istherea(finite)databaseD suchthatQ(D)isnon-empty?
OtherImportantAlgorithmicProblems
• If the answer is no, then the input query Q makes no sense
• Query evaluation becomes trivial - the answer is always NO!
EQUIV(L)
Input:twoqueriesQ1 ∈ Land Q2 ∈ L
Question:Q1≡Q2? orQ1(D)=Q2(D)forevery(finite)databaseD?
OtherImportantAlgorithmicProblems
• Replace a query Q1 with a query Q2 that is easier to evaluate
• But, we have to be sure that Q1(D) = Q2(D) for every database D
• Approximate a query Q1 with a query Q2 that is easier to evaluate
• But, we have to be sure that Q1(D)⊆Q2(D) for every database D
CONT(L)
Input:twoqueriesQ1 ∈ Land Q2 ∈ L
Question:Q1⊆Q2? orQ1(D)⊆Q2(D)forevery(finite)databaseD?
OtherImportantAlgorithmicProblems
Theorem: ForL ∈ {RA,DRC,TRC},SAT(L)isundecidable
Proof hint: By reduction from the halting problem.
Given a Turing machine M, we can construct a query QM ∈ L such that:
Mhaltsontheemptystringiff thereexistsadatabaseD suchthatQ(D)isnon-empty
SATisUndecidable
Note:Actually,thisresultgoesbacktothe1950when
BorisA.Trakhtenbrotprovedthattheproblemofdeciding
whetherafirst-ordersentencehasafinitemodelisundecidable
AneasyconsequenceofthefactthatSATisundecidableisthat:
Theorem: ForL ∈ {RA,DRC,TRC},EQUIV(L)andCONT(L)areundecidable
Proof: By reduction from the complement of SAT(L)
• Consider a query Q ∈ L - i.e., an instance of SAT(L)
• LetQ’ beaquerythatisunsatisfiable,i.e.,Q’(D)isemptyforeveryD
• Forexample,whenL =DRC,Q’ canbethequery{|∃xR(x)∧¬R(x)}
• Clearly,Qisunsatisfiable iff Q≡Q’(orevenQ⊆Q’)
EQUIVandCONTareUndecidable
Recap
• The main languages for querying relational databases are:
− Relational Algebra (RA)
− Domain Relational Calcuclus (DRC)
− Tuple Relational Calculus (TRC)
• Evaluation is decidable, and highly tractable indata complexity
− Foundations of the database industry
− The core of SQL is equally expressive to RA/DRC/TRC
RA =DRC =TRC
(undertheactivedomainsemantics)
• Satisfiability, equivalence and containment are undecidable
− Perfect query optimization is impossible