Database Theory
VU 181.140, SS 2011
1. Introduction: Relational Query Languages
Reinhard Pichler
Institut für Informationssysteme, Arbeitsbereich DBAI
Technische Universität Wien
8 March, 2011
Pichler 8 March, 2011 Page 1
Outline
1. Overview
1.1 Databases and Query Languages
1.2 Query Languages: Relational Algebra
1.3 Query Languages: Relational (Domain) Calculus
1.4 Query Languages: SQL
1.5 Query Languages: other Languages
1.6 Some Fundamental Aspects of Query Languages
1.1. Databases and Query Languages
A short history of databases
1970’s: relational revolution
• Relational model of databases (E. F. Codd), truly realizing physical data independence
• Relational query languages (SQL)
  SEQUEL: System R from IBM
  QUEL: Ingres from UC Berkeley
1980’s
• Relational query optimization
• Constraints, dependency theory
• Datalog (extends the query language with recursion)
1990’s
• New models: temporal databases, OO, OR databases
• Data mining, data warehousing
Late 1990’s until now: Internet revolution
• Data integration on the web, managing huge data volumes
• XML, sensor networks, P2P
Database theory
Cross-cutting many areas in Computer Science and Mathematics
• Complexity → efficiency of query evaluation, optimization
• Logics, Finite model theory → expressiveness
• Logic programming, constraint satisfaction (AI) → Datalog
• Graph theory → (hyper)tree-decompositions
• Automata → XML query model, data stream processing
Benefit from other fields on the one hand, contribute new results on the other hand
Relational data model
A database (also called structure) is a collection of relations (or tables)
Each database has a schema, i.e., the vocabulary (or signature)
• Each relation r has a list of attributes (or columns), denoted schema(r)
• Each attribute A has a domain (or universe), denoted dom(A)
• We define dom(r) = ⋃_{A ∈ schema(r)} dom(A)
Each relation contains a set of tuples (or rows)
• Formally, a tuple in r is a mapping t : schema(r) → dom(r) such that t(A) ∈ dom(A) for all A ∈ schema(r)
Note: For ease of notation, we often use ordered lists of attributes instead of sets.
Example
Schema
• Author (AID integer, name string, age integer)
• Paper (PID string, title string, year integer)
• Write (AID integer, PID string)
Instance
• Author: {〈142, Knuth, 73〉, 〈123, Ullman, 67〉, . . .}
• Paper: {〈181140pods, Query containment, 1998〉, . . .}
• Write: {〈123, 181140pods〉, 〈142, 193214algo〉, . . .}
Relational query languages
Query languages are formal languages with syntax and semantics:
• Syntax: algebraic or logical formalism or specific query language (like SQL). Uses the vocabulary of the DB schema
• Semantics: M[Q] is a mapping that transforms a database (instance) D into a database (instance) D′ = M[Q](D) (i.e. the database M[Q](D) is the answer of Q over the DB D)
We always disregard queries that are dependent on the particular representation of domain values. We thus focus on generic queries
Definition
Generic queries are queries that produce isomorphic results on isomorphic databases.
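A small Python sketch may help to make genericity concrete. It checks, on one made-up instance, whether a query commutes with a bijective renaming h of the domain values. The relation E, the renaming h, and the function names are assumptions introduced for illustration only.

```python
# Binary relation as a set of pairs; h is a bijection renaming the domain values.
E = {(1, 2), (2, 3)}
h = {1: 30, 2: 20, 3: 10}

def rename(rel, h):
    """Apply the renaming h to every value of every tuple."""
    return {tuple(h[v] for v in t) for t in rel}

def sources(rel):
    """Generic query: all x such that some pair (x, y) is in the relation."""
    return {(x,) for (x, y) in rel}

def increasing(rel):
    """Non-generic query: relies on the order of the concrete values."""
    return {(x, y) for (x, y) in rel if x < y}

# A generic query commutes with the renaming ...
print(sources(rename(E, h)) == rename(sources(E), h))        # True
# ... whereas the order-dependent query does not.
print(increasing(rename(E, h)) == rename(increasing(E), h))  # False
```

A single passing check does not prove that a query is generic, but a single failing check, as for increasing, shows that it is not.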
1.2. Query Languages: Relational Algebra
Relational Algebra (RA)
σ → Selection∗
π → Projection∗
× → Cross product∗
⋈ → Join
ρ → Rename∗
− → Difference∗
∪ → Union∗
∩ → Intersection
∗ Primitive operations, all others can be obtained from these.
For precise definition of RA see any DB textbook or Wikipedia.
Example
Recall the schema:
• Author (AID integer, name string, age integer)
• Paper (PID string, title string, year integer)
• Write (AID integer, PID string)
Example query: PIDs of the papers NOT written by Knuth
π_PID(Paper) − π_PID(Write ⋈ σ_{name="Knuth"}(Author))
Example query: AIDs of authors who wrote exactly one paper
S2 = Write ⋈_{AID=AID′ ∧ PID≠PID′} ρ_{AID′←AID, PID′←PID}(Write)
S = π_AID(Write) − π_AID(S2)
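The two algebra expressions can be mimicked step by step with Python sets. This is a sketch under assumptions: relations are sets of positional tuples, and the instance is made up beyond the values shown on the example slide (the second paper's title and year, and the extra Write tuple for author 123, are invented).

```python
# Hypothetical instance; values not on the slides are made up.
author = {(142, "Knuth", 73), (123, "Ullman", 67)}
paper = {("181140pods", "Query containment", 1998),
         ("193214algo", "Algorithms", 1997)}
write = {(123, "181140pods"), (142, "193214algo"), (123, "193214algo")}

# Query 1: PIDs of the papers NOT written by Knuth.
knuth_ids = {aid for (aid, name, _) in author if name == "Knuth"}  # selection on Author
by_knuth = {pid for (aid, pid) in write if aid in knuth_ids}       # join with Write, project PID
not_by_knuth = {pid for (pid, _, _) in paper} - by_knuth           # difference
print(not_by_knuth)  # {'181140pods'}

# Query 2: AIDs of authors who wrote exactly one paper.
# S2 pairs each Write tuple with another tuple of the same author and a different paper.
s2 = {(a, p, a2, p2) for (a, p) in write for (a2, p2) in write
      if a == a2 and p != p2}
exactly_one = {a for (a, _) in write} - {a for (a, _, _, _) in s2}
print(exactly_one)  # {142}
```

Note how "exactly one" is obtained as "at least one" minus "at least two", since relational algebra has no counting.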
1.3. Query Languages: Relational (Domain) Calculus
Recall First-order Logic (FO)
Formulas built using:
Quantifiers: ∀, ∃
Boolean connectives: ∧, ∨, ¬
Parentheses: (, )
Atoms: R(t1, . . . , tn), t1 = t2
Example database (i.e. a first-order structure):
Schema: E(FROM string, TO string)
Instance: {〈v, u〉, 〈u, w〉, 〈w, v〉}
Example sentences of FO:
∀x∃y E(x, y)
∀x∃y∃z (E(z, x) ∧ E(x, y))
∃x∀y∃z (E(z, x) ∧ E(x, y))
∀x∃y∃z (¬(y = z) ∧ E(x, y) ∧ E(x, z))
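To see what the four example sentences say about the example instance, they can be evaluated by brute force. The sketch below assumes that the quantifiers range over the values occurring in E (the active domain) and maps ∀ to all() and ∃ to any(); the variable names s1 to s4 are introduced here for illustration.

```python
E = {("v", "u"), ("u", "w"), ("w", "v")}
dom = {x for edge in E for x in edge}  # assumption: quantifiers range over the active domain

# forall x exists y: E(x, y)
s1 = all(any((x, y) in E for y in dom) for x in dom)
# forall x exists y exists z: E(z, x) and E(x, y)
s2 = all(any((z, x) in E and (x, y) in E for y in dom for z in dom) for x in dom)
# exists x forall y exists z: E(z, x) and E(x, y)
s3 = any(all(any((z, x) in E and (x, y) in E for z in dom) for y in dom) for x in dom)
# forall x exists y exists z: y != z and E(x, y) and E(x, z)
s4 = all(any(y != z and (x, y) in E and (x, z) in E for y in dom for z in dom)
         for x in dom)

print(s1, s2, s3, s4)  # True True False False
```

The instance is a directed 3-cycle: every node has exactly one outgoing and one incoming edge, which is why the first two sentences hold while the last two fail.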
Free variables of a formula
FO formulas may have free variables (i.e., variables not bound by a quantifier).
free : Formulae → 2^Variables
free(R(t1, . . . , tn)) := {ti | ti is a variable, 1 ≤ i ≤ n}
free(t1 = t2) := {ti | ti is a variable, 1 ≤ i ≤ 2}
free(ϕ ∧ ψ) := free(ϕ) ∪ free(ψ)
free(ϕ ∨ ψ) := free(ϕ) ∪ free(ψ)
free(¬ϕ) := free(ϕ)
free(∃x ϕ) := free(ϕ) − {x}
free(∀x ϕ) := free(ϕ) − {x}
Example
free(∃z R(x, y, z)) = {x, y}
free(∃x1∃x2 (R(x1, x2) ∧ S(x2, x3))) = {x3}
Note: if free(ϕ) = ∅, then ϕ is a sentence.
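The inductive definition of free translates directly into a recursive function. The following is a minimal sketch, assuming a hypothetical encoding of formulas as nested Python tuples and an explicit set VARS of variable names; neither is part of the lecture. In the second example, both quantifiers are read as ranging over the whole conjunction.

```python
VARS = {"x", "y", "z", "x1", "x2", "x3"}  # assumption: the names treated as variables

def free(phi):
    """Free variables of a formula given as a nested tuple."""
    op = phi[0]
    if op == "atom":                  # ("atom", "R", t1, ..., tn)
        return {t for t in phi[2:] if t in VARS}
    if op == "eq":                    # ("eq", t1, t2)
        return {t for t in phi[1:] if t in VARS}
    if op in ("and", "or"):           # ("and", phi, psi)
        return free(phi[1]) | free(phi[2])
    if op == "not":                   # ("not", phi)
        return free(phi[1])
    if op in ("exists", "forall"):    # ("exists", x, phi)
        return free(phi[2]) - {phi[1]}
    raise ValueError(f"unknown connective: {op}")

f1 = ("exists", "z", ("atom", "R", "x", "y", "z"))
f2 = ("exists", "x1", ("exists", "x2",
      ("and", ("atom", "R", "x1", "x2"), ("atom", "S", "x2", "x3"))))

print(sorted(free(f1)))  # ['x', 'y']
print(sorted(free(f2)))  # ['x3']
```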
Relational (Domain) Calculus
If ϕ is an FO formula with free(ϕ) = {x1, . . . , xn}, then
{〈x1, . . . , xn〉 | ϕ}
is an n-ary query of the domain calculus. On a database 𝒜 with domain A, it returns the set of all tuples 〈a1, . . . , an〉 ∈ A^n such that the sentence ϕ[a1, . . . , an], obtained from ϕ by replacing each xi by ai, evaluates to true in the structure 𝒜.
• All free variables of ϕ must occur in the output tuple 〈x1, . . . , xn〉.
• Slight syntactic generalization: variables may be repeated in the output tuple of the query.
  Example: We may write {〈x1, . . . , xn, x1〉 | ϕ} as a shortcut for {〈x1, . . . , xn, x1′〉 | ϕ ∧ x1 = x1′}.
• We often simply write ϕ rather than {〈x1, . . . , xn〉 | ϕ} (i.e., the free variables of a formula are considered as the output).
• In particular, we usually write ϕ rather than {〈〉 | ϕ} for Boolean queries (n = 0).
Example
Recall the schema:
• Author (AID integer, name string, age integer)
• Paper (PID string, title string, year integer)
• Write (AID integer, PID string)
Example query: “PIDs of the papers NOT written by Knuth”
{PID | ∃T∃Y (Paper(PID, T, Y) ∧ ¬(∃A∃AID (Write(AID, PID) ∧ Author(AID, "Knuth", A))))}
Example query: “AIDs of authors who wrote exactly one paper”
{AID | ∃PID (Write(AID, PID) ∧ ¬∃PID2 (Write(AID, PID2) ∧ PID ≠ PID2))}
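The semantics of a domain calculus query can be sketched by enumerating candidate values and testing the formula for each of them. The example below does this for the "exactly one paper" query, under the assumptions that quantified variables range over the values occurring in Write and that the small Write instance shown is made up.

```python
# Hypothetical Write instance: author 123 wrote two papers, author 142 one.
write = {(123, "181140pods"), (142, "193214algo"), (123, "193214algo")}
aids = {a for (a, _) in write}
pids = {p for (_, p) in write}  # assumption: PID variables range over these values

# {AID | exists PID (Write(AID, PID) and
#        not exists PID2 (Write(AID, PID2) and PID != PID2))}
answer = {
    aid for aid in aids
    if any((aid, pid) in write
           and not any((aid, pid2) in write and pid != pid2 for pid2 in pids)
           for pid in pids)
}
print(answer)  # {142}
```

The nested any() calls follow the nesting of the quantifiers one to one, which is what makes the calculus convenient for specifying a query, even though this evaluation strategy is far from efficient.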
Quantifier rank of a formula
We will need this for the future:
qr : Formulae → ℕ
qr(R(t1, . . . , tn)) := 0
qr(t1 = t2) := 0
qr(ϕ ∧ ψ) := max(qr(ϕ), qr(ψ))
qr(ϕ ∨ ψ) := max(qr(ϕ), qr(ψ))
qr(¬ϕ) := qr(ϕ)
qr(∃x ϕ) := qr(ϕ) + 1
qr(∀x ϕ) := qr(ϕ) + 1
Example
qr(∃x1∃x2 (x1 = x2 ∧ ¬∃x3 R(x1, x2, x3))) = 3.
qr(∃x1 ((∃x2 x1 = x2) ∧ ¬(∃x3 S(x1, x3)))) = 2.
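The quantifier rank is the maximal nesting depth of quantifiers, not their total number. A minimal recursive sketch, assuming a hypothetical encoding of formulas as nested Python tuples, reproduces the two example values:

```python
def qr(phi):
    """Quantifier rank of a formula given as a nested tuple (hypothetical encoding)."""
    op = phi[0]
    if op in ("atom", "eq"):          # R(t1, ..., tn) and t1 = t2
        return 0
    if op in ("and", "or"):           # ("and", phi, psi)
        return max(qr(phi[1]), qr(phi[2]))
    if op == "not":                   # ("not", phi)
        return qr(phi[1])
    if op in ("exists", "forall"):    # ("exists", x, phi)
        return qr(phi[2]) + 1
    raise ValueError(f"unknown connective: {op}")

# exists x1 exists x2 (x1 = x2 and not exists x3 R(x1, x2, x3))
g1 = ("exists", "x1", ("exists", "x2",
      ("and", ("eq", "x1", "x2"),
              ("not", ("exists", "x3", ("atom", "R", "x1", "x2", "x3"))))))
# exists x1 ((exists x2 x1 = x2) and not (exists x3 S(x1, x3)))
g2 = ("exists", "x1",
      ("and", ("exists", "x2", ("eq", "x1", "x2")),
              ("not", ("exists", "x3", ("atom", "S", "x1", "x3")))))

print(qr(g1), qr(g2))  # 3 2
```

Both formulas contain three quantifiers, but in the second one the quantifiers for x2 and x3 sit side by side rather than nested, so the rank is only 2.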
1.4. Query Languages: SQL
SQL (Structured Query Language)
A standardized language:
• most database management systems (DBMSs) implement SQL
SQL is not only a query language:
• supports constructs to manage the database (create/delete tables/rows)
Query constructs of SQL (SELECT/FROM/WHERE/JOIN) are based on relational algebra
Example query: “AIDs of the co-authors of Knuth”
SELECT W1.AID
FROM Write W1, Write W2, Author A
WHERE W1.PID = W2.PID AND W2.AID = A.AID AND A.name = 'Knuth'
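A co-author query of this kind can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the in-memory tables and their rows are made up, Knuth is identified by joining with Author on the name (AID being an integer), and DISTINCT and ORDER BY are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Author (AID INTEGER, name TEXT, age INTEGER);
    CREATE TABLE Write (AID INTEGER, PID TEXT);
    INSERT INTO Author VALUES (142, 'Knuth', 73), (123, 'Ullman', 67);
    INSERT INTO Write VALUES (142, '193214algo'), (123, '193214algo'),
                             (123, '181140pods');
""")

rows = conn.execute("""
    SELECT DISTINCT W1.AID
    FROM Write W1, Write W2, Author A
    WHERE W1.PID = W2.PID AND W2.AID = A.AID AND A.name = 'Knuth'
    ORDER BY W1.AID
""").fetchall()
print(rows)  # [(123,), (142,)]
conn.close()
```

The result contains Knuth's own AID (142), because W1 and W2 may refer to the same Write tuple; adding the condition W1.AID <> W2.AID would exclude him.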
Relational Algebra vs. Relational Calculus vs. SQL
Theorem (following Codd 1972)
Relational algebra, relational calculus, and SQL queries essentially have equal expressive power.
queries in the 3 languages can be translated from one language to another while preserving the query answer
all 3 languages have their advantages:
1. use the flexible syntax of relational calculus to specify the query
2. use the simplicity of relational algebra for query simplification/optimization
3. use SQL to implement the query over a DB
Restrictions apply: no aggregation in SQL queries, “safety” requirements for relational calculus.
1.5. Query Languages: other Languages
Towards other query languages
Paul Erdos (1913-1996), one of the most prolific writers of mathematical papers, wrote around 1500 mathematical articles in his lifetime, mostly co-authored. He had 509 direct collaborators.
Erdos number
The Erdos number is a way of describing the “collaborative distance”, in regard to mathematical papers, between an author and Erdos.
An author’s Erdos number is defined inductively as follows:
• Paul Erdos has an Erdos number of zero.
• The Erdos number of author M is one plus the minimum among the Erdos numbers of all the authors with whom M co-authored a mathematical paper.
Rothschild B.L. co-authored a paper with Erdos → Rothschild B.L.’s Erdos number is 1.
• Kolaitis P.G. co-authored a paper with Rothschild B.L. → Kolaitis P.G.’s Erdos number is 2.
• Gottlob G. co-authored a paper with Kolaitis P.G. → Gottlob G.’s Erdos number is 3.
Rowling J.K.’s Erdos number is ∞
Queries about the Erdos number
Recall the schema:
• Author (AID integer, name string, age integer)
• Paper (PID string, title string, year integer)
• Write (AID integer, PID string)
Assume that Erdos’s AID is 001
Query “AIDs of the authors whose Erdos number ≤ 1”
P1 = π_PID(σ_{AID=001}(Write))
A1 = π_AID(P1 ⋈ Write)
Query “AIDs of the authors whose Erdos number ≤ 2”
P2 = π_PID(A1 ⋈ Write)
A2 = π_AID(P2 ⋈ Write)
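The pattern behind P1, A1, P2, A2 is one pair of joins per unit of distance. A Python sketch with sets makes this visible; the function name within, the integer AIDs, and the small Write instance are assumptions made for illustration.

```python
# Hypothetical instance: author 1 plays the role of Erdos.
write = {(1, "p1"), (2, "p1"), (2, "p2"), (3, "p2"), (4, "p3")}

def within(write, start, k):
    """AIDs computed by k rounds of the two joins P_i and A_i, starting from `start`."""
    authors = {start}
    for _ in range(k):
        papers = {pid for (aid, pid) in write if aid in authors}   # P_i: papers of A_(i-1)
        authors = {aid for (aid, pid) in write if pid in papers}   # A_i: authors of P_i
    return authors

print(sorted(within(write, 1, 1)))  # [1, 2]
print(sorted(within(write, 1, 2)))  # [1, 2, 3]
```

For every fixed k the loop can be unrolled into an algebra expression with 2k joins, so each bound k needs its own, longer expression.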
Queries about the Erdos number (continued)
What about Q1 = “AIDs of the authors whose Erdos number < ∞”?
What about Q2=“AIDs of the authors whose Erdos number =∞”?
Can we express Q1 and Q2 in relational calculus (or equivalently in RA)?
• We cannot!
• Formal methods to prove this negative result will be presented in the course
Are there query languages that allow us to express Q1 and Q2?
• Yes, we can do this in DATALOG (the topic of the next lecture)
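What is missing in relational algebra is the ability to repeat a step until nothing new is found. The sketch below shows this idea as a naive fixpoint iteration in plain Python; it is not Datalog, and the function name and the instance are assumptions. It computes the authors with a finite Erdos number and, by difference, those with Erdos number ∞.

```python
# Hypothetical instance: author 1 plays the role of Erdos, author 4 is unconnected.
write = {(1, "p1"), (2, "p1"), (2, "p2"), (3, "p2"), (4, "p3")}

def finite_erdos(write, start):
    """Least fixpoint: all AIDs linked to `start` by some chain of co-authorships."""
    reached = {start}
    while True:
        papers = {pid for (aid, pid) in write if aid in reached}
        new = reached | {aid for (aid, pid) in write if pid in papers}
        if new == reached:   # nothing new was derived: fixpoint reached
            return reached
        reached = new

finite = finite_erdos(write, 1)
infinite = {aid for (aid, _) in write} - finite
print(sorted(finite), sorted(infinite))  # [1, 2, 3] [4]
```

The number of loop iterations depends on the database instance rather than on the query text, which gives some intuition for why no single fixed algebra expression can do the same job.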
Some fundamental aspects of query languages
Questions dealt with in this lecture
Expressive power of a query language
Comparison of query languages
Complexity of query evaluation
Undecidability of important properties of queries (e.g., redundancy,safety)
Important special cases (conjunctive queries)
Inexpressibility results
Learning objectives
Short recapitulation of
• the notion of a relational database,
• the notion of a query language and its semantics,
• relational algebra,
• first-order logic (free variables, quantifier rank),
• relational calculus,
• SQL.
Some fundamental aspects of query languages