database theory - vu 181.140, ss 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the...

22
Database Theory Database Theory VU 181.140, SS 2011 1. Introduction: Relational Query Languages Reinhard Pichler Institut f¨ ur Informationssysteme Arbeitsbereich DBAI Technische Universit¨ at Wien 8 March, 2011 Pichler 8 March, 2011 Page 1

Upload: others

Post on 07-Nov-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory

Database TheoryVU 181.140, SS 2011

1. Introduction: Relational Query Languages

Reinhard Pichler

Institut fur InformationssystemeArbeitsbereich DBAI

Technische Universitat Wien

8 March, 2011

Pichler 8 March, 2011 Page 1

Page 2: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory

Outline

1. Overview1.1 Databases and Query Languages1.2 Query Languages: Relational Algebra1.3 Query Languages: Relational (Domain) Calculus1.4 Query Languages: SQL1.5 Query Languages: other Languages1.6 Some Fundamental Aspects of Query Languages

Pichler 8 March, 2011 Page 2

Page 3: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.1. Databases and Query Languages

A short history of databases

1970’s: relational revolution• Relational model of databases (E. F. Codd), truly realizing Physical

data independence• Relational query languages (SQL)

SEQUEL: SystemR from IBMQUEL: Ingress from UC Berkeley

1980’s• Relational query optimization• Constraints, dependency theory• Datalog (extend the query language with recursion)

1990’s• New models: temporal databases, OO, OR databases• Data mining, data warehousing

Late 1990’s until now: Internet revolution• Data integration on the web, managing huge data volumes• XML, Sensor networks, P2P

Pichler 8 March, 2011 Page 3

Page 4: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.1. Databases and Query Languages

Database theory

Cut-crossing many areas in Computer Science and Mathematics

• Complexity → efficiency of query evaluation, optimization

• Logics, Finite model theory → expressiveness

• Logic programming, constraint satisfaction (AI) → Datalog

• Graph theory → (hyper)tree-decompositions

• Automata → XML query model, data stream processing

Benefit from other fields on the one hand, contribute new results onthe other hand

Pichler 8 March, 2011 Page 4

Page 5: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.1. Databases and Query Languages

Relational data model

A database (also called structure) is a collection of relations (ortables)

Each database has a schema, i.e., the vocabulary (or signature)• Each relation r has a list of attributes (or columns) → denoted

schema(r)

Each attribute A has a domain (or universe) denoted dom(A)

• We definedom(r) =

[A∈schema(r)

dom(A)

Each relation contains a set of tuples (or rows)• Formally, a tuple in r is a mapping t : schema(r) → dom(r) such

that t(A) ∈ dom(A) for all A ∈ schema(r)

Note: For ease of notation, we often use ordered lists of attributesinstead of sets.

Pichler 8 March, 2011 Page 5

Page 6: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.1. Databases and Query Languages

Example

Schema• Author (AID integer, name string, age integer)• Paper (PID string, title string, year integer)• Write (AID integer, PID integer)

Instance• {〈142, Knuth, 73〉, 〈123, Ullman, 67〉, . . .}• {〈181140pods, Querycontainment, 1998〉, . . .}• {〈123, 181140pods〉, 〈142, 193214algo〉, . . .}

Pichler 8 March, 2011 Page 6

Page 7: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.1. Databases and Query Languages

Relational query languages

Query languages are formal languages with syntax and semantics:

• Syntax: algebraic or logical formalism or specific query language (likeSQL). Uses the vocabulary of the DB schema

• Semantics: M[Q] a mapping that transforms a database (instance)D into a database (instance) D ′ = M[Q](D) (i.e. the databaseM[Q](D) is the answer of Q over the DB D)

We always disregard queries that are dependent on the particularrepresentation of domain values. We thus focus on generic queries

Definition

Generic queries are queries that produce isomorphic results on isomorphicdatabases.

Pichler 8 March, 2011 Page 7

Page 8: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.2. Query Languages: Relational Algebra

Relational Algebra (RA)

σ → Selection∗π → Projection∗× → Cross product∗./ → Join

ρ → Rename∗− → Difference∗∪ → Union∗∩ → Intersection ∗Primitive operations, all others

can be obtained from these.

For precise definition of RA see any DB textbook or Wikipedia.

Pichler 8 March, 2011 Page 8

Page 9: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.2. Query Languages: Relational Algebra

Example

Recall the schema:• Author (AID integer, name string, age integer)• Paper (PID string, title string, year integer)• Write (AID integer, PID integer)

Example query: PIDs of the papers NOT written by Knuth

πPID(Paper)− πPID(Write ./ σname=”Knuth”(Author))

Example query: AIDs of authors who wrote exactly one paper

S2 = Write ./AID=AID′∧PID 6=PID′ ρAID′←AID,PID′←PID(Write)

S = πAIDWrite − πAIDS2

Pichler 8 March, 2011 Page 9

Page 10: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus

Recall First-order Logic (FO)

Formulas built using:

Quantifiers: ∀, ∃,

Boolean connectives: ∧, ∨, ¬Parentheses: (, )

Atoms: R(t1, . . . , tn), t1 = t2

Example database (i.e. a first-order structure):

Schema: E(FROM string, TO string)

Instance: {〈v , u〉, 〈u,w〉, 〈w , v〉}Example sentences of FO:

∀x∃yE (x , y)

∀x∃y∃z(E (z , x) ∧ E (x , y))

∃x∀y∃z(E (z , x) ∧ E (x , y))

∀x∃y∃z(¬(y = z) ∧ E (x , y) ∧ E (x , z))

Pichler 8 March, 2011 Page 10

Page 11: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus

Free variables of a formulaFO formulas may have free variables (i.e., not bound by a quantifier).

free : Formulae→ 2Variables

free(R(t1, . . . , tn)) := {ti | ti is a variable, 1 ≤ i ≤ n}free(t1 = t2) := {ti | ti is a variable, 1 ≤ i ≤ 2}free(ϕ ∧ ψ) := free(ϕ) ∪ free(ψ)

free(ϕ ∨ ψ) := free(ϕ) ∪ free(ψ)

free(¬ϕ) := free(ϕ)

free(∃x ϕ) := free(ϕ)− {x}free(∀x ϕ) := free(ϕ)− {x}

Example

free(∃zR(x , y , z)) = {x , y}free

(∃x1∃x2 R(x1, x2) ∧ S(x2, x3)

)= {x3}

Note: if free(ϕ) = ∅, then ϕ is a sentence.

Pichler 8 March, 2011 Page 11

Page 12: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus

Relational (Domain) CalculusIf ϕ is an FO formula with free(ϕ) = {x1, . . . , xn}, then

{〈x1, . . . , xn〉 | ϕ}

is an n-ary query of the domain calculus. On database A with domain A,it returns the set of all tuples 〈a1, . . . , an〉 ∈ (A)n such that the sentenceϕ[a1, . . . , an] obtained from ϕ by replacing each xi by ai evaluates to truein the structure A.

All free variables of ϕ must occur in the output tuple 〈x1, . . . , xn〉.Slight syntactic generalization: Variables may be repeated in theoutput tuple of the query.Example: We may write {〈x1, . . . , xn, x1〉 | ϕ} as a shortcut for{〈x1, . . . , xn, x

′1〉 | ϕ ∧ x1 = x ′1}.

We often simply write ϕ rather than {〈x1, . . . , xn〉 | ϕ}(i.e., the free variables of a formula are considered as the output).

In particular, we usually write ϕ rather than {〈〉 | ϕ} for Booleanqueries (n = 0).

Pichler 8 March, 2011 Page 12

Page 13: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus

Example

Recall the schema:• Author (AID integer, name string, age integer)• Paper (PID string, title string, year integer)• Write (AID integer, PID integer)

Example query: “PIDs of the papers NOT written by Knuth”

{PID | ∃T∃Y (Paper(PID,T ,Y ) ∧

∧¬(∃A∃AID(Write(AID,PID)∧Author(AID, ”Knuth”,A))))}

Example query: “AIDs of authors who wrote exactly one paper”

{AID | ∃PID(Write(AID,PID)∧¬∃PID2(Write(AID,PID2)∧PID 6= PID2))}

Pichler 8 March, 2011 Page 13

Page 14: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.3. Query Languages: Relational (Domain) Calculus

Quantifier rank of a formulaWe will need this for the future:

qr : Formulae→ Nqr(R(t1, . . . , tn)) := 0

qr(t1 = t2)) := 0

qr(ϕ ∧ ψ) := max(qr(ϕ), qr(ψ))

qr(ϕ ∨ ψ) := max(qr(ϕ), qr(ψ))

qr(¬ϕ) := qr(ϕ)

qr(∃x ϕ) := qr(ϕ) + 1

qr(∀x ϕ) := qr(ϕ) + 1

Example

qr(∃x1∃x2 (x1 = x2 ∧ ¬∃x3 R(x1, x2, x3))

)= 3.

qr(∃x1 (∃x2 x1 = x2) ∧ ¬(∃x3 S(x1, x3))

)= 2.

Pichler 8 March, 2011 Page 14

Page 15: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.4. Query Languages: SQL

SQL (Structured Query Language)

A standardized language:• most database management systems (DBMSs) implement SQL

SQL is not only a query language:• supports constructs to manage the database (create/delete

tables/rows)

Query constructs of SQL (SELECT/FROM/WHERE/JOIN) arebased on relational algebra

Example query: “AIDs of the co-authors of Knuth”

SELECT W1.AIDFROM Write W1, Write W2WHERE W1.PID=W2.PID AND W2.AID=”Knuth”

Pichler 8 March, 2011 Page 15

Page 16: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.4. Query Languages: SQL

Relational Algebra vs. Relational Calculus vs. SQL

Theorem (following Codd 1972)

Relational algebra, relational calculus, and SQL queries essentially haveequal expressive power.

queries in the 3 languages can be translated from one language toanother while preserving the query answer

all 3 languages have their advantages:

1 use the flexible syntax of relational calculus to specify the query2 use the simplicity of relational algebra for query

simplification/optimization3 use SQL to implement the query over a DB

Restrictions apply: no aggregation in SQL queries,“safety” requirements for relational calculus.

Pichler 8 March, 2011 Page 16

Page 17: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.5. Query Languages: other Languages

Towards other query languages languages

Paul Erdos (1913-1996), oneof the most prolific writers ofmathematical papers, wrotearound 1500 mathematical ar-ticles in his lifetime, mostlyco-authored. He had 509 di-rect collaborators

Pichler 8 March, 2011 Page 17

Page 18: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.5. Query Languages: other Languages

Erdos number

The Erdos number, is a way of describing the “collaborativedistance”, in regard to mathematical papers, between an author andErdos.

An author’s Erdos number is defined inductively as follows:

• Paul Erdos has an Erdos number of zero.• The Erdos number of author M is one plus the minimum among the

Erdos numbers of all the authors with whom M co-authored amathematical paper.

Rothschild B.L. co-authored a paper with Erdos → Rothschild B.L.’sErdos number is 1.

• Kolaitis P.G. co-authored a paper with Rothschild B.L. → KolaitisP.G.’s Erdos number is 2.

• Gottlob G.,co-authored a paper with Kolaitis P.G. → Gottlob G.’sErdos number is 3.

Rowling J.K.’s Erdos number is ∞

Pichler 8 March, 2011 Page 18

Page 19: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.5. Query Languages: other Languages

Queries about the Erdos number

Recall the schema:• Author (AID integer, name string, age integer)• Paper (PID string, title string, year integer)• Write (AID integer, PID integer)

Assume that Erdos’s AID is 001

Query “AIDs of the authors whose Erdos number ≤ 1”

P1 = πPID(σAID=001Write)

A1 = πAID(P1 ./ Write)

Query “AIDs of the authors whose Erdos number ≤ 2”

P2 = πPID(A1 ./ Write)

A2 = πAID(P2 ./ Write)

Pichler 8 March, 2011 Page 19

Page 20: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.5. Query Languages: other Languages

Queries about the Erdos number (continued)

What about Q1=“AIDs of the authors whose Erdos number ≤ ∞”?

What about Q2=“AIDs of the authors whose Erdos number =∞”?

Can we express Q1 and Q2 in relational calculus (or equivalently inRA)?

• We cannot!• Formal methods to prove this negative result will be presented in the

course

Are there query languages that allow to express Q1 and Q2?

• Yes, we can do this in DATALOG (the topic of the next lecture)

Pichler 8 March, 2011 Page 20

Page 21: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.6. Some Fundamental Aspects of Query Languages

Some fundamental aspects of query languages

Questions dealt with in this lecture

Expressive power of a query language

Comparison of query languages

Complexity of query evaluation

Undecidability of important properties of queries (e.g., redundancy,safety)

Important special cases (conjunctive queries)

Inexpressibility results

Pichler 8 March, 2011 Page 21

Page 22: Database Theory - VU 181.140, SS 2011 [1.5ex] 1 ...all 3 languages have their advantages: 1 use the exible syntax of relational calculus to specify the query 2 use the simplicity of

Database Theory 1. Overview 1.6. Some Fundamental Aspects of Query Languages

Learning objectives

Short recapitulation of• the notion of a relational database,• the notion of a query language and its semantics,• relational algebra,• first-order logic (free variables, quantifier rank),• relationcal calculus,• SQL.

Some fundamental aspects of query languages

Pichler 8 March, 2011 Page 22