the relational model - university of waterloodavid/cs348/lect-relational... · 2019-11-23 · the...
TRANSCRIPT
The Relational ModelFall 2019
School of Computer ScienceUniversity of Waterloo
Databases CS348
(University of Waterloo) The Relational Model 1 / 43
How do we ask Questions (and understand Answers)?
In the beginning ...
Set comprehension syntax for questions:{(x1, . . . , xk ) | 〈condition〉}
Answers:All k -tuples of values that satisfy 〈condition〉.
(University of Waterloo) The Relational Model 3 / 43
How do we ask Questions (and understand Answers)?Find all pairs of (natural) numbers that add to 5!
Question: {(x , y) | x + y = 5PLUS(x , y ,5)}Answer: {(0,5), (1,4), (2,3), (3,2), (4,1), (5,0)}
. . . but but but why? (explain this to a 6 year old!)because (0,5,5), etc., appear in PLUS!
Find pairs of numbers that add to the same number asthey subtract to (i.e., x + y = x − y )!
Question: {(x , y) | ∃z.PLUS(x , y , z) ∧ PLUS(z, y , x)}Answer: {(0,0), (1,0), . . . , (5,5)?}
. . . answer depends on the content (instance) of PLUS!
Find the neutral element (of addition)!
Question: {(x) | PLUS(x , x , x)}Answer: {(0)}
Addition Table
PLUS0 0 00 1 11 0 10 2 22 0 2· · ·
0 5 5· · ·
1 4 5· · ·
2 3 5· · ·· · ·
(University of Waterloo) The Relational Model 4 / 43
How do we ask Questions about Employees?
Find all employees who work for “Bob” !
Question: {(x , y) | EMP(x , y ,Bob)}Answer: {(Sue,CS), (Bob,CO)}
why? because (Sue,CS,Bob), etc., appear in EMP!
Find pairs of emp-s working for the same boss!
Q: {(x1, x2) | ∃y1, y2, z.EMP(x1, y1, z) ∧ EMP(x2, y2, z)}A: {(Sue,Bob), (Fred, John), (Jim,Eve)} ← is that all?
Find employees who are their own bosses!
Q: {(x) | ∃y .EMP(x , y , x)}A: {(Sue), (Bob)}
Employee Table
EMPname dept boss
Sue CS BobBob CO BobFred PM MarkJohn PM MarkJim CS FredEve CS FredSue PM Sue
(University of Waterloo) The Relational Model 5 / 43
The Relational Model
Idea
All information is organized in (a finite number of) relations.
Features:simple and clean data modelpowerful and declarative query/update languagessemantic integrity constraintsdata independence
(University of Waterloo) The Relational Model 7 / 43
Relational Databases
Components:
Universe a set of values D with equality (=)Relation predicate name R, and arity k of R (the number of
columns)instance: a relation R ⊆ Dk .
Database signature: finite set ρ of predicate namesinstance: a relation Ri for each Ri
Notation
Signature: ρ = (R1, . . . ,Rn) Instance: DB = (D,=,R1, . . . ,Rn)
(University of Waterloo) The Relational Model 8 / 43
Examples of Relational Databases
The integers, with addition and multiplication:ρ = (PLUS,TIMES) DB = (Z,=,PLUS,TIMES)
A Bibliography Database (see following slides)
. . .
(University of Waterloo) The Relational Model 9 / 43
A Bibliography Relational Database Signature
Predicates (also called table headers):
AUTHOR(aid, name)WROTE(author, publication)PUBLICATION(pubid, title)BOOK(pubid, publisher, year)JOURNAL(pubid, volume, no, year)PROCEEDINGS(pubid, year)ARTICLE(pubid, crossref, startpage, endpage)
⇒ identifiers, called attributes, label columns (needed for SQL)
(University of Waterloo) The Relational Model 10 / 43
A Bibliography Relational Database Instance
Relations (also called tables):
AUTHOR = { (1, John), (2,Sue) }
WROTE = { (1,1), (1,4), (2,3) }
PUBLICATION = { (1,Mathematical Logic),(3,Trans. Databases),(2,Principles of DB Syst.),(4,Query Languages) }
BOOK = { (1,AMS,1990) }
JOURNAL = { (3,35,1,1990) }
PROCEEDINGS = { (2,1995) }
ARTICLE = { (4,2,30,41) }
(University of Waterloo) The Relational Model 11 / 43
A Common Visualization for Relational Databases
AUTHORaid name
1 John2 Sue
WROTEauthor publication
1 11 42 3
PUBLICATIONpubid title
1 Mathematical Logic3 Trans. Databases2 Principles of DB Syst.4 Query Languages
(University of Waterloo) The Relational Model 12 / 43
A Common Visualization for Relational DatabaseSchemata†
†Relational database signatures plus integrity constraints.
(University of Waterloo) The Relational Model 13 / 43
Simple (Atomic) “Truth”
Idea
Relationships between objects (tuples) that are present in an instanceare true, relationships absent are false.
In the sample Bibliography database instance“John” is an author with id “1”: (1, John) ∈ AUTHOR;“Mathematical Logic” is a publication:
(1,Mathematical Logic) ∈ PUBLICATION;Moreover, it is a book published by “AMS” in “1990”:
(1,AMS,1990) ∈ BOOK;“John” wrote “Mathematical Logic”: (1,1) ∈ WROTE;“John” has NOT written “Trans. Databases”: (1,3) 6∈ WROTE;etc.
(University of Waterloo) The Relational Model 14 / 43
Query Conditions
Idea1: use variables to generalize conditions
AUTHOR(x , y) will be true of any valuation {x 7→ a, y 7→ b, . . .}exactly when the pair (a,b) ∈ AUTHOR
Idea2: build more complex conditions from simpler ones using . . .
Logical connectives:Conjunction (and): AUTHOR(x , y) ∧ WROTE(x , z)Disjunction (or): AUTHOR(x , y) ∨ PUBLICATION(x , y)Negation (not): ¬AUTHOR(x , y)
Quantifiers:Existential (there is. . . ): ∃x .author(x , y)
(University of Waterloo) The Relational Model 15 / 43
Conditions in the Relational Calculus
Idea
Conditions can be formulated using the language of first-order logic.
Definition (Syntax of Conditions)
Given a database schema ρ = (R1, . . . ,Rk ) and a set of variablenames {x1, x2, . . .}, conditions are formulas defined by
ϕ ::= Ri(xi1 , . . . , xik ) | xi = xj | ϕ ∧ ϕ | ∃xi .ϕ︸ ︷︷ ︸conjunctive formulas
| ϕ ∨ ϕ
︸ ︷︷ ︸positive formulas
| ¬ϕ
︸ ︷︷ ︸first-order formulas
(University of Waterloo) The Relational Model 16 / 43
First-order Variables and ValuationsHow do we interpret variables?
Definition (Valuation)
A valuation is a function θ that maps variable names to values in theuniverse:
θ : {x1, x2, . . .} → D.To denote a modification to θ in which variable x is instead mapped tovalue v , one writes:
θ[x 7→ v ].
Idea
Answers to queries⇔ valuations to free variables that make theformula true with respect to a database.
(University of Waterloo) The Relational Model 17 / 43
Complete Semantics for Conditions
Definition
The truth of a formula ϕ is defined with respect to1 a database instance DB = (D,=,R1,R2, . . .) , and2 a valuation θ : {x1, x2, . . .} → D
as follows:
DB, θ |= R(xi1 , . . . , xik ) if R ∈ ρ, (θ(xi1), . . . , θ(xik )) ∈ R
DB, θ |= xi = xj if θ(xi) = θ(xj)
DB, θ |= ϕ ∧ ψ if DB, θ |= ϕ and DB, θ |= ψ
DB, θ |= ¬ϕ if not DB, θ |= ϕ
DB, θ |= ∃xi .ϕ if DB, θ[xi 7→ v ] |= ϕ for some v ∈ D
(University of Waterloo) The Relational Model 18 / 43
Equivalences and Syntactic Sugar
Boolean Equivalences¬(¬ϕ1) ≡ ϕ1ϕ1 ∨ ϕ2 ≡ ¬(¬ϕ1 ∧ ¬ϕ2)ϕ1 → ϕ2 ≡ ¬ϕ1 ∨ ϕ2ϕ1 ↔ ϕ2 ≡ (ϕ1 → ϕ2) ∧ (ϕ2 → ϕ1). . .
First-order Equivalences∀x .ϕ ≡ ¬∃x .¬ϕ
(University of Waterloo) The Relational Model 19 / 43
Relational Calculus
Definition (Queries)
A query in the relational calculus is a set comprehension of the form{(x1, . . . , xk ) | ϕ}.
Definition (Query Answers)
An answer to a query {(x1, . . . , xk ) | ϕ} over DB is the relation{(θ(x1), . . . , θ(xk )) | DB, θ |= ϕ},
where {x1, . . . , xk} = FV (ϕ)†.
†FV denotes the free variables of ϕ.
(University of Waterloo) The Relational Model 20 / 43
On Formulas
Definition (Free Variables)
The free variables of a formula ϕ, written FV (ϕ), are defined asfollows:
FV (R(xi1 , . . . , xik )) ≡ {xi1 , . . . , xik}FV (xi = xj) ≡ {xi , xj}FV (ϕ ∧ ψ) ≡ FV (ϕ) ∪ FV (ψ)
FV (¬ϕ) ≡ FV (ϕ)
FV (∃xi .ϕ) ≡ FV (ϕ)− {xi}
A formula that has no free variables expresses is called a sentence.
(University of Waterloo) The Relational Model 21 / 43
Example
Find pairs of emp-s working for the same boss!
Q: {(x1, x2) | ∃y1, y2, z.EMP(x1, y1, z) ∧ EMP(x2, y2, z)}A: {(Sue,Fred), . . .}
because:1 EMP, {x1 7→ Sue, y1 7→ CS, z 7→ Bob, . . .} |= EMP(x1, y1, z)
2 EMP, {x2 7→ Fred, y2 7→ CO, z 7→ Bob, . . .} |= EMP(x2, y2, z)
3 EMP, {x1 7→ Sue, y1 7→ CS, x2 7→ Fred, y2 7→ CO, z 7→ Bob, . . .}|= EMP(x1, y1, z) ∧ EMP(x2, y2, z)
4 EMP, {x1 7→ Sue, x2 7→ Fred, . . .}|= ∃y1, y2, z.EMP(x1, y1, z) ∧ EMP(x2, y2, z)
Emp Table
EMPname dept boss
Sue CS BobFred CO BobBob PM MarkJohn PM MarkJim CS FredEve CS FredSue PM Sue
(University of Waterloo) The Relational Model 22 / 43
Sample Queries
Over numbers (with addition and multiplication):list all composite numberslist all prime numbers
Over the bibliography database:list all publicationslist titles of all publicationslist titles of all bookslist all publications without authorslist (pairs of) coauthor nameslist titles of publications written by a single author
(University of Waterloo) The Relational Model 23 / 43
How do we ask Questions (and understand Answers)?Find the neutral element (of addition)!
Question: {(x) | PLUS(x , x , x)}Answer: {(0)}
but shouldn’t the query really be{(x) | ∀y .PLUS(x , y , y) ∧ PLUS(y , x , y)} (∗)
Idea
(∗) is the same as {(x) | ∀y .PLUS(x , y , y)}because PLUS is commutative
is the same as {(x) | PLUS(x , x , x)}because PLUS is monotone
⇒ Laws of Arithmetic for Natural Numbers
Addition Table
PLUS0 0 00 1 11 0 10 2 02 0 2· · ·
0 5 5· · ·
1 4 5· · ·
2 3 5· · ·· · ·
(University of Waterloo) The Relational Model 25 / 43
Laws a.k.a. Integrity Constraints
Idea
What must be always true for the natural numbers (i.e., for PLUS)?
addition is commutative∀x , y , z.PLUS(x , y , z)→ PLUS(y , x , z)
(¬∃x , y , z.PLUS(x , y , z) ∧ ¬PLUS(y , x , z))
addition is a (relational representation of a) binary function∀x , y , z1, z2.PLUS(x , y , z1) ∧ PLUS(x , y , z2)→ z1 = z2
(¬∃x , y , z1, z2.PLUS(x , y , z1) ∧ PLUS(x , y , z2) ∧ ¬(z1 = z2))
addition is a total function∀x , y .∃z.PLUS(x , y , z)
addition is monotone in both arguments (harder), etc., etc.
(University of Waterloo) The Relational Model 26 / 43
Laws a.k.a. Integrity Constraints for Employees
Idea
Integrity constraints⇒ yes/no conditions that must be true in every valid database instance.
Every Boss is an Employee∀x , y , z.EMP(x , y , z)→ ∃u,w .EMP(z,u,w)
Every Boss manages a unique Department∀x1, x2, y1, y2, z.EMP(x1, y1, z) ∧ EMP(x2, y2, z)→ y1 = y2
No Boss cannot have another Employee serving as their Boss∀x , y , z.EMP(x , y , z)→ EMP(z, y , z)
(University of Waterloo) The Relational Model 27 / 43
Integrity Constraints
A relational signature captures only the structure of relations.
Idea
Valid database instances satisfy additional integrity constraints.
Values of a particular attribute belong to a prescribed data type.Values of attributes are unique among tuples in a relation (keys).Values appearing in one relation must also appear in anotherrelation (referential integrity).Values cannot appear simultaneously in certain relations(disjointness).Values in certain relation must appear in at least one of anotherset of relations (coverage).. . .
(University of Waterloo) The Relational Model 28 / 43
Example Revisited (Bibliography)
Typing Constraints / Domain ContraintsAuthor id’s are integers.Author names are strings.
Uniqueness of Values / Identification (keys)Author id’s are unique and determine author names.Publication id’s are unique as well.Articles can be identified by their publication id.Articles can also be identified by the publication id ofthe collection they have appeared in and theirstarting page number.
Referential Integrity / Foreign KeysBooks, journals, proceedings and articles arepublications.The components of a WROTE tuple must be anauthor and a publication.
(University of Waterloo) The Relational Model 29 / 43
Example Revisited (cont.)
DisjointnessBooks are different from journals.Books are also different from proceedings.
CoverageEvery publication is a book or a journal or aproceedings or an article.Every article appears in a journal or in a proceedings.
(University of Waterloo) The Relational Model 30 / 43
Example Revisited (cont.)
(University of Waterloo) The Relational Model 31 / 43
Views and Integrity Constraints
Idea
Answers to queries can be used to define derived relations (views)⇒ extension of a DB schema
subtraction, complement, . . .collection-style publication, editor, . . .
In general, a view is an integrity constraint of the form∀x1, . . . , xk .R(x1, . . . , xk )↔ ϕ
for R a new relation name and x1, . . . , xk free variables of ϕ.
(University of Waterloo) The Relational Model 32 / 43
Database Instances and Integrity Constraints
Definition (Relational Database Schema)
A relational database schema is a signature ρ and a (finite) set ofintegrity constraints Σ over ρ.
Definition
A relational database instance DB (over a schema ρ) conforms to aschema Σ (written DB |= Σ) if and only if DB, θ |= ϕ for any integrityconstraint ϕ ∈ Σ and any valuation θ.
(University of Waterloo) The Relational Model 33 / 43
Story so far. . .
1 databases⇔ relational structures2 queries⇔ set comprehensions with formulas in First-Order logic3 integrity constraints⇔ closed formulas in FO logic
. . . so is there anything new here?
⇒YES: database instances must be finite
(University of Waterloo) The Relational Model 35 / 43
Unsafe Queries{(y) | ¬∃x .author(x , y)}{(x , y , z) | book(x , y , z) ∨ proceedings(x , y)}{(x , y) | x = y}
⇒ we want only queries with finite answers (over finite databases).
Definition (Domain-independent Query)
A query {(x1, . . . , xk ) | ϕ} is domain-independent ifDB1, θ |= ϕ ⇐⇒ DB2, θ |= ϕ
for any pair of database instances DB1 = (D1,=,R1, . . . ,Rk) andDB2 = (D2,=,R1, . . . ,Rk) and all θ.
Theorem
Answers to domain-independent queries contain only values that existin R1, . . . ,Rk (the active domain).
Domain-independent + finite database⇒ “safe”(University of Waterloo) The Relational Model 36 / 43
Safety and Query SatisfiabilityTheorem
Satisfiability1 of first-order formulas is undecidable;co-r.e. in generalr.e for finite databases
Proof.
Reduction from PCP (see Abiteboul et. al. book, p.122-126).
Theorem
Domain-independence of first-order queries is undecidable.
Proof.
ϕ is satisfiable iff {(x , y) | (x = y) ∧ ϕ} is not domain-independent.1Is there a database for which the answer is non-empty?
(University of Waterloo) The Relational Model 37 / 43
Range-restricted Queries
Definition (Range restricted formulas)
A formula ϕ is range restricted when, for ϕi that are also rangerestricted, ϕ has the form
R(xi1 , . . . , xik ),ϕ1 ∧ ϕ2 ,ϕ1 ∧ (xi = xj) ({xi , xj} ∩ FV (ϕ1) 6= ∅),∃xi .ϕ1 (xi ∈ FV (ϕ1)),ϕ1 ∨ ϕ2 (FV (ϕ1) = FV (ϕ2)), orϕ1 ∧ ¬ϕ2 (FV (ϕ2) ⊆ FV (ϕ1)).
Theorem
Range-restricted⇒ Domain-independent.
(University of Waterloo) The Relational Model 38 / 43
Domain Independent v.s. Range-restricted
Do we lose expressiveness by restricting to Range-restricted queries?
Theorem
Every domain-independent query can be written equivalently as arange restricted query.
Proof.
1 restrict every variable in ϕ to active domain,2 express the active domain using a unary query over the database
instance.
(University of Waterloo) The Relational Model 39 / 43
Computational Properties
Evaluation of every query terminates⇒ relational calculus is not Turing completeData Complexity in the size of the database, for a fixed query.⇒ in PTIME⇒ in LOGSPACE⇒ AC0 (constant time on polynomially many CPUs in parallel)Combined complexity⇒ in PSPACE⇒ can express NP-hard problems (encode SAT)
(University of Waterloo) The Relational Model 40 / 43
Query Evaluation vs. Theorem Proving
Query EvaluationGiven a query {(x1, . . . , xk ) | ϕ} and a finite databaseinstance DB find all answers to the query.
Query SatisfiabilityGiven a query {(x1, . . . , xk ) | ϕ} determine whether thereis a (finite) database instance DB for which the answer isnon-empty.
much harder (undecidable) problemcan be solved for fragments of the query language
(University of Waterloo) The Relational Model 41 / 43
Query Equivalence and DB SchemaDo we ever need the power of theorem proving?
Definition (Query Subsumption)
A query {(x1, . . . , xk ) | ϕ} subsumes a query {(x1, . . . , xk ) | ψ} withrespect to a database schema Σ if{(θ(x1), . . . , θ(xk )) | DB, θ |= ψ} ⊆ {(θ(x1), . . . , θ(xk )) | DB, θ |= ϕ}
for every database DB such that DB |= Σ.
necessary for query simplificationequivalent to proving ∧
φi∈Σ
φi
→ (∀x1, . . . xk .ψ → ϕ)
undecidable in general; decidable for fragments of relationalcalculus
(University of Waterloo) The Relational Model 42 / 43
What queries cannot be expressed in RC?
Note
RC is not Turing-complete⇒ there must be computable queries that cannot be written in RC.
Built-in Operationsordering, arithmetic, string operations, etc.
Counting/Aggregationcardinality of sets (parity)
Reachability/Connectivity/. . .paths in a graph (binary relation)
Model extensions: Incompleteness/Inconsistencytuples with unknown (but existing) valuesincomplete relations and open world assumptionconflicting information (e.g., from different data sources)
(University of Waterloo) The Relational Model 43 / 43