Uncertainty in Databases Lecture 2: Essential Database Foundations

Upload: phungdiep

Post on 31-Aug-2018

Page 1: Uncertainty in Databases - Technionwebcourse.cs.technion.ac.il/236605/Spring2015/ho/WCFiles/L2... · Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2. Historical

Uncertainty in Databases

Lecture 2: Essential Database Foundations

Table of Contents

1 Historical Background

2 Database Model

3 Queries

4 Constraints

5 Complexity

6 References

Faculty of Computer Science, Technion Uncertainty in Databases: Lecture 2

1 Historical Background

Codd’s Vision

- 1970: Edgar F. Codd invents the relational database model
  - Data is stored as a collection of relations that conform to a schema and can be accessed through a query language over the schema
  - Separation between the logical and physical layers
  - Work done at IBM San Jose, which is now IBM Almaden
  - E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6): 377-387 (1970)

- 1972: Codd introduces the relational algebra and the relational calculus (a logical view of database querying) and proves their equal expressive power [Cod72]

Codd Catches On

- 1973: Michael Stonebraker and Eugene Wong implement Codd’s vision in INGRES, which was commercialised in 1983 and evolved to Postgres (now PostgreSQL) in 1989

- 1974: A group from the IBM San Jose lab implements Codd’s vision in System R, which evolved to DB2 in 1983

- SQL initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce [CB74]

- 1977: Influenced by Codd, Larry Ellison founds Software Development Laboratories, which becomes Relational Software in 1979 and then Oracle Systems Corporation in 1982, named after its flagship product, the Oracle database

Top Academic Recognition

1981: Codd receives the ACM Turing Award

Breaking News

Last week: ACM announces Michael Stonebraker as the 2014 Turing Award winner for fundamental contributions to the concepts and practices underlying modern database systems

Selected Publication Venues

- Conferences:
  - SIGMOD: ACM Special Interest Group on Management of Data (since 1975)
  - PODS: ACM Symp. on Principles of Database Systems (since 1982)
  - VLDB: Intl. Conf. on Very Large Databases (since 1975)
  - ICDE: IEEE Intl. Conf. on Data Engineering (since 1984)
  - ICDT: Intl. Conference on Database Theory (since 1986)
  - EDBT: Intl. Conference on Extending Database Technology (since 1988)

- Journals:
  - TODS: ACM Transactions on Database Systems (since 1976)
  - VLDBJ: The VLDB Journal (since 1992)
  - SIGMOD REC: ACM SIGMOD Record (since 1969)

2 Database Model

Relations

- A relation r consists of a heading, which is a sequence (A_1, ..., A_m) of distinct attributes, and a body, which is a finite collection of tuples t = (a_1, ..., a_m) of values

- We assume some infinite domain of values (or constants)

- By a convenient abuse of notation, a relation is often identified with its body
  - (e.g., t ∈ r means that t is a tuple in the body of r)

- We may refer to the i-th value in a tuple t ∈ r as t.A_i or t[i]

Database Schemas

- A relation schema has a relation name (or relation symbol) R and a heading (A_1, ..., A_m), which is again a sequence of attributes; it is denoted by R(A_1, ..., A_m)
  - Or simply R if the attributes are not important or are clear from the context

- The arity of R(A_1, ..., A_m) is m, and is denoted by ar(R)

- Sometimes the attributes are not important, and we may use just R/m to specify that R is a relation name of arity m

- A schema is a pair S = (R, Σ), where the first component R is a set of relation schemas with distinct names, and Σ is a set of constraints over R
  - This set R is often called a signature
  - We later discuss languages of constraints

Database Instance

- A relation r is said to be over a relation schema R if r and R have the same heading

- A database instance (or just instance) I over a schema S = (R, Σ) associates with every relation name R of the signature a relation R^I over R, such that all the constraints in Σ are satisfied

- We denote by Inst(S) the set of all the instances over S
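
The definitions of relation schema, schema, and instance can be mirrored in a few lines of Python. This is only an illustrative sketch: the class names, the encoding of tuples as Python tuples (so "same heading" is reduced to "same arity"), and the `known_courses` constraint are assumptions made here, not part of the lecture.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RelationSchema:
    name: str
    heading: tuple  # sequence of distinct attribute names

    @property
    def arity(self):
        return len(self.heading)

@dataclass
class Schema:
    signature: dict                                   # relation name -> RelationSchema
    constraints: list = field(default_factory=list)   # predicates over an instance

def is_instance(schema, instance):
    """instance maps each relation name to a set of tuples.
    It is an instance over the schema if every tuple has the right
    arity and every constraint of the schema holds."""
    for name, rel_schema in schema.signature.items():
        if any(len(t) != rel_schema.arity for t in instance.get(name, set())):
            return False
    return all(check(instance) for check in schema.constraints)

takes = RelationSchema("Takes", ("student", "cno"))
course = RelationSchema("Course", ("cno", "cname"))

# Hypothetical constraint: every course number in Takes appears in Course.
def known_courses(inst):
    return {t[1] for t in inst["Takes"]} <= {t[0] for t in inst["Course"]}

schema = Schema({"Takes": takes, "Course": course}, [known_courses])
instance = {
    "Takes": {("Ahuva", 1), ("Alon", 1), ("Ahuva", 2)},
    "Course": {(1, "AI"), (2, "DB"), (3, "PL")},
}
print(takes.arity, is_instance(schema, instance))
```

In this encoding, Inst(S) is simply the set of all dictionaries for which `is_instance` returns true.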

Logical Viewpoint

- It is convenient and common to view the database as a logical system (first-order or higher-order logic)
  - vocabulary = schema + built-in predicates (e.g., <, >, =), and structure = instance

- Database queries are viewed as logical formulas ϕ(x) over the database: Q(I) = {a | I ⊨ ϕ(a)}

- But there are some significant restrictions of the logic:
  - We usually have only relation (no function) symbols
  - We usually consider only finite structures (cf. finite-model theory)
  - Queries should be independent of the domain outside the database (cf. relational calculus)

On Finite Structures

- Consider the set F = {ϕ_i | i = 1, 2, ...} of sentences, where ϕ_i is the sentence “R has at least i distinct tuples”

- Each ϕ_i is expressible in first-order logic

- Every finite subset of F has a finite model, but there is no finite model for F

- Hence, the compactness theorem no longer holds when we restrict attention to finite structures
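
To make the claim that each ϕ_i is first-order expressible concrete, here is one way to write it, assuming purely for illustration that R is unary (the slide does not fix the arity; for a wider R the same pattern applies with one block of variables per tuple).

```latex
\varphi_i \;=\; \exists x_1 \cdots \exists x_i \,
  \Big( \bigwedge_{1 \le j \le i} R(x_j)
        \;\wedge\; \bigwedge_{1 \le j < k \le i} x_j \neq x_k \Big)
% A finite subset F' of F whose largest index is n is satisfied by any R with n tuples,
% whereas a model of every \varphi_i must contain infinitely many tuples.
```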

3 Queries

Queries

- The queries we consider are those that produce a relation from the database

- Formally, a query Q over a schema S is associated with a heading (A_1, ..., A_k), and it maps every instance I ∈ Inst(S) into a relation with that heading

Example

Takes
student  cno
Ahuva    1
Alon     1
Ahuva    2

Course
cno  cname
1    AI
2    DB
3    PL

Find all pairs (s, c) of students and courses, such that there exists a course number x where both Takes(s, x) and Course(x, c) hold

student  cname
Ahuva    AI
Alon     AI
Ahuva    DB

Boolean Queries

- A special case is where k = 0, and then the result either contains the empty tuple or is empty; in this case we say that the query is Boolean

- Example: Is it the case that for some x, both Takes('Ahuva', x) and Course(x, 'DB') hold?

- We often denote
  - Q(I) = {()} by Q(I) = true or I ⊨ Q
  - Q(I) = ∅ by Q(I) = false or I ⊭ Q

- Boolean queries are very important in the analysis of query languages (expressiveness, complexity, optimization and equivalence, etc.)
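
The difference between a query with a non-empty heading and a Boolean query can be seen on the running Takes/Course data. The sketch below assumes relations are encoded as Python sets of tuples; the function names are made up for the illustration.

```python
# Relations as sets of tuples (an assumed encoding, not the lecture's).
takes = {("Ahuva", 1), ("Alon", 1), ("Ahuva", 2)}
course = {(1, "AI"), (2, "DB"), (3, "PL")}

def q_pairs(takes, course):
    """Query with heading (student, cname): all (s, c) such that
    Takes(s, x) and Course(x, c) hold for some x."""
    return {(s, c) for (s, x) in takes for (y, c) in course if x == y}

def q_ahuva_db(takes, course):
    """Boolean query (k = 0): the result is {()} or the empty set."""
    holds = any(s == "Ahuva" and x == y and c == "DB"
                for (s, x) in takes for (y, c) in course)
    return {()} if holds else set()

print(sorted(q_pairs(takes, course)))
print(q_ahuva_db(takes, course) == {()})   # True means I ⊨ Q
```

Returning `{()}` rather than a plain `True` keeps the Boolean case a genuine special case of "a query maps an instance to a relation".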

Relational Algebra

- Introduced by Codd [Cod72]

- Used by existing database systems, mainly for internal query-plan optimization

- A collection of operations over relations
  - Unary: r → t; binary: (r, s) → t

- Queries via:
  1. Applying the operators to the database relations
  2. Composition

Algebraic Operators

- Union (∪), difference (−)
  - R ∪ S and R − S allowed if R and S are union compatible, that is, they have the same heading

- Cartesian product (×)
  - R × S allowed only when R and S have disjoint headings

- Projection (π)
  - π_{A'_1, ..., A'_k}(R) allowed if A'_1, ..., A'_k are distinct attributes of R

- Selection (σ)
  - σ_ϕ(R) allowed if ϕ is a condition over the attributes of R (e.g., A_1 = A_2 or A_1 ≠ A_2)

- Renaming (ρ)
  - ρ_{A→B}(R) allowed if A is an attribute of R and B is not an attribute of R
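
The operators and their side conditions can be sketched in Python as follows. The `Relation` type, the helper names, and the choice of set semantics for the body are assumptions made for this illustration; the side conditions are the ones listed above and are enforced with `ValueError`.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    heading: tuple    # sequence of distinct attribute names
    body: frozenset   # set of tuples of values (set semantics assumed)

def make(heading, rows):
    return Relation(tuple(heading), frozenset(tuple(r) for r in rows))

def _require(ok, message):
    if not ok:
        raise ValueError(message)

def union(r, s):
    _require(r.heading == s.heading, "union needs identical headings")
    return Relation(r.heading, r.body | s.body)

def difference(r, s):
    _require(r.heading == s.heading, "difference needs identical headings")
    return Relation(r.heading, r.body - s.body)

def product(r, s):
    _require(not set(r.heading) & set(s.heading), "product needs disjoint headings")
    return Relation(r.heading + s.heading,
                    frozenset(t + u for t in r.body for u in s.body))

def project(r, attrs):
    _require(len(set(attrs)) == len(attrs) and set(attrs) <= set(r.heading),
             "projection needs distinct attributes of the relation")
    idx = [r.heading.index(a) for a in attrs]
    return Relation(tuple(attrs),
                    frozenset(tuple(t[i] for i in idx) for t in r.body))

def select(r, condition):
    # condition takes a dict {attribute: value} and returns a bool
    return Relation(r.heading,
                    frozenset(t for t in r.body
                              if condition(dict(zip(r.heading, t)))))

def rename(r, old, new):
    _require(old in r.heading and new not in r.heading,
             "rename needs an existing source and a fresh target attribute")
    return Relation(tuple(new if a == old else a for a in r.heading), r.body)

r = make(("A1", "A2"), [(1, 1), (1, 2), (2, 2)])
diagonal = select(r, lambda row: row["A1"] == row["A2"])
print(sorted(diagonal.body))   # [(1, 1), (2, 2)]
```

Queries are then built by composition, for example `project(select(product(r, s), cond), attrs)`.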

Example

Takes
student  cno
Ahuva    1
Alon     1
Ahuva    2

Course
cno  cname
1    AI
2    DB
3    PL

Takes × ρ_{cno→c}(Course)

student  cno  c  cname
Ahuva    1    1  AI
Alon     1    1  AI
Ahuva    2    1  AI
Ahuva    1    2  DB
Alon     1    2  DB
Ahuva    2    2  DB
Ahuva    1    3  PL
Alon     1    3  PL
Ahuva    2    3  PL

σ_{cno=c}(Takes × ρ_{cno→c}(Course))

student  cno  c  cname
Ahuva    1    1  AI
Alon     1    1  AI
Ahuva    2    2  DB

π_{student,cno,cname}(σ_{cno=c}(Takes × ρ_{cno→c}(Course)))

student  cno  cname
Ahuva    1    AI
Alon     1    AI
Ahuva    2    DB

In short: Takes ⋈ Course (natural join)

π_{student,cname}(σ_{cno=c}(Takes × ρ_{cno→c}(Course)))

student  cname
Ahuva    AI
Alon     AI
Ahuva    DB
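
The shorthand Takes ⋈ Course can also be computed directly instead of going through product, selection, and projection. The following is a minimal sketch under the assumption that rows are Python dictionaries keyed by attribute name; `natural_join` is a name chosen here for illustration.

```python
def natural_join(r, s):
    """Join two lists of dict rows on all attribute names they share."""
    out = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                merged = {**t, **u}
                if merged not in out:      # keep set semantics
                    out.append(merged)
    return out

takes = [{"student": "Ahuva", "cno": 1},
         {"student": "Alon", "cno": 1},
         {"student": "Ahuva", "cno": 2}]
course = [{"cno": 1, "cname": "AI"},
          {"cno": 2, "cname": "DB"},
          {"cno": 3, "cname": "PL"}]

joined = natural_join(takes, course)
answer = {(row["student"], row["cname"]) for row in joined}
print(sorted(answer))
```

Projecting the join onto (student, cname) gives the same three pairs as the last step of the example.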

SQL

- SQL (Structured Query Language) is a natural language for expressing relational algebra
  - SELECT (projection) ... AS (rename)
  - FROM (Cartesian product)
  - WHERE (selection)
  - UNION
  - MINUS (EXCEPT in standard SQL)

- And much more, e.g., aggregate operators (e.g., COUNT, SUM), grouping operators (e.g., GROUP BY, HAVING), and ranking (e.g., ORDER BY, LIMIT)

The Example in SQL

Find all pairs (s, c) of students and courses, such that there exists a course number x where both Takes(s, x) and Course(x, c) hold

SELECT T.student, C.cname
FROM Takes T, Course C
WHERE T.cno = C.cno
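
The query can be tried out with Python's built-in `sqlite3` module. This is a sketch under a few assumptions: the column types are guesses, and `DISTINCT` and `ORDER BY` are added here only to obtain set semantics and a deterministic output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Takes  (student TEXT, cno INTEGER);
    CREATE TABLE Course (cno INTEGER, cname TEXT);
    INSERT INTO Takes  VALUES ('Ahuva', 1), ('Alon', 1), ('Ahuva', 2);
    INSERT INTO Course VALUES (1, 'AI'), (2, 'DB'), (3, 'PL');
""")

rows = conn.execute("""
    SELECT DISTINCT T.student, C.cname
    FROM Takes T, Course C
    WHERE T.cno = C.cno
    ORDER BY T.student, C.cname
""").fetchall()
print(rows)   # [('Ahuva', 'AI'), ('Ahuva', 'DB'), ('Alon', 'AI')]
```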

Conjunctive Queries

- Conjunctive Queries (CQs) are SELECT-FROM-WHERE queries (no MINUS, UNION, etc.) such that all the WHERE conditions are equalities among attributes

- CQs are typically represented in the following FOL notation:

  $$Q(x) \mathrel{:-} \exists y \, [\varphi_1(x, y) \land \cdots \land \varphi_k(x, y)]$$

  where:
  - x and y are disjoint sequences of variables
  - Each ϕ_i(x, y) is an atomic formula of the form R(τ_1, ..., τ_m), where R is an m-ary relation in the schema and each τ_j is either a variable in x, a variable in y, or a constant value (e.g., 7 or 'Ahuva')
  - Every variable in x occurs at least once on the right-hand side

CQ Terminology and Notation

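$$Q(x) \mathrel{:-} \exists y \, [\varphi_1(x, y) \land \cdots \land \varphi_k(x, y)]$$

For simplification, quantification and conjunction are omitted:

$$\underbrace{Q(x)}_{\text{head}} \mathrel{:-} \underbrace{\overbrace{\varphi_1(x, y)}^{\text{atom}}, \; \cdots, \; \varphi_k(x, y)}_{\text{body}}$$

A variable in x is called a free or head variable, and a variable in y is called an existential variable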

The Example as CQ

Find all pairs (s, c) of students and courses, such that there exists a course number x where both Takes(s, x) and Course(x, c) hold

Q(s, c) :− Takes(s, x) ∧ Course(x, c)
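
A CQ in head/body form can be evaluated by extending a variable assignment one atom at a time. The evaluator below is a naive sketch, not an optimized algorithm; the encoding of atoms as `(relation_name, terms)` pairs and the convention that variables are strings starting with `?` are assumptions made here.

```python
def eval_cq(head, body, db):
    """head: tuple of variables; body: list of atoms (relation_name, terms);
    db: dict mapping relation names to sets of tuples.
    Terms starting with '?' are variables, all other terms are constants."""
    def is_var(term):
        return isinstance(term, str) and term.startswith("?")

    def extend(assignment, atoms):
        if not atoms:
            yield assignment
            return
        (name, terms), rest = atoms[0], atoms[1:]
        for fact in db[name]:
            new = dict(assignment)
            ok = len(fact) == len(terms)
            for term, value in zip(terms, fact):
                if not ok:
                    break
                if is_var(term):
                    # bind the variable, or check an existing binding
                    ok = new.setdefault(term, value) == value
                else:
                    ok = term == value
            if ok:
                yield from extend(new, rest)

    return {tuple(a[v] for v in head) for a in extend({}, body)}

db = {
    "Takes": {("Ahuva", 1), ("Alon", 1), ("Ahuva", 2)},
    "Course": {(1, "AI"), (2, "DB"), (3, "PL")},
}
q = eval_cq(("?s", "?c"),
            [("Takes", ("?s", "?x")), ("Course", ("?x", "?c"))], db)
boolean = eval_cq((), [("Takes", ("Ahuva", "?x")), ("Course", ("?x", "DB"))], db)
print(sorted(q), boolean)
```

With an empty head the same function evaluates a Boolean CQ and returns either `{()}` or the empty set.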

Why Are CQs Interesting?

- This is the class of all the queries that can be phrased in RA when using only:
  - Selection with equality predicate
  - Projection
  - Join

- For that reason, CQs are often called SPJ queries

- CQs are the building block of expressive query languages, such as Datalog

- Useful queries that are simple enough to allow deep investigation of various database problems
  - As we shall see, there are significant deep insights and algorithms that apply only to conjunctive queries

4 Constraints

Why Do We Care about Constraints?

- Constraints allow us to enforce database coherence and avoid bugs

- They allow us to formally determine what inconsistency means
  - Very relevant to us

- They may have a dramatic effect on algorithms and complexity

- By focusing on specific classes of constraint languages, we allow for nontrivial analysis

Functional Dependencies

Lab
name  faculty
SIPL  EE
LCL   CS
SSDL  CS
STAT  IE

LabRoom
lab   building  room
SIPL  Meyer     100
SIPL  Meyer     101
LCL   Taub      100
SSDL  Taub      200

- A lab belongs to just one faculty (i.e., name is a key for Lab)

- A specific room in a specific building belongs to only one lab

- A lab may have multiple rooms, but all in the same building

Formal Definition

- Let S be a schema

- A Functional Dependency (FD) over S is an expression of the form R : U → V, where R is a relation name of S and U and V are sets of attributes of R

- An instance I over S satisfies the FD R : U → V if for every two tuples t_1 and t_2 of R^I:

  t_1 and t_2 agree on U ⇒ t_1 and t_2 agree on V

- By “agree on W” we mean that t_1 and t_2 have the same value in every position that corresponds to an attribute of W

- If U ∪ V contains all the attributes of R, then R : U → V is a key constraint and U is said to be a key for R

- As a simplified notation, we write U and V by simply listing their attributes (no set notation)
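
The satisfaction condition translates almost literally into code: compare every pair of tuples and check that agreement on U forces agreement on V. The sketch below assumes a relation is given as a heading plus a list of tuples; `satisfies_fd` is a name chosen for illustration, and the quadratic pairwise check follows the definition rather than aiming for efficiency.

```python
from itertools import combinations

def satisfies_fd(heading, rows, lhs, rhs):
    """True iff every two rows that agree on lhs also agree on rhs."""
    pos = {attr: i for i, attr in enumerate(heading)}

    def restrict(row, attrs):
        return tuple(row[pos[a]] for a in attrs)

    return all(restrict(t1, rhs) == restrict(t2, rhs)
               for t1, t2 in combinations(rows, 2)
               if restrict(t1, lhs) == restrict(t2, lhs))

heading = ("lab", "building", "room")
lab_room = [("SIPL", "Meyer", 100), ("SIPL", "Meyer", 101),
            ("LCL", "Taub", 100), ("SSDL", "Taub", 200)]

print(satisfies_fd(heading, lab_room, ["building", "room"], ["lab"]))  # True
print(satisfies_fd(heading, lab_room, ["lab"], ["building"]))          # True
print(satisfies_fd(heading, lab_room, ["lab"], ["room"]))              # False
```

The last call fails because SIPL occupies rooms 100 and 101, so lab does not determine room.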

Example Revisited

- A lab belongs to just one faculty (i.e., name is a key for Lab)

  Lab : name → faculty

- A specific room in a specific building belongs to only one lab

  LabRoom : building room → lab

- A lab may have multiple rooms, but all in the same building

  LabRoom : lab → building

Generalizing FDs

- There are various formalisms that naturally extend FDs to cross-relation dependencies

- Following are two popular examples:
  - Equality-Generating Dependencies (EGDs)
  - Denial Constraints (DCs)

EGDs

- The FD Lab : name → faculty can be phrased in FOL as

  ∀x, y, z [Lab(x, y) ∧ Lab(x, z) → y = z]

- An EGD is an expression of the form

  ∀x [ϕ(x) → y1 = y2]

  - ϕ(x) is a conjunction of atomic formulas
  - y1 and y2 are variables in x

- Example:

  Lab(l1, f1), Lab(l2, f2), LabRoom(l1, b, r1), LabRoom(l2, b, r2) → f1 = f2
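The example EGD says that two labs with rooms in the same building must belong to the same faculty. A rough way to test it on a concrete instance is to enumerate all matches of the left-hand side and report those where the equality fails. The sketch below does this with plain nested loops; the function name `egd_violations` and the tuple encoding of facts are assumptions made for illustration.

```python
def egd_violations(lab, lab_room):
    """Return (l1, l2, building) triples that match the body but have f1 != f2."""
    found = set()
    for (l1, f1) in lab:
        for (l2, f2) in lab:
            for (a, b1, _r1) in lab_room:
                for (c, b2, _r2) in lab_room:
                    # Body: Lab(l1,f1), Lab(l2,f2), LabRoom(l1,b,r1), LabRoom(l2,b,r2)
                    if a == l1 and c == l2 and b1 == b2 and f1 != f2:
                        found.add((l1, l2, b1))
    return sorted(found)


# The Lab and LabRoom instances from the slides.
lab = [("SIPL", "EE"), ("LCL", "CS"), ("SSDL", "CS"), ("STAT", "IE")]
lab_room = [
    ("SIPL", "Meyer", 100),
    ("SIPL", "Meyer", 101),
    ("LCL", "Taub", 100),
    ("SSDL", "Taub", 200),
]

print(egd_violations(lab, lab_room))  # no violations on this instance

# Hypothetical extra fact: giving STAT (faculty IE) a room in Taub breaks the EGD.
broken = egd_violations(lab, lab_room + [("STAT", "Taub", 300)])
print(broken)
```

The nested loops are deliberately naive. For a fixed EGD the number of loops is constant, so the check is polynomial in the size of the instance.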


DCs

- The FD Lab : name → faculty can be phrased in FOL as

  ∀x, y, z ¬(Lab(x, y) ∧ Lab(x, z) ∧ y ≠ z)

- A DC is an expression of the form

  ∀x ¬(ϕ(x) ∧ ψ(x))

  - x = (x1, ..., xn) is a sequence of variables
  - ϕ(x) is a conjunction of atomic formulas
  - ψ(x) is a conjunction of comparisons between two variables in x (e.g., x1 ≠ x2, x1 < x2, x1 ≥ x2, etc.)


Inclusion Dependencies

Lab

name   faculty
SIPL   EE
LCL    CS
SSDL   CS
STAT   IE

LabRoom

lab    building  room
SIPL   Meyer     100
SIPL   Meyer     101
LCL    Taub      100
SSDL   Taub      200

- Every lab in LabRoom should be listed in the Lab relation (foreign key)


Formal Definition

- Let S be a schema

- An Inclusion Dependency (IND) over S is an expression δ of the form

  R[A1, ..., Am] ⊆ S[B1, ..., Bm]

  where:
  - R and S are relation names in the schema (in the expression, S denotes a relation name, not the schema)
    - R and S may be equal
  - A1, ..., Am are distinct attributes of R
  - B1, ..., Bm are distinct attributes of S

- An instance I over the schema satisfies δ if

  π_{A1,...,Am}(R^I) ⊆ π_{B1,...,Bm}(S^I)


Example

Consider the relation Friend[person1, person2]

Friend[person1, person2] ⊆ Friend[person2, person1]

means that friendship is symmetric
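The satisfaction condition for an IND is a containment between two projections, which translates almost literally into code. The sketch below assumes rows are dictionaries keyed by attribute name; the helper names `project` and `satisfies_ind` and the sample people are hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, returning a set of tuples (duplicates removed)."""
    return {tuple(row[a] for a in attrs) for row in rows}


def satisfies_ind(r_rows, r_attrs, s_rows, s_attrs):
    """Check the IND R[r_attrs] ⊆ S[s_attrs] on the given rows."""
    return project(r_rows, r_attrs) <= project(s_rows, s_attrs)


symmetric = [
    {"person1": "Ann", "person2": "Bob"},
    {"person1": "Bob", "person2": "Ann"},
]
# Hypothetical one-directional friendship added to break symmetry.
asymmetric = symmetric + [{"person1": "Ann", "person2": "Carl"}]

# Friend[person1, person2] ⊆ Friend[person2, person1]
for friend in (symmetric, asymmetric):
    print(satisfies_ind(friend, ["person1", "person2"],
                        friend, ["person2", "person1"]))
```

Because the same relation appears on both sides with the attribute order swapped, the check succeeds exactly when every pair also occurs reversed.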


Another Example

Lab

name   faculty
SIPL   EE
LCL    CS
SSDL   CS
STAT   IE

LabRoom

lab    building  room
SIPL   Meyer     100
SIPL   Meyer     101
LCL    Taub      100
SSDL   Taub      200

- Every lab should be listed in the Lab relation (foreign key)

LabRoom[lab] ⊆ Lab[name]


Tuple-Generating Dependencies

- The IND LabRoom[lab] ⊆ Lab[name] can be phrased in FOL as

  ∀x, y, z [LabRoom(x, y, z) → ∃w [Lab(x, w)]]

- A Tuple-Generating Dependency (TGD) is an expression of the form

  ∀x [ϕ(x) → ∃y ψ(x, y)]

  where ϕ(x) and ψ(x, y) are conjunctions of atomic formulas

- Example:

  Researcher(p, l), LabRoom(l, b, r) → ∃r′ [PersonRoom(p, b, r′)]
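The example TGD requires every researcher to have some room in each building where their lab has a room. A minimal sketch of checking it on a concrete instance follows. The function name `satisfies_tgd`, the tuple encoding, and the researcher "Dana" are assumptions made for illustration, since the slides give no Researcher or PersonRoom data.

```python
def satisfies_tgd(researcher, lab_room, person_room):
    """Check Researcher(p,l), LabRoom(l,b,r) -> exists r' PersonRoom(p,b,r')."""
    # The existential variable r' only needs *some* witness, so it suffices
    # to remember which (person, building) pairs occur in PersonRoom.
    has_room_in = {(p, b) for (p, b, _r) in person_room}
    return all(
        (p, b) in has_room_in
        for (p, l) in researcher
        for (l2, b, _r) in lab_room
        if l == l2  # join Researcher and LabRoom on the lab
    )


lab_room = [
    ("SIPL", "Meyer", 100),
    ("SIPL", "Meyer", 101),
    ("LCL", "Taub", 100),
    ("SSDL", "Taub", 200),
]
researcher = [("Dana", "SIPL")]  # hypothetical data

print(satisfies_tgd(researcher, lab_room, [("Dana", "Meyer", 100)]))  # True
print(satisfies_tgd(researcher, lab_room, [("Dana", "Taub", 200)]))   # False
```

Note that the room in PersonRoom need not be one of the lab's rooms; the TGD only constrains the building.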


Question

Can we express that Friends is transitive with TGDs?

Can we express that Friends has no triangles with TGDs?


Types of Database Complexity

- We consider computational problems that involve one or more of the following components:
  - A schema S
  - A set Σ of constraints
  - A query Q
  - A database instance I

- The common complexity measures are designed to distinguish one input from another (e.g., instances are far bigger than schemas/queries)

- Combined complexity: everything is given as input

- Data complexity: I is given as input, everything else is fixed
  - Formally, we consider infinitely many computational problems P_{S,Σ,Q}, one per combination of S, Σ and Q


Example: Complexity of CQ Answering

Problem Def. (Boolean CQ Evaluation)

Given a schema S, a Boolean CQ Q over S and an instance I over S, determine whether Q(I) = true.

We will show that this problem is NP-complete under combined complexity, by reduction from the Clique problem.

Problem Def. (Clique)

Given a graph G = (V, E) and a number k, determine whether G contains a clique of size k, that is, a subset U of V such that |U| = k and every two nodes in U are neighbours.


Reduction

- Given G = (V, E) with V = {1, ..., n}, and k, construct:
  - S = {R_E/2}
  - I_G = {R_E(i, j) | {i, j} ∈ E and i < j}
  - Q_k is a CQ with existential variables X1, ..., Xk, and an atom R_E(Xi, Xj) for every i and j with 1 ≤ i < j ≤ k

- For example, suppose that G is the graph on the nodes 1, 2, 3, 4 with the edges {1, 3}, {2, 3}, {2, 4} and {3, 4}:

  I_G = R_E
        1  3
        2  3
        2  4
        3  4

  Q_3 :− R_E(X1, X2), R_E(X1, X3), R_E(X2, X3)
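The construction can be sketched in a few lines of Python. The snippet builds the instance and the query from a graph and a number k, then evaluates the Boolean query by brute force over all assignments of the variables. The function names and the encoding of an atom over Xi and Xj as the index pair `(i, j)` are assumptions made for illustration, and the sketch assumes k ≥ 2.

```python
from itertools import combinations, product


def reduce_clique_to_cq(edges, k):
    """Build the instance (a set of ordered pairs) and the query (index pairs)."""
    # Store each undirected edge once, smaller endpoint first; skip self-loops.
    instance = {(min(i, j), max(i, j)) for (i, j) in edges if i != j}
    # One atom over (X_i, X_j) for every 1 <= i < j <= k.
    query = list(combinations(range(1, k + 1), 2))
    return instance, query


def evaluate_boolean_cq(instance, query, k):
    """Brute force: try every assignment of X_1..X_k to values in the instance."""
    domain = sorted({v for fact in instance for v in fact})
    for assignment in product(domain, repeat=k):
        if all((assignment[i - 1], assignment[j - 1]) in instance
               for (i, j) in query):
            return True
    return False


# The example graph: edges {1,3}, {2,3}, {2,4}, {3,4}.
edges = [(1, 3), (2, 3), (2, 4), (3, 4)]

instance, q3 = reduce_clique_to_cq(edges, 3)
print(evaluate_boolean_cq(instance, q3, 3))  # True: {2, 3, 4} is a clique

instance, q4 = reduce_clique_to_cq(edges, 4)
print(evaluate_boolean_cq(instance, q4, 4))  # False: no clique of size 4
```

The evaluator tries up to |domain|^k assignments, which is exponential in k. This is consistent with the hardness result: the reduction itself is cheap, and the cost lies in evaluating the query.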


Correctness

- The reduction is correct since the following two are equivalent:
  1. G has a clique of size at least k
  2. Q_k(I_G) = true

- Hence, determining whether Q(I) = true, given S, Q and I, is NP-hard
  - Membership in NP is straightforward; hence, the problem is NP-complete

- Note: The schema S does not depend on the input (G, k), but the size of Q is quadratic in k


Data Complexity

What is the data complexity of answering a query in RA?

- We consider the problem P_{S,Q} of computing the answers for a query Q in RA (Relational Algebra) over a given input instance I over S

- The naive way of straightforwardly executing Q runs in polynomial time!

- As a special case, CQ evaluation is in polynomial time under data complexity
  - Note that data complexity is insensitive to the representation of the query
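For the special case of a Boolean CQ, a rough worked bound illustrates why naive evaluation is polynomial in the data. The symbols k (number of variables of Q), m (number of atoms of Q) and adom(I) (the set of values occurring in I) are notation assumed here, not taken from the slides. The naive evaluation tries every assignment of the k variables to values of adom(I), and for each assignment it looks up each of the m atoms by scanning I:

```latex
\[
  T(Q, I) \;=\; O\!\left( |\mathrm{adom}(I)|^{k} \cdot m \cdot |I| \right)
\]
```

Under data complexity Q is fixed, so k and m are constants and the bound is a polynomial in the size of I. Under combined complexity k is part of the input, and the same bound is exponential.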


Other Yardsticks of Efficiency

Other yardsticks of efficiency are often used in database complexity; here are two examples:


Parameterized Complexity

- Parameterized complexity is between data complexity and combined complexity
  - Query evaluation is Fixed-Parameter Tractable (FPT) if it can be performed in time O(f(Q) · p(I)), where f is any (computable) function of the query and p(I) is polynomial in the size of I
  - Our reduction from Clique shows that CQ evaluation is not likely to be FPT: Clique parameterized by k is W[1]-hard, and the size of the constructed query depends only on k


Input-Output Complexity

- A query (e.g., CQ) may be required to output an exponential number of results

- Hence, it makes no sense to require evaluation in polynomial time

- Input-output complexity measures the time as a function of both the input and the output
  - Polynomial total time: the running time is polynomial in the combined size of the input and the output
  - Polynomial delay: the answers are produced one by one, where the delay between every two answers is polynomial in the input only
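To make the answer-by-answer view concrete, consider the join-free query Q(x1, ..., xk) :− R(x1), ..., R(xk) over a unary relation R, which has |R|^k answers. This query is a hypothetical example, not one from the slides. The Python generator below produces its answers lazily, one at a time, instead of materializing the whole output. It only illustrates the enumeration interface and is not a general polynomial-delay algorithm for CQs.

```python
from itertools import islice, product


def enumerate_answers(relation, k):
    """Lazily yield the answers of Q(x1..xk) :- R(x1), ..., R(xk)."""
    values = sorted(set(relation))  # the tuples of the unary relation R
    # product() builds each next tuple on demand, so no answer list is stored.
    yield from product(values, repeat=k)


r = [0, 1]

# All 2^3 answers for k = 3.
all_answers = list(enumerate_answers(r, 3))
print(len(all_answers))

# With k = 40 there are 2^40 answers, yet the first few arrive immediately.
first_three = list(islice(enumerate_answers(r, 40), 3))
print(first_three[0][:5])
```

A consumer that needs only some of the answers can stop early, which a polynomial-total-time guarantee alone would not allow.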


References I

Donald D. Chamberlin and Raymond F. Boyce, SEQUEL: A structured English query language, SIGMOD, ACM, 1974, pp. 249–264.

E. F. Codd, A relational model of data for large shared data banks, Commun. ACM 13 (1970), no. 6, 377–387.

E. F. Codd, Relational completeness of data base sublanguages, Database Systems (1972), 65–98.


End of Lecture 2: Essential Database Foundations