informatics 1: data & analysis - informatics blog service · f sj s2student ^ s.age>19g...

37
http://blog.inf.ed.ac.uk/da16 Informatics 1: Data & Analysis Lecture 6: Tuple Relational Calculus Ian Stark School of Informatics The University of Edinburgh Friday 29 January 2016 Semester 2 Week 3

Upload: others

Post on 26-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

http://blog.inf.ed.ac.uk/da16

Informatics 1: Data & AnalysisLecture 6: Tuple Relational Calculus

Ian Stark

School of InformaticsThe University of Edinburgh

Friday 29 January 2016Semester 2 Week 3

Page 2: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Careers Events Next Week !

Careers in ITJob Fair

Wednesday 3 February 2016

Informatics Forum1300–1600

http://www.ed.ac.uk/schools-departments/careers

Careers advice and stalls from 40 local, nationaland international employers

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 4: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Lecture Plan for Weeks 1–4

Data RepresentationThis first course section starts by presenting two common datarepresentation models.

The entity-relationship (ER) modelThe relational model Note slightly different naming:

-relationship vs. relational

Data ManipulationThis is followed by some methods for manipulating data in the relationalmodel and using it to extract information.

Relational algebraThe tuple-relational calculusThe query language SQL

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 5: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

The State We’re In

Relational modelsRelations: Tables matching schemasSchema: A set of field names and their domainsTable: A set of tuples of values for these fields

Studentmatric name age email

s0456782 John 18 john@infs0378435 Helen 20 helen@physs0412375 Mary 18 mary@infs0189034 Peter 22 peter@math

Coursecode title yearinf1 Informatics 1 1

math1 Mathematics 1 1geo1 Geology 1 1dbs Database Systems 3adbs Advanced Databases 4

Takesmatric code mark

s0456782 inf1 71s0412375 math1 82s0412375 geo1 64s0189034 math1 56

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 6: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

The State We’re In

Relational algebraA mathematical language of bulk operations on relational tables. Eachoperation takes one or more tables, and returns another.

selection σ, projection π, renaming ρ, union ∪, difference −,cross-product ×, intersection ∩ and different kinds of join ./

Tuple relational calculus (TRC)A declarative mathematical notation for writing queries: specifyinginformation to be drawn from the linked tables of a relational model.

Structured Query Language (SQL)A mostly-declarative programming language for interacting with relationaldatabase management systems (RDBMS): defining tables, changing data,writing queries. International Standard ISO 9075

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 7: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Tuple Relational Calculus: Example

All records for students more than 19 years old

{ S | S ∈ Student ∧ S.age > 19 }

The set of tuples S such that S is in the table “Student” and hascomponent “age” greater than 19.

Studentmatric name age emails0456782 John 18 john@infs0378435 Helen 20 helen@physs0412375 Mary 18 mary@infs0189034 Peter 22 peter@math

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 8: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Tuple Relational Calculus: Example

All records for students more than 19 years old

{ S | S ∈ Student ∧ S.age > 19 }

The set of tuples S such that S is in the table “Student” and hascomponent “age” greater than 19.

Studentmatric name age emails0456782 John 18 john@infs0378435 Helen 20 helen@physs0412375 Mary 18 mary@infs0189034 Peter 22 peter@math

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 9: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Tuple Relational Calculus: Example

All records for students more than 19 years old

{ S | S ∈ Student ∧ S.age > 19 }

The set of tuples S such that S is in the table “Student” and hascomponent “age” greater than 19.

This is like list comprehension in programming languages:

Haskell [ s | s <− students, age s > 19 ]

Python [s for s in students if s.age > 19 ]

These are in turn based on comprehensions in set theory.

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 10: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Tuple Relational Calculus Basics

Queries in TRC have the general form

{ T | P(T) }

where T is a tuple variable and P(T) is a predicate, a logical formula.

Every tuple variable such as T has a schema, listing its fields and theirdomains. In practice, the details of the schema are usually inferred fromthe way T is used in the predicate P(T).

A tuple variable ranges over all possible tuple values matching its schema.

The result of the query{ T | P(T) }

is then the set of all possible tuple values for T such that P(T) is true.

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 11: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Another Example

Names and ages of all students over 19

{ T | ∃S . S ∈ Student ∧ S.age > 19∧ T .name = S.name ∧ T .age = S.age }

The set of tuples T such that there is a tuple S in table “Student” withfield “age” greater than 19 and where S and T have the same values for“name” and “age”.

Studentmatric name age emails0456782 John 18 john@infs0378435 Helen 20 helen@physs0412375 Mary 18 mary@infs0189034 Peter 22 peter@math

T

name ageHelen 20Peter 22

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 12: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Another Example

Names and ages of all students over 19

{ T | ∃S . S ∈ Student ∧ S.age > 19∧ T .name = S.name ∧ T .age = S.age }

The set of tuples T such that there is a tuple S in table “Student” withfield “age” greater than 19 and where S and T have the same values for“name” and “age”.

Tuple variable S has schema matching the table “Student”.Tuple variable T has fields “name” and “age”, with domains to matchthose of S.Even if S has other fields, they do not appear in T or the overall result.

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 13: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Another Example

Names and ages of all students over 19

{ T | ∃S . S ∈ Student ∧ S.age > 19∧ T .name = S.name ∧ T .age = S.age }

The set of tuples T such that there is a tuple S in table “Student” withfield “age” greater than 19 and where S and T have the same values for“name” and “age”.

The variable occurrences { T | · · · and ∃S . · · · are binding — they name avariable which is used later in the expression

The binding of T extends over the whole set { T | . . . T . . . T . . . }The binding of S includes everything to the right of the binder ∃S .

The region over which each variable is bound is called its scope.Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 14: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Formula Syntax

Inside TRC expression { T | P(T) } the logical formula P(T) may be quitelong, but is built up from standard logical components.

Simple assertions: (T ∈ Table), (T .age > 65), (S.name = T .name), . . .

Logical combinations: (P ∨Q), (P ∧Q∧ ¬Q ′), . . .

Quantification:

∃S . P(S) There exists a tuple S such that P(S)∀T . Q(T) For all tuples T it is true that Q(T)

For convenience, we require that for ∃S . P(S) the variable S must actuallyappear in P(S); and the same for ∀T . Q(T). We also write:

∃S ∈ Table . P(S) to mean ∃S . S ∈ Table∧ P(S)

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 15: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Facebook Graph Search +

Mark Zuckerberg (Credit: AP/Jeff Chiu)

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 25: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Students and Courses

Studentmatric name age email

s0456782 John 18 john@infs0378435 Helen 20 helen@physs0412375 Mary 18 mary@infs0189034 Peter 22 peter@math

Coursecode title yearinf1 Informatics 1 1

math1 Mathematics 1 1geo1 Geology 1 1dbs Database Systems 3adbs Advanced Databases 4

Takesmatric code mark

s0456782 inf1 71s0412375 math1 82s0412375 geo1 64s0189034 math1 56

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 26: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Students and Courses (1/5)

Students taking Geology 1

{ R | ∃S ∈ Student . ∃T ∈ Takes . ∃C ∈ Course .C.title = "Geology 1" ∧ C.code = T .code∧ T .matric = S.matric ∧ S.name = R.name }

Schema for S, T and C match those of the tables from which they aredrawn. The schema for result R is a single field “name” with stringdomain, because that’s all that appears here.

One way to compute this in relational algebra:

πname((Student ./ Takes) ./ (σtitle="Geology 1"(Course)))

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 27: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Relational AlgebraThe relational algebra expression can be rearranged without changing itsvalue, but possibly affecting the time and memory needed for computation:

πname((Student ./ Takes) ./ (σtitle="Geology 1"(Course)))πname(Student ./ (Takes ./ (σtitle="Geology 1"(Course))))πname(Student ./ ((σtitle="Geology 1"(Course)) ./ Takes))

We can also visualise this as rearrangements of a tree:

π

./

./

Student Takes

σ

Course

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 28: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Relational AlgebraThe relational algebra expression can be rearranged without changing itsvalue, but possibly affecting the time and memory needed for computation:

πname((Student ./ Takes) ./ (σtitle="Geology 1"(Course)))πname(Student ./ (Takes ./ (σtitle="Geology 1"(Course))))πname(Student ./ ((σtitle="Geology 1"(Course)) ./ Takes))

We can also visualise this as rearrangements of a tree:

π

./

Student ./

Takes σ

CourseIan Stark Inf1-DA / Lecture 6 2016-01-29

Page 29: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Relational AlgebraThe relational algebra expression can be rearranged without changing itsvalue, but possibly affecting the time and memory needed for computation:

πname((Student ./ Takes) ./ (σtitle="Geology 1"(Course)))πname(Student ./ (Takes ./ (σtitle="Geology 1"(Course))))πname(Student ./ ((σtitle="Geology 1"(Course)) ./ Takes))

We can also visualise this as rearrangements of a tree:

π

./

Student ./

σ

Course

Takes

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 30: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Students and Courses (2/5)

Courses taken by students called “Ada”

{ R | ∃S ∈ Student, T ∈ Takes,C ∈ Course .S.name = "Ada" ∧ S.matric = T .matric∧ T .code = C.code ∧ C.title = R.title }

Note the slightly abbreviated syntax for multiple quantification: we usecomma-separated ∃ . . . , . . . , . . . instead of ∃ . . .∃ . . .∃ . . .

Computing this in relational algebra:

πtitle((Course ./ Takes) ./ (σname="Ada"(Student)))

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 31: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Students and Courses (3/5)

Students taking Informatics 1 or Geology 1

{ R | ∃S ∈ Student, T ∈ Takes,C ∈ Course .(C.title = "Informatics 1"∨ C.title = "Geology 1")∧ C.code = T .code ∧ T .matric = S.matric ∧ S.name = R.name }

Now the logical formula becomes a little more elaborate.

Computing this in relational algebra:

πname((Student ./ Takes) ./ (σtitle="Informatics 1"(Course)))∪ πname((Student ./ Takes) ./ (σtitle="Geology 1"(Course)))

πname((Student ./ Takes) ./ (σ(title="Informatics 1"∨title="Geology 1")(Course)))

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 32: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Students and Courses (4/5)

Students taking both Informatics 1 and Geology 1

{ R | ∃S ∈ Student, T1, T2 ∈ Takes,C1,C2 ∈ Course .C1.title = "Informatics 1" ∧ C2.title = "Geology 1"∧ C1.code = T1.code ∧ T1.matric = S.matric∧ C2.code = T2.code ∧ T2.matric = S.matric ∧ S.name = R.name }

Computing this in relational algebra:

πname((Student ./ Takes) ./ (σtitle="Informatics 1"(Course)))∩ πname((Student ./ Takes) ./ (σtitle="Geology 1"(Course)))

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 33: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Students and Courses (5/5)

Students taking no courses

{ R | ∃S ∈ Student . S.name = R.name∧ ∀T ∈ Takes . T .matric 6= S.matric }

Computing this in relational algebra:

πname(Student− πname,mn,age,email(Student ./ Takes))

? Challenge: why not one of these instead?

πname(Student− (Student ./ Takes))

πname(Student) − πname(Student ./ Takes))

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 34: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Relational Algebra vs. Tuple Relational Calculus

Codd gave a proof that relational algebra and TRC are equally expressive:anything expressed in one language can also be written in the other.

So why have both?

They give different perspectives and allow the following approach:

Use relational calculus to specify the information wanted;Translate into relational algebra to give a procedure for computing it;Rearrange the algebra to make that procedure efficient.

The database language SQL is based on the calculus: well-suited to givinglogical specifications, independent of any eventual implementation.

The algebra beneath it is good for rewriting, equations, and calculation.

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 35: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Domain-Specific Languages +

Charles V, 1500–1558Holy Roman Emperor, King ofSpain, Archduke of Austria

“I speak Spanish to God, Italian to women,French to men and German to my horse.”

Attributed, but even Wikipedia is sceptical.

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 36: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Query Optimization

... Rearrange the algebra to make that procedure efficient.

This last part is central to the viability of modern large databases. Aneffective query optimizer will draw up a list of possible query plans andcompare the costs of all of them, taking account of:

How much data there is, where it is, how it is arranged;What indexes are available, for which tables, and where they are;Selectivity: estimates of how many rows a subquery will return;Estimated size of any intermediate tables;What parts can be done in parallel;What I/O and computing resources are available;. . .

Ian Stark Inf1-DA / Lecture 6 2016-01-29

Page 37: Informatics 1: Data & Analysis - Informatics Blog Service · f Sj S2Student ^ S.age>19g Thesetoftuples Ssuchthat isinthetable“Student”andhas component“age”greaterthan19. Student

Homework !

Read This

Inside Google Spanner, the Largest Single Database on EarthWired 2012-11-26http://www.wired.com/2012/11/google-spanner-time/all/

Do ThisTuple-relational calculus can be quite tricky to understand, and it’s notalways obvious to follow what a query means. So, homework this time isto read the lectures slides again, going through the examples to see howeach query works.

Ian Stark Inf1-DA / Lecture 6 2016-01-29