Databases
Exercises on Relational Algebra
The Lab Sessions
Giacomo Bergami (bergami.co.nr)
▪ 2016/10/07
  ▪ Keys and Superkeys
  ▪ Relational Algebra (I)
  ▪ Negation
  ▪ Minimum
▪ 2016/10/14
  ▪ Relational Algebra (II)
  ▪ At least 2…
  ▪ More exercises + Questions
Keys…
def. A superkey K in r(S) (K⊆S) uniquely identifies the tuples in r:
¬∃t1≠t2 ∈ r. t1[K]=t2[K]
or, equivalently,
∀t1≠t2 ∈ r. t1[K]≠t2[K]
▪ Recap: within the relational model, each tuple is unique (HP): ∀t1∈r. ∀t2∈r\{t1}. t1≠t2
▪ Hence, the aforementioned condition is “sufficient” for uniqueness.
…and Superkeys
def. K⊆S is minimal for a property P(.) if ¬∃T⊂K. P(T)
▪ This means that no proper subset of K satisfies the predicate P.
def. A key K in r(S) (K⊆S) is a minimal superkey.
▪ This means that a key is already minimal, since no proper subset of K satisfies the property “being a superkey”. (A key is also a “minimal key”.)
¬∃T⊂K. (¬∃t1≠t2 ∈ r. t1[T]=t2[T])
or, equivalently,
∀T⊂K. ∃t1≠t2 ∈ r. t1[T]=t2[T]
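The two definitions can be checked mechanically on a concrete instance. The following is a minimal sketch, assuming a relation is stored as a list of distinct dictionaries; the helper names `is_superkey` and `is_key` and the fifth employee row are hypothetical and only serve the illustration.

```python
from itertools import combinations

# Hypothetical instance: the Employee rows used later in the exercises,
# plus one invented row so that two employees share a surname.
EMPLOYEE = [
    {"Code": "0123", "Name": "Johann Sebastian", "Surname": "Bach"},
    {"Code": "4567", "Name": "Wolfgang Amadeus", "Surname": "Mozart"},
    {"Code": "0571", "Name": "Edvard", "Surname": "Grieg"},
    {"Code": "3573", "Name": "Claude", "Surname": "Debussy"},
    {"Code": "0999", "Name": "Carl Philipp Emanuel", "Surname": "Bach"},
]

def is_superkey(rows, attrs):
    """True when no two distinct tuples agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        projected = tuple(row[a] for a in attrs)
        if projected in seen:
            return False
        seen.add(projected)
    return True

def is_key(rows, attrs):
    """True when attrs is a superkey and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))          # sizes of proper subsets
        for subset in combinations(attrs, size)
    )
```

On this instance `("Code", "Surname")` is a superkey but not a key, because `("Code",)` alone already identifies every tuple. Note that the check only inspects one instance, whereas a key is normally declared for every legal instance of the schema.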
Recap: operators
▪ Set operations: \, ∪, ∩, ×
▪ σ_θ(R): filters R by the predicate θ
▪ π_L(R): reduces R’s schema to the attributes in L
▪ R⋈S, R⋈_θS: combine two relations (natural join, or theta-join using a predicate θ)
▪ ρ_{B←A}(R): renames A as B in R’s schema
Why?
▪ Query expressions’ equivalences: query optimizers in current (relational) databases use algebraic rules to rewrite a query into an equivalent expression that takes less time to compute.
▪ In this course, “correct” and “readable” results are preferred to “quick” and “efficient” ones (see above).
▪ A way to express queries in a logical framework.
▪ How to express “not forall”?
▪ How to express “minimum”?
▪ How to “count” (there exist at least two…)?
Exercise 1
Given the following relation:
Train(Code, Start, End, miles)
Provide all the routes between Boston and Chicago with one switch.
▪ In order to switch train, you have to reach an end and then start again the journey.
▪ We have to join the Train relation with itself.
▪ We have to rename the fields before joining them.
Exercise 1
▪ Solution A
ρ_{C1←Code,S←Start,Sw←End,M1←miles}(Train) ⋈_{S=“Boston” AND E=“Chicago”} ρ_{C2←Code,Sw←Start,E←End,M2←miles}(Train)
The renamed operands have schemas Train(C1, S, Sw, M1) and Train(C2, Sw, E, M2).
▪ Solution B
ρ_{C1←Code,S←Start,Sw←End,M1←miles}(σ_{Start=“Boston”}(Train)) ⋈ ρ_{C2←Code,Sw←Start,E←End,M2←miles}(σ_{End=“Chicago”}(Train))
Which is the most efficient solution?
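To compare the two solutions, the following sketch evaluates both with a naive nested-loop join and counts how many tuple pairs each one examines. The Train instance (codes, stations, miles) is invented for the illustration, and the nested-loop evaluation is an assumption: a real DBMS may choose a different join algorithm.

```python
# Hypothetical Train(Code, Start, End, miles) instance.
TRAIN = [
    ("T1", "Boston", "Albany", 170),
    ("T2", "Albany", "Chicago", 820),
    ("T3", "Boston", "Cleveland", 640),
    ("T4", "Cleveland", "Chicago", 345),
    ("T5", "New York", "Chicago", 790),
    ("T6", "Boston", "New York", 215),
]

def solution_a(train):
    """Join the whole relation with itself, filtering while joining."""
    examined = 0
    routes = set()
    for c1, s, sw, m1 in train:
        for c2, sw2, e, m2 in train:
            examined += 1
            if sw == sw2 and s == "Boston" and e == "Chicago":
                routes.add((c1, s, sw, m1, c2, e, m2))
    return routes, examined

def solution_b(train):
    """Apply the selections first, then join the two smaller relations."""
    from_boston = [t for t in train if t[1] == "Boston"]
    to_chicago = [t for t in train if t[2] == "Chicago"]
    examined = 0
    routes = set()
    for c1, s, sw, m1 in from_boston:
        for c2, sw2, e, m2 in to_chicago:
            examined += 1
            if sw == sw2:
                routes.add((c1, s, sw, m1, c2, e, m2))
    return routes, examined

routes_a, examined_a = solution_a(TRAIN)
routes_b, examined_b = solution_b(TRAIN)
```

Both functions return the same routes, but under these assumptions Solution B examines fewer pairs because the selections shrink both operands before the join.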
Exercise 2(a)
Given the following relations:
Employee(Code, Name, Surname)
BelongsTo(Employee, Office)
Associate each employee with their office, only if it exists.
▪ R⋈_θS combines a tuple of R with a tuple of S iff the pair satisfies the predicate θ.
Exercise 2(a)
Employee
Code  Name              Surname
0123  Johann Sebastian  Bach
4567  Wolfgang Amadeus  Mozart
0571  Edvard            Grieg
3573  Claude            Debussy

BelongsTo
Employee  Office
0123      Baroque
4567      Classical
3573      Impressionism

π_{Code,Name,Surname,Office}(Employee ⋈_{Code=Employee} BelongsTo)
Code  Name              Surname  Office
0123  Johann Sebastian  Bach     Baroque
4567  Wolfgang Amadeus  Mozart   Classical
3573  Claude            Debussy  Impressionism
Exercise 2(b)
Given the following relations:
Employee(Code, Name, Surname)
BelongsTo(Employee, Office)
Return the codes of the employees with no office.
▪ In this case we have to use a left join (⟕) in order to get the complete list of employees.
Exercise 2(b)
π_{Code,Name,Surname,Office}(Employee ⟕_{Code=Employee} BelongsTo)
Code  Name              Surname  Office
0123  Johann Sebastian  Bach     Baroque
4567  Wolfgang Amadeus  Mozart   Classical
0571  Edvard            Grieg    NULL
3573  Claude            Debussy  Impressionism

π_{Code,Name,Surname}(σ_{Office is NULL}(Employee ⟕_{Code=Employee} BelongsTo))
Code  Name    Surname
0571  Edvard  Grieg
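The left join followed by the NULL selection can be sketched as follows, using the sample tables of this exercise. `None` plays the role of NULL and the function name `left_join` is hypothetical. The last line shows an alternative based on set difference, which is not on the slide but follows the negation pattern used in later exercises.

```python
EMPLOYEE = [
    ("0123", "Johann Sebastian", "Bach"),
    ("4567", "Wolfgang Amadeus", "Mozart"),
    ("0571", "Edvard", "Grieg"),
    ("3573", "Claude", "Debussy"),
]
BELONGS_TO = [
    ("0123", "Baroque"),
    ("4567", "Classical"),
    ("3573", "Impressionism"),
]

def left_join(employees, belongs_to):
    """Employee left-joined with BelongsTo on Code=Employee; None stands for NULL."""
    rows = []
    for code, name, surname in employees:
        offices = [office for emp, office in belongs_to if emp == code]
        if offices:
            rows.extend((code, name, surname, office) for office in offices)
        else:
            rows.append((code, name, surname, None))  # keep the unmatched employee
    return rows

joined = left_join(EMPLOYEE, BELONGS_TO)
# Selection "Office is NULL" followed by the projection on Code, Name, Surname.
no_office = [(c, n, s) for c, n, s, office in joined if office is None]
# Alternative without NULLs: all codes minus the codes that have an office.
codes_without_office = {c for c, _, _ in EMPLOYEE} - {e for e, _ in BELONGS_TO}
```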
Exercise 3(a)
Given the following relations:
State(Name, Area)
City(Code, Name, Inhabitants)
FormedOf(State, City)
Return the names of the U.S. states having more than 1,000,000 inhabitants.
How to solve this query?
■ This query requires the group-by operator (Γ, γ), which is missing from the proposed relational algebra. Let’s change the query.
Exercise 3(b)
Given the following relations:
State(Name, Area)
City(Code, Name, Inhabitants)
FormedOf(State, City)
Return the names of the U.S. states having cities with more than 1,000,000 inhabitants.
Exercise 3(b)
State
Name          Area
Rhode Island  1,545
New Jersey    8,723
California    163,696
Alaska        665,384

FormedOf
State         City
Rhode Island  401
New Jersey    862
California    213
Alaska        907
Alaska        908

City
Code  Name         Inhabitants
401   Providence   178,042
862   Newark       277,140
213   Los Angeles  4,030,904
907   Anchorage    291,826
908   North Pole   2,117
Exercise 3(b)
Given the following relations…
State(Name, Area)
City(Code, Name, Inhabitants)
FormedOf(State, City)

State
Name          Area
Rhode Island  1,545
New Jersey    8,723
Florida       65,758
Alaska        665,384

…return the names of the U.S. states having cities with more than 1,000,000 inhabitants (θ).

σ_{Inhabitants>1000000}(City)
Code  Name         Inhabitants
213   Los Angeles  4,030,904

▪ FormedOf already has the State’s name information. City and FormedOf are enough.
▪ ? ⋈ σ_{Inhabitants>1000000}(City)
Exercise 3(b)
▪ With a theta-join:
FormedOf ⋈_{City=Code} σ_{Inhabitants>1000000}(City)
▪ With a renaming and a natural join:
ρ_{Code←City}(FormedOf)
State         Code
Rhode Island  401
New Jersey    862
California    213
Alaska        907
Alaska        908

ρ_{Code←City}(FormedOf) ⋈ σ_{Inhabitants>1000000}(City)

Final expressions:
π_{State}(FormedOf ⋈_{City=Code} σ_{Inhabitants>1000000}(City))
π_{State}(ρ_{Code←City}(FormedOf) ⋈ σ_{Inhabitants>1000000}(City))
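As a quick check of the final expression, here is a small sketch over the sample FormedOf and City tables of this exercise, with relations stored as lists of tuples (a representation chosen only for this illustration).

```python
FORMED_OF = [  # (State, City)
    ("Rhode Island", 401),
    ("New Jersey", 862),
    ("California", 213),
    ("Alaska", 907),
    ("Alaska", 908),
]
CITY = [  # (Code, Name, Inhabitants)
    (401, "Providence", 178042),
    (862, "Newark", 277140),
    (213, "Los Angeles", 4030904),
    (907, "Anchorage", 291826),
    (908, "North Pole", 2117),
]

# Selection Inhabitants > 1000000 on City.
big_cities = [city for city in CITY if city[2] > 1_000_000]

# Theta-join City=Code followed by the projection on State.
states = {
    state
    for state, city_code in FORMED_OF
    for code, _name, _inhabitants in big_cities
    if city_code == code
}
```

The State relation is never read, which matches the remark that City and FormedOf are enough.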
Exercise 4
Given the following relations:
Person(ID, Name, Surname, Age)
Registration(Person,Lecture)
Lecture(Name,Price)
Return the names of the people who are registered for a lecture that costs more than 10 times their age.
■ Let’s just use natural joins, projection, selections and renamings.
Exercise 4
Given the following relations:
ρ_{Person←ID}(Person), with schema Person(Person, Name, …, Age)
Registration, with schema Registration(Person, Lecture)
ρ_{Lecture←Name}(Lecture), with schema Lecture(Lecture, Price)
Exercise 4
π_{Name}(σ_{Price>10*Age}(ρ_{Person←ID}(Person) ⋈ Registration ⋈ ρ_{Lecture←Name}(Lecture)))
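The query can be traced on a tiny instance. All rows below are invented for the illustration, since the slides give no sample data for this exercise; the comprehension mirrors the two natural joins, the selection on the price and the final projection on Name.

```python
# Hypothetical instances.
PERSON = [  # (ID, Name, Surname, Age)
    (1, "Ada", "Lovelace", 36),
    (2, "Alan", "Turing", 41),
    (3, "Grace", "Hopper", 85),
]
REGISTRATION = [  # (Person, Lecture)
    (1, "Databases"),
    (2, "Databases"),
    (2, "Logic"),
    (3, "Logic"),
]
LECTURE = [  # (Name, Price)
    ("Databases", 400),
    ("Logic", 500),
]

names = {
    name
    for person_id, name, _surname, age in PERSON
    for person, lecture in REGISTRATION
    if person == person_id                 # join on Person
    for lecture_name, price in LECTURE
    if lecture_name == lecture             # join on Lecture
    if price > 10 * age                    # selection on the lecture price
}
```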
Exercise 5
Given the following relations:
State(Name, Area)
City(Code, Name, Inhabitants)
FormedOf(State, City)
Return the names of the cities belonging to states larger than 10,000 square miles.
Exercise 5
ρ_{State←Name}(σ_{Area>10000}(State)), with schema State(State, Area)
ρ_{City←Code}(City), with schema City(City, Name, Inhabitants)
FormedOf, with schema FormedOf(State, City)

π_{Name}(ρ_{City←Code}(City) ⋈ FormedOf ⋈ ρ_{State←Name}(σ_{Area>10000}(State)))
Exercise 6(a)
Given the following relations:
Student(Number, Surname, Name, Dept)
Exam(Student, Subject, Grade, Day)
Provide the surnames and names of the students who passed at least one exam with grade A.
▪ Please note that, in this case, we can first select all the exams with grade A and then retrieve the corresponding students.
▪ This time we’re going to use theta-joins.
Exercise 6(a)
Student(Number, Surname, Name, Dept)
Exam(Student, Subject, Grade, Day)
▪ Using natural joins:
π_{Surname,Name}(Student ⋈ π_{Number}(ρ_{Number←Student}(σ_{Grade=“A”}(Exam))))
▪ Using theta-joins:
π_{Surname,Name}(Student ⋈_{Number=Student} π_{Student}(σ_{Grade=“A”}(Exam)))
Exercise 6(b)
Given the following relations:
Student(Number, Surname, Name, Dept)
Exam(Student, Subject, Grade, Day)
Provide the surnames and names of the students who passed no exam with grade A.
Exercise 6(b)
π_{Surname,Name}(Student ⋈ (π_{Number}(Student) \ π_{Number}(ρ_{Number←Student}(σ_{Grade=“A”}(Exam)))))
The difference inside the join has the shape U \ ⟦S⟧.
▪ Remember that ⟦¬S⟧ = U\⟦S⟧, where U is the universe relation
▪ U: Students (id)
▪ S: Students (id) that passed exams with grade A
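The negation pattern U \ ⟦S⟧ can be sketched with Python sets. The rows are the Student and Exam sample data shown for Exercise 6(c); the variable names are hypothetical.

```python
STUDENT = [  # (Number, Surname, Name, Dept)
    ("M1", "Rossi", "Ugo", "Computer Science"),
    ("M2", "Bianchi", "Mario", "Computer Science"),
]
EXAM = [  # (Student, Subject, Grade, Day)
    ("M1", "DB", "A", "08/05/2012"),
    ("M1", "Compl. DB", "A", "10/05/2012"),
    ("M1", "Lambda Calc.", "A", "06/06/2012"),
    ("M1", "ALGEBRA", "B", "07/01/2011"),
    ("M2", "OS", "B", "07/02/2012"),
]

# U: all student ids.
universe = {number for number, _surname, _name, _dept in STUDENT}
# S: ids of the students with at least one grade A.
with_a = {student for student, _subject, grade, _day in EXAM if grade == "A"}
# U \ S: ids of the students with no grade A.
without_a = universe - with_a

# Join back with Student to recover surname and name.
result = {
    (surname, name)
    for number, surname, name, _dept in STUDENT
    if number in without_a
}
```

Starting from Student rather than from Exam matters here: a student with no exam at all must also appear in the answer.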
Exercise 6(c)
Given the following relations:
Student(Number, Surname, Name, Dept)
Exam(Student, Subject, Grade, Day)
Provide the students’ surnames, names and first exam (by date) passed with grade A.
▪ Is it possible to define a minimum operator in relational algebra?

Student
Number  Surname  Name   Dept.
M1      Rossi    Ugo    Computer Science
M2      Bianchi  Mario  Computer Science

Exam
Student  Subject       Grade  Day
M1       DB            A      08/05/2012
M1       Compl. DB     A      10/05/2012
M1       Lambda Calc.  A      06/06/2012
M1       ALGEBRA       B      07/01/2011
M2       OS            B      07/02/2012
Exercise 6(c)
Defining the min. operator
min_C(S) = {t∈S | ∀t’∈S. t[C]≤t’[C]}
= {t∈S | ¬∃t’∈S. t[C]>t’[C]}
= S \ {t∈S | ∃t’∈S. t[C]>t’[C]}
= S ⋈ (π_C(S) \ π_C(π_C(S) ⋈_{C>C’} π_{C’}(ρ_{C’←C}(S))))
■ Return the tuples of S having the minimum value of C.
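The algebraic definition of the minimum can be followed step by step in code. This is a minimal sketch, assuming a relation is a list of dictionaries; the function name `min_by` and the sample rows are hypothetical.

```python
def min_by(rows, c):
    """Tuples of rows whose value for attribute c is minimal."""
    values = {row[c] for row in rows}                          # projection on C
    # Values that are greater than some other value (the C > C' join).
    dominated = {v for v in values for w in values if v > w}
    minimal = values - dominated                               # set difference
    return [row for row in rows if row[c] in minimal]          # join back with S

# Hypothetical rows: two of them share the smallest Day.
ROWS = [
    {"Subject": "DB", "Day": 3},
    {"Subject": "OS", "Day": 1},
    {"Subject": "Logic", "Day": 1},
]
earliest = min_by(ROWS, "Day")
```

When several tuples share the smallest value, all of them are returned, exactly as the algebraic expression does.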
Defining the local min. operator
locmin_{A,C}(S) = S ⋈ (π_{A,C}(S) \ π_{A,C}(π_{A,C}(S) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(S))))
■ Return the tuples in S having the minimum value of C among the entries with the same A value.
■ The join matches tuples on the shared attribute A, so the comparison C>C’ is performed only among tuples with the same A value.
Warning!
▪ “min” and “local min.” are “non-standard” operators.
▪ If you use them in the exam, you have to provide the algebra expression associated with them.
▪ Problem: such expressions are too difficult to “keep in mind”.
▪ Any kind of “cards”, “texts” and “notes” is forbidden.
▪ The logical language is a way to obtain “safe” expressions without risks. Use it!
Exercise 6(c): Final Result
Provide the students’ surnames, names and first exam (by date) passed with grade A.
S = σ_{Grade=“A”}(Exam)
T = π_{Student,Day}(S) \ π_{Student,Day}(π_{Student,Day}(S) ⋈_{Day>Day’} π_{Student,Day’}(ρ_{Day’←Day}(S)))
π_{Surname,Name,Day}(Student ⋈_{Number=Student} T)
Local minimum for student/day
S = σ_{Grade=“A”}(Exam)
π_{Student,Day}(S) \ π_{Student,Day}(π_{Student,Day}(S) ⋈_{Day>Day’} π_{Student,Day’}(ρ_{Day’←Day}(S)))
■ Return the student (id) and the exam day having the minimum exam date with grade A among the entries for the same student (id).
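The local minimum for student/day can be traced on the sample Student and Exam tables of this exercise. The sketch assumes that the dates on the slide are written as day/month/year; with a month/day/year reading the earliest exam would be a different one.

```python
from datetime import date

STUDENT = [  # (Number, Surname, Name, Dept)
    ("M1", "Rossi", "Ugo", "Computer Science"),
    ("M2", "Bianchi", "Mario", "Computer Science"),
]
EXAM = [  # (Student, Subject, Grade, Day); dates assumed to be dd/mm/yyyy
    ("M1", "DB", "A", date(2012, 5, 8)),
    ("M1", "Compl. DB", "A", date(2012, 5, 10)),
    ("M1", "Lambda Calc.", "A", date(2012, 6, 6)),
    ("M1", "ALGEBRA", "B", date(2011, 1, 7)),
    ("M2", "OS", "B", date(2012, 2, 7)),
]

# Projection on (Student, Day) of S, the exams with grade A.
pairs = {(student, day) for student, _subject, grade, day in EXAM if grade == "A"}

# Pairs for which the same student has an earlier grade-A exam (Day > Day').
dominated = {
    (student, day)
    for student, day in pairs
    for other, earlier in pairs
    if student == other and day > earlier
}

# T: earliest grade-A exam day for each student.
t = pairs - dominated

# Theta-join Number=Student and projection on Surname, Name, Day.
result = {
    (surname, name, day)
    for number, surname, name, _dept in STUDENT
    for student, day in t
    if number == student
}
```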
Exercise 7
Given the following relations:
User(Tax, Surname, Birth)
Field(FCode, IsCovered)
Bookings(FCode, Day, TimeStart, TimeEnd, Tax)
Return the Tax codes of the users who have booked a “non covered field” at least twice and who have never booked a covered field.
▪ and = intersection
▪ never booked a covered field = negation
▪ at least two times = how to “count”?
Exercise 7(a)
Given the following relations:
User(Tax, Surname, Birth)
Field(FCode, IsCovered)
Bookings(FCode, Day, TimeStart, TimeEnd, Tax)
Return the Tax codes of the users who have never booked a covered field.
▪ Use the negated semantics.
Exercise 7(a)
π_{Tax}(User) \ π_{Tax}(User ⋈ Bookings ⋈ σ_{IsCovered=True}(Field))
Exercise 7(b)
Given the following relations:
User(Tax, Surname, Birth)
Field(FCode, IsCovered)
Bookings(FCode, Day, TimeStart, TimeEnd, Tax)
Return the Tax codes of the users who have booked a “non covered field” at least twice.
▪ Is there a way to evaluate such a function using the previously defined operators?
At least one, at least two…
▪ First: if there exists a minimum, then there exists at least one element (the minimum itself).
ex_C(S) = min_C(S)
▪ Second: if there exists another element immediately following the minimum, then there exist at least two elements (the minimum and the element immediately following it).
ex2_C(S) = ex_C(S \ ex_C(S)) = min_C(S \ min_C(S))
At least two – locmin (1)
locmin_{A,C}(S \ locmin_{A,C}(S)) = ?
▪ First of all, exclude the minimum from S:
S \ locmin_{A,C}(S)
= S \ (S ⋈ (π_{A,C}(S) \ π_{A,C}(π_{A,C}(S) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(S)))))
= S ⋈ π_{A,C}(π_{A,C}(S) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(S)))
= S ⋈ π_{A,C}(∂S)
where ∂S = π_{A,C}(S) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(S)).
At least two – locmin (2)
locmin_{A,C}(S \ locmin_{A,C}(S)) = ?
▪ T = S ⋈ π_{A,C}(∂S) = S \ locmin_{A,C}(S)
▪ locmin_{A,C}(S) = S ⋈ (π_{A,C}(S) \ π_{A,C}(π_{A,C}(S) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(S))))
locmin_{A,C}(S \ locmin_{A,C}(S))
= locmin_{A,C}(S ⋈ π_{A,C}(∂S))
= (S ⋈ π_{A,C}(∂S)) ⋈ (π_{A,C}(T) \ π_{A,C}(π_{A,C}(T) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(T))))
= S ⋈ (π_{A,C}(T) \ π_{A,C}(π_{A,C}(T) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(T))))
At least two – locmin (3)
locmin_{A,C}(S \ locmin_{A,C}(S))
= S ⋈ (π_{A,C}(T) \ π_{A,C}(π_{A,C}(T) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(T))))
▪ Recall: π_{A,C’}(ρ_{C’←C}(T)) = ρ_{C’←C}(π_{A,C}(T))
= S ⋈ (π_{A,C}(T) \ π_{A,C}(π_{A,C}(T) ⋈_{C>C’} ρ_{C’←C}(π_{A,C}(T))))
At least two – locmin (4)
locmin_{A,C}(S \ locmin_{A,C}(S))
= S ⋈ (π_{A,C}(T) \ π_{A,C}(π_{A,C}(T) ⋈_{C>C’} ρ_{C’←C}(π_{A,C}(T))))
▪ Recall: π_{A,C}(T) = π_{A,C}(S ⋈ π_{A,C}(∂S)) = π_{A,C}(∂S)
▪ Recall: ∂S = π_{A,C}(S) ⋈_{C>C’} π_{A,C’}(ρ_{C’←C}(S))
= S ⋈ (π_{A,C}(∂S) \ π_{A,C}(π_{A,C}(∂S) ⋈_{C>C’} ρ_{C’←C}(π_{A,C}(∂S)))).
Exercise 7(b)
[…] Return the Tax codes of the users who have booked a “non covered field” at least twice.
S = π_{FCode,…,Tax}(Bookings ⋈ σ_{IsCovered=False}(Field))
π_{Tax}(ex2_{Tax,(FCode,Day,TimeStart)}(S))
▪ Here ex2_{A,C}(S) = locmin_{A,C}(S \ locmin_{A,C}(S)), with A = Tax and C = (FCode, Day, TimeStart).
▪ For ordering multiple attributes, use the lexicographical order:
(FCode<FCode’) OR
(FCode=FCode’ AND Day<Day’) OR
(FCode=FCode’ AND Day=Day’ AND TimeStart<TimeStart’)
Exercise 7 (finally!)
[…] Return the Tax codes of the users who have booked a “non covered field” at least twice and who have never booked a covered field.
▪ Use the intersection.
A = π_{Tax}(User) \ π_{Tax}(User ⋈ Bookings ⋈ σ_{IsCovered=True}(Field))
S = π_{FCode,…,Tax}(Bookings ⋈ σ_{IsCovered=False}(Field))
A ∩ π_{Tax}(ex2_{Tax,(FCode,Day,TimeStart)}(S))
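The whole exercise can be traced on a small instance. All rows below are invented for the illustration, and `locmin` is a hypothetical helper that implements the local minimum over (Tax, key) pairs. Python compares tuples lexicographically, which matches the ordering on (FCode, Day, TimeStart) described for Exercise 7(b).

```python
# Hypothetical instances.
USER = [  # (Tax, Surname, Birth)
    ("U1", "Rossi", 1990),
    ("U2", "Bianchi", 1985),
    ("U3", "Verdi", 1978),
    ("U4", "Neri", 2001),
]
FIELD = [("F1", True), ("F2", False), ("F3", False)]  # (FCode, IsCovered)
BOOKINGS = [  # (FCode, Day, TimeStart, TimeEnd, Tax)
    ("F2", "2016-10-07", 9, 10, "U1"),
    ("F3", "2016-10-08", 9, 10, "U1"),
    ("F2", "2016-10-07", 11, 12, "U2"),
    ("F2", "2016-10-09", 9, 10, "U3"),
    ("F2", "2016-10-10", 9, 10, "U3"),
    ("F1", "2016-10-11", 9, 10, "U3"),
]

def locmin(pairs):
    """Keep, for each Tax value, the (Tax, key) pair with the smallest key."""
    dominated = {
        (tax, key)
        for tax, key in pairs
        for other, smaller in pairs
        if tax == other and key > smaller
    }
    return pairs - dominated

covered = {fcode for fcode, is_covered in FIELD if is_covered}
uncovered = {fcode for fcode, is_covered in FIELD if not is_covered}

# A: users who never booked a covered field (negation as set difference).
a = {tax for tax, _surname, _birth in USER} - {
    tax for fcode, _day, _start, _end, tax in BOOKINGS if fcode in covered
}

# S: bookings of non covered fields, as (Tax, (FCode, Day, TimeStart)) pairs.
s = {
    (tax, (fcode, day, start))
    for fcode, day, start, _end, tax in BOOKINGS
    if fcode in uncovered
}

# ex2: a local minimum still exists after removing the local minimum.
second = locmin(s - locmin(s))
at_least_two = {tax for tax, _key in second}

result = a & at_least_two
```

In this instance U3 has two bookings on non covered fields but also one on a covered field, so only U1 is returned.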
Warning!
▪ If logic is used as a “formal specification”, you can derive a “correct” relational algebra expression from it.
▪ Abstract reasoning (see functional programming) lets you reason about the result and not about “how” to evaluate the solution.
▪ Intermediate variables and operators make the code more readable.
▪ Any non-standard operator (min, locmin, ex, ex2, …) shall be defined.