relational algebra (continued) lecture 5. relational algebra (summary) basic operations: –...
TRANSCRIPT
Relational AlgebraRelational Algebra(continued)(continued)
Lecture 5
Relational Algebra (Summary)
• Basic operations:– Selection ( ) Selects a subset of rows from relation.– Projection ( ) Deletes unwanted columns from relation.– Cross-product ( ) Allows us to combine two relations.– Set-difference ( - ) Tuples in reln. 1, but not in reln. 2.– Union ( ) Tuples either in reln. 1 or in reln. 2 or in both.– Rename ( ) Changes names of the attributes
• Since each operation returns a relation, operations can be composed ! (Algebra is “closed”.)
• Use of temporary relations recommended.
3
Additional operators We define additional operations that do not add
any power to the relational algebra, but that simplify common queries.
– Natural join– Conditional Join– Equi-Join– Division
Also, we’ve already seen “Set intersection”:r s = r - (r - s)
All joins are really special cases of conditional join
4
Quick note on notation
customer-name loan-number
Patty 1234
Apu 3421
Selma 2342
Ned 4531
customer-name loan-number
Seymour 3432
Marge 3467
Selma 7625
Abraham 3597
good_customers bad_customers
If we have two or more relations which feature the same attribute names, we could confuse them. To prevent this we can use dot notation.For example
good_customers.loan-number
5
Natural-Join Operation: MotivationNatural-Join Operation: Motivationcust-name l-number
Patty 1234
Apu 3421
l-number branch
1234 Dublin
3421 Irvine
borrower loan
cust-name borrower.l-number loan.l-number branch
Patty 1234 1234 Dublin
Patty 1234 3421 Irvine
Apu 3421 1234 Dublin
Apu 3421 3421 Irvine
cust-name borrower.l-number loan.l-number branch
Patty 1234 1234 Dublin
Apu 3421 3421 Irvine
borrower.l-number = loan.l-number(borrower x loan)))
Very often, we have a query and the answer is not contained in a single relation. For example, I might wish to know where Apu banks.The classic relational algebra way to do such queries is a cross product, followed by a selection which tests for equality on some pair of fields.
While this works…• it is unintuitive• it requires a lot of memory• the notation is cumbersome
Note that in this example the two relations are the same size (2 by 2), this does not have to be the case.
So, we have a more intuitive way of achieving the same effect, the natural join, denoted by the symbol
6
Natural-Join Operation: IntuitionNatural-Join Operation: IntuitionNatural join combines a cross product and a selection into one operation. It performs a selection forcing equality on those attributes that appear in both relation schemes. Duplicates are removed as in all relation operations.For the previous example (one attribute in common: “l-number”), we have…
borrower loan
• If the two relations have no attributes in common, then their natural join is simply their cross product.• If the two relations have more than one attribute in common, then the natural join selects only the rows where all pairs of matching attributes match. (let’s see an example on the next slide).
= borrower.l-number = loan.l-number(borrower x loan)))
7
l-name f-name age
Bouvier Selma 40
Bouvier Patty 40
Smith Maggie 2
Al-name f-name ID
Bouvier Selma 1232
Smith Selma 4423
B
l-name f-name age l-name f-name ID
Bouvier Selma 40 Bouvier Selma 1232
Bouvier Selma 40 Smith Selma 4423
Bouvier Patty 2 Bouvier Selma 1232
Bouvier Patty 40 Smith Selma 4423
Smith Maggie 2 Bouvier Selma 1232
Smith Maggie 2 Smith Selma 4423
l-name f-name age l-name f-name ID
Bouvier Selma 40 Bouvier Selma 1232
l-name f-name age ID
Bouvier Selma 40 1232A B =
Both the l-name and the f-name match, so select.
Only the f-names match, so don’t select.
Only the l-names match, so don’t select.
We remove duplicate attributes…
The natural join of A and B
Note that this is just a way to visualize the natural join, we don’t really have to do the cross product as in this example
8
Natural-Join OperationNatural-Join Operation• Notation: r s• Let r and s be relation instances on schemas R and S
respectively.The result is a relation on schema R S which is
obtained by considering each pair of tuples tr from r and ts from s.
• If tr and ts have the same value on each of the attributes in R S, a tuple t is added to the result, where
– t has the same value as tr on r
– t has the same value as ts on s
• Example:R = (A, B, C, D)S = (E, B, D)
• Result schema = (A, B, C, D, E)• r s is defined as:
r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
9
Natural Join Operation – ExampleNatural Join Operation – Example• Relation instances r, s:
A B
12412
C D
aabab
B
13123
D
aaabb
E
r
A B
11112
C D
aaaab
E
s
r sHow did we get here?
Lets do a trace over the next few slides…
10
A B
12412
C D
aabab
B
13123
D
aaabb
E
r s
First we note which attributes the two relations have in common…
11
A B
12412
C D
aabab
B
13123
D
aaabb
E
r
A B
11
C D
aa
E
s
There are two rows in s that match our first row in r, (in the relevant attributes) so both are joined to our first row…
12
A B
12412
C D
aabab
B
13123
D
aaabb
E
r s
A B
11
C D
aa
E
…there are no rows in s that match our second row in r, so do nothing…
13
A B
12412
C D
aabab
B
13123
D
aaabb
E
r s
A B
11
C D
aa
E
…there are no rows in s that match our third row in r, so do nothing…
14
A B
12412
C D
aabab
B
13123
D
aaabb
E
r s
A B
1111
C D
aaaa
E
There are two rows in s that match our fourth row in r, so both are joined to our fourth row…
15
A B
12412
C D
aabab
B
13123
D
aaabb
E
r s
There is one row that matches our fifth row in r,.. so it is joined to our fifth row and we are done!
A B
11112
C D
aaaab
E
16
Natural Join on Sailors Example
sid sname rating age bid day
22 dustin 7 45.0 101 10/ 10/ 9658 rusty 10 35.0 103 11/ 12/ 96
11 RS
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.558 rusty 10 35.0
sid bid day
22 101 10/10/9658 103 11/12/96
S1 R1
17
Last week we saw…
Query: Find the name of the sailor who reserved boat 101.
€
Temp=ρ (sid→sid1, S1) × ρ(sid→sid2, R1)
Result=πSname
(σsid1=sid2 ∧bid=101
(Temp))
* Note my use of “temporary” relation Temp.
18
Query revisited using natural join
Query: Find the name of the sailor who reserved boat 101.
€
Result=πSname
(σbid=101
(S1 >< R1))
Or
Result=πSname
(S1 >< σbid=101
(R1))
19
Conditional-Join Operation:Conditional-Join Operation:The conditional join is actually the most general type of join. We introduced the natural join first only because it is more intuitive and... natural!
Just like natural join, conditional join combines a cross product and a selection into one operation. However instead of only selecting rows that have equality on those attributes that appear in both relation schemes, we allow selection based on any predicate.
r c s = c(r x s) Where c is any predicate on the attributes of r and/or s
Duplicate rows are removed as always, but duplicate columns are not removed!
20
l-name f-name marr-Lic age
Simpson Marge 777 35
Lovejoy Helen 234 38
Flanders Maude 555 24
Krabappel Edna 978 40
l-name f-name marr-Lic age
Simpson Homer 777 36
Lovejoy Timothy 234 36
Simpson Bart null 9
r r.age < s.age AND r.Marr-Lic = s.Marr-Lic s
r.l-name r.f-name r.Marr-Lic r.age s.l-name s.f-name s.marr-Lic s.age
Simpson Marge 777 35 Simpson Homer 777 36
We want to find all women that are younger than their husbands…
Conditional-Join Example:Conditional-Join Example:
r s
Note we have removed ambiguity of attribute names by using “dot” notationAlso note the redundant information in the marr-lic attributes
21
Equi-Join
• Equi-Join: Special case of conditional join where the conditions consist only of equalities.
• Natural Join: Special case of equi-join in which equalities are specified on ALL fields having the same names in both relations.
22
l-name f-name marr-Lic age
Simpson Marge 777 35
Lovejoy Helen 234 38
Flanders Maude 555 24
Krabappel Edna 978 40
l-name f-name marr-Lic age
Simpson Homer 777 36
Lovejoy Timothy 234 36
Simpson Bart null 9
r r.Marr-Lic = s.Marr-Lic s
r.l-name r.f-name Marr-Lic r.age s.l-name s.f-name s.age
Simpson Marge 777 35 Simpson Homer 36
Lovejoy Helen 234 38 Lovejoy Timothy 36
Equi-JoinEqui-Join
r s
23
Review on Joins• All joins combine a cross product and a selection into
one operation.• Conditional Join
– the selection condition can be of any predicate (e.g. rating1 > rating2)
• Equi-Join: – Special case of conditional join where the conditions consist
only of equalities.
• Natural Join– Special case of equi-join in which equalities are specified on
ALL fields having the same names in both relations.
24
Banking ExamplesBanking Examples
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-only)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
Note that I have not indicated primary keys here for simplicity.
25
Example QueriesExample Queries• Find all loans of over $1200
amount > 1200 (loan)
loan
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
2342 Dublin 56.25
4531 Prague 120.03
1234 Riverside 1,923.03amount > 1200 (loan)
“select from the relation loan, only the rows which have an amount greater than 1200”
26
Example QueriesExample Queries• Find the loan number for each loan of an amount greater than $1200
loan-number (amount > 1200 (loan))
loan
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
2342 Dublin 56.25
4531 Prague 120.03
amount > 1200 (loan) 1234 Riverside 1,923.03
loan-number (amount > 1200 (loan)) 1234
“select from the relation loan, only the rows which have an amount greater than 1200, then project out just the loan_number”
27
Example QueriesExample Queries• Find all loans greater than $1200 or less than $75
amount > 1200 amount < 75(loan)
loan
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
2342 Dublin 56.25
4531 Prague 120.03
amount > 1200 amount < 75(loan)
“select from the relation loan, only the rows which have an amount greater than 1200 or an amount less than 75
1234 Riverside 1,923.03
2342 Dublin 56.25
28
Example QueriesExample Queries• Find the names of all customers who have a loan, an account, or both, from
the bank
customer-name (borrower) customer-name (depositor)
customer-name loan-number
Patty 1234
Apu 3421
Selma 2342
Ned 4531
depositorborrower
customer-name account-number
Moe 3467
Apu 2312
Patty 9999
Krusty 3423
Patty
Apu
Selma
Ned
Moe
Apu
Patty
Krusty
Moe
Apu
Patty
Krusty
Selma
Ned
customer-name (borrower) customer-name (depositor)
29
Example QueriesExample QueriesFind the names of all customers who have a loan at the Riverside branch.
customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))
customer-name loan-number
Patty 1234
Apu 3421
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
borrower loan
Note this example is split over two slides!
customer-name borrower.loan-number
loan.loan-number
branch-name amount
Patty 1234 1234 Riverside 1,923.03
Patty 1234 3421 Irvine 123.00
Apu 3421 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
We retrieve borrower and loan…
…we calculate their cross product…
30
customer-name borrower.loan-number
loan.loan-number
branch-name amount
Patty 1234 1234 Riverside 1,923.03
Patty 1234 3421 Irvine 123.00
Apu 3421 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
…we calculate their cross product…
…we select the rows where borrower.loan-number is equal to loan.loan-number…
…we select the rows where branch-name is equal to “Riverside”
…we project out the customer-name.
customer-name borrower.loan-number
loan.loan-number
branch-name amount
Patty 1234 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
customer-name borrower.loan-number
loan.loan-number
branch-name amount
Patty 1234 1234 Riverside 1,923.03
Patty
customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))
31
Now Using Natural JoinNow Using Natural JoinFind the names of all customers who have a loan at the Riverside branch.
customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))
=> customer-name (branch-name=“Riverside ” borrower loan))
customer-name loan-number
Patty 1234
Apu 3421
loan-number branch-name amount
1234 Riverside 1,923.03
3421 Irvine 123.00
borrower loanWe retrieve borrower and loan…
1234 in borrower is matched with 1234 in loan…
3421 in borrower is matched with 3421 in loan…
The rest is the same.
customer-name borrower.loan-number
loan.loan-number
branch-name amount
Patty 1234 1234 Riverside 1,923.03
Apu 3421 3421 Irvine 123.00
customer-name borrower.loan-number
loan.loan-number
branch-name amount
Patty 1234 1234 Riverside 1,923.03
32
Example QueriesExample QueriesFind the largest account balance
...we will need to rename account relation as d...
balance(account) - account.balance(account.balance < d.balance (account x (d,account)))
account-number
balance
Apu 100.30
Patty 12.34
Lenny 45.34
Note this example is split over three slides!
accountaccount-number
balance
Apu 100.30
Patty 12.34
Lenny 45.34
d
We do a rename to get a “copy” of account which we call d…
… next we will do a cross product…
33
account.account-number
account.balance
d.account-number
d.balance
Apu 100.30 Apu 100.30
Apu 100.30 Patty 12.34
Apu 100.30 Lenny 45.34
Patty 12.34 Apu 100.30
Patty 12.34 Patty 12.34
Patty 12.34 Lenny 45.34
Lenny 45.34 Apu 100.30
Lenny 45.34 Patty 12.34
Lenny 45.34 Lenny 45.34
balance(account) - account.balance(account.balance < d.balance (account x (d,account)))
account.account-number
account.balance
d.account-number
d.balance
Patty 12.34 Apu 100.30
Patty 12.34 Lenny 45.34
Lenny 45.34 Apu 100.30
… do a cross product…
…select out all rows where account.balance is less than d.balance…
.. next we project…
34
balance(account) - account.balance(account.balance < d.balance (account x (d,account)))
account.account-number
account.balance
d.account-number
d.balance
Patty 12.34 Apu 100.30
Patty 12.34 Lenny 45.34
Lenny 45.34 Apu 100.30
.. next we project out account.balance…
…then we do a set difference between it and the original account.balance from the account relation…
… the set difference leaves us with one number, the largest value!
account.balance
12.34
12.34
45.34account-number
balance
Apu 100.30
Patty 12.34
Lenny 45.34
account
100.30
35
Now Using Conditional JoinNow Using Conditional Join
Find the largest account balance ...we will need to rename account relation as d...
balance(account) - account.balance(account.balance < d.balance (account x (d,account)))
(d,account)
balance(account) - account.balance(account account.balance < d.balance d)
36
More Examples on Sailors Relations
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day)
37
Find names of sailors who’ve reserved boat #103
• Solution 1:
• Solution 2:
• Solution 3:
)))Re(( (103
Sailorsservesbidsname
)Re1 (103
servesTempbid
SailorsTempTemp 12
)2(Re Tempsult sname
sname bidserves Sailors( (Re ))
103
38
Find names of sailors who’ve reserved a red boat
• Information about boat color only available in Boats; so need an extra join:
• A more efficient solution:
sname color redBoats serves Sailors((
' ') Re )
sname sid bid color redBoats s Sailors( ((
' ') Re ) )
A query optimizer can find this given the first solution!
Find sailors who’ve reserved a red or a green boat
• Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
)(''''
Boatsgreencolorredcolor
Tempboats
)Re(Re SailorsservesTempboatssnamesult Can also define Tempboats using union! (How?)
What happens if is replaced by in this query?
Find sailors who’ve reserved a red and a green boat
• Previous approach won’t work! Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
)Re))(''
(( servesBoatsredcolorsid
Tempred
))((Re SailorsTempgreenTempredsnamesult
)Re))(''
(( servesBoatsgreencolorsid
Tempgreen
Consider yet another query
• Find the sailor(s) (sid) who reserved all the red boats.
sid bid day
22 101 10/10/96 22 103 10/11/96 58 102 11/12/96
R1
bid color
101 Red 102 Green 103 Red
B
Start an attempt
• who reserved what boat:
All the red boats:
sid bid
22 101 22 103 58 102
)1(,
1 Rbidsid
S
))((2 Bredcolorbid
S bid
101 103
43
Division OperationDivision Operation• Suited to queries that include the phrase “for all”, e.g. Find sailors who have
reserved all red boats.• Let S1 have 2 fields, x and y; S2 have only field y:
– S1/S2 =
– i.e., S1/S2 contains all x tuples (sailors) such that for every y tuple (redboat) in S2, there is an xy tuple in S1 (i.e, x reserved y).
• In general, x and y can be any lists of fields; y is the list of fields in S2, and xy is the list of fields of S1.
• Let r and s be relations on schemas R and S respectively where
– R = (A1, …, Am, B1, …, Bn)
– S = (B1, …, Bn)
The result of r / s is a relation on schema
R – S = (A1, …, Am)
r / s
€
x |∀ y in S2 (∃ x,y in S1) ⎧ ⎨ ⎪
⎩ ⎪
⎫ ⎬ ⎪
⎭ ⎪
44
Division Operation – ExampleDivision Operation – Example
Relations r, s:
r / s:
A
B
1
2
A B
12311134612
r
s
occurs in the presence of both 1 and 2, so it is returned. occurs in the presence of both 1 and 2, so it is returned. does not occur in the presence of both 1 and 2, so is ignored....
45
Another Division ExampleAnother Division Example
A B
aaaaaaaa
C D
aabababb
E
11113111
Relations r, s:
r /s:
D
ab
E
11
A B
aa
C
r
s
<, a , > occurs in the presence of both <a,1> and <b,1>, so it is returned.< , a , > occurs in the presence of both <a,1> and <b,1>, so it is returned.<, a , > does not occur in the presence of both <a,1> and <b,1>, so it is ignored.
Examples of Division A/Bsno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4
pnop2
pnop2p4
pnop1p2p4
snos1s2s3s4
snos1s4
snos1
A
B1B2
B3
A/B1 A/B2 A/B3
47
Find the sailor(s) who reserved ALL red boats
• who reserved what boat:
• All the red boats:
sid bid
22 101 22 103 58 102
)1(,
1 Rbidsid
S
))((2 Bredcolorbid
S bid
101 103
=> S1/S2
Find the names of sailors who’ve reserved all boats
• Uses division; schemas of the input relations must be carefully chosen:
))((/))(Re,
(Tempsids Boatsbid
servesbidsid
)(Result SailorsTempsidssname
To find sailors who’ve reserved all ‘Interlake’ boats:
))(''
(/ BoatsInterlakebnamebid
.....
Some PropertiesSome Properties• Selection
• Projection
where li is a subset of attributes in R, li li+1
• A projection commutes with a selection that only uses attributes retained by the projection:
where c(a1,…,an) is a condition on attributes a1,…,an
c c c cR R1 2 2 1
(Commute)
l1 lnR R
l1
. . . (Cascade)
c cn c cnR R1 1 ... . . . (Cascade)
a1,..,an(c(a1,..,an) (R))= c(a1,..,an) (a1,..,an( R ))
Properties of JoinsProperties of Joins
Joins
where c(a1,…,an) is a condition on attributes only in R
(Commute)
(Associative)
(R cS) (S cR) R c1 (S c2T) (R c1S) c2T
c(R c S) c(R) c S
•
•
•
Additional Operator: Outer Additional Operator: Outer JoinJoin
• Joins: : match tuples satisfying a join condition..– Tuples without a matching are
discarded.– Tuples with Null value in at least one
join attribute are discarded.• Outer Joins: : useful when we want to keep
tuples in one or both of the involved relations in a join operation that do not satisfy the join condition. – Left outer join– Right outer join– Full outer join
• Schema of any Outer Join: Schema of Natural Join
52
Additional Operator: Outer Additional Operator: Outer JoinJoin
• An extension of the join operation that avoids loss of information.
• Computes the join and then adds tuples from one relation that does not match tuples in the other relation to the result of the join.
• Uses null values:– null signifies that the value is unknown or does not exist
– All comparisons involving null are (roughly speaking) false by definition.
• Will study precise meaning of comparisons with nulls later
53
Outer Join – ExampleOuter Join – Example
• Relation loan
Relation borrower customer-name loan-number
SimpsonWiggumFlanders
L-170L-230L-155
loan-number amount
L-170L-230L-260
300040001700
branch-name
SpringfieldShelbyville Dublin
54
Outer Join – ExampleOuter Join – Example
• Inner Join
loan Borrower
loan borrower
• Left Outer Join
loan-number amount
L-170L-230
30004000
customer-name
SimpsonWiggum
branch-name
SpringfieldShelbyville
loan-number amount
L-170L-230L-260
300040001700
customer-name
SimpsonWiggumnull
branch-name
SpringfieldShelbyvilleDublin
customer-name loan-number
SimpsonWiggumFlanders
L-170L-230L-155
loan-number amount
L-170L-230L-260
300040001700
branch-name
SpringfieldShelbyville Dublin
55
Outer Join – ExampleOuter Join – Example
Right Outer Join loan borrower
loan-number amount
L-170L-230L-155
30004000null
customer-name
SimpsonWiggumFlanders
loan-number amount
L-170L-230L-260L-155
300040001700null
customer-name
SimpsonWiggumnullFlanders
loan borrower
Full Outer Join
branch-name
SpringfieldShelbyvillenull
branch-name
SpringfieldShelbyvilleDublinnull
customer-name loan-number
SimpsonWiggumFlanders
L-170L-230L-155
loan-number amount
L-170L-230L-260
300040001700
branch-name
SpringfieldShelbyville Dublin
Full Outer JoinFull Outer Join
• Full Outer Join S R : keeps every tuple in both R and S, if not matching tuples are found in R (S), then the attributes in R (S) are set to Null
(S R) (S R)
Cost of Relational Algebra Cost of Relational Algebra OperationsOperations
• Given two relations R, S, such as :
CardinalityCardinality (R) = n, CardinalityCardinality(S) = m
• Cost O(n) c ( R )
a1,…,an ( R )
• Cost O(n x m)
R x S
When working with Cross-product (Joins), keep input relations as small as possible!
Cost of Relational Algebra Cost of Relational Algebra OperationsOperations
Let q= max(m,n),
Set Operations: R S, R S, R – S:Cost O(q log q)
Join Operations: R S, R cSOuter Joins Operations:
R S, R S, R S
Best Case : attributes in the join condition include the primary key in S: Cost O(q log q)
Otherwise : worst case : Cost O(m x n)
When working with these operations, keep input relations as small as possible
59
Summary• The relational model has rigorously defined query
languages that are simple and powerful.• Relational algebra is more operational; useful as
internal representation for query evaluation plans.• Several ways of expressing a given query; a query
optimizer should choose the most efficient version.
• Operations covered: 5 basic operations (selection, projection, union, set difference, cross product), rename, joins (natural join, equi-join, conditional join, outer joins), division