data warehousing 01 a relational algebra0
7/29/2019 Data Warehousing 01 a Relational Algebra0
http://slidepdf.com/reader/full/data-warehousing-01-a-relational-algebra0 1/32
Tools Boot camp
Relational Algebra Operations in RDM
• Source: http://www.ics.uci.edu/~ics184/#lectures
SQL -- Historical Perspective
Relational Algebra (introduced by E. F. "Ted" Codd in 1970)
SQL Language
SQL Server SQL (Transact-SQL) | Oracle 10g SQL (PL/SQL) | DB2 SQL …
ACCESS (SQL) …
For SQL Server 2005 …
SSIS Tool, Analysis Services Tool, Reporting Services Tool, etc. …
OLTP
OLAP
Purely from a tools perspective:
Database Programming Language (DBPL) | Programming Languages (Java, VB, C, C++, …)
Outline
Relational Algebra (Retrieval)
A few set-based operators to manipulate relations:
Union, Intersection, Difference:
Usual set operators
Relations must have the same schema
Selection: choose rows from a relation.
Projection: choose columns from a relation.
Cartesian Product and Join: construct a new relation from several relations
Renaming: rename a relation and its attributes
Combining basic operators to form expressions
Update operations (insert, delete and modify)
Remember: creation is done in DDL
Union ∪, Intersection ∩, Difference −
Set operators. Relations must have the same schema.

R(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

S(name, dept)
Name  Dept
Jack  Physics
Mary  Math

R ∪ S
Name  Dept
Jack  Physics
Tom   ICS
Mary  Math

R ∩ S
Name  Dept
Jack  Physics

R − S
Name  Dept
Tom   ICS
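The three set operators can be tried out directly with Python sets. This is only a sketch: encoding each relation as a set of (name, dept) tuples is an assumption made for illustration, not something the slides prescribe.

```python
# Relations R and S from the slide, encoded as sets of (name, dept) tuples.
# The set-of-tuples encoding is an illustrative assumption.
R = {("Jack", "Physics"), ("Tom", "ICS")}
S = {("Jack", "Physics"), ("Mary", "Math")}

union = R | S          # R ∪ S
intersection = R & S   # R ∩ S
difference = R - S     # R − S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Because both relations have the same schema, every tuple in the results has the same two positions.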
Selection
σ_{C}(R): return tuples in R that satisfy condition C.

Emp(name, dept, salary)
Name  Dept     Salary
Jane  ICS      30K
Jack  Physics  30K
Tom   ICS      75K
Joe   Math     40K
Jack  Math     50K

σ_{salary>35K}(Emp)
Name  Dept  Salary
Tom   ICS   75K
Joe   Math  40K
Jack  Math  50K

σ_{dept=ICS AND salary<40K}(Emp)
Name  Dept  Salary
Jane  ICS   30K
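Selection amounts to filtering tuples with a predicate. The sketch below assumes a hypothetical `Emp` named tuple and a helper called `select`; salaries are stored as integers in thousands.

```python
from typing import NamedTuple


class Emp(NamedTuple):
    name: str
    dept: str
    salary: int  # in thousands (30 means 30K); an encoding chosen for this sketch


EMP = {
    Emp("Jane", "ICS", 30),
    Emp("Jack", "Physics", 30),
    Emp("Tom", "ICS", 75),
    Emp("Joe", "Math", 40),
    Emp("Jack", "Math", 50),
}


def select(relation, condition):
    """Selection: keep only the tuples for which condition(t) is true."""
    return {t for t in relation if condition(t)}


high_paid = select(EMP, lambda t: t.salary > 35)
ics_low = select(EMP, lambda t: t.dept == "ICS" and t.salary < 40)
print(sorted(high_paid))
print(sorted(ics_low))
```

The schema of the result is the same as the schema of the input relation; only the number of tuples changes.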
Projection
π_{A1,…,Ak}(R): pick columns of attributes A1,…,Ak of R.

Emp(name, dept, salary)
Name  Dept     Salary
Jane  ICS      30K
Jack  Physics  30K
Tom   ICS      75K
Joe   Math     40K
Jack  Math     50K

π_{name,dept}(Emp)
Name  Dept
Jane  ICS
Jack  Physics
Tom   ICS
Joe   Math
Jack  Math

π_{name}(Emp)
Name
Jane
Jack
Tom
Joe

Duplicates (“Jack”) eliminated.
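A small sketch of projection with duplicate elimination. The helper name `project` and the tuple-plus-schema encoding are assumptions for illustration; returning a set is what removes the duplicate "Jack".

```python
SCHEMA = ("name", "dept", "salary")
EMP = [
    ("Jane", "ICS", 30),
    ("Jack", "Physics", 30),
    ("Tom", "ICS", 75),
    ("Joe", "Math", 40),
    ("Jack", "Math", 50),
]


def project(relation, schema, attrs):
    """Projection: keep only the listed attributes; a set drops duplicate rows."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[p] for p in positions) for row in relation}


name_dept = project(EMP, SCHEMA, ["name", "dept"])
names = project(EMP, SCHEMA, ["name"])
print(sorted(name_dept))
print(sorted(names))
```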
Cartesian Product: ×
R × S: pair each tuple r in R with each tuple s in S.

Emp(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

Contact(name, addr)
Name  Addr
Jack  Irvine
Tom   LA
Mary  Riverside

Emp × Contact
E.name  Dept     C.name  Addr
Jack    Physics  Jack    Irvine
Jack    Physics  Tom     LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Tom     LA
Tom     ICS      Mary    Riverside
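The pairing of every tuple of one relation with every tuple of the other can be sketched with `itertools.product` from the Python standard library. The helper name `cartesian` is hypothetical.

```python
from itertools import product

EMP = [("Jack", "Physics"), ("Tom", "ICS")]
CONTACT = [("Jack", "Irvine"), ("Tom", "LA"), ("Mary", "Riverside")]


def cartesian(r, s):
    """R × S: concatenate each tuple of r with each tuple of s."""
    return [a + b for a, b in product(r, s)]


pairs = cartesian(EMP, CONTACT)
for row in pairs:
    print(row)
```

With 2 tuples in Emp and 3 in Contact, the product has 2 × 3 = 6 tuples, each with 4 attributes.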
Join
R ⋈_{C} S = σ_{C}(R × S)
Join condition C is of the form:
<cond_1> AND <cond_2> AND … AND <cond_k>
Each cond_i is of the form A op B, where:
A is an attribute of R, B is an attribute of S
op is a comparison operator: =, <, >, ≤, ≥, or ≠.
Different types:
Theta-join
Equi-join
Natural join
Theta-Join
R(A,B)
A  B
3  4
5  7

S(C,D)
C  D
2  7
6  8

R × S
R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

Result: R ⋈_{R.A>S.C} S
R.A  R.B  S.C  S.D
3    4    2    7
5    7    2    7
Theta-Join
R(A,B)
A  B
3  4
5  7

S(C,D)
C  D
2  7
6  8

R × S
R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

Result: R ⋈_{R.A>S.C, R.B≠S.D} S
R.A  R.B  S.C  S.D
3    4    2    7
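Both theta-join examples can be reproduced by filtering the Cartesian product, following the definition of a join as a selection over R × S. This is a sketch; the helper `theta_join` is hypothetical, and the second condition is assumed to read R.B ≠ S.D, which is the reading consistent with the single result row shown.

```python
from itertools import product

R = [(3, 4), (5, 7)]  # R(A, B)
S = [(2, 7), (6, 8)]  # S(C, D)


def theta_join(r, s, condition):
    """Theta-join: keep the pairs from r × s that satisfy the join condition."""
    return [a + b for a, b in product(r, s) if condition(a, b)]


# Condition R.A > S.C
first = theta_join(R, S, lambda r, s: r[0] > s[0])

# Conditions R.A > S.C AND R.B != S.D (the second operator is an assumption)
second = theta_join(R, S, lambda r, s: r[0] > s[0] and r[1] != s[1])

print(first)
print(second)
```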
Equi-Join
Special kind of theta-join: C only uses the equality operator.

R(A,B)
A  B
3  4
5  7

S(C,D)
C  D
2  7
6  8

R × S
R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

Result: R ⋈_{R.B=S.D} S
R.A  R.B  S.C  S.D
5    7    2    7
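Since an equi-join only compares attributes for equality, a sketch of it needs nothing more than the two column positions to compare. The helper `equi_join` and the use of positional indexes are assumptions for illustration.

```python
R = [(3, 4), (5, 7)]  # R(A, B)
S = [(2, 7), (6, 8)]  # S(C, D)


def equi_join(r, s, r_col, s_col):
    """Equi-join: pair tuples whose chosen columns hold equal values."""
    return [a + b for a in r for b in s if a[r_col] == b[s_col]]


# R.B = S.D: column 1 of R against column 1 of S
result = equi_join(R, S, 1, 1)
print(result)
```

Both compared columns (R.B and S.D) still appear in the output, which is the difference from a natural join.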
Natural-Join
Relations R and S. Let L be the union of their attributes.
Let A1,…,Ak be their common attributes.
R ⋈ S = π_{L}(R ⋈_{R.A1=S.A1,…,R.Ak=S.Ak} S)
Natural-Join
Emp(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

Contact(name, addr)
Name  Addr
Jack  Irvine
Tom   LA
Mary  Riverside

Emp × Contact
Emp.name  Emp.dept  Contact.name  Contact.addr
Jack      Physics   Jack          Irvine
Jack      Physics   Tom           LA
Jack      Physics   Mary          Riverside
Tom       ICS       Jack          Irvine
Tom       ICS       Tom           LA
Tom       ICS       Mary          Riverside

Emp ⋈ Contact: all employee names, depts, and addresses.
Result
Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      LA

Same as equi-join, except that one of the duplicate columns is eliminated.
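A natural join can be sketched with rows stored as dictionaries, so that attribute names drive the matching. The helper `natural_join` is hypothetical; merging the two dictionaries keeps each shared attribute once, which mirrors the projection onto L in the definition.

```python
def natural_join(r, s):
    """Join rows (dicts) that agree on every attribute name they share."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})  # shared attributes appear only once
    return out


EMP = [
    {"name": "Jack", "dept": "Physics"},
    {"name": "Tom", "dept": "ICS"},
]
CONTACT = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Tom", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

joined = natural_join(EMP, CONTACT)
for row in joined:
    print(row)
```

Mary has no matching Emp tuple, so she does not appear in the result.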
Renaming
Motivation: disambiguate attribute names. E.g., in R × R, how do we differentiate the attributes from the two instances?
ρ_{S(B1,…,Bn)}(R)
A relation identical to R, named S, with new attributes B1,…,Bn.

Emp(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

ρ_{Emp1(name1,dept1)}(Emp)

Emp1(name1, dept1)
Name1  Dept1
Jack   Physics
Tom    ICS
Renaming (cont)
List employees who work in the same department as Tom.

Emp(name, dept)
Name  Dept
Jack  Physics
Tom   ICS
Mary  ICS

Emp1(name1, dept1)
Name1  Dept1
Jack   Physics
Tom    ICS
Mary   ICS

Emp × Emp1
Name  Dept     Name1  Dept1
Jack  Physics  Jack   Physics
Jack  Physics  Tom    ICS
Jack  Physics  Mary   ICS
Tom   ICS      Jack   Physics
Tom   ICS      Tom    ICS
Tom   ICS      Mary   ICS
Mary  ICS      Jack   Physics
Mary  ICS      Tom    ICS
Mary  ICS      Mary   ICS

π_{emp1.name1}(ρ_{emp1(name1,dept1)}(emp) ⋈_{emp1.dept1=emp.dept} σ_{name=tom}(emp))

Result
Name1
Tom
Mary
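The self-join above can be sketched by renaming a copy of the relation and then joining it with the selected Tom tuple. The helper `rename` and the dictionary row encoding are assumptions for illustration.

```python
EMP = [
    {"name": "Jack", "dept": "Physics"},
    {"name": "Tom", "dept": "ICS"},
    {"name": "Mary", "dept": "ICS"},
]


def rename(relation, mapping):
    """Renaming: return a copy of the relation with attributes renamed."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]


# Renamed copy Emp1(name1, dept1)
emp1 = rename(EMP, {"name": "name1", "dept": "dept1"})

# Selection name = Tom on the original Emp
tom = [row for row in EMP if row["name"] == "Tom"]

# Join on emp1.dept1 = emp.dept, then project name1
same_dept = sorted({e1["name1"] for e1 in emp1 for e in tom if e1["dept1"] == e["dept"]})
print(same_dept)
```

Tom appears in his own answer because nothing in the expression excludes him.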
Outer Joins
Motivation: “join” can lose information.
E.g.: natural join of R and S loses info about Tom, Mike, and Mary, since they do not join with other tuples.
Called “dangling tuples”.

R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

Outer join: natural join, but use NULL values to fill in dangling tuples. Remember “natural join” is similar to “equi join”.
Three types: “left”, “right”, or “full”
Left Outer Join
R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

R × S
R.name  R.dept   S.name  S.addr
Jack    Physics  Jack    Irvine
Jack    Physics  Mike    LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Mike    LA
Tom     ICS      Mary    Riverside

Left outer join R ⟕ S
Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      NULL

Pad NULL values for left dangling tuples.
LEFT OUTER JOIN: R, the left relation, is the relation from which we wish all rows returned, regardless of whether there is a matching address in the S relation.
List all names, depts, and addresses for all names listed in R.
Assumes “equality” as the join condition.
Right Outer Join
R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

R × S
R.name  R.dept   S.name  S.addr
Jack    Physics  Jack    Irvine
Jack    Physics  Mike    LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Mike    LA
Tom     ICS      Mary    Riverside

Right outer join R ⟖ S
Name  Dept     Addr
Jack  Physics  Irvine
Mike  NULL     LA
Mary  NULL     Riverside

Pad NULL values for right dangling tuples.
RIGHT OUTER JOIN: S, the right relation, is the relation from which we wish all rows returned, regardless of whether there is a matching dept in the R relation.
List all names, depts, and addresses for all names listed in S.
Assumes “equality” as the join condition.
Full Outer Join
R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

R × S
R.name  R.dept   S.name  S.addr
Jack    Physics  Jack    Irvine
Jack    Physics  Mike    LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Mike    LA
Tom     ICS      Mary    Riverside

Full outer join R ⟗ S
Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      NULL
Mike  NULL     LA
Mary  NULL     Riverside

Pad NULL values for both left and right dangling tuples.
List all names, depts, and addresses for all names listed in R and S.
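The three outer joins differ only in which side's dangling tuples are kept, so one sketch can cover all of them. The helper `outer_join` and its `keep_left` / `keep_right` flags are hypothetical names, Python's `None` stands in for NULL, and the function assumes both inputs are non-empty lists of dictionaries sharing one key attribute.

```python
R = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
S = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Mike", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]


def outer_join(r, s, key, keep_left, keep_right):
    """Natural join on `key`; dangling tuples of the kept side(s) are padded with None."""
    r_attrs = [k for k in r[0] if k != key]
    s_attrs = [k for k in s[0] if k != key]
    out = []
    matched_s = set()
    for a in r:
        partners = [i for i, b in enumerate(s) if b[key] == a[key]]
        for i in partners:
            out.append({**a, **s[i]})
            matched_s.add(i)
        if not partners and keep_left:
            out.append({**a, **{k: None for k in s_attrs}})
    if keep_right:
        for i, b in enumerate(s):
            if i not in matched_s:
                out.append({**{k: None for k in r_attrs}, **b})
    return out


left = outer_join(R, S, "name", keep_left=True, keep_right=False)
right = outer_join(R, S, "name", keep_left=False, keep_right=True)
full = outer_join(R, S, "name", keep_left=True, keep_right=True)
for row in full:
    print(row)
```

With both flags set to False the function reduces to the plain natural join, which returns only the Jack tuple.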
Combining Different Operations
Construct general expressions using basic operations.
Schema of each operation:
∪, ∩, −: same as the schema of the two relations
Selection σ: same as the relation’s schema
Projection π: attributes in the projection
Cartesian product ×: attributes in two relations, use prefix to avoid confusion
Theta join ⋈_{C}: same as ×
Natural join ⋈: union of relations’ attributes, merge common attributes
Renaming ρ: new renamed attributes
Equivalent Expressions
Expressions might be equivalent.
R ∩ S = R − (R − S)
How about the following?
(R ∪ S) − T = R ∪ (S − T)?
(R ∩ S) − T = R ∩ (S − T)?
π_{A}(R ∪ S) = π_{A}(R) ∪ π_{A}(S)? π_{A}(R − S) = π_{A}(R) − π_{A}(S)?
R ⋈ S = π_{L}(R ⋈_{R.A1=S.A1,…,R.Ak=S.Ak} S)
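One way to explore the questions on this slide is to test each candidate equivalence on small sample relations. The sketch below reads the missing operators as ∪ and ∩, which is an assumption, and uses made-up data. A single counterexample is enough to refute an equivalence, but a passing check on one example does not prove it.

```python
# Sample sets chosen for this sketch; any small sets would do.
R, S, T = {1, 2, 3}, {2, 3, 4}, {3, 5}

# R ∩ S = R − (R − S)
intersection_identity = (R & S) == (R - (R - S))

# (R ∪ S) − T = R ∪ (S − T)?
union_claim = ((R | S) - T) == (R | (S - T))

# (R ∩ S) − T = R ∩ (S − T)?
intersection_claim = ((R & S) - T) == (R & (S - T))

# Two relations over (A, B) that agree on A but differ on B.
P = {("a", 1)}
Q = {("a", 2)}


def pi_a(relation):
    """Projection onto the first attribute A."""
    return {row[0] for row in relation}


# Projection distributes over union?
proj_union_claim = pi_a(P | Q) == (pi_a(P) | pi_a(Q))

# Projection distributes over difference?
proj_diff_claim = pi_a(P - Q) == (pi_a(P) - pi_a(Q))

print(intersection_identity, union_claim, intersection_claim)
print(proj_union_claim, proj_diff_claim)
```

On this data the union claim and the projection-over-difference claim both fail, so neither can be a general equivalence.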
Example 1
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”

π_{balance}(σ_{name=tom}(σ_{ssn=custssn}(customer × account)))

Tree representation (root first, children indented):
π_{balance}
  σ_{name=tom}
    σ_{ssn=custssn}
      ×
        customer
        account
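Example 1 (cont)
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”

π_{balance}(account ⋈_{ssn=custssn} σ_{name=tom}(customer))

Tree representation (root first, children indented):
π_{balance}
  ⋈_{ssn=custssn}
    account
    σ_{name=tom}
      customer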
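The two trees for Example 1 can be compared on sample data. This is a sketch under assumptions: the rows below are invented for illustration, and the second plan is read as "select Tom first, then join with account".

```python
from itertools import product

# Hypothetical sample data; the slides give no rows for this example.
CUSTOMER = [(111, "Tom", "Irvine"), (222, "Mary", "LA")]  # (ssn, name, city)
ACCOUNT = [(111, 20), (111, 5), (222, 15)]                # (custssn, balance)

# Plan 1: product first, then the two selections, then the projection.
rows = [c + a for c, a in product(CUSTOMER, ACCOUNT)]  # (ssn, name, city, custssn, balance)
matched = [t for t in rows if t[0] == t[3]]            # ssn = custssn
toms = [t for t in matched if t[1] == "Tom"]           # name = Tom
plan1 = {t[4] for t in toms}                           # project balance

# Plan 2: select Tom first, then join with account, then project.
tom_rows = [c for c in CUSTOMER if c[1] == "Tom"]
plan2 = {a[1] for c in tom_rows for a in ACCOUNT if c[0] == a[0]}

print(plan1, plan2, len(rows))
```

Both plans give the same answer, but the first builds all 6 product tuples before filtering, while the second only pairs the single Tom tuple with the accounts.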
Assignment Operator
Motivation: expressions can be complicated.
Introduce names for intermediate relations, using the assignment operator “:=”.
Then a query can be written as a sequential program consisting of a series of assignments.

π_{balance}(σ_{name=tom}(σ_{ssn=custssn}(customer × account)))

R1(ssn,name,city) := σ_{name=tom}(customer)
R2(ssn,name,city,custssn,balance) := σ_{custssn=ssn}(account × R1)
Answer(balance) := π_{balance}(R2)

In SQL, this kind of sequential program is called a “script”.
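The sequence of assignments maps naturally onto ordinary variables in a program. The sketch below mirrors R1, R2, and Answer one statement at a time; the sample rows are hypothetical.

```python
# Hypothetical sample rows for customer(ssn, name, city) and account(custssn, balance).
customer = [
    {"ssn": 111, "name": "tom", "city": "irvine"},
    {"ssn": 222, "name": "mary", "city": "la"},
]
account = [
    {"custssn": 111, "balance": 20},
    {"custssn": 222, "balance": 15},
]

# R1 := selection name = tom on customer
r1 = [c for c in customer if c["name"] == "tom"]

# R2 := selection custssn = ssn on (account × R1)
r2 = [{**c, **a} for a in account for c in r1 if a["custssn"] == c["ssn"]]

# Answer := projection onto balance
answer = {row["balance"] for row in r2}
print(answer)
```

Each intermediate name holds a complete relation, so later steps can be read without unfolding the nested expression.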
Example 2
customer(ssn, name, city)
account(custssn, balance)
Find names of customers in Irvine or having a balance > 50K.

π_{name}(σ_{city=irvine}(customer)) ∪ π_{name}(σ_{balance>50K}(account) ⋈_{custssn=ssn} customer)
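The "or" in this query becomes a union of two name sets, one per condition. A sketch with invented rows (balances in thousands):

```python
# Hypothetical sample data.
customer = [(111, "Tom", "Irvine"), (222, "Mary", "LA"), (333, "Joe", "LA")]  # (ssn, name, city)
account = [(111, 20), (222, 60), (333, 10)]                                   # (custssn, balance)

# Names of customers living in Irvine.
in_irvine = {name for (ssn, name, city) in customer if city == "Irvine"}

# Names of customers owning an account with balance > 50K.
rich = {
    name
    for (ssn, name, city) in customer
    for (custssn, balance) in account
    if custssn == ssn and balance > 50
}

answer = in_irvine | rich
print(sorted(answer))
```

Both branches project onto name before the union, so the two operands have the same schema, as the set operators require.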
Example 3
List the highest balance of all the customers.

account(custssn, balance)
Custssn  Balance
111      20K
222      15K
333      10K

π_{balance}(account) − π_{acct1.balance}(ρ_{acct1}(account) ⋈_{acct1.balance<account.balance} account)
Example 3
List the highest balance of all the customers.

π_{balance}(account) − π_{acct1.balance}(ρ_{acct1}(account) ⋈_{acct1.balance<account.balance} account)

account(custssn, balance)
Custssn  Balance
111      20K
222      15K
333      10K

ρ_{acct1}(account) × account
Acct1.Custssn  Acct1.balance  Account.Custssn  Account.balance
111            20K            111              20K
111            20K            222              15K
111            20K            333              10K
222            15K            111              20K
222            15K            222              15K
222            15K            333              10K
333            10K            111              20K
333            10K            222              15K
333            10K            333              10K

π_{acct1.balance} of the join (balances smaller than some other balance)
Acct1.balance
15K
10K

π_{balance}(account)
Account.balance
20K
15K
10K

Difference (the highest balance)
20K
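The "everything minus the non-maximal values" idea can be sketched in a few lines. Balances are stored as integers in thousands, an encoding chosen for this sketch.

```python
account = [(111, 20), (222, 15), (333, 10)]  # (custssn, balance in K)

# Projection of account onto balance
all_balances = {balance for (_, balance) in account}

# Self-join on acct1.balance < account.balance, projected onto acct1.balance:
# every balance that is smaller than at least one other balance.
not_highest = {b1 for (_, b1) in account for (_, b2) in account if b1 < b2}

highest = all_balances - not_highest
print(highest)
```

Relational algebra as presented here has no aggregate such as MAX, which is why the query is phrased as a difference.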
Example 3 (cont)
List the highest balance of all the customers.
Is the following expression correct?

π_{account.balance}(ρ_{acct1}(account) ⋈_{acct1.balance<account.balance} account)

account(custssn, balance)
Custssn  Balance
111      20K
222      15K
333      10K
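Evaluating the candidate expression on the sample table is a quick way to answer the question. The sketch assumes the expression projects the larger side (account.balance) of the self-join on acct1.balance < account.balance.

```python
account = [(111, 20), (222, 15), (333, 10)]  # (custssn, balance in K)

# Keep account.balance from every pair where acct1.balance < account.balance.
candidate = {b2 for (_, b1) in account for (_, b2) in account if b1 < b2}
print(sorted(candidate))
```

The result contains every balance that is larger than at least one other balance, so 15K is returned alongside 20K; on this data the expression does not isolate the highest balance.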
Example 3 (cont)
How about “the lowest balance”?
How about “the highest balance of customers in Irvine?”
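Example 4
List the cities and names of the customers who have the highest balance of all customers.

π_{name,city}(customer ⋈_{custssn=ssn} (ρ_{acct2}(account) ⋈_{balance=acct2.balance} (π_{balance}(account) − π_{acct1.balance}(ρ_{acct1}(account) ⋈_{acct1.balance<account.balance} account))))

Highest balance: π_{balance}(account) − π_{acct1.balance}(ρ_{acct1}(account) ⋈_{acct1.balance<account.balance} account)
The customers with this balance: the join of that result with ρ_{acct2}(account) on balance=acct2.balance, then with customer on custssn=ssn.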
Example 4: another expression?
List the cities and names of the customers who have the highest balance of all customers.

π_{name,city}(customer ⋈_{custssn=ssn} (π_{custssn}(account) − π_{acct1.custssn}(ρ_{acct1}(account) ⋈_{acct1.balance<account.balance} account)))

customer(ssn, name, city)
Ssn  Name  City
222  Tom   irvine

account(custssn, balance)
Custssn  Balance
222      20K
222      50K
333      50K

Tom has two accounts. One of them is not the highest!
WRONG!
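Running both Example 4 expressions on the Tom data shows where the second one goes wrong. This is a sketch; the variable names are hypothetical and balances are integers in thousands.

```python
customer = [(222, "Tom", "irvine")]          # (ssn, name, city)
account = [(222, 20), (222, 50), (333, 50)]  # (custssn, balance in K)

# First expression: find the highest balance, then the customers holding it.
balances = {b for (_, b) in account}
not_highest = {b1 for (_, b1) in account for (_, b2) in account if b1 < b2}
highest = balances - not_highest
top_ssns = {ssn for (ssn, b) in account if b in highest}
correct = {(name, city) for (ssn, name, city) in customer if ssn in top_ssns}

# Second expression: subtract every custssn that owns some non-highest account.
losers = {s1 for (s1, b1) in account for (_, b2) in account if b1 < b2}
survivors = {ssn for (ssn, _) in account} - losers
wrong = {(name, city) for (ssn, name, city) in customer if ssn in survivors}

print(correct)
print(wrong)
```

Tom's 20K account puts his custssn in the subtracted set, so the second expression drops him even though his 50K account ties for the highest balance.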
Limitation of Relational Algebra
Some queries cannot be represented
Example, recursive queries:
Table R(Parent,Child)
How to find all the ancestors of “Tom”?
Impossible to write this query in relational algebra.
More expressive languages needed:
E.g., Datalog
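For contrast, a general-purpose language can answer the ancestor query with a loop. The sketch below uses made-up Parent/Child rows and a hypothetical helper `ancestors`. Each pass of the loop corresponds to one more join with R, and the number of passes depends on the data, which is what a fixed relational algebra expression cannot express.

```python
# Hypothetical rows for R(Parent, Child).
PARENT_CHILD = {("Ann", "Bob"), ("Bob", "Tom"), ("Eve", "Ann")}


def ancestors(relation, person):
    """Collect parents, grandparents, and so on until no new ancestor appears."""
    found = set()
    frontier = {person}
    while frontier:
        parents = {p for (p, c) in relation if c in frontier}
        frontier = parents - found  # only people not seen before
        found |= frontier
    return found


result = ancestors(PARENT_CHILD, "Tom")
print(sorted(result))
```

Tracking the already-found set keeps the loop from running forever if the data happens to contain a cycle.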