data warehousing 01 a relational algebra0

Upload: srinivasan-sivaramakrishnan

Post on 04-Apr-2018

Page 1: Data Warehousing 01 a Relational Algebra0

7/29/2019 Data Warehousing 01 a Relational Algebra0

http://slidepdf.com/reader/full/data-warehousing-01-a-relational-algebra0 1/32

Tools Boot camp

Relational Algebra Operations in RDM

•Source : http://www.ics.uci.edu/~ics184/#lectures 

SQL: Historical Perspective

Relational Algebra (introduced by E. F. "Ted" Codd in 1970)

SQL Language

SQL Server SQL (Transact-SQL) | Oracle 10g SQL (PL/SQL) | DB2 SQL | ACCESS (SQL) …

For SQL Server 2005: SSIS Tool, Analysis Services Tool, Reporting Services Tool, etc. …

OLTP

OLAP

Purely from a tools perspective:

Database Programming Language (DBPL) | Programming Languages (Java, VB, C, C++, …)

Outline

Relational Algebra (Retrieval)

A few set-based operators to manipulate relations:

Union, Intersection, Difference:

Usual set operators

Relations must have the same schema

Selection: choose rows from a relation.

Projection: choose columns from a relation.

Cartesian Product and Join: construct a new relation from several relations

Renaming: rename a relation and its attributes

Combining basic operators to form expressions

Update operations (insert, delete and modify)

Remember: creation is done in DDL

Union ∪, Intersection ∩, Difference -

Set operators. Relations must have the same schema.

R(name, dept)

Name  Dept
Jack  Physics
Tom   ICS

S(name, dept)

Name  Dept
Jack  Physics
Mary  Math

R ∪ S

Name  Dept
Jack  Physics
Tom   ICS
Mary  Math

R ∩ S

Name  Dept
Jack  Physics

R - S

Name  Dept
Tom   ICS
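The three set operators can be tried out with a minimal Python sketch. Modelling a relation as a set of (name, dept) tuples is an assumption made here for illustration; the rows are the R and S from this slide.

```python
# Relations as sets of (name, dept) tuples (an illustrative encoding).
R = {("Jack", "Physics"), ("Tom", "ICS")}
S = {("Jack", "Physics"), ("Mary", "Math")}

union = R | S          # tuples in R or in S
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R that are not in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Because both relations have the same schema, their tuples can be compared directly, which is why the same-schema requirement matters.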

Selection σ

σ_C(R): return the tuples in R that satisfy condition C.

Emp(name, dept, salary)

Name  Dept     Salary
Jane  ICS      30K
Jack  Physics  30K
Tom   ICS      75K
Joe   Math     40K
Jack  Math     50K

σ_{salary>35K}(Emp)

Name  Dept  Salary
Tom   ICS   75K
Joe   Math  40K
Jack  Math  50K

σ_{dept=ICS and salary<40K}(Emp)

Name  Dept  Salary
Jane  ICS   30K
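Selection amounts to filtering rows, as the following small sketch suggests. The helper name `select`, the dict-per-tuple encoding, and storing salaries as plain numbers in thousands are all assumptions made for this example.

```python
# Each tuple is a dict; salary is in thousands (30 means 30K).
Emp = [
    {"name": "Jane", "dept": "ICS", "salary": 30},
    {"name": "Jack", "dept": "Physics", "salary": 30},
    {"name": "Tom", "dept": "ICS", "salary": 75},
    {"name": "Joe", "dept": "Math", "salary": 40},
    {"name": "Jack", "dept": "Math", "salary": 50},
]

def select(relation, condition):
    """sigma_C(R): keep the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]

high_paid = select(Emp, lambda t: t["salary"] > 35)
ics_low = select(Emp, lambda t: t["dept"] == "ICS" and t["salary"] < 40)

print([t["name"] for t in high_paid])
print([t["name"] for t in ics_low])
```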

Projection π

π_{A1,…,Ak}(R): pick the columns of attributes A1,…,Ak of R.

Emp(name, dept, salary)

Name  Dept     Salary
Jane  ICS      30K
Jack  Physics  30K
Tom   ICS      75K
Joe   Math     40K
Jack  Math     50K

π_{name,dept}(Emp)

Name  Dept
Jane  ICS
Jack  Physics
Tom   ICS
Joe   Math
Jack  Math

π_{name}(Emp)

Name
Jane
Jack
Tom
Joe

Duplicates (“Jack”) eliminated.
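A minimal sketch of projection with duplicate elimination follows. The helper name `project` and the dict-per-tuple encoding are assumptions for illustration, and the salary column is left out of the sample data because neither projection uses it.

```python
Emp = [
    {"name": "Jane", "dept": "ICS"},
    {"name": "Jack", "dept": "Physics"},
    {"name": "Tom", "dept": "ICS"},
    {"name": "Joe", "dept": "Math"},
    {"name": "Jack", "dept": "Math"},
]

def project(relation, attributes):
    """pi_{attributes}(R): keep the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # a relation is a set: no repeated rows
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

name_dept = project(Emp, ["name", "dept"])
names = project(Emp, ["name"])

print(len(name_dept))
print([t["name"] for t in names])
```

Projecting on (name, dept) keeps both Jack rows because they differ in dept, while projecting on name alone merges them.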

Cartesian Product ×

R × S: pair each tuple r in R with each tuple s in S.

Emp(name, dept)

Name  Dept
Jack  Physics
Tom   ICS

Contact(name, addr)

Name  Addr
Jack  Irvine
Tom   LA
Mary  Riverside

Emp × Contact

E.name  Dept     C.name  Addr
Jack    Physics  Jack    Irvine
Jack    Physics  Tom     LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Tom     LA
Tom     ICS      Mary    Riverside
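The pairing can be reproduced with the standard library's `itertools.product`. The tuple encoding is an assumption for this sketch, with columns read as (E.name, dept, C.name, addr).

```python
from itertools import product

Emp = [("Jack", "Physics"), ("Tom", "ICS")]
Contact = [("Jack", "Irvine"), ("Tom", "LA"), ("Mary", "Riverside")]

# Emp x Contact: concatenate every Emp tuple with every Contact tuple.
emp_x_contact = [e + c for e, c in product(Emp, Contact)]

for row in emp_x_contact:
    print(row)
```

The result has |Emp| * |Contact| = 2 * 3 = 6 rows, matching the table above.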

Join ⋈

R ⋈_C S = σ_C(R × S)

Join condition C is of the form:

<cond_1> AND <cond_2> AND … AND <cond_k>

Each cond_i is of the form A op B, where:

A is an attribute of R, B is an attribute of S

op is a comparison operator: =, <, >, ≤, ≥, or ≠.

Different types:

• Theta-join
• Equi-join
• Natural join

Theta-Join

R(A, B)

A  B
3  4
5  7

S(C, D)

C  D
2  7
6  8

R × S

R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

R ⋈_{R.A>S.C} S

Result

R.A  R.B  S.C  S.D
3    4    2    7
5    7    2    7

Theta-Join

R(A, B)

A  B
3  4
5  7

S(C, D)

C  D
2  7
6  8

R × S

R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

R ⋈_{R.A>S.C, R.B≠S.D} S

Result

R.A  R.B  S.C  S.D
3    4    2    7
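A theta-join can be sketched as a filtered product, following R ⋈_C S = σ_C(R × S). The helper name `theta_join` and the dict encoding are assumptions for illustration. The second condition is taken to be R.B ≠ S.D, which is also an assumption: the comparison symbol is not legible in the slide text, and inequality reproduces the one-row result shown.

```python
def theta_join(r, s, condition):
    """R join_C S: every pair (t, u) from R x S that satisfies the condition."""
    return [{**t, **u} for t in r for u in s if condition(t, u)]

R = [{"R.A": 3, "R.B": 4}, {"R.A": 5, "R.B": 7}]
S = [{"S.C": 2, "S.D": 7}, {"S.C": 6, "S.D": 8}]

# Single condition: R.A > S.C
first = theta_join(R, S, lambda t, u: t["R.A"] > u["S.C"])

# Two conditions combined with AND: R.A > S.C and R.B != S.D
second = theta_join(
    R, S, lambda t, u: t["R.A"] > u["S.C"] and t["R.B"] != u["S.D"]
)

print(first)
print(second)
```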

Equi-Join

Special kind of theta-join: C only uses the equality operator.

R(A, B)

A  B
3  4
5  7

S(C, D)

C  D
2  7
6  8

R × S

R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

R ⋈_{R.B=S.D} S

Result

R.A  R.B  S.C  S.D
5    7    2    7

Natural-Join

Relations R and S. Let L be the union of their attributes.

Let A1,…,Ak be their common attributes.

R ⋈ S = π_L(R ⋈_{R.A1=S.A1,…,R.Ak=S.Ak} S)

Natural-Join

Emp(name, dept)

Name  Dept
Jack  Physics
Tom   ICS

Contact(name, addr)

Name  Addr
Jack  Irvine
Tom   LA
Mary  Riverside

Emp × Contact

Emp.name  Emp.dept  Contact.name  Contact.addr
Jack      Physics   Jack          Irvine
Jack      Physics   Tom           LA
Jack      Physics   Mary          Riverside
Tom       ICS       Jack          Irvine
Tom       ICS       Tom           LA
Tom       ICS       Mary          Riverside

Emp ⋈ Contact: all employee names, depts, and addresses.

Result

Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      LA

Same as the equi-join, except that one of the duplicate columns is eliminated.
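The natural join can be sketched by matching on whatever attribute names the two relations share and then merging the matching tuples, so each shared column appears once. The helper name `natural_join` and the dict encoding are assumptions for illustration.

```python
def natural_join(r, s):
    """Join on all shared attribute names, keeping one copy of each."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})   # shared columns are merged
    return result

Emp = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
Contact = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Tom", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

for row in natural_join(Emp, Contact):
    print(row)
```

Mary has no matching Emp tuple, so she does not appear in the output.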

Renaming ρ

Motivation: disambiguate attribute names. E.g., in R × R, how do we differentiate the attributes from the two instances?

ρ_{S(B1,…,Bn)}(R)

A relation identical to R, with new attributes B1,…,Bn.

Emp(name, dept)

Name  Dept
Jack  Physics
Tom   ICS

ρ_{Emp1(name1,dept1)}(Emp)

Emp1(name1, dept1)

Name1  Dept1
Jack   Physics
Tom    ICS

Renaming ρ (cont)

List employees who work in the same department as Tom.

Emp(name, dept)

Name  Dept
Jack  Physics
Tom   ICS
Mary  ICS

Emp1(name1, dept1)

Name1  Dept1
Jack   Physics
Tom    ICS
Mary   ICS

Emp × Emp1

Name  Dept     Name1  Dept1
Jack  Physics  Jack   Physics
Jack  Physics  Tom    ICS
Jack  Physics  Mary   ICS
Tom   ICS      Jack   Physics
Tom   ICS      Tom    ICS
Tom   ICS      Mary   ICS
Mary  ICS      Jack   Physics
Mary  ICS      Tom    ICS
Mary  ICS      Mary   ICS

π_{emp1.name1}(ρ_{emp1(name1,dept1)}(emp) ⋈_{emp1.dept1=emp.dept} σ_{name=tom}(emp))

Result

Name1
Tom
Mary
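The query can be walked through step by step in a short sketch: rename a copy of Emp, select Tom's tuple, join on department, then project the renamed name column. The helper name `rename` and the dict encoding are assumptions for illustration.

```python
def rename(relation, mapping):
    """rho: return a copy of the relation with attributes renamed."""
    return [{mapping[a]: v for a, v in t.items()} for t in relation]

Emp = [
    {"name": "Jack", "dept": "Physics"},
    {"name": "Tom", "dept": "ICS"},
    {"name": "Mary", "dept": "ICS"},
]

Emp1 = rename(Emp, {"name": "name1", "dept": "dept1"})
toms = [t for t in Emp if t["name"] == "Tom"]               # selection
joined = [
    {**e1, **e} for e1 in Emp1 for e in toms
    if e1["dept1"] == e["dept"]                              # join condition
]
same_dept = sorted({t["name1"] for t in joined})            # projection

print(same_dept)
```

Without the renaming, the two copies of Emp would both contribute attributes called name and dept, and the join condition could not tell them apart.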

Outer Joins

Motivation: “join” can lose information.

E.g.: the natural join of R and S loses info about Tom (in R) and about Mike and Mary (in S), since they do not join with other tuples.

These are called “dangling tuples”.

R

Name  Dept
Jack  Physics
Tom   ICS

S

Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

Outer join: natural join, but use NULL values to fill in dangling tuples. Remember that “natural join” is similar to “equi-join”.

Three types: “left”, “right”, or “full”.

Left Outer Join

R

Name  Dept
Jack  Physics
Tom   ICS

S

Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

R × S

R.name  R.dept   S.name  S.addr
Jack    Physics  Jack    Irvine
Jack    Physics  Mike    LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Mike    LA
Tom     ICS      Mary    Riverside

Left outer join R ⟕ S

Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      NULL

Pad null values for left dangling tuples.

LEFT OUTER JOIN: the left relation (R) is the one from which we want all rows returned, regardless of whether there is a matching address in the S relation.

List all names, depts and addresses for all names listed in R.

Assumes an “equality” join condition.

Right Outer Join

R

Name  Dept
Jack  Physics
Tom   ICS

S

Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

R × S

R.name  R.dept   S.name  S.addr
Jack    Physics  Jack    Irvine
Jack    Physics  Mike    LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Mike    LA
Tom     ICS      Mary    Riverside

Right outer join R ⟖ S

Name  Dept     Addr
Jack  Physics  Irvine
Mike  NULL     LA
Mary  NULL     Riverside

Pad null values for right dangling tuples.

RIGHT OUTER JOIN: the right relation (S) is the one from which we want all rows returned, regardless of whether there is a matching dept in the R relation.

List all names, depts and addresses for all names listed in S.

Assumes an “equality” join condition.

Full Outer Join

R

Name  Dept
Jack  Physics
Tom   ICS

S

Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

R × S

R.name  R.dept   S.name  S.addr
Jack    Physics  Jack    Irvine
Jack    Physics  Mike    LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Mike    LA
Tom     ICS      Mary    Riverside

Full outer join R ⟗ S

Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      NULL
Mike  NULL     LA
Mary  NULL     Riverside

Pad null values for both left and right dangling tuples.

List all names, depts and addresses for all names listed in R and S.
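All three outer joins can be sketched with one function that joins on a shared key and pads dangling tuples with `None` (standing in for NULL). The function name `outer_join`, its `keep_left` / `keep_right` flags, and the dict encoding are assumptions made for this example.

```python
def outer_join(r, s, key, keep_left=True, keep_right=True):
    """Equality join on `key`, padding dangling tuples with None."""
    r_attrs = {a for t in r for a in t}
    s_attrs = {a for u in s for a in u}
    result, matched_right = [], set()
    for t in r:
        matches = [i for i, u in enumerate(s) if u[key] == t[key]]
        result.extend({**t, **s[i]} for i in matches)
        matched_right.update(matches)
        if not matches and keep_left:          # left dangling tuple
            result.append({**{a: None for a in s_attrs}, **t})
    if keep_right:
        for i, u in enumerate(s):
            if i not in matched_right:         # right dangling tuple
                result.append({**{a: None for a in r_attrs}, **u})
    return result

R = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
S = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Mike", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

left = outer_join(R, S, "name", keep_right=False)
right = outer_join(R, S, "name", keep_left=False)
full = outer_join(R, S, "name")

for row in full:
    print(row)
```

Turning both flags off gives the plain (inner) join, which makes the information loss easy to see: only Jack survives.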

Combining Different Operations

Construct general expressions using basic operations.

Schema of each operation:

• ∪, ∩, -: same as the schema of the two relations
• Selection σ: same as the relation’s schema
• Projection π: attributes in the projection
• Cartesian product ×: attributes in the two relations; use a prefix to avoid confusion
• Theta join ⋈_C: same as ×
• Natural join ⋈: union of the relations’ attributes; merge common attributes
• Renaming ρ: new renamed attributes

Equivalent Expressions

Expressions might be equivalent.

R ∩ S = R - (R - S)

R ⋈ S = π_L(R ⋈_{R.A1=S.A1,…,R.Ak=S.Ak} S)

How about the following?

(R ∪ S) - T = R ∪ (S - T)?

(R ∩ S) - T = R ∩ (S - T)?

π_A(R ∩ S) = π_A(R) ∩ π_A(S)?

π_A(R - S) = π_A(R) - π_A(S)?
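Candidate equivalences like these can be probed with small test relations: a single counterexample is enough to refute one, while agreement on a few inputs only suggests it may hold. The sketch below checks the intersection identity on sample data and then looks at whether projection distributes over difference. The relations `R2` and `S2` and the helper `project_first` are hypothetical, chosen for illustration.

```python
R = {("Jack", "Physics"), ("Tom", "ICS")}
S = {("Jack", "Physics"), ("Mary", "Math")}

# Intersection expressed through difference.
same = (R & S) == (R - (R - S))

def project_first(relation):
    """pi_A for relations whose first column is A."""
    return {t[0] for t in relation}

# Hypothetical relations R2(A, B) and S2(A, B).
R2 = {("a", 1), ("a", 2)}
S2 = {("a", 1)}

lhs = project_first(R2 - S2)                   # project after the difference
rhs = project_first(R2) - project_first(S2)    # difference after projecting

print(same, lhs, rhs)
```

Here the two sides differ, so projection over a difference is not equivalent to the difference of the projections in general.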

Example 1

customer(ssn, name, city)

account(custssn, balance)

“List account balances of Tom.”

π_{balance}(σ_{custssn=ssn}(account × σ_{name=tom}(customer)))

Tree representation

π_{balance}
  σ_{custssn=ssn}
    ×
      account
      σ_{name=tom}
        customer

Example 1 (cont)

customer(ssn, name, city)

account(custssn, balance)

“List account balances of Tom.”

π_{balance}
  ⋈_{ssn=custssn}
    account
    σ_{name=tom}
      customer

Assignment Operator

Motivation: expressions can be complicated.

Introduce names for intermediate relations, using the assignment operator “:=”.

Then a query can be written as a sequential program consisting of a series of assignments.

π_{balance}(σ_{custssn=ssn}(account × σ_{name=tom}(customer)))

R1(ssn, name, city) := σ_{name=tom}(customer)
R2(ssn, name, city, custssn, balance) := σ_{custssn=ssn}(account × R1)
Answer(balance) := π_{balance}(R2)

In SQL, such a sequential program is called a “script”.
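The same three assignments read naturally as a short sequential program. In this sketch the sample rows in `customer` and `account` are invented for illustration (the slides give no data for this example), and relations are encoded as lists of dicts.

```python
# Hypothetical sample data; balances are in thousands.
customer = [
    {"ssn": 111, "name": "tom", "city": "irvine"},
    {"ssn": 222, "name": "mary", "city": "la"},
]
account = [
    {"custssn": 111, "balance": 20},
    {"custssn": 111, "balance": 35},
    {"custssn": 222, "balance": 50},
]

# R1 := select name=tom (customer)
R1 = [c for c in customer if c["name"] == "tom"]

# R2 := select custssn=ssn (account x R1)
R2 = [{**c, **a} for a in account for c in R1 if a["custssn"] == c["ssn"]]

# Answer := project balance (R2)
answer = {t["balance"] for t in R2}

print(sorted(answer))
```

Each intermediate name can be inspected on its own, which is the practical benefit of writing the query as a series of assignments.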

Example 2

customer(ssn, name, city)

account(custssn, balance)

Find names of customers in Irvine or having a balance > 50K.

∪
  π_{name}
    σ_{city=irvine}
      customer
  π_{name}
    ⋈_{custssn=ssn}
      σ_{balance>50K}
        account
      customer

Example 3

List the highest balance of all the customers.

account(custssn, balance)

Custssn  balance
111      20K
222      15K
333      10K

-
  π_{balance}
    account
  π_{acct1.balance}
    ⋈_{acct1.balance < account.balance}
      ρ_{acct1}
        account
      account

Example 3

List the highest balance of all the customers.

account(custssn, balance)

Custssn  balance
111      20K
222      15K
333      10K

-
  π_{balance}
    account
  π_{acct1.balance}
    ⋈_{acct1.balance < account.balance}
      ρ_{acct1}
        account
      account

acct1 × account

Acct1.Custssn  Acct1.balance  Account.Custssn  Account.balance
111            20K            111              20K
111            20K            222              15K
111            20K            333              10K
222            15K            111              20K
222            15K            222              15K
222            15K            333              10K
333            10K            111              20K
333            10K            222              15K
333            10K            333              10K

π_{acct1.balance} of the join (balances that are lower than some other balance)

Acct1.balance
15K
10K

π_{balance}(account)

Account.balance
20K
15K
10K

Difference (the result)

20K

Example 3 (cont)

List the highest balance of all the customers.

Is the following expression correct?

π_{account.balance}
  ⋈_{acct1.balance<account.balance}
    ρ_{acct1}
      account
    account

account(custssn, balance)

Custssn  balance
111      20K
222      15K
333      10K
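One way to explore the question is to evaluate both expressions on the three-row account table. The sketch below encodes account as (custssn, balance) tuples with balances in thousands, which is an assumption made for illustration. It first computes the difference-based expression from the previous slide, then the candidate that projects account.balance straight from the join.

```python
account = [(111, 20), (222, 15), (333, 10)]

# acct1 joined with account where acct1.balance < account.balance
pairs = [(a1, a) for a1 in account for a in account if a1[1] < a[1]]

# Difference-based expression: all balances minus those beaten by another.
all_balances = {balance for _, balance in account}
beaten = {a1[1] for a1, _ in pairs}
highest = all_balances - beaten

# Candidate expression: project account.balance directly from the join.
candidate = {a[1] for _, a in pairs}

print(sorted(highest))
print(sorted(candidate))
```

On this data the candidate returns every balance that is larger than at least one other balance, so it includes 15K as well as 20K.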

Example 3 (cont)

How about “the lowest balance”? 

How about “the highest balance of customers in Irvine?” 

Example 4

List the cities and names of the customers who have the highest balance of all customers.

π_{name,city}
  ⋈_{custssn=ssn}
    customer
    ⋈_{balance=acct2.balance}    (the customers with this balance)
      ρ_{acct2}
        account
      -    (highest balance)
        π_{balance}
          account
        π_{acct1.balance}
          ⋈_{acct1.balance<account.balance}
            ρ_{acct1}
              account
            account

Example 4: another expression?

List the cities and names of the customers who have the highest balance of all customers.

π_{name,city}
  ⋈_{custssn=ssn}
    customer
    -
      π_{custssn}
        account
      π_{acct1.custssn}
        ⋈_{acct1.balance<account.balance}
          ρ_{acct1}
            account
          account

customer(ssn, name, city)

Ssn  Name  City
222  Tom   irvine

account(custssn, balance)

Custssn  balance
222      20K
222      50K
333      50K

WRONG! Tom has two accounts. One of them is not the highest, so custssn 222 is removed by the difference even though his other account holds the highest balance.
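The failure can be reproduced on the sample data with a short sketch. Tuples are encoded as (ssn, name, city) and (custssn, balance) with balances in thousands, which is an assumption made for illustration. The first part follows the custssn-based expression on this slide; the second part uses the balance-based approach, subtracting beaten balances first and then looking up who holds the remaining one.

```python
customer = [(222, "Tom", "irvine")]
account = [(222, 20), (222, 50), (333, 50)]

# custssn values owning at least one account beaten by another account
beaten_ssn = {a1[0] for a1 in account for a in account if a1[1] < a[1]}

# custssn-based expression: subtract those from all custssn values
kept_ssn = {ssn for ssn, _ in account} - beaten_ssn
wrong = {(name, city) for ssn, name, city in customer if ssn in kept_ssn}

# balance-based approach: find the highest balance, then its holders
beaten_bal = {a1[1] for a1 in account for a in account if a1[1] < a[1]}
highest = {bal for _, bal in account} - beaten_bal
holders = {ssn for ssn, bal in account if bal in highest}
right = {(name, city) for ssn, name, city in customer if ssn in holders}

print(wrong)
print(right)
```

Tom's 20K account puts 222 into `beaten_ssn`, so the custssn-based version drops him, while the balance-based version keeps him.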

Limitation of Relational Algebra

Some queries cannot be represented

Example, recursive queries:

Table R(Parent,Child)

How to find all the ancestors of “Tom”? 

Impossible to write this query in relational algebra.

More expressive languages needed:

E.g., Datalog
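The difficulty is that a relational algebra expression contains a fixed number of joins, so it can only follow Parent links to a fixed depth, whereas ancestry can be arbitrarily deep. In a general-purpose language the missing ingredient is simply a loop that repeats the join until nothing new is found. The sketch below illustrates that idea; the function name `ancestors` and the sample rows are hypothetical.

```python
def ancestors(parent_child, person):
    """Repeat the parent lookup until no new ancestors appear."""
    found = set()
    frontier = {person}
    while frontier:
        # One "join" step: parents of everyone in the current frontier.
        parents = {p for p, c in parent_child if c in frontier}
        frontier = parents - found     # only people not seen before
        found |= frontier
    return found

# Hypothetical R(Parent, Child) rows.
R = {("Ann", "Bob"), ("Bob", "Tom"), ("Cat", "Ann"), ("Dan", "Eve")}

print(sorted(ancestors(R, "Tom")))
```

Datalog expresses the same repetition declaratively through recursive rules.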