15/1/2009

Lecture 2 on Relational Algebra

This lecture introduces the relational data model, with relational algebra as its mathematical foundation. The relational operations covered are Select, Project, Join, Semi-join, Natural Join, Natural Semi-join and Outer Join, together with the set operations Union, Intersect, Difference and Cartesian Product.

Post on 19-Dec-2015


Relational database

A relational database has been defined as a collection of tables. The major advantages of the relational approach are its simplicity and generality. A relational database system typically provides:

• An interface for a high-level, non-procedural data language

• Efficient file structures to store the database

• An efficient optimizer to help meet response-time requirements

• User views and snapshots of the stored database

• Integrity control – validation of semantic constraints on the database

• Concurrency control – synchronization of updates to a shared database by multiple users

• Selective access control – authorization of access privileges to one user’s database

• Recovery from both soft and hard crashes

• A report generator to display the results of interactions with the database

Relational model

Relational systems are based on an underlying set of theoretical ideas known as the relational model. The relational model can be characterized as a way of looking at data. It is concerned with three aspects of data: data structure, data integrity, and data manipulation.

Relational data structure

• Relation – may be seen as a table or record type.

• Attribute – a field type, or column, of a relation, together with the values that occur in it.

• Tuple – a row, or record occurrence, of a table.

• Domain – a pool of possible values from which the actual values appearing in one or more attributes are drawn.

• Primary key – a record identifier that uniquely identifies rows in a relation and never contains a null value.

• Candidate key – any minimal set of attributes that uniquely identifies tuples and could therefore be chosen as the key of a relation.

• Composite key – a primary key consisting of more than one attribute.

• Foreign key – a set of attributes in one relation that constitutes a key in some other relation; used to indicate logical links between relations.

Relations

A relation may be seen as a table. A table provides a natural mechanism for conveying information in a compact form. A table has a number of columns, one for each attribute of the objects described. Each entry in the table is a row containing a value for each attribute, i.e. a tuple.

The Suppliers-and-parts database (sample values)

S

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

P

P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Relational Schema (Data Definition Language)

• Create Table – create a relation as part of a relational schema

• Create View – create a virtual (read-only) relation

• Create Index – create an index on part of a relation

• Alter Table – change the structure of a relation

• Drop Table – delete a relation from the relational schema

• Drop View – delete a virtual relation

• Drop Index – delete an index on part of a relation
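The data definition statements listed above can be tried out with a minimal sketch such as the following. It assumes Python's standard sqlite3 module and an in-memory database, neither of which is part of the lecture. The column name SNO stands in for S# (which is not a plain SQL identifier), and the names LONDON_S, S_CITY_IDX and PHONE are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
cur = conn.cursor()

# Create Table: define the supplier relation S (SNO stands in for S#).
cur.execute(
    "CREATE TABLE S (SNO TEXT PRIMARY KEY NOT NULL, "
    "SNAME TEXT, STATUS INTEGER, CITY TEXT)"
)
# Create View: a virtual relation defined over S.
cur.execute("CREATE VIEW LONDON_S AS SELECT SNO, SNAME FROM S WHERE CITY = 'London'")
# Create Index: an index on one attribute of S.
cur.execute("CREATE INDEX S_CITY_IDX ON S (CITY)")
# Alter Table: change the structure of S by adding an attribute.
cur.execute("ALTER TABLE S ADD COLUMN PHONE TEXT")

cur.execute(
    "INSERT INTO S (SNO, SNAME, STATUS, CITY) VALUES ('S1', 'Smith', 20, 'London')"
)
london = cur.execute("SELECT * FROM LONDON_S").fetchall()

# Drop Index, Drop View, Drop Table: remove the objects again.
cur.execute("DROP INDEX S_CITY_IDX")
cur.execute("DROP VIEW LONDON_S")
cur.execute("DROP TABLE S")
remaining = cur.execute("SELECT name FROM sqlite_master").fetchall()
conn.close()
```

The view is queried like a table but stores no tuples of its own; after the three drop statements the schema is empty again.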

Relational Operations

• Select – extracts specified tuples from a specific relation.

• Project – extracts specified attributes from a specific relation.

• Join – builds a relation from 2 specified relations consisting of all possible concatenated pairs of tuples such that the two tuples satisfy some specified condition.

• Divide – takes two relations, one binary and one unary, and builds a relation consisting of all values of one attribute of the binary relation that match all values in the unary relation.

• Natural Join – a special case of the join operation. R NJN S is an equijoin in which equalities are specified on all fields having the same name in relation R and relation S.

• Semi Join – a special case of the join operation. R SJ S is a projection onto all fields (attributes) of the first relation operand after a join of relation R and relation S.

• Natural Semi Join – a special case of the natural join operation. R NSJ S is a projection onto all fields (attributes) of the first relation operand after a natural join of relation R and relation S.

Mathematical Set Operations

• Union – builds a relation consisting of all tuples appearing in either or both of two specified relations.

• Intersection – builds a relation consisting of all tuples appearing in both of two specified relations.

• Difference – builds a relation consisting of all tuples appearing in the first and not the second of two specified relations.

• Cartesian product – builds a relation from 2 specified relations consisting of all possible concatenated pairs of tuples, one from each of the two specified relations.

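Overview of Relational Algebra

[Figure: schematic illustrations of Select, Project, Product, Union, Intersection, Difference, Natural Join and Divide]

Product: {a, b, c} with {x, y} gives (a, x), (a, y), (b, x), (b, y), (c, x), (c, y).

Natural Join: (a1, b1), (a2, b1), (a3, b2) with (b1, c1), (b2, c2), (b3, c3) gives (a1, b1, c1), (a2, b1, c1), (a3, b2, c2).

Divide: (a, x), (a, y), (a, z), (b, x), (c, y) divided by {x, z} gives a.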

Operations of Relational Algebra

[Figure: operations of relational algebra]

Formal specification of relational operations

Select – Selection is called theta-selection, in which theta represents any valid scalar comparison operator. For example, R where R.X theta R.Y, where R is a relation and X and Y are attributes of R or constant values, which must be on the same domain. Theta is a valid scalar comparison operator such as =, ≠, >, <, ≥ or ≤.

Relational algebra form: SL F R

where SL means the select operation, F is the formula (written as a subscript) and R is the operand relation.

For instance, SL City=‘London’ S

Result

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S4   Clark   20       London
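The selection above can be sketched in a few lines of Python. This is an illustration only, assuming a relation is held as a list of dictionaries; the helper name select is hypothetical and not part of the lecture's notation.

```python
# Relation S from the suppliers-and-parts database, one dict per tuple.
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]


def select(relation, formula):
    """SL F R: keep the tuples of `relation` for which `formula` holds."""
    return [t for t in relation if formula(t)]


# SL City='London' S
london = select(S, lambda t: t["CITY"] == "London")
```

The formula is passed as a function of one tuple, so any scalar comparison (=, ≠, >, <, ≥, ≤) can be used in it.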

Project – the projection operator yields a “vertical” subset of a given relation. For example, R[X, Y, …, Z], where R is a relation and X, Y, …, Z are attributes of R.

Relational algebra form: PJ Attr R

where PJ is the project operation, Attr is the list of attributes to be projected (written as a subscript) and relation R is the operand.

For instance, PJ S#, Sname, Status, City S

Result

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
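A projection onto fewer attributes makes the "vertical subset" idea more visible. The sketch below is an illustration only, assuming relations as lists of dictionaries and a hypothetical helper named project. It removes duplicate rows, since a relation is a set of tuples; the slide's own example projects all four attributes, so no duplicates arise there.

```python
# Two attributes of relation S are enough for this illustration.
S = [
    {"S#": "S1", "CITY": "London"},
    {"S#": "S2", "CITY": "Paris"},
    {"S#": "S3", "CITY": "Paris"},
    {"S#": "S4", "CITY": "London"},
    {"S#": "S5", "CITY": "Athens"},
]


def project(relation, attrs):
    """PJ Attr R: keep only `attrs`, dropping duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # a relation is a set: no duplicate tuples
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result


# PJ City S
cities = project(S, ["CITY"])
```

Five supplier tuples collapse to three city tuples because London and Paris each occur twice.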

Join – the join operation is called theta-join and is not a primitive operation: a join on a formula F is equivalent to taking the Cartesian product of the two relations and then selecting the concatenated tuples that satisfy F.

Relational algebra form: R JN F S

where JN is the join operation, F is the join formula (written as a subscript) and relations R and S are the operands.

For instance, S JN S=S1 or S=S2 or S=S3 SP

Result

S.S#   S.Sname   S.Status   S.City   SP.S#   SP.P#   SP.QTY
S1     Smith     20         London   S1      P1      300
S1     Smith     20         London   S1      P2      200
S1     Smith     20         London   S1      P3      400
S1     Smith     20         London   S1      P4      200
S1     Smith     20         London   S1      P5      100
S1     Smith     20         London   S1      P6      100
S2     Jones     10         Paris    S2      P1      300
S2     Jones     10         Paris    S2      P2      400
S3     Blake     30         Paris    S3      P2      200
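The definition of theta-join as a Cartesian product followed by a selection can be sketched as follows. This is an illustration under assumptions: relations are lists of dictionaries, only a few tuples of S and SP are used, attribute names are prefixed with the relation name to keep both S# columns apart, and the helper name theta_join is hypothetical.

```python
from itertools import product

# A few tuples of S and SP are enough to show the mechanics.
S = [{"S#": s, "CITY": c}
     for s, c in [("S1", "London"), ("S2", "Paris"), ("S3", "Paris")]]
SP = [{"S#": s, "P#": p, "QTY": q}
      for s, p, q in [("S1", "P1", 300), ("S2", "P2", 400), ("S4", "P5", 400)]]


def theta_join(r, s, r_name, s_name, formula):
    """R JN F S: Cartesian product of r and s, then selection on formula."""
    result = []
    for tr, ts in product(r, s):
        # Concatenate the two tuples, qualifying each attribute name.
        row = {f"{r_name}.{k}": v for k, v in tr.items()}
        row.update({f"{s_name}.{k}": v for k, v in ts.items()})
        if formula(row):
            result.append(row)
    return result


# Equijoin on supplier number: both S# columns stay in the result.
joined = theta_join(S, SP, "S", "SP", lambda t: t["S.S#"] == t["SP.S#"])
```

Of the nine concatenated pairs only two satisfy the formula; S3 has no shipment in this reduced SP, and the S4 shipment has no supplier in this reduced S.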

Semi Join – a semi join is a theta-join, which is not a primitive operation, whose result includes the attributes of the first operand only. It keeps the tuples of the first relation that combine with at least one tuple of the second relation to satisfy the formula.

Relational algebra form: R SJ F S

where SJ is the semi join operation, F is the join formula (written as a subscript) and relations R and S are the operands.

For instance, S SJ S=S1 or S=S2 or S=S3 SP

Result

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
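A semi join can be sketched without building the full join, by testing whether each tuple of the first operand has at least one partner. The code below is an illustration only. The formula used (matching supplier numbers, restricted to S1, S2 and S3) is an assumed reading of the slide's example, and the helper name semi_join is hypothetical.

```python
S = [{"S#": n, "SNAME": name}
     for n, name in [("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake"),
                     ("S4", "Clark"), ("S5", "Adams")]]
SP = [{"S#": s, "P#": p}
      for s, p in [("S1", "P1"), ("S2", "P2"), ("S3", "P2"), ("S4", "P4")]]


def semi_join(r, s, formula):
    """R SJ F S: tuples of r that satisfy formula with some tuple of s."""
    return [tr for tr in r if any(formula(tr, ts) for ts in s)]


wanted = {"S1", "S2", "S3"}  # assumed reading of the slide's formula
result = semi_join(
    S, SP,
    lambda s, sp: s["S#"] == sp["S#"] and s["S#"] in wanted,
)
```

The result carries only the attributes of S, and each qualifying supplier appears once however many shipments it has.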

Natural Join – the join operation is called theta-join and is not a primitive operation. If theta is equality, the theta-join is an equijoin. The result of an equijoin must include two identical attributes. If one of those two attributes is eliminated, the result is a natural join.

Relational algebra form: R NJN S

where NJN is the natural join operation and relations R and S are the operands.

For instance, S NJN SP

Result

S.S#   S.Sname   S.Status   S.City   SP.P#   SP.QTY
S1     Smith     20         London   P1      300
S1     Smith     20         London   P2      200
S1     Smith     20         London   P3      400
S1     Smith     20         London   P4      200
S1     Smith     20         London   P5      100
S1     Smith     20         London   P6      100
S2     Jones     10         Paris    P1      300
S2     Jones     10         Paris    P2      400
S3     Blake     30         Paris    P2      200
S4     Clark     20         London   P2      200
S4     Clark     20         London   P4      300
S4     Clark     20         London   P5      400
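The natural join can be sketched as an equijoin on every attribute name the two relations share, with the shared attribute kept once. This is an illustration only, assuming relations as lists of dictionaries and a reduced data set; the helper name natural_join is hypothetical.

```python
S = [{"S#": s, "CITY": c}
     for s, c in [("S1", "London"), ("S2", "Paris"), ("S5", "Athens")]]
SP = [{"S#": s, "P#": p, "QTY": q}
      for s, p, q in [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 300)]]


def natural_join(r, s):
    """R NJN S: equijoin on all attributes with the same name in r and s."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()  # shared attribute names
            if all(tr[a] == ts[a] for a in common):
                # Merging the dicts keeps each shared attribute only once.
                result.append({**tr, **ts})
    return result


joined = natural_join(S, SP)
```

Supplier S5 has no shipment, so it does not appear in the result, and each result tuple has a single S# attribute.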

Natural Semi Join – the join operation is called theta-join and is not a primitive operation. If theta is equality, the theta-join is an equijoin. The result of an equijoin must include two identical attributes. If one of those two attributes is eliminated and the result keeps the attributes of the first operand only, it is a natural semi join.

Relational algebra form: R NSJ S

where NSJ is the natural semi join operation and relations R and S are the operands.

For instance, S NSJ SP

Result

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
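A natural semi join can be sketched as "keep each tuple of the first operand that agrees with some tuple of the second on all shared attribute names". The code is an illustration only, assuming relations as lists of dictionaries; the helper name natural_semi_join is hypothetical.

```python
S = [{"S#": n, "SNAME": name}
     for n, name in [("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake"),
                     ("S4", "Clark"), ("S5", "Adams")]]
SP = [{"S#": s, "P#": p}
      for s, p in [("S1", "P1"), ("S1", "P2"), ("S2", "P1"),
                   ("S3", "P2"), ("S4", "P5")]]


def natural_semi_join(r, s):
    """R NSJ S: tuples of r that natural-join with at least one tuple of s."""
    return [
        tr for tr in r
        if any(all(tr[a] == ts[a] for a in tr.keys() & ts.keys()) for ts in s)
    ]


suppliers_with_shipments = natural_semi_join(S, SP)
```

S1 appears once even though it has two shipments, and S5 is dropped because it has none.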

Divide – the division operator divides a dividend relation A of degree m by a divisor relation B of degree n and produces a quotient relation of degree m − n.

(Dividend ÷ Divisor = Quotient)

Relational algebra form: A DI B Attr

where DI is the divide operation, Attr are attributes of the divisor (written as a subscript) and relations A and B are the operands.

For instance: SP DI DOR S#

Given SP DI DOR => Result, where SP is the dividend, DOR is the divisor and the result lists the suppliers in SP that supply every part in DOR.

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

DOR

P#
P1
P2

Result

S#
S1
S2
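The division in this example can be checked with a short sketch. It is an illustration under assumptions: the dividend is taken as the (S#, P#) pairs of SP, i.e. QTY is projected away first, relations are Python sets, and the helper name divide is hypothetical.

```python
# (S#, P#) pairs of SP, i.e. SP with QTY projected away (assumption).
SP = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"),
    ("S1", "P6"), ("S2", "P1"), ("S2", "P2"), ("S3", "P2"), ("S4", "P2"),
    ("S4", "P4"), ("S4", "P5"),
}
DOR = {"P1", "P2"}  # divisor: the parts every qualifying supplier must supply


def divide(dividend, divisor):
    """Values x such that (x, y) is in the dividend for every y in the divisor."""
    candidates = {x for x, _ in dividend}
    return {x for x in candidates if all((x, y) in dividend for y in divisor)}


quotient = divide(SP, DOR)
```

S3 and S4 supply P2 but not P1, so only S1 and S2 remain in the quotient.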

Union – for all set operations, the two relations must be union compatible, i.e. of the same degree and with corresponding attributes on the same domains. The union of two union-compatible relations is the set of all tuples belonging to either one or both.

Relational algebra form: A UN B

where UN is the union operation and relations A and B are the operands.

For instance: J UN HS

J

S#    NAME    MAJOR
123   JONES   HISTORY
158   PARKS   MATH
271   SMITH   HISTORY

HS

NUM   NAME       INTEREST
105   ANDERSON   MANAGEMENT
123   JONES      HISTORY

E

S#    CNAME   POS#
123   H350    1
105   BA490   3
123   BA490   7

Result

S#/Num   Name       Major/Interest
123      Jones      History
158      Parks      Math
271      Smith      History
105      Anderson   Management
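The union example can be reproduced with Python sets, which drop the duplicate Jones tuple automatically. This is a sketch only: relations are assumed to be sets of tuples, the helper name union is hypothetical, and the compatibility check covers the degree but not the domains.

```python
# Relations as sets of tuples (S#/NUM, NAME, MAJOR/INTEREST).
J = {(123, "Jones", "History"), (158, "Parks", "Math"), (271, "Smith", "History")}
HS = {(105, "Anderson", "Management"), (123, "Jones", "History")}


def union(a, b):
    """A UN B: all tuples in either relation; both must have the same degree."""
    degrees = {len(t) for t in a | b}
    if len(degrees) > 1:
        raise ValueError("relations are not union compatible")
    return a | b


result = union(J, HS)
```

Three tuples plus two tuples give four, because (123, Jones, History) belongs to both relations.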

Intersection – intersection is a set operation. The intersection of two union-compatible relations is the set of all tuples belonging to both relations.

Relational algebra form: A IN B

where IN is the intersection operation and relations A and B are the operands.

For instance: J IN HS

Result:

S#/Num   Name    Major/Interest
123      Jones   History

Difference – difference is a set operation. The difference between two union-compatible relations is the set of all tuples belonging to the first relation and not to the second.

Relational algebra form: A DF B

where DF is the difference operation and relations A and B are the operands.

For instance, J DF HS

Result:

S#/Num   Name    Major/Interest
158      Parks   Math
271      Smith   History
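Intersection and difference on the same two relations can be sketched with Python's set operators. This is an illustration only, assuming relations as sets of tuples; the variable names are hypothetical.

```python
# Relations as sets of tuples (S#/NUM, NAME, MAJOR/INTEREST).
J = {(123, "Jones", "History"), (158, "Parks", "Math"), (271, "Smith", "History")}
HS = {(105, "Anderson", "Management"), (123, "Jones", "History")}

intersection = J & HS  # J IN HS: tuples in both relations
difference = J - HS    # J DF HS: tuples in J that are not in HS
reverse = HS - J       # HS DF J: difference depends on operand order
```

Unlike union and intersection, difference is not commutative: J DF HS keeps Parks and Smith, while HS DF J keeps Anderson.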

Extended Cartesian Product – it is a set operation. The extended Cartesian product of two relations is the set of all tuples formed by concatenating a tuple belonging to the first relation with a tuple belonging to the second.

Relational algebra form: A CP B

where CP is the Cartesian product operation and relations A and B are the operands.

For instance: J CP E

Result:

S#    Name    Major     Snum   Cname   Pnum
123   Jones   History   123    H350    1
123   Jones   History   105    BA490   3
123   Jones   History   123    BA490   7
158   Parks   Math      123    H350    1
158   Parks   Math      105    BA490   3
158   Parks   Math      123    BA490   7
271   Smith   History   123    H350    1
271   Smith   History   105    BA490   3
271   Smith   History   123    BA490   7
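The nine result tuples can be generated with itertools.product from the standard library. This is a sketch only, assuming relations as lists of tuples so that the row order of the slide is preserved.

```python
from itertools import product

J = [(123, "Jones", "History"), (158, "Parks", "Math"), (271, "Smith", "History")]
E = [(123, "H350", 1), (105, "BA490", 3), (123, "BA490", 7)]

# J CP E: concatenate every tuple of J with every tuple of E.
cp = [j + e for j, e in product(J, E)]
```

The result has 3 × 3 = 9 tuples of degree 3 + 3 = 6, with no condition relating the two S# columns.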

Outer Join

In an ordinary join, tuples that have no match in the other table under the join condition are dropped. An outer join keeps such tuples and fills the missing attributes with null values. In the following example, TF outer join FD gives TFD:

TF

Teaching Assistant   Faculty Member
John                 Vincent
Mary                 Vincent
Tim                  Tom

FD

Faculty Member   Department
Vincent          CS
Tom              Math
Chow             Math

TFD

Teaching Assistant   Faculty Member   Department
John                 Vincent          CS
Mary                 Vincent          CS
Tim                  Tom              Math
Null                 Chow             Math
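The TFD table can be reproduced with a short sketch. It is an illustration under assumptions: relations are lists of tuples, Python's None stands for null, and only unmatched FD tuples are preserved, which is all this example needs since every TF tuple has a match. The helper name outer_join_keep_fd is hypothetical.

```python
TF = [("John", "Vincent"), ("Mary", "Vincent"), ("Tim", "Tom")]
FD = [("Vincent", "CS"), ("Tom", "Math"), ("Chow", "Math")]


def outer_join_keep_fd(tf, fd):
    """Join TF and FD on faculty member, keeping FD tuples without a match."""
    result = []
    for faculty, dept in fd:
        assistants = [ta for ta, f in tf if f == faculty]
        if assistants:
            result.extend((ta, faculty, dept) for ta in assistants)
        else:
            # No teaching assistant for this faculty member: pad with null.
            result.append((None, faculty, dept))
    return result


TFD = outer_join_keep_fd(TF, FD)
```

An ordinary join would return only the first three tuples; the outer join adds (None, Chow, Math).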

Outer Union

The outer union operation takes the union of two relations that are not union compatible. Tuples that have no values for the attributes found only in the other relation are padded with null values for those attributes.

Example on Outer Union

S#/Num   Name    Major/Interest
123      Jones   History

S#/Num   Name    Hobby
158      Parks   Tennis
271      Smith   Swimming

=> Outer Union

S#/Num   Name    Major/Interest   Hobby
123      Jones   History          NULL
158      Parks   NULL             Tennis
271      Smith   NULL             Swimming
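The padding step of the outer union can be sketched as follows. This is an illustration only, assuming relations as lists of dictionaries with None standing for NULL; the helper name outer_union is hypothetical, and the sketch only pads and removes exact duplicates, without merging tuples that share a key.

```python
majors = [{"S#/Num": 123, "Name": "Jones", "Major/Interest": "History"}]
hobbies = [
    {"S#/Num": 158, "Name": "Parks", "Hobby": "Tennis"},
    {"S#/Num": 271, "Name": "Smith", "Hobby": "Swimming"},
]


def outer_union(r, s):
    """Union of relations with different attributes, padding gaps with None."""
    attrs = []
    for t in r + s:
        for a in t:
            if a not in attrs:  # collect every attribute name once, in order
                attrs.append(a)
    result = []
    for t in r + s:
        row = {a: t.get(a) for a in attrs}  # missing attributes become None
        if row not in result:
            result.append(row)
    return result


combined = outer_union(majors, hobbies)
```

Each result tuple has all four attributes, with None wherever the source relation had no such attribute.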

Referential integrity

A child relation must have a foreign key referring to the primary key of its parent relation. On creation, the parent relation must be created before the child relation. On deletion, the child relation must be deleted before the parent relation.

For example, a department has many employees. Relation department is a parent relation and relation employee is a child relation which has a foreign key referring to its parent relation.

Referential integrity (continued)

Parent relation: Department (Dept_name, ….)

Child relation: Employee (EMP_id, …., *Dept_name)

Insert the parent relation Department tuple before inserting the corresponding child relation Employee tuple.

Delete the child relation Employee tuple before deleting the corresponding parent relation Department tuple.
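The insert and delete ordering can be observed with a minimal sketch. It assumes Python's standard sqlite3 module and an in-memory database, which are not part of the lecture. The helper try_sql and the sample value 'Sales' are hypothetical, and only the two attributes named on the slide are declared. SQLite enforces foreign keys only when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when on
conn.execute("CREATE TABLE Department (Dept_name TEXT PRIMARY KEY NOT NULL)")
conn.execute(
    "CREATE TABLE Employee (EMP_id INTEGER PRIMARY KEY, "
    "Dept_name TEXT REFERENCES Department (Dept_name))"
)


def try_sql(sql):
    """Run one statement and report whether the integrity check allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    try_sql("INSERT INTO Employee VALUES (1, 'Sales')"),           # no parent yet
    try_sql("INSERT INTO Department VALUES ('Sales')"),            # parent first
    try_sql("INSERT INTO Employee VALUES (1, 'Sales')"),           # child second
    try_sql("DELETE FROM Department WHERE Dept_name = 'Sales'"),   # child still exists
    try_sql("DELETE FROM Employee WHERE EMP_id = 1"),              # child first
    try_sql("DELETE FROM Department WHERE Dept_name = 'Sales'"),   # parent second
]
```

The first and fourth statements are the ones that violate the ordering described above.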

Entity integrity

When creating a relation, the primary key value must be unique and cannot be null.

For example, when creating a relation Student (Student_id, address), Student_id is used as the primary key, so it must be unique and cannot be null.
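Entity integrity on the Student relation can be sketched in the same way. The sketch assumes Python's standard sqlite3 module; the helper try_insert and the sample values are hypothetical. NOT NULL is stated explicitly because SQLite does not imply it for a TEXT primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Student_id is the primary key: unique, and explicitly not null.
conn.execute(
    "CREATE TABLE Student (Student_id TEXT PRIMARY KEY NOT NULL, address TEXT)"
)


def try_insert(student_id, address):
    """Insert one Student tuple and report whether it was accepted."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?)", (student_id, address))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


first = try_insert("A001", "1 Main Street")      # new key value
duplicate = try_insert("A001", "2 Side Street")  # same key value again
missing = try_insert(None, "3 Back Lane")        # null key value
```

Only the first insert satisfies entity integrity; the second repeats a key value and the third has no key value at all.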

Relational algebra expression

A relational algebra expression is evaluated from the innermost parentheses outwards, which means from right to left in the example below.

For example, given relation S (supplier) and relation SP (supplier-parts), what are the cities of the suppliers who supply a part in a quantity of more than 300?

Perform the following relational algebra: PJ CITY (S NJN (SL QTY > 300 SP))

Processing sequence

1. SL QTY > 300 SP

2. S NJN (result of step 1)

3. PJ CITY (result of step 2)

The result is

CITY
London
Paris
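The three processing steps can be traced on the sample data with a short sketch. It is an illustration only, assuming relations as lists of dictionaries; the variable names step1, step2 and step3 are hypothetical.

```python
S = [{"S#": s, "CITY": c}
     for s, c in [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
                  ("S4", "London"), ("S5", "Athens")]]
SP = [{"S#": s, "P#": p, "QTY": q} for s, p, q in [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
]]

# Step 1: SL QTY > 300 SP
step1 = [t for t in SP if t["QTY"] > 300]

# Step 2: S NJN (result of step 1), joining on the shared attribute S#
step2 = [{**s, **sp} for s in S for sp in step1 if s["S#"] == sp["S#"]]

# Step 3: PJ CITY (result of step 2), dropping duplicate cities
step3 = list(dict.fromkeys(t["CITY"] for t in step2))
```

Step 1 keeps three shipments (from S1, S2 and S4), step 2 attaches the supplier cities, and step 3 reduces London, Paris, London to London and Paris.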

Operator tree

      PJ City
         |
        NJN
       /   \
      S     SL QTY>300
                |
                SP

Lecture summary

All relational database operations can be expressed in relational algebra. By using set theory, we can optimize these operations. In general, reducing the size of the operands as early as possible is a way to process relational operations efficiently.

For example, with the same operand relations, the result of a semi-join is smaller than the result of the corresponding join, because it keeps only the attributes of the first operand.
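The effect of reducing operands early can be illustrated by comparing two evaluation orders for the query on supplier cities. This is a sketch only, assuming relations as lists of dictionaries; the helper natural_join and the plan names are hypothetical, and the intermediate sizes refer to the sample data only.

```python
S = [{"S#": s, "CITY": c}
     for s, c in [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
                  ("S4", "London"), ("S5", "Athens")]]
SP = [{"S#": s, "P#": p, "QTY": q} for s, p, q in [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
]]


def natural_join(r, s):
    """Natural join of r and s on their shared attribute S#."""
    return [{**tr, **ts} for tr in r for ts in s if tr["S#"] == ts["S#"]]


# Plan A: join first, select afterwards.
plan_a_intermediate = natural_join(S, SP)
plan_a = {t["CITY"] for t in plan_a_intermediate if t["QTY"] > 300}

# Plan B: select first, so the join works on a smaller operand.
plan_b_intermediate = natural_join(S, [t for t in SP if t["QTY"] > 300])
plan_b = {t["CITY"] for t in plan_b_intermediate}
```

Both plans return the same cities, but plan A builds twelve joined tuples before filtering, while plan B builds only three.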

Review Question

What are the major differences among Join, Semi Join, Natural Join and Natural Semi Join operations in terms of resultant attributes?

What is the ranking in sequence of efficiency of these four operations and why?

Tutorial Question

Given the following relations:

[Figure: relations for the tutorial question]

and the nine elementary operations of relational algebra: selection (SL), projection (PJ), union (UN), Cartesian product (CP), join (JN), natural join (NJN), semi-join (SJ), natural semi-join (NSJ) and difference (DF).

Use no more than three of the nine elementary operations to implement the following query operation: “Find the employee#, responsibility and duration of all employees who are assigned to work on job under project Database Development”.

(a) Show the three elementary operations in a relational algebra

(b) Show the three elementary operations in an operator tree

Reading Assignment

Pages 167-195 of Chapter 6, Relational Algebra and Calculus, of “Fundamentals of Database Systems”, 5th edition, by Elmasri & Navathe, Pearson, 2007.