lec02

15/2/2012 DB2 Dr.Suresh Sankaranarayanan 1 DB2 Database Management Systems Relational Algebra

Upload: pessuresh

Post on 31-May-2015


Page 1: Lec02


DB2 Database Management Systems

Relational Algebra

Page 2: Lec02

Textbook Used

Database System Concepts by Abraham Silberschatz, 5th Edition

Page 3: Lec02

Outline

Relation

Keys

Relational Algebra operations

Additional Relational-Algebra operations

Extended Relational-Algebra Operations

Modification of Database

Page 4: Lec02

Basic Structure of Relation

Given sets D1, D2, …, Dn, a relation r is a subset of

D1 x D2 x … x Dn

Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di

Example: If

customer_name = {Jones, Smith, Curry, Lindsay, …}

/* Set of all customer names */

customer_street = {Main, North, Park, …} /* set of all street names*/

customer_city = {Harrison, Rye, Pittsfield, …} /* set of all city names */

Then r = { (Jones, Main, Harrison),

(Smith, North, Rye),

(Curry, North, Rye),

(Lindsay, Park, Pittsfield) }

is a relation over

customer_name x customer_street x customer_city
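The definition above can be checked mechanically. The following is a minimal sketch, assuming the three domains are cut down to exactly the values listed on the slide (the slide's "…" means the real domains are larger); the name `domain_product` is introduced here for illustration only.

```python
from itertools import product

# Domains, truncated to the values shown on the slide (an assumption).
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# Every possible 3-tuple: customer_name x customer_street x customer_city.
domain_product = set(product(customer_name, customer_street, customer_city))

# The relation r from the slide: one particular subset of the product.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

print(len(domain_product))   # number of possible tuples: 4 * 3 * 3
print(r <= domain_product)   # subset test: is r a relation over the product?
```

A Python set of tuples mirrors the slide's view of a relation: unordered and free of duplicates.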

Page 5: Lec02

Relation Schema

A1, A2, …, An are attributes

R = (A1, A2, …, An) is a relation schema

Example:

Customer_schema = (customer_name, customer_street, customer_city)

r(R) denotes a relation r on the relation schema R

Example:

customer (Customer_schema)

Page 6: Lec02

Relation Instance

The current values (relation instance) of a relation are specified by a table

An element t of r is a tuple, represented by a row in a table

customer

customer_name  customer_street  customer_city
Jones          Main             Harrison
Smith          North            Rye
Curry          North            Rye
Lindsay        Park             Pittsfield

The columns are the attributes; the rows are the tuples.

Page 7: Lec02

Keys

The values of a tuple's attributes must be such that they can uniquely identify the tuple.

Based on that, three kinds of keys are defined:

a. Superkey

b. Candidate key

c. Primary key

Superkey

A set of one or more attributes that, taken collectively, uniquely identifies a tuple in a relation

Example: The customer_id attribute of relation customer is sufficient to distinguish one customer tuple from another, so customer_id is a superkey.

Page 8: Lec02

Keys(contd..)

Candidate key

A candidate key is a minimal superkey: a superkey for which no proper subset is also a superkey. Several distinct sets of attributes may serve as candidate keys.

Example: The combination {customer_name, customer_street} is sufficient to distinguish among members of relation customer. So both {customer_id} and {customer_name, customer_street} form candidate keys.

Primary key

A candidate key chosen as the principal means of identifying tuples within a relation. Should choose an attribute whose value never, or very rarely, changes.

Example: an email address is unique, but may change
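The difference between a superkey and a candidate key can be illustrated with a small check. This is only a sketch: a key is really a property of the schema (it must hold for every legal instance), while the code below tests a single instance. The sample rows, including the `customer_id` values, are hypothetical, and `is_superkey` and `is_candidate_key` are names introduced here.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """A superkey none of whose non-empty proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical instance of customer, extended with a customer_id attribute.
customer = [
    {"customer_id": 1, "customer_name": "Jones", "customer_street": "Main"},
    {"customer_id": 2, "customer_name": "Smith", "customer_street": "North"},
    {"customer_id": 3, "customer_name": "Smith", "customer_street": "Park"},
    {"customer_id": 4, "customer_name": "Curry", "customer_street": "North"},
]

print(is_candidate_key(customer, ["customer_id"]))
print(is_candidate_key(customer, ["customer_name", "customer_street"]))
print(is_candidate_key(customer, ["customer_id", "customer_name"]))
```

In this instance {customer_id, customer_name} identifies every tuple but is not minimal, so it is a superkey without being a candidate key.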

Page 9: Lec02

Keys(contd..)

A relation schema may have an attribute that corresponds to the primary key of another relation. The attribute is called a foreign key.

E.g. the customer_name and account_number attributes of depositor are foreign keys to customer and account respectively.

Only values occurring in the primary key attribute of the referenced relation may occur in the foreign key attribute of the referencing relation.

Page 10: Lec02

Query Languages

A query language is a language in which a user requests information from the database.

Categories of languages:

Procedural

Non-procedural, or declarative

“Pure” languages:

Relational algebra

Tuple relational calculus

Domain relational calculus

The pure languages form the underlying basis of the query languages that people use.

Page 11: Lec02

Relational Algebra

Procedural language

Six basic operators:

select: σ

project: ∏

union: ∪

set difference: –

Cartesian product: x

rename: ρ

The operators take one or two relations as inputs and produce a new relation as a result.

Page 12: Lec02

Banking Example

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower (customer_name, loan_number)

Page 13: Lec02

Select Operation

Notation: σ_p(r)

where p is called the selection predicate

Defined as:

σ_p(r) = {t | t ∈ r and p(t)}

where p is a formula in propositional calculus consisting of terms connected by: ∧ (and), ∨ (or), ¬ (not)

Each term is one of:

<attribute> op <attribute> or <constant>

where op is one of: =, ≠, >, ≥, <, ≤

Example: Find the tuples of the relation borrow where the branch name is Perryridge

σ_{branch_name = “Perryridge”}(borrow)
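Selection amounts to filtering tuples with a predicate. The sketch below assumes a relation is stored as a list of dictionaries keyed by attribute name, and it uses the loan relation (with the values shown later in the outer-join example) rather than borrow. The function name `select` is introduced here.

```python
def select(relation, predicate):
    """sigma_p(r): keep exactly the tuples t of r for which p(t) holds."""
    return [t for t in relation if predicate(t)]

loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]

# sigma_{branch_name = "Perryridge"}(loan)
perryridge = select(loan, lambda t: t["branch_name"] == "Perryridge")

# Terms combined with "and" and "not":
# sigma_{amount > 2000 and not (branch_name = "Downtown")}(loan)
large_not_downtown = select(
    loan,
    lambda t: t["amount"] > 2000 and not t["branch_name"] == "Downtown",
)

print(perryridge)
print(large_not_downtown)
```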

Page 14: Lec02

Project Operation

Notation:

∏_{A1, A2, …, Ak}(r)

where A1, A2, …, Ak are attribute names and r is a relation name.

The result is defined as the relation of k columns obtained by erasing the columns that are not listed

Duplicate rows are removed from the result, since relations are sets

Example: To eliminate the branch_name attribute of borrow

∏_{loan_number, amount, customer_name}(borrow)
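The duplicate-removal step is the part of projection that is easy to overlook. A minimal sketch, assuming list-of-dictionaries relations and using the customer instance from the Relation Instance slide (the function name `project` is introduced here):

```python
def project(relation, attrs):
    """Keep only attrs and drop duplicate rows, since relations are sets."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:          # a repeated row is emitted only once
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

customer = [
    {"customer_name": "Jones", "customer_street": "Main", "customer_city": "Harrison"},
    {"customer_name": "Smith", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Curry", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Lindsay", "customer_street": "Park", "customer_city": "Pittsfield"},
]

# Erasing customer_name makes the Smith and Curry rows identical.
addresses = project(customer, ["customer_street", "customer_city"])
print(addresses)
```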

Page 15: Lec02

Union Operation

Notation: r ∪ s

Defined as:

r ∪ s = {t | t ∈ r or t ∈ s}

For r ∪ s to be valid:

1. r, s must have the same arity (same number of attributes)

2. The attribute domains must be compatible (example: the 2nd column of r deals with the same type of values as does the 2nd column of s)

Example: Find all customers with either an account or a loan

∏_{customer_name}(depositor) ∪ ∏_{customer_name}(borrower)

Page 16: Lec02

Set Difference Operation

Notation: r – s

Defined as:

r – s = {t | t ∈ r and t ∉ s}

Set differences must be taken between compatible relations:

r and s must have the same arity (same number of attributes)

the attribute domains of r and s must be compatible

Example: Find all customers of the bank who have an account but not a loan

∏_{customer_name}(depositor) – ∏_{customer_name}(borrower)
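Union and set difference map directly onto Python's set operators once both sides have been projected to the same single attribute. In this sketch the borrower tuples are the ones shown in the later outer-join example, while the depositor tuples are hypothetical; `project_names` is a helper introduced here.

```python
borrower = {("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")}
depositor = {("Jones", "A-101"), ("Lindsay", "A-222")}   # hypothetical data

def project_names(relation):
    """Projection on customer_name: one-attribute tuples, duplicates merged."""
    return {(name,) for (name, _) in relation}

# Both operands have arity 1 and the same domain, so they are compatible.
account_or_loan = project_names(depositor) | project_names(borrower)
account_but_no_loan = project_names(depositor) - project_names(borrower)

print(sorted(account_or_loan))
print(sorted(account_but_no_loan))
```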

Page 17: Lec02

Cartesian-Product operation

Combines information from any two relations

Notation: r x s

Defined as:

r x s = {t q | t ∈ r and q ∈ s}

Assume that the attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅.)

If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.

Example: Find the names of all customers who have a loan at the Perryridge branch. For this we need information from both the loan and borrower relations, which we start by writing as:

σ_{branch_name = “Perryridge”}(borrower x loan)

This still pairs every borrower tuple with every Perryridge loan. To complete the query, keep only the pairs with matching loan numbers and project on the customer name:

∏_{customer_name}(σ_{borrower.loan_number = loan.loan_number}(σ_{branch_name = “Perryridge”}(borrower x loan)))
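The query on this slide can be sketched as a product followed by a filter. The code assumes list-of-dictionaries relations and resolves the shared attribute loan_number by prefixing every attribute with its relation name, which is one simple form of renaming. The data comes from the later outer-join example; `cartesian_product` and `customers_with_loan_at` are names introduced here.

```python
from itertools import product

def cartesian_product(r, r_name, s, s_name):
    """r x s, with each attribute prefixed by its relation name so that
    the two attribute sets are disjoint."""
    return [
        {
            **{f"{r_name}.{k}": v for k, v in t.items()},
            **{f"{s_name}.{k}": v for k, v in q.items()},
        }
        for t, q in product(r, s)
    ]

borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]

pairs = cartesian_product(borrower, "borrower", loan, "loan")

def customers_with_loan_at(branch):
    """Select on the branch and on matching loan numbers, then project."""
    return {
        row["borrower.customer_name"]
        for row in pairs
        if row["loan.branch_name"] == branch
        and row["borrower.loan_number"] == row["loan.loan_number"]
    }

print(len(pairs))
print(customers_with_loan_at("Downtown"))
print(customers_with_loan_at("Perryridge"))
```

With this sample data no borrower holds the Perryridge loan L-260, so the Perryridge query returns an empty set; without the loan-number condition it would wrongly return every borrower.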

Page 18: Lec02

Rename Operation

Allows us to name, and therefore to refer to, the results of relational-algebra expressions.

Allows us to refer to a relation by more than one name.

Example:

ρ_x(E)

returns the expression E under the name x

If a relational-algebra expression E has arity n, then

ρ_{x(A1, A2, …, An)}(E)

returns the result of expression E under the name x, and with the attributes renamed to A1, A2, …, An.

Page 19: Lec02

Formal Definition

A basic expression in the relational algebra consists of either one of the following:

A relation in the database

A constant relation

Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:

E1 ∪ E2

E1 – E2

E1 x E2

σ_p(E1), where p is a predicate on attributes in E1

∏_s(E1), where s is a list consisting of some of the attributes in E1

ρ_x(E1), where x is the new name for the result of E1

Page 20: Lec02

Set-Intersection Operation

Notation: r ∩ s

Defined as:

r ∩ s = {t | t ∈ r and t ∈ s}

Assume:

r, s have the same arity

attributes of r and s are compatible

Note: r ∩ s = r – (r – s)

Example: Find all customers who have both a loan and an account

∏_{customer_name}(borrower) ∩ ∏_{customer_name}(depositor)
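The identity r ∩ s = r – (r – s) is why intersection is not counted among the basic operators. A quick check with Python sets, where the borrower names follow the later outer-join example and the depositor names are hypothetical:

```python
borrower_names = {"Jones", "Smith", "Hayes"}
depositor_names = {"Jones", "Lindsay"}          # hypothetical data

# Intersection computed directly ...
direct = borrower_names & depositor_names

# ... and using set difference only: r - (r - s).
via_difference = borrower_names - (borrower_names - depositor_names)

print(direct, via_difference)
```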

Page 21: Lec02

Natural-Join Operation

Notation: r ⋈ s

Let r and s be relations on schemas R and S respectively.

Then, r ⋈ s is a relation on schema R ∪ S obtained as follows:

Consider each pair of tuples tr from r and ts from s.

If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where

t has the same value as tr on the attributes of R

t has the same value as ts on the attributes of S

Example: Find the names of all customers who have a loan at the bank, and find the amount of the loan

∏_{customer_name, loan_number, amount}(borrower ⋈ loan)
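The pairwise procedure described above translates almost line for line into code. This sketch assumes list-of-dictionaries relations, so R ∩ S is simply the set of dictionary keys the two tuples share; the data is taken from the later outer-join example and the name `natural_join` is introduced here.

```python
def natural_join(r, s):
    """Combine each pair (tr, ts) that agrees on all shared attributes."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()          # attributes in R ∩ S
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})         # tuple over R ∪ S
    return result

borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]

joined = natural_join(borrower, loan)
for row in joined:
    print(row["customer_name"], row["loan_number"], row["amount"])
```

Hayes (L-155) and loan L-260 have no partner and therefore drop out of the result.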

Page 22: Lec02

Division Operation

Notation: r ÷ s

Suited to queries that include the phrase “for all”.

Example: Find all customers who have an account at all branches located in Brooklyn.

∏_{customer_name, branch_name}(depositor ⋈ account) ÷ ∏_{branch_name}(σ_{branch_city = “Brooklyn”}(branch))

Page 23: Lec02

Assignment Operation

The assignment operation (←) provides a convenient way to express complex queries.

Write a query as a sequential program consisting of:

a series of assignments

followed by an expression whose value is displayed as the result of the query.

Assignment must always be made to a temporary relation variable.

Example: Write r ÷ s as

temp1 ← ∏_{R-S}(r)

temp2 ← ∏_{R-S}((temp1 x s) – ∏_{R-S,S}(r))

result = temp1 – temp2

The result to the right of the ← is assigned to the relation variable on the left of the ←.

May use the variable in subsequent expressions.
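The three-step program for division can be compared against the direct "for all" reading. The sketch below makes simplifying assumptions: r is a set of (customer_name, branch_name) pairs, s is a set of branch names, and the sample data is hypothetical. The function names are introduced here.

```python
from itertools import product

def divide_direct(r, s):
    """x is in r ÷ s when (x, y) is in r for every y in s."""
    candidates = {x for (x, _) in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

def divide_by_assignment(r, s):
    """The same result, following the temp1 / temp2 program step by step."""
    temp1 = {x for (x, _) in r}               # temp1 <- projection of r on R-S
    missing = set(product(temp1, s)) - r      # (temp1 x s) - r: pairs r lacks
    temp2 = {x for (x, _) in missing}         # temp2 <- projection on R-S
    return temp1 - temp2                      # result = temp1 - temp2

# Hypothetical data: who has an account at which branch.
customer_branch = {
    ("Johnson", "Downtown"),
    ("Johnson", "Brighton"),
    ("Smith", "Brighton"),
    ("Hayes", "Downtown"),
}
brooklyn_branches = {"Downtown", "Brighton"}

print(divide_direct(customer_branch, brooklyn_branches))
print(divide_by_assignment(customer_branch, brooklyn_branches))
```

temp2 collects the customers who are missing at least one of the required branches, so subtracting it from temp1 leaves exactly those who have them all.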

Page 24: Lec02

Extended Relational-Algebra Operations

Generalized Projection

Aggregate Functions

Outer Join

Page 25: Lec02

Generalised Projection

Extends the projection operation by allowing arithmetic functions to be used in the projection list.

∏_{F1, F2, …, Fn}(E)

E is any relational-algebra expression

Each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the schema of E.

Given relation credit_info(customer_name, limit, credit_balance), find how much more each person can spend:

∏_{customer_name, limit – credit_balance}(credit_info)
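The credit_info query can be sketched as a comprehension that computes one expression per output column. The rows below are hypothetical sample data, and `available_credit` is a name introduced here.

```python
# Hypothetical instance of credit_info(customer_name, limit, credit_balance).
credit_info = [
    {"customer_name": "Jones", "limit": 6000, "credit_balance": 700},
    {"customer_name": "Smith", "limit": 2000, "credit_balance": 400},
    {"customer_name": "Hayes", "limit": 1500, "credit_balance": 1500},
]

# Generalized projection on (customer_name, limit - credit_balance).
# A set is used so that duplicate result tuples collapse, as in a relation.
available_credit = {
    (t["customer_name"], t["limit"] - t["credit_balance"])
    for t in credit_info
}

print(sorted(available_credit))
```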

Page 26: Lec02

Aggregate Functions and Operations

An aggregation function takes a collection of values and returns a single value as a result.

avg: average value

min: minimum value

max: maximum value

sum: sum of values

count: number of values

Aggregate operation in relational algebra

_{G1, G2, …, Gn} g_{F1(A1), F2(A2), …, Fn(An)}(E)

E is any relational-algebra expression

G1, G2, …, Gn is a list of attributes on which to group (can be empty)

Each Fi is an aggregate function

Each Ai is an attribute name

Example: Find the total sum of salaries of all part-time employees in the bank

g_{sum(salary)}(pt_works)
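Grouping followed by aggregation can be sketched with a dictionary of groups. The code handles a single aggregate function for brevity, whereas the general form above allows several. The schema pt_works(employee_name, branch_name, salary) and its rows are assumptions made for this example, and `aggregate` is a name introduced here.

```python
from collections import defaultdict

def aggregate(relation, group_attrs, func, attr):
    """Group tuples by group_attrs, then apply func to attr in each group.
    An empty group_attrs list puts every tuple into one group, keyed ()."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[g] for g in group_attrs)].append(t[attr])
    return {key: func(values) for key, values in groups.items()}

# Hypothetical instance of pt_works.
pt_works = [
    {"employee_name": "Adams", "branch_name": "Perryridge", "salary": 1500},
    {"employee_name": "Brown", "branch_name": "Perryridge", "salary": 1300},
    {"employee_name": "Gopal", "branch_name": "Perryridge", "salary": 5300},
    {"employee_name": "Johnson", "branch_name": "Downtown", "salary": 1500},
]

# g_{sum(salary)}(pt_works): no grouping attributes.
total = aggregate(pt_works, [], sum, "salary")

# The same aggregate, grouped by branch_name.
per_branch = aggregate(pt_works, ["branch_name"], sum, "salary")

print(total)
print(per_branch)
```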

Page 27: Lec02

Outer Join

An extension of the join operation to deal with missing information

There are three forms of the operation: left outer join, right outer join, and full outer join

All three forms compute the join and then add tuples from one relation that do not match tuples in the other relation to the result of the join.

Left outer join takes all tuples in the left relation that did not match any tuple in the right relation, pads them with null values for all other attributes from the right relation, and adds them to the result of the natural join.

Right outer join takes all tuples in the right relation that did not match any tuple in the left relation, pads them with null values for all other attributes from the left relation, and adds them to the result of the natural join.

Full outer join does both: it pads the unmatched tuples from the left relation as well as the unmatched tuples from the right relation, and adds them to the result of the natural join.

Page 28: Lec02

Outer Join

Example: Consider two relations, loan and borrower

loan relation

loan_number  branch_name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700

borrower relation

customer_name  loan_number
Jones          L-170
Smith          L-230
Hayes          L-155

Join: loan ⋈ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith

Page 29: Lec02

Outer Join

Left Outer Join: loan ⟕ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null

Right Outer Join: loan ⟖ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes

Full Outer Join: loan ⟗ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
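The three outer joins can be sketched on top of a natural join by padding the unmatched tuples. This is an illustration rather than a definitive implementation: relations are lists of dictionaries, Python's `None` stands in for null, and the attribute lists used for padding are passed in explicitly. All function names are introduced here.

```python
def matches(tr, ts):
    """True if the two tuples agree on every attribute they share."""
    return all(tr[a] == ts[a] for a in tr.keys() & ts.keys())

def natural_join(r, s):
    return [{**tr, **ts} for tr in r for ts in s if matches(tr, ts)]

def left_outer_join(r, s, s_attrs):
    """Natural join plus the unmatched tuples of r, padded with None."""
    padding = dict.fromkeys(s_attrs)            # every s attribute -> None
    unmatched = [
        {**padding, **tr} for tr in r
        if not any(matches(tr, ts) for ts in s)
    ]
    return natural_join(r, s) + unmatched

def right_outer_join(r, s, r_attrs):
    """Mirror image of the left outer join."""
    return left_outer_join(s, r, r_attrs)

def full_outer_join(r, s, r_attrs, s_attrs):
    """Left outer join plus the unmatched tuples of s, padded with None."""
    padding = dict.fromkeys(r_attrs)
    unmatched_s = [
        {**padding, **ts} for ts in s
        if not any(matches(tr, ts) for tr in r)
    ]
    return left_outer_join(r, s, s_attrs) + unmatched_s

loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]
loan_attrs = ["loan_number", "branch_name", "amount"]
borrower_attrs = ["customer_name", "loan_number"]

left = left_outer_join(loan, borrower, borrower_attrs)
right = right_outer_join(loan, borrower, loan_attrs)
full = full_outer_join(loan, borrower, loan_attrs, borrower_attrs)

for row in full:
    print(row)
```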

Page 30: Lec02

Null Values

It is possible for tuples to have a null value, denoted by null, for some of their attributes

null signifies an unknown value or that a value does not exist.

The result of any arithmetic expression involving null is null.

Aggregate functions simply ignore null values (as in SQL)

For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same (as in SQL)

Page 31: Lec02

Comparisons with null values return the special truth value: unknown

If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5

Three-valued logic using the truth value unknown:

OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown

AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

NOT: (not unknown) = unknown

In SQL, “P is unknown” evaluates to true if predicate P evaluates to unknown

The result of a select predicate is treated as false if it evaluates to unknown
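The truth tables above can be sketched in a few lines, assuming Python's `None` plays the role of both null and unknown. The names `and3`, `or3`, `not3`, `less_than` and `select`, and the two sample rows, are introduced here for illustration.

```python
def and3(a, b):
    if a is False or b is False:
        return False                 # false dominates, even next to unknown
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:
        return True                  # true dominates, even next to unknown
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else not a

def less_than(value, constant):
    """A comparison involving null yields unknown."""
    return None if value is None else value < constant

def select(relation, predicate):
    """Selection keeps a tuple only when the predicate is definitely true."""
    return [t for t in relation if predicate(t) is True]

# Hypothetical rows: the second amount is null.
loan = [
    {"loan_number": "L-170", "amount": 3000},
    {"loan_number": "L-155", "amount": None},
]

small = select(loan, lambda t: less_than(t["amount"], 5000))
not_small = select(loan, lambda t: not3(less_than(t["amount"], 5000)))

print(small)
print(not_small)
```

The tuple with a null amount appears in neither result, because the predicate and its negation both evaluate to unknown for it.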

Page 32: Lec02

Modification of Database

The content of the database may be modified using the following operations:

Deletion

Insertion

Updating

All these operations are expressed using the assignment operator.

Page 33: Lec02

Deletion

A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.

Can delete only whole tuples; cannot delete values on only particular attributes

A deletion is expressed in relational algebra by:

r ← r – E

where r is a relation and E is a relational algebra query.

Example: Delete all account records in the Perryridge branch.

account ← account – σ_{branch_name = “Perryridge”}(account)

Page 34: Lec02

Insertion

To insert data into a relation, we either:

specify a tuple to be inserted

write a query whose result is a set of tuples to be inserted

In relational algebra, an insertion is expressed by:

r ← r ∪ E

where r is a relation and E is a relational algebra expression.

The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.

Example: Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.

account ← account ∪ {(“A-973”, “Perryridge”, 1200)}

depositor ← depositor ∪ {(“Smith”, “A-973”)}

Page 35: Lec02

Updating

A mechanism to change a value in a tuple without changing all values in the tuple

Use the generalized projection operator to do this task

r ← ∏_{F1, F2, …, Fl}(r)

Each Fi is either

the i-th attribute of r, if the i-th attribute is not updated, or,

if the attribute is to be updated, an expression, involving only constants and the attributes of r, which gives the new value for the attribute

Example: Make interest payments by increasing all balances by 5 percent

account ← ∏_{account_number, branch_name, balance * 1.05}(account)
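Deletion, insertion and updating can all be sketched as assignments of a new set to a relation variable. The starting contents of account and depositor are hypothetical, each step is stored under its own name so the intermediate states can be inspected, and balances are rounded to two decimals only to keep the floating-point arithmetic tidy.

```python
# Hypothetical starting instances; tuples follow the schemas
# account(account_number, branch_name, balance) and
# depositor(customer_name, account_number).
account = {
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
}
depositor = {("Hayes", "A-102")}

# Deletion: account <- account - (selection of the Perryridge tuples)
after_delete = account - {t for t in account if t[1] == "Perryridge"}

# Insertion: r <- r ∪ E, where E is a constant relation with one tuple
after_insert = after_delete | {("A-973", "Perryridge", 1200)}
depositor_after_insert = depositor | {("Smith", "A-973")}

# Updating: generalized projection that rewrites only the balance attribute
after_update = {
    (number, branch, round(balance * 1.05, 2))
    for (number, branch, balance) in after_insert
}

print(sorted(after_update))
```

Every step builds a whole new relation and rebinds a name to it, which mirrors how the slides express each modification with the assignment operator.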