lec02
DESCRIPTION
Relational AlgebraTRANSCRIPT
15/2/2012DB2 Dr.Suresh Sankaranarayanan 1
DB2 Database Management Systems
Relational Algebra
15/2/2012
DB2 Dr.Suresh Sankaranarayanan
2
Textbook Used
Database System Concepts by Abraham Database System Concepts by Abraham Silberschatz, 5Silberschatz, 5thth Edition Edition
15/2/2012DB2 Dr.Suresh Sankaranarayanan 3
Outline
RelationRelation
KeysKeys
Relational Algebra operationsRelational Algebra operations
Additional Relational-Algebra operationsAdditional Relational-Algebra operations
Extended Relational –Algebra OperationsExtended Relational –Algebra Operations
Modification of DatabaseModification of Database
Basic Structure of Relation
Let DLet D11, , DD22, …. , …. DDnn a relation a relation r r is a subset of is a subset of
DD11 x x DD2 2 x … x x … x DDnn
Thus, a relation is a set of Thus, a relation is a set of nn-tuples (-tuples (aa11,, a a22, …, , …, aann) where each ) where each aaii DDii
Example: IfExample: If
customer_name = {Jones, Smith, Curry, Lindsay, …}
/* Set of all customer names */
customer_street = {Main, North, Park, …} /* set of all street names*/
customer_city = {Harrison, Rye, Pittsfield, …} /* set of all city names */
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) }
is a relation over
customer_name x customer_street x customer_citycustomer_name x customer_street x customer_city
15/2/2012DB2 Dr.Suresh Sankaranarayanan 4
Relation Schema
AA11, , AA22, …, , …, AAnn are are attributesattributes
RR = ( = (AA11, , AA22, …, , …, AAnn ) is a ) is a relation schemarelation schema
Example:Example:
Customer_schemaCustomer_schema = ( = (customer_name, customer_street, customer_name, customer_street,
customer_citycustomer_city))
rr((RR) denotes a ) denotes a relationrelation rr on the on the relation schema Rrelation schema R
Example:Example:
customer (Customer_schema)customer (Customer_schema)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 5
Relation Instance
The current values (The current values (relation instancerelation instance) of a relation are ) of a relation are specified by a tablespecified by a table
An element An element tt of of rr is a is a tupletuple, represented by a , represented by a row row in a in a tabletable
15/2/2012DB2 Dr.Suresh Sankaranarayanan 6
JonesSmithCurryLindsay
customer_name
MainNorthNorthPark
customer_street
HarrisonRyeRyePittsfield
customer_city
customer
attributes(or columns)
tuples(or rows)
Keys Values of the attribute must be such that they can uniquely identify Values of the attribute must be such that they can uniquely identify
the tuple.the tuple.
Based on that there are three keys defined :Based on that there are three keys defined :
a. Superkeya. Superkey
b. Candidate Keyb. Candidate Key
c. Primary key.c. Primary key.
Superkey Superkey
Set of one or more attributes taken collectively to identify uniquely a Set of one or more attributes taken collectively to identify uniquely a tuple in a relationtuple in a relation
Example: Customer_id attribute of relation customer is sufficient to Example: Customer_id attribute of relation customer is sufficient to distinguish one customer tuple from another. Customer_id is distinguish one customer tuple from another. Customer_id is superkey.superkey.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 7
Keys(contd..)
Candidate key Candidate key
Several distinct set of attributes serve as candidate keySeveral distinct set of attributes serve as candidate key
Example: Combination of {customer_name, customer_street} is Example: Combination of {customer_name, customer_street} is sufficient to distinguish among members of relation customer. So sufficient to distinguish among members of relation customer. So both customer_id and {Customer_name , customer_street} form both customer_id and {Customer_name , customer_street} form candidate keys.candidate keys.
Primary key: Primary key:
Candidate key chosen as the principal means of identifying tuples Candidate key chosen as the principal means of identifying tuples
within a relation. Should choose an attribute whose value never, or within a relation. Should choose an attribute whose value never, or
very rarely, changes.very rarely, changes.
Example: email address is unique, but may change
15/2/2012DB2 Dr.Suresh Sankaranarayanan 8
Keys(contd..)
A relation schema may have an attribute that corresponds to the primary A relation schema may have an attribute that corresponds to the primary key of another relation. The attribute is called a foreign key.key of another relation. The attribute is called a foreign key. E.g. customer_name and account_number attributes of depositor are
foreign keys to customer and account respectively. Only values occurring in the primary key attribute of the referenced
relation may occur in the foreign key attribute of the referencing relation.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 9
Query Languages
Language in which user requests information from the database.Language in which user requests information from the database.
Categories of languagesCategories of languages Procedural Non-procedural, or declarative
““Pure” languages:Pure” languages: Relational algebra Tuple relational calculus Domain relational calculus
Pure languages form underlying basis of query languages that people use.Pure languages form underlying basis of query languages that people use.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 10
Relational Algebra
Procedural languageProcedural language
Six basic operatorsSix basic operators
select: project: union: set difference: – Cartesian product: x rename:
The operators take one or two relations as inputs and produce a new The operators take one or two relations as inputs and produce a new relation as a result.relation as a result.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 11
Banking Example
branch (branch_name, branch_city, assets)branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer (customer_name, customer_street, customer_city)customer_city)
account (account_number, branch_name, balance)account (account_number, branch_name, balance)
loan (loan_number, branch_name, amount)loan (loan_number, branch_name, amount)
depositor (customer_name, account_number)depositor (customer_name, account_number)
borrower (customer_name, loan_number)borrower (customer_name, loan_number)
15/2/2012 12DB2 Dr.Suresh Sankaranarayanan
Select Operation
Notation: Notation: pp((rr))
where pwhere p is called the selection predicate is called the selection predicate
Defined as:Defined as:
pp((rr) = {) = {tt | | tt rr and and p(t)p(t)}}
WhereWhere p p is a formula in propositional calculus consisting of terms is a formula in propositional calculus consisting of terms connected by : connected by : (and), (and), (or), (or), (not) (not)Each term is one of:Each term is one of:
<attribute><attribute>opop <attribute> or <constant><attribute> or <constant>
where where opop is one of: =, is one of: =, , >, , >, . <. . <.
Example : Find tuples of the relation borrow where branch name is Example : Find tuples of the relation borrow where branch name is perryridgeperryridge branch_name=“Perryridge”branch_name=“Perryridge”((borrowborrow))
15/2/2012DB2 Dr.Suresh Sankaranarayanan 13
Project Operation
Notation:Notation:
∏∏A1 , A2,…….. Ak A1 , A2,…….. Ak (r)(r)
where where AA11, A, A22 are attribute names and are attribute names and rr is a relation name. is a relation name.
The result is defined as the relation of The result is defined as the relation of kk columns obtained columns obtained by erasing the columns that are not listedby erasing the columns that are not listed
Duplicate rows removed from result, since relations are Duplicate rows removed from result, since relations are setssets
Example: To eliminate the Example: To eliminate the branch_namebranch_name attribute of attribute of borrowborrow
loan_number, amount, customer_nameloan_number, amount, customer_name ( (borrowborrow) )
15/2/2012DB2 Dr.Suresh Sankaranarayanan 14
Union Operation
Notation: Notation: rr ss
Defined as: Defined as:
rr ss = { = {tt | | tt rr or or t t ss}}
For For rr ss to be valid. to be valid.
1. 1. r,r, ss must have the must have the same same arity (same number of attributes)arity (same number of attributes)
2. The attribute domains must be compatible (example: 22. The attribute domains must be compatible (example: 2ndnd column column of of rr deals with the same type of values as does the 2 deals with the same type of values as does the 2nd nd
column of column of ss))
Example: Find all customers with either an account or a loanExample: Find all customers with either an account or a loan
customer_namecustomer_name ( (depositordepositor) ) customer_namecustomer_name ( (borrower)borrower)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 15
Set Difference Operation
Notation Notation r – sr – s
Defined as:Defined as:
r – sr – s = { = {tt | | tt rr and t and t ss}}
Set differences must be taken between compatible relations.Set differences must be taken between compatible relations.
r and s must have the same arity (same number of attributes) attribute domains of r and s must be compatible
Example: Find all customers of a bank who have an account but not loan customer_name (depositor) - customer_name (borrower)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 16
Cartesian-Product operation
Combines information from any two relationsCombines information from any two relations
NotationNotation r r xx s s
Defined as:Defined as:
rr x x ss = { = {t q t q || t t r r and and q q ss}}
Assume that attributes of r(R) and s(S) are disjoint. (That is, Assume that attributes of r(R) and s(S) are disjoint. (That is, RR S S = = ).).
If attributes of If attributes of r(R)r(R) and and s(Ss(S) are not disjoint, then renaming must be ) are not disjoint, then renaming must be used.used.
Example: Find names of all customers who have loan at perryridge Example: Find names of all customers who have loan at perryridge branch. For this we need information from both loan and borrower branch. For this we need information from both loan and borrower relation which we write as :relation which we write as :
branch_name=“Perryridge”branch_name=“Perryridge”((borrower X loanborrower X loan))
15/2/2012DB2 Dr.Suresh Sankaranarayanan 17
Rename Operation
Allows us to name, and therefore to refer to, the results of relational-Allows us to name, and therefore to refer to, the results of relational-algebra expressions.algebra expressions.
Allows us to refer to a relation by more than one name.Allows us to refer to a relation by more than one name.
Example:Example:
xx ( (EE))
returns the expression returns the expression EE under the name under the name XX
If a relational-algebra expression If a relational-algebra expression EE has arity has arity nn, then , then
returns the result of expression returns the result of expression EE under the name under the name XX, and with the, and with the
attributes renamed to attributes renamed to AA1 1 , A, A2 2 , …., A, …., An n ..
15/2/2012DB2 Dr.Suresh Sankaranarayanan 18
)(),...,,( 21E
nAAAx
Formal Definition
A basic expression in the relational algebra consists of either one of the A basic expression in the relational algebra consists of either one of the following:following:
A relation in the database
A constant relation
Let Let EE11 and and EE22 be relational-algebra expressions; the following are all be relational-algebra expressions; the following are all
relational-algebra expressions:relational-algebra expressions:
E1 E2
E1 – E2
E1 x E2
p (E1), P is a predicate on attributes in E1
s(E1), S is a list consisting of some of the attributes in E1
x (E1), x is the new name for the result of E1
15/2/2012DB2 Dr.Suresh Sankaranarayanan 19
Set-Intersection Operation
Notation: Notation: rr ss
Defined as:Defined as:
rr ss = { = { t t | | tt rr and and tt ss } }
Assume: Assume: r, s have the same arity attributes of r and s are compatible
Note: Note: rr ss = = rr – ( – (rr – – ss))
Example: Find all customers who have both loan and Example: Find all customers who have both loan and accountaccount
customer_namecustomer_name((borrowerborrower) ) customer_namecustomer_name((depositordepositor) )
15/2/2012DB2 Dr.Suresh Sankaranarayanan 20
Natural-Join Operation
Notation: r sNotation: r s
Let Let rr and and ss be relations on schemas be relations on schemas RR and and SS respectively. respectively.
Then, r s is a relation on schema Then, r s is a relation on schema R R SS obtained as follows: obtained as follows:
Consider each pair of tuples tr from r and ts from s.
If tr and ts have the same value on each of the attributes in R S, add a
tuple t to the result, where
t has the same value as tr on r
t has the same value as ts on s
Example: Find names of all customers who have loan at the bank and find the Example: Find names of all customers who have loan at the bank and find the amount of the loanamount of the loan
customer_name, loan_number ,customer_name, loan_number , amount (borrower loan)amount (borrower loan)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 21
Division Operation
Notation: Notation:
Suited to queries that include the phrase “for all”.Suited to queries that include the phrase “for all”.
Example: Find all customers who have account at all branches located in Example: Find all customers who have account at all branches located in Brooklyn.Brooklyn.
customer_name, branch_namecustomer_name, branch_name ((depositordepositor accountaccount)) branch_name branch_name ((branch_citybranch_city = “Brooklyn” = “Brooklyn” ((branchbranch))))
15/2/2012DB2 Dr.Suresh Sankaranarayanan 22
Assignment Operation
The assignment operation (The assignment operation () provides a convenient way to express complex queries.) provides a convenient way to express complex queries.
Write query as a sequential program consisting ofWrite query as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as a result of the query.
Assignment must always be made to a temporary relation variable.
Example: Write Example: Write rr ss as as
temp1temp1 R-SR-S ( (r r ) )
temp2temp2 R-SR-S (( ((temp1temp1 x x s s ) – ) – R-S,S R-S,S ((r r ))))
resultresult = = temp1temp1 – – temp2 temp2
The result to the right of the is assigned to the relation variable on the left of the
.
May use variable in subsequent expressions.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 23
Extended Relation Algebra Operations
Generalized ProjectionGeneralized Projection
Aggregate FunctionsAggregate Functions
Outer JoinOuter Join
15/2/2012DB2 Dr.Suresh Sankaranarayanan 24
Generalised Projection
Extends the projection operation by allowing arithmetic Extends the projection operation by allowing arithmetic functions to be used in the projection list.functions to be used in the projection list.
EE is any relational-algebra expression is any relational-algebra expression
Each of Each of FF11, , FF22, …, , …, FFnn are are arithmetic expressions are are arithmetic expressions
involving constants and attributes in the schema of involving constants and attributes in the schema of EE..
Given relation Given relation credit_info(customer_name, limit, credit_info(customer_name, limit, credit_balance),credit_balance), find how much more each person can find how much more each person can spend: spend:
customer_name, limit – credit_balancecustomer_name, limit – credit_balance (credit_info) (credit_info)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 25
)( ,...,,21
EnFFF
Aggregate Functions and Operations
Aggregation function takes a collection of values and returns a single value Aggregation function takes a collection of values and returns a single value as a result.as a result.
avg: average valueavg: average valuemin: minimum valuemin: minimum valuemax: maximum valuemax: maximum valuesum: sum of valuessum: sum of valuescount: number of valuescount: number of values
Aggregate operation in relational algebra Aggregate operation in relational algebra
EE is any relational-algebra expression is any relational-algebra expression
G1, G2 …, Gn is a list of attributes on which to group (can be empty)
Each Fi is an aggregate function
Each Ai is an attribute name
Example: Find total sum of salaries of all part-time employees in bank
g sum(salary) (pt_works)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 26
)()(,,(),(,,, 221121E
nnn AFAFAFGGG
Outer Join
Extension of join operation to deal with missing informationExtension of join operation to deal with missing information
There are actually three forms of operation : Left outer join, Right There are actually three forms of operation : Left outer join, Right outer join, Full outer joinouter join, Full outer join
All three forms of join computes the join and then adds tuples form All three forms of join computes the join and then adds tuples form one relation that does not match tuples in the other relation to the one relation that does not match tuples in the other relation to the result of the join. result of the join.
Left outer join takes all tuples in the left relation that did not match Left outer join takes all tuples in the left relation that did not match with any tuple in the right relation, pads the tuples with null values with any tuple in the right relation, pads the tuples with null values for all other attributes from the right relation and adds them to the for all other attributes from the right relation and adds them to the results of the natural join.results of the natural join.
Right outer join takes all tuples in the right relation that did not Right outer join takes all tuples in the right relation that did not match with any tuples in the left relation, pads the tuples with null match with any tuples in the left relation, pads the tuples with null values for all other attributes from the left relation and adds them to values for all other attributes from the left relation and adds them to the results of the natural join.the results of the natural join.
Full outer join is a hybrid of both left and right outer join.Full outer join is a hybrid of both left and right outer join.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 27
Outer Join
Example: Consider two relations borrower and loanExample: Consider two relations borrower and loan
15/2/2012DB2 Dr.Suresh Sankaranarayanan 28
300040001700
loan_number amount
L-170L-230L-260
branch_name
DowntownRedwoodPerryridge
Loan Relation
customer_name loan_number
JonesSmithHayes
L-170L-230L-155
borrower Relation
loan_number amount
L-170L-230
30004000
customer_name
JonesSmith
branch_name
DowntownRedwood
Join: loan borrower
Outer Join
15/2/2012DB2 Dr.Suresh Sankaranarayanan 29
Left Outer JoinLoan borrowerJones
Smithnull
loan_number amount
L-170L-230L-260
300040001700
customer_namebranch_name
DowntownRedwoodPerryridge
loan_number amount
L-170L-230L-155
30004000null
customer_name
JonesSmithHayes
branch_name
DowntownRedwoodnull
Right Outer JoinLoan borrower
loan_number amount
L-170L-230L-260L-155
300040001700null
customer_name
JonesSmithnullHayes
branch_name
DowntownRedwoodPerryridgenull
Full Outer JoinLoan borrower
Null Values
It is possible for tuples to have a null value, denoted by It is possible for tuples to have a null value, denoted by nullnull, for some , for some
of their attributesof their attributes
nullnull signifies an unknown value or that a value does not exist. signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving The result of any arithmetic expression involving nullnull is is null.null.
Aggregate functions simply ignore null values (as in SQL)Aggregate functions simply ignore null values (as in SQL)
For duplicate elimination and grouping, null is treated like any other For duplicate elimination and grouping, null is treated like any other
value, and two nulls are assumed to be the same (as in SQL)value, and two nulls are assumed to be the same (as in SQL)
15/2/2012DB2 Dr.Suresh Sankaranarayanan 30
Comparisons with null values return the special truth value: Comparisons with null values return the special truth value: unknownunknown
If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5
Three-valued logic using the truth value Three-valued logic using the truth value unknownunknown::
OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown
AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
NOT: (not unknown) = unknown
In SQL “P is unknown” evaluates to true if predicate P evaluates to unknown
Result of select predicate is treated as Result of select predicate is treated as false false if it evaluates to if it evaluates to unknownunknown
15/2/2012DB2 Dr.Suresh Sankaranarayanan 31
Modification of Database
The content of the database may be modified using the The content of the database may be modified using the following operations:following operations: Deletion Insertion Updating
All these operations are expressed using the assignment All these operations are expressed using the assignment operator.operator.
15/2/2012DB2 Dr.Suresh Sankaranarayanan 32
Deletion
A delete request is expressed similarly to a query, except A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected instead of displaying tuples to the user, the selected tuples are removed from the database.tuples are removed from the database.
Can delete only whole tuples; cannot delete values on Can delete only whole tuples; cannot delete values on only particular attributesonly particular attributes
A deletion is expressed in relational algebra by:A deletion is expressed in relational algebra by:
rr rr – – EE
where where rr is a relation and is a relation and EE is a relational algebra query. is a relational algebra query.
Example: Delete all account records in the Perryridge Example: Delete all account records in the Perryridge branch.branch.
account account account account – – branch_name = “Perryridge”branch_name = “Perryridge” ((account account ))
15/2/2012DB2 Dr.Suresh Sankaranarayanan 33
Insertion
To insert data into a relation, we either:To insert data into a relation, we either:
specify a tuple to be inserted
write a query whose result is a set of tuples to be inserted
in relational algebra, an insertion is expressed by:in relational algebra, an insertion is expressed by:
r r r r EE
where where rr is a relation and is a relation and EE is a relational algebra expression. is a relational algebra expression.
The insertion of a single tuple is expressed by letting The insertion of a single tuple is expressed by letting EE be a constant be a constant relation containing one tuple. relation containing one tuple.
Example:Example:Insert information in the database specifying that Smith has Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.$1200 in account A-973 at the Perryridge branch.
account account account account {(“A-973”, “Perryridge”, 1200)} {(“A-973”, “Perryridge”, 1200)}
depositor depositor depositor depositor {(“Smith”, “A-973”)} {(“Smith”, “A-973”)}
15/2/2012DB2 Dr.Suresh Sankaranarayanan 34
Updating
A mechanism to change a value in a tuple without charging A mechanism to change a value in a tuple without charging allall values in the values in the tupletuple
Use the generalized projection operator to do this taskUse the generalized projection operator to do this task
Each Each FFii is either is either
the I th attribute of r, if the I th attribute is not updated, or, if the attribute is to be updated Fi is an expression, involving only
constants and the attributes of r, which gives the new value for the attribute
Example: Make interest payments by increasing all balances by 5 percent
account account account_numberaccount_number, , branch_namebranch_name, , balance balance * 1.05* 1.05 ((accountaccount))
15/2/2012DB2 Dr.Suresh Sankaranarayanan 35
)(,,,, 21rr
lFFF