
Post on 05-Jan-2016


Page 1: Database Systems Relational Algebra assoc. prof., dr. Vladimir Dimitrov e-mail: cht@fmi.uni-sofia.bg web: is.fmi.uni-sofia.bg


Contents

An Example Database Schema
An Algebra of Relational Operations
  Basics of Relational Algebra
  Set Operations on Relations
  Projection
  Selection
  Cartesian Product
  Natural Joins
  Theta-Joins
  Combining Operations to Form Queries
  Renaming
  Dependent and Independent Operations
  A Linear Notation for Algebraic Expressions
Relational Operations on Bags
  Why Bags?
  Union, Intersection, and Difference of Bags
  Projection of Bags
  Selection on Bags
  Product of Bags
  Joins of Bags
Extended Operators of Relational Algebra
  Duplicate Elimination
  Aggregation Operators
  Grouping
  The Grouping Operator
  Extending the Projection Operator
  The Sorting Operator
  Outerjoins
Constraints on Relations
  Relational Algebra as a Constraint Language
  Referential Integrity Constraints
  Additional Constraint Examples
Summary

Relational Algebra

We begin a study of database programming, that is, how the user can ask queries of the database and can modify the contents of the database. Our focus is on the relational model, and in particular on a notation for describing queries about the content of relations called "relational algebra."

While ODL uses methods that, in principle, can perform any operation on data, and the E/R model does not embrace a specific way of manipulating data, the relational model has a concrete set of "standard" operations on data. Surprisingly, these operations are not "Turing complete" the way ordinary programming languages are. Thus, there are operations we cannot express in relational algebra that could be expressed, for instance, in ODL methods written in C++. This situation is not a defect of the relational model or relational algebra, because the advantage of limiting the scope of operations is that it becomes possible to optimize queries written in a very high-level language such as SQL.

We begin by introducing the operations of relational algebra. This algebra formally applies to sets of tuples, i.e., relations. However, commercial DBMS's use a slightly different model of relations, which are bags, not sets. That is, relations in practice may contain duplicate tuples. While it is often useful to think of relational algebra as a set algebra, we also need to be conscious of the effects of duplicates on the results of the operations in relational algebra. In the final section of this chapter, we consider the matter of how constraints on relations can be expressed.

Later we will see the languages and features that today's commercial DBMS's offer the user. The operations of relational algebra are all implemented by the SQL query language. These algebraic operations also appear in the OQL language, an object-oriented query language based on the ODL data model.

An Example Database Schema

As we begin our focus on database programming in the relational model, it is useful to have a specific schema on which to base our examples of queries. Our chosen database schema draws upon the running example of movies, stars, and studios, and it uses normalized relations similar to the ones that we have seen so far. However, it includes some attributes that we have not used previously in examples, and it includes one relation, MovieExec, that has not appeared before. The purpose of these changes is to give us some opportunities to study different data types and different ways of representing information.

Example database schema about movies

Movie(TITLE: string, YEAR: integer, length: integer, inColor: boolean, studioName: string, producerC#: integer)

StarsIn(MOVIETITLE: string, MOVIEYEAR: integer, STARNAME: string)

MovieStar(NAME: string, address: string, gender: char, birthdate: date)

MovieExec(name: string, address: string, CERT#: integer, netWorth: integer)

Studio(NAME: string, address: string, presC#: integer)

An Example Database Schema

Our schema has five relations. The attributes of each relation are listed, along with the intended domain for that attribute. The key attributes for a relation are shown in capitals, although when we refer to them in text, they will be lower-case as they have been heretofore. For instance, all three attributes together form the key for relation StarsIn. Relation Movie has six attributes; title and year together constitute the key for Movie, as they have previously. Attribute title is a string, and year is an integer.

The major modifications to the schema compared with what we have seen so far are:

• There is a notion of a certificate number for movie executives (studio presidents and movie producers). This certificate is a unique integer that we imagine is maintained by some external authority, perhaps a registry of executives or a "union."

• We use certificate numbers as the key for movie executives, although movie stars do not always have certificates and we shall continue to use name as the key for stars. That decision is probably unrealistic, since two stars could have the same name, but we take this road in order to illustrate some different options.

• We introduced the producer as another property of movies. This information is represented by a new attribute, producerC#, of relation Movie. This attribute is intended to be the certificate number of the producer. Producers are expected to be movie executives, as are studio presidents. There may also be other executives in the MovieExec relation.

• Attribute filmType of Movie has been changed from an enumerated type to a boolean-valued attribute called inColor: true if the movie is in color and false if it is in black and white.

• The attribute gender has been added for movie stars. Its type is "character," either M for male or F for female. Attribute birthdate, of type "date" (a special type supported by many commercial database systems, or just a character string if we prefer), has also been added.

• All addresses have been made strings, rather than pairs consisting of a street and city. The purpose is to make addresses in different relations easily comparable and to simplify operations on addresses.

An Algebra of Relational Operations

To begin our study of operations on relations, we shall learn about a special algebra, called relational algebra, that consists of some simple but powerful ways to construct new relations from given relations. When the given relations are stored data, then the constructed relations can be answers to queries about this data.

The development of an algebra for relations has a history, which we shall follow roughly in our presentation. Initially, relational algebra was proposed by E. F. Codd as an algebra on sets of tuples (i.e., relations) that could be used to express typical queries about those relations. It consisted of five operations on sets: union, set difference, and Cartesian product, with which you might already be familiar, and two unusual operations, selection and projection. To these, several operations that can be defined in terms of the original five were added; varieties of "join" are the most important.

When DBMS's that used the relational model were first developed, their query languages largely implemented the relational algebra. However, for efficiency purposes, these systems regarded relations as bags, not sets. That is, unless the user asked explicitly that duplicate tuples be condensed into one (i.e., that "duplicates be eliminated"), relations were allowed to contain duplicates. Thus, we shall study the same relational operations on bags and see the changes necessary.

Another change to the algebra that was necessitated by commercial implementations of the relational model is that several other operations are needed. Most important is a way of performing aggregation, e.g., finding the average value of some column of a relation. We shall study these additional operations.

Why Bags Can Be More Efficient Than Sets

As a simple example of why bags can lead to implementation efficiency, if you take the union of two relations but do not eliminate duplicates, then you can just copy the relations to the output. If you insist that the result be a set, you have to sort the relations, or do something similar to detect identical tuples that come from the two relations.
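The cost difference described above can be sketched in Python. This is a minimal illustration, not how any particular DBMS implements union: the function names `bag_union` and `set_union` and the sample tuples are assumptions made for the example, and duplicate detection uses hashing rather than sorting.

```python
def bag_union(r, s):
    """Bag union: copy both inputs to the output, duplicates included."""
    return list(r) + list(s)


def set_union(r, s):
    """Set union: each tuple is checked against those already emitted."""
    seen = set()
    out = []
    for t in list(r) + list(s):
        if t not in seen:  # duplicate detection (hashing here; sorting also works)
            seen.add(t)
            out.append(t)
    return out


# Hypothetical sample data: the tuple (1, 2) occurs in both inputs.
r = [(1, 2), (3, 4)]
s = [(1, 2), (5, 6)]

bag_result = bag_union(r, s)
set_result = set_union(r, s)
print(bag_result)
print(set_result)
```

The bag version does no comparisons at all, while the set version needs extra memory (or a sort) to recognize that `(1, 2)` has already been produced.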

Basics of Relational Algebra

An algebra, in general, consists of operators and atomic operands. For instance, in the algebra of arithmetic, the atomic operands are variables like x and constants like 15. The operators are the usual arithmetic ones: addition, subtraction, multiplication, and division. Any algebra allows us to build expressions by applying operators to atomic operands and/or other expressions of the algebra. Usually, parentheses are needed to group operators and their operands. For instance, in arithmetic we have expressions such as (x + y) * z or ((x + 7)/(y - 3)) + x.

Relational algebra is another example of an algebra. Its atomic operands are:

1. Variables that stand for relations.
2. Constants, which are finite relations.

Basics of Relational Algebra

As we mentioned, in the classical relational algebra, all operands and the results of expressions are sets. The operations of the traditional relational algebra fall into four broad classes:

a) The usual set operations (union, intersection, and difference) applied to relations.

b) Operations that remove parts of a relation: "selection" eliminates some rows (tuples), and "projection" eliminates some columns.

c) Operations that combine the tuples of two relations, including "Cartesian product," which pairs the tuples of two relations in all possible ways, and various kinds of "join" operations, which selectively pair tuples from two relations.

d) An operation called "renaming" that does not affect the tuples of a relation, but changes the relation schema, i.e., the names of the attributes and/or the name of the relation itself.

We shall generally refer to expressions of relational algebra as queries. While we don't yet have the symbols needed to show many of the expressions of relational algebra, you should be familiar with the operations of group (a), and thus recognize R ∪ S as an example of an expression of relational algebra. R and S are atomic operands standing for relations, whose sets of tuples are unknown. This query asks for the union of whatever tuples are in the relations named R and S.

Set Operations on Relations

The three most common operations on sets are union, intersection, and difference. We assume the reader is familiar with these operations, which are defined as follows on arbitrary sets R and S:

• R ∪ S, the union of R and S, is the set of elements that are in R or S or both. An element appears only once in the union even if it is present in both R and S.

• R ∩ S, the intersection of R and S, is the set of elements that are in both R and S.

• R - S, the difference of R and S, is the set of elements that are in R but not in S. Note that R - S is different from S - R; the latter is the set of elements that are in S but not in R.

Set Operations on Relations

When we apply these operations to relations, we need to put some conditions on R and S:

1. R and S must have schemas with identical sets of attributes, and the types (domains) for each attribute must be the same in R and S.

2. Before we compute the set-theoretic union, intersection, or difference of sets of tuples, the columns of R and S must be ordered so that the order of attributes is the same for both relations.

Sometimes we would like to take the union, intersection, or difference of relations that have the same number of attributes, with corresponding domains, but that use different names for their attributes. If so, we may use the renaming operator, to be discussed later, to change the schema of one or both relations and give them the same set of attributes.

Examples

R

name           address                      gender  birthdate
Carrie Fisher  123 Maple St., Hollywood     F       9/9/99
Mark Hamill    456 Oak Rd., Brentwood       M       8/8/88

S

name           address                      gender  birthdate
Carrie Fisher  123 Maple St., Hollywood     F       9/9/99
Harrison Ford  789 Palm Dr., Beverly Hills  M       7/7/77

R ∩ S

name           address                      gender  birthdate
Carrie Fisher  123 Maple St., Hollywood     F       9/9/99

Examples

R ∪ S

name           address                      gender  birthdate
Carrie Fisher  123 Maple St., Hollywood     F       9/9/99
Mark Hamill    456 Oak Rd., Brentwood       M       8/8/88
Harrison Ford  789 Palm Dr., Beverly Hills  M       7/7/77

R - S

name           address                      gender  birthdate
Mark Hamill    456 Oak Rd., Brentwood       M       8/8/88
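The set operations and the two schema conditions can be sketched in Python. This is only an illustration under stated assumptions: a relation is represented as a pair `(schema, rows)` where `schema` is a tuple of attribute names and `rows` is a set of tuples, the helper names (`align`, `union`, `intersection`, `difference`) are invented for the example, and attribute domains are not checked.

```python
def align(rel, attrs):
    """Return rel's rows with columns reordered to match attrs.

    Raises ValueError if the two schemas do not have identical attribute sets.
    """
    schema, rows = rel
    if len(schema) != len(attrs) or set(schema) != set(attrs):
        raise ValueError("schemas must have identical sets of attributes")
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def union(r, s):
    return (r[0], r[1] | align(s, r[0]))


def intersection(r, s):
    return (r[0], r[1] & align(s, r[0]))


def difference(r, s):
    return (r[0], r[1] - align(s, r[0]))


schema = ("name", "address", "gender", "birthdate")
carrie = ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99")
mark = ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88")
harrison = ("Harrison Ford", "789 Palm Dr., Beverly Hills", "M", "7/7/77")

R = (schema, {carrie, mark})
S = (schema, {carrie, harrison})

print(sorted(union(R, S)[1]))
print(sorted(intersection(R, S)[1]))
print(sorted(difference(R, S)[1]))
```

Because `rows` is a Python set, the Carrie Fisher tuple appears only once in the union, matching the example tables.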

Projection

The projection operator is used to produce from a relation R a new relation that has only some of R's columns. The value of expression π_{A1, A2, ..., An}(R) is a relation that has only the columns for attributes A1, A2, ..., An of R. The schema for the resulting value is the set of attributes {A1, A2, ..., An}, which we conventionally show in the order listed.

Example

Movie

title          year  length  inColor  studioName  producerC#
Star Wars      1977  124     true     Fox         12345
Mighty Ducks   1991  104     true     Disney      67890
Wayne's World  1992  95      true     Paramount   99999

π_{title, year, length}(Movie)

title          year  length
Star Wars      1977  124
Mighty Ducks   1991  104
Wayne's World  1992  95

π_{inColor}(Movie)

inColor
true
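Projection can be sketched in a few lines of Python. The representation is an assumption made for illustration: a relation is a pair `(schema, rows)` with `rows` a set of tuples, and `project` is a hypothetical helper name. Because the result is a set, projecting Movie onto inColor collapses three identical values into the single tuple shown in the example.

```python
def project(rel, attrs):
    """Keep only the columns named in attrs, in the order listed."""
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]  # ValueError if an attribute is missing
    return (tuple(attrs), {tuple(row[i] for i in idx) for row in rows})


Movie = (
    ("title", "year", "length", "inColor", "studioName", "producerC#"),
    {
        ("Star Wars", 1977, 124, True, "Fox", 12345),
        ("Mighty Ducks", 1991, 104, True, "Disney", 67890),
        ("Wayne's World", 1992, 95, True, "Paramount", 99999),
    },
)

short = project(Movie, ["title", "year", "length"])
color = project(Movie, ["inColor"])
print(short[0], sorted(short[1]))
print(color[0], color[1])
```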

Selection

The selection operator, applied to a relation R, produces a new relation with a subset of R's tuples. The tuples in the resulting relation are those that satisfy some condition C that involves the attributes of R. We denote this operation σ_C(R). The schema for the resulting relation is the same as R's schema, and we conventionally show the attributes in the same order as we use for R.

C is a conditional expression of the type with which we are familiar from conventional programming languages; for example, conditional expressions follow the keyword if in programming languages such as C or Java. The only difference is that the operands in condition C are either constants or attributes of R. We apply C to each tuple t of R by substituting, for each attribute A appearing in condition C, the component of t for attribute A. If, after substituting for each attribute of C, the condition C is true, then t is one of the tuples that appear in the result of σ_C(R); otherwise t is not in the result.

Example

σ_{length ≥ 100}(Movie)

title          year  length  inColor  studioName  producerC#
Star Wars      1977  124     true     Fox         12345
Mighty Ducks   1991  104     true     Disney      67890

σ_{length ≥ 100 AND studioName = 'Fox'}(Movie)

title          year  length  inColor  studioName  producerC#
Star Wars      1977  124     true     Fox         12345
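The substitution step in the definition of selection can be sketched in Python. As an assumption for this example, the condition C is passed as a Python function that receives each tuple as a dictionary from attribute names to values; `select` is a hypothetical helper name, and a relation is again a pair `(schema, rows)`.

```python
def select(rel, condition):
    """Keep the tuples for which condition is true; the schema is unchanged."""
    schema, rows = rel
    # dict(zip(schema, row)) substitutes each attribute name with the tuple's component
    return (schema, {row for row in rows if condition(dict(zip(schema, row)))})


Movie = (
    ("title", "year", "length", "inColor", "studioName", "producerC#"),
    {
        ("Star Wars", 1977, 124, True, "Fox", 12345),
        ("Mighty Ducks", 1991, 104, True, "Disney", 67890),
        ("Wayne's World", 1992, 95, True, "Paramount", 99999),
    },
)

long_movies = select(Movie, lambda t: t["length"] >= 100)
long_fox = select(Movie, lambda t: t["length"] >= 100 and t["studioName"] == "Fox")
print(sorted(row[0] for row in long_movies[1]))
print(sorted(row[0] for row in long_fox[1]))
```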

Cartesian Product

The Cartesian product (or cross-product, or just product) of two sets R and S is the set of pairs that can be formed by choosing the first element of the pair to be any element of R and the second any element of S. This product is denoted R × S. When R and S are relations, the product is essentially the same. However, since the members of R and S are tuples, usually consisting of more than one component, the result of pairing a tuple from R with a tuple from S is a longer tuple, with one component for each of the components of the constituent tuples. By convention, the components from R precede the components from S in the attribute order for the result.

The relation schema for the resulting relation is the union of the schemas for R and S. However, if R and S should happen to have some attributes in common, then we need to invent new names for at least one of each pair of identical attributes. To disambiguate an attribute A that is in the schemas of both R and S, we use R.A for the attribute from R and S.A for the attribute from S.


Example

R        S             R × S
A  B     B  C   D      A  R.B  S.B  C   D
1  2     2  5   6      1  2    2    5   6
3  4     4  7   8      1  2    4    7   8
         9  10  11     1  2    9    10  11
                       3  4    2    5   6
                       3  4    4    7   8
                       3  4    9    10  11
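
The product of R(A, B) and S(B, C, D) can be sketched in Python. This is only an illustration under stated assumptions: the function name `cartesian_product` and the representation of a relation as a schema tuple plus a set of row tuples are choices made for this sketch, not something the slides prescribe.

```python
from itertools import product as pairs


def cartesian_product(r_name, r_schema, r_rows, s_name, s_schema, s_rows):
    """Return (schema, rows) for R × S under set semantics."""
    shared = set(r_schema) & set(s_schema)
    # Attributes that occur in both schemas get the relation name as a prefix.
    schema = tuple(
        [f"{r_name}.{a}" if a in shared else a for a in r_schema]
        + [f"{s_name}.{a}" if a in shared else a for a in s_schema]
    )
    # Components from R precede the components from S in every result tuple.
    rows = {r + s for r, s in pairs(r_rows, s_rows)}
    return schema, rows


# R(A, B) and S(B, C, D) as in the example.
R = {(1, 2), (3, 4)}
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}
product_schema, product_rows = cartesian_product(
    "R", ("A", "B"), R, "S", ("B", "C", "D"), S
)
print(product_schema)
print(sorted(product_rows))
```

Every tuple of R is paired with every tuple of S, so the result has 2 · 3 = 6 tuples.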


Natural Joins

More often than we want to take the product of two relations, we find a need to join them by pairing only those tuples that match in some way. The simplest sort of match is the natural join of two relations R and S, denoted R ⋈ S, in which we pair only those tuples from R and S that agree in whatever attributes are common to the schemas of R and S. More precisely, let A_1, A_2, ..., A_n be all the attributes that are in both the schema of R and the schema of S. Then a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes A_1, A_2, ..., A_n.

If the tuples r and s are successfully paired in the join R ⋈ S, then the result of the pairing is a tuple, called the joined tuple, with one component for each of the attributes in the union of the schemas of R and S. The joined tuple agrees with tuple r in each attribute in the schema of R, and it agrees with s in each attribute in the schema of S. Since r and s are successfully paired, the joined tuple is able to agree with both these tuples on the attributes they have in common. The construction of the joined tuple is suggested on the next slide.

Note also that this join operation is the same one that we used to recombine relations that had been projected onto two subsets of their attributes. There the motivation was to explain why BCNF decomposition made sense. We shall see another use for the natural join: combining two relations so that we can write a query that relates attributes of each.


Joining tuples

[Figure: a tuple r of relation R and a tuple s of relation S are combined into the joined tuple.]


Example

R        S             R ⋈ S
A  B     B  C   D      A  B  C  D
1  2     2  5   6      1  2  5  6
3  4     4  7   8      3  4  7  8
         9  10  11

The tuple (9, 10, 11) of S is a dangling tuple.


Natural join of relations

Relation U     Relation V     Relation U ⋈ V
A  B  C        B  C  D        A  B  C  D
1  2  3        2  3  4        1  2  3  4
6  7  8        2  3  5        1  2  3  5
9  7  8        7  8  10       6  7  8  10
                              9  7  8  10
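
The pairing rule for the natural join can be sketched in Python as follows. The helper name `natural_join` and the (schema, set of rows) representation are assumptions of this sketch; the nested loop is the simplest possible evaluation strategy, not a claim about how a real system computes joins.

```python
def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Return (schema, rows) for R ⋈ S under set semantics."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in r_schema]
    # Result schema: attributes of R, then the attributes of S not already in R.
    schema = tuple(r_schema) + tuple(s_only)
    rows = set()
    for r in r_rows:
        r_val = dict(zip(r_schema, r))
        for s in s_rows:
            s_val = dict(zip(s_schema, s))
            # r and s pair successfully iff they agree on every common attribute.
            if all(r_val[a] == s_val[a] for a in common):
                rows.add(tuple(r) + tuple(s_val[a] for a in s_only))
    return schema, rows


# R(A, B) ⋈ S(B, C, D): the S-tuple (9, 10, 11) pairs with nothing (dangling).
R = {(1, 2), (3, 4)}
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}
rs_schema, rs_rows = natural_join(("A", "B"), R, ("B", "C", "D"), S)

# U(A, B, C) ⋈ V(B, C, D): a tuple may pair with more than one partner.
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}
uv_schema, uv_rows = natural_join(("A", "B", "C"), U, ("B", "C", "D"), V)
print(sorted(rs_rows))
print(sorted(uv_rows))
```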


Theta-Joins

The natural join forces us to pair tuples using one specific condition. While this way, equating shared attributes, is the most common basis on which relations are joined, it is sometimes desirable to pair tuples from two relations on some other basis. For that purpose, we have a related notation called the theta-join. Historically, the "theta" refers to an arbitrary condition, which we shall represent by C rather than θ.

The notation for a theta-join of relations R and S based on condition C is R ⋈_C S. The result of this operation is constructed as follows:

1. Take the product of R and S.
2. Select from the product only those tuples that satisfy the condition C.

As with the product operation, the schema for the result is the union of the schemas of R and S, with "R." or "S." prefixed to attributes if necessary to indicate from which schema the attribute came.


Example

U ⋈_{A<D} V

A  U.B  U.C  V.B  V.C  D
1  2    3    2    3    4
1  2    3    2    3    5
1  2    3    7    8    10
6  7    8    7    8    10
9  7    8    7    8    10


Theta-Joins

Notice that the schema for the result on the previous slide consists of all six attributes, with U and V prefixed to their respective occurrences of attributes B and C to distinguish them. Thus, the theta-join contrasts with natural join, since in the latter common attributes are merged into one copy. Of course it makes sense to do so in the case of the natural join, since tuples don't pair unless they agree in their common attributes. In the case of a theta-join, there is no guarantee that compared attributes will agree in the result, since they may not be compared with =.


Example

U ⋈_{A<D AND U.B≠V.B} V

A  U.B  U.C  V.B  V.C  D
1  2    3    7    8    10
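
The two-step construction of a theta-join (product, then selection) can be sketched in Python. The function name `theta_join` and the use of a Python predicate for the condition C are assumptions of this sketch. Attributes are addressed by position in the result schema (A, U.B, U.C, V.B, V.C, D).

```python
from itertools import product as pairs


def theta_join(r_rows, s_rows, condition):
    """Return the rows of R ⋈_C S: the product filtered by the condition C."""
    # Step 1: form every concatenated pair. Step 2: keep those satisfying C.
    return {r + s for r, s in pairs(r_rows, s_rows) if condition(r + s)}


U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}      # U(A, B, C)
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}     # V(B, C, D)

# Positions in the result: A=0, U.B=1, U.C=2, V.B=3, V.C=4, D=5.
a_lt_d = theta_join(U, V, lambda t: t[0] < t[5])
a_lt_d_and_b_differs = theta_join(U, V, lambda t: t[0] < t[5] and t[1] != t[3])
print(sorted(a_lt_d))
print(sorted(a_lt_d_and_b_differs))
```

Unlike the natural join, both copies of B and C remain in the result, since the condition need not equate them.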


Combining Operations to Form Queries

If all we could do was to write single operations on one or two relations as queries, then relational algebra would not be as useful as it is. However, relational algebra, like all algebras, allows us to form expressions of arbitrary complexity by applying operators either to given relations or to relations that are the result of applying one or more relational operators to relations.

One can construct expressions of relational algebra by applying operators to subexpressions, using parentheses when necessary to indicate grouping of operands. It is also possible to represent expressions as expression trees; the latter often are easier for us to read, although they are less convenient as a machine-readable notation.


What are the titles and years of movies made by Fox that are at least 100 minutes long?

Expression tree:

π_{title, year}
  ∩
    σ_{length≥100}
      Movies
    σ_{studioName='Fox'}
      Movies

Equivalent expressions:

π_{title, year}(σ_{length≥100}(Movies) ∩ σ_{studioName='Fox'}(Movies))
π_{title, year}(σ_{length≥100 AND studioName='Fox'}(Movies))
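
Both forms of this query can be evaluated on a small instance to see that they agree. The sketch below is illustrative only: the helpers `select` and `project` are names chosen here, the first two rows come from the selection example, and the third row is a hypothetical short Fox film added so that each selection actually filters something.

```python
# Movies(title, year, length, inColor, studioName, producerC#)
Movies = {
    ("Star Wars", 1977, 124, True, "Fox", 12345),
    ("Mighty Ducks", 1991, 104, True, "Disney", 67890),
    ("Short Film", 2000, 90, True, "Fox", 11111),  # hypothetical row
}


def select(rows, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in rows if predicate(t)}


def project(rows, positions):
    """π under set semantics: keep the listed components, drop duplicates."""
    return {tuple(t[i] for i in positions) for t in rows}


# Intersection of two selections, then projection onto (title, year).
long_movies = select(Movies, lambda t: t[2] >= 100)
fox_movies = select(Movies, lambda t: t[4] == "Fox")
via_intersection = project(long_movies & fox_movies, (0, 1))

# A single selection with the AND of both conditions, then projection.
via_and = project(
    select(Movies, lambda t: t[2] >= 100 and t[4] == "Fox"), (0, 1)
)
print(via_intersection, via_and)
```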


Find the stars of movies that are at least 100 minutes long

Movies1(title, year, length, filmType, studioName)
Movies2(title, year, starName)

π_{starName}(σ_{length≥100}(Movies1 ⋈ Movies2))


Equivalent Expressions and Query Optimization

All database systems have a query-answering system, and many of them are based on a language that is similar in expressive power to relational algebra. Thus, the query asked by a user may have many equivalent expressions (expressions that produce the same answer, whenever they are given the same relations as operands), and some of these may be much more quickly evaluated. An important job of the query "optimizer," discussed briefly, is to replace one expression of relational algebra by an equivalent expression that is more efficiently evaluated. Optimization of relational algebra expressions is covered extensively later.


Renaming

In order to control the names of the attributes used for relations that are constructed by applying relational algebra operations, it is often convenient to use an operator that explicitly renames relations. We shall use the operator ρ_{S(A_1, A_2, ..., A_n)}(R) to rename a relation R. The resulting relation has exactly the same tuples as R, but the name of the relation is S. Moreover, the attributes of the result relation S are named A_1, A_2, ..., A_n, in order from the left. If we only want to change the name of the relation to S and leave the attributes as they are in R, we can just say ρ_S(R).


Example

R        S             R × ρ_{S(X, C, D)}(S)
A  B     B  C   D      A  B  X  C   D
1  2     2  5   6      1  2  2  5   6
3  4     4  7   8      1  2  4  7   8
         9  10  11     1  2  9  10  11
                       3  4  2  5   6
                       3  4  4  7   8
                       3  4  9  10  11
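
A minimal Python sketch of renaming, under the assumption that a relation is a (name, schema, rows) triple; the function `rename` is a name invented for this illustration. It shows why renaming B to X removes the need for the R.B and S.B prefixes in the product.

```python
def rename(schema, rows, new_name, new_attrs=None):
    """ρ_{S(A_1, ..., A_n)}(R): same tuples, new relation and attribute names."""
    if new_attrs is None:
        new_attrs = schema  # ρ_S(R): only the relation name changes
    if len(new_attrs) != len(schema):
        raise ValueError("renaming must supply one name per attribute")
    return new_name, tuple(new_attrs), set(rows)


r_schema = ("A", "B")
s_rows = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}   # S(B, C, D)

name, renamed_schema, renamed_rows = rename(
    ("B", "C", "D"), s_rows, "S", ("X", "C", "D")
)

# After renaming, R and S share no attribute name, so the schema of the
# product is the plain concatenation of the two schemas.
shared = set(r_schema) & set(renamed_schema)
product_schema = r_schema + renamed_schema
print(product_schema)
```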


Dependent and Independent Operations

Some of the operations that we have described can be expressed in terms of other relational algebra operations. For example, intersection can be expressed in terms of set difference:

R ∩ S = R - (R - S)

That is, if R and S are any two relations with the same schema, the intersection of R and S can be computed by first subtracting S from R to form a relation T consisting of all those tuples in R but not S. We then subtract T from R, leaving only those tuples of R that are also in S.

The two forms of join are also expressible in terms of other operations. Theta-join can be expressed by product and selection:

R ⋈_C S = σ_C(R × S)

The natural join of R and S can be expressed by starting with the product R × S. We then apply the selection operator with a condition C of the form

R.A_1 = S.A_1 AND R.A_2 = S.A_2 AND … AND R.A_n = S.A_n

where A_1, A_2, ..., A_n are all the attributes appearing in the schemas of both R and S. Finally, we must project out one copy of each of the equated attributes. Let L be the list of attributes in the schema of R followed by those attributes in the schema of S that are not also in the schema of R. Then

R ⋈ S = π_L(σ_C(R × S))

The rewriting rules are the only "redundancies" among the operations that we have introduced. The six remaining operations (union, difference, selection, projection, product, and renaming) form an independent set, none of which can be written in terms of the other five.


Example

U(A, B, C)  V(B, C, D)

U ⋈ V = π_{A, U.B, U.C, D}(σ_{U.B=V.B AND U.C=V.C}(U × V))

U ⋈_{A<D AND U.B≠V.B} V = σ_{A<D AND U.B≠V.B}(U × V)
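
These rewritings can be checked on small instances. The sketch below uses Python sets as relations; the relations R and S are hypothetical, while U and V are the relations from the join examples. Attribute positions in U × V follow the schema (A, U.B, U.C, V.B, V.C, D).

```python
from itertools import product as pairs

# Intersection expressed through difference: R ∩ S = R - (R - S).
R = {(1, 2), (3, 4), (5, 6)}   # hypothetical relation
S = {(3, 4), (7, 8)}           # hypothetical relation, same schema
intersection_via_difference = R - (R - S)

# Natural join expressed through product, selection, and projection.
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}      # U(A, B, C)
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}     # V(B, C, D)

u_times_v = {u + v for u, v in pairs(U, V)}   # U × V
# σ_{U.B=V.B AND U.C=V.C}: positions 1 and 3 hold the B's, 2 and 4 the C's.
equated = {t for t in u_times_v if t[1] == t[3] and t[2] == t[4]}
# π_{A, U.B, U.C, D}: keep one copy of each equated attribute.
natural_join_rows = {(t[0], t[1], t[2], t[5]) for t in equated}

print(sorted(intersection_via_difference))
print(sorted(natural_join_rows))
```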


A Linear Notation for Algebraic Expressions

We used trees to represent complex expressions of relational algebra. Another alternative is to invent names for the temporary relations that correspond to the interior nodes of the tree and write a sequence of assignments that create a value for each. The order of the assignments is flexible, as long as the children of a node N have had their values created before we attempt to create the value for N itself.

The notation we shall use for assignment statements is:

1. A relation name and parenthesized list of attributes for that relation. The name Answer will be used conventionally for the result of the final step; i.e., the name of the relation at the root of the expression tree.
2. The assignment symbol :=.
3. Any algebraic expression on the right. We can choose to use only one operator per assignment, in which case each interior node of the tree gets its own assignment statement. However, it is also permissible to combine several algebraic operations in one right side, if it is convenient to do so.


What are the titles and years of movies made by Fox that are at least 100 minutes long?

Expression tree:

π_{title, year}
  ∩
    σ_{length≥100}
      Movies
    σ_{studioName='Fox'}
      Movies

Equivalent expressions:

π_{title, year}(σ_{length≥100}(Movies) ∩ σ_{studioName='Fox'}(Movies))
π_{title, year}(σ_{length≥100 AND studioName='Fox'}(Movies))


Example

R(t, y, l, i, s, p) := σ_{length≥100}(Movie)
S(t, y, l, i, s, p) := σ_{studioName='Fox'}(Movie)
T(t, y, l, i, s, p) := R ∩ S
Answer(title, year) := π_{t, y}(T)

Alternatively, combining the last two steps in one assignment:

R(t, y, l, i, s, p) := σ_{length≥100}(Movie)
S(t, y, l, i, s, p) := σ_{studioName='Fox'}(Movie)
Answer(title, year) := π_{t, y}(R ∩ S)
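
The linear notation maps directly onto a sequence of variable assignments in a programming language. The following Python sketch mirrors the four-step version of the example; the two rows of `Movie` are taken from the earlier selection example, and the use of Python sets for relations is an assumption of this illustration.

```python
# Movie(title, year, length, inColor, studioName, producerC#)
Movie = {
    ("Star Wars", 1977, 124, True, "Fox", 12345),
    ("Mighty Ducks", 1991, 104, True, "Disney", 67890),
}

# One assignment per interior node of the expression tree; each name is
# defined before it is used, as the ordering rule requires.
R = {m for m in Movie if m[2] >= 100}      # selection on length >= 100
S = {m for m in Movie if m[4] == "Fox"}    # selection on studioName = 'Fox'
T = R & S                                  # intersection of R and S
# Projection of T onto its title and year components.
Answer = {(title, year) for (title, year, *_rest) in T}
print(Answer)
```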


Relational Operations on Bags

While a set of tuples (i.e., a relation) is a simple, natural model of data as it might appear in a database, commercial database systems rarely, if ever, are based purely on sets. In some situations, relations as they appear in database systems are permitted to have duplicate tuples. Recall that if a "set" is allowed to have multiple occurrences of a member, then that set is called a bag or multiset. We shall consider relations that are bags rather than sets; that is, we shall allow the same tuple to appear more than once in a relation. When we refer to a "set," we mean a relation without duplicate tuples; a "bag" means a relation that may (or may not) have duplicate tuples.


Example

A  B
1  2
3  4
1  2
1  2


Why Bags?

When we think about implementing relations efficiently, we can see several ways that allowing relations to be bags rather than sets can speed up operations on relations. We mentioned how allowing the result to be a bag could speed up the union of two relations. For another example, when we do a projection, allowing the resulting relation to be a bag (even when the original relation is a set) lets us work with each tuple independently. If we want a set as the result, we need to compare each projected tuple with all the other projected tuples, to make sure that each projection appears only once. However, if we can accept a bag as the result, then we simply project each tuple and add it to the result; no comparison with other projected tuples is necessary.


Example projection on bag and set

R            π_{A,B}(R) as a bag    π_{A,B}(R) as a set
A  B  C      A  B                   A  B
1  2  5      1  2                   1  2
3  4  6      3  4                   3  4
1  2  7      1  2
1  2  8      1  2


Union, Intersection, and Difference of Bags

When we take the union of two bags, we add the number of occurrences of each tuple. That is, if R is a bag in which the tuple t appears n times, and S is a bag in which the tuple t appears m times, then in the bag R ∪ S tuple t appears n + m times. Note that either n or m (or both) can be 0.

When we intersect two bags R and S, in which tuple t appears n and m times, respectively, in R ∩ S tuple t appears min(n, m) times. When we compute R - S, the difference of bags R and S, tuple t appears in R - S max(0, n - m) times. That is, if t appears in R more times than it appears in S, then in R - S tuple t appears the number of times it appears in R, minus the number of times it appears in S. However, if t appears at least as many times in S as it appears in R, then t does not appear at all in R - S. Intuitively, occurrences of t in S each "cancel" one occurrence in R.


Example

R        S        R ∪ S    R ∩ S    R - S    S - R
A  B     A  B     A  B     A  B     A  B     A  B
1  2     1  2     1  2     1  2     1  2     3  4
3  4     3  4     1  2     3  4     1  2     5  6
1  2     3  4     1  2
1  2     5  6     1  2
                  3  4
                  3  4
                  3  4
                  5  6
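
The bag rules n + m, min(n, m), and max(0, n - m) happen to coincide with the `+`, `&`, and `-` operators of Python's `collections.Counter`, so the example can be reproduced in a few lines. Modeling a bag as a `Counter` that maps each tuple to its number of occurrences is an assumption of this sketch.

```python
from collections import Counter

# Bags over schema (A, B), stored as tuple -> number of occurrences.
R = Counter({(1, 2): 3, (3, 4): 1})
S = Counter({(1, 2): 1, (3, 4): 2, (5, 6): 1})

bag_union = R + S          # each tuple appears n + m times
bag_intersection = R & S   # each tuple appears min(n, m) times
r_minus_s = R - S          # each tuple appears max(0, n - m) times
s_minus_r = S - R

for label, bag in [("R ∪ S", bag_union), ("R ∩ S", bag_intersection),
                   ("R - S", r_minus_s), ("S - R", s_minus_r)]:
    print(label, sorted(bag.elements()))
```

Note that `Counter` subtraction with `-` drops tuples whose count would be zero or negative, which is exactly the "cancel" behavior described above.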


Projection of Bags

We have already illustrated the projection of bags. As we saw in an example, each tuple is processed independently during the projection. If R is the bag from the example and we compute the bag-projection π_{A, B}(R), then we get the corresponding bag.

If the elimination of one or more attributes during the projection causes the same tuple to be created from several tuples, these duplicate tuples are not eliminated from the result of a bag-projection. Thus, the three tuples (1, 2, 5), (1, 2, 7), and (1, 2, 8) of the relation R each gave rise to the same tuple (1, 2) after projection onto attributes A and B. In the bag result, there are three occurrences of tuple (1, 2), while in the set-projection, this tuple appears only once.
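
The difference between the two projections can be sketched in Python. The helper names `project_bag` and `project_set` are invented for this illustration; the bag result is kept as a `Counter` so the multiplicity of each projected tuple is visible.

```python
from collections import Counter

# R(A, B, C) from the projection example.
R = [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 8)]


def project_bag(rows, positions):
    """Bag projection: every input tuple contributes one output tuple."""
    return Counter(tuple(row[i] for i in positions) for row in rows)


def project_set(rows, positions):
    """Set projection: duplicates among the projected tuples are merged."""
    return {tuple(row[i] for i in positions) for row in rows}


bag_result = project_bag(R, (0, 1))   # π_{A, B}(R) as a bag
set_result = project_set(R, (0, 1))   # π_{A, B}(R) as a set
print(bag_result[(1, 2)], len(set_result))
```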


Bag Operations on Sets

Imagine we have two sets R and S. Every set may be thought of as a bag: the bag just happens to have at most one occurrence of any tuple. Suppose we intersect R ∩ S, but we think of R and S as bags and use the bag intersection rule. Then we get the same result as we would get if we thought of R and S as sets. That is, thinking of R and S as bags, a tuple t is in R ∩ S the minimum of the number of times it is in R and S. Since R and S are sets, t can be in each only 0 or 1 times. Whether we use the bag or set intersection rules, we find that t can appear at most once in R ∩ S, and it appears once exactly when it is in both R and S. Similarly, if we use the bag difference rule to compute R - S or S - R we get exactly the same result as if we used the set rule.

However, union behaves differently, depending on whether we think of R and S as sets or bags. If we use the bag rule to compute R ∪ S, then the result may not be a set, even if R and S are sets. In particular, if tuple t appears in both R and S, then t appears twice in R ∪ S if we use the bag rule for union. But if we use the set rule then t appears only once in R ∪ S. Thus, when taking unions, we must be especially careful to specify whether we are using the bag or set definition of union.
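
A quick check of this observation, using two hypothetical sets R and S and treating each as a bag via `collections.Counter` (an assumption of this sketch): intersection and difference agree with the set rules, while union does not.

```python
from collections import Counter

R = {(1, 2), (3, 4)}   # hypothetical set
S = {(3, 4), (5, 6)}   # hypothetical set; shares (3, 4) with R

# View each set as a bag in which every tuple occurs exactly once.
R_bag, S_bag = Counter(R), Counter(S)

intersection_agrees = (R_bag & S_bag) == Counter(R & S)
difference_agrees = (R_bag - S_bag) == Counter(R - S)
union_agrees = (R_bag + S_bag) == Counter(R | S)

# Under the bag rule the shared tuple occurs twice in the union.
copies_in_bag_union = (R_bag + S_bag)[(3, 4)]
print(intersection_agrees, difference_agrees, union_agrees, copies_in_bag_union)
```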

Page 48

Selection on Bags

To apply a selection to a bag, we apply the selection condition to each tuple independently. As always with bags, we do not eliminate duplicate tuples in the result.

Page 49

Example

R
A  B  C
1  2  5
3  4  6
1  2  7
1  2  7

σ_{C≥6}(R)
A  B  C
3  4  6
1  2  7
1  2  7
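A minimal sketch of this example, assuming a bag is a `collections.Counter` of tuples. The helper name `select` is hypothetical, not part of the slides.

```python
from collections import Counter

def select(bag, predicate):
    """Bag selection: keep each tuple that satisfies predicate, with its full count."""
    return Counter({t: n for t, n in bag.items() if predicate(t)})

# R from the example, attributes (A, B, C); the tuple (1, 2, 7) occurs twice.
R = Counter({(1, 2, 5): 1, (3, 4, 6): 1, (1, 2, 7): 2})

result = select(R, lambda t: t[2] >= 6)   # selection with condition C >= 6
print(sorted(result.elements()))
# [(1, 2, 7), (1, 2, 7), (3, 4, 6)]
```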

Page 50

Algebraic Laws for Bags

An algebraic law is an equivalence between two expressions of relational algebra whose arguments are variables standing for relations. The equivalence asserts that no matter what relations we substitute for these variables, the two expressions define the same relation. An example of a well known law is the commutative law for union: R ∪ S = S ∪ R. This law happens to hold whether we regard relation-variables R and S as standing for sets or bags.

However, there are a number of other laws that hold when relational algebra is applied to sets but that do not hold when relations are interpreted as bags. A simple example of such a law is the distributive law of set difference over union, (R ∪ S) - T = (R - T) ∪ (S - T). This law holds for sets but not for bags. To see why it fails for bags, suppose R, S, and T each have one copy of tuple t. Then the expression on the left has one t, while the expression on the right has none. As sets, neither would have t.
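The counterexample can be checked mechanically. The sketch below assumes bags are `collections.Counter` objects, where `+` acts as bag union and `-` as bag difference.

```python
from collections import Counter

t = ("t",)
R, S, T = Counter([t]), Counter([t]), Counter([t])

# Bag interpretation: + is bag union, - is bag difference.
left_bag = (R + S) - T            # 2 copies minus 1 copy
right_bag = (R - T) + (S - T)     # empty plus empty
print(left_bag[t], right_bag[t])  # 1 0

# Set interpretation of the same relations.
Rs, Ss, Ts = set(R), set(S), set(T)
left_set = (Rs | Ss) - Ts
right_set = (Rs - Ts) | (Ss - Ts)
print(left_set == right_set)      # True
```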

Page 51

Product of Bags

The rule for the Cartesian product of bags is the expected one. Each tuple of one relation is paired with each tuple of the other, regardless of whether it is a duplicate or not. As a result, if a tuple r appears in a relation R m times, and tuple s appears n times in relation S, then in the product R × S, the tuple rs will appear mn times.

Page 52

Computing the product of bags

R
A  B
1  2
1  2

S
B  C
2  3
4  5
4  5

R × S
A  R.B  S.B  C
1  2    2    3
1  2    2    3
1  2    4    5
1  2    4    5
1  2    4    5
1  2    4    5
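The m·n rule can be reproduced for these relations. This is a sketch that assumes bags are `collections.Counter` objects; the function name `product` is hypothetical.

```python
from collections import Counter

def product(r, s):
    """Bag Cartesian product: the pair rs gets multiplicity m * n."""
    out = Counter()
    for rt, m in r.items():
        for st, n in s.items():
            out[rt + st] += m * n
    return out

R = Counter({(1, 2): 2})              # attributes (A, B)
S = Counter({(2, 3): 1, (4, 5): 2})   # attributes (B, C)

RxS = product(R, S)                   # attributes (A, R.B, S.B, C)
print(RxS[(1, 2, 2, 3)], RxS[(1, 2, 4, 5)])  # 2 4
```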

Page 53

Joins of Bags

Joining bags also presents no surprises. We compare each tuple of one relation with each tuple of the other, decide whether or not this pair of tuples joins successfully, and if so we put the resulting tuple in the answer. When constructing the answer, we do not eliminate duplicate tuples.

Page 54

Example

R
A  B
1  2
1  2

S
B  C
2  3
4  5
4  5

R ⋈ S
A  B  C
1  2  3
1  2  3

R ⋈_{R.B<S.B} S
A  R.B  S.B  C
1  2    4    5
1  2    4    5
1  2    4    5
1  2    4    5
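Both joins in this example can be sketched as follows, again assuming bags are `collections.Counter` objects. The helper names are hypothetical, and the natural join is hard-wired to the schemas R(A, B) and S(B, C) to keep the sketch short.

```python
from collections import Counter

def theta_join(r, s, condition):
    """Bag theta-join: pair every tuple of r with every tuple of s, keep pairs passing condition."""
    out = Counter()
    for rt, m in r.items():
        for st, n in s.items():
            if condition(rt, st):
                out[rt + st] += m * n
    return out

def natural_join_on_b(r, s):
    """Natural join of R(A, B) and S(B, C); the shared attribute B is kept once."""
    out = Counter()
    for (a, b), m in r.items():
        for (b2, c), n in s.items():
            if b == b2:
                out[(a, b, c)] += m * n
    return out

R = Counter({(1, 2): 2})
S = Counter({(2, 3): 1, (4, 5): 2})

natural = natural_join_on_b(R, S)
theta = theta_join(R, S, lambda r, s: r[1] < s[0])   # condition R.B < S.B
print(natural)   # Counter({(1, 2, 3): 2})
print(theta)     # Counter({(1, 2, 4, 5): 4})
```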

Page 55

Extended Operators of Relational Algebra

We presented the classical relational algebra, and then introduced the modifications necessary to treat relations as bags of tuples rather than sets. The ideas of these two sections serve as a foundation for most modern query languages. However, languages such as SQL have several other operations that have proved quite important in applications. Thus, a full treatment of relational operations must include a number of other operators, which we introduce in this section. The additions:

1. The duplicate-elimination operator δ turns a bag into a set by eliminating all but one copy of each tuple.

2. Aggregation operators, such as sums or averages, are not operations of relational algebra, but are used by the grouping operator (described next). Aggregation operators apply to attributes (columns) of a relation, e.g., the sum of a column produces the one number that is the sum of all the values in that column.

3. Grouping of tuples according to their value in one or more attributes has the effect of partitioning the tuples of a relation into "groups." Aggregation can then be applied to columns within each group, giving us the ability to express a number of queries that are impossible to express in the classical relational algebra. The grouping operator γ is an operator that combines the effect of grouping and aggregation.

4. The sorting operator τ turns a relation into a list of tuples, sorted according to one or more attributes. This operator should be used judiciously, because other relational-algebra operators apply to sets or bags, but never to lists. Thus, τ only makes sense as the final step of a series of operations.

5. Extended projection gives additional power to the operator π. In addition to projecting out some columns, in its generalized form π can perform computations involving the columns of its argument relation to produce new columns.

6. The outerjoin operator is a variant of the join that avoids losing dangling tuples. In the result of the outerjoin, dangling tuples are "padded" with the null value, so the dangling tuples can be represented in the output.

Page 56

Duplicate Elimination

Sometimes, we need an operator that converts a bag to a set. For that purpose, we use δ(R) to return the set consisting of one copy of every tuple that appears one or more times in relation R.

Page 57

Example

R
A  B
1  2
3  4
1  2
1  2

δ(R)
A  B
1  2
3  4
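A minimal sketch of δ on this relation, assuming a bag is a `collections.Counter`. The function name `delta` is hypothetical.

```python
from collections import Counter

def delta(bag):
    """Duplicate elimination: one copy of every tuple that occurs at least once."""
    return Counter({t: 1 for t, n in bag.items() if n > 0})

R = Counter({(1, 2): 3, (3, 4): 1})   # (1, 2) appears three times
print(sorted(delta(R).elements()))    # [(1, 2), (3, 4)]
```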

Page 58

Aggregation Operators

There are several operators that apply to sets or bags of atomic values. These operators are used to summarize or "aggregate" the values in one column of a relation, and thus are referred to as aggregation operators. The standard operators of this type are:

1. SUM produces the sum of a column with numerical values.

2. AVG produces the average of a column with numerical values.

3. MIN and MAX, applied to a column with numerical values, produce the smallest or largest value, respectively. When applied to a column with character-string values, they produce the lexicographically (alphabetically) first or last value, respectively.

4. COUNT produces the number of (not necessarily distinct) values in a column. Equivalently, COUNT applied to any attribute of a relation produces the number of tuples of that relation, including duplicates.

Page 59

Example

A  B
1  2
3  4
1  2
1  2

1. SUM(B) = 2 + 4 + 2 + 2 = 10.

2. AVG(A) = (1 + 3 + 1 + 1) / 4 = 1.5.

3. MIN(A) = 1.

4. MAX(B) = 4.

5. COUNT(A) = 4.
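These five values can be recomputed with Python built-ins. The sketch assumes the bag is stored as a list of (A, B) tuples, so that duplicates take part in every aggregate.

```python
from statistics import mean

# The bag from the example as a list of (A, B) tuples, duplicates included.
R = [(1, 2), (3, 4), (1, 2), (1, 2)]
A = [a for a, _ in R]
B = [b for _, b in R]

print(sum(B))    # SUM(B)   = 10
print(mean(A))   # AVG(A)   = 1.5
print(min(A))    # MIN(A)   = 1
print(max(B))    # MAX(B)   = 4
print(len(A))    # COUNT(A) = 4
```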

Page 60

Grouping

Often we do not want simply the average or some other aggregation of an entire column. Rather, we need to consider the tuples of a relation in groups, corresponding to the value of one or more other columns, and we aggregate only within each group. As an example, suppose we wanted to compute the total number of minutes of movies produced by each studio, i.e., a relation such as:

studio   sumOfLengths
Disney   123456
MGM      54321
...      ...

Starting with the relation:
Movie(title, year, length, inColor, studioName, producerC#)
from our example database schema, we must group the tuples according to their value for attribute studioName. We must then sum the length column within each group.

That is, we imagine that the tuples of Movie are grouped as suggested, and we apply the aggregation SUM(length) to each group independently.

Page 61

A relation with imaginary division into groups

studioName
Disney
Disney
Disney
MGM
MGM
...

Page 62

The Grouping Operator

We shall now introduce an operator that allows us to group a relation and/or aggregate some columns. If there is grouping, then the aggregation is within groups.

The subscript used with the γ operator is a list L of elements, each of which is either:

a) An attribute of the relation R to which the γ is applied: this attribute is one of the attributes by which R will be grouped. This element is said to be a grouping attribute.

b) An aggregation operator applied to an attribute of the relation. To provide a name for the attribute corresponding to this aggregation in the result, an arrow and new name are appended to the aggregation. The underlying attribute is said to be an aggregated attribute.

The relation returned by the expression γ_L(R) is constructed as follows:

1. Partition the tuples of R into groups. Each group consists of all tuples having one particular assignment of values to the grouping attributes in the list L. If there are no grouping attributes, the entire relation R is one group.

2. For each group, produce one tuple consisting of:
   i. The grouping attributes' values for that group and
   ii. The aggregations, over all tuples of that group, for the aggregated attributes on list L.

Page 63

δ is a Special Case of γ

Technically, the δ operator is redundant. If R(A_1, A_2, ..., A_n) is a relation, then δ(R) is equivalent to γ_{A_1, A_2, ..., A_n}(R). That is, to eliminate duplicates, we group on all the attributes of the relation and do no aggregation. Then each group corresponds to a tuple that is found one or more times in R. Since the result of γ contains exactly one tuple from each group, the effect of this "grouping" is to eliminate duplicates. However, because δ is such a common and important operator, we shall continue to consider it separately when we study algebraic laws and algorithms for implementing the operators.

One can also see γ as an extension of the projection operator on sets. That is, γ_{A_1, A_2, ..., A_n}(R) is also the same as π_{A_1, A_2, ..., A_n}(R) if R is a set. However, if R is a bag, then γ eliminates duplicates while π does not. For this reason, γ is often referred to as generalized projection.
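The difference between grouping on all attributes and projecting onto all attributes can be illustrated with a small sketch. It assumes a bag is a `collections.Counter`; both helper names are hypothetical.

```python
from collections import Counter

def group_on_all_attributes(bag):
    """Grouping on every attribute with no aggregation: one tuple per group."""
    groups = {}
    for t in bag.elements():
        groups.setdefault(t, []).append(t)   # the group key is the whole tuple
    return Counter(groups.keys())

def project_all(bag):
    """Bag projection onto all attributes: leaves multiplicities untouched."""
    return Counter(bag)

R = Counter({(1, 2): 3, (3, 4): 1})
print(group_on_all_attributes(R) == Counter(set(R)))  # True: duplicates are gone
print(project_all(R) == R)                            # True: projection keeps duplicates
```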

Page 64

Example

StarsIn(title, year, starName)

γ_{starName, MIN(year)->minYear, COUNT(title)->ctTitle}(StarsIn)

Expression tree (root first):

π_{starName, minYear}
  σ_{ctTitle≥3}
    γ_{starName, MIN(year)->minYear, COUNT(title)->ctTitle}
      StarsIn
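The expression tree can be evaluated bottom-up. The sketch below assumes relations are lists of dicts; the function `gamma`, its parameter names, and the StarsIn tuples are invented for illustration and do not come from the example database.

```python
from collections import defaultdict

def gamma(rows, group_by, aggregates):
    """Grouping operator: one output row per distinct value of the group_by attributes.

    aggregates maps a new attribute name to (function, source attribute).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in group_by)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for name, (func, attr) in aggregates.items():
            out[name] = func([m[attr] for m in members])
        result.append(out)
    return result

# Hypothetical StarsIn tuples, invented for illustration only.
stars_in = [
    {"title": "Movie A", "year": 1990, "starName": "Star X"},
    {"title": "Movie B", "year": 1985, "starName": "Star X"},
    {"title": "Movie C", "year": 1999, "starName": "Star X"},
    {"title": "Movie A", "year": 1990, "starName": "Star Y"},
]

grouped = gamma(stars_in, ["starName"],
                {"minYear": (min, "year"), "ctTitle": (len, "title")})
selected = [g for g in grouped if g["ctTitle"] >= 3]          # selection ctTitle >= 3
answer = [(g["starName"], g["minYear"]) for g in selected]    # projection onto starName, minYear
print(answer)  # [('Star X', 1985)]
```

Read this way, the query returns, for each star with at least three titles, the earliest year in which that star appeared.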

Page 65

Extending the Projection Operator

Let us reconsider the projection operator π_L(R). In the classical relational algebra, L is a list of (some of the) attributes of R. We extend the projection operator to allow it to compute with components of tuples as well as choose components. In extended projection, also denoted π_L(R), projection lists can have the following kinds of elements:

1. A single attribute of R.

2. An expression x -> y, where x and y are names for attributes. The element x -> y in the list L asks that we take the attribute x of R and rename it y, i.e., the name of this attribute in the schema of the result relation is y.

3. An expression E -> z, where E is an expression involving attributes of R, constants, arithmetic operators, and string operators, and z is a new name for the attribute that results from the calculation implied by E. For example, a + b -> x as a list element represents the sum of the attributes a and b, renamed x. Element c || d -> e means concatenate the (presumably string-valued) attributes c and d and call the result e.

The result of the projection is computed by considering each tuple of R in turn. We evaluate the list L by substituting the tuple's components for the corresponding attributes mentioned in L and applying any operators indicated by L to these values. The result is a relation whose schema is the names of the attributes on list L, with whatever renaming the list specifies. Each tuple of R yields one tuple of the result. Duplicate tuples in R surely yield duplicate tuples in the result, but the result can have duplicates even if R does not.

Page 66

Example

R
A  B  C
0  1  2
0  1  2
3  4  5

π_{A, B+C->X}(R)
A  X
0  3
0  3
3  9

π_{B-A->X, C-B->Y}(R)
X  Y
1  1
1  1
1  1
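Both projections can be reproduced with a short sketch that assumes a bag is a list of dicts. The helper `extended_project` and its `columns` parameter are hypothetical.

```python
def extended_project(rows, columns):
    """Extended projection over a bag stored as a list of dicts.

    columns maps each output attribute name to a function of the input tuple.
    """
    return [{name: f(row) for name, f in columns.items()} for row in rows]

R = [{"A": 0, "B": 1, "C": 2}, {"A": 0, "B": 1, "C": 2}, {"A": 3, "B": 4, "C": 5}]

# Projection list: A, B+C -> X
p1 = extended_project(R, {"A": lambda t: t["A"], "X": lambda t: t["B"] + t["C"]})
# Projection list: B-A -> X, C-B -> Y
p2 = extended_project(R, {"X": lambda t: t["B"] - t["A"], "Y": lambda t: t["C"] - t["B"]})

print([(r["A"], r["X"]) for r in p1])  # [(0, 3), (0, 3), (3, 9)]
print([(r["X"], r["Y"]) for r in p2])  # [(1, 1), (1, 1), (1, 1)]
```

The second result has three identical tuples although the third tuple of R differs from the first two, which shows how the result can contain duplicates that its input does not.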

Page 67

The Sorting Operator

There are several contexts in which we want to sort the tuples of a relation by one or more of its attributes. Often, when querying data, one wants the result relation to be sorted. For instance, in a query about all the movies in which Sean Connery appeared, we might wish to have the list sorted by title, so we could more easily find whether a certain movie was on the list. We shall also see how execution of queries by the DBMS is often made more efficient if we sort the relations first.

The expression τ_L(R), where R is a relation and L a list of some of R's attributes, is the relation R, but with the tuples of R sorted in the order indicated by L. If L is the list A_1, A_2, ..., A_n, then the tuples of R are sorted first by their value of attribute A_1. Ties are broken according to the value of A_2; tuples that agree on both A_1 and A_2 are ordered according to their value of A_3, and so on. Ties that remain after attribute A_n is considered may be ordered arbitrarily.

Page 68

The Sorting Operator

If R is a relation with schema R(A, B, C), then τ_{C, B}(R) orders the tuples of R by their value of C, and tuples with the same C-value are ordered by their B value. Tuples that agree on both B and C may be ordered arbitrarily.

The operator τ is anomalous, in that it is the only operator in our relational algebra whose result is a list of tuples, rather than a set. Thus, in terms of expressing queries, it only makes sense to talk about τ as the final operator in an algebraic expression. If another operator of relational algebra is applied after τ, the result of the τ is treated as a set or bag, and no ordering of the tuples is implied.

However, as we shall see, it sometimes speeds execution of the query if we sort intermediate results.
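Sorting on the list C, B can be sketched with Python's `sorted` and a tuple-valued key. The function `tau` and the tuples of R are hypothetical.

```python
def tau(rows, attrs):
    """Sorting operator: returns a list ordered by attrs, first attribute most significant."""
    return sorted(rows, key=lambda row: tuple(row[a] for a in attrs))

# Hypothetical tuples of R(A, B, C).
R = [
    {"A": 1, "B": 9, "C": 2},
    {"A": 2, "B": 3, "C": 1},
    {"A": 3, "B": 5, "C": 2},
]

ordered = tau(R, ["C", "B"])          # sort by C, break ties by B
print([row["A"] for row in ordered])  # [2, 3, 1]
```

Python's sort is stable, so tuples that agree on both C and B keep their input order, which is one of the arbitrary orders the definition allows.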

Page 69

Outerjoins

A property of the join operator is that it is possible for certain tuples to be "dangling"; that is, they fail to match any tuple of the other relation in the common attributes. Dangling tuples do not have any trace in the result of the join, so the join may not represent the data of the original relations completely. In cases where this behavior is undesirable, a variation on the join, called "outerjoin," has been proposed and appears in various commercial systems.

We shall consider the "natural" case first, where the join is on equated values of all attributes in common to the two relations. The outerjoin R ⋈° S is formed by starting with R ⋈ S, and adding any dangling tuples from R or S. The added tuples must be padded with a special null symbol, ⊥, in all the attributes that they do not possess but that appear in the join result.

When we study SQL, we shall find that the null symbol ⊥ is written out, as NULL. You may use NULL in place of ⊥ here if you wish.

Page 70

Example

U
A  B  C
1  2  3
4  5  6
7  8  9

V
B  C  D
2  3  10
2  3  11
6  7  12

U ⋈° V
A  B  C  D
1  2  3  10
1  2  3  11
4  5  6  ⊥
7  8  9  ⊥
⊥  6  7  12

Page 71

Outerjoins

There are many variants of the basic (natural) outerjoin idea. The left outerjoin R ⋈°_L S is like the outerjoin, but only dangling tuples of the left argument R are padded with ⊥ and added to the result. The right outerjoin R ⋈°_R S is like the outerjoin, but only the dangling tuples of the right argument S are padded with ⊥ and added to the result.

Page 72

Example

U
A  B  C
1  2  3
4  5  6
7  8  9

V
B  C  D
2  3  10
2  3  11
6  7  12

U ⋈°_L V
A  B  C  D
1  2  3  10
1  2  3  11
4  5  6  ⊥
7  8  9  ⊥

Page 73

Example

U
A  B  C
1  2  3
4  5  6
7  8  9

V
B  C  D
2  3  10
2  3  11
6  7  12

U ⋈°_R V
A  B  C  D
1  2  3  10
1  2  3  11
⊥  6  7  12
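The full, left, and right outerjoins of U and V can be computed by one function. This is a sketch under stated assumptions: relations are lists of tuples, `None` stands in for the null symbol, the schemas U(A, B, C) and V(B, C, D) are hard-wired, and the `keep_left` / `keep_right` flags are hypothetical.

```python
NULL = None  # stands in for the null symbol

def outerjoin(u, v, keep_left=True, keep_right=True):
    """Natural outerjoin of U(A, B, C) and V(B, C, D) on the shared attributes B and C."""
    result, matched_u, matched_v = [], set(), set()
    for i, (a, b, c) in enumerate(u):
        for j, (vb, vc, d) in enumerate(v):
            if (b, c) == (vb, vc):
                result.append((a, b, c, d))
                matched_u.add(i)
                matched_v.add(j)
    if keep_left:    # pad dangling tuples of U
        result += [(a, b, c, NULL) for i, (a, b, c) in enumerate(u) if i not in matched_u]
    if keep_right:   # pad dangling tuples of V
        result += [(NULL, b, c, d) for j, (b, c, d) in enumerate(v) if j not in matched_v]
    return result

U = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
V = [(2, 3, 10), (2, 3, 11), (6, 7, 12)]

full = outerjoin(U, V)
left = outerjoin(U, V, keep_right=False)
right = outerjoin(U, V, keep_left=False)
for row in full:
    print(row)
```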

Page 74

Outerjoins

In addition, all three natural outerjoin operators have theta-join analogs, where first a theta-join is taken and then those tuples that failed to join with any tuple of the other relation, when the condition of the theta-join was applied, are padded with ⊥ and added to the result. We use ⋈°_C to denote a theta-outerjoin with condition C. This operator can also be modified with L or R to indicate left- or right-outerjoin.

Page 75: Database Systems Relational Algebra assoc. prof., dr. Vladimir Dimitrov e-mail: cht@fmi.uni-sofia.bg web: is.fmi.uni-sofia.bg

Example

U:
A  B  C
1  2  3
4  5  6
7  8  9

V:
B  C  D
2  3  10
2  3  11
6  7  12

$U \overset{\circ}{\bowtie}_{A > V.C} V$:
A  U.B  U.C  V.B  V.C  D
4  5    6    2    3    10
4  5    6    2    3    11
7  8    9    2    3    10
7  8    9    2    3    11
1  2    3    ⊥    ⊥    ⊥
⊥  ⊥    ⊥    6    7    12
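The theta-outerjoin example can be checked the same way. The sketch below assumes tuples are plain Python tuples, `None` stands in for ⊥, and the condition is passed as a two-argument predicate; the name `theta_outerjoin` and its parameters are hypothetical.

```python
def theta_outerjoin(r, s, cond, r_width, s_width, pad_left=True, pad_right=True):
    """Theta-join r and s on cond, then pad and append the dangling tuples."""
    rows, matched_r, matched_s = [], set(), set()
    for i, t in enumerate(r):
        for j, u in enumerate(s):
            if cond(t, u):
                rows.append(t + u)          # concatenate the joining pair
                matched_r.add(i)
                matched_s.add(j)
    if pad_left:
        rows += [t + (None,) * s_width
                 for i, t in enumerate(r) if i not in matched_r]
    if pad_right:
        rows += [(None,) * r_width + u
                 for j, u in enumerate(s) if j not in matched_s]
    return rows


U = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]       # attributes A, B, C
V = [(2, 3, 10), (2, 3, 11), (6, 7, 12)]    # attributes B, C, D

# Condition A > V.C: position 0 of a U tuple against position 1 of a V tuple.
result = theta_outerjoin(U, V, lambda t, u: t[0] > u[1], r_width=3, s_width=3)
```

Unlike the natural outerjoin, nothing is merged here: both copies of B and C are kept, which is why the result has six columns.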

Page 76

Constraints on Relations

Relational algebra provides a means to express common constraints, such as referential integrity constraints. In fact, we shall see that relational algebra offers us convenient ways to express a wide variety of other constraints. Even functional dependencies can be expressed in relational algebra, as we shall see. Constraints are quite important in database programming, and we shall cover how SQL database systems can enforce the same sorts of constraints as we can express in relational algebra.

Page 77

Relational Algebra as a Constraint Language

There are two ways in which we can use expressions of relational algebra to express constraints.

1. If $R$ is an expression of relational algebra, then $R = \emptyset$ is a constraint that says "The value of $R$ must be empty," or equivalently "There are no tuples in the result of $R$."

2. If $R$ and $S$ are expressions of relational algebra, then $R \subseteq S$ is a constraint that says "Every tuple in the result of $R$ must also be in the result of $S$." Of course the result of $S$ may contain additional tuples not produced by $R$.

Page 78

Relational Algebra as a Constraint Language

These ways of expressing constraints are actually equivalent in what they can express, but sometimes one or the other is clearer or more succinct. That is, the constraint $R \subseteq S$ could just as well have been written $R - S = \emptyset$. To see why, notice that if every tuple in $R$ is also in $S$, then surely $R - S$ is empty. Conversely, if $R - S$ contains no tuples, then every tuple in $R$ must be in $S$ (or else it would be in $R - S$).

On the other hand, a constraint of the first form, $R = \emptyset$, could just as well have been written $R \subseteq \emptyset$. Technically, $\emptyset$ is not an expression of relational algebra, but since there are expressions that evaluate to $\emptyset$, such as $R - R$, there is no harm in using $\emptyset$ as a relational-algebra expression. Note that these equivalences hold even if $R$ and $S$ are bags, provided we make the conventional interpretation of $R \subseteq S$: each tuple $t$ appears in $S$ at least as many times as it appears in $R$.
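The equivalence of the two styles, including the bag case, can be illustrated with a small Python sketch. It assumes bags are modeled with `collections.Counter`, whose subtraction drops non-positive counts and so behaves like bag difference; the function names and sample bags are made up for the illustration.

```python
from collections import Counter


def is_contained(r, s):
    """Bag containment: every tuple occurs in s at least as often as in r."""
    r_count, s_count = Counter(r), Counter(s)
    return all(s_count[t] >= n for t, n in r_count.items())


def difference_is_empty(r, s):
    """True when the bag difference r - s has no tuples left."""
    return len(Counter(r) - Counter(s)) == 0


R = [("a",), ("a",), ("b",)]
S1 = [("a",), ("a",), ("b",), ("c",)]   # contains R as a bag
S2 = [("a",), ("b",), ("c",)]           # only one copy of ("a",)

# For each S, both tests should give the same answer.
checks = [(is_contained(R, S), difference_is_empty(R, S)) for S in (S1, S2)]
```

Note that `S2` contains every distinct tuple of `R`, so the constraint would hold under set semantics; it fails here only because of the multiplicity of `("a",)`.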

In the following sections, we shall see how to express significant constraints in one of these two styles. As we shall see, it is the first style (equal-to-the-emptyset) that is most commonly used in SQL programming. However, as shown above, we are free to think in terms of set-containment if we wish and later convert our constraint to the equal-to-the-emptyset style.

Page 79

Referential Integrity Constraints

A common kind of constraint, called "referential integrity", asserts that a value appearing in one context also appears in another, related context. We saw referential integrity as a matter of relationships "making sense." That is, if an object or entity $A$ is related to object or entity $B$, then $B$ must really exist. For example, in ODL terms, if a relationship in object $A$ is represented physically by a pointer, then referential integrity of this relationship asserts that the pointer must not be null and must point to a genuine object.

In the relational model, referential integrity constraints look somewhat different. If we have a value $v$ in a tuple of one relation $R$, then because of our design intentions we may expect that $v$ will appear in a particular component of some tuple of another relation $S$. An example will illustrate how referential integrity in the relational model can be expressed in relational algebra.

Page 80

Example

Movie(title, year, length, inColor, studioName, producerC#)
MovieExec(name, address, cert#, netWorth)

$\pi_{\text{producerC\#}}(\text{Movie}) \subseteq \pi_{\text{cert\#}}(\text{MovieExec})$

$\pi_{\text{producerC\#}}(\text{Movie}) - \pi_{\text{cert\#}}(\text{MovieExec}) = \emptyset$

Page 81

Example

StarsIn(movieTitle, movieYear, starName)
Movie(title, year, length, inColor, studioName, producerC#)

$\pi_{\text{movieTitle},\,\text{movieYear}}(\text{StarsIn}) \subseteq \pi_{\text{title},\,\text{year}}(\text{Movie})$
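Both referential integrity examples reduce to computing a difference of two projections and checking that it is empty. The following Python sketch shows this under simple assumptions: relations are lists of dictionaries that carry only the attributes needed here, projections are sets of tuples, and the helper names and sample rows are invented for the illustration.

```python
def project(rows, attrs):
    """Set-valued projection of a list of dict rows onto attrs."""
    return {tuple(row[a] for a in attrs) for row in rows}


def dangling(r, r_attrs, s, s_attrs):
    """Projection of r minus projection of s; empty means the constraint holds."""
    return project(r, r_attrs) - project(s, s_attrs)


# Hypothetical sample data, reduced to the attributes the constraints use.
movie = [
    {"title": "Film A", "year": 1990, "producerC#": 101},
    {"title": "Film B", "year": 1995, "producerC#": 999},
]
movie_exec = [{"name": "Exec One", "cert#": 101}]
stars_in = [
    {"movieTitle": "Film A", "movieYear": 1990, "starName": "Star X"},
    {"movieTitle": "Film C", "movieYear": 2000, "starName": "Star Y"},
]

# Producers that are not executives, and starred-in movies that do not exist.
missing_producers = dangling(movie, ["producerC#"], movie_exec, ["cert#"])
missing_movies = dangling(stars_in, ["movieTitle", "movieYear"],
                          movie, ["title", "year"])
```

Returning the offending tuples rather than a boolean makes the equal-to-the-emptyset form useful for diagnosis: the constraint holds exactly when the returned set is empty.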

Page 82

Additional Constraint Examples

The same constraint notation allows us to express far more than referential integrity. For example, we can express any functional dependency as an algebraic constraint, although the notation is more cumbersome than the FD notation introduced.

Page 83

Example

name -> address
MovieStar(name, address, gender, birthdate)

MS1 is $\rho_{\text{MS1(name, address, gender, birthdate)}}(\text{MovieStar})$

MS2 is $\rho_{\text{MS2(name, address, gender, birthdate)}}(\text{MovieStar})$

$\sigma_{\text{MS1.name} = \text{MS2.name AND MS1.address} \neq \text{MS2.address}}(\text{MS1} \times \text{MS2}) = \emptyset$
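The functional-dependency constraint can be evaluated literally: pair every tuple of MovieStar with every tuple of a second copy, then select the pairs that agree on the left side and disagree on the right side. The Python sketch below does this with `itertools.product`; the function name `fd_violations` and the sample rows are hypothetical.

```python
from itertools import product


def fd_violations(rows, lhs, rhs):
    """Pairs from rows x rows that agree on lhs but differ on some rhs attribute."""
    return [
        (t1, t2)
        for t1, t2 in product(rows, repeat=2)   # plays the role of MS1 x MS2
        if all(t1[a] == t2[a] for a in lhs)
        and any(t1[a] != t2[a] for a in rhs)
    ]


# Hypothetical sample data: "Star X" is listed with two different addresses.
movie_star = [
    {"name": "Star X", "address": "1 Main St", "gender": "F", "birthdate": "1970-01-01"},
    {"name": "Star X", "address": "9 Oak Ave", "gender": "F", "birthdate": "1970-01-01"},
    {"name": "Star Y", "address": "1 Main St", "gender": "M", "birthdate": "1980-05-05"},
]

violations = fd_violations(movie_star, ["name"], ["address"])
```

Because the product is taken over ordered pairs, each conflict shows up twice, once in each order. That does not affect the constraint, which only asks whether the result is empty.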

Page 84

Additional Constraint Examples

Some domain constraints can also be expressed in relational algebra. Often, a domain constraint simply requires that values for an attribute have a specific data type, such as integer or character string of length 30, so we may associate that domain with the attribute. However, often a domain constraint involves specific values that we require for an attribute. If the set of acceptable values can be expressed in the language of selection conditions, then this domain constraint can be expressed in the algebraic constraint language.

$\sigma_{\text{gender} \neq \text{'F' AND gender} \neq \text{'M'}}(\text{MovieStar}) = \emptyset$
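As a minimal sketch, the domain constraint on gender amounts to a selection whose result must be empty. The helper `select` and the sample rows below are assumptions made for the illustration.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Hypothetical sample data, reduced to the attributes used here.
movie_star = [
    {"name": "Star X", "gender": "F"},
    {"name": "Star Y", "gender": "M"},
    {"name": "Star Z", "gender": "?"},
]

# Rows whose gender is neither 'F' nor 'M'; the constraint requires none.
offenders = select(movie_star,
                   lambda row: row["gender"] != "F" and row["gender"] != "M")
```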

Page 85

Additional Constraint Examples

Finally, there are some constraints that fall into none of the categories outlined, nor are they functional or multivalued dependencies. The algebraic constraint language lets us express many new kinds of constraints.

We offer one example here.

Page 86

Example

MovieExec(name, address, cert#, netWorth)

Studio(name, address, presC#)

$\sigma_{\text{netWorth} < 10000000}(\text{Studio} \bowtie_{\text{presC\#} = \text{cert\#}} \text{MovieExec}) = \emptyset$

$\pi_{\text{presC\#}}(\text{Studio}) \subseteq \pi_{\text{cert\#}}(\sigma_{\text{netWorth} \geq 10000000}(\text{MovieExec}))$
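The two formulations of the studio-president constraint can be compared on sample data. The Python sketch below is an illustration only: rows are dictionaries reduced to the attributes involved, and the function names and data are hypothetical. It also shows one point worth noticing: the containment form flags a `presC#` that does not appear in MovieExec at all, while the join form stays silent about it, so the two agree only when referential integrity from Studio to MovieExec already holds.

```python
THRESHOLD = 10_000_000


def violations_by_join(studios, execs):
    """Theta-join on presC# = cert#, then select netWorth < THRESHOLD."""
    return [
        (s, e)
        for s in studios
        for e in execs
        if s["presC#"] == e["cert#"] and e["netWorth"] < THRESHOLD
    ]


def violations_by_containment(studios, execs):
    """presC# values not among the cert# values of sufficiently rich executives."""
    rich = {e["cert#"] for e in execs if e["netWorth"] >= THRESHOLD}
    return {s["presC#"] for s in studios} - rich


# Hypothetical sample data.
studios = [
    {"name": "Studio A", "presC#": 1},
    {"name": "Studio B", "presC#": 2},
    {"name": "Studio C", "presC#": 3},   # president missing from MovieExec
]
execs = [
    {"cert#": 1, "netWorth": 50_000_000},
    {"cert#": 2, "netWorth": 5_000_000},
]

by_join = [s["presC#"] for s, _ in violations_by_join(studios, execs)]
by_containment = violations_by_containment(studios, execs)
```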

Page 87

Summary

Classical Relational Algebra: This algebra underlies most query languages for the relational model. Its principal operators are union, intersection, difference, selection, projection, Cartesian product, natural join, theta-join, and renaming.

Selection and Projection: The selection operator produces a result consisting of all tuples of the argument relation that satisfy the selection condition. Projection removes undesired columns from the argument relation to produce the result.

Joins: We join two relations by comparing tuples, one from each relation. In a natural join, we splice together those pairs of tuples that agree on all attributes common to the two relations. In a theta-join, pairs of tuples are concatenated if they meet a selection condition associated with the theta-join.

Relations as Bags: In commercial database systems, relations are actually bags, in which the same tuple is allowed to appear several times. The operations of relational algebra on sets can be extended to bags, but there are some algebraic laws that fail to hold.

Extensions to Relational Algebra: To match the capabilities of SQL or other query languages, some operators not present in the classical relational algebra are needed. Sorting of a relation is an example, as is an extended projection, where computation on columns of a relation is supported. Grouping, aggregation, and outerjoins are also needed.

Page 88

Summary

Grouping and Aggregation: Aggregations summarize a column of a relation. Typical aggregation operators are sum, average, count, minimum, and maximum. The grouping operator allows us to partition the tuples of a relation according to their value(s) in one or more attributes before computing aggregation(s) for each group.

Outerjoins: The outerjoin of two relations starts with a join of those relations. Then, dangling tuples (those that failed to join with any tuple) from either relation are padded with null values for the attributes belonging only to the other relation, and the padded tuples are included in the result.

Constraints in Relational Algebra: Many common kinds of constraints can be expressed as the containment of one relational algebra expression in another, or as the equality of a relational algebra expression to the empty set. These constraints include functional dependencies and referential integrity constraints, for example.