database systems i relational algebra

41
DATABASE SYSTEMS I RELATIONAL ALGEBRA

Upload: hadar

Post on 25-Feb-2016

40 views

Category:

Documents


0 download

DESCRIPTION

Database Systems I Relational Algebra. Relational Query Languages. Query languages (QL): Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful query languages: Strong formal foundation based on logic. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Database Systems I  Relational Algebra

DATABASE SYSTEMS I

RELATIONAL ALGEBRA

Page 2: Database Systems I  Relational Algebra

22

RELATIONAL QUERY LANGUAGES Query languages (QL): Allow manipulation

and retrieval of data from a database. Relational model supports simple, powerful

query languages: Strong formal foundation based on logic. High level, abstract formulation of queries. Easy to program. Allows the DBS to do much optimization.

DBS can choose, e.g., most efficient sorting algorithm or the order of basic operations.

Page 3: Database Systems I  Relational Algebra

33

RELATIONAL QUERY LANGUAGES Query Languages != programming

languages! QLs not expected to be “Turing complete”.

Able to answer any computable problem given enough storage space and time

QLs not intended to be used for complex calculations. Applications on top of database can do that

QLs support easy, efficient access to large data sets.

E.g., in a QL cannot determine whether the number of tuples of a

table is even or odd, create a visualization of the results of a query, ask the user for additional input.

Page 4: Database Systems I  Relational Algebra

44

FORMAL QUERY LANGUAGES Two mathematical query languages form the

basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra (RA)

More procedural, very useful for representing execution plans, relatively close to SQL.

Composed of a collection of operators A step-by-step procedure for computing the answer

Relational Calculus (RC) Lets users describe what they want, rather than how to

compute it. (Non-procedural, declarative.) Describes the answer, not the steps.

Understanding these formal query languages is important for understanding SQL and query processing.

Page 5: Database Systems I  Relational Algebra

55

PRELIMINARIES A query is applied to relation instances, and

the result of a query is also a relation instance. Inputs and Outputs of Queries are relations Query evaluated on instances of input relations

Different instance (DB?) as input = different answer Schemas of input relations for a query are fixed

(but query will run regardless of instance!) The schema for the result of a given query is also

fixed! Determined by definition of input relations and query language constructs.

Page 6: Database Systems I  Relational Algebra

66

PRELIMINARIES Positional vs. named-attribute notation:

Positional notationEx: function(1,2,3)easier for formal definitions

Named-attribute notationEx: function(var1=1, var2=2, var3=3)more readable

Advantages/disadvantages of one over the other?

Page 7: Database Systems I  Relational Algebra

77

EXAMPLE INSTANCES “Sailors” and

“Reserves” relations for our examples. 2 instances of

“Sailors” S1 & S2 We’ll use positional

or named attribute notation

Assume that names of attributes in query results are `inherited’ from names of attributes in query input relations

sid bid day22 101 10/10/9658 103 11/12/96

R1

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S1

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

Page 8: Database Systems I  Relational Algebra

RELATIONAL ALGEBRA

Page 9: Database Systems I  Relational Algebra

99

RELATIONAL ALGEBRA An algebra consists of operators and

operands. Operands can be either variables or constants.

In the algebra of arithmetic: atomic operands are variables such as x or y and

constants such as 15. Operators are the usual arithmetic operators such

as +, -, *. Expressions are formed by applying operators

to atomic operands or other expressions. For example,

15x + 15

(x + 15) * y

Page 10: Database Systems I  Relational Algebra

1010

RELATIONAL ALGEBRA Algebraic expressions can be re-ordered

according to commutativity or associativity laws without changing their resulting value.

E.g., 15 + 20 = 20 + 15

(x * y) * z = x * (y * z) Parentheses group operators and define

precedence of operators, e.g. (x + 15) * y x + (15 *y)

Describes step-by-step procedure for computing the query Just like algebra in math.

Page 11: Database Systems I  Relational Algebra

1111

RELATIONAL ALGEBRA In relational algebra, operands are relations /

tables, and an expression evaluates to a relation / set of tuples.

The relational algebra operators are set operations, operations removing rows (selection) or columns

(projection) from a relation, operations combining two relations into a new

one (Cartesian product, join), a renaming operation, which changes the name

of the relation or of its attributes.

Page 12: Database Systems I  Relational Algebra

1212

RELATIONAL ALGEBRA Every operator takes one or two relation

instances A relational algebra expression is recursively

defined to be a relation A combination of relations is a relation Result is also a relation Can apply operator to

Relation from database Relation as a result of another operator

Page 13: Database Systems I  Relational Algebra

1313

RELATIONAL ALGEBRA OPERATIONS Basic operations

Selection ( ) Selects a subset of rows from

relation. Projection ( )

Deletes unwanted columns from relation.

Cartesian product ( ) Combine two relations.

Set-difference ( ) Tuples in relation 1, but not in

relation 2. Union ( )

Tuples in relation 1 or in relation 2.

Page 14: Database Systems I  Relational Algebra

1414

RELATIONAL ALGEBRA OPERATIONS Additional operations:

Intersection, join, division. Renaming of relations / attributes.

Not essential, can be implemented using the five basic operations.

But (very!) useful. Since each operation returns a relation,

operations can be composed, i.e. output of one operation can be input of the next operation.

Page 15: Database Systems I  Relational Algebra

1515

Renames relations / attributes, without changing the relation instance.

relation R is renamed to S, attributes are renamed A1, . . ., An

Rename only some attributes

using the positional notation to reference attributes

No renaming of attributes, just the relation

RENAMING

)(),...,2,1( RAnAAS

)(),...,11( RAkkAS

)(RS

Page 16: Database Systems I  Relational Algebra

1616

PROJECTION One input relation. Deletes attributes that are not in projection

list. Schema of result contains exactly the

attributes in the projection list, with the same names that they had in the (only) input relation.

Projection operation has to eliminate duplicates, since relations are sets. Duplicate elimination is expensive. Commercial DBMS typically don’t do duplicate

elimination unless the user explicitly asks for it.

Page 17: Database Systems I  Relational Algebra

1717

PROJECTION Similar in concept to VIEWs sname rating

yuppy 9lubber 8guppy 5rusty 10

sname rating S, ( )2

age35.055.5

age S( )2

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

Other fields are projected out

Page 18: Database Systems I  Relational Algebra

1818

SELECTION One input relation. Selects all tuples that satisfy selection

condition. No duplicates in result!

Schema of result identical to schema of (only) input relation.

Selection conditions: simple conditions comparing attribute values

(variables) and / or constants or complex conditions that combine simple

conditions using logical connectives AND and OR.

Page 19: Database Systems I  Relational Algebra

1919

SELECTION

rating S8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

Page 20: Database Systems I  Relational Algebra

2020

SELECTION

rating S8 2( )

sid sname rating age28 yuppy 9 35.058 rusty 10 35.0

sname ratingyuppy 9rusty 10

sname rating rating S, ( ( ))8 2

sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

But the result of an expression is another relation, we can substitute an expression wherever a relation is expected

Page 21: Database Systems I  Relational Algebra

2121

UNION, INTERSECTION, SET-DIFFERENCE All of these set

operations take two input relations, which must be union-compatible: Same sets of attributes. Corresponding

attributes have same type.

What is the schema of result?

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

sid sname rating age31 lubber 8 55.558 rusty 10 35.0

S S1 2

S S1 2

sid sname rating age22 dustin 7 45.0

S S1 2

Page 22: Database Systems I  Relational Algebra

2222

UNION

Concatenates S1 and S2 Result contains ALL tuples that occur in either S1 or S2

Schemas must be identical If they have the same number of fields Fields have same domains

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0

S S1 2

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S1sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

SELECT * FROM S1UNIONSELECT * FROM S2

Page 23: Database Systems I  Relational Algebra

2323

INTERSECTION

Result contains ALL tuples that occur in both S1 or S2 Schemas must be identical

21 SS

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S1sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

SELECT * FROM S1INTERSECTSELECT * FROM S2

Page 24: Database Systems I  Relational Algebra

2424

SET-DIFFERENCE

Result contains ALL tuples that occur in S1 but not in S2 Schemas must be identical

21 SS

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S1sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

SELECT * FROM S1MINUSSELECT * FROM S2

Page 25: Database Systems I  Relational Algebra

2525

CARTESIAN PRODUCT Also referred to as cross-product or product. Two input relations. Each tuple of the one relation is paired with each

tuple of the other relation. Result schema has one attribute per attribute of

both input relations, with attribute names `inherited’ if possible.

In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid. What to do??

Page 26: Database Systems I  Relational Algebra

2626

CARTESIAN PRODUCT

Result contains all fields of S1 followed by fields of S2 Contains one tuple <s1,s2> for each pair of tuples in S1 and

S2. Schemas can be different If S1 and S2 contains fields of the same name, there is a

naming conflict. Rename one or both fields.

21 SS

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S1sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

S2

SELECT * FROM S1, S2

Page 27: Database Systems I  Relational Algebra

2727

CARTESIAN PRODUCT Field names in conflict become unnamed

(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/ 10/ 9622 dustin 7 45.0 58 103 11/ 12/ 9631 lubber 8 55.5 22 101 10/ 10/ 9631 lubber 8 55.5 58 103 11/ 12/ 9658 rusty 10 35.0 22 101 10/ 10/ 9658 rusty 10 35.0 58 103 11/ 12/ 96

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid bid day 22 101 10/10/96 58 103 11/12/96

R1S1

21 SS

Page 28: Database Systems I  Relational Algebra

2828

RENAMING

In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid.

Then, apply the renaming operation, e.g.

The field names in the output relation are the same fields, but some are renamed

Field 1 is renamed to sid1, field 5 is renamed to sid5

For an expression to be well defined, no 2+ fields must have same name

)1()25,11(1 RsidsidS

Page 29: Database Systems I  Relational Algebra

2929

JOIN Similar to Cartesian product with same result schema.

Cartesian Product has no conditions Join has conditions

Join can be defined as a Cartesian Product, followed by selections and projections Join however is much more efficient Why?

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

sid bid day 22 101 10/10/96 58 103 11/12/96

R1S1

Page 30: Database Systems I  Relational Algebra

3030

JOIN Condition Join

Each tuple of the one relation is paired with each tuple of the other relation if the two tuples satisfy the join condition.

Condition c refers to attributes of both R and S.

R c S c R S ( )

(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/ 12/ 9631 lubber 8 55.5 58 103 11/ 12/ 96

11:Example .1.1 RS sidRsidS

Page 31: Database Systems I  Relational Algebra

3131

JOIN

11:Example RS

Equi-Join: A special case of Conditional Join where the condition c contains only equalities.

Result schema similar to Cartesian product, but only one copy of attributes for which equality is specified. R1.sid is redundant R1.sid is dropped by a applying a projection

Natural Join: Equi-join on all common attributes.

sid sname rating age bid day22 dustin 7 45.0 101 10/ 10/ 9658 rusty 10 35.0 103 11/ 12/ 96

11:Example RS sid

Page 32: Database Systems I  Relational Algebra

3232

DIVISION Not supported as a primitive operation, but

useful for expressing queries like: Find sailors who have reserved all boats.

Let A have 2 attributes, x and y; B have only attribute y:A/B = i.e., A/B contains all x tuples (sailors) such that for

every y tuple (boat) in B, there is an xy tuple (reservation) in A.

In general, x and y can be any lists of attributes; y is the list of attributes in B, and x y is the list of attributes of A.

AyxByx ,:|

Page 33: Database Systems I  Relational Algebra

3333

DIVISION

sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

pnop2

pnop2p4

pnop1p2p4

snos1s2s3s4

snos1s4

snos1

A

B1B2

B3

A/B1 A/B2 A/B3

Page 34: Database Systems I  Relational Algebra

3434

DIVISION Division is not an essential operation; can be

implemented using the five basic operations. Also true of joins, but joins are so common that

systems implement joins specially. Idea: For A/B, compute all x values in A that are

not ‘disqualified’ by some y value in B. x value in A is disqualified if by attaching y value

from B, we obtain an xy tuple that is not in A.

Disqualified x values:

A/B:

x x A B A(( ( ) ) )

x A( ) all disqualified x values

Page 35: Database Systems I  Relational Algebra

3535

DIVISION Division is not an essential operation; can be

implemented using the five basic operations.

SELECT name FROM Student WHERE NOT EXISTS (

SELECT CID FROM Course WHERE CID NOT IN (

SELECT CID FROM Register WHERE Student.SID

= Register.SID))

returns those courses this student didn’t take

If such courses don’t exist (NOT EXISTS), then student must’ve taken all courses

Page 36: Database Systems I  Relational Algebra

3636

Find names of sailors who’ve reservedboat #103.

Solution 1:

Solution 2:

Solution 3:

Which is most efficient? Why?

))Re(( 103 Sailorsservesbidsname

)Re( 1031 servesbidTemp

)1(2 SailorsTempTemp

sname bid serves Sailors( (Re ))103

EXAMPLE QUERIES

Page 37: Database Systems I  Relational Algebra

3737

Find names of sailors who’ve reserved a red boat.

Information about boat color only available in Boats; so need an extra join:

A more efficient solution:

A query optimizer can find the second solution given the first one.

sname color red Boats serves Sailors(( ' ' ) Re )

sname sid bid color red Boats s Sailors( (( ' ' ) Re ) )

EXAMPLE QUERIES

Page 38: Database Systems I  Relational Algebra

3838

Find sailors who’ve reserved a red or a green boat.

Can identify all red or green boats, then find sailors who’ve reserved one of these boats:

Can also define Tempboats using union! (How?)

What happens if OR is replaced by AND in this query?

)''''( BoatsgreencolorORredcolorTempboats

sname Tempboats serves Sailors( Re )

EXAMPLE QUERIES

Page 39: Database Systems I  Relational Algebra

3939

Find sailors who’ve reserved a red and a green boat.

Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):

))Re)''((( servesBoatsredcolorsidTempred

sname Tempred Tempgreen Sailors(( ) )

))Re)''((( servesBoatsgreencolorsidTempgreen

EXAMPLE QUERIES

Page 40: Database Systems I  Relational Algebra

4040

Find the names of sailors who’ve reserved all boats.

Uses division; schemas of the input relations must be carefully chosen:

To find sailors who’ve reserved all ‘Interlake’ boats:

Book has lots of examples.

))(/)Re,(( BoatsbidservesbidsidTempsids

sname Tempsids Sailors( )

)''(/ BoatsInterlakebnamebid

EXAMPLE QUERIES

Page 41: Database Systems I  Relational Algebra

4141

SUMMARY The relational model has formal query

languages that are easy to use and allow efficient optimization by the DBS.

Relational algebra (RA) is more procedural; used as internal representation for SQL query evaluation plans.

Five basic RA operations: selection, projection, Cartesian product, union, set-difference.

Additional operations as shorthand for important cases: intersection, join, division.

These operations can be implemented using the basic operations.