# relational algebra. 421b: database systems - relational algebra 2 relational query languages q query...

Embed Size (px)

TRANSCRIPT

Relational Algebra

2421B: Database Systems - Relational Algebra

Relational Query Languages

Query languages: allow manipulation and retrieval of data from a database

Relational model supports simple, powerful QLs: Strong formal foundation Allows for much optimization

Query languages are NOT programming languages QLs not expected to be “turing complete” QLs not intended to be used for complex calculations QLs support easy, efficient and sophisticated access to large

data sets

3421B: Database Systems - Relational Algebra

Formal Relational Query Languages

Mathematical query languages form the basis for “real” languages (e.g. SQL) and for implementation: Relational Algebra:

Operational - a query is a sequence of operations on data Very useful for representing execution plans, i.e., to describe

how a SQL query is executed internally Relational Calculus:

descriptive - a query describes how the data to be retrieved looks like

Understanding Relational Algebra is key to understanding SQL.

4421B: Database Systems - Relational Algebra

Preliminaries A query is applied to a relation instance,

and the result of a query is also a relation instance (i.e., input = relation instances, output = relation instance) Schemas of input relations for a query are fixed

(but query will run regardless of instance) The schema for the result of a given query is

also fixed! Determined by the definition of query language constructs

5421B: Database Systems - Relational Algebra

Example Instances We assume that names of fields in query results are

‘inherited’ from names of fields in query input relations

sid sname

rating age

28 yuppy 9

35

31 debby 8

55

22 conny 5

35

58 lilly 10

35

sid bid day

22 101 10/24/07

58 103 09/30/07

58 101 09/01/07

58 104 10/23/07

Reserves R

Sailors S1

bid bname color101 Interlake blue

103 Snapper

red

104 Bingo green

Boat B

sid sname

rating age44 robby 7

4531 debby 8 5558 lilly 10 35

S2

6421B: Database Systems - Relational Algebra

Relational Algebra Basic Operations

Selection : Selects a subset of rows from a relation Projection ∏: projects to a subset of columns from a

relation Cross Product : Combines two relations Set Difference : Tuples that are in the first but not the

second relation Union : tuples in any of the relations to be unified

Additional operations Intersection (), join ( ), division, renaming Not essential but very useful

Relational algebra is closed: since each operation has input relation(s), and returns a relation, operations can be composed

7421B: Database Systems - Relational Algebra

Projection Projection:

Notation: ∏L(Rin) Returns a subset of the columns of the input relation Rin, i.e, it

ignores the attributes that are not in the projection list L Schema of result relation contains exactly the attributes of the

projection list, with the same attribute names as in Rin Projection operator has to eliminate duplicates! Note: real systems typically do NOT eliminate duplicates unless the

user explicitly asks for it; eliminating duplicates is a very costly operation!

sid sname

rating age

28 yuppy 9

35

31 debby 8

55

22 conny 5

35

58 lilly 10

35

S2sname

rating

yuppy 9debby 8conny 5lilly 10

∏ (S2)

age

3555

sname,rating ∏ (S2)age

8421B: Database Systems - Relational Algebra

Selection and Operator Composition

Selection: C(Rin) Returns the subset of the rows of the

input relation Rin that fulfill the condition C

Condition C involves the attributes of Rin Schema of result relation identical to

schema of Rin No duplicates (obviously)

Operation Composition: result relation of one operation can be input for another relational algebra operation

sid sname

rating age

28 yuppy 9

35

31 debby 8

55

22 conny 5

35

58 lilly 10

35

S2

(S2)rating > 8

sid sname

rating age

28 yuppy 9 3558 lilly 10 35

sname

rating

yuppy 9 lilly 10

∏ ( (S2))rating > 8sname,rating

9421B: Database Systems - Relational Algebra

Union, Intersection,Set-Difference

Notation: Rin1 Rin2 (Union),

Rin1 Rin2 (Intersection),

Rin1 - Rin2 (Difference),

Usual operations on sets Rin1 and Rin2 must be union-compatible, i.e., they

must have the same number of attributes and corresponding attributes must have the same type (note that attribute names need not be the same)

The result schema is the same as the schema of the input relations (with possible renamed attributes)

10421B: Database Systems - Relational Algebra

Union, Intersection,Set-Difference: Examples

sid sname

rating age

28 yuppy 9

35

31 debby 8

55

22 conny 5

35

58 lilly 10

35

S2

sid sname

rating age44 robby 7

4531 debby 8 5558 lilly 10 35

S1sid snam

erating age44 robby 7

4531 debby 8 55 58 lilly 10 35 28 yuppy 9 3522 conny 5 35

S1 S2

sid sname

rating age

31 debby 8 5558 lilly 10 35

S1 S2

sid sname

rating age

44 robby 7 45

S1 - S2

11421B: Database Systems - Relational Algebra

Cross-Product Cross-Product: Rin1 X Rin2 Each row of Rin1 is paired

with each row of Rin2 Result schema has one

attribute per attribute of Rin1 and one attribute

per attribute Rin2 with field names inherited if possible Conflict if both relations

have an attribute with the same name: e.g., attribute names of result relation concatenate input relation name with input attribute name

sid sname

rating age44 robby 7

4531 debby 8 5558 lilly 10 35

S2

S1.sid

sname

rating age

44 robby 7

45

44 robby 7

45

31 debby 8

55

31 debby 8

55 58 lilly 10

35

58 lilly 10

35

S1 X R1R1.sid bid day22 101 07/24/00 58 103 07/30/0022 101 07/24/00 58 103 07/30/0022 101 07/24/00 58 103 07/30/00

sid bid day

22 101 10/24/07

58 103 09/30/07

58 101 09/01/07

58 104 10/23/07

12421B: Database Systems - Relational Algebra

S2 R

Joins Condition Join (Theta-Join): Rout = Rin1 C Rin2 = C(Rin1 X Rin2) Result schema the same as for cross-product Fewer tuples than cross-product

sid sname

rating age44 robby 7

4531 debby 8 5558 lilly 10 35

S2

S2.sid

sname

rating age44 robby 7

4531 debby 8 55

R.sid bid day

22 101 10/24/0722 101 10/24/07

S2.sid > R.sid

sid bid day

22 101 10/24/07

58 103 09/30/07

58 101 09/01/07

58 104 10/23/07

58 lilly 10 35 22 101 10/24/07

13421B: Database Systems - Relational Algebra

S2 S2’

Self-Joins

sid sname

rating age44 robby 7

4531 debby 8 5558 lilly 10 35

S2

S2.sid

S2.sname

S2.rating

S2.age

44 robby 7 45 58 lilly 10 3531 debby 8 55 58 lilly 10 35

S2’.sid

S2’.sname

S2’.rating

S2.age > S2’.sid

give S2 a second name S2’

S2’.age

14421B: Database Systems - Relational Algebra

Equi-Join Equi-Join: Rout = Rin1 Rin1.a1 = Rin2.b1, … Rin1.an = Rin2.bn Rin2 = A special case of condition join where the condition C contains only equalities. Result schema similar to cross-product, but only one copy of attributes for which

equality is specified Natural Join: Equijoin on all common attributes, i.e., on all attributes with the

same name

sid sname

rating age44 robby 7

4531 debby 8 5558 lilly 10 35

S2 sid bid day

22 101 10/24/07

58 103 09/30/07

58 101 09/01/07

58 104 10/23/07

S2 RS2.sid

sname

rating age

58 lilly 10 3558 lilly 10

35

bid day

103 09/30/07 101 09/01/07 104 10/23/07

S2.sid = R.sid

58 lilly 10 35

15421B: Database Systems - Relational Algebra

Division Not supported as a primitive

operation, but useful for expressing queries like Find sailors who have reserved all boats

Let A have 2 fields, x and y; B have only field y: A/B = {<x> | <y> B <x,y> A} I.e., A/B contains all x tuples (sailors) such

that for every y tuple (boat) in B, there is an xy tuple in A

Or: if the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is in A/B

In general, x and y can be any lists of fields; y is the list of fields in B, x y is the list of fields of A.

sid bid22 101

58 103

58

10158

104

Reserves R1

bid101

103

104

BoatB1

16421B: Database Systems - Relational Algebra

Examples of Division A/b

x y

x1 y1

x1 y2

x1 y3

x1 y4

x2 y1

x2 y2

x3 y2

x4 y2

x4 y4

y

y2

y

y2y4

y

y1

y2

y4

A

B1B2

B3x

x1

x2

x3

x4

A/B1

x

x1x4

A/B2

x

x1

A/B3

17421B: Database Systems - Relational Algebra

Expressing A/B using Basic Operators

Division is not an essential operation; just a useful shorthand (Also true of joins, but joins are so common that

systems implement joins specially) Idea: For A/B, compute all x values that are

not ‘disqualified’ by some y value in B. x value is disqualified if by attaching y value

from B, we obtain an xy tuple that is not in A Disqualified x values: ∏X((∏X (A) x B)-A)

A/B: ∏X (A) - all disqualified tuples

18421B: Database Systems - Relational Algebra

Renaming

Renaming : (Rout(B1,…Bn), Rin(A1, …An))

Produces a relation identical to Rin but named Rout and with

attributes A1, … An of Rin renamed to B1, … Bn

(Temp, S1), (Temp1(sid1,rating1), S1(sid,rating))

19421B: Database Systems - Relational Algebra

Examples (discussed in class)

Relations Sailors(sid,sname,rating,age) Reserves(sid,bid,day) Boats(bid,bname,color)

Queries Find names of sailors who have reserved boat #103 (three

solutions) Find names of sailors who have reserved a red boat (2

solutions) Find names of sailors who have reserved a red or a green

boat (2 solutions) Find names of sailors who have reserved a red and a

green boat (1 solution) Find names of sailors who have reserved all boats

(solution with division)

20421B: Database Systems - Relational Algebra

Summary

The relational model has rigorously defined query languages that are simple and powerful

Relational algebra is operational; useful as internal representation for query evaluation plans

Projection, selection, cross-product, difference and union are the minimal set of operators with which all operations of the relational algebra can be expressed

Several ways of expressing a given query; a query optimizer should choose the most efficient version.

Relational Completeness of a query language: A query language (e.g., SQL) can express every query that is expressible in relational algebra

21421B: Database Systems - Relational Algebra

Relational Calculus Relational Calculus is non-operational but descriptive:

Users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness).

Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC)

Both TRC and DRC are simple subsets of first-order logic

Calculus has variables, constants, comparison operations, logical connectives and quantifiers. TRC: variables range over tuples DRC: variables range over domain elements (= field values)

Find the names of all sailors with a rating > 7 TRC:{T | S Sailors(S.rating > 7 S.name = T.name)} DRC:{<N> | S,R,A(<S,N,R,A> Sailors R7}

22421B: Database Systems - Relational Algebra

Domain Relational Calculus

Query has the form:{<x1,x2,…,xn>} | p(<x1,x2,…xn>)}

The Answer includes all tuples <x1,x2,…,xn> that make the formula p(<x1,x2,…xn>} be true. (‘|’ should be read as “such that”).

Formulas are recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger and better formulas using the logical connectives.

23421B: Database Systems - Relational Algebra

DRC Formulas

Variables X,Y, x1,… range over domain elements (= field values)

Atomic formula: Let op be one of ,,,,, <x1,x2,…xn> Relation-name, or X op Y, or X op constant

Formula An atomic formula, or p, pq, pq, where p and q are formulas, or X(p(X)), where variable X is free in p(X), or X(p(X)), where variable X is free in p(X) The use of quantifiers X and X is said to bind X. A

variable that is not bound is free.

24421B: Database Systems - Relational Algebra

Free and Bound Variables

The use of quantifiers X and X is said to bind X. A variable that is not bound is free.

Let us revisit the definition of a query:{<x1,x2,…,xn>} | p(<x1,x2,…xn>)}

There is an important restriction: the variables x1,…,xn that appear to the left of ‘|’ must be the only free variables in the formula p(…).

25421B: Database Systems - Relational Algebra

Example

Find all sailors with a rating above 7:{<I,N,T,A> | <I,N,T,A> Sailors T7}

The condition <I,N,T,A>Sailors ensures that the domain variables I,N,T, and A are bound to fields of the same Sailors tuple.

The condition T7 ensures that all tuples in the answers have a rating above 7

The term <I,N,T,A> to the left of ‘|’ says that every tuple of the Sailors relation with a rating above 7 is in the answer

26421B: Database Systems - Relational Algebra

Further Examples (discussed in class)

Relations Sailors(sid,sname,rating,age) Reserves(sid,bid,day) Boats(bid,bname,color)

Queries Find sailors who are older than 18 or have a rating under

9, and are called ‘debby’’ Find sailors with a rating above 7 that have reserved boat

#103 (use of ) Find sailors rated > 7 who have reserved a red boat (use

of ) Find sailors who have reserved all boats (use of )

Transfer to “find all sailors I such that for each 3-tuple (B,BN,C> either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor I has reserved the boat.

27421B: Database Systems - Relational Algebra

Unsafe Queries and Expressive Power

Unsafe queries: It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe. E.g., {<I,N,T,A> | (<I,N,T,A> Sailors)}

Expressive Power: It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC/TRC; the converse is also true.

Relational Completeness of a query language: A query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus

28421B: Database Systems - Relational Algebra

Some rules and definitions

Equivalence: Let R, S, T be relations; C, C1, C2 conditions; L projection lists of the relations R and S Commutativity:

∏L(C(R)) = C(∏L(R)) if C only considers attributes of L R1 R2 = R2 R1

Associativity: R1 (R2 R3) = (R1 R2) R3

Idempotence: ∏L2 (∏L1(R)) = ∏L2 (R) if L2 L1 C2(C1(R)) = C1C2(R))