database systems i relational algebra
DATABASE SYSTEMS I
RELATIONAL ALGEBRA
RELATIONAL QUERY LANGUAGES
Query languages (QL): Allow manipulation and retrieval of data from a database.
Relational model supports simple, powerful query languages:
  Strong formal foundation based on logic.
  High level, abstract formulation of queries.
  Easy to program.
  Allows the DBS to do much optimization.
    DBS can choose, e.g., the most efficient sorting algorithm or the order of basic operations.
RELATIONAL QUERY LANGUAGES
Query languages != programming languages!
QLs are not expected to be “Turing complete”.
  Turing complete: able to answer any computable problem given enough storage space and time.
QLs are not intended to be used for complex calculations.
  Applications on top of the database can do that.
QLs support easy, efficient access to large data sets.
E.g., in a QL you cannot
  determine whether the number of tuples of a table is even or odd,
  create a visualization of the results of a query,
  ask the user for additional input.
FORMAL QUERY LANGUAGES
Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation:
  Relational Algebra (RA)
    More procedural, very useful for representing execution plans, relatively close to SQL.
    Composed of a collection of operators.
    A step-by-step procedure for computing the answer.
  Relational Calculus (RC)
    Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)
    Describes the answer, not the steps.
Understanding these formal query languages is important for understanding SQL and query processing.
PRELIMINARIES
A query is applied to relation instances, and the result of a query is also a relation instance.
  Inputs and outputs of queries are relations.
  A query is evaluated on instances of the input relations.
    A different instance (database state) as input can give a different answer.
Schemas of input relations for a query are fixed (but the query will run regardless of instance!).
The schema for the result of a given query is also fixed!
  Determined by the definition of the input relations and the query language constructs.
PRELIMINARIES
Positional vs. named-attribute notation:
  Positional notation
    Ex: function(1,2,3)
    easier for formal definitions
  Named-attribute notation
    Ex: function(var1=1, var2=2, var3=3)
    more readable
Advantages/disadvantages of one over the other?
EXAMPLE INSTANCES
“Sailors” and “Reserves” relations for our examples.
  2 instances of “Sailors”: S1 & S2
We’ll use positional or named-attribute notation.
Assume that names of attributes in query results are ‘inherited’ from names of attributes in query input relations.

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
RELATIONAL ALGEBRA
RELATIONAL ALGEBRA
An algebra consists of operators and operands.
Operands can be either variables or constants.
In the algebra of arithmetic:
  Atomic operands are variables such as x or y and constants such as 15.
  Operators are the usual arithmetic operators such as +, -, *.
Expressions are formed by applying operators to atomic operands or other expressions. For example,
15x + 15
(x + 15) * y
RELATIONAL ALGEBRA
Algebraic expressions can be re-ordered according to commutativity or associativity laws without changing their resulting value. E.g.,
  15 + 20 = 20 + 15
  (x * y) * z = x * (y * z)
Parentheses group operators and define precedence of operators, e.g.
  (x + 15) * y
  x + (15 * y)
A relational algebra expression describes a step-by-step procedure for computing the query, just like an algebraic expression in math.
RELATIONAL ALGEBRA
In relational algebra, operands are relations / tables, and an expression evaluates to a relation / set of tuples.
The relational algebra operators are
  set operations,
  operations removing rows (selection) or columns (projection) from a relation,
  operations combining two relations into a new one (Cartesian product, join),
  a renaming operation, which changes the name of the relation or of its attributes.
RELATIONAL ALGEBRA
Every operator takes one or two relation instances.
A relational algebra expression is recursively defined to be a relation.
  A combination of relations is a relation.
  Result is also a relation.
Can apply an operator to
  a relation from the database,
  a relation that is the result of another operator.
RELATIONAL ALGEBRA OPERATIONS
Basic operations
  Selection ($\sigma$): Selects a subset of rows from relation.
  Projection ($\pi$): Deletes unwanted columns from relation.
  Cartesian product ($\times$): Combines two relations.
  Set-difference ($-$): Tuples in relation 1, but not in relation 2.
  Union ($\cup$): Tuples in relation 1 or in relation 2.
RELATIONAL ALGEBRA OPERATIONS
Additional operations:
  Intersection, join, division.
  Renaming of relations / attributes.
  Not essential, can be implemented using the five basic operations.
  But (very!) useful.
Since each operation returns a relation, operations can be composed, i.e. output of one operation can be input of the next operation.
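As a small illustration of the claim that the additional operations are not essential, intersection can be written with set-difference alone: $R \cap S = R - (R - S)$. The sketch below models relations as Python sets of tuples. That representation is an assumption made here for illustration and is not part of the slides.

```python
# Relations modelled as sets of tuples (sid, sname, rating, age),
# using the S1 and S2 example instances.
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}


def intersection_via_difference(r, s):
    """R intersect S, expressed with the basic set-difference operator only."""
    return r - (r - s)


derived = intersection_via_difference(S1, S2)
```

The derived relation can be compared against Python's built-in `S1 & S2` to check the identity on these instances.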
RENAMING
Renames relations / attributes, without changing the relation instance.
  $\rho_{S(A_1, A_2, \ldots, A_n)}(R)$
    Relation R is renamed to S, attributes are renamed to A1, ..., An.
  $\rho_{S(1 \to A_1, \ldots, k \to A_k)}(R)$
    Rename only some attributes, using the positional notation to reference attributes.
  $\rho_{S}(R)$
    No renaming of attributes, just the relation.
PROJECTION
One input relation.
Deletes attributes that are not in the projection list.
Schema of result contains exactly the attributes in the projection list, with the same names that they had in the (only) input relation.
Projection operation has to eliminate duplicates, since relations are sets.
  Duplicate elimination is expensive.
  Commercial DBMSs typically don’t do duplicate elimination unless the user explicitly asks for it.
PROJECTION
Similar in concept to VIEWs.

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\pi_{sname, rating}(S2)$
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

$\pi_{age}(S2)$
age
35.0
55.5

Other fields are projected out.
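The projection example can be sketched in Python as follows. The helper name `project` and the list-of-tuples representation are assumptions for illustration. Returning a set is what performs the duplicate elimination, which is why the projection on age yields two tuples rather than four.

```python
# S2 as a list of rows in positional order (sid, sname, rating, age).
S2 = [(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
SCHEMA = ("sid", "sname", "rating", "age")


def project(rows, schema, attrs):
    """Keep only the listed attributes; building a set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


names_ratings = project(S2, SCHEMA, ["sname", "rating"])
ages = project(S2, SCHEMA, ["age"])
```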
SELECTION
One input relation.
Selects all tuples that satisfy the selection condition.
No duplicates in result!
Schema of result identical to schema of (only) input relation.
Selection conditions:
  simple conditions comparing attribute values (variables) and / or constants, or
  complex conditions that combine simple conditions using logical connectives AND and OR.
SELECTION

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\sigma_{rating > 8}(S2)$
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0
SELECTION

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\sigma_{rating > 8}(S2)$
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

$\pi_{sname, rating}(\sigma_{rating > 8}(S2))$
sname  rating
yuppy  9
rusty  10

Since the result of an expression is another relation, we can substitute an expression wherever a relation is expected.
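Composition of operators can be sketched in Python: the output of a selection is passed directly to a projection. The helper names `select` and `project` are assumptions for illustration, and the condition rating > 8 is taken from the example result above.

```python
S2 = [(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
SCHEMA = ("sid", "sname", "rating", "age")
RATING = SCHEMA.index("rating")


def select(rows, predicate):
    """Keep the rows that satisfy the selection condition."""
    return [row for row in rows if predicate(row)]


def project(rows, schema, attrs):
    """Keep only the listed attributes; building a set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


# The selection result is itself a relation, so it can be the projection's input.
high = select(S2, lambda row: row[RATING] > 8)
result = project(high, SCHEMA, ["sname", "rating"])
```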
UNION, INTERSECTION, SET-DIFFERENCE
All of these set operations take two input relations, which must be union-compatible:
  Same sets of attributes.
  Corresponding attributes have same type.
What is the schema of result?

$S1 \cup S2$
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

$S1 \cap S2$
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0

$S1 - S2$
sid  sname   rating  age
22   dustin  7       45.0
UNION
$S1 \cup S2$
Combines S1 and S2.
  Result contains ALL tuples that occur in either S1 or S2; a tuple that occurs in both appears only once.
Schemas must be union-compatible:
  They have the same number of fields.
  Corresponding fields have the same domains.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$S1 \cup S2$
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

SELECT * FROM S1
UNION
SELECT * FROM S2
INTERSECTION
$S1 \cap S2$
Result contains ALL tuples that occur in both S1 and S2.
Schemas must be identical.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

SELECT * FROM S1
INTERSECT
SELECT * FROM S2
SET-DIFFERENCE
$S1 - S2$
Result contains ALL tuples that occur in S1 but not in S2.
Schemas must be identical.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

SELECT * FROM S1
MINUS
SELECT * FROM S2
(MINUS is the Oracle keyword; standard SQL uses EXCEPT.)
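The three set operations map directly onto Python's set operators when relations are modelled as sets of tuples. This representation is an assumption for illustration; the instances are S1 and S2 from the examples.

```python
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}

union = S1 | S2          # tuples in S1 or in S2, each listed once
intersection = S1 & S2   # tuples in both S1 and S2
difference = S1 - S2     # tuples in S1 but not in S2
```

Because both inputs have the same schema, every result has that schema too.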
CARTESIAN PRODUCT
Also referred to as cross-product or product.
Two input relations.
Each tuple of the one relation is paired with each tuple of the other relation.
Result schema has one attribute per attribute of both input relations, with attribute names ‘inherited’ if possible.
In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid. What to do??
CARTESIAN PRODUCT
$S1 \times S2$
Result contains all fields of S1 followed by all fields of S2.
Contains one tuple <s1,s2> for each pair of tuples s1 in S1 and s2 in S2.
Schemas can be different.
If S1 and S2 contain fields of the same name, there is a naming conflict.
  Rename one or both fields.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

SELECT * FROM S1, S2
CARTESIAN PRODUCT
Field names in conflict become unnamed.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

$S1 \times R1$
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
RENAMING
In the result, there may be two attributes with the same name, e.g. both S1 and R1 have an attribute called sid.
Then, apply the renaming operation, e.g.
  $\rho_{(1 \to sid1,\ 5 \to sid2)}(S1 \times R1)$
The output relation has the same fields, but some are renamed:
  Field 1 is renamed to sid1, field 5 is renamed to sid2.
For an expression to be well defined, no two fields may have the same name.
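A minimal Python sketch of the product followed by positional renaming. The helpers `cross` and `rename_positions` are hypothetical names introduced here, and positions are counted from 1 as in the positional notation.

```python
from itertools import product

S1_SCHEMA = ("sid", "sname", "rating", "age")
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1_SCHEMA = ("sid", "bid", "day")
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]


def cross(r, s):
    """Pair every tuple of r with every tuple of s."""
    return [a + b for a, b in product(r, s)]


def rename_positions(schema, mapping):
    """Rename fields by 1-based position, e.g. {1: "sid1", 5: "sid2"}."""
    return tuple(mapping.get(i, name) for i, name in enumerate(schema, start=1))


rows = cross(S1, R1)
schema = rename_positions(S1_SCHEMA + R1_SCHEMA, {1: "sid1", 5: "sid2"})
```

After renaming, all field names in `schema` are distinct, so later operators can refer to either sid column by name.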
JOIN
Similar to Cartesian product with same result schema.
  Cartesian product has no conditions.
  Join has conditions.
Join can be defined as a Cartesian product, followed by selections and projections.
  A join, however, can be evaluated much more efficiently. Why?

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96
JOIN
Condition Join
  $R \bowtie_c S = \sigma_c(R \times S)$
Each tuple of the one relation is paired with each tuple of the other relation if the two tuples satisfy the join condition.
Condition c refers to attributes of both R and S.

Example: $S1 \bowtie_{S1.sid < R1.sid} R1$
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96
JOIN
Equi-Join: A special case of condition join where the condition c contains only equalities.
  Result schema similar to Cartesian product, but only one copy of attributes for which equality is specified.
    R1.sid is redundant.
    R1.sid is dropped by applying a projection.

  Example: $S1 \bowtie_{sid} R1$
  sid  sname   rating  age   bid  day
  22   dustin  7       45.0  101  10/10/96
  58   rusty   10      35.0  103  11/12/96

Natural Join: Equi-join on all common attributes.
  Example: $S1 \bowtie R1$
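Both join variants can be sketched in Python as a filtered nested loop, which mirrors the definition of a join as a selection over the Cartesian product. The helper name `theta_join` is an assumption for illustration; real systems use more efficient join algorithms.

```python
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]


def theta_join(r, s, condition):
    """Condition join: keep the pairs of the product that satisfy the condition."""
    return [a + b for a in r for b in s if condition(a, b)]


# Condition join with S1.sid < R1.sid (sid is position 0 in both relations).
cond_join = theta_join(S1, R1, lambda a, b: a[0] < b[0])

# Equi-join on sid; b[1:] drops the redundant copy of R1.sid.
equi_join = [a + b[1:] for a in S1 for b in R1 if a[0] == b[0]]
```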
DIVISION
Not supported as a primitive operation, but useful for expressing queries like: Find sailors who have reserved all boats.
Let A have 2 attributes, x and y; B have only attribute y:
  $A/B = \{ \langle x \rangle \mid \forall \langle y \rangle \in B : \langle x, y \rangle \in A \}$
  i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple (reservation) in A.
In general, x and y can be any lists of attributes; y is the list of attributes in B, and $x \cup y$ is the list of attributes of A.
DIVISION

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1
pno
p2

B2
pno
p2
p4

B3
pno
p1
p2
p4

A/B1
sno
s1
s2
s3
s4

A/B2
sno
s1
s4

A/B3
sno
s1
DIVISION
Division is not an essential operation; can be implemented using the five basic operations.
  Also true of joins, but joins are so common that systems implement joins specially.
Idea: For A/B, compute all x values in A that are not ‘disqualified’ by some y value in B.
  An x value in A is disqualified if by attaching a y value from B, we obtain an xy tuple that is not in A.
Disqualified x values:
  $\pi_x((\pi_x(A) \times B) - A)$
A/B:
  $\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$, i.e. $\pi_x(A)$ minus all disqualified x values
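The construction of A/B from projection, product and set-difference can be checked in Python on the instances A, B1, B2 and B3 of the division example. This is a sketch that assumes relations are sets of tuples; `divide` is a name introduced here.

```python
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"),
     ("s4", "p2"), ("s4", "p4")}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}


def divide(a, b):
    """A/B built from projection, product and set-difference only."""
    xs = {(x,) for (x, _) in a}                         # projection of A on x
    candidates = {(x, y) for (x,) in xs for (y,) in b}  # product with B
    disqualified = {(x,) for (x, _) in candidates - a}  # x values missing some y
    return xs - disqualified


quotients = [divide(A, B1), divide(A, B2), divide(A, B3)]
```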
DIVISION
Division is not an essential operation; can be implemented using the five basic operations.
In SQL, e.g., to find the students who have taken all courses:
  SELECT name FROM Student WHERE NOT EXISTS (
    SELECT CID FROM Course WHERE CID NOT IN (
      SELECT CID FROM Register WHERE Student.SID = Register.SID))
The middle query (SELECT CID FROM Course WHERE CID NOT IN ...) returns those courses this student didn’t take.
If such courses don’t exist (NOT EXISTS), then the student must’ve taken all courses.
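The query can be tried out with Python's built-in `sqlite3` module. The table layouts Student(SID, name), Course(CID) and Register(SID, CID) are inferred from the query, and the sample rows are hypothetical data invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (SID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Course (CID INTEGER PRIMARY KEY);
    CREATE TABLE Register (SID INTEGER, CID INTEGER);
    -- Hypothetical sample data, not from the slides.
    INSERT INTO Student VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO Course VALUES (10), (20);
    INSERT INTO Register VALUES (1, 10), (1, 20), (2, 10);
""")

# Division in SQL: students for whom no course exists that they did not take.
query = """
    SELECT name FROM Student WHERE NOT EXISTS (
        SELECT CID FROM Course WHERE CID NOT IN (
            SELECT CID FROM Register WHERE Student.SID = Register.SID))
"""
took_all = [name for (name,) in conn.execute(query)]
conn.close()
```

With this data only 'ann' is registered for every course, so she is the only student returned.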
EXAMPLE QUERIES
Find names of sailors who’ve reserved boat #103.
Solution 1:
  $\pi_{sname}((\sigma_{bid=103}(Reserves)) \bowtie Sailors)$
Solution 2:
  $\rho_{Temp1}(\sigma_{bid=103}(Reserves))$
  $\rho_{Temp2}(Temp1 \bowtie Sailors)$
  $\pi_{sname}(Temp2)$
Solution 3:
  $\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
Which is most efficient? Why?
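Solutions 1 and 3 can be compared in a small Python sketch. It uses the R1 and S1 instances as Reserves and Sailors, which is an assumption for illustration; `natural_join_on_sid` is a hypothetical helper. Both plans return the same names, but selecting before joining produces a smaller intermediate result.

```python
RESERVES = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]       # (sid, bid, day)
SAILORS = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
           (58, "rusty", 10, 35.0)]                             # (sid, sname, rating, age)


def natural_join_on_sid(reserves, sailors):
    """Join on the common attribute sid; rows are (sid, bid, day, sname, rating, age)."""
    return [r + s[1:] for r in reserves for s in sailors if r[0] == s[0]]


# Solution 1: select bid = 103 first, then join, then project on sname.
selected = [r for r in RESERVES if r[1] == 103]
joined_small = natural_join_on_sid(selected, SAILORS)
solution1 = {row[3] for row in joined_small}

# Solution 3: join first, then select bid = 103, then project on sname.
joined_all = natural_join_on_sid(RESERVES, SAILORS)
solution3 = {row[3] for row in joined_all if row[1] == 103}
```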
EXAMPLE QUERIES
Find names of sailors who’ve reserved a red boat.
Information about boat color only available in Boats; so need an extra join:
  $\pi_{sname}((\sigma_{color='red'}(Boats)) \bowtie Reserves \bowtie Sailors)$
A more efficient solution:
  $\pi_{sname}(\pi_{sid}((\pi_{bid}(\sigma_{color='red'}(Boats))) \bowtie Reserves) \bowtie Sailors)$
A query optimizer can find the second solution given the first one.
EXAMPLE QUERIES
Find sailors who’ve reserved a red or a green boat.
Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
  $\rho_{Tempboats}(\sigma_{color='red' \ OR \ color='green'}(Boats))$
  $\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$
Can also define Tempboats using union! (How?)
What happens if OR is replaced by AND in this query?
EXAMPLE QUERIES
Find sailors who’ve reserved a red and a green boat.
Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
  $\rho_{Tempred}(\pi_{sid}((\sigma_{color='red'}(Boats)) \bowtie Reserves))$
  $\rho_{Tempgreen}(\pi_{sid}((\sigma_{color='green'}(Boats)) \bowtie Reserves))$
  $\pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors)$
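The intersection-based plan can be sketched in Python. The Boats and Reserves rows below are hypothetical data invented for this illustration, since no Boats instance is given; only the sailor names are borrowed from S1. The last line also shows why a single selection with AND in place of OR does not work: no single boat is both red and green.

```python
# Hypothetical instances, invented for illustration only.
BOATS = {(101, "red"), (102, "green"), (103, "green")}        # (bid, color)
RESERVES = {(22, 101), (22, 102), (31, 101), (58, 103)}       # (sid, bid)
SAILORS = {(22, "dustin"), (31, "lubber"), (58, "rusty")}     # (sid, sname)


def sids_reserving(color):
    """sids of sailors who reserved at least one boat of the given color."""
    bids = {bid for (bid, c) in BOATS if c == color}
    return {sid for (sid, bid) in RESERVES if bid in bids}


tempred = sids_reserving("red")
tempgreen = sids_reserving("green")
red_and_green = {name for (sid, name) in SAILORS if sid in tempred & tempgreen}

# Selecting boats with color = 'red' AND color = 'green' yields no boat at all.
both_colors_boats = {bid for (bid, c) in BOATS if c == "red" and c == "green"}
```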
EXAMPLE QUERIES
Find the names of sailors who’ve reserved all boats.
Uses division; schemas of the input relations must be carefully chosen:
  $\rho_{Tempsids}((\pi_{sid, bid}(Reserves)) / (\pi_{bid}(Boats)))$
  $\pi_{sname}(Tempsids \bowtie Sailors)$
To find sailors who’ve reserved all ‘Interlake’ boats, divide instead by:
  $\ldots / \pi_{bid}(\sigma_{bname='Interlake'}(Boats))$
Book has lots of examples.
SUMMARY
The relational model has formal query languages that are easy to use and allow efficient optimization by the DBS.
Relational algebra (RA) is more procedural; used as internal representation for SQL query evaluation plans.
Five basic RA operations: selection, projection, Cartesian product, union, set-difference.
Additional operations as shorthand for important cases: intersection, join, division.
  These operations can be implemented using the basic operations.