chapter 5 notes. p. 189: sets, bags, and lists to understand the distinction between sets, bags, and...

40
Chapter 5 Notes

Upload: elmer-alexander

Post on 05-Jan-2016

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Chapter 5

Notes

Page 2: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

P. 189: Sets, Bags, and Lists

• To understand the distinction between sets, bags, and lists, remember that a set has unordered elements, and only one occurrence of each element.

• A bag allows more than one occurrence of an element, but the elements and their occurrences are unordered.

• A list allows more than one occurrence of an element, but the occurrences are ordered.

• Thus, {1, 2, 1} and {2, 1, 1} are the same bag, but (1, 2, 1) and (2, 1, 1) are not the same list.

Page 3: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

P. 206: Why bags?

• Commercial DBMS’s implement relations that are bags, rather than sets ….. Some relational operations are considerably more efficient if we use the bag model

– To take the union of two relations as bags, we simply copy one relation and add to the copy all the tuples of the other relation• There is no need to eliminate duplicate copies of a tuple that happens to

be in both relations.

– When we project relations as sets, we need to compare each projected type with all the other projected tuples, to make sure that each projection appears only once.• If we can accept a bag as a result, then we simply project each tuple and

add it to the result; no comparison with other projected tuples is necessary.

Page 4: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Union, Intersection, and Difference of Bags

• Suppose that R and S are bags, and that tuple t appears n times in R and m times in S.

• In the bag union R S, tuple t appears n + m times• In the bag intersection R S, the tuple t appears

min(n, m) times• In the bag difference R – S, tuple t appears max(0, n-

m) times

Page 5: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Projection and Selection on Bags

• Each tuple is processed independently during a projection– If the elimination of one or more attributes during the

projection causes the same tuple to be created from several tuples, these duplicate tuples are not eliminated from the result of the a bag-projection.

• To apply a selection to a bag, we apply the selection condition to each tuple independently.– Duplicate tuples are not eliminated in the result

Page 6: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Product and Joins of Bags

• The rule for the Cartesian product of bags is the expected one.

• Joining bags presents no surprises– Compare each tuple of one relation with each tuple of the

other, – decide whether or not this pair of tuples joins successfully,– and if so we put the resulting tuple in the answer. – Duplicate tuples are not eliminated in the answer.

Page 7: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Section 5.2: The Extended Algebra

δ = eliminate duplicates from bags.τ = sort tuples.γ = grouping and aggregation.Outerjoin : avoids “dangling tuples” = tuples that do not

join with anything.

Page 8: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Duplicate Elimination

• R1 := δ(R2).

• R1 consists of one copy of each tuple that appears in R2 one or more times.

Page 9: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Example: Duplicate Elimination

R = ( A B ) 1 2 3 4 1 2

δ(R) = A B1 23 4

Page 10: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

5.2.2: Aggregation Operators

• Aggregation operators are not operators of relational algebra.

• Rather, they apply to entire columns of a table and produce a single result.

• The most important examples: SUM, AVG, COUNT, MIN, and MAX.

Page 11: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Example: Aggregation

R = ( A B )1 33 43 2

SUM(A) = 7COUNT(A) = 3MAX(B) = 4AVG(B) = 3

Page 12: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

5.2.4: Grouping Operator

• R1 := γL (R2). L is a list of elements that are either:

1. Individual (grouping ) attributes.2. AGG(A ), where AGG is one of the aggregation operators

and A is an attribute.• An arrow and a new attribute name renames the

component.

Page 13: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Applying γL(R)

• Group R according to all the grouping attributes on list L.– That is: form one group for each distinct list of values for

those attributes in R.

• Within each group, compute AGG(A ) for each aggregation on list L.

• Result has one tuple for each group:1. The grouping attributes and2. Their group’s aggregations.

Page 14: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

14

Grouping/Aggregation

R = ( A B C )1 2 34 5 61 2 5

γA,B,AVG(C)->X (R) = ??

First, group R by A and B :A B C1 2 31 2 54 5 6

Then, average C within groups:

A B X1 2 44 5 6

Page 15: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

5.2.6: Sorting

• R1 := τL (R2).– L is a list of some of the attributes of R2.

• R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on.– Break ties arbitrarily.

• τ is the only operator whose result is neither a set nor a bag.

Page 16: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Example: Sorting

R = ( A B )1 23 45 2

τB (R) = [(5,2), (1,2), (3,4)]

Page 17: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

5.2.7: Outerjoin

• Suppose we join R ⋈C S.

• A tuple of R that has no tuple of S with which it joins is said to be dangling.– Similarly for a tuple of S.

• Outerjoin preserves dangling tuples by padding them NULL.

Page 18: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Outerjoin

R = ( A B ) S = ( B C )1 2 2 34 5 6 7

(1,2) joins with (2,3), but the other two tuplesare dangling.

R OUTERJOIN S = A B C1 2 34 5 NULLNULL 6 7

Page 19: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

5.3 A Logic for Relations

• The logical query language Datalog consists of if-then rules.

• Each of these rules expresses the idea that from certain combinations of tuples in certain relations, – we may infer that some other tuple must be in some other

relation, or in the answer to a query.

• If-then logical rules have been used in many systems.– Nonrecursive rules are equivalent to the core relational

algebra.– Recursive rules extend relational algebra and appear in SQL-99.

Page 20: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Integration example

• Goal: integrated view of the menus at many bars Sells(bar, beer, price).

• Joe has data JoeMenu(beer, price).

• Approach 1: Describe Sells in terms of JoeMenu and other local data sources.

Sells(’Joe’’s Bar’, b, p) <- JoeMenu(b, p)

Page 21: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Integration Example cont.

• Approach 2: Describe how JoeMenu can be used as a view to help answer queries about Sells and other relations.

JoeMenu(b, p) <- Sells(’Joe’’s Bar’, b, p)

Page 22: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

A Logical rule

• Our first example of a rule uses the relations Frequents(drinker, bar), Likes(drinker, beer), and Sells(bar, beer, price).

• The rule is a query asking for “happy” drinkers --- those that frequent a bar that serves a beer that they like.

Page 23: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Anatomy of a Rule

Happy(d) <- Frequents(d,bar) ANDLikes(d,beer) AND Sells(bar,beer,p)

Body = antecedent =AND of subgoals.

Head = consequent,a single subgoal

Read thissymbol “if”

Page 24: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Subgoals Are Atoms

• An atom is a predicate, or relation name with variables or constants as arguments.

• In essence, a predicate is the name of a function that returns a boolean value.– R(a, b, c) is true if (a, b, c) is a tuple of R.

• The head is an atom; the body is the AND of one or more atoms.

• Convention: Predicates begin with a capital, variables begin with lower-case.

Page 25: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Atom

Sells(bar, beer, p)

The predicate= name of arelation

Arguments arevariables (or constants).

Page 26: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Interpreting Rules

• A variable appearing in the head is distinguished ; otherwise it is nondistinguished.

• Rule meaning: The head is true for given values of the distinguished variables if there exist values of the nondistinguished variables that make all subgoals of the body true.

Page 27: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Interpretation

Happy(d) <- Frequents(d,bar) ANDLikes(d,beer) AND Sells(bar,beer,p)

Distinguishedvariable

Nondistinguishedvariables

Interpretation: drinker d is happy if there exist abar, a beer, and a price p such that d frequentsthe bar, likes the beer, and the bar sells the beerat price p.

Page 28: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Arithmetic Atoms or Subgoals

• In addition to relations as predicates, a predicate for a subgoal of the body (an atom) can be an arithmetic comparison.

• We write arithmetic atoms in the usual way, – e.g., x < y.

• The previously defined atoms are called relational atoms

Page 29: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Arithmetic Atoms

• A beer is “cheap” if there are at least two bars that sell it for under $2.

Cheap(beer) <- Sells(bar1,beer,p1) ANDSells(bar2,beer,p2) AND p1 < 2.00AND p2 < 2.00 AND bar1 <> bar2

Page 30: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Negated Subgoals

• NOT in front of a subgoal negates its meaning.

• Example: Think of Arc(a,b) as arcs in a graph.– S(x,y) says the graph is not transitive from x to y ; i.e.,

there is a path of length 2 from x to y, but no arc from x to y.

S(x,y) <- Arc(x,z) AND Arc(z,y)AND NOT Arc(x,y)

Page 31: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Datalog Rules and Queries

• Applying a rule• consider all combinations of values of the variables.• If all subgoals are true, then evaluate the head.• The resulting head is a tuple in the result.

Happy(d) <- Frequents(d,bar) ANDLikes(d,beer) AND Sells(bar,beer,p)

FOR (each d, bar, beer, p)IF (Frequents(d,bar), Likes(d,beer), and Sells(bar,beer,p) are all true)

add Happy(d) to the result• Note: set semantics so add only once

Page 32: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Applying a Rule 2

• For each subgoal, consider all tuples that make the subgoal true.

• If a selection of tuples define a single value for each variable, then add the head to the result.

• Leads to finite search for P(x)<-Q(x), but P(x)<-Q(y) is problematic.– We want rule evaluations to be finite and lead to finite

results.– “Unsafe” rules like P(x)<-Q(y) have infinite results, even if

Q is finite.

Page 33: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Rule Evaluation 2

Happy(d) <- Frequents(d,bar) ANDLikes(d,beer) AND Sells(bar,beer,p)

FOR (each f in Frequents, i in Likes, ands in Sells)IF (f[1]=i[1] and f[2]=s[1] and i[2]=s[2])

add Happy(f[1]) to the result

Page 34: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Safe Rules

• A rule is safe if:1. Each distinguished variable,2. Each variable in an arithmetic subgoal, and3. Each variable in a negated subgoal,

also appears in a nonnegated,relational subgoal.

• Safe rules prevent infinite results.

Page 35: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Unsafe Rules

• Each of the following is unsafe and not allowed:1. S(x) <- R(y)2. S(x) <- R(y) AND NOT R(x)3. S(x) <- R(y) AND x < y

• In each case, an infinity of x ’s can satisfy the rule, even if R is a finite relation.

Page 36: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Safe Rules Advantage

• We can use “approach 2” to evaluation, where we select tuples from only the nonnegated, relational subgoals.

• The head, negated relational subgoals, and arithmetic subgoals thus have all their variables defined and can be evaluated.

Page 37: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

5.4 Datalog Programs

• Datalog program = collection of rules.• In a program, predicates can be either

1. EDB = Extensional Database = stored table.2. IDB = Intensional Database = relation defined by rules.

• Never both! No EDB in heads

Page 38: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Evaluating Datalog Programs

• As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated.

• If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

Page 39: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Example Datalog Program

• Using EDB Sells(bar, beer, price) and Beers(name, manf), find the manufacturers of beers Joe doesn’t sell.

JoeSells(b) <- Sells(’Joe’’s Bar’, b, p)Answer(m) <- Beers(b,m)

AND NOT JoeSells(b)

• Step 1: Examine all Sells tuples with first component ’Joe’’s Bar’.– Add the second component to JoeSells.

• Step 2: Examine all Beers tuples (b,m).– If b is not in JoeSells, add m to Answer.

Page 40: Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,

Relational Algebra and Datalog

• Without recursion, Datalog can express all and only the queries of core relational algebra.– The same as SQL select-from-where, without aggregation

and grouping.

• But with recursion, Datalog can express more than these languages.

• Yet still not Turing-complete.