1
Database Systems Lecture #7
Yan Pan
School of Software, SYSU
2011
2
Agenda: Subqueries, etc.
3
Relational operators
Basic operators:
Selection: $\sigma$
Projection: $\pi$
Cartesian product: $\times$
Other set-theoretic ops:
Union: $\cup$
Intersection: $\cap$
Difference: $-$
Additional operators: joins (natural, equijoin, theta join, semijoin), renaming ($\rho$), grouping ($\gamma$)…
4
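New-style join types
Cross joins (simplest):
FROM A CROSS JOIN B
Inner joins (regular joins):
FROM A [INNER] JOIN B ON …
Natural join:
FROM A NATURAL JOIN B
Joins on the common fields and merges each pair of common fields into a single column.
These joins keep no dangling rows: a row with no matching partner in the other table is dropped. Outer joins (later) keep them.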
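A minimal sketch of the three join styles above, run through Python's built-in sqlite3 module as a stand-in DBMS. The tables A(x, y) and B(y, z) and their rows are hypothetical. Note how the unmatched rows (3, 30) and (40, 400) disappear from the inner and natural joins.

```python
import sqlite3

# Hypothetical tables A(x, y) and B(y, z); they share the column y.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (x INTEGER, y INTEGER);
CREATE TABLE B (y INTEGER, z INTEGER);
INSERT INTO A VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO B VALUES (10, 100), (20, 200), (40, 400);
""")

# CROSS JOIN: every pairing, 3 x 3 = 9 rows.
cross = conn.execute("SELECT * FROM A CROSS JOIN B").fetchall()

# INNER JOIN: only pairs satisfying the ON condition; y appears twice.
inner = conn.execute(
    "SELECT * FROM A INNER JOIN B ON A.y = B.y ORDER BY A.x").fetchall()

# NATURAL JOIN: joins on the common column y and keeps one copy of it.
natural = conn.execute(
    "SELECT x, y, z FROM A NATURAL JOIN B ORDER BY x").fetchall()
natural_width = len(conn.execute("SELECT * FROM A NATURAL JOIN B").description)

print(len(cross))     # 9
print(inner)          # [(1, 10, 10, 100), (2, 20, 20, 200)]
print(natural)        # [(1, 10, 100), (2, 20, 200)]
print(natural_width)  # 3
```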
5
SQL e.g. with tuple vars: Reps(ssn, name, etc.), Clients(ssn, name, rssn)
Q: Who are George's clients, in SQL?
Conceptually: $\pi_{\text{Clients.name}}(\sigma_{\text{Reps.name}=\text{'George'} \wedge \text{Reps.ssn}=\text{rssn}}(\text{Reps} \times \text{Clients}))$
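A sketch of the same query written with tuple variables. It assumes tiny hypothetical Reps and Clients tables and uses Python's sqlite3 module only as a convenient engine. R and C act as the tuple variables: the WHERE clause does the selection and the SELECT list does the projection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reps (ssn TEXT, name TEXT);
CREATE TABLE Clients (ssn TEXT, name TEXT, rssn TEXT);
INSERT INTO Reps VALUES ('r1', 'George'), ('r2', 'Helen');
INSERT INTO Clients VALUES
  ('c1', 'Ann', 'r1'), ('c2', 'Bob', 'r2'), ('c3', 'Cy', 'r1');
""")

# R and C are tuple variables ranging over Reps and Clients.
query = """
SELECT C.name
FROM Reps R, Clients C
WHERE R.name = 'George' AND R.ssn = C.rssn
ORDER BY C.name
"""
clients = [name for (name,) in conn.execute(query)]
print(clients)  # ['Ann', 'Cy']
```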
6
New topic: Subqueries
Powerful feature of SQL: one clause can contain other SQL queries, anywhere a value or relation is allowed.
Several ways:
A subquery returning a single constant (scalar) in SELECT
A subquery returning a single constant (scalar) in WHERE
A subquery returning a relation in WHERE
A subquery returning a relation in FROM
7
Standard multi-table example: Purchase(prodname, buyerssn, etc.), Person(name, ssn, etc.)
Q: What did Christo buy?
As usual, we AND together an equality that pairs each purchase with its buyer's Person row (buyerssn = ssn) and the condition that identifies Christo:
SELECT Purchase.prodname
FROM Purchase, Person
WHERE buyerssn = ssn AND name = 'Christo'
8
Subquery motivation: Purchase(prodname, buyerssn, etc.), Person(name, ssn, etc.)
Q: What did Christo buy?
Natural intuition: first find Christo's ssn, then find his purchases.
SELECT ssn
FROM Person
WHERE name = 'Christo'
SELECT Purchase.prodname
FROM Purchase
WHERE buyerssn = Christo's-ssn
9
Subqueries
Subquery: copy in Christo's selection for his ssn:
SELECT Purchase.prodname
FROM Purchase
WHERE buyerssn = (SELECT ssn FROM Person WHERE name = 'Christo')
The subquery returns one value, so the = is valid. If it returns more than one row, we get a run-time error. If it returns no rows, its value is NULL, the comparison is UNKNOWN, and no purchases are selected.
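A runnable sketch of the scalar subquery, using Python's sqlite3 module and hypothetical rows. One dialect caveat: SQLite does not raise the run-time error for a multi-row scalar subquery. It silently uses the first row, so that error is standard SQL behavior that this sketch cannot show. The zero-row case behaves as described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, ssn TEXT);
CREATE TABLE Purchase (prodname TEXT, buyerssn TEXT);
INSERT INTO Person VALUES ('Christo', '111'), ('Martha', '222');
INSERT INTO Purchase VALUES
  ('fabric', '111'), ('rope', '111'), ('gizmo', '222');
""")

def bought_by(name):
    # The parenthesized SELECT is a scalar subquery: one row, one column.
    sql = """
    SELECT prodname
    FROM Purchase
    WHERE buyerssn = (SELECT ssn FROM Person WHERE name = ?)
    ORDER BY prodname
    """
    return [p for (p,) in conn.execute(sql, (name,))]

print(bought_by("Christo"))  # ['fabric', 'rope']
# No such person: the subquery yields NULL, buyerssn = NULL is UNKNOWN,
# so no rows come back (and no error is raised).
print(bought_by("Nobody"))   # []
```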
10
Operators on subqueries
Several new operators apply to (unary) selections:
1. IN R
2. EXISTS R
3. UNIQUE R
4. s > ALL R
5. s > ANY R
6. x IN R
> is just an example op: any comparison (=, <>, <, <=, >=) can be used with ALL and ANY. Each expression can be negated with NOT.
Guifeng Zheng, DBMS, SS/SYSU 11
Subqueries with IN: Product(prodname, maker), Person(name, ssn), Purchase(buyerssn, product)
Q: Find companies Martha bought products from
Strategy:
1. Find Martha's ssn
2. Find products listed with that ssn as buyer
3. Find company names of those products
SELECT DISTINCT Product.maker
FROM Product
WHERE prodname IN (SELECT product
                   FROM Purchase
                   WHERE buyerssn = (SELECT ssn
                                     FROM Person
                                     WHERE name = 'Martha'))
12
Subqueries returning relations
Equivalent to:
SELECT DISTINCT Product.maker
FROM Product, Purchase, Person
WHERE prodname = product AND buyerssn = ssn AND name = 'Martha'
Or:
SELECT DISTINCT Product.maker
FROM Product JOIN Purchase ON prodname = product JOIN Person ON buyerssn = ssn
WHERE name = 'Martha'
13
FROM subqueries
Motivation for another way: suppose we're given Martha's purchases. Then we could just cross them with Product and keep the matching pairs to get the product makers.
Substitute a (named) subquery for Martha's purchases:
SELECT maker
FROM Product, (SELECT product
               FROM Purchase
               WHERE buyerssn = (SELECT ssn FROM Person WHERE name = 'Martha')) Marthas
WHERE Product.prodname = Marthas.product
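A sketch on hypothetical data, run in Python's sqlite3 module, checking that the IN version, the join version, and the FROM-subquery version find the same makers. The FROM version is run as written, without DISTINCT, so the maker appears once for each of Martha's matching purchases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (prodname TEXT, maker TEXT);
CREATE TABLE Person (name TEXT, ssn TEXT);
CREATE TABLE Purchase (buyerssn TEXT, product TEXT);
INSERT INTO Product VALUES
  ('gizmo', 'GizmoWorks'), ('widget', 'GizmoWorks'), ('camera', 'Canon');
INSERT INTO Person VALUES ('Martha', '222'), ('Christo', '111');
INSERT INTO Purchase VALUES
  ('222', 'gizmo'), ('222', 'widget'), ('111', 'camera');
""")

with_in = """
SELECT DISTINCT Product.maker FROM Product
WHERE prodname IN (SELECT product FROM Purchase
                   WHERE buyerssn = (SELECT ssn FROM Person
                                     WHERE name = 'Martha'))"""
with_join = """
SELECT DISTINCT Product.maker
FROM Product JOIN Purchase ON prodname = product
             JOIN Person ON buyerssn = ssn
WHERE name = 'Martha'"""
with_from = """
SELECT maker
FROM Product, (SELECT product FROM Purchase
               WHERE buyerssn = (SELECT ssn FROM Person
                                 WHERE name = 'Martha')) Marthas
WHERE Product.prodname = Marthas.product"""

in_rows = conn.execute(with_in).fetchall()
join_rows = conn.execute(with_join).fetchall()
from_rows = conn.execute(with_from).fetchall()
print(in_rows)    # [('GizmoWorks',)]
print(join_rows)  # [('GizmoWorks',)]
print(from_rows)  # [('GizmoWorks',), ('GizmoWorks',)]  (no DISTINCT)
```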
14
Complex RA Expressions
Scenario:
1. Purchase(pid, seller-ssn, buyer-ssn, etc.)
2. Person(ssn, name, etc.)
3. Product(pid, name, etc.)
Q: Who (give names) bought gizmos from Dick?
Where to start? Purchase uses pid and ssn, so we must get them…
15
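Complex RA Expressions
(Figure: expression tree for the query.) Written as a single expression:
$$\pi_{\text{name}}\Big(\text{Person} \bowtie_{\text{buyer-ssn} = \text{Person.ssn}} \big(\big(\text{Purchase} \bowtie_{\text{seller-ssn} = \text{ssn}} \pi_{\text{ssn}}(\sigma_{\text{name='Dick'}}(\text{Person}))\big) \bowtie_{\text{pid} = \text{pid}} \pi_{\text{pid}}(\sigma_{\text{name='Gizmo'}}(\text{Product}))\big)\Big)$$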
16
Translation to SQL
We're converting the tree on the last slide into SQL. The result of the query should be the names projected at the top of the tree. One step at a time, we'll make the query more complete, until we've translated the English-language description into an actual SQL query. We'll also simplify the query when possible.
(the names of the people who bought gizmos from Dick)
17
Translation to SQL
Plain SQL text = actual SQL; English in parentheses = description of a subquery still to be written.
Note: the subquery consists of purchase records with the info describing the buyers attached. In the results, the column header for name will be 'buyer'.
SELECT DISTINCT name buyer FROM
(the info, along with buyer names, for purchases of gizmos sold by Dick)
18
Translation to SQL
Note: the subquery in this version is given the name P2. We're pairing our rows from Person with rows from P2.
SELECT DISTINCT name buyer FROM
(SELECT *
 FROM Person, (the purchases of gizmos from Dick) P2
 WHERE Person.ssn = P2."buyer-ssn")
19
Translation to SQL
We simplified by combining the two SELECTs:
SELECT DISTINCT name buyer
FROM Person, (the purchases of gizmos from Dick) P2
WHERE Person.ssn = P2."buyer-ssn"
20
Translation to SQL
P2 is still the name of the subquery. It has just been filled in with a query that contains two subqueries of its own.
SELECT DISTINCT name buyer
FROM Person, (SELECT *
              FROM Purchase
              WHERE "seller-ssn" = (Dick's ssn)
                AND pid = (the id of Gizmo)) P2
WHERE Person.ssn = P2."buyer-ssn"
21
Translation to SQL
Now the subquery to find Dick's ssn is filled in:
SELECT DISTINCT name buyer
FROM Person, (SELECT *
              FROM Purchase
              WHERE "seller-ssn" = (SELECT ssn FROM Person WHERE name = 'Dick')
                AND pid = (the id of Gizmo)) P2
WHERE Person.ssn = P2."buyer-ssn"
22
Translation to SQL
And now the subquery to find Gizmo's product id is filled in, too. Note: using subqueries simplified the SQL; subqueries are not used in relational algebra. Hyphenated column names such as seller-ssn must be written in double quotes in SQL; otherwise they parse as a subtraction.
SELECT DISTINCT name buyer
FROM Person, (SELECT *
              FROM Purchase
              WHERE "seller-ssn" = (SELECT ssn FROM Person WHERE name = 'Dick')
                AND pid = (SELECT pid FROM Product WHERE name = 'Gizmo')) P2
WHERE Person.ssn = P2."buyer-ssn"
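A sketch that runs the finished query against a few hypothetical rows in Python's sqlite3 module. The hyphenated columns are created with double-quoted names, so they exist exactly as written in the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ssn TEXT, name TEXT);
CREATE TABLE Product (pid INTEGER, name TEXT);
CREATE TABLE Purchase (pid INTEGER, "seller-ssn" TEXT, "buyer-ssn" TEXT);
INSERT INTO Person VALUES ('1', 'Dick'), ('2', 'Ann'), ('3', 'Bob'), ('4', 'Cy');
INSERT INTO Product VALUES (10, 'Gizmo'), (20, 'Widget');
INSERT INTO Purchase VALUES
  (10, '1', '2'),   -- Ann bought a Gizmo from Dick
  (10, '1', '2'),   -- ...twice
  (20, '1', '3'),   -- Bob bought a Widget from Dick
  (10, '4', '3');   -- Bob bought a Gizmo from Cy
""")

query = """
SELECT DISTINCT name buyer
FROM Person,
     (SELECT * FROM Purchase
      WHERE "seller-ssn" = (SELECT ssn FROM Person WHERE name = 'Dick')
        AND pid = (SELECT pid FROM Product WHERE name = 'Gizmo')) P2
WHERE Person.ssn = P2."buyer-ssn"
"""
buyers = conn.execute(query).fetchall()
print(buyers)  # [('Ann',)]  (DISTINCT removes the repeat purchase)
```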
23
George's neighbors with subqueries: People(ssn, name, street, city, state)
Q: Who lives on George's street?
A: First, find George: $\sigma_{\text{name='George'}}(\text{People})$
And get George's street/city/state: $\pi_{\text{street, city, state}}(\sigma_{\text{name='George'}}(\text{People}))$
Look up people on that street…
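One way to finish the neighbors query, sketched with scalar subqueries in Python's sqlite3 module on hypothetical rows. Two assumptions about what "lives on George's street" means: city and state must match as well as street, and George himself is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (ssn TEXT, name TEXT, street TEXT, city TEXT, state TEXT);
INSERT INTO People VALUES
  ('1', 'George', 'Elm St', 'Springfield', 'IL'),
  ('2', 'Ann',    'Elm St', 'Springfield', 'IL'),
  ('3', 'Bob',    'Elm St', 'Springfield', 'MA'),
  ('4', 'Cy',     'Oak St', 'Springfield', 'IL');
""")

# Each scalar subquery fetches one of George's address fields. Matching on
# city and state too keeps Bob (same street name, different state) out.
query = """
SELECT name FROM People
WHERE street = (SELECT street FROM People WHERE name = 'George')
  AND city   = (SELECT city   FROM People WHERE name = 'George')
  AND state  = (SELECT state  FROM People WHERE name = 'George')
  AND name <> 'George'
ORDER BY name
"""
neighbors = [n for (n,) in conn.execute(query)]
print(neighbors)  # ['Ann']
```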
24
Next: ALL op
Employees(name, job, divid, salary)
Find which employees are paid more than all the programmers:
SELECT name
FROM Employees
WHERE salary > ALL (SELECT salary FROM Employees WHERE job = 'programmer')
25
ANY/SOME op
Employees(name, job, divid, salary)
Find which employees are paid more than at least one vice president:
SELECT name
FROM Employees
WHERE salary > ANY (SELECT salary FROM Employees WHERE job = 'VP')
26
ANY/SOME op
Employees(name, job, divid, salary)
Find which employees are paid more than at least one vice president, written with SOME, a synonym for ANY:
SELECT name
FROM Employees
WHERE salary > SOME (SELECT salary FROM Employees WHERE job = 'VP')
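SQLite, which the other sketches here use, does not implement ALL, ANY, or SOME. This sketch therefore models their semantics directly in Python. The helper names gt, gt_all and gt_any are hypothetical, and None stands for both NULL and UNKNOWN. The edge cases are the point: > ALL over an empty set is TRUE, > ANY over an empty set is FALSE, and a NULL in the set can turn the result into UNKNOWN, which WHERE treats like FALSE.

```python
from typing import Iterable, Optional

# Hypothetical helpers modelling SQL comparison semantics.
# True/False stand for TRUE/FALSE; None stands for UNKNOWN (and for NULL inputs).

def gt(s: Optional[float], v: Optional[float]) -> Optional[bool]:
    """s > v: UNKNOWN if either operand is NULL."""
    if s is None or v is None:
        return None
    return s > v

def gt_all(s: Optional[float], values: Iterable[Optional[float]]) -> Optional[bool]:
    """s > ALL (values): FALSE if any comparison is FALSE, else UNKNOWN if
    any is UNKNOWN, else TRUE (so an empty set gives TRUE)."""
    results = [gt(s, v) for v in values]
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    return True

def gt_any(s: Optional[float], values: Iterable[Optional[float]]) -> Optional[bool]:
    """s > ANY (values): TRUE if any comparison is TRUE, else UNKNOWN if
    any is UNKNOWN, else FALSE (so an empty set gives FALSE)."""
    results = [gt(s, v) for v in values]
    if any(r is True for r in results):
        return True
    if any(r is None for r in results):
        return None
    return False

programmers = [50000, 70000]
print(gt_all(80000, programmers))    # True
print(gt_all(60000, programmers))    # False
print(gt_any(60000, programmers))    # True
print(gt_all(80000, []))             # True
print(gt_any(80000, []))             # False
print(gt_all(80000, [50000, None]))  # None (UNKNOWN)
```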
27
Existential/Universal Conditions
Employees(name, job, divid, salary)
Division(name, id, head)
Find all divisions with an employee whose salary is > 100000
Existential: easy!
SELECT DISTINCT Division.name
FROM Employees, Division
WHERE salary > 100000 AND divid = id
28
Existential/Universal Conditions
Employees(name, job, divid, salary)
Division(name, id, head)
Find all divisions in which everyone makes > 100000
Universal: harder! No single joined row can show that every employee in a division qualifies.
29
Existential/universal with IN
1. Find the other divisions, in which someone makes <= 100000:
SELECT name
FROM Division
WHERE id IN (SELECT divid FROM Employees WHERE salary <= 100000)
2. Select the divisions we didn't find:
SELECT name
FROM Division
WHERE id NOT IN (SELECT divid FROM Employees WHERE salary <= 100000)
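A sketch of both steps on hypothetical data in Python's sqlite3 module, plus a NOT EXISTS formulation of the same universal condition. It also shows two details worth knowing. A division with no employees (Legal) passes the universal test vacuously. A single NULL divid in the subquery makes NOT IN return nothing, while NOT EXISTS is unaffected.

```python
import sqlite3

# Hypothetical data: Research pays everyone > 100000, Sales does not,
# and Legal has no employees at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Division (name TEXT, id INTEGER, head TEXT);
CREATE TABLE Employees (name TEXT, job TEXT, divid INTEGER, salary INTEGER);
INSERT INTO Division VALUES
  ('Research', 1, 'Ann'), ('Sales', 2, 'Bob'), ('Legal', 3, 'Cy');
INSERT INTO Employees VALUES
  ('Ann', 'scientist', 1, 150000), ('Dee', 'scientist', 1, 120000),
  ('Bob', 'manager',   2, 130000), ('Eve', 'clerk',     2,  40000);
""")

# Step 1: divisions in which someone makes <= 100000.
step1 = """SELECT name FROM Division
           WHERE id IN (SELECT divid FROM Employees WHERE salary <= 100000)"""
# Step 2: the divisions not found in step 1.
step2 = """SELECT name FROM Division
           WHERE id NOT IN (SELECT divid FROM Employees WHERE salary <= 100000)
           ORDER BY name"""
# Same universal condition with a correlated NOT EXISTS.
not_exists = """SELECT d.name FROM Division d
                WHERE NOT EXISTS (SELECT * FROM Employees e
                                  WHERE e.divid = d.id AND e.salary <= 100000)
                ORDER BY d.name"""

step1_rows = conn.execute(step1).fetchall()
step2_rows = conn.execute(step2).fetchall()
print(step1_rows)  # [('Sales',)]
print(step2_rows)  # [('Legal',), ('Research',)]

# A low-paid employee with no division puts a NULL into the NOT IN list:
# "id NOT IN (2, NULL)" is never TRUE, so step 2 now returns nothing.
conn.execute("INSERT INTO Employees VALUES ('Fay', 'intern', NULL, 20000)")
step2_after = conn.execute(step2).fetchall()
not_exists_after = conn.execute(not_exists).fetchall()
print(step2_after)       # []
print(not_exists_after)  # [('Legal',), ('Research',)]
```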
30
Next: correlated subqueries
Acc(name, bal, type, …)
Q: Who has the largest balance?
Can we do this with subqueries?
31
Correlated Queries
Acc(name, bal, type, …)
Q: Find holder of largest account
SELECT name
FROM Acc
WHERE bal >= ALL (SELECT bal FROM Acc)
32
Correlated Queries
So far, each subquery has been executed once, and its result used by the enclosing query.
More complicated: correlated queries
“[T]he subquery… [is] evaluated many times, once for each assignment of a value to some term in the subquery that comes from a tuple variable outside the subquery” (Ullman, p286).
Q: What does this mean? A: The subquery refers to tuple variables from the outer query, so its value can differ for each outer row.
33
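Correlated Queries
Acc(name, bal, type, …)
Q2: Find holder of largest account of each type
SELECT name, type
FROM Acc
WHERE bal >= ALL (SELECT bal FROM Acc WHERE type = type)
Intended correlation: type = type. But inside the subquery both occurrences of type refer to the inner Acc, so the test compares each balance against the balances of every (non-null) type, not just the outer row's type.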
34
Correlated Queries
Acc(name, bal, type, …)
Q2: Find holder of largest account of each type
SELECT name, type
FROM Acc AS a1
WHERE bal >= ALL (SELECT bal FROM Acc WHERE type = a1.type)
Note: 1. scope of variables. The alias a1 names the outer Acc, so a1.type is the outer row's type, while the unqualified type refers to the inner Acc. The condition type = a1.type is the correlation.
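A sketch of the per-type query in Python's sqlite3 module on hypothetical rows. SQLite has no ALL operator, so bal >= ALL (...) is rewritten here as "no larger balance of the same type exists". The two forms agree when bal and type contain no NULLs. The second query keeps the unqualified type = type to show the scoping problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Acc (name TEXT, bal INTEGER, type TEXT);
INSERT INTO Acc VALUES
  ('Ann', 500, 'checking'), ('Bob', 900, 'checking'),
  ('Cy',  300, 'savings'),  ('Dee', 700, 'savings'), ('Eve', 100, 'savings');
""")

# Correlated: a1 belongs to the outer query, so the subquery is (conceptually)
# re-evaluated for each outer row a1. "No larger balance of the same type"
# stands in for bal >= ALL (...), which SQLite does not support.
per_type = """
SELECT name, type FROM Acc AS a1
WHERE NOT EXISTS (SELECT * FROM Acc
                  WHERE type = a1.type AND bal > a1.bal)
ORDER BY type"""

# Scoping mistake: both sides of type = type name the inner Acc, so the
# subquery ignores the outer row's type.
unscoped = """
SELECT name, type FROM Acc AS a1
WHERE NOT EXISTS (SELECT * FROM Acc
                  WHERE type = type AND bal > a1.bal)
ORDER BY type"""

correct = conn.execute(per_type).fetchall()
wrong = conn.execute(unscoped).fetchall()
print(correct)  # [('Bob', 'checking'), ('Dee', 'savings')]
print(wrong)    # [('Bob', 'checking')]
```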
35
New topic: R.A./SQL Set Operators
Relations are sets, so they have set-theoretic ops
(Figure: Venn diagrams)
Union: $R_1 \cup R_2$
Example: ActiveEmployees $\cup$ RetiredEmployees
Difference: $R_1 - R_2$
Example: AllEmployees $-$ RetiredEmployees = ActiveEmployees
Intersection: $R_1 \cap R_2$
Example: RetiredEmployees $\cap$ UnionizedEmployees
36
Set operations - example
R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77
$R \cup S$:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
Ford    345 Palm   M       7/7/77
37
Set operations - example
R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77
$R \cap S$:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
38
Set operations - example
R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77
$R - S$:
Name    Address    Gender  Birthdate
Hamill  456 Oak    M       8/8/88
39
Set ops in SQL: UNION, INTERSECT, EXCEPT
Oracle SQL uses MINUS rather than EXCEPT.
These ops are applied to queries:
(SELECT name FROM Person WHERE City = 'New York')
INTERSECT
(SELECT custname FROM Purchase WHERE store = 'Kim''s')
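The R and S tables from the example can be checked with a short sketch in Python's sqlite3 module. Two things to note. SQLite's grammar does not accept the parenthesized operands shown above, so the queries are written bare. The set operators also remove duplicates, while UNION ALL (included for contrast) keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (name TEXT, address TEXT, gender TEXT, birthdate TEXT);
CREATE TABLE S (name TEXT, address TEXT, gender TEXT, birthdate TEXT);
INSERT INTO R VALUES ('Fisher', '123 Maple', 'F', '9/9/99'),
                     ('Hamill', '456 Oak',   'M', '8/8/88');
INSERT INTO S VALUES ('Fisher', '123 Maple', 'F', '9/9/99'),
                     ('Ford',   '345 Palm',  'M', '7/7/77');
""")

def names(op):
    # Compound SELECT without parentheses around the operands (SQLite syntax).
    sql = f"SELECT * FROM R {op} SELECT * FROM S ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

print(names("UNION"))      # ['Fisher', 'Ford', 'Hamill']
print(names("INTERSECT"))  # ['Fisher']
print(names("EXCEPT"))     # ['Hamill']
print(names("UNION ALL"))  # ['Fisher', 'Fisher', 'Ford', 'Hamill']
```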
40
Boat examples: Reserve(ssn, bmodel, color)
Q: Find ssns of sailors who reserved red boats or green boats
SELECT DISTINCT ssn
FROM reserve
WHERE color = 'red' OR color = 'green'
41
Boat examples: Reserve(ssn, bmodel, color)
Q: Find ssns of sailors who reserved red boats and green boats
SELECT DISTINCT ssn
FROM reserve
WHERE color = 'red' AND color = 'green'
Wrong: no single row can have both color = 'red' and color = 'green', so this query always returns an empty result.
42
Boat examples: Reserve(ssn, bmodel, color)
Q: Find ssns of sailors who reserved red boats and green boats
SELECT DISTINCT r1.ssn
FROM reserve r1, reserve r2
WHERE r1.ssn = r2.ssn AND r1.color = 'red' AND r2.color = 'green'
43
Boat examples: Reserve(ssn, bmodel, color)
Q: Find ssns of sailors who reserved red boats and green boats
(SELECT DISTINCT ssn
 FROM reserve
 WHERE color = 'red')
INTERSECT
(SELECT DISTINCT ssn
 FROM reserve
 WHERE color = 'green')
44
Boat examples: Reserve(ssn, bmodel, color)
Q: Find ssns of sailors who reserved red boats or green boats
(SELECT DISTINCT ssn
 FROM reserve
 WHERE color = 'red')
UNION
(SELECT DISTINCT ssn
 FROM reserve
 WHERE color = 'green')
45
Boat examples: Reserve(ssn, bmodel, color)
Q: Find ssns of sailors who reserved red boats but not green boats
(SELECT DISTINCT ssn
 FROM reserve
 WHERE color = 'red')
EXCEPT
(SELECT DISTINCT ssn
 FROM reserve
 WHERE color = 'green')
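A sketch of all the boat queries on a few hypothetical reservations, run through Python's sqlite3 module. It makes the empty result of the single-row AND version easy to see next to the two correct forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reserve (ssn TEXT, bmodel TEXT, color TEXT);
INSERT INTO reserve VALUES
  ('1', 'Laser', 'red'), ('1', 'Sunfish', 'green'),
  ('2', 'Laser', 'red'),
  ('3', 'Hobie', 'green');
""")

def ssns(sql):
    return sorted(r[0] for r in conn.execute(sql))

red_or_green = ssns("SELECT DISTINCT ssn FROM reserve "
                    "WHERE color = 'red' OR color = 'green'")
# The wrong "and" query: no single row is both red and green.
wrong_and = ssns("SELECT DISTINCT ssn FROM reserve "
                 "WHERE color = 'red' AND color = 'green'")
self_join = ssns("""SELECT DISTINCT r1.ssn FROM reserve r1, reserve r2
                    WHERE r1.ssn = r2.ssn AND r1.color = 'red'
                      AND r2.color = 'green'""")
intersect = ssns("SELECT ssn FROM reserve WHERE color = 'red' INTERSECT "
                 "SELECT ssn FROM reserve WHERE color = 'green'")
red_not_green = ssns("SELECT ssn FROM reserve WHERE color = 'red' EXCEPT "
                     "SELECT ssn FROM reserve WHERE color = 'green'")

print(red_or_green)   # ['1', '2', '3']
print(wrong_and)      # []
print(self_join)      # ['1']
print(intersect)      # ['1']
print(red_not_green)  # ['2']
```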
46
Union-Compatibility
Situation: Cust1(name, address, …), Cust2(name, …)
Want: report of all customer names and addresses (if known)
Can't do:
(SELECT name, address FROM Cust1)
UNION
(SELECT name FROM Cust2)
Both queries must produce the same sequence of column types. This applies to all set ops.
47
Union-Compatibility
Situation: Cust1(name, address, …), Cust2(name, …)
Want: report of all customer names and addresses (if known)
But can do:
(SELECT name, address FROM Cust1)
UNION
(SELECT name, '(N/A)' FROM Cust2)
Resulting field names are taken from the first table: Result(name, address)
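A sketch of both attempts in Python's sqlite3 module with hypothetical rows. SQLite rejects the first query because the column counts differ. SQLite is dynamically typed, so it checks only the number of columns; a type mismatch alone would not be caught there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust1 (name TEXT, address TEXT);
CREATE TABLE Cust2 (name TEXT);
INSERT INTO Cust1 VALUES ('Ann', '1 Elm St');
INSERT INTO Cust2 VALUES ('Bob');
""")

try:
    conn.execute("SELECT name, address FROM Cust1 UNION SELECT name FROM Cust2")
    mismatch_rejected = False
except sqlite3.OperationalError:
    # Different numbers of result columns: rejected before any rows are produced.
    mismatch_rejected = True

# Padding Cust2 with a constant column makes the two queries union-compatible.
cur = conn.execute("SELECT name, address FROM Cust1 "
                   "UNION SELECT name, '(N/A)' FROM Cust2 ORDER BY name")
rows = cur.fetchall()
columns = [d[0] for d in cur.description]  # names come from the first query
print(mismatch_rejected)  # True
print(columns)            # ['name', 'address']
print(rows)               # [('Ann', '1 Elm St'), ('Bob', '(N/A)')]
```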
48
New topic: Nulls in SQL
If we don't have a value, we can put a NULL.
Null can mean several things:
Value does not exist
Value exists but is unknown
Value not applicable
But null is not the same as 0.
49
Null Values
If x = NULL, then:
4*(3-x)/7 = NULL
x + 3 - x = NULL
3 + (x-x) = NULL
x = 'Joe' is UNKNOWN
In general: a comparison involving a null field evaluates to UNKNOWN rather than TRUE, so a selection test that depends on it does not pass. The one exception is an explicit test for null (IS NULL).
Pace Boole, SQL has three boolean values:
FALSE = 0
TRUE = 1
UNKNOWN = 0.5
50
Null values in boolean expressions
C1 AND C2 = min(C1, C2)
C1 OR C2 = max(C1, C2)
NOT C1 = 1 - C1
E.g. age = 20, height = NULL, weight = 180:
SELECT *
FROM Person
WHERE (age < 25) AND (height > 6 OR weight > 190)
height > 6 = UNKNOWN
UNKNOWN OR weight > 190 = UNKNOWN
(age < 25) AND UNKNOWN = UNKNOWN
So this person is not selected.
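The min/max rules can be checked mechanically. This sketch encodes FALSE, UNKNOWN and TRUE as 0, 0.5 and 1, as on the slide. It compares the rules against SQLite (through Python's sqlite3 module), which represents UNKNOWN as NULL. The helper names and3, or3 and not3 are hypothetical.

```python
import itertools
import sqlite3

# Three-valued logic as on the slide: FALSE = 0, UNKNOWN = 0.5, TRUE = 1.
def and3(a, b): return min(a, b)
def or3(a, b): return max(a, b)
def not3(a): return 1 - a

# Map between the numeric encoding and SQLite, which uses 0, 1 and NULL (None).
to_sql = {0: 0, 0.5: None, 1: 1}
from_sql = {0: 0, None: 0.5, 1: 1}

conn = sqlite3.connect(":memory:")
mismatches = []
for a, b in itertools.product([0, 0.5, 1], repeat=2):
    got_and, got_or = conn.execute(
        "SELECT (? AND ?), (? OR ?)",
        (to_sql[a], to_sql[b], to_sql[a], to_sql[b])).fetchone()
    if from_sql[got_and] != and3(a, b) or from_sql[got_or] != or3(a, b):
        mismatches.append((a, b))
for a in [0, 0.5, 1]:
    (got_not,) = conn.execute("SELECT NOT ?", (to_sql[a],)).fetchone()
    if from_sql[got_not] != not3(a):
        mismatches.append((a,))

# The slide's row: (age < 25) AND (height > 6 OR weight > 190)
# with age = 20 (TRUE), height = NULL (UNKNOWN), weight = 180 (FALSE).
row_value = and3(1, or3(0.5, 0))
print(mismatches)  # []
print(row_value)   # 0.5, i.e. UNKNOWN, so the row is not returned
```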
51
Comparing null and non-nulls
The schema specifies whether null is allowed for each attribute: NOT NULL forbids it, and nulls are allowed by default.
Unexpected behavior:
SELECT *
FROM Person
WHERE age < 25 OR age >= 25
Some Persons are not included: those whose age is NULL! The “trichotomy law” does not hold!
52
Testing for null values
Can test for NULL explicitly:
x IS NULL
x IS NOT NULL
But: x = NULL is never true (it evaluates to UNKNOWN)
Now it includes all Persons:
SELECT *
FROM Person
WHERE age < 25 OR age >= 25 OR age IS NULL
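A sketch of the trichotomy surprise and its fix on a hypothetical three-row Person table, using Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, age INTEGER);
INSERT INTO Person VALUES ('Ann', 20), ('Bob', 40), ('Cy', NULL);
""")

def names(where):
    return [n for (n,) in conn.execute(
        f"SELECT name FROM Person WHERE {where} ORDER BY name")]

print(names("age < 25 OR age >= 25"))                 # ['Ann', 'Bob']
print(names("age < 25 OR age >= 25 OR age IS NULL"))  # ['Ann', 'Bob', 'Cy']
print(names("age = NULL"))                            # [] (always UNKNOWN)
print(names("age IS NULL"))                           # ['Cy']
```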
53
Null/logic review TRUE AND UNKNOWN = ?
TRUE OR UNKNOWN = ?
UNKNOWN OR UNKNOWN = ?
X = NULL = ?
54
Next: Outer join
Like inner join, except that dangling tuples are included, padded with nulls
Left outer join: dangling tuples from the left are included; nulls appear “on the right”
Right outer join: dangling tuples from the right are included; nulls appear “on the left”
55
Outer join - example
MovieStar
Name    Address       Gender  Birthdate
Hanks   123 Palm Rd   M       01/01/60
Taylor  456 Maple Av  F       02/02/40
Lucas   789 Oak St    M       03/03/55
MovieExec
Name       Address       Networth
Spielberg  246 Palm Rd   10M
Taylor     456 Maple Av  20M
Lucas      789 Oak St    30M
56
Name    Address       G.  Birthdate  Name       Address       Net
Hanks   123 Palm Rd   M   01/01/60
Taylor  456 Maple Av  F   02/02/40   Taylor     456 Maple Av  20M
Lucas   789 Oak St    M   03/03/55   Lucas      789 Oak St    30M
                                     Spielberg  246 Palm Rd   10M
57
Outer Join - Example
SELECT * FROM MovieStar LEFT OUTER JOIN MovieExec ON MovieStar.name = MovieExec.name
Name    Address       G.  Birthdate  Name    Address       Net
Hanks   123 Palm Rd   M   01/01/60   Null    Null          Null
Taylor  456 Maple Av  F   02/02/40   Taylor  456 Maple Av  20M
Lucas   789 Oak St    M   03/03/55   Lucas   789 Oak St    30M
SELECT * FROM MovieStar RIGHT OUTER JOIN MovieExec ON MovieStar.name = MovieExec.name
Name    Address       G.    Birthdate  Name       Address       Net
Taylor  456 Maple Av  F     02/02/40   Taylor     456 Maple Av  20M
Lucas   789 Oak St    M     03/03/55   Lucas      789 Oak St    30M
Null    Null          Null  Null       Spielberg  246 Palm Rd   10M
58
Outer Join - Example
MovieStar
Name    Address       Gender  Birthdate
Hanks   123 Palm Rd   M       01/01/60
Taylor  456 Maple Av  F       02/02/40
Lucas   789 Oak St    M       03/03/55
MovieExec
Name       Address       Networth
Spielberg  246 Palm Rd   10M
Taylor     456 Maple Av  20M
Lucas      789 Oak St    30M
SELECT * FROM MovieStar FULL OUTER JOIN MovieExec ON MovieStar.name = MovieExec.name
Name    Address       G.    Birthdate  Name       Address       Net
Hanks   123 Palm Rd   M     01/01/60   Null       Null          Null
Taylor  456 Maple Av  F     02/02/40   Taylor     456 Maple Av  20M
Lucas   789 Oak St    M     03/03/55   Lucas      789 Oak St    30M
Null    Null          Null  Null       Spielberg  246 Palm Rd   10M
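A sketch of the left outer join and a full outer join on the MovieStar/MovieExec rows, using Python's sqlite3 module. SQLite versions before 3.39.0 support only LEFT outer joins. The full outer join here is therefore emulated as the left outer join plus the unmatched MovieExec rows padded with NULL. That UNION ALL rewrite is a swapped-in technique, not the FULL OUTER JOIN syntax itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT, birthdate TEXT);
CREATE TABLE MovieExec (name TEXT, address TEXT, networth TEXT);
INSERT INTO MovieStar VALUES
  ('Hanks',  '123 Palm Rd',  'M', '01/01/60'),
  ('Taylor', '456 Maple Av', 'F', '02/02/40'),
  ('Lucas',  '789 Oak St',   'M', '03/03/55');
INSERT INTO MovieExec VALUES
  ('Spielberg', '246 Palm Rd',  '10M'),
  ('Taylor',    '456 Maple Av', '20M'),
  ('Lucas',     '789 Oak St',   '30M');
""")

left = conn.execute("""
SELECT s.name, e.name FROM MovieStar s
LEFT OUTER JOIN MovieExec e ON s.name = e.name
ORDER BY s.name""").fetchall()

# FULL OUTER JOIN emulated as: the left outer join, plus the right-hand rows
# that have no partner (padded with NULL on the left).
full = conn.execute("""
SELECT s.name, e.name FROM MovieStar s
LEFT OUTER JOIN MovieExec e ON s.name = e.name
UNION ALL
SELECT NULL, e.name FROM MovieExec e
WHERE NOT EXISTS (SELECT * FROM MovieStar s WHERE s.name = e.name)
""").fetchall()

print(left)       # [('Hanks', None), ('Lucas', 'Lucas'), ('Taylor', 'Taylor')]
print(len(full))  # 4
```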
59
New-style outer joins
Outer joins may be left, right, or full:
FROM A LEFT [OUTER] JOIN B;
FROM A RIGHT [OUTER] JOIN B;
FROM A FULL [OUTER] JOIN B;
OUTER is optional: LEFT JOIN means LEFT OUTER JOIN, and so on. The LEFT, RIGHT, or FULL keyword is what makes the join outer, so it cannot be left out.
Q: How to remember left v. right? A: It indicates the side whose rows are always included.
60
Next: Grouping & Aggregation
In SQL: aggregation operators go in SELECT; grouping goes in the GROUP BY clause
Recall aggregation operators: sum, avg, min, max, count
Each applies to a column of scalar values (strings, numbers, dates)
Count also applies to whole rows: count(*)
Can use DISTINCT inside an aggregation op: count(DISTINCT x)
Grouping: group rows that agree on the grouping attribute(s); each group becomes one row in the result
61
Aggregation functions
Numerical: SUM, AVG, MIN, MAX
Char: MIN, MAX (in lexicographic/alphabetic order)
Any attribute: COUNT (number of values)
A  B
1  2
3  4
1  2
1  2
SUM(B) = 10, AVG(A) = 1.5, MIN(A) = 1, MAX(A) = 3, COUNT(A) = 4
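The numbers above can be reproduced with a quick sketch in Python's sqlite3 module; the table name T is hypothetical. The last query shows MIN and MAX on strings, compared under SQLite's default binary collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (A INTEGER, B INTEGER);
INSERT INTO T VALUES (1, 2), (3, 4), (1, 2), (1, 2);
""")

stats = conn.execute(
    "SELECT SUM(B), AVG(A), MIN(A), MAX(A), COUNT(A), COUNT(DISTINCT A) FROM T"
).fetchone()
print(stats)  # (10, 1.5, 1, 3, 4, 2)

# MIN and MAX also work on strings, in lexicographic order.
words = conn.execute(
    "SELECT MIN(w), MAX(w) FROM (SELECT 'pear' AS w UNION ALL "
    "SELECT 'apple' UNION ALL SELECT 'fig')"
).fetchone()
print(words)  # ('apple', 'pear')
```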
62
Straight aggregation
In R.A.: $\gamma_{\text{sum}(x) \rightarrow \text{total}}(R)$
In SQL: just put the aggregation op in SELECT:
SELECT SUM(x) total
FROM R
NB: aggregation ops are applied to each non-null value.
count(x) counts the number of non-null values in field x. Use count(*) to count the number of rows.
63
Straight aggregation example
COUNT counts duplicates, unless otherwise stated:
SELECT COUNT(category)
FROM Product
WHERE year > 1995
(same as COUNT(*), except that it excludes nulls)
Better:
SELECT COUNT(DISTINCT category)
FROM Product
WHERE year > 1995
Can we say:
SELECT category, COUNT(category)
FROM Product
WHERE year > 1995
Not in standard SQL: without GROUP BY, category is neither aggregated nor grouped, so there is no single category value to report next to the count.
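A sketch of the three COUNT variants on hypothetical Product rows in Python's sqlite3 module. One row has a NULL category, so the difference between COUNT(*) and COUNT(category) shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT, category TEXT, year INTEGER);
INSERT INTO Product VALUES
  ('gizmo',  'gadgets', 1998), ('widget', 'gadgets', 2001),
  ('camera', 'photo',   1999), ('thing',  NULL,      2003),
  ('relic',  'antique', 1990);
""")

# Rows, non-null categories, and distinct non-null categories after 1995.
counts = conn.execute("""
SELECT COUNT(*), COUNT(category), COUNT(DISTINCT category)
FROM Product WHERE year > 1995
""").fetchone()
print(counts)  # (4, 3, 2)
```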
64
Straight aggregation example: Purchase(product, date, price, quantity)
Q: Find total sales for the entire database:
SELECT SUM(price * quantity)
FROM Purchase
Q: Find total sales of bagels:
SELECT SUM(price * quantity)
FROM Purchase
WHERE product = 'bagel'
65
Largest balance again: Acc(name, bal, type)
Q: Who has the largest balance?
Q: Who has the largest balance of each type?
Can we do these with aggregation functions?
66
Straight grouping
Group rows together by field values; this produces one row for each group, i.e., for each (combination of) grouped value(s)
Don't select non-grouped fields
Straight grouping reduces to a DISTINCT selection:
SELECT product
FROM Purchase
GROUP BY product
is equivalent to
SELECT DISTINCT product
FROM Purchase
67
Grouping & aggregation
Sometimes we want to group and compute aggregations by group: the aggregation op is applied to the rows in each group, not to all rows in the table
Purchase(product, date, price, quantity)
Find total sales for products that sold for > 0.50:
SELECT product, SUM(price*quantity) total
FROM Purchase
WHERE price > .50
GROUP BY product
68
Illustrated G&A example
Purchase
Product  Date   Price  Quantity
Bagel    10/21  0.85   15
Banana   10/22  0.52   7
Banana   10/19  0.52   17
Bagel    10/20  0.85   20
69
Illustrated G&A example
First compute the FROM-WHERE, then GROUP BY product:
Product  Date   Price  Quantity
Banana   10/19  0.52   17
Banana   10/22  0.52   7
Bagel    10/20  0.85   20
Bagel    10/21  0.85   15
70
Illustrated G&A example
Finally, aggregate and select:
SELECT product, SUM(price*quantity) total
FROM Purchase
WHERE price > .50
GROUP BY product
Product  TotalSales
Bagel    $29.75
Banana   $12.48
71
Illustrated G&A example
GROUP BY may be reduced to a (possibly more complicated) subquery:
SELECT product, SUM(price*quantity) total
FROM Purchase
WHERE price > .50
GROUP BY product
is equivalent to
SELECT DISTINCT x.product,
       (SELECT SUM(y.price*y.quantity)
        FROM Purchase y
        WHERE x.product = y.product AND y.price > .50) total
FROM Purchase x
WHERE x.price > .50
72
Multiple aggregations
For every product, what are the total sales and the max quantity sold?
SELECT product, SUM(price * quantity) SumSales, MAX(quantity) MaxQuantity
FROM Purchase
WHERE price > .50
GROUP BY product
Product  SumSales  MaxQuantity
Banana   $12.48    17
Bagel    $29.75    20
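The G&A example can be reproduced on the four Purchase rows above with Python's sqlite3 module. This sketch runs the multiple-aggregation query and the correlated-subquery form of the grouped query. Totals are rounded to cents because 0.85 and 0.52 are not exact binary floating-point values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchase (product TEXT, date TEXT, price REAL, quantity INTEGER);
INSERT INTO Purchase VALUES
  ('Bagel',  '10/21', 0.85, 15), ('Banana', '10/22', 0.52,  7),
  ('Banana', '10/19', 0.52, 17), ('Bagel',  '10/20', 0.85, 20);
""")

grouped = conn.execute("""
SELECT product, SUM(price * quantity) SumSales, MAX(quantity) MaxQuantity
FROM Purchase WHERE price > .50
GROUP BY product ORDER BY product""").fetchall()

# Same totals via the correlated-subquery form, without GROUP BY.
correlated = conn.execute("""
SELECT DISTINCT x.product,
       (SELECT SUM(y.price * y.quantity) FROM Purchase y
        WHERE x.product = y.product AND y.price > .50) total
FROM Purchase x WHERE x.price > .50
ORDER BY x.product""").fetchall()

grouped_cents = [(p, round(s, 2), q) for p, s, q in grouped]
correlated_cents = [(p, round(t, 2)) for p, t in correlated]
print(grouped_cents)     # [('Bagel', 29.75, 20), ('Banana', 12.48, 17)]
print(correlated_cents)  # [('Bagel', 29.75), ('Banana', 12.48)]
```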
73
Another grouping/aggregation e.g.: Movies(title, year, length, studio)
Q: How many total minutes of film have been produced by each studio?
Strategy: divide movies into groups per studio, then add up the lengths in each group
74
Another grouping/aggregation e.g.
Title                Year  Length  Studio
Star Wars            1977  120     Fox
Jedi                 1980  105     Fox
Aviator              2004  800     Miramax
Pulp Fiction         1995  110     Miramax
Lost in Translation  2003  95      Universal
SELECT studio, sum(length) totalLength
FROM Movies
GROUP BY studio
75
Another grouping/aggregation e.g.
Title                Year  Length  Studio
Star Wars            1977  120     Fox
Jedi                 1980  105     Fox
Aviator              2004  800     Miramax
Pulp Fiction         1995  110     Miramax
Lost in Translation  2003  95      Universal
SELECT studio, sum(length) length
FROM Movies
GROUP BY studio
76
Another grouping/aggregation e.g.
Title                Year  Length  Studio
Star Wars            1977  120     Fox
Jedi                 1980  105     Fox
Aviator              2004  800     Miramax
Pulp Fiction         1995  110     Miramax
Lost in Translation  2003  95      Universal
SELECT studio, sum(length) totalLength
FROM Movies
GROUP BY studio
Studio     Length
Fox        225
Miramax    910
Universal  95
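A sketch that reproduces the per-studio totals from the table above in Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, studio TEXT);
INSERT INTO Movies VALUES
  ('Star Wars', 1977, 120, 'Fox'), ('Jedi', 1980, 105, 'Fox'),
  ('Aviator', 2004, 800, 'Miramax'), ('Pulp Fiction', 1995, 110, 'Miramax'),
  ('Lost in Translation', 2003, 95, 'Universal');
""")

# One output row per studio; SUM runs over that studio's movies only.
totals = conn.execute("""
SELECT studio, SUM(length) totalLength
FROM Movies GROUP BY studio ORDER BY studio""").fetchall()
print(totals)  # [('Fox', 225), ('Miramax', 910), ('Universal', 95)]
```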
77
Grouping/aggregation example: StarsIn(SName, Title, Year)
Q: Find the year of each star's first movie
SELECT sname, min(year) firstyear
FROM StarsIn
GROUP BY sname
Q: Find the span of each star's career
Look up first and last movies
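One possible answer to the career-span question, sketched in Python's sqlite3 module on hypothetical StarsIn rows: take each star's first and last years and subtract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StarsIn (SName TEXT, Title TEXT, Year INTEGER);
INSERT INTO StarsIn VALUES
  ('Fisher', 'Star Wars', 1977), ('Fisher', 'Jedi', 1983),
  ('Ford', 'Star Wars', 1977), ('Ford', 'Witness', 1985),
  ('Ford', 'Air Force One', 1997);
""")

# First and last movie years per star; their difference is the span.
spans = conn.execute("""
SELECT sname, MIN(year) firstyear, MAX(year) lastyear,
       MAX(year) - MIN(year) span
FROM StarsIn GROUP BY sname ORDER BY sname""").fetchall()
print(spans)  # [('Fisher', 1977, 1983, 6), ('Ford', 1977, 1997, 20)]
```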
78
Account types again: Acc(name, bal, type)
Q: Who has the largest balance of each type?
Can we do this with grouping/aggregation?
79
G & A for constructed relations: Movie(title, year, producerSsn, length), MovieExec(name, ssn, netWorth)
Can do the same thing for larger, non-atomic relations
Q: How many mins. of film did each producer make?
SELECT name, sum(length) total
FROM Movie, MovieExec
WHERE producerSsn = ssn
GROUP BY name
What happens to non-producer movie-execs?
80
HAVING clauses
Sometimes we want to limit which groups appear in the result
Q: How many mins. of film did each rich producer make? Rich = netWorth > 10000000
SELECT name, sum(length) total
FROM Movie, MovieExec
WHERE producerSsn = ssn
GROUP BY name, netWorth
HAVING netWorth > 10000000
(A HAVING condition may use only grouped attributes and aggregates, so netWorth must appear in GROUP BY.)
Q: Is HAVING necessary here? A: No, could just add the rich requirement to WHERE
81
HAVING clauses
Sometimes we want to limit which groups appear in the result
Q: How many mins. of film did each old producer make? Old = made movies before 1930
SELECT name, sum(length) total
FROM Movie, MovieExec
WHERE producerSsn = ssn
GROUP BY name
HAVING min(year) < 1930
Q: Is HAVING necessary here?
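A sketch of the producer queries on hypothetical rows, in Python's sqlite3 module. It shows the WHERE version of the rich-producer query and the HAVING version of the old-producer query. It also shows that the inner join silently drops executives who produced nothing (Cy).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (title TEXT, year INTEGER, producerSsn TEXT, length INTEGER);
CREATE TABLE MovieExec (name TEXT, ssn TEXT, netWorth INTEGER);
INSERT INTO MovieExec VALUES
  ('Ann', '1', 50000000), ('Bob', '2', 2000000), ('Cy', '3', 90000000);
INSERT INTO Movie VALUES
  ('M1', 1950, '1', 90), ('M2', 1935, '1', 100),
  ('M3', 1960, '2', 80), ('M4', 1925, '2', 70);
""")

def run(sql):
    return conn.execute(sql + " ORDER BY name").fetchall()

# All producers; Cy made no movies, so the inner join drops him.
all_producers = run("""SELECT name, SUM(length) total
    FROM Movie, MovieExec WHERE producerSsn = ssn GROUP BY name""")
# Rich producers: a per-row condition, so WHERE suffices.
rich = run("""SELECT name, SUM(length) total
    FROM Movie, MovieExec
    WHERE producerSsn = ssn AND netWorth > 10000000 GROUP BY name""")
# Old producers: a condition on an aggregate of each group, tested in HAVING.
old = run("""SELECT name, SUM(length) total
    FROM Movie, MovieExec WHERE producerSsn = ssn
    GROUP BY name HAVING MIN(year) < 1930""")

print(all_producers)  # [('Ann', 190), ('Bob', 150)]
print(rich)           # [('Ann', 190)]
print(old)            # [('Bob', 150)]
```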