SQL – Aggregates

CS 4750 Database Systems
Spring 2020 – University of Virginia
© Praphamontripong

Slides: up3f/cs4750/slides/4750Lec10-SQL... · 2020-06-03

[A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch. 3.7 and Ch. 5.5]
[C.M. Ricardo, S.D. Urban, Databases Illuminated, Ch. 5.4]




Recap 1: SELECT .. FROM .. WHERE

    SELECT S_id, Course
    FROM Student_lecture AS S
    WHERE S.Teaching_assistant = 'Minnie';

Student_lecture

    S_id  Address               Course                 Teaching_assistant
    1234  57 Hockanum Blvd      Database Systems       Minnie
    2345  1400 E. Bellows       Database Systems       Humpty
    3456  900 S. Detroit        Cloud Computing        Dumpty
    1234  57 Hockanum Blvd      Web Programming Lang.  Mickey
    5678  2131 Forest Lake Ln.  Software Analysis      Minnie

Conceptual evaluation:

    for each row in S:
        if row.Teaching_assistant = 'Minnie':
            output (row.S_id, row.Course)

• FROM: open an iterator

• WHERE: filter each row

• SELECT: output selected attributes
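The conceptual loop on this slide can be mimicked in plain Python. This is only a sketch of the evaluation model, not how a DBMS actually executes the query; the list `student_lecture` and the function `select_where` are names invented here for illustration.

```python
# Hypothetical in-memory copy of the Student_lecture table from the slide.
student_lecture = [
    {"S_id": 1234, "Address": "57 Hockanum Blvd", "Course": "Database Systems", "Teaching_assistant": "Minnie"},
    {"S_id": 2345, "Address": "1400 E. Bellows", "Course": "Database Systems", "Teaching_assistant": "Humpty"},
    {"S_id": 3456, "Address": "900 S. Detroit", "Course": "Cloud Computing", "Teaching_assistant": "Dumpty"},
    {"S_id": 1234, "Address": "57 Hockanum Blvd", "Course": "Web Programming Lang.", "Teaching_assistant": "Mickey"},
    {"S_id": 5678, "Address": "2131 Forest Lake Ln.", "Course": "Software Analysis", "Teaching_assistant": "Minnie"},
]


def select_where(rows, teaching_assistant):
    """Mimic SELECT S_id, Course FROM rows WHERE Teaching_assistant = ..."""
    result = []
    for row in rows:                                         # FROM: iterate over the table
        if row["Teaching_assistant"] == teaching_assistant:  # WHERE: filter each row
            result.append((row["S_id"], row["Course"]))      # SELECT: keep chosen attributes
    return result


minnie_rows = select_where(student_lecture, "Minnie")
print(minnie_rows)
```

The order of the comments mirrors the slide: the iterator is opened first, each row is filtered, and only then are the selected attributes output.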


Recap 2: SELECT .. FROM .. WHERE

Find the S_id of every student who is taking Database Systems and has Humpty as a teaching assistant

    SELECT S_id
    FROM Student_lecture AS S
    WHERE S.Course = 'Database Systems'
      AND S.Teaching_assistant = 'Humpty';

Student_lecture

    S_id  Address               Course                 Teaching_assistant
    1234  57 Hockanum Blvd      Database Systems       Minnie
    2345  1400 E. Bellows       Database Systems       Humpty
    3456  900 S. Detroit        Cloud Computing        Dumpty
    1234  57 Hockanum Blvd      Web Programming Lang.  Mickey
    5678  2131 Forest Lake Ln.  Software Analysis      Minnie

Result

    S_id
    2345


Recap 3: SELECT .. FROM .. WHERE

List all names of TAs and the number of hours the TAs work per week, rename the hours as “Hours_per_week”

    SELECT Name, two_week_hours/2 AS Hours_per_week
    FROM hiring

hiring

    TA_id  Name    Year  Two_week_hours
    1234   Minnie  4     20
    2345   Humpty  3     24
    3456   Dumpty  4     30
    3333   Minnie  3     12
    5678   Mickey  2     16

Result

    Name    Hours_per_week
    Minnie  10
    Humpty  12
    Dumpty  15
    Minnie  6
    Mickey  8


Recap 4: SELECT .. FROM .. WHERE

List all years

    SELECT Year
    FROM hiring

hiring

    TA_id  Name    Year  Two_week_hours
    1234   Minnie  4     20
    2345   Humpty  3     24
    3456   Dumpty  4     30
    3333   Minnie  3     12
    5678   Mickey  2     16

Result

    Year
    4
    3
    4
    3
    2

Duplicates may occur in the output of an operator


Recap 5: SELECT .. FROM .. WHERE

List all years the TAs are in. If multiple TAs are in the same year, list the year only once

    SELECT DISTINCT Year
    FROM hiring

hiring

    TA_id  Name    Year  Two_week_hours
    1234   Minnie  4     20
    2345   Humpty  3     24
    3456   Dumpty  4     30
    3333   Minnie  3     12
    5678   Mickey  2     16

Result

    Year
    4
    3
    2


Recap 6: SELECT .. FROM .. WHERE

List all names of TAs and the number of hours the TAs work per week, rename the hours as “Hours_per_week”, then order the result set by names and then Hours_per_week

    SELECT Name, two_week_hours/2 AS Hours_per_week
    FROM hiring
    ORDER BY Name, Hours_per_week

hiring

    TA_id  Name    Year  Two_week_hours
    1234   Minnie  4     20
    2345   Humpty  3     24
    3456   Dumpty  4     30
    3333   Minnie  3     12
    5678   Mickey  2     16

Result

    Name    Hours_per_week
    Dumpty  15
    Humpty  12
    Mickey  8
    Minnie  6
    Minnie  10


Aggregation Functions

Calculate a value across an entire set or across groups of rows within the set

SQL uses five aggregation operators:

• SUM – produces the sum of a column with numerical values

• AVG – produces the average of a column with numerical values

• MIN – applied to a column with numerical values, produces the smallest value

• MAX – applied to a column with numerical values, produces the largest value

• COUNT – produces the number of (not necessarily distinct) values in a column
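As a quick check, the five operators can be run together on the loan rows used in this lecture's examples. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the MySQL setup used in class; the in-memory database and the variable names are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database holding the loan rows from the lecture examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# One row comes back because the aggregates run over the whole table.
total, average, smallest, largest, how_many = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount), COUNT(amount) FROM loan"
).fetchone()

print(total, average, smallest, largest, how_many)
```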


Example: SUM

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

The sum of the amounts of all loans is expressed by

    SELECT SUM(amount)
    FROM loan

    SUM(amount)
    8700

    SELECT SUM(amount) AS Total
    FROM loan

    Total
    8700


Example: AVG

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

The average of the amounts of all loans is expressed by

    SELECT AVG(amount)
    FROM loan

    AVG(amount)
    1242.857142857143


Example: MIN

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

The smallest loan amount is expressed by

    SELECT MIN(amount)
    FROM loan

    min(amount)
    500


Example: MAX

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

The largest loan amount is expressed by

    SELECT MAX(amount)
    FROM loan

    max(amount)
    2000


Example: COUNT

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

Count the number of tuples in the loan table

    SELECT COUNT(*)
    FROM loan

    count(*)
    7

    SELECT COUNT(loan_number)
    FROM loan

    count(loan_number)
    7


Example: COUNT .. DISTINCT

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

Count the number of values in the branch_name column

    SELECT COUNT(branch_name)
    FROM loan

    count(branch_name)
    7

    SELECT COUNT(DISTINCT branch_name)
    FROM loan

    count(distinct branch_name)
    5


Aggregation: Order of Actions

1. The FROM clause generates the data set

2. The WHERE clause filters the data set generated by the FROM clause

3. The GROUP BY clause aggregates the data set that was filtered by the WHERE clause (note: GROUP BY does not sort the result set)

4. The HAVING clause filters the data set that was aggregated by the GROUP BY clause

5. The SELECT clause transforms the filtered aggregated data set

6. The ORDER BY clause sorts the transformed data set

    SELECT select_list
    FROM table_source
    [WHERE search_condition]
    [GROUP BY group_by_expression]
    [HAVING search_condition]
    [ORDER BY order_expression [ASC | DESC]]

Order matters
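The six steps can be traced by hand and compared with what a DBMS returns. The sketch below is an illustration only: the query (loans of at least 1000, branches with more than one such loan, largest total first) is made up for this example, the data is the loan table from the earlier slides, and Python's built-in sqlite3 stands in for MySQL.

```python
import sqlite3
from collections import defaultdict

loans = [
    ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
]

rows = list(loans)                                    # 1. FROM generates the data set
rows = [r for r in rows if r[2] >= 1000]              # 2. WHERE filters rows
groups = defaultdict(list)                            # 3. GROUP BY forms groups
for _, branch, amount in rows:
    groups[branch].append(amount)
kept = {b: a for b, a in groups.items() if len(a) > 1}        # 4. HAVING filters groups
selected = [(b, sum(a)) for b, a in kept.items()]             # 5. SELECT computes the output
by_hand = sorted(selected, key=lambda t: t[1], reverse=True)  # 6. ORDER BY sorts it

# The same query, with the clauses written in the required order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", loans)
by_sql = conn.execute(
    """
    SELECT branch_name, SUM(amount) AS total
    FROM loan
    WHERE amount >= 1000
    GROUP BY branch_name
    HAVING COUNT(*) > 1
    ORDER BY total DESC
    """
).fetchall()

print(by_hand)
print(by_sql)
```

Redwood drops out at step 4 (only one loan survives the WHERE filter), while Round Hill and Mianus never reach grouping because step 2 removed their rows.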


Grouping Requirement

Several DBMSs require that every column that appears in the SELECT clause without being used in an aggregation function must also appear in the GROUP BY clause

Satisfies the requirement:

    SELECT column_A, column_B, some_aggregation_function
    FROM table_source
    GROUP BY column_A, column_B

Violates the requirement (column_A is not in the GROUP BY clause):

    SELECT column_A, column_B, some_aggregation_function
    FROM table_source
    GROUP BY column_B
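One way to see why the rule exists is to run the second form on a DBMS that does not enforce it. SQLite (used here only because it ships with Python; it is not the DBMS used in class) accepts the query and fills the ungrouped column from an arbitrarily chosen row of each group. The sketch below assumes the loan data from the earlier slides, with loan_number playing the role of column_A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# loan_number is neither aggregated nor listed in GROUP BY.
rows = conn.execute(
    "SELECT loan_number, branch_name, SUM(amount) FROM loan GROUP BY branch_name"
).fetchall()

# Perryridge has two loans, so its loan_number could be either one.
perryridge = [r for r in rows if r[1] == "Perryridge"][0]
print(perryridge)
```

The total for Perryridge is well defined, but the loan_number shown next to it is not, which is the ambiguity that stricter DBMSs reject outright.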


Example: SUM with GROUP BY

Given the loan schema

    loan(loan_number, branch_name, amount)

    loan_number  branch_name  amount
    L-11         Round Hill   900
    L-14         Downtown     1500
    L-15         Perryridge   1500
    L-16         Perryridge   1300
    L-17         Downtown     1000
    L-23         Redwood      2000
    L-93         Mianus       500

The sum of the amounts of all loans for each branch is expressed by

    SELECT branch_name, SUM(amount)
    FROM loan
    GROUP BY branch_name;

    branch_name  SUM(amount)
    Downtown     2500
    Mianus       500
    Perryridge   2800
    Redwood      2000
    Round Hill   900


Let’s Try (1)

• Open your MySQL terminal (use phpMyAdmin or GCP shell)

• Use the bank database from in-class4
    • Or run the script to set up your database
      http://www.cs.virginia.edu/~up3f/cs4750/inclass/inclass04/alldbs.sql

• Consider table: account

• Write an SQL statement to solve the following problem

Find the total amount each branch has in accounts

    SELECT branch_name, SUM(balance) AS Total
    FROM account
    GROUP BY branch_name;


Let’s Try (2)

• Open your MySQL terminal (use phpMyAdmin or GCP shell)

• Use the bank database from in-class4
    • Or run the script to set up your database
      http://www.cs.virginia.edu/~up3f/cs4750/inclass/inclass04/alldbs.sql

• Consider table: account

• Write an SQL statement to solve the following problem

Find the average amount each branch has in accounts

    SELECT branch_name, AVG(balance) AS average
    FROM account
    GROUP BY branch_name;
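For trying the per-branch total and average without a MySQL setup, a small stand-in works. Everything below is an assumption made for illustration: the account rows are made up (they are not the contents of alldbs.sql), the account_number column is a guess at the schema, and Python's built-in sqlite3 replaces MySQL. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; only branch_name and balance appear on the slides.
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [
        ("A-101", "Downtown", 500),
        ("A-102", "Perryridge", 400),
        ("A-201", "Perryridge", 900),
        ("A-215", "Mianus", 700),
    ],
)

per_branch = conn.execute(
    """
    SELECT branch_name, SUM(balance) AS Total, AVG(balance) AS average
    FROM account
    GROUP BY branch_name
    ORDER BY branch_name
    """
).fetchall()

for branch, total, average in per_branch:
    print(branch, total, average)
```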


Grouping, Aggregation, and Null

• The value NULL is ignored in any aggregation
    • It does not contribute to a sum, average, or count of an attribute
    • It cannot be the minimum or maximum in its column

• NULL is treated as an ordinary value when forming groups
    • A group can have NULL attribute(s)

• When performing any aggregation except count over an empty bag of values, the result is NULL
    • The count of an empty bag is 0
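These rules can be checked on a tiny table. The sketch below uses a made-up table named payment with three rows (one NULL amount, one NULL branch) and Python's built-in sqlite3 as a stand-in DBMS; the table, its rows, and the variable names are all assumptions for illustration. SQL NULL shows up as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (branch TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [("Downtown", 100), ("Downtown", None), (None, 300)],
)

# NULL amounts are skipped by every aggregate on the amount column.
overall = conn.execute(
    "SELECT COUNT(*), COUNT(amount), SUM(amount), AVG(amount), "
    "MIN(amount), MAX(amount) FROM payment"
).fetchone()

# NULL forms its own group when it appears in the grouping column.
per_group = dict(
    conn.execute("SELECT branch, SUM(amount) FROM payment GROUP BY branch").fetchall()
)

# No row satisfies the filter, so the aggregates see an empty bag.
empty = conn.execute(
    "SELECT COUNT(amount), SUM(amount), MAX(amount) FROM payment WHERE amount > 1000"
).fetchone()

print(overall, per_group, empty)
```

The average is 200 rather than 400/3 because the NULL amount is left out of both the sum and the divisor.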


HAVING Clauses

• An aggregation in a HAVING clause applies only to the tuples of the group being tested – filter groups

• Any attributes of relations in the FROM clause may be aggregated in the HAVING clause

• But only those attributes that are in the GROUP BY list may appear unaggregated in the HAVING clause

The sum of the amounts of all loans for each branch that has more than one loan is expressed by

    SELECT branch_name, SUM(amount)
    FROM loan
    GROUP BY branch_name
    HAVING COUNT(branch_name) > 1;

    branch_name  SUM(amount)
    Downtown     2500
    Perryridge   2800
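The HAVING query can be run on the loan data from the earlier slides, along with a second query showing that HAVING may aggregate a column that is not in the SELECT list. This is a sketch that assumes Python's built-in sqlite3 in place of MySQL; the second query (branches whose largest loan is at least 1500) is made up for illustration, and ORDER BY is added only for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# Groups are formed first; HAVING then keeps only groups with more than one loan.
more_than_one = conn.execute(
    """
    SELECT branch_name, SUM(amount)
    FROM loan
    GROUP BY branch_name
    HAVING COUNT(branch_name) > 1
    ORDER BY branch_name
    """
).fetchall()

# HAVING can aggregate any attribute of the FROM relation, here MAX(amount).
big_loan = conn.execute(
    """
    SELECT branch_name
    FROM loan
    GROUP BY branch_name
    HAVING MAX(amount) >= 1500
    ORDER BY branch_name
    """
).fetchall()

print(more_than_one)
print(big_loan)
```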


Example 1

List the number of customers in each country. Only include countries with more than 10 customers

    SELECT COUNT(id), country
    FROM Customer
    GROUP BY country
    HAVING COUNT(id) > 10;

    count(id)  Country
    11         France
    11         Germany
    13         USA


Example 2

List the number of customers in each country, except USA, sorted high to low. Only include countries with 9 or more customers

    SELECT COUNT(id), country
    FROM Customer
    WHERE country <> 'USA'
    GROUP BY country
    HAVING COUNT(id) >= 9
    ORDER BY COUNT(id) DESC;

    count(id)  Country
    11         France
    11         Germany
    9          Brazil
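Example 2 combines WHERE, GROUP BY, HAVING, and ORDER BY, so it is a useful query to run end to end. The Customer table is not shown in the slides, so the rows below are generated, hypothetical data: the per-country counts were chosen to reproduce the result on this slide, and the UK count is invented. Python's built-in sqlite3 stands in for MySQL, and a secondary sort on country is added so the France/Germany tie comes out in a fixed order.

```python
import sqlite3

# Hypothetical customer counts per country (UK is invented for illustration).
counts = {"France": 11, "Germany": 11, "Brazil": 9, "USA": 13, "UK": 7}

customers = []
next_id = 1
for country, n in counts.items():
    for _ in range(n):
        customers.append((next_id, country))
        next_id += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (id INTEGER, country TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", customers)

result = conn.execute(
    """
    SELECT COUNT(id), country
    FROM Customer
    WHERE country <> 'USA'
    GROUP BY country
    HAVING COUNT(id) >= 9
    ORDER BY COUNT(id) DESC, country
    """
).fetchall()

print(result)
```

USA is removed by WHERE before any group is formed, whereas UK forms a group and is then discarded by HAVING.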


Final Notes about Aggregation

#1: Keep the GROUP BY clause small and precise

• Several DBMSs require that all non-aggregated columns must be in the GROUP BY clause

• Excessive columns in GROUP BY can negatively impact the query’s performance and make the query hard to read, understand, and rewrite

• For queries that need both aggregations and details, do all aggregations in subqueries first, then join those to the tables to retrieve the details
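The last point can be sketched on the loan data from the earlier slides: compute each branch's total in a subquery, then join it back to loan to show every loan next to its branch total. This is one possible shape for such a query, not the only one; the aliases l and t and the column name total are made up here, Python's built-in sqlite3 stands in for MySQL, and joins and subqueries are covered properly in later lectures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# The subquery groups by one column only; the outer query supplies the details.
details = conn.execute(
    """
    SELECT l.loan_number, l.branch_name, l.amount, t.total
    FROM loan AS l
    JOIN (SELECT branch_name, SUM(amount) AS total
          FROM loan
          GROUP BY branch_name) AS t
      ON t.branch_name = l.branch_name
    ORDER BY l.loan_number
    """
).fetchall()

for row in details:
    print(row)
```

The GROUP BY stays at a single column, and loan_number and amount never need to be added to it.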


Final Notes about Aggregation

#2: COUNT(*) and COUNT(<column_name>) are different

• COUNT(*) – counts all rows, including ones with NULL values

• COUNT(<column_name>) – counts only the rows where the column value is not NULL

• Sometimes, dividing a query into subqueries can be more efficient than using a GROUP BY (more about subqueries next week)


Final Notes about Aggregation

#3: Use DISTINCT to get distinct counts

• COUNT(*) – returns the number of rows in a group, including rows with NULL values and duplicates

• COUNT(<column_name>) – returns the number of rows where the column value is not NULL

• COUNT(DISTINCT <column_name>) – returns the number of distinct non-NULL values of the column
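The three forms give three different answers as soon as a column contains both a duplicate and a NULL. The sketch below uses four made-up rows (not the lecture's loan table, which has no NULLs) and Python's built-in sqlite3 as a stand-in for MySQL; the rows and variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT)")
# Hypothetical rows: one duplicate branch name and one NULL branch name.
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [
        ("L-1", "Downtown"),
        ("L-2", "Downtown"),
        ("L-3", "Perryridge"),
        ("L-4", None),
    ],
)

all_rows, non_null, distinct_non_null = conn.execute(
    "SELECT COUNT(*), COUNT(branch_name), COUNT(DISTINCT branch_name) FROM loan"
).fetchone()

print(all_rows, non_null, distinct_non_null)
```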


Wrap-Up

• Aggregation functions

• Order of actions matters when applying aggregation

• Aggregation helps make decisions and succinctly convey information

What’s next?

• SQL – Joins

• Combine techniques (aggregates and joins) to solve complex questions