Spring 2020 – University of Virginia © Praphamontripong
SQL – Aggregates
CS 4750 Database Systems
[A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch. 3.7 and Ch. 5.5]
[C.M. Ricardo, S.D. Urban, Databases Illuminated, Ch. 5.4]

Recap 1: SELECT .. FROM .. WHERE

    SELECT S_id, Course
    FROM Student_lecture AS S
    WHERE S.Teaching_assistant = 'Minnie';

Student_lecture
S_id  Address               Course                 Teaching_assistant
1234  57 Hockanum Blvd      Database Systems       Minnie
2345  1400 E. Bellows       Database Systems       Humpty
3456  900 S. Detroit        Cloud Computing        Dumpty
1234  57 Hockanum Blvd      Web Programming Lang.  Mickey
5678  2131 Forest Lake Ln.  Software Analysis      Minnie

Conceptually:

    for each row in S:
        if row.Teaching_assistant = 'Minnie':
            output (row.S_id, row.Course)

FROM: open an iterator over the table
WHERE: filter each row
SELECT: output the selected attributes
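The iterate, filter, output reading of the query can be checked with a small sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL; the column types in CREATE TABLE are guesses, and the rows are copied from the Student_lecture table above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student_lecture "
    "(S_id INTEGER, Address TEXT, Course TEXT, Teaching_assistant TEXT)"
)
conn.executemany(
    "INSERT INTO Student_lecture VALUES (?, ?, ?, ?)",
    [
        (1234, "57 Hockanum Blvd", "Database Systems", "Minnie"),
        (2345, "1400 E. Bellows", "Database Systems", "Humpty"),
        (3456, "900 S. Detroit", "Cloud Computing", "Dumpty"),
        (1234, "57 Hockanum Blvd", "Web Programming Lang.", "Mickey"),
        (5678, "2131 Forest Lake Ln.", "Software Analysis", "Minnie"),
    ],
)

# Declarative form: the query from the slide (ORDER BY added for a stable order).
sql_rows = conn.execute(
    "SELECT S_id, Course FROM Student_lecture AS S "
    "WHERE S.Teaching_assistant = 'Minnie' ORDER BY S_id"
).fetchall()

# Procedural reading: FROM opens an iterator, WHERE filters, SELECT projects.
loop_rows = []
for s_id, address, course, ta in conn.execute(
    "SELECT * FROM Student_lecture ORDER BY S_id"
):
    if ta == "Minnie":
        loop_rows.append((s_id, course))

print(sql_rows)
print(loop_rows)
```

Both lists hold the same two rows, one for student 1234 and one for student 5678.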

Recap 2: SELECT .. FROM .. WHERE
Find the S_id of every student who is taking Database Systems and has Humpty as a teaching assistant

    SELECT S_id
    FROM Student_lecture AS S
    WHERE S.Course = 'Database Systems' AND
          S.Teaching_assistant = 'Humpty';

Student_lecture
S_id  Address               Course                 Teaching_assistant
1234  57 Hockanum Blvd      Database Systems       Minnie
2345  1400 E. Bellows       Database Systems       Humpty
3456  900 S. Detroit        Cloud Computing        Dumpty
1234  57 Hockanum Blvd      Web Programming Lang.  Mickey
5678  2131 Forest Lake Ln.  Software Analysis      Minnie

Result:
S_id
2345

Recap 3: SELECT .. FROM .. WHERE
List all names of TAs and the number of hours the TAs work per week, rename the hours as “Hours_per_week”

    SELECT Name, two_week_hours/2 AS Hours_per_week
    FROM hiring

hiring
TA_id  Name    Year  Two_week_hours
1234   Minnie  4     20
2345   Humpty  3     24
3456   Dumpty  4     30
3333   Minnie  3     12
5678   Mickey  2     16

Result:
Name    Hours_per_week
Minnie  10
Humpty  12
Dumpty  15
Minnie  6
Mickey  8

Recap 4: SELECT .. FROM .. WHERE
List all years

    SELECT Year
    FROM hiring

hiring
TA_id  Name    Year  Two_week_hours
1234   Minnie  4     20
2345   Humpty  3     24
3456   Dumpty  4     30
3333   Minnie  3     12
5678   Mickey  2     16

Result:
Year
4
3
4
3
2

Duplicates may occur in the output of an operator

Recap 5: SELECT .. FROM .. WHERE
List all years the TAs are in. If multiple TAs are in the same year, list the year only once

    SELECT DISTINCT Year
    FROM hiring

hiring
TA_id  Name    Year  Two_week_hours
1234   Minnie  4     20
2345   Humpty  3     24
3456   Dumpty  4     30
3333   Minnie  3     12
5678   Mickey  2     16

Result:
Year
4
3
2

Recap 6: SELECT .. FROM .. WHERE
List all names of TAs and the number of hours the TAs work per week, rename the hours as “Hours_per_week”, then order the result set by names and then Hours_per_week

    SELECT Name, two_week_hours/2 AS Hours_per_week
    FROM hiring
    ORDER BY Name, Hours_per_week

hiring
TA_id  Name    Year  Two_week_hours
1234   Minnie  4     20
2345   Humpty  3     24
3456   Dumpty  4     30
3333   Minnie  3     12
5678   Mickey  2     16

Result:
Name    Hours_per_week
Dumpty  15
Humpty  12
Mickey  8
Minnie  6
Minnie  10
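The duplicate, DISTINCT, and ORDER BY recaps can be reproduced with a short sketch. It assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the column types are guesses, and the rows come from the hiring table above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hiring "
    "(TA_id INTEGER, Name TEXT, Year INTEGER, Two_week_hours INTEGER)"
)
conn.executemany(
    "INSERT INTO hiring VALUES (?, ?, ?, ?)",
    [
        (1234, "Minnie", 4, 20),
        (2345, "Humpty", 3, 24),
        (3456, "Dumpty", 4, 30),
        (3333, "Minnie", 3, 12),
        (5678, "Mickey", 2, 16),
    ],
)

# Plain projection keeps duplicates: one output row per input row.
all_years = [row[0] for row in conn.execute("SELECT Year FROM hiring")]

# DISTINCT collapses repeated values; without ORDER BY the order is not guaranteed.
distinct_years = [row[0] for row in conn.execute("SELECT DISTINCT Year FROM hiring")]

# ORDER BY may use the column alias. SQLite divides integers as integers;
# every Two_week_hours value here is even, so nothing is lost.
ordered = conn.execute(
    "SELECT Name, Two_week_hours / 2 AS Hours_per_week "
    "FROM hiring ORDER BY Name, Hours_per_week"
).fetchall()

print(len(all_years), sorted(distinct_years))
print(ordered)
```

The two Minnie rows show the effect of the second sort key: they tie on Name and are then ordered by Hours_per_week.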

Aggregation Functions
Calculate a value across an entire set or across groups of rows within the set
SQL uses five aggregation operators:
• SUM – produces the sum of a column with numerical values
• AVG – produces the average of a column with numerical values
• MIN – applied to a column with numerical values, produces the smallest value
• MAX – applied to a column with numerical values, produces the largest value
• COUNT – produces the number of (not necessarily distinct) values in a column

Example: SUM
Given the loan schema
loan(loan_number, branch_name, amount)
The sum of the amounts of all loans is expressed by

    SELECT SUM(amount)
    FROM loan

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
SUM(amount)
8700

With a column alias:

    SELECT SUM(amount) AS Total
    FROM loan

Result:
Total
8700

Example: AVG
Given the loan schema
loan(loan_number, branch_name, amount)
The average of the amounts of all loans is expressed by

    SELECT AVG(amount)
    FROM loan

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
AVG(amount)
1242.857142857143

Example: MIN
Given the loan schema
loan(loan_number, branch_name, amount)
The smallest amount of loans is expressed by

    SELECT MIN(amount)
    FROM loan

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
min(amount)
500

Example: MAX
Given the loan schema
loan(loan_number, branch_name, amount)
The largest amount of loans is expressed by

    SELECT MAX(amount)
    FROM loan

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
max(amount)
2000
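The four numeric aggregates above can also be computed in one query. The sketch below assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the column types are guesses, and the rows are the loan rows from the slides.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# Without GROUP BY, each aggregate collapses the whole table to a single value,
# so the query returns exactly one row.
total, average, smallest, largest = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM loan"
).fetchone()

print(total, round(average, 2), smallest, largest)
```

The values agree with the individual slide results: a sum of 8700, an average of about 1242.86, a minimum of 500, and a maximum of 2000.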

Example: COUNT
Given the loan schema
loan(loan_number, branch_name, amount)
Count the number of tuples in the loan table

    SELECT count(*)
    FROM loan

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
count(*)
7

Counting a column instead:

    SELECT count(loan_number)
    FROM loan

Result:
count(loan_number)
7

Example: COUNT .. DISTINCT
Given the loan schema
loan(loan_number, branch_name, amount)
Count the number of values in the branch_name column

    SELECT count(branch_name)
    FROM loan

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
count(branch_name)
7

Counting each branch name only once:

    SELECT count(DISTINCT branch_name)
    FROM loan

Result:
count(distinct branch_name)
5
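The COUNT variants from the last two examples can be compared side by side. This sketch assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the column types are guesses, and the rows are the loan rows from the slides.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# One row, four counts: all rows, values of two columns, and distinct branch names.
n_rows, n_loans, n_branch_values, n_branches = conn.execute(
    "SELECT COUNT(*), COUNT(loan_number), COUNT(branch_name), "
    "COUNT(DISTINCT branch_name) FROM loan"
).fetchone()

print(n_rows, n_loans, n_branch_values, n_branches)
```

Only the DISTINCT form differs here, because Downtown and Perryridge each appear twice and no column contains NULL.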
Aggregation: Order of Actions
1. The FROM clause generates the data set
2. The WHERE clause filters the data set generated by the FROM clause
3. The GROUP BY clause aggregates the data set that was filtered by the WHERE clause (note: GROUP BY does not sort the result set)
4. The HAVING clause filters the data set that was aggregated by the GROUP BY clause
5. The SELECT clause transforms the filtered aggregated data set
6. The ORDER BY clause sorts the transformed data set

    SELECT select_list
    FROM table_source
    [WHERE search_condition]
    [GROUP BY group_by_expression]
    [HAVING search_condition]
    [ORDER BY order_expression [ASC | DESC] ]

Order matters
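A small sketch can make the order of actions concrete. It assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL and reuses the loan rows from the earlier examples; the helper name branch_totals and the threshold of 1000 are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)


def branch_totals(where_clause):
    """Hypothetical helper: run the same grouped query with or without a WHERE."""
    # Clauses appear in SQL's required written order; comments give evaluation order.
    return conn.execute(f"""
        SELECT branch_name, SUM(amount) AS total  -- 5. compute output per group
        FROM loan                                 -- 1. generate the data set
        {where_clause}                            -- 2. filter rows
        GROUP BY branch_name                      -- 3. form groups
        HAVING COUNT(*) > 1                       -- 4. filter groups
        ORDER BY total DESC                       -- 6. sort the result
    """).fetchall()


with_where = branch_totals("WHERE amount > 1000")
without_where = branch_totals("")

print(with_where)
print(without_where)
```

With the WHERE clause, the 1000 loan at Downtown is removed before grouping, so Downtown has a single remaining loan and fails the HAVING test; only Perryridge survives. Without it, both Downtown and Perryridge pass.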

Grouping Requirement
Several DBMSs require that every column in the SELECT clause that is not used in an aggregation function also appear in the GROUP BY clause

Accepted (column_A and column_B are both in GROUP BY):

    SELECT column_A, column_B, some_aggregation_function
    FROM table_source
    GROUP BY column_A, column_B

Rejected by such DBMSs (column_A is missing from GROUP BY):

    SELECT column_A, column_B, some_aggregation_function
    FROM table_source
    GROUP BY column_B
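One way to see why the requirement exists is to check how many values an ungrouped column takes inside each group. The sketch below assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL and uses the loan rows from the earlier examples, with loan_number playing the role of column_A and branch_name the role of column_B. SQLite happens to be lenient about this rule, so the sketch does not try to trigger the error itself.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# Grouping by branch_name alone: some groups contain several loan_number values,
# so "the" loan_number of such a group is not well defined.
per_branch = conn.execute(
    "SELECT branch_name, COUNT(DISTINCT loan_number) FROM loan GROUP BY branch_name"
).fetchall()
ambiguous_branches = sorted(branch for branch, n in per_branch if n > 1)

# Grouping by both columns: every group has exactly one value for each
# non-aggregated column, which is what the requirement guarantees.
grouped_by_both = conn.execute(
    "SELECT loan_number, branch_name, SUM(amount) FROM loan "
    "GROUP BY loan_number, branch_name ORDER BY loan_number"
).fetchall()

print(ambiguous_branches)
print(grouped_by_both)
```

Downtown and Perryridge each hold two loans, so a query that selects loan_number while grouping only by branch_name has no single value to report for them.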

Example: SUM with GROUP BY
Given the loan schema
loan(loan_number, branch_name, amount)
The sum of the amounts of all loans for each branch is expressed by

    SELECT branch_name, SUM(amount)
    FROM loan
    GROUP BY branch_name;

loan_number  branch_name  amount
L-11         Round Hill   900
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-17         Downtown     1000
L-23         Redwood      2000
L-93         Mianus       500

Result:
branch_name  SUM(amount)
Downtown     2500
Mianus       500
Perryridge   2800
Redwood      2000
Round Hill   900
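The grouped sum can be reproduced with a short sketch, assuming Python's sqlite3 module with an in-memory SQLite database in place of MySQL. The column types are guesses; the ORDER BY is an addition here, since GROUP BY by itself does not promise any row order.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# One output row per distinct branch_name; SUM is computed within each group.
branch_sums = conn.execute(
    "SELECT branch_name, SUM(amount) FROM loan "
    "GROUP BY branch_name ORDER BY branch_name"
).fetchall()

for branch, total in branch_sums:
    print(branch, total)
```

The seven loans collapse into five rows, and the per-branch totals add up to the overall sum of 8700 from the SUM example.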

Let’s Try (1)
• Open your MySQL terminal (use phpMyAdmin or GCP shell)
• Use the bank database from in-class4
  • Or run the script to set up your database
    http://www.cs.virginia.edu/~up3f/cs4750/inclass/inclass04/alldbs.sql
• Consider table: account
• Write an SQL statement to solve the following problem
Find the total amount each branch has in accounts

    SELECT branch_name, SUM(balance) AS Total
    FROM account
    GROUP BY branch_name;

Let’s Try (2)
• Open your MySQL terminal (use phpMyAdmin or GCP shell)
• Use the bank database from in-class4
  • Or run the script to set up your database
    http://www.cs.virginia.edu/~up3f/cs4750/inclass/inclass04/alldbs.sql
• Consider table: account
• Write an SQL statement to solve the following problem
Find the average amount each branch has in accounts

    SELECT branch_name, AVG(balance) AS average
    FROM account
    GROUP BY branch_name;

Grouping, Aggregation, and Null
• The value NULL is ignored in any aggregation
  • It does not contribute to a sum, average, or count of an attribute
  • It cannot be the minimum or maximum in its column
• NULL is treated as an ordinary value when forming groups
  • A group can have NULL as the value of its grouping attribute(s)
• When performing any aggregation except COUNT over an empty bag of values, the result is NULL
  • The count of an empty bag is 0
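These NULL rules can be observed on a small table. The sketch assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the four rows are made up for illustration and are not part of the bank database.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-1", "Downtown", 1000),
        ("L-2", "Downtown", None),  # NULL amount
        ("L-3", None, 500),         # NULL branch_name
        ("L-4", None, 600),         # NULL branch_name
    ],
)

# NULL amounts are skipped: three values contribute, not four.
total, average, smallest, n_amounts = conn.execute(
    "SELECT SUM(amount), AVG(amount), MIN(amount), COUNT(amount) FROM loan"
).fetchone()

# NULL forms its own group; SQLite sorts NULL first in ascending order.
groups = conn.execute(
    "SELECT branch_name, SUM(amount) FROM loan "
    "GROUP BY branch_name ORDER BY branch_name"
).fetchall()

# Empty bag: no row matches, so SUM is NULL (None in Python) and COUNT is 0.
empty_sum, empty_count = conn.execute(
    "SELECT SUM(amount), COUNT(amount) FROM loan WHERE branch_name = 'Nowhere'"
).fetchone()

print(total, average, smallest, n_amounts)
print(groups)
print(empty_sum, empty_count)
```

The average is 700 rather than 525 because the NULL amount is left out of both the sum and the divisor.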

HAVING Clauses
• An aggregation in a HAVING clause applies only to the tuples of the group being tested (HAVING filters groups)
• Any attributes of relations in the FROM clause may be aggregated in the HAVING clause
• But only those attributes that are in the GROUP BY list may appear unaggregated in the HAVING clause

The sum of the amounts of all loans for each branch that has more than one loan is expressed by

    SELECT branch_name, SUM(amount)
    FROM loan
    GROUP BY branch_name
    HAVING COUNT(branch_name) > 1;

Result:
branch_name  SUM(amount)
Downtown     2500
Perryridge   2800
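The HAVING query can be run as a sketch, assuming Python's sqlite3 module with an in-memory SQLite database in place of MySQL and the loan rows from the earlier examples. The second query is an extra illustration, not from the slides: it filters groups on an aggregate (MIN) that does not appear in the SELECT list.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# The slide's query (ORDER BY added for a stable order): keep only groups
# that contain more than one loan.
multi_loan = conn.execute(
    "SELECT branch_name, SUM(amount) FROM loan "
    "GROUP BY branch_name HAVING COUNT(branch_name) > 1 ORDER BY branch_name"
).fetchall()

# HAVING may aggregate any attribute, even one that is not selected:
# keep branches whose smallest loan is at least 1300.
large_only = conn.execute(
    "SELECT branch_name FROM loan "
    "GROUP BY branch_name HAVING MIN(amount) >= 1300 ORDER BY branch_name"
).fetchall()

print(multi_loan)
print(large_only)
```

Downtown passes the first test but not the second, because its smallest loan is 1000.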

Example 1
List the number of customers in each country. Only include countries with more than 10 customers

    SELECT COUNT(id), country
    FROM Customer
    GROUP BY country
    HAVING COUNT(id) > 10;

Result:
count(id)  Country
11         France
11         Germany
13         USA

Example 2
List the number of customers in each country, except USA, sorted high to low. Only include countries with 9 or more customers

    SELECT COUNT(id), country
    FROM Customer
    WHERE country <> 'USA'
    GROUP BY country
    HAVING COUNT(id) >= 9
    ORDER BY COUNT(id) DESC;

Result:
count(id)  Country
11         France
11         Germany
9          Brazil

Final Notes about Aggregation
#1: Keep the GROUP BY clause small and precise
• Several DBMSs require that all non-aggregated columns in the SELECT clause be in the GROUP BY clause
• Excessive columns in GROUP BY can hurt the query’s performance and make the query hard to read, understand, and rewrite
• For queries that need both aggregations and details, do all aggregations in subqueries first, then join those to the tables to retrieve the details
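The last point, aggregating in a subquery and then joining back for the details, might look like the following sketch. It assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL and reuses the loan rows from the earlier examples; the aliases l, t, and branch_total are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-11", "Round Hill", 900),
        ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000),
        ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500),
    ],
)

# The subquery groups by branch_name only; the outer query keeps one row per
# loan and attaches the branch total, so GROUP BY never needs the detail columns.
details = conn.execute("""
    SELECT l.loan_number, l.branch_name, l.amount, t.branch_total
    FROM loan AS l
    JOIN (SELECT branch_name, SUM(amount) AS branch_total
          FROM loan
          GROUP BY branch_name) AS t
      ON t.branch_name = l.branch_name
    ORDER BY l.loan_number
""").fetchall()

for row in details:
    print(row)
```

Each loan keeps its own amount while both Perryridge loans carry the same branch total of 2800.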

Final Notes about Aggregation
#2: COUNT(*) and COUNT(<column_name>) are different
• COUNT(*) – counts all rows, including ones with NULL values
• COUNT(<column_name>) – counts only the rows where the column value is not NULL
• Sometimes, dividing a query into subqueries can be more efficient than using a GROUP BY (more about subqueries next week)

Final Notes about Aggregation
#3: Use DISTINCT to get distinct counts
• COUNT(*) – returns the number of rows in a group, including rows with NULL values and duplicates
• COUNT(<column_name>) – returns the number of rows where the column value is not NULL
• COUNT(DISTINCT <column_name>) – returns the number of distinct non-NULL values of the column
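The three COUNT forms give three different answers once a column contains both NULLs and duplicates. The sketch assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the four rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("L-1", "Downtown", 1000),
        ("L-2", "Downtown", 1500),   # duplicate branch_name
        ("L-3", "Perryridge", 500),
        ("L-4", None, 600),          # NULL branch_name
    ],
)

count_star, count_column, count_distinct = conn.execute(
    "SELECT COUNT(*), COUNT(branch_name), COUNT(DISTINCT branch_name) FROM loan"
).fetchone()

# COUNT(*) counts every row, COUNT(branch_name) skips the NULL,
# and COUNT(DISTINCT branch_name) also collapses the repeated Downtown.
print(count_star, count_column, count_distinct)
```

The counts drop from 4 to 3 to 2 as first the NULL and then the duplicate are excluded.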

Wrap-Up
• Aggregation functions
• Order of actions matters when applying aggregation
• Aggregation helps make decisions and succinctly convey information
What’s next?
• SQL – Joins
• Combine techniques (aggregates and joins) to solve complex questions