04 | Grouping and Aggregating Data
Brian Alderman | MCT, CEO / Founder of MicroTechPoint
Tobias Ternstrom | Microsoft SQL Server Program Manager
Course Topics
Querying Microsoft SQL Server 2012 Jump Start
01 | Introducing SQL Server 2012: SQL Server types of statements; other SQL statement elements; basic SELECT statements
02 | Advanced SELECT Statements: DISTINCT, aliases, scalar functions and CASE, using JOIN and MERGE; filtering and sorting data, NULL values
03 | SQL Server Data Types: Introduction to data types, data type usage, converting data types, understanding SQL Server function types
04 | Grouping and Aggregating Data: Aggregate functions, GROUP BY and HAVING clauses; subqueries (self-contained, correlated, and EXISTS); views, inline table-valued functions, and derived tables
   | Lunch Break: Eat, drink, and recharge for the afternoon session
Module Overview
• Aggregate functions
• GROUP BY and HAVING clauses
• Subqueries (self-contained, correlated, and EXISTS)
• Working with table functions
Aggregate Functions
Common built-in aggregate functions
Common        Statistical   Other
• SUM         • STDEV       • CHECKSUM_AGG
• MIN         • STDEVP      • GROUPING
• MAX         • VAR         • GROUPING_ID
• AVG         • VARP
• COUNT
• COUNT_BIG
Working with aggregate functions
Aggregate functions:
• Return a scalar value (with no column name)
• Ignore NULLs except in COUNT(*)
• Can be used in SELECT, HAVING, and ORDER BY clauses
• Are frequently used with the GROUP BY clause
SELECT COUNT(DISTINCT SalesOrderID) AS UniqueOrders,
       AVG(UnitPrice) AS Avg_UnitPrice,
       MIN(OrderQty) AS Min_OrderQty,
       MAX(LineTotal) AS Max_LineTotal
FROM Sales.SalesOrderDetail;

UniqueOrders Avg_UnitPrice Min_OrderQty Max_LineTotal
------------ ------------- ------------ -------------
31465        465.0934      1            27893.619000
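The rule that aggregates ignore NULLs except in COUNT(*) can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server, and the order_detail table and its three rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate NULL handling.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO order_detail VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, None)],  # the third row has a NULL price
)

# COUNT(*) counts rows; COUNT(col) and AVG(col) skip the NULL price.
count_all, count_price, avg_price = conn.execute(
    "SELECT COUNT(*), COUNT(unit_price), AVG(unit_price) FROM order_detail"
).fetchone()

print(count_all, count_price, avg_price)
```

Here COUNT(*) sees all three rows, while COUNT(unit_price) and AVG(unit_price) only see the two non-NULL prices, so the average is taken over two values rather than three.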
Using DISTINCT with aggregate functions
• Use DISTINCT with aggregate functions to summarize only unique values
• DISTINCT aggregates eliminate duplicate values, not rows (unlike SELECT DISTINCT)
• Compare (with partial results):
SELECT SalesPersonID,
       YEAR(OrderDate) AS OrderYear,
       COUNT(CustomerID) AS All_Custs,
       COUNT(DISTINCT CustomerID) AS Unique_Custs
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID, YEAR(OrderDate);

SalesPersonID OrderYear All_Custs Unique_Custs
------------- --------- --------- ------------
289           2006      84        48
281           2008      52        27
285           2007      9         8
277           2006      140       57
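A reduced version of the same comparison, as a sketch only: SQLite (through Python's sqlite3) replaces SQL Server, and the orders table with its five rows is invented so the counts can be verified by eye.

```python
import sqlite3

# Hypothetical data: salesperson 1 sold to customer 100 twice and 101 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (sales_person_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100), (1, 100), (1, 101), (2, 200), (2, 200)],
)

# COUNT counts every non-NULL value; COUNT(DISTINCT ...) counts each value once.
rows = conn.execute(
    """
    SELECT sales_person_id,
           COUNT(customer_id) AS all_custs,
           COUNT(DISTINCT customer_id) AS unique_custs
    FROM orders
    GROUP BY sales_person_id
    ORDER BY sales_person_id
    """
).fetchall()

print(rows)
```

No rows are removed from the group; only the duplicate customer values are collapsed inside the DISTINCT aggregate.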
Using the GROUP BY clause
GROUP BY creates groups for output rows, according to unique combination of values specified in the GROUP BY clause
GROUP BY calculates a summary value for aggregate functions in subsequent phases
Detail rows are “lost” after GROUP BY clause is processed
SELECT <select_list>
FROM <table_source>
WHERE <search_condition>
GROUP BY <group_by_list>;

SELECT SalesPersonID, COUNT(*) AS Cnt
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID;
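To see what "unique combination of values" means when more than one column is grouped, here is a minimal sketch. It assumes SQLite via Python's sqlite3 in place of SQL Server, and a made-up orders table with a precomputed order_year column.

```python
import sqlite3

# Hypothetical data: four detail rows for two salespeople across two years.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (sales_person_id INTEGER, order_year INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(274, 2006), (274, 2006), (274, 2007), (275, 2006)],
)

# One output row per distinct (sales_person_id, order_year) pair.
rows = conn.execute(
    """
    SELECT sales_person_id, order_year, COUNT(*) AS cnt
    FROM orders
    GROUP BY sales_person_id, order_year
    ORDER BY sales_person_id, order_year
    """
).fetchall()

print(rows)
```

Four detail rows collapse into three groups; the individual rows behind each count are no longer available to later phases of the query.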
Demo: Using aggregate functions
GROUP BY and HAVING
GROUP BY and logical order of operations
• HAVING, SELECT, and ORDER BY must return a single value per group
• All columns in SELECT, HAVING, and ORDER BY must appear in the GROUP BY clause or be inputs to aggregate expressions
• If a query uses GROUP BY, all subsequent phases operate on the groups, not source rows
Logical Order   Phase      Comments
5               SELECT
1               FROM
2               WHERE
3               GROUP BY   Creates groups
4               HAVING     Operates on groups
6               ORDER BY
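The logical order in the table can be imitated step by step in plain Python. This is only an illustration of the phases, not of how SQL Server executes a query; the orders list, the amount threshold, and the query in the comment are all hypothetical.

```python
# Hypothetical source rows standing in for a table.
orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 1, "amount": 70},
    {"customer_id": 2, "amount": 5},
    {"customer_id": 2, "amount": 200},
    {"customer_id": 3, "amount": 30},
    {"customer_id": 3, "amount": 40},
]

# Imitates: SELECT customer_id, SUM(amount) AS total FROM orders
#           WHERE amount >= 10 GROUP BY customer_id
#           HAVING COUNT(*) > 1 ORDER BY total DESC;

rows = list(orders)                                # 1 FROM
rows = [r for r in rows if r["amount"] >= 10]      # 2 WHERE (filters rows)

groups = {}                                        # 3 GROUP BY (creates groups)
for r in rows:
    groups.setdefault(r["customer_id"], []).append(r)

groups = {k: g for k, g in groups.items() if len(g) > 1}  # 4 HAVING (filters groups)

result = [(k, sum(r["amount"] for r in g)) for k, g in groups.items()]  # 5 SELECT
result.sort(key=lambda t: t[1], reverse=True)      # 6 ORDER BY

print(result)
```

Customer 2 has two source rows, but WHERE removes one of them before the groups exist, so the group fails the HAVING test. Running the phases in a different order would give a different answer.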
Using GROUP BY with aggregate functions
• Aggregate functions are commonly used in the SELECT clause to summarize per group:

SELECT CustomerID, COUNT(*) AS cnt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;

• Aggregate functions may refer to any columns, not just those in the GROUP BY clause:

SELECT ProductID, MAX(OrderQty) AS largest_order
FROM Sales.SalesOrderDetail
GROUP BY ProductID;
Filtering grouped data using the HAVING clause
• The HAVING clause provides a search condition that each group must satisfy
• The HAVING clause is processed after GROUP BY

SELECT CustomerID, COUNT(*) AS Count_Orders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
HAVING COUNT(*) > 10;
Compare HAVING to WHERE clauses
• WHERE filters rows before groups are created; it controls which rows are placed into groups
• HAVING filters groups; it controls which groups are passed to the next logical phase
• Using a COUNT(*) expression in the HAVING clause is useful for solving common business problems:
• Show only customers that have placed more than one order:

SELECT Cust.CustomerID, COUNT(*) AS cnt
FROM Sales.Customer AS Cust
JOIN Sales.SalesOrderHeader AS Ord
    ON Cust.CustomerID = Ord.CustomerID
GROUP BY Cust.CustomerID
HAVING COUNT(*) > 1;

• Show only products that appear on 10 or more orders:

SELECT Prod.ProductID, COUNT(*) AS cnt
FROM Production.Product AS Prod
JOIN Sales.SalesOrderDetail AS Ord
    ON Prod.ProductID = Ord.ProductID
GROUP BY Prod.ProductID
HAVING COUNT(*) >= 10;
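The WHERE versus HAVING comparison can be tried side by side on the same data. The sketch below assumes SQLite via Python's sqlite3 rather than SQL Server, and the orders table and the amount threshold are invented.

```python
import sqlite3

# Hypothetical orders: customers 1 and 2 have two orders each, customer 3 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50), (1, 70), (2, 5), (2, 200), (3, 30)],
)

# WHERE removes rows first, so the small order never reaches its group.
where_rows = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS cnt
    FROM orders
    WHERE amount >= 10
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

# HAVING keeps every row and then discards whole groups.
having_rows = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS cnt
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 1
    ORDER BY customer_id
    """
).fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, all three customers still appear but customer 2's count drops to 1. With HAVING, the counts are untouched and customer 3 disappears as a group.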
Demo: Using GROUP BY and HAVING
Subqueries
Working with subqueries
• Subqueries are nested queries, or queries within queries
• Results from the inner query are passed to the outer query
• The inner query acts like an expression from the perspective of the outer query
• Subqueries can be self-contained or correlated
  • Self-contained subqueries have no dependency on the outer query
  • Correlated subqueries depend on values from the outer query
• Subqueries can be scalar, multi-valued, or table-valued
Writing scalar subqueries
• A scalar subquery returns a single value to the outer query
• It can be used anywhere a single-valued expression can be used: SELECT, WHERE, etc.
• If the inner query returns an empty set, the result is converted to NULL
• Construction of the outer query determines whether the inner query must return a single value

SELECT SalesOrderID, ProductID, UnitPrice, OrderQty
FROM Sales.SalesOrderDetail
WHERE SalesOrderID =
    (SELECT MAX(SalesOrderID) AS LastOrder
     FROM Sales.SalesOrderHeader);
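Both scalar-subquery behaviors, returning one value and turning an empty set into NULL, can be sketched as follows. SQLite via Python's sqlite3 stands in for SQL Server here, and the two tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical header/detail tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_header (order_id INTEGER)")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO order_header VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO order_detail VALUES (?, ?)", [(2, 10), (3, 11), (3, 12)]
)

# The inner query yields one value (the highest order_id) to the outer WHERE.
last_order_lines = conn.execute(
    """
    SELECT order_id, product_id
    FROM order_detail
    WHERE order_id = (SELECT MAX(order_id) FROM order_header)
    ORDER BY product_id
    """
).fetchall()

# No header has order_id > 100, so the scalar subquery evaluates to NULL.
empty_as_null = conn.execute(
    "SELECT (SELECT order_id FROM order_header WHERE order_id > 100)"
).fetchone()[0]

print(last_order_lines, empty_as_null)
```

Python's sqlite3 reports the NULL as None. A comparison against that NULL in a WHERE clause would match no rows, which is worth remembering when a scalar subquery can come back empty.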
Writing multi-valued subqueries
• A multi-valued subquery returns multiple values as a single-column set to the outer query
• Used with the IN predicate
• If any value in the subquery result matches the IN predicate expression, the predicate returns TRUE
• May also be expressed as a JOIN (test both for performance)

SELECT CustomerID, SalesOrderID, TerritoryID
FROM Sales.SalesOrderHeader
WHERE CustomerID IN
    (SELECT CustomerID
     FROM Sales.Customer
     WHERE TerritoryID = 10);
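The note that an IN subquery may also be written as a JOIN can be sketched like this, assuming SQLite via Python's sqlite3 in place of SQL Server and two made-up tables. The two forms return the same rows here because customer_id is unique in the customer table; with duplicates on the joined side, the JOIN would repeat rows.

```python
import sqlite3

# Hypothetical customers (two in territory 10) and their orders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, territory_id INTEGER)"
)
conn.execute("CREATE TABLE order_header (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 10), (2, 10), (3, 4)])
conn.executemany(
    "INSERT INTO order_header VALUES (?, ?)",
    [(100, 1), (101, 1), (102, 3), (103, 2)],
)

# Multi-valued subquery with IN.
in_rows = conn.execute(
    """
    SELECT order_id
    FROM order_header
    WHERE customer_id IN
        (SELECT customer_id FROM customer WHERE territory_id = 10)
    ORDER BY order_id
    """
).fetchall()

# The same question expressed as a JOIN.
join_rows = conn.execute(
    """
    SELECT o.order_id
    FROM order_header AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE c.territory_id = 10
    ORDER BY o.order_id
    """
).fetchall()

print(in_rows, join_rows)
```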
Writing queries using EXISTS with subqueries
• The keyword EXISTS does not follow a column name or other expression
• The SELECT list of a subquery introduced by EXISTS typically uses only an asterisk (*)

SELECT CustomerID, PersonID
FROM Sales.Customer AS Cust
WHERE EXISTS
    (SELECT *
     FROM Sales.SalesOrderHeader AS Ord
     WHERE Cust.CustomerID = Ord.CustomerID);

SELECT CustomerID, PersonID
FROM Sales.Customer AS Cust
WHERE NOT EXISTS
    (SELECT *
     FROM Sales.SalesOrderHeader AS Ord
     WHERE Cust.CustomerID = Ord.CustomerID);
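The EXISTS and NOT EXISTS pair splits the customers into those with and without orders. A small sketch of that split, using SQLite via Python's sqlite3 instead of SQL Server and invented tables:

```python
import sqlite3

# Hypothetical data: customers 1 and 3 have orders, customer 2 has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER)")
conn.execute("CREATE TABLE order_header (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO order_header VALUES (?, ?)", [(100, 1), (101, 1), (102, 3)]
)

# Template with a placeholder for the optional NOT keyword.
# The subquery is correlated: it refers to cust.customer_id from the outer query.
query = """
    SELECT customer_id
    FROM customer AS cust
    WHERE {negate} EXISTS
        (SELECT * FROM order_header AS ord
         WHERE ord.customer_id = cust.customer_id)
    ORDER BY customer_id
"""

with_orders = conn.execute(query.format(negate="")).fetchall()
without_orders = conn.execute(query.format(negate="NOT")).fetchall()

print(with_orders, without_orders)
```

Customer 1 appears once even though it has two orders, because EXISTS only tests whether at least one matching row is present.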
Demo: Using subqueries
Table Functions
Creating simple views
• Views are saved queries created in a database by administrators and developers
• Views are defined with a single SELECT statement
• ORDER BY is not permitted in a view definition without the use of TOP, OFFSET/FETCH, or FOR XML
• To sort the output, use ORDER BY in the outer query
• View creation supports additional options beyond the scope of this class

CREATE VIEW HumanResources.EmployeeList
AS
SELECT BusinessEntityID, JobTitle, HireDate, VacationHours
FROM HumanResources.Employee;

SELECT * FROM HumanResources.EmployeeList;
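As a sketch of the advice to sort in the outer query rather than in the view, the example below creates a view and orders its output when selecting from it. It runs on SQLite via Python's sqlite3, not SQL Server, and the employee table, the view name, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical employee table; salary is deliberately left out of the view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER, job_title TEXT, hire_date TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Engineer", "2010-05-01", 100.0), (2, "Designer", "2008-01-15", 90.0)],
)

# The view is a saved single SELECT with no ORDER BY of its own.
conn.execute(
    "CREATE VIEW employee_list AS SELECT id, job_title, hire_date FROM employee"
)

# Sorting happens in the query that reads from the view.
rows = conn.execute(
    "SELECT id, job_title FROM employee_list ORDER BY hire_date"
).fetchall()

print(rows)
```

The view is queried just like a table, and it exposes only the columns named in its definition.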
Creating simple inline table-valued functions
• Table-valued functions are created by administrators and developers
• Create and name the function and its optional parameters with CREATE FUNCTION
• Declare the return type as TABLE
• Define an inline SELECT statement following RETURN

CREATE FUNCTION Sales.fn_LineTotal (@SalesOrderID INT)
RETURNS TABLE
AS
RETURN
    -- Line total: quantity * unit price * (1 - fractional discount)
    SELECT SalesOrderID,
           CAST((OrderQty * UnitPrice * (1 - UnitPriceDiscount)) AS DECIMAL(8, 2)) AS LineTotal
    FROM Sales.SalesOrderDetail
    WHERE SalesOrderID = @SalesOrderID;
Writing queries with derived tables
• Derived tables are named query expressions created within an outer SELECT statement
• Not stored in the database; a derived table represents a virtual relational table
• When processed, unpacked into a query against the underlying referenced objects
• Allow you to write more modular queries
• The scope of a derived table is the query in which it is defined

SELECT <column_list>
FROM (
    <derived_table_definition>
) AS <derived_table_alias>;
Guidelines for derived tables
Derived tables must:
• Have an alias
• Have names for all columns
• Have unique names for all columns
• Not use an ORDER BY clause (without TOP or OFFSET/FETCH)
• Not be referred to multiple times in the same query

Derived tables may:
• Use internal or external aliases for columns
• Refer to parameters and/or variables
• Be nested within other derived tables
Passing arguments to derived tables
• Derived tables may refer to arguments
• Arguments may be:
  • Variables declared in the same batch as the SELECT statement
  • Parameters passed into a table-valued function or stored procedure

DECLARE @emp_id INT = 9;
SELECT orderyear, COUNT(DISTINCT custid) AS cust_count
FROM (
    SELECT YEAR(orderdate) AS orderyear, custid
    FROM Sales.Orders
    WHERE empid = @emp_id
) AS derived_year
GROUP BY orderyear;
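The same pattern, an argument used inside a derived table, can be sketched with a bound parameter. This assumes SQLite via Python's sqlite3 rather than SQL Server: the `?` placeholder plays the role of @emp_id, strftime replaces YEAR(), and the orders table and its rows are made up.

```python
import sqlite3

# Hypothetical orders for two employees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (emp_id INTEGER, cust_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (9, 1, "2007-03-01"),
        (9, 1, "2007-06-01"),
        (9, 2, "2007-07-01"),
        (9, 3, "2008-01-10"),
        (4, 5, "2007-02-02"),
    ],
)

emp_id = 9  # stands in for the T-SQL variable @emp_id

# The inner SELECT is the derived table; it needs an alias and named columns.
rows = conn.execute(
    """
    SELECT order_year, COUNT(DISTINCT cust_id) AS cust_count
    FROM (
        SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS order_year, cust_id
        FROM orders
        WHERE emp_id = ?
    ) AS derived_year
    GROUP BY order_year
    ORDER BY order_year
    """,
    (emp_id,),
).fetchall()

print(rows)
```

The outer query can group by order_year only because the derived table gave that expression a column name.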
Creating queries with common table expressions
Use the WITH clause to create a CTE:
• Define the table expression in the WITH clause
• Reference the CTE in the outer query
• Assign column aliases (inline or external)
• Pass arguments if desired

WITH CTE_year AS
(
    SELECT YEAR(OrderDate) AS OrderYear, CustomerID
    FROM Sales.SalesOrderHeader
)
SELECT OrderYear, COUNT(DISTINCT CustomerID) AS CustCount
FROM CTE_year
GROUP BY OrderYear;
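Unlike a derived table, which may not be referred to multiple times in the same query, a CTE can be referenced more than once in the outer query. The sketch below joins a CTE to itself to compare each year with the previous one. It assumes SQLite via Python's sqlite3 in place of SQL Server, and the orders table with a precomputed order_year column is hypothetical.

```python
import sqlite3

# Hypothetical orders with the year already extracted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_year INTEGER, cust_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(2006, 1), (2006, 2), (2007, 1), (2007, 1), (2008, 1), (2008, 2), (2008, 3)],
)

# cte_year is defined once and referenced twice (as cur and as prev).
rows = conn.execute(
    """
    WITH cte_year AS (
        SELECT order_year, COUNT(DISTINCT cust_id) AS cust_count
        FROM orders
        GROUP BY order_year
    )
    SELECT cur.order_year, cur.cust_count, prev.cust_count AS prev_cust_count
    FROM cte_year AS cur
    LEFT JOIN cte_year AS prev ON prev.order_year = cur.order_year - 1
    ORDER BY cur.order_year
    """
).fetchall()

print(rows)
```

The first year has no predecessor, so the LEFT JOIN leaves its previous count as NULL (None in Python).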
Demo: Table functions
Summary
Aggregate functions return a scalar value. They can be used in SELECT, HAVING, and ORDER BY clauses and are most frequently used with the GROUP BY clause
Common built-in aggregate functions include
Common        Statistical   Other
• SUM         • STDEV       • CHECKSUM_AGG
• MIN         • STDEVP      • GROUPING
• MAX         • VAR         • GROUPING_ID
• AVG         • VARP
• COUNT
• COUNT_BIG
Use DISTINCT with aggregate functions to summarize only unique values; it eliminates duplicate values, not rows
GROUP BY creates groups for output rows, according to unique combination of values specified in the GROUP BY clause. GROUP BY also calculates a summary value for aggregate functions in subsequent phases
HAVING clause provides a search condition that each group must satisfy and is processed after the GROUP BY clause
Subqueries are nested queries or queries within queries where the results from inner query are passed to the outer query
Types of subqueries include:
• Scalar subqueries
• Multi-valued subqueries
• Subqueries with the EXISTS clause
Views are named table expressions whose definitions are stored in a database and that can be referenced in a SELECT statement just like a table
Views are defined with a single SELECT statement and then saved in the database as queries
Table-valued functions are created with the CREATE FUNCTION statement and declare a return type of TABLE
Derived tables allow you to write more modular queries as named query expressions that are created within an outer SELECT statement. They represent a virtual relational table and so are not stored in the database
CTEs are similar to derived tables in scope and naming requirements but unlike derived tables, CTEs support multiple definitions, multiple references, and recursion
©2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.