Carson College of Business · 2016/10/02

Featherman’s SQL Server Intermediate Functions © Sub-queries

A recurring problem with data retrieval is that the data sits in different tables or, worse, in different databases. You can often merge data with INNER JOINs, LEFT or RIGHT JOINs, or FULL JOINs, but the analyst frequently does not have a good idea of how the tables are connected, and the joins are performed incorrectly. The result is repeated or inflated numbers (always verify the calculations!). Joins won’t always work, so the analyst and DBA need more precise tools. If joins are not accurately merging the tables of data, reach for sub-queries, in particular correlated sub-queries. Here we add columns of metrics from different tables to one resultset without joining the data. You need a column in common between the different tables, such as employeeID. Run the queries in the left-hand column in SSMS and read the commentary in the right-hand column. Perform data visualizations for insight and fun. More info at https://technet.microsoft.com/en-us/library/ms189062(v=sql.105).aspx

USE [AdventureWorksDW2012];
SELECT r.[ResellerKey], r.[ResellerName], r.[BusinessType]
, COUNT([OrderQuantity]) AS [Unit Sales]
, SUM([SalesAmount]) AS [Revenue]
FROM [dbo].[DimReseller] as r
INNER JOIN [dbo].[FactResellerSales] as rs ON rs.[ResellerKey] = r.[ResellerKey]
WHERE [GeographyKey] = 41
GROUP BY r.[ResellerKey], [ResellerName], [BusinessType]
ORDER BY [Revenue] DESC

-----

USE [AdventureWorksDW2012];
SELECT [ResellerKey], [ResellerName], [BusinessType]
, (SELECT COUNT([OrderQuantity]) FROM [dbo].[FactResellerSales] as s WHERE s.[ResellerKey] = r.[ResellerKey]) AS [Unit Sales]
, (SELECT SUM([SalesAmount]) FROM [dbo].[FactResellerSales] as s WHERE s.[ResellerKey] = r.[ResellerKey]) AS [Total Revenue]
FROM [dbo].[DimReseller] as r
WHERE [GeographyKey] = 41
ORDER BY [Total Revenue] DESC

In this first example a sub-query replicates the results of an inner join and GROUP BY. It is shown first because it is a pretty simple example. Please run both examples to see the similar results.

The second query is an example of a sub-query. When you see a second SELECT statement (called an inner query) inside a normal SELECT statement (called the outer query), you know you have a subordinate query, aka sub-query. When the inner query refers to a column of the outer query (here r.[ResellerKey]), it is a correlated sub-query.

The sub-queries must be in parentheses. They can add extra columns from different tables (shown next). They can perform aggregating functions; here a count and a sum. Each sub-query needs a WHERE clause that links its table to the base table (here DimReseller).
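To check the equivalence (and the one difference) on data small enough to verify by hand, here is a minimal sketch that uses Python’s built-in sqlite3 module instead of SQL Server. The reseller and sales tables and their rows are made up for illustration; they only stand in for DimReseller and FactResellerSales.

```python
import sqlite3

# Toy stand-ins for DimReseller and FactResellerSales (hypothetical data).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE reseller (reseller_key INTEGER, name TEXT);
    CREATE TABLE sales (reseller_key INTEGER, amount INTEGER);
    INSERT INTO reseller VALUES (1, 'Alpha Bikes'), (2, 'Beta Cycles'), (3, 'Gamma Sports');
    INSERT INTO sales VALUES (1, 100), (1, 50), (2, 30);
""")

# Version 1: inner join plus GROUP BY.
join_rows = con.execute("""
    SELECT r.reseller_key, r.name, COUNT(s.amount), SUM(s.amount)
    FROM reseller AS r
    INNER JOIN sales AS s ON s.reseller_key = r.reseller_key
    GROUP BY r.reseller_key, r.name
    ORDER BY r.reseller_key
""").fetchall()

# Version 2: one correlated sub-query per metric, no join and no GROUP BY.
subquery_rows = con.execute("""
    SELECT r.reseller_key, r.name,
           (SELECT COUNT(s.amount) FROM sales AS s WHERE s.reseller_key = r.reseller_key),
           (SELECT SUM(s.amount) FROM sales AS s WHERE s.reseller_key = r.reseller_key)
    FROM reseller AS r
    ORDER BY r.reseller_key
""").fetchall()

print(join_rows)
print(subquery_rows)
```

In this toy data the two versions agree for every reseller that has sales. The sub-query version also keeps the reseller with no sales (count 0, NULL revenue), which the inner join drops.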


If you are fortunate, the business requirement can be satisfied by pulling data using an inner join (or another type of join). This isn’t always the case. For example, how can we pull some primary key columns from two tables and then add columns of aggregated data from the two tables? How would you do this if you didn’t know SQL? Well, you could copy and paste data in Excel. Sub-queries can save you a lot of Excel copy-and-paste time, and the potential embarrassment caused by publishing erroneous reports. The problem is that if the data is in two fact tables that share the same key, you would have to commingle the data (often impossible if the table schemas are different, or the data is at different levels of aggregation, i.e. granularity). You might need to make a new table pulling the key values together. Often you see data explosion, missing columns, or duplicated values, meaning the connection between tables is wrong. You can research the database, find the relationships, and hopefully join the tables. When the joins don’t seem to work, use sub-queries. Sub-queries will prove to be a useful tool in our toolbox. Straight pivot tables won’t work for some of the problems shown here, because the key values are too different in each table.

What if you want to pull total sales for every product sold on either sales channel (Reseller and Internet)? This simple request is apparently not that simple.

USE [AdventureWorksDW2012];
SELECT DISTINCT r.[ProductKey] as [Reseller Product ID], i.productkey as [web product ID]
, sum(r.[OrderQuantity]) as [reseller totals]
, sum(i.[OrderQuantity]) as [Internet totals]
FROM [dbo].[FactResellerSales] as r
FULL JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey
GROUP BY r.[ProductKey], i.productkey
ORDER BY r.[ProductKey]

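Can you suggest a better way to pull this data? This query pulls 350 rows of data, which may be right or may be wrong; it may be pulling all the data, and maybe not. Be suspicious of the totals: the join pairs every reseller row for a product with every Internet row for the same product, so the sums for products sold on both channels are inflated.

The ORDER BY line is not working very well (the rows with a NULL reseller product ID sort first). How do you improve this?

At any rate, you can see that with our current knowledge of SQL we cannot create the dataset we need.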

Please email [email protected] if you have a solution that works without sub-queries. Thank you.


USE [AdventureWorksDW2012];
SELECT DISTINCT r.[ProductKey] as [Reseller Product ID], i.productkey as [Web Product ID]
, ISNULL(SUM(r.[OrderQuantity]), 0) as [Reseller totals]
, ISNULL(SUM(i.[OrderQuantity]), 0) as [Web totals]
FROM [dbo].[FactResellerSales] as r
FULL JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey
GROUP BY r.[ProductKey], i.productkey
ORDER BY r.[ProductKey]

Before the next code is explained, this function is a keeper:

ISNULL(fieldname, 0) AS [Reseller Units]. This reads: for any of the retrieved rows, if the fieldname value is NULL, replace the NULL with a 0. This cleans up the results, and it matters later if you want to add one column to another. You can’t usefully add a NULL to a number: any arithmetic that involves a NULL returns NULL. So NULL + 300 = NULL, whereas 0 + 300 = 300. Later we will calculate the total for both sales channels.
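The NULL arithmetic can be checked without a SQL Server instance. The sketch below uses Python’s built-in sqlite3 module and a made-up units table; it uses COALESCE, which is standard SQL and plays the role that ISNULL plays in T-SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical per-product unit totals; NULL means "never sold on that channel".
con.executescript("""
    CREATE TABLE units (product_key INTEGER, reseller_units INTEGER, internet_units INTEGER);
    INSERT INTO units VALUES (1, 300, NULL), (2, 40, 10);
""")

# Adding the columns directly: any NULL operand makes the whole sum NULL.
raw_totals = con.execute(
    "SELECT product_key, reseller_units + internet_units FROM units ORDER BY product_key"
).fetchall()

# Replacing NULL with 0 first gives a usable total.
safe_totals = con.execute("""
    SELECT product_key, COALESCE(reseller_units, 0) + COALESCE(internet_units, 0)
    FROM units
    ORDER BY product_key
""").fetchall()

print(raw_totals)   # product 1 comes back with None (NULL) as its total
print(safe_totals)
```

Product 1 loses its 300 reseller units in the raw total because the Internet column is NULL; the COALESCE version keeps them.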

USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT([ProductKey]))
FROM [dbo].[FactResellerSales] as rs
----------------------------------------------------------
USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT([ProductKey]))
FROM [dbo].[FactInternetSales] as i

Let’s dive into the data. If you run the first query you discover that 334 different products have been sold in the different bike shops.

The second query reports that 158 SKUs have been sold online.

See all the NULLs in the table above? This means that some of the items were sold in the bike shops and not on the web, and vice versa. Only a subset of the products was sold on both channels.

USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]
FROM [dbo].[FactResellerSales] as r
INNER JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey

The INNER JOIN pulls the values that are in both tables. 142 products have been sold on both retail channels.

USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]
FROM [dbo].[FactResellerSales] as r
LEFT JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey

The LEFT JOIN pulls all the values from the first table in the FROM clause and the matching values from the joined, second table.

334 products have been sold on the Reseller channel; of those 334, 142 have also been sold on the Internet channel (142 in common).

USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]
FROM [dbo].[FactResellerSales] as r
RIGHT JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey

The RIGHT JOIN pulls all the values from the joined table and then finds the matches from the first table in the FROM clause.

158 products have been sold on the Internet channel; of those 158, 142 have also been sold on the Reseller channel (142 in common). So the 142 are the matching values.

USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products], COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]
FROM [dbo].[FactResellerSales] as r
FULL JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey

Here we break out the FULL JOIN. Now we can see the full count for both channels, whether or not the products were sold on the other channel. This is great for counting but, as you saw above, it introduces lots of NULLs, because most of the products were sold on one channel or the other, not both. What’s an analyst to do?
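The counts above fit together arithmetically: 334 + 158 - 142 = 350 products sold on at least one channel, which is the row count the grouped FULL JOIN query returned earlier. The sketch below reproduces the same relationships on a tiny made-up dataset, using Python’s built-in sqlite3 module. The table names and rows are hypothetical, and only INNER and LEFT joins are used because older SQLite builds lack RIGHT and FULL joins.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical fact tables: products 1-4 sell via resellers, 3-5 sell online.
con.executescript("""
    CREATE TABLE reseller_sales (product_key INTEGER, qty INTEGER);
    CREATE TABLE internet_sales (product_key INTEGER, qty INTEGER);
    INSERT INTO reseller_sales VALUES (1, 5), (2, 3), (3, 4), (3, 1), (4, 2);
    INSERT INTO internet_sales VALUES (3, 1), (4, 1), (4, 1), (5, 6);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return con.execute(sql).fetchone()[0]

reseller = scalar("SELECT COUNT(DISTINCT product_key) FROM reseller_sales")
internet = scalar("SELECT COUNT(DISTINCT product_key) FROM internet_sales")

# INNER JOIN keeps only products present in both tables.
both = scalar("""
    SELECT COUNT(DISTINCT r.product_key)
    FROM reseller_sales AS r
    INNER JOIN internet_sales AS i ON i.product_key = r.product_key
""")

# LEFT JOIN keeps every reseller row; unmatched rows have NULL on the right.
reseller_only = scalar("""
    SELECT COUNT(DISTINCT r.product_key)
    FROM reseller_sales AS r
    LEFT JOIN internet_sales AS i ON i.product_key = r.product_key
    WHERE i.product_key IS NULL
""")

# UNION removes duplicates, so this counts products sold on either channel.
either = scalar("""
    SELECT COUNT(*) FROM (
        SELECT product_key FROM reseller_sales
        UNION
        SELECT product_key FROM internet_sales
    )
""")

print(reseller, internet, both, reseller_only, either)
```

The same identity holds here as in the handout’s numbers: either = reseller + internet - both, and reseller_only = reseller - both.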


USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT(r.[ProductKey])) as [Reseller Products]
FROM [dbo].[FactResellerSales] as r
WHERE r.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) from [dbo].[FactInternetSales])

192 products have been sold on the Reseller channel but not on the Internet. If you want a product list, take out the COUNT() term.

If you add some more information from the products table, you can make a recommendation as to which products could also be offered online.

USE [AdventureWorksDW2012];
SELECT COUNT(DISTINCT(i.[ProductKey])) as [Internet Products]
FROM [dbo].[FactInternetSales] as i
WHERE i.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) from [dbo].[FactResellerSales])

16 products have been sold on the Internet channel but not on the Reseller channel. If you want a product list, take out the COUNT() term. You can also add more columns from the products table to see more information. Can you do it?

This is a very common usage of a sub-query: building a WHERE clause from the results of another query.
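As a small, checkable illustration of the NOT IN pattern, the sketch below lists (rather than counts) the products sold online but not through resellers, and pulls a name from a product table. It runs on Python’s built-in sqlite3 module; all table names, column names, and rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical product catalog and two fact tables.
con.executescript("""
    CREATE TABLE product (product_key INTEGER, name TEXT);
    CREATE TABLE reseller_sales (product_key INTEGER, qty INTEGER);
    CREATE TABLE internet_sales (product_key INTEGER, qty INTEGER);
    INSERT INTO product VALUES (3, 'Road Bike'), (4, 'Helmet'), (5, 'Water Bottle');
    INSERT INTO reseller_sales VALUES (3, 4), (4, 2);
    INSERT INTO internet_sales VALUES (3, 1), (4, 1), (5, 6), (5, 2);
""")

# The sub-query builds the list of keys to exclude; DISTINCT collapses the
# repeated sales rows of a product into a single line of the product list.
internet_only = con.execute("""
    SELECT DISTINCT i.product_key, p.name
    FROM internet_sales AS i
    INNER JOIN product AS p ON p.product_key = i.product_key
    WHERE i.product_key NOT IN (SELECT product_key FROM reseller_sales)
    ORDER BY i.product_key
""").fetchall()

print(internet_only)
```

Only the water bottle qualifies in this toy data, because products 3 and 4 appear in the reseller table as well.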

USE [AdventureWorksDW2012];
SELECT DISTINCT i.[ProductKey] as [Product ID], [EnglishProductName], [ListPrice]
FROM [dbo].[FactInternetSales] as i
INNER JOIN [dbo].[DimProduct] as p ON i.ProductKey = p.ProductKey
WHERE i.[ProductKey] NOT IN (SELECT DISTINCT([ProductKey]) from [dbo].[FactResellerSales])

Here is the answer.

This is the list of 16 products that are sold on the internet but not in bike stores.


USE [AdventureWorksDW2012];
SELECT r.[ProductKey] as [Reseller Products]
, SUM(r.OrderQuantity) as [Sales on Reseller Channel]
, SUM(i.OrderQuantity) as [Sales on Internet Channel]
, SUM(r.OrderQuantity) + SUM(i.OrderQuantity) as [Total Sales]
FROM [dbo].[FactResellerSales] as r
INNER JOIN [dbo].[FactInternetSales] as i ON r.productkey = i.productkey
GROUP BY r.[ProductKey]

This query looks like it provides the data we want but, guess what? It returns only 142 rows, the products sold on both channels. (The sums also deserve a check: the join matches every reseller row with every Internet row for the same product, which inflates both totals.) So what do most analysts do?

Here are the two resultsets copied into Excel. Many analysts run two queries and then copy and paste the data from the two queries into an Excel spreadsheet. A lot of copy/pasting is needed here: since the same products have not been sold on both channels, the rows get out of kilter. Considerable copy/paste work has already been performed to get the top 4 lines displayed correctly. After the first four product IDs the items are not aligning.

What’s an analyst to do?

Use a sub-query!!!

So we have not solved the problem yet, but hopefully we have made the case for some new functionality.

You can recognize correlated sub-queries in that they have their own SELECT statement. They don’t use an INNER JOIN, but they do include a WHERE clause that performs similar functionality. So in the next query the reseller units come from the FactResellerSales table and the Internet units come from the FactInternetSales table. Wow. You can also get similar functionality to work in pivot tables if your data model is set up correctly.
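Why do the numbers inflate when two fact tables are joined directly on the product key? Each reseller row is paired with every Internet row for the same product, so every quantity is counted several times. The following sketch shows the effect on a few made-up rows, using Python’s built-in sqlite3 module; the tables are hypothetical stand-ins for the two fact tables and the product table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Product 3: two reseller rows (4 + 1 = 5 units), one Internet row (1 unit).
# Product 4: one reseller row (2 units), two Internet rows (1 + 1 = 2 units).
con.executescript("""
    CREATE TABLE product (product_key INTEGER);
    CREATE TABLE reseller_sales (product_key INTEGER, qty INTEGER);
    CREATE TABLE internet_sales (product_key INTEGER, qty INTEGER);
    INSERT INTO product VALUES (3), (4);
    INSERT INTO reseller_sales VALUES (3, 4), (3, 1), (4, 2);
    INSERT INTO internet_sales VALUES (3, 1), (4, 1), (4, 1);
""")

# Joining fact to fact: rows multiply before SUM runs.
joined = con.execute("""
    SELECT r.product_key, SUM(r.qty), SUM(i.qty)
    FROM reseller_sales AS r
    INNER JOIN internet_sales AS i ON i.product_key = r.product_key
    GROUP BY r.product_key
    ORDER BY r.product_key
""").fetchall()

# Correlated sub-queries: each fact table is summed on its own.
separate = con.execute("""
    SELECT p.product_key,
           (SELECT SUM(qty) FROM reseller_sales AS r WHERE r.product_key = p.product_key),
           (SELECT SUM(qty) FROM internet_sales AS i WHERE i.product_key = p.product_key)
    FROM product AS p
    ORDER BY p.product_key
""").fetchall()

print(joined)    # inflated totals
print(separate)  # totals that match the raw rows
```

In the joined result, product 3 shows 2 Internet units instead of 1 and product 4 shows 4 reseller units instead of 2. The sub-query version matches the raw rows because the two fact tables never meet in a join.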


USE [AdventureWorksDW2012];
SELECT [EnglishProductSubcategoryName] as [Sub-Category], [ProductKey], [ProductAlternateKey], [EnglishProductName]
, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] = p.ProductKey), 0) AS [Reseller Units]
, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as i WHERE i.[ProductKey] = p.ProductKey), 0) AS [Internet Units]
--This total units column is from Leo and Andrew, Class of 2015
, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] = p.ProductKey), 0)
+ ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as i WHERE i.[ProductKey] = p.ProductKey), 0) AS [Total Units Sold]
FROM [dbo].[DimProduct] as p
INNER JOIN [dbo].[DimProductSubcategory] as sc on sc.[ProductSubcategoryKey] = p.[ProductSubcategoryKey]
WHERE p.[FinishedGoodsFlag] = 1
ORDER BY [Sub-Category], [ProductKey]

Now we use two sub-queries, and look at the trouble it takes to calculate the total units: that column has its own two sub-queries! That problem is fixed in the next module. This query outputs 397 rows. 47 of them have 0 sales, though, so the number we need is 350 (397 - 47). How do we pull the 350?

It would be nice to filter out the rows that have 0 in the Total Units Sold column. You can see the data visualizations are very useful.


The products table includes finished goods product #s that are sold to consumers (and have a FinishedGoodsFlag value of 1) and products that are consumed in the manufacturing process (and have a FinishedGoodsFlag value of 0). Filtering by this column is a nice trick to see only the products that are for sale. However, there are still products for sale that have not been sold, so further filtering is needed.

USE [AdventureWorksDW2012];
SELECT p.[ProductKey], [EnglishProductName]
, ISNULL(SUM(r.[OrderQuantity]), 0) as [Reseller totals]
, ISNULL(SUM(i.[OrderQuantity]), 0) as [Internet totals]
, ISNULL(SUM(r.[OrderQuantity]), 0) + ISNULL(SUM(i.[OrderQuantity]), 0) as [Grand Total]
FROM [dbo].[DimProduct] AS p
FULL JOIN [dbo].[FactResellerSales] as r ON p.ProductKey = r.ProductKey
FULL JOIN [dbo].[FactInternetSales] as i ON p.ProductKey = i.ProductKey
WHERE [FinishedGoodsFlag] = 1 AND (r.[OrderQuantity] > 0 OR i.[OrderQuantity] > 0)
GROUP BY p.[ProductKey], [EnglishProductName]
ORDER BY p.[ProductKey]

Thank you Jordan and Alexandria, Class of 2017, for these two tricks to pull the right rows without a sub-query! Brilliant!

These two WSU analysts selected rows from the higher-order concept, the products table, then did FULL JOINs to bring in the data from the two fact tables. One caution: verify the totals against the sub-query version, because a product sold on both channels has every reseller row paired with every Internet row, which inflates the sums.

This first solution uses an expanded WHERE clause to exclude from the GROUP BY analysis any product IDs that are in the products catalog table but were not sold on either sales channel. The parentheses around the OR matter; without them the FinishedGoodsFlag test would apply only to the reseller condition.

USE [AdventureWorksDW2012];
SELECT p.[ProductKey], [EnglishProductName]
, ISNULL(SUM(r.[OrderQuantity]), 0) as [Reseller totals]
, ISNULL(SUM(i.[OrderQuantity]), 0) as [Internet totals]
, ISNULL(SUM(r.[OrderQuantity]), 0) + ISNULL(SUM(i.[OrderQuantity]), 0) as [Grand Total]
FROM [dbo].[DimProduct] AS p
FULL JOIN [dbo].[FactResellerSales] as r ON p.ProductKey = r.ProductKey
FULL JOIN [dbo].[FactInternetSales] as i ON p.ProductKey = i.ProductKey
WHERE [FinishedGoodsFlag] = 1
GROUP BY p.[ProductKey], [EnglishProductName]
HAVING ISNULL(SUM(r.[OrderQuantity]), 0) + ISNULL(SUM(i.[OrderQuantity]), 0) <> 0
ORDER BY p.[ProductKey]

This second solution groups all the sales for all the products, then filters out the rows that have 0 in the grand total. Notice that the ORDER BY has to come after the HAVING. HAVING filters the groups produced by GROUP BY, so it goes after the GROUP BY clause.
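The HAVING step can be tried on a handful of rows. The sketch below, using Python’s built-in sqlite3 module, joins a made-up product table to a single made-up sales table (one fact table only, to keep the totals easy to verify) and shows the grouped result with and without the HAVING filter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical catalog with one product that has never been sold.
con.executescript("""
    CREATE TABLE product (product_key INTEGER, name TEXT);
    CREATE TABLE sales (product_key INTEGER, qty INTEGER);
    INSERT INTO product VALUES (1, 'Bike'), (2, 'Helmet'), (3, 'Unsold Jersey');
    INSERT INTO sales VALUES (1, 5), (1, 2), (2, 3);
""")

grouped_sql = """
    SELECT p.product_key, COALESCE(SUM(s.qty), 0) AS total_units
    FROM product AS p
    LEFT JOIN sales AS s ON s.product_key = p.product_key
    GROUP BY p.product_key
    {having}
    ORDER BY p.product_key
"""

# Without HAVING every product is listed, including the one with 0 units.
all_products = con.execute(grouped_sql.format(having="")).fetchall()

# HAVING runs after the groups are formed, so it can test the aggregate.
sold_products = con.execute(
    grouped_sql.format(having="HAVING COALESCE(SUM(s.qty), 0) <> 0")
).fetchall()

print(all_products)
print(sold_products)
```

The clause order is the same as in the handout’s query: WHERE (if any), GROUP BY, HAVING, then ORDER BY.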


Persistence and understanding the data can create solutions. Sub-queries will still be needed, however, so let’s bugger on.

As a follow-on activity, pick the three highest-selling products and show the data for both sales channels, for each month of 2007 and 2008. Use the similar query below. How would you do this?

USE [AdventureWorksDW2012];
SELECT [ModelName], [ProductAlternateKey], p.[ProductKey], [EnglishProductName]
, AVG([UnitPrice]) as [Inet Avg. Sales Price]
, SUM([OrderQuantity]) as [Internet Units Sold]
, SUM([SalesAmount]) as [Internet Revenue]
FROM [dbo].[DimProduct] as p
INNER JOIN [dbo].[FactInternetSales] as FIS ON FIS.[ProductKey] = p.[ProductKey]
GROUP BY [ModelName], [ProductAlternateKey], p.[ProductKey], [EnglishProductName]
ORDER BY [ModelName], [ProductAlternateKey]

Look at this problem. First, how can we compare the Inet channel sales throughput and pricing with the reseller channel?

More importantly, how can we create:

A) a list of products that sell on one channel and not the other
B) a list of recommended products that should be selling on both channels but are not
C) an estimated sales volume and revenue if the product is sold on both channels (vs. currently on one channel). This information is needed to create a production projection and schedule, and an estimate of the cash flow needed to purchase products for resale and component parts for production.

OK now let’s get into the Sub-Queries.


USE [AdventureWorksDW2012];
SELECT [EnglishProductSubcategoryName] AS [Sub-Category], [ModelName], [ProductAlternateKey], p.[ProductKey], [EnglishProductName]
, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] = p.ProductKey), 0) AS [Reseller Units]
, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as FIS WHERE FIS.[ProductKey] = p.ProductKey), 0) AS [Internet Units]
, ISNULL((SELECT AVG([UnitPrice]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] = p.ProductKey), 0) as [Reseller Avg. Sales Price]
, ISNULL((SELECT AVG([UnitPrice]) FROM [dbo].[FactInternetSales] as FIS WHERE FIS.[ProductKey] = p.ProductKey), 0) as [Inet Avg. Sales Price]
FROM [dbo].[DimProduct] as p
INNER JOIN [dbo].[DimProductSubcategory] as ps ON ps.ProductSubcategoryKey = p.ProductSubcategoryKey
WHERE [FinishedGoodsFlag] = 1 AND
(
(SELECT SUM([OrderQuantity]) FROM [dbo].[FactResellerSales] as rs WHERE rs.[ProductKey] = p.ProductKey) > 0
OR
(SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as FIS WHERE FIS.[ProductKey] = p.ProductKey) > 0
)
ORDER BY [Sub-Category], [ModelName], [ProductAlternateKey]

This query is a great leap forward using sub-queries. First off, where is the GROUP BY statement? These sub-queries create columns of additional aggregated metrics (analytics) from a fact table, providing similar results to a GROUP BY query. As such, each computes just one metric for each row. If you did need to use a GROUP BY inside a sub-query, that is also possible.

Here several sub-queries are used to add more columns to the resultset. Sub-queries are wrapped in their own parentheses and have their own SELECT statement.

This query can be improved using window functions. For example, if there are several products sold in a product sub-category, it might be nice in this same report to show each product’s % of total sales for the product sub-category for each of the sales channels.

Anyway, notice the 4 sub-queries in the SELECT list; all are at the same level of granularity. Each sub-query creates a new column that either totals the # of units sold for the product or shows the average sales price. (Hmm, does the average sales price change by month?)

Check out the WHERE clause. Go ahead and run the query with just the WHERE [FinishedGoodsFlag] = 1 clause and comment out the rest. You should see almost 400 rows of data. About 50 of the rows, however, do not have any sales on either sales channel. With our expanded WHERE clause you can filter the list to only the products that were sold on at least one channel.

The WHERE clause has two sub-queries of its own, since you cannot reference the aggregated column by its alias (e.g. [Reseller Units]) there. The technique in the next module, table variables, does allow this.
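One detail to watch when a WHERE clause mixes AND with OR, as the filter above does: AND binds more tightly than OR, so "A AND B OR C" means "(A AND B) OR C". The sketch below demonstrates the difference on three made-up rows with Python’s built-in sqlite3 module. The product table and its pre-computed unit columns are hypothetical, and whether the missing parentheses change the row count in AdventureWorksDW2012 depends on the data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows: product 2 is NOT a finished good but has Internet units.
con.executescript("""
    CREATE TABLE product (
        product_key INTEGER, finished INTEGER,
        reseller_units INTEGER, internet_units INTEGER
    );
    INSERT INTO product VALUES (1, 1, 5, 0), (2, 0, 0, 4), (3, 1, 0, 0);
""")

# No parentheses: parsed as (finished = 1 AND reseller_units > 0) OR internet_units > 0.
without_parens = [row[0] for row in con.execute("""
    SELECT product_key FROM product
    WHERE finished = 1 AND reseller_units > 0 OR internet_units > 0
    ORDER BY product_key
""")]

# Parentheses make the finished-goods test apply to both channel conditions.
with_parens = [row[0] for row in con.execute("""
    SELECT product_key FROM product
    WHERE finished = 1 AND (reseller_units > 0 OR internet_units > 0)
    ORDER BY product_key
""")]

print(without_parens)
print(with_parens)
```

Without parentheses, product 2 slips through even though it is not a finished good. Wrapping the two channel conditions in parentheses keeps the intent: a finished good that sold on at least one channel.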

Now it’s time to find out which products are sold on one channel but not the other (this would represent a missed opportunity). Include an analysis of sales price and its effect on channel sales.

Sub-query challenge: Perform an analysis of which products are priced radically differently in the reseller channel (wholesale) and the Internet channel (B2C retail). Compare the pricing to the standard pricing in the products table.

USE [AdventureWorksDW2012];
SELECT [ModelName], [EnglishProductName], p.[ProductKey], [ProductAlternateKey], [Weight], [WeightUnitMeasureCode]
, SUM([OrderQuantity]) AS [Reseller Units]
, ISNULL((SELECT SUM([OrderQuantity]) FROM [dbo].[FactInternetSales] as FIS WHERE FIS.[ProductKey] = p.ProductKey), 0) AS [Internet Units]
FROM [dbo].[FactResellerSales] as rs
INNER JOIN [dbo].[DimProduct] as p ON p.[ProductKey] = rs.[ProductKey]
WHERE NOT EXISTS (SELECT DISTINCT [ProductKey] FROM [dbo].[FactInternetSales] as FIS WHERE FIS.ProductKey = rs.ProductKey)
GROUP BY p.[ProductKey], [ProductAlternateKey], [ModelName], [EnglishProductName], [Weight], [WeightUnitMeasureCode]
ORDER BY [ModelName], [Reseller Units] DESC

This query identifies the products sold via the reseller network that are not sold on the Internet channel. New syntax is used to find products in one fact table that are not in the other fact table: WHERE NOT EXISTS. Study this carefully. Here is a web article that discusses it.

To prove the veracity of the query results (a query of the reseller transaction table using a filter based on a second fact table), a sub-query is used. The sub-query provides the last column of zeros, which proves that the Internet sales for each of those product SKUs are zero.
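NOT EXISTS and NOT IN usually return the same rows, but they part ways when the sub-query can return a NULL. The sketch below shows both on made-up data with Python’s built-in sqlite3 module. The NULL key is artificial and inserted only to expose the difference; the handout does not say whether ProductKey in the real fact tables can be NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical fact tables; the Internet table contains one NULL key on purpose.
con.executescript("""
    CREATE TABLE reseller_sales (product_key INTEGER, qty INTEGER);
    CREATE TABLE internet_sales (product_key INTEGER, qty INTEGER);
    INSERT INTO reseller_sales VALUES (1, 5), (2, 3), (3, 4);
    INSERT INTO internet_sales VALUES (3, 1), (NULL, 9);
""")

# NOT EXISTS asks, row by row, "is there any matching Internet row?"
not_exists = con.execute("""
    SELECT DISTINCT r.product_key
    FROM reseller_sales AS r
    WHERE NOT EXISTS (
        SELECT 1 FROM internet_sales AS i WHERE i.product_key = r.product_key
    )
    ORDER BY r.product_key
""").fetchall()

# NOT IN compares against every value in the list; a NULL in the list makes
# the comparison unknown, so no row passes the filter.
not_in = con.execute("""
    SELECT DISTINCT r.product_key
    FROM reseller_sales AS r
    WHERE r.product_key NOT IN (SELECT product_key FROM internet_sales)
    ORDER BY r.product_key
""").fetchall()

print(not_exists)
print(not_in)
```

NOT EXISTS reports products 1 and 2 as reseller-only, while NOT IN returns nothing because of the NULL. When the sub-query column may contain NULLs, NOT EXISTS is the safer choice.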

Challenge: Produce the list of 16 products sold on the Internet and not in stores. Perform a profitability analysis of these items.


Provide your analysis of the top 25 products that should be introduced to the Internet channel. Include profitability in the analysis and units (which is a measure of popularity).

USE [AdventureWorksDW2012];
SELECT ps.[EnglishProductSubcategoryName] as [SubCategory], p.[ProductAlternateKey] as [Model #], p.[EnglishProductName], [BusinessType]
, COUNT(rs.[OrderQuantity]) AS [Reseller Units]
FROM [dbo].[FactResellerSales] as rs
INNER JOIN [dbo].[DimProduct] as p ON p.[ProductKey] = rs.[ProductKey]
INNER JOIN [dbo].[DimProductSubcategory] as ps ON ps.[ProductSubcategoryKey] = p.[ProductSubcategoryKey]
INNER JOIN [dbo].[DimReseller] as r ON r.[ResellerKey] = rs.[ResellerKey]
GROUP BY ps.[EnglishProductSubcategoryName], p.[ProductAlternateKey], p.[EnglishProductName], [BusinessType]
ORDER BY [SubCategory], [Model #]

This query is useful to start the analysis of the reseller channel. There are three different types of retailer (Business Types) that sell our products: Warehouse, Value Added Reseller, and Specialty Bike Shop.

Which reseller type does best at selling each of the sub-categories? Does price matter more in a warehouse type of store than in a specialty bike shop, etc.?


Comparing performance to goal: how are the sales reps performing? A different query approach will be shown later, but here is a correlated sub-query approach that should be within your reach. This is very helpful!!! First let’s examine the data. We can see that for the year 2007, employee 272 had 4 different sales quotas.

USE AdventureWorksDW2012;
SELECT [EmployeeKey], [CalendarYear], [CalendarQuarter], [SalesAmountQuota]
FROM [dbo].[FactSalesQuota]
WHERE [CalendarYear] = 2007
ORDER BY [EmployeeKey]
--This query shows the FactSalesQuota table; notice the sales rep quotas are by quarter.

USE [AdventureWorksDW2012];
SELECT rs.[EmployeeKey], CONCAT([FirstName], ' ', [LastName]) as Name
, DATEPART(year, [OrderDate]) as [Year], DATEPART(quarter, [OrderDate]) as [Quarter]
, [SalesAmountQuota]
, COUNT([SalesAmount]) as [SalesTA's]
, SUM([SalesAmount]) as [Sales Total]
FROM [dbo].[FactResellerSales] as rs
INNER JOIN [dbo].[FactSalesQuota] as q ON q.EmployeeKey = rs.EmployeeKey
AND q.CalendarYear = DATEPART(year, rs.[OrderDate])
AND q.[CalendarQuarter] = DATEPART(quarter, rs.[OrderDate])
INNER JOIN [dbo].[DimEmployee] as e ON e.[EmployeeKey] = q.[EmployeeKey]
GROUP BY rs.[EmployeeKey], CONCAT([FirstName], ' ', [LastName]), [SalesAmountQuota], DATEPART(year, [OrderDate]), DATEPART(quarter, [OrderDate])
ORDER BY rs.[EmployeeKey], [Year], [Quarter]

This is a lengthy way to do it, and it is a slightly different query: it shows sales quotas by quarter for all the sales reps for all the years. It is shown in case you forget sub-queries. Go ahead and run this query and compare it. Then filter it to just 2007 data.

Notice the INNER JOIN statement: the tables are being joined on three columns. Display the data with Excel conditional formatting and SSRS gauges.

If you INNER JOIN the FactResellerSales table and the FactSalesQuota table on only the EmployeeKey column, you get something called data explosion. Go ahead and run the query with the first INNER JOIN on only the EmployeeKey field. A good analyst is always careful that the joins are working as expected. Check your work!!!

First notice you went from 163 rows to 1717. Also notice the numbers are wrong. So if you get into a predicament like this, where you are bringing in numbers from one table and matching them to aggregates in another table, realize that the data is at different levels of granularity.
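Data explosion is easy to reproduce on a few rows. In the sketch below (Python’s built-in sqlite3 module, made-up quota and sales tables, with year and quarter stored as plain columns for simplicity), one employee has four quarterly quotas and three sales rows. Joining on the employee key alone pairs every sale with every quota row; joining on employee, year, and quarter pairs each sale with the single quota row at its own granularity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data: four quarterly quotas and three sales for employee 272.
con.executescript("""
    CREATE TABLE quota (employee_key INTEGER, yr INTEGER, qtr INTEGER, quota INTEGER);
    CREATE TABLE sales (employee_key INTEGER, yr INTEGER, qtr INTEGER, amount INTEGER);
    INSERT INTO quota VALUES (272, 2007, 1, 1000), (272, 2007, 2, 1000),
                             (272, 2007, 3, 1000), (272, 2007, 4, 1000);
    INSERT INTO sales VALUES (272, 2007, 1, 100), (272, 2007, 1, 50), (272, 2007, 2, 70);
""")

# Join on the employee key only: 3 sales x 4 quota rows = 12 rows.
loose_rows, loose_total = con.execute("""
    SELECT COUNT(*), SUM(s.amount)
    FROM sales AS s
    INNER JOIN quota AS q ON q.employee_key = s.employee_key
""").fetchone()

# Join on all three columns: each sale meets exactly one quota row.
tight_rows, tight_total = con.execute("""
    SELECT COUNT(*), SUM(s.amount)
    FROM sales AS s
    INNER JOIN quota AS q ON q.employee_key = s.employee_key
                         AND q.yr = s.yr
                         AND q.qtr = s.qtr
""").fetchone()

print(loose_rows, loose_total)  # exploded row count and inflated total
print(tight_rows, tight_total)  # matches the raw sales rows
```

The loose join quadruples both the row count and the sales total, the same kind of jump as the 163 to 1717 rows reported above.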

USE [AdventureWorksDW2012];
SELECT e.[EmployeeKey], CONCAT([LastName], ', ', [FirstName]) as [Employee], [CalendarYear] as [Year], [CalendarQuarter] as [Qtr.], [SalesAmountQuota] as [Quota]
, (SELECT SUM([SalesAmount]) FROM [dbo].[FactResellerSales] as s
WHERE YEAR([OrderDate]) = 2007
AND DATEPART(quarter, s.[OrderDate]) = q.[CalendarQuarter]
AND s.EmployeeKey = q.[EmployeeKey]) as [Actual]
FROM [dbo].[DimEmployee] as e
INNER JOIN [dbo].[FactSalesQuota] as q ON q.[EmployeeKey] = e.[EmployeeKey]
WHERE [CalendarYear] = 2007
ORDER BY e.[EmployeeKey]

OK, so here is the sub-query that pulls together data from 3 tables. First the quotas are displayed by quarter for each sales employee. Then the correlated sub-query adds a new column of aggregated data. Each additional (SELECT) statement just adds a new column.

The sales quotas are by quarter and year, so the sub-query must match that same level of granularity (i.e. quarterly). The sub-query carefully specifies quarter, year, and employee, so that the data is aggregated correctly. So the WHERE clause here acts similarly to a GROUP BY clause.


USE [AdventureWorksDW2012];
SELECT e.[EmployeeKey], CONCAT([LastName], ', ', [FirstName]) as [Employee], [CalendarYear], SUM([SalesAmountQuota]) as [Quota]
--The year filter inside each sub-query keeps the totals to 2007, as the column names promise
, (SELECT SUM([SalesAmount]) FROM [dbo].[FactResellerSales] as rs
WHERE rs.[EmployeeKey] = e.[EmployeeKey] AND YEAR(rs.[OrderDate]) = 2007) as [2007 Total $ales]
, (SELECT COUNT([SalesOrderNumber]) FROM [dbo].[FactResellerSales] as rs
WHERE rs.[EmployeeKey] = e.[EmployeeKey] AND YEAR(rs.[OrderDate]) = 2007) as [2007 # TA Line Items]
FROM [dbo].[DimEmployee] as e
INNER JOIN [dbo].[FactSalesQuota] as s ON s.[EmployeeKey] = e.[EmployeeKey]
WHERE [CalendarYear] = 2007
GROUP BY e.[EmployeeKey], CONCAT([LastName], ', ', [FirstName]), [CalendarYear]
ORDER BY e.[EmployeeKey]

Here we are using a second sub-query to bring in another column, which shows the number of line items that generated the total sales. (Hmm, should this be the number of invoices instead of the number of line items?) How would you do this?
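One possible answer to the invoices question, offered as a sketch rather than the handout’s own solution: count the distinct order numbers instead of all rows. The example below uses Python’s built-in sqlite3 module and a made-up sales_lines table in which one order has two line items.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical line-item table: order SO1 has two lines, SO2 has one.
con.executescript("""
    CREATE TABLE sales_lines (order_number TEXT, amount INTEGER);
    INSERT INTO sales_lines VALUES ('SO1', 10), ('SO1', 20), ('SO2', 5);
""")

line_items, invoices = con.execute("""
    SELECT COUNT(order_number),           -- one per row, i.e. per line item
           COUNT(DISTINCT order_number)   -- one per invoice
    FROM sales_lines
""").fetchone()

print(line_items, invoices)
```

The same COUNT(DISTINCT ...) expression could be placed inside the correlated sub-query in place of COUNT([SalesOrderNumber]).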

Let’s make some charts and analysis.

This concludes an introductory module on sub-queries. Please do additional research to discover many more usages of sub-queries.
