![Page 1: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/1.jpg)
BUSINESS INTELLIGENCE
Data Analysis using SQL
![Page 2: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/2.jpg)
Cube, A. Albano
BI Architecture
2
![Page 3: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/3.jpg)
Cube, A. Albano 3
DATA ANALYSIS USING SQL
A Data warehouse is all about getting answers to business questions, in the form of reports.
Reports must communicate pertinent information clearly and concisely.
Three ways to present information.
Traditional reports.
Pivot tables.
Charts.
There are several kinds of reporting tools on the market.
Good reporting is imperative: Even the best schema design cannot guarantee success if answers are not delivered with useful reports.
![Page 4: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/4.jpg)
Cube, A. Albano
Cuboids in SQL
SELECT L.city, I.brand, T.month, SUM(dollars_sold) FROM fact AS F, location AS L, time AS T, item AS I WHERE F.location_key = L.location_key AND F.time_key =
T.time_key AND F.item_key = I.item_key GROUP BY L.city, I.brand, T.month
MeasureAggregate
Star-Join
Hierarchy levels
Order or pivoting
Business Intelligence Lab
![Page 5: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/5.jpg)
Cube, A. Albano
Roll-up
How many cuboids?
Product
Time
All
Time
Time
Product
All All
Drill-Down
Roll-up
Drill-Down
Drill-Down
Roll-up
Business Intelligence Lab
![Page 6: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/6.jpg)
Cube, A. Albano
6
Data Cube
(extended cube, hypercube)
Total annual sales of TVs in U.S.A. Date
Cou
ntry
*
* TV
VCR PC
1Qtr 2Qtr 3Qtr 4Qtr U.S.A
Canada
Mexico
*
![Page 7: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/7.jpg)
Cube, A. Albano
Data cube in SQL Server
SELECT L.city, I.brand, T.month, SUM(dollars_sold) FROM fact AS F, location AS L, time AS T, item AS I WHERE F.location_key = L.location_key AND F.time_key =
T.time_key AND F.item_key = I.item_key GROUP BY CUBE(L.city, I.brand, T.month)
MeasureAggregate
Star-JoinHierarchy levels
Order or pivoting
GROUP BY ROLLUP(L.city, I.brand, T.month) - all initial subsequences of the group-by attributes
Business Intelligence Lab
![Page 8: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/8.jpg)
Cube, A. Albano
Slice and Dice
8
Product
Time
Slice
Product
Time
Product =A,B,C
Business Intelligence Lab
![Page 9: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/9.jpg)
Cube, A. Albano
Slice in SQL Server
9
SELECT L.city, I.brand, T.month, SUM(dollars_sold) FROM fact AS F, location AS L, time AS T, item AS I WHERE F.location_key = L.location_key AND F.time_key =
T.time_key AND F.item_key = I.item_key AND T.year = 2016 GROUP BY CUBE(L.city, I.brand, T.month)
MeasureAggregate
Star-Join
Hierarchy levels
Order or pivoting
Slice
Business Intelligence Lab
![Page 10: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/10.jpg)
Cube, A. Albano
Star-join executions in SQL Server
• Star-join optimization • automatically detected (vs to be setup in Oracle)
• Bitmap join indexes • not available (vs available in Oracle)
• Columnstore indexes (since SQL Server 2012)
• see docs • http://msdn.microsoft.com/en-us/library/gg492088.aspx
• Example (on a copy of sales_fact) • CREATE CLUSTERED COLUMNSTORE INDEX cci_sales ON sales_fact_copy
Business Intelligence
10
![Page 11: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/11.jpg)
Cube, A. Albano 11
REPORTING TOOLS
Product Type Wine ...
Drag and drop
Year 2010 ...
Total Revenue 56,000 ...
Total Cost 45,000 ...
Total Margin 11,000 ...
![Page 12: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/12.jpg)
Cube, A. Albano 12
SIMPLE REPORTS WITH SQL
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
Slice
Rollup & drill-down
Pivoting
![Page 13: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/13.jpg)
Cube, A. Albano
AIRLINE COMPANIES: DATA ANALYSIS
13
SELECT FlightCode, CompanyName, Class, Time, SUM(UnoccupiedSeats) As TotalUnoccupiedSeats FROM FlightClassSeats f, DepartureTime t, Company c WHERE f.DepartureTimeFK = t.DepartureTimePK AND f.CompanyFK = c.CompanyPK and year = 2015 GROUP BY FlightCode, CompanyName, Class, Time,
![Page 14: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/14.jpg)
Cube, A. Albano
AIRLINE COMPANIES: DATA ANALYSIS
14
SELECT FlightCode, CompanyName, Class, City, SUM(UnoccupiedSeats) As TotalUnoccupiedSeats FROM FlightClassSeats f, DepartureTime t, City c WHERE f.DepartureTimeFK = t.DepartureTimePK AND f.DepartureCityFK = c.CityPK
AND Class=‘Business’ AND year = 2015 GROUP BY FlightCode, CompanyName, Class, City
SELECT year, month, country, SUM(UnoccupiedSeats) As TotalUnoccupiedSeats, SUM(Revenue) As TotalRevenue
FROM FlightClassSeats f, DepartureTime t, City c WHERE f.DepartureTimeFK = t.DepartureTimePK AND f.DestinationCityFK= c.CityPK
AND CompanyName=‘Alitalia’ GROUP BY year, month, country
![Page 15: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/15.jpg)
Cube, A. Albano 15
SIMPLE REPORTS WITH SUBTOTALS
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 16: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/16.jpg)
Cube, A. Albano 16
SIMPLE REPORTS WITH SUBTOTALS IN SQL
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 17: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/17.jpg)
Cube, A. Albano 17
SQL: OPERATOR ROLLUP
ROLLUP compute one path through lattice
(A,B) (A) subtotals () totals
GROUP BY ROLLUP(A,B) Important the (attributes order) Semantics: Union of 3 groupings:
![Page 18: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/18.jpg)
Cube, A. Albano 18
SIMPLE REPORTS WITH SUBTOTALS: ROLLUP
2
3
1 (Brand, Product)
(Brand)
( )
![Page 19: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/19.jpg)
Cube, A. Albano 19
SIMPLE REPORTS WITH SUBTOTALS: CROSS-TABULATION
![Page 20: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/20.jpg)
Cube, A. Albano 20
SQL: OPERATOR CUBE
CUBE compute a sub-lattice
(A,B) (A) subtotals (B) subtotals () totals
GROUP BY CUBE(A,B) Important: the (attributes order) doesn’t matter Semantics: Union of 4 groupings:
![Page 21: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/21.jpg)
Cube, A. Albano 21
SIMPLE REPORTS WITH SUBTOTALS: CUBE
2
1
4
3
(Brand, Product)(Brand)
(Product)
( )
![Page 22: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/22.jpg)
Cube, A. Albano 22
MODERATELY DIFFICULT REPORTS WITH COMPARISON BETWEEN COLUMNS (VARIANCE REPORT)
A product may have been sold in one year, but not in the other !
Delta = 100 x (Revenue2009 - Revenue2008)/Revenue2009
![Page 23: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/23.jpg)
Cube, A. Albano 23
JOIN
A B C
1 a x
3 c y
A B C
1 a x
2 b y
SELECT * FROM R NATURAL JOIN S;
A B
1 a
2 b
3 c
R
A C
1 x
3 y
5 z
S
A C
1 x
2 y
S
A B
1 a
2 b
R
SELECT * FROM R NATURAL JOIN S;
![Page 24: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/24.jpg)
Cube, A. Albano 24
JOIN
SELECT * FROM R FULL JOIN S USING (A);
A B C
1 a x
2 b
3 c y
5 z
SELECT * FROM R NATURAL FULL JOIN S;
A B
1 a
2 b
3 c
R
A C
1 x
3 y
5 z
S
![Page 25: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/25.jpg)
Cube, A. Albano 25
SOLUTION USING VIEWS
CREATE VIEW VRevenue09 AS SELECT Brand, Product, SUM(Revenue) AS Revenue2009 FROM Sales WHERE Year(Data) = 2009 GROUP BY Brand, Product;
CREATE VIEW VRevenue08 AS SELECT Brand, Product, SUM(Revenue) AS Revenue2008 FROM Sales WHERE Year(Data) = 2008 GROUP BY Brand, Product;
SELECT VRevenue09.Brand AS Brand, VRevenue09.Product AS Product, Revenue2009, Revenue2008, CASE WHEN Revenue2009 IS NULL THEN -100 WHEN Revenue2008 IS NULL THEN 100 ELSE ROUND(100*(Revenue2009 - Revenue2008)/Revenue2009) END AS Delta FROM VRevenue09 FULL JOIN VRevenue08 USING(Brand, Product) ORDER BY Brand, Product
![Page 26: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/26.jpg)
Cube, A. Albano 26
SOLUTION USING ‘WITH’ CLAUSE
![Page 27: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/27.jpg)
Cube, A. Albano 27
EXERCISE: MODERATELY DIFFICULT REPORTS WITH COMPARISON ACROSS AGGREGATION LEVELS
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 28: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/28.jpg)
Cube, A. Albano 28
VERY DIFFICULT REPORTS WITHOUT ANALYTIC SQL: RUNNING TOTALS Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 29: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/29.jpg)
Cube, A. Albano 29
VERY DIFFICULT REPORTS WITHOUT ANALYTIC SQL: RANK
Which are the best 5 products sold in Toscana?
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 30: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/30.jpg)
Cube, A. Albano
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin) We want to partition the customers into four groups: – Top5%, with 5% of customers with the highest amount of revenues. – Next15%, with 15% of other customers with the highest amount of revenues. – Middle30%, with 30% of other customers with the highest amount of revenues. – Bottom50%, with 50 % of the customers with the lowest amount of revenues. For each customer group we want to know their number, and the percentage of the sum of their revenues compared to total revenue of all sales.
30
VERY DIFFICULT REPORTS WITHOUT ANALYTIC SQL
Group Number of customers
Percent of total revenue
Top5% 1 20 Next15% 3 50 Middle30% 6 20 Bottom50% 10 10
![Page 31: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/31.jpg)
Cube, A. Albano 31
VERY DIFFICULT REPORTS WITHOUT ANALYTIC SQL Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
Monthly Total Revenue
Moving Average Monthly Total Revenue (Window 3 or 5)
![Page 32: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/32.jpg)
Cube, A. Albano 32
ANALYTIC SQL
Syntax
![Page 33: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/33.jpg)
Cube, A. Albano
Intuition: Partition By
33
![Page 34: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/34.jpg)
Cube, A. Albano
Intuition: without Partition By
34
![Page 35: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/35.jpg)
Cube, A. Albano 35
ANALYTIC SQL
Execution order
![Page 36: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/36.jpg)
Cube, A. Albano 36
RANK
SELECT Customer, Product, SUM(Revenue) AS TotalRev, FROM Sales WHERE Customer IN (‘C1’, ‘C2’) GROUP BY Customer, Product ORDER BY TotalRev DESC;
Rank 7 6 5 4 3 2 1
Customer Product TotalRev C1
P1 1100
C1 P3 1000
C2 P1
1000 C2
P2 900
C2 P4
800 C1
P2 200
C2 P3 200
RANK ( ) OVER (ORDER BY SUM(Revenue) ) AS Rank
![Page 37: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/37.jpg)
Cube, A. Albano 37
RANK WITH PARTITIONS
SELECT Customer, Product, SUM(Revenue) AS TotalRevenue, FROM Sales WHERE Customer IN (‘C1’, ‘C2’) GROUP BY Customer, Product;
RANK ( ) OVER (PARTITION BY Customer ORDER BY SUM(Revenue) DESC) AS Rank
Customer Product TotalRev C1
P1 1100
C1 P3 1000
C1 P2
200 C2
P1 1000
C2 P2
900 C2
P4 800
C2 P3 200
Rank 1 2 3 1 2 3 4
![Page 38: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/38.jpg)
Cube, A. Albano
RANK vs DENSE_RANK vs ROW_NUMBER
• Consider the values in the ascending order • (10; 20; 20; 30; 30; 40)
• RANK() of a value is 1 + the number of values that strictly precedes it • ranks (1; 2; 2; 4; 4; 6)
• DENSE_RANK() of a value is 1 + the number of distinct values that precedes it • dense ranks (1; 2; 2; 3; 3; 4)
• PERCENT_RANK() is (RANK() – 1) / (TotalRows – 1) • percent ranks (0; 0.2; 0.2; 0.6; 0.6; 1)
• ROW_NUMBER() is the row number • row numbers (1; 2; 3; 4; 5; 6)
• CUME_DIST() is ROW_NUMBER() / TotalRows • cumulative distribution (0.16; 0.33; 0.5; 0.67; 0.83; 1)
• NTILE(3) is the tertile of the value (3 is a parameter, can be any integer) • tertiles (1; 1; 2; 2; 3; 3)
38
![Page 39: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/39.jpg)
Cube, A. Albano
OTHER ANALYTIC FUNCTIONS
• COUNT(), SUM(), AVG(), MIN(), MAX() … and all standard aggregates
Sales(Brand, Product, Revenue)
WITH s AS (SELECT Brand, Product,SUM(Revenue) AS prodRevenue FROM sales GROUP BY Brand, Product) SELECT Brand, Product, prodRevenue,
100 * prodRevenue / SUM(prodRevenue) OVER(PARTITION BY Brand) AS PctOverBrand, 100 * prodRevenue / SUM(prodRevenue) OVER() AS PctOverTot
FROM s
39
Brand Product prodRevenue PctOverBrand PctOverTot
B1 P1 40 40 20 B1 P2 60 60 30 B2 P3 20 20 10 B2 P4 80 80 40
SELECT Brand, Product, SUM(Revenue) AS prodRevenue, 100 * SUM(Revenue) / SUM(SUM(Revenue)) OVER(PARTITION BY Brand) AS PctOverBrand, 100 * SUM(Revenue) / SUM(SUM(Revenue)) OVER() AS PctOverTot
FROM sales GROUP BY Brand, Product
![Page 40: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/40.jpg)
Cube, A. Albano
OTHER ANALYTIC FUNCTIONS
• LAG(attribute, offset, default) and LEAD(attribute, offset, default)
• The value of attribute in offset rows before (LAG) or after (LEAD)
Store Year TotalRev S1
2015 1100
S1 2014 1000
S1 2013
200 S2
2015 1000
S2 2014
900 S2
2013 800
S2 2012 200
PrevRev 1000 200 0 900 800 200 0
SELECT Store, Year, TotalRev, LEAD(TotalRev, 1, 0) OVER(PARTITION BY Store ORDER BY Year DESC) AS PrevRev, FROM TotalSales ORDER BY Store, Year
![Page 41: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/41.jpg)
Cube, A. Albano 41
WINDOWING
Windowing functions are used to compute cumulative, moving and centered aggregates.
Window functions add a value to each row that depends on the other rows in the window.
Examples of window specifications:
ROWS UNBOUNDED PRECEDING. The window begin with the first record of the partition and ends with the current record.
ROWS BETWEEN ... PRECEDING AND ... FOLLOWING. The window include all records that fall within the given offset.
![Page 42: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/42.jpg)
Cube, A. Albano 42
WINDOWING EXAMPLE
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 43: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/43.jpg)
Cube, A. Albano 43
WINDOWING EXAMPLE
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 44: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/44.jpg)
Cube, A. Albano 44
EXAMPLE
A moving average of total revenue, with a moving window of 3 months, by month.
Result visualization in Oracle...
Sales(Customer, Product, Brand, Date, City, Region, Area, Quantity, Revenue, Margin)
![Page 45: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/45.jpg)
Cube, A. Albano
ANALYTIC FUNCTIONS IN PENTAHO DATA INTEGRATION
45
![Page 46: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/46.jpg)
Cube, A. Albano
OR CONNECT TO SQL SERVER 2014 ON KDD.DI.UNIPI.IT LOGIN: sobigdata PWD: pisa
46
![Page 47: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/47.jpg)
Cube, A. Albano 47
SUMMARY
SQL has been extended for OLAP operations, because of intensive data warehouse applications during the last decade.
Make sure you understand SQL. It is much more than syntax.
SQL is not select-from-where only.
Grouping and aggregation is a major part of SQL.
![Page 48: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/48.jpg)
Cube, A. Albano
Open Lab
• Redo exercises on ETL using SQL queries instead
48
![Page 49: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/49.jpg)
Cube, A. Albano
Exercise
• FoodMart data mart
• Write a SQL query that returns all constant customers in June 1998
• Constant: with at least two baskets per month for at least two months in the last four months.
49
![Page 50: BUSINESS INTELLIGENCE Data Analysis using SQLdidawiki.cli.di.unipi.it/lib/exe/fetch.php/bdd-infuma/i-analytics-sqlreporting.pdfCube, A. Albano 3 DATA ANALYSIS USING SQL A Data warehouse](https://reader033.vdocument.in/reader033/viewer/2022042020/5e7780630c4a6165dd483492/html5/thumbnails/50.jpg)
Cube, A. Albano
Exercise
WITH salesagg AS ( SELECT customer_id, COUNT(DISTINCT s.time_id) AS npurchases FROM sales_fact s, time_by_day t WHERE s.time_id = t.time_id and
(t.the_year*12 + t.month_of_year) BETWEEN 1998*12+6-3 AND 1998*12+6 GROUP BY customer_id, the_year, month_of_year
) SELECT customer_id FROM salesagg WHERE npurchases > 1 GROUP BY customer_id HAVING Count(*) > 1
50