business intelligence reminds on data...

22
BUSINESS INTELLIGENCE Reminds on Data Warehousing (details at the Decision Support Database course) Data Sciencce & Business Informatics Degree

Upload: others

Post on 25-Apr-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

BUSINESS INTELLIGENCE

Reminds on Data Warehousing

(details at the Decision Support Database course)

Data Sciencce & Business Informatics Degree

Page 2: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

BI Architecture2

Laboratory of Data Science

Page 3: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Star-schema datawarehouse

A fact table with star-schema dimension tables only

3

time_key

day

day_of_the_week

month

quarter

year

time

branch_key

street

city

province

country

branch

Sales Fact Table

time_key

item_key

employee_key

branch_key

units_sold

dollars_sold

dollars_costMeasures

item_key

item_name

brand

type

supplier_type

item

employee_key

employee_name

employee_type

employee

Laboratory of Data Science

Page 4: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Snowflake-schema datawarehouse

A fact table with star-schema, snowflake and parent-child dimension tables

4

time_key

day

day_of_the_week

month

quarter

year

time

branch_key

street

city_key

branch

Sales Fact Table

time_key

item_key

employee_key

branch_key

units_sold

dollars_sold

dollars_costMeasures

item_key

item_name

brand

type

supplier_key

item

supplier_key

supplier_type

supplier

city_key

city

province

country

cityemp_key

emp_name

emp_boss_key

employees

Laboratory of Data Science

Page 5: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Which DBMS technology for DW?

Storage technology

Architecture

5

Laboratory of Data Science

Page 6: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Storage

RDBMS: record oriented structure

Laboratory of Data Science

6

Columnar: column oriented structure

Advantages:• Faster Scan

• Data Compression (e.g. State)

Page 7: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Storage

Correlation value-based database

Data cells contain the index to an order-set value

In-memory database

Data is stored in compressed format in main memory

Extraction-based system

Storage of attribute extracted from continuous data

flows (eg., web traffic, sensors)

...

Laboratory of Data Science

7

Page 8: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Architecture

Sequential

SQL query processing by a single

processor

Parallel

SQL query plan processing by a

multi-processor machine, with shared

memory

Distributed (Map-reduce)

SQL query processing distributed to

a set of independent machines

◼ Teradata SQL-MR, Hadoop HiveQL

Laboratory of Data Science

8

Page 9: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

BI Architecture10

Laboratory of Data Science

Page 10: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

K-dimensional cuboid11

Milk Bread … ... Pasta

Jan 13

… ...

Dec 13

Feb 13

Product.Category

Store.City

Time.Month

OrangeRome

PiseFlorence

An hyper-cube with K axes, with a level of some hierarchy at each axis. A

cell of the cuboid contains the values of metrics for the conditions given by

the cell coordinates.

Lucca

Laboratory of Data Science

Page 11: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Cube navigation by different users12

product

Branch manager look at sales

of his/her stores

for any product and any period

Product managers look at sales

of some products

in any period and in any market

Finance manager look at sales

of a period compared to the previous period

for any product and any market

tim

e

Laboratory of Data Science

Page 12: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Cuboids in SQL

13

SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND

F.time_key = T.time_key AND F.item_key = I.item_key

GROUP BY L.city, I.brand, T.month

MeasureAggregate

Star-Join

Hierarchy levels

Order or

pivoting

Laboratory of Data Science

Page 13: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Roll-up

How many cuboids?14

Product

Time

All

Time

Time

Product

All

All

Drill-Down

Roll-up

Drill-Down

Drill-Down

Roll-up

Laboratory of Data Science

Page 14: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

15

Data Cube (extended cube, hypercube)

Total annual sales

of TVs in U.S.A.Date

Cou

ntr

y

*

*TV

VCRPC

1Qtr 2Qtr 3Qtr 4Qtr

U.S.A

Canada

Mexico

*

Laboratory of Data Science

Page 15: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Data cube in SQL Server16

SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND

F.time_key = T.time_key AND F.item_key = I.item_key

GROUP BY CUBE(L.city, I.brand, T.month)

MeasureAggregate

Star-Join

Hierarchy levels

Order or

pivoting

GROUP BY ROLLUP(L.city, I.brand, T.month) - all initial subsequences of the group-by attributes

Laboratory of Data Science

Page 16: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Slice and Dice17

Product

Time

Slice

Product

Time

Product =A,B,C

Laboratory of Data Science

Page 17: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Slice in SQL Server18

SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND

F.time_key = T.time_key AND F.item_key = I.item_key AND T.year = 2016

GROUP BY CUBE(L.city, I.brand, T.month)

MeasureAggregate

Star-Join

Hierarchy levels

Order or

pivoting

Slice

Laboratory of Data Science

Page 18: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Star-join executions in SQL Server

Star-join optimization

automatically detected (vs to be setup in Oracle)

Bitmap join indexes

not available (vs available in Oracle)

Columnstore indexes (since SQL Server 2012)

see docs

◼ http://msdn.microsoft.com/en-us/library/gg492088.aspx

Example (on a copy of sales_fact)◼ CREATE CLUSTERED COLUMNSTORE INDEX cci_sales ON sales_fact_copy

Laboratory of Data Science

19

Page 19: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Materialized views in SQL Server

Create a (normal) viewCREATE VIEW schema.view_name

WITH SCHEMABINDING -- binds the reference tables schema

AS SELECT …, COUNT_BIG(*) as n

FROM …

WHERE …

GROUP BY …

Make an index on itCREATE UNIQUE CLUSTERED INDEX index_name

ON schema.view_name (attributes);

They are called Indexed Views

http://msdn.microsoft.com/en-us/library/ms191432.aspx

20

Laboratory of Data Science

Page 20: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Materialized views in SQL Server21

Laboratory of Data Science

Estimated execution plan Actual execution plan

Page 21: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Analytic SQL

Aggregate functions

MIN, MAX, SUM, COUNT, AVG, VAR, VARP, STDEV, STDEVP

Ranking functions

RANK, DENSE_RANK, NTILE, ROW_NUMBER

Analytic functions (since SQL Server 2012)

CUME_DIST, LEAD, FIRST_VALUE, PERCENTILE_CONT, LAG,

PERCENTILE_DISC, LAST_VALUE, PERCENT_RANK

Laboratory of Data Science

22

Page 22: BUSINESS INTELLIGENCE Reminds on Data Warehousingdidawiki.di.unipi.it/lib/exe/fetch.php/mds/lbi/lds.09.dwandolap.pdf · Snowflake-schema datawarehouse A fact table with star-schema,

Exercise

For each product family, store country, and quarter,

show the percentage of sales over the total of the

product family

Laboratory of Data Science

23