mis 451 building business intelligence systems

Post on 13-Jan-2016

MIS 451

Building Business Intelligence Systems

Logical Design (1)

2

Project Planning

Requirements Analysis

Logical Design

Physical Design

Data Staging

Data Analysis (OLAP)

3

Introduction to Dimensional Modeling

Dimensional modeling is a data warehouse (DW) logical design technique that presents data in a standard framework that is intuitive to query and allows high-performance data access.

Intuitive: the SQL is easy to write.

High performance: the SQL runs fast.

4

ER Model:

Customer 1 : M Order (relationship: Places)

Order 1 : M OrderLine (relationship: Contain)

Product 1 : M OrderLine (relationship: Order)

ProductCategory 1 : M Product (relationship: Belong to)

Dimensional Model (Star Schema), where # marks a primary-key attribute:

SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES

CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY

PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME

TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER

SALES references TIME, PRODUCT, and CUSTOMER; each of these three tables is referenced by SALES.

For detailed information, please refer to handout 1.

5

Introduction to Dimensional Modeling

Analytical report: a two-dimensional January sales report by customer state and product category.

Query: list January sales by customer state and product category.

6

Introduction to Dimensional Modeling

Query based on ER Model:

-- ORDER is a reserved word in SQL, so the Order table has to be quoted or renamed in a real schema.
Select State, PCName, SUM(Price * Quantity)
From OrderLine OL, Customer C, Product_Category PC, Product P, Order O
Where OL.OID = O.OID
  and OL.PID = P.PID
  and O.CID = C.CID
  and P.PCID = PC.PCID
  and to_char(O.OrderDate, 'MON') = 'JAN'
Group by State, PCName

Join: 5 tables

Query based on Dimensional Model:

Select State, PCName, SUM(Sales)
From Sales S, Customer C, Product P, Time T
Where S.Time_Key = T.Time_Key
  and S.Product_Key = P.Product_Key
  and S.Customer_Key = C.Customer_Key
  and T.Month = 'JAN'
Group by State, PCName

Join: 4 tables
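To check that the two queries answer the same question, here is a minimal sketch that runs both against SQLite. Everything beyond the table and column names in the slides is an assumption for illustration: the sample rows (including customer 201 and the Books/Toys categories) are made up, the Order table is named Orders because ORDER is a reserved word, SQLite's strftime replaces Oracle's to_char, and MONTH is stored as a number (1-12), so the filter is Month = 1 rather than 'JAN'.

```python
import sqlite3

# Two in-memory databases: the operational ER schema and the star schema.
er = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

er.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT, City TEXT);
CREATE TABLE Product_Category (PCID INTEGER PRIMARY KEY, PCName TEXT);
CREATE TABLE Product (PID INTEGER PRIMARY KEY, PName TEXT, PCID INTEGER);
-- Orders stands in for the Order entity (ORDER is a reserved word).
CREATE TABLE Orders (OID INTEGER PRIMARY KEY, CID INTEGER, OrderDate TEXT);
CREATE TABLE OrderLine (OID INTEGER, PID INTEGER, Price REAL, Quantity INTEGER);
-- Hypothetical sample data.
INSERT INTO Customer VALUES (101, 'Jon', 'Arizona', 'Tucson'), (201, 'Ann', 'Nevada', 'Reno');
INSERT INTO Product_Category VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO Product VALUES (10, 'Novel', 1), (20, 'Robot', 2);
INSERT INTO Orders VALUES (1, 101, '2016-01-05'), (2, 201, '2016-01-20'), (3, 101, '2016-02-03');
INSERT INTO OrderLine VALUES (1, 10, 10.0, 2), (1, 20, 25.0, 1), (2, 10, 10.0, 3), (3, 20, 25.0, 4);
""")

dw.executescript("""
CREATE TABLE Customer (Customer_Key INTEGER PRIMARY KEY, CID INTEGER, CName TEXT, State TEXT, City TEXT);
CREATE TABLE Product (Product_Key INTEGER PRIMARY KEY, PID INTEGER, PName TEXT, PCName TEXT);
CREATE TABLE Time (Time_Key INTEGER PRIMARY KEY, OrderDate TEXT, Month INTEGER);
CREATE TABLE Sales (Time_Key INTEGER, Product_Key INTEGER, Customer_Key INTEGER,
                    Price REAL, Quantity INTEGER, Sales REAL,
                    PRIMARY KEY (Time_Key, Product_Key, Customer_Key));
-- The same hypothetical facts, already loaded into the star schema.
INSERT INTO Customer VALUES (1, 101, 'Jon', 'Arizona', 'Tucson'), (2, 201, 'Ann', 'Nevada', 'Reno');
INSERT INTO Product VALUES (1, 10, 'Novel', 'Books'), (2, 20, 'Robot', 'Toys');
INSERT INTO Time VALUES (1, '2016-01-05', 1), (2, '2016-01-20', 1), (3, '2016-02-03', 2);
INSERT INTO Sales VALUES (1, 1, 1, 10.0, 2, 20.0), (1, 2, 1, 25.0, 1, 25.0),
                         (2, 1, 2, 10.0, 3, 30.0), (3, 2, 1, 25.0, 4, 100.0);
""")

# ER query: five tables, and the month is computed from the date at query time.
er_rows = er.execute("""
SELECT State, PCName, SUM(Price * Quantity)
FROM OrderLine OL, Customer C, Product_Category PC, Product P, Orders O
WHERE OL.OID = O.OID AND OL.PID = P.PID AND O.CID = C.CID
  AND P.PCID = PC.PCID AND strftime('%m', O.OrderDate) = '01'
GROUP BY State, PCName ORDER BY State, PCName
""").fetchall()

# Dimensional query: four tables, and the month is a plain column lookup.
dw_rows = dw.execute("""
SELECT State, PCName, SUM(Sales)
FROM Sales S, Customer C, Product P, Time T
WHERE S.Time_Key = T.Time_Key AND S.Product_Key = P.Product_Key
  AND S.Customer_Key = C.Customer_Key AND T.Month = 1
GROUP BY State, PCName ORDER BY State, PCName
""").fetchall()

print(er_rows)
print(dw_rows)
```

Both queries return the same three January rows; the February sale is excluded in both cases.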

 

7

Fact and Dimension

Fact table: SALES (# TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES)

Dimension tables: CUSTOMER, PRODUCT, TIME

8

Fact and Dimension

There are two types of tables in dimensional modeling:

Fact table: attributes in fact tables are measurements for analysis, or the contents of reports.

Dimension table: attributes in dimension tables are constraints on the measurements, or the headers of reports.

[Figure: report with its dimensions and facts labeled]

9

Facts and Dimensions

Criteria | Fact Attributes | Dimension Attributes
Purpose | Measurements for analysis | Constraints for the measurements
Reporting use | Report content | Row or column report headers
Data type | Most facts are numeric and additive; some are semi-additive or non-additive | Textual, descriptive
Size | Larger number of records | Smaller number of records

10

Facts and Dimensions

How to identify facts and dimensions?

Requirements Analysis:

Analytical requirements: marketing managers want to know the sales performance of different product categories in different states.

Information requirements: quantity of product sold, sales amount, product category, and customer state.

ER Model

12

F1: Calculation

F refers to a special consideration for fact tables or a special type of fact table.

In the SALES fact table, SALES is a calculated attribute: it stores PRICE * QUANTITY.

13

F1: Calculation

Normalization in relational databases (RDB): 1NF, 2NF, 3NF.

Because data in a data warehouse is non-volatile, the DW design can forgo normalization and store derived values to improve query performance.
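As a small illustration of the calculated fact, the sketch below computes SALES once at load time and stores it next to PRICE and QUANTITY. It uses SQLite through Python; the three source rows are hypothetical. Storing the derived column is safe here only because warehouse rows are not updated after loading, so SALES cannot drift out of step with PRICE * QUANTITY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Sales (Time_Key INTEGER, Product_Key INTEGER, Customer_Key INTEGER,
                    Price REAL, Quantity INTEGER, Sales REAL,
                    PRIMARY KEY (Time_Key, Product_Key, Customer_Key))
""")

# Hypothetical source rows: (time_key, product_key, customer_key, price, quantity).
source_rows = [(1, 1, 1, 10.0, 2), (1, 2, 1, 25.0, 1), (2, 1, 2, 10.0, 3)]

# The calculation happens once, when the row is loaded into the warehouse.
con.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [(t, p, c, price, qty, price * qty) for (t, p, c, price, qty) in source_rows],
)

# Reports read the stored value instead of recomputing it for every row.
stored_total = con.execute("SELECT SUM(Sales) FROM Sales").fetchone()[0]
computed_total = con.execute("SELECT SUM(Price * Quantity) FROM Sales").fetchone()[0]
print(stored_total, computed_total)
```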

14

D1: Slowly changing dimension

D refers to a special consideration for dimension tables or a special type of dimension table.

15

D1: Slowly changing dimension

Values of attributes in dimension tables may change over time. For example, a customer may move from one city to another.

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

Tom moved from Tucson to Phoenix, so the City value for customer 102 changes from Tucson to Phoenix.

16

D1: Slowly changing dimension

There are three ways to handle a slowly changing dimension.

Method 1: Overwrite old values with new values.

Before:

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

After:

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Phoenix
103 | Mark | Arizona | Phoenix

17

D1: Slowly changing dimension

Drawbacks of method 1:

Historical information is totally lost.

We can no longer tell that customer 102 used to live in Tucson.

Moreover, when listing sales by city, all sales to customer 102 are counted as Phoenix sales, including those made while 102 lived in Tucson.
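The drawback can be reproduced with a minimal sketch in SQLite. The customer rows come from the slide; the Sales table keyed directly by CID and its amounts are hypothetical, and every sale is assumed to have happened while Tom still lived in Tucson.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT, City TEXT);
INSERT INTO Customer VALUES (101, 'Jon', 'Arizona', 'Tucson'),
                            (102, 'Tom', 'Arizona', 'Tucson'),
                            (103, 'Mark', 'Arizona', 'Phoenix');
-- Hypothetical sales, all made before Tom moved.
CREATE TABLE Sales (CID INTEGER, Sales REAL);
INSERT INTO Sales VALUES (101, 50.0), (102, 70.0), (103, 30.0);
""")

def sales_by_city(con):
    """Return {city: total sales} using the customer's city as stored right now."""
    return dict(con.execute(
        "SELECT City, SUM(Sales) FROM Sales S JOIN Customer C ON S.CID = C.CID GROUP BY City"
    ))

before = sales_by_city(con)

# Method 1: overwrite the old city with the new one.
con.execute("UPDATE Customer SET City = 'Phoenix' WHERE CID = 102")

after = sales_by_city(con)
print(before)
print(after)
```

After the overwrite, Tom's 70.0 of Tucson-era sales is reported under Phoenix, and nothing in the table records that he ever lived in Tucson.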

18

D1: Slowly changing dimension

Method 2: Add a new attribute to record the current value of the changing attribute.

Before:

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

After:

CID | CName | State | Original City | Current City
101 | Jon | Arizona | Tucson | Tucson
102 | Tom | Arizona | Tucson | Phoenix
103 | Mark | Arizona | Phoenix | Phoenix

19

D1: Slowly changing dimension

Drawbacks of method 2:

Only partial historical information (original and current) is kept.

If customer 102 moved from Tucson to Flagstaff and then to Phoenix, the record for customer 102 contains only Tucson and Phoenix; the stay in Flagstaff is lost.
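The Flagstaff case can be traced with a short sketch in SQLite. The column names Original_City and Current_City follow the slide's table headers with underscores added; the move helper is a hypothetical name introduced only for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT,
                       Original_City TEXT, Current_City TEXT);
INSERT INTO Customer VALUES (101, 'Jon', 'Arizona', 'Tucson', 'Tucson'),
                            (102, 'Tom', 'Arizona', 'Tucson', 'Tucson'),
                            (103, 'Mark', 'Arizona', 'Phoenix', 'Phoenix');
""")

def move(con, cid, new_city):
    """Method 2: only the current value is updated; the original value never changes."""
    con.execute("UPDATE Customer SET Current_City = ? WHERE CID = ?", (new_city, cid))

# Tom moves twice.
move(con, 102, "Flagstaff")
move(con, 102, "Phoenix")

tom = con.execute(
    "SELECT Original_City, Current_City FROM Customer WHERE CID = 102"
).fetchone()
print(tom)
```

The row ends up holding Tucson and Phoenix only; the second move overwrote the only place where Flagstaff was stored.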

20

D1: Slowly changing dimension

Method 3: Add a record whenever a dimension attribute changes.

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

21

D1: Slowly changing dimension

Method 3 keeps all the information. However, is there any problem?
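One likely answer, judging from the warehouse-key slides that follow, is that the natural key CID stops being unique once a second record is added for the same customer. A minimal sketch in SQLite, assuming CID is the primary key of the customer dimension:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT, City TEXT);
INSERT INTO Customer VALUES (101, 'Jon', 'Arizona', 'Tucson'),
                            (102, 'Tom', 'Arizona', 'Tucson'),
                            (103, 'Mark', 'Arizona', 'Phoenix');
""")

# Method 3: try to add a new record for Tom after his move to Phoenix.
try:
    con.execute("INSERT INTO Customer VALUES (102, 'Tom', 'Arizona', 'Phoenix')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # CID 102 already exists, so the primary key constraint rejects the row.
    duplicate_rejected = True

row_count = con.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(duplicate_rejected, row_count)
```

Dropping the primary key would let the row in, but then CID 102 would match two customer rows in every join, so each of Tom's sales would be counted twice.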

22

D1: Slowly changing dimension

Method 4: warehouse key + method 3.

A warehouse key is a sequence of non-negative integers that serves as the primary key of a table in the data warehouse.

CID | CName | State | City
101 | Jon | Arizona | Tucson
102 | Tom | Arizona | Tucson
103 | Mark | Arizona | Phoenix

[Figure label: Warehouse key]
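A minimal sketch of method 4 in SQLite follows. The column name Customer_Key is taken from the star schema in these slides; the sales amounts are hypothetical, and the sketch relies on SQLite assigning the next integer to an INTEGER PRIMARY KEY column when no value is supplied.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Customer_Key is the warehouse key; CID is kept as an ordinary attribute.
CREATE TABLE Customer (Customer_Key INTEGER PRIMARY KEY, CID INTEGER,
                       CName TEXT, State TEXT, City TEXT);
INSERT INTO Customer VALUES (1, 101, 'Jon', 'Arizona', 'Tucson'),
                            (2, 102, 'Tom', 'Arizona', 'Tucson'),
                            (3, 103, 'Mark', 'Arizona', 'Phoenix');
CREATE TABLE Sales (Customer_Key INTEGER REFERENCES Customer (Customer_Key), Sales REAL);
-- Hypothetical sale made while Tom lived in Tucson.
INSERT INTO Sales VALUES (2, 70.0);
""")

# Tom moves: add a new record; the database hands out the next warehouse key.
cur = con.execute(
    "INSERT INTO Customer (CID, CName, State, City) VALUES (102, 'Tom', 'Arizona', 'Phoenix')"
)
new_key = cur.lastrowid

# Hypothetical sale made after the move; it points at the new record.
con.execute("INSERT INTO Sales VALUES (?, ?)", (new_key, 40.0))

by_city = dict(con.execute(
    "SELECT City, SUM(Sales) FROM Sales S "
    "JOIN Customer C ON S.Customer_Key = C.Customer_Key GROUP BY City"
))
history = [row[0] for row in con.execute(
    "SELECT City FROM Customer WHERE CID = 102 ORDER BY Customer_Key"
)]
print(new_key, by_city, history)
```

Each sale stays attached to the city the customer lived in at the time, and the full move history of customer 102 can still be read back through CID.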

23

D1: Slowly changing dimension

Why is a warehouse key needed in a data warehouse?

It solves the slowly changing dimension problem.

Compared with natural keys (i.e., primary keys of tables in the RDB, such as CID of the customer table), warehouse keys give better join performance.

24

D1: Slowly changing dimension

Warehouse key

Primary keys in dimension tables are warehouse keys.

The primary key of a fact table is a combination of the warehouse keys of all or some of its associated dimensions.

25

D1: Slowly changing dimension

Notation: # marks a primary-key attribute (for example, # CUSTOMER_KEY in CUSTOMER).

26

D2: Time Dimension


27

D2: Time Dimension

A data warehouse needs an explicit time dimension table instead of just a time attribute (e.g., ORDERDATE).

Besides the time attribute, the time dimension table includes the following additional attributes:

Day_of_week (1-7)

Day_number_in_month (1-31)

Day_number_in_year (1-365, or 1-366 in leap years)

Week_number (1-52, or 1-53 depending on the week-numbering convention)

Month (1-12)

Quarter (1-4)

Holiday_flag (y/n)

Fiscal_quarter, Fiscal_year

28

D2: Time Dimension

A time dimension can:

Save computation effort and improve query performance.

Hide complex calendar calculations from end users of the data warehouse.
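The calendar calculations are done once, when the time dimension is populated. The sketch below builds one row per day with Python's datetime module. The function name, the holidays set, and the fiscal_year_start_month parameter are assumptions for illustration; in particular, labeling a fiscal year by the calendar year in which it starts and using ISO week numbers are conventions chosen here, not taken from the slides.

```python
from datetime import date, timedelta

def time_dimension_row(time_key, d, holidays=frozenset(), fiscal_year_start_month=1):
    """Build one TIME dimension row (as a dict) for calendar date d."""
    # 0-based position of this month within the fiscal year.
    fiscal_month_index = (d.month - fiscal_year_start_month) % 12
    return {
        "TIME_KEY": time_key,
        "ORDERDATE": d.isoformat(),
        "DAY_OF_WEEK": d.isoweekday(),            # 1 = Monday ... 7 = Sunday
        "DAY_NUMBER_IN_MONTH": d.day,
        "DAY_NUMBER_IN_YEAR": d.timetuple().tm_yday,
        "WEEK_NUMBER": d.isocalendar()[1],        # ISO week number, 1-53
        "MONTH": d.month,
        "QUARTER": (d.month - 1) // 3 + 1,
        "HOLIDAY_FLAG": "y" if d in holidays else "n",
        # Assumed convention: a fiscal year is labeled by the year it starts in.
        "FISCAL_YEAR": d.year if d.month >= fiscal_year_start_month else d.year - 1,
        "FISCAL_QUARTER": fiscal_month_index // 3 + 1,
    }

# Populate every day of 2016, with a hypothetical holiday list.
start = date(2016, 1, 1)
holidays = {date(2016, 1, 1)}
rows = [
    time_dimension_row(key, start + timedelta(days=key - 1), holidays)
    for key in range(1, 367)  # 2016 is a leap year, so 366 days
]

jan_13 = rows[12]
print(jan_13)
```

With the rows loaded into the TIME table, a report on holiday sales or sales by fiscal quarter becomes a simple filter on a column instead of date arithmetic in every query.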

29

D3: Snowflake


30

D3: Snowflake

Snowflake structure:

PRODUCT_CATEGORY: # PRODUCT_CATEGORY_KEY, * PCID, * PCNAME

PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PRODUCT_CATEGORY_KEY

SALES references CUSTOMER, TIME, and PRODUCT; PRODUCT in turn references PRODUCT_CATEGORY.

31

D3: Snowflake

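Snowflake structures should be avoided in data warehouse design.

Trade-off of avoiding the snowflake structure:

Advantage: better query performance, because fewer joins are needed.

Disadvantage: more storage space, because values such as PCNAME are repeated in every product row.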
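The trade-off can be seen by asking the same question of both designs. The sketch below uses SQLite; the table names Product_Snow and Product_Star are hypothetical, chosen so that both versions of PRODUCT can sit side by side in one database, and the sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Snowflake: PRODUCT points to a separate PRODUCT_CATEGORY table.
CREATE TABLE Product_Category (Product_Category_Key INTEGER PRIMARY KEY, PCID INTEGER, PCName TEXT);
CREATE TABLE Product_Snow (Product_Key INTEGER PRIMARY KEY, PID INTEGER, PName TEXT,
                           Product_Category_Key INTEGER);
-- Star: PCNAME is stored directly in every PRODUCT row.
CREATE TABLE Product_Star (Product_Key INTEGER PRIMARY KEY, PID INTEGER, PName TEXT, PCName TEXT);
CREATE TABLE Sales (Product_Key INTEGER, Sales REAL);
-- Hypothetical sample data.
INSERT INTO Product_Category VALUES (1, 1, 'Books'), (2, 2, 'Toys');
INSERT INTO Product_Snow VALUES (1, 10, 'Novel', 1), (2, 11, 'Atlas', 1), (3, 20, 'Robot', 2);
INSERT INTO Product_Star VALUES (1, 10, 'Novel', 'Books'), (2, 11, 'Atlas', 'Books'),
                                (3, 20, 'Robot', 'Toys');
INSERT INTO Sales VALUES (1, 20.0), (2, 15.0), (3, 25.0);
""")

# Sales by product category on the snowflake: two joins.
snow_sql = """
SELECT PC.PCName, SUM(S.Sales)
FROM Sales S
JOIN Product_Snow P ON S.Product_Key = P.Product_Key
JOIN Product_Category PC ON P.Product_Category_Key = PC.Product_Category_Key
GROUP BY PC.PCName ORDER BY PC.PCName
"""

# The same report on the star: one join.
star_sql = """
SELECT P.PCName, SUM(S.Sales)
FROM Sales S
JOIN Product_Star P ON S.Product_Key = P.Product_Key
GROUP BY P.PCName ORDER BY P.PCName
"""

snow_rows = con.execute(snow_sql).fetchall()
star_rows = con.execute(star_sql).fetchall()

# Storage side of the trade-off: how many times each design stores a category name.
names_in_snow = con.execute("SELECT COUNT(PCName) FROM Product_Category").fetchone()[0]
names_in_star = con.execute("SELECT COUNT(PCName) FROM Product_Star").fetchone()[0]
print(snow_rows, star_rows, names_in_snow, names_in_star)
```

Both queries produce the same report, but the star version needs one join fewer, at the cost of storing a category name once per product instead of once per category.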
