the data warehousethe data warehousesmiertsc/4397cis/roiger_chapter06.pdf · the data warehousethe...

40
The Data Warehouse The Data Warehouse Chapter 6

Upload: others

Post on 16-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

The Data WarehouseThe Data Warehouse

Chapter 6

Page 2: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

6.1 Operational Databasesp

Page 3: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Data Modeling and Normalizationg

O O R l i hi• One-to-One Relationships• One-to-Many Relationships• Many-to-Many Relationships

Page 4: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Data Modeling and Normalizationg

• First Normal FormFirst Normal Form• Second Normal Form

Thi d N l F• Third Normal Form

Page 5: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Type IDYear

Make

Income Range

Customer ID

Vehicle - Type Customer

Figure 6.1 A simple entity-relationship diagram

Page 6: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

The Relational ModelThe Relational Model

Page 7: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Table 6.1a • Relational Table for Vehicle-Type

Type ID Make Year

4371 Chevrolet 19956940 Cadillac 20004595 Chevrolet 20014595 Chevrolet 20012390 Cadillac 1997

Page 8: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Table 6.1b • Relational Table for Customer

C t ICustomer IncomeID Range ($) Type ID

0001 70–90K 23900002 30–50K 43710003 70 90K 69400003 70–90K 69400004 30–50K 45950005 70–90K 2390

Page 9: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Table 6.2 • Join of Tables 6.1a and 6.1b

C t ICustomer IncomeID Range ($) Type ID Make Year

0001 70–90K 2390 Cadillac 19970002 30–50K 4371 Chevrolet 19950003 70 90K 6940 Cadillac 20000003 70–90K 6940 Cadillac 20000004 30–50K 4595 Chevrolet 20010005 70–90K 2390 Cadillac 1997

Page 10: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

6.2 Data Warehouse Designg

Page 11: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

The Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile g , ,collection of data in support of management’s decision making process g g p(W.H. Inmon).”

Page 12: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Granularity

Granularity is a term used to describe the level of detail of stored information.

Page 13: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Dependent

ExternalData

pData Mart

Extract/Summarize Data

OperationalDatabase(s)

Decision Support SystemDataWarehouse

ETL Routine(Extract/Transform/Load)

IndependentData Mart

ReportData Mart

Figure 6.2 A data warehouse process model

Page 14: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Entering Data into the WarehouseEntering Data into the Warehouse

• Independent Data Mart• ETL (Extract, Transform, Load Routine)ETL (Extract, Transform, Load Routine)• Metadata

Page 15: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Structuring the Data Warehouse:Structuring the Data Warehouse: Two Methods

• Structure the warehouse model using the star schema

• Structure the warehouse model as a multidimensional arraymultidimensional array

Page 16: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

The Star SchemaThe Star Schema

• Fact Table• Dimension TablesDimension Tables• Slowly Changing Dimensions

Page 17: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Purchase Key Category1 Supermarket2 Travel & Entertainment

Purchase Dimension

3 A t & V hi l Time Dimension

.

...

3 Auto & Vehicle4 Retail5 Restarurant6 Miscellaneous

Time Key Month10 Jan

Time DimensionYearQuarterDay

15 2002...

.

.

.

.

.

.

.

.

.

.

.

.. .

Cardholder Key Purchase Key1 2

Fact TableAmountTime KeyLocation Key

101 14.5015 4 115 8.251 2 103 22.40...

.

.

.

.

.

.

.

.

.

.

.

.

Location Key Street10 425 Church St

Location DimensionRegionStateCity

SCCharleston 3..

.

...

.

...

GenderMale

.Female

Income Range50 - 70,000

.70 - 90,000

Cardholder Key Name1 John Doe

. .2 Sara Smith

Cardholder Dimension

Figure 6.3 A star schema for credit card purchases

. . .. ...

.

...

.

.

Page 18: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

The Multidimensionality of the St S hStar Schema

Page 19: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Cardholder Ci

1 2,10)

PurchaseKey A(C i,1

,2

Time Key

Location Key

Figure 6.4 Dimensions of the fact table shown in Figure 6.3

Page 20: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Additional Relational SchemasAdditional Relational Schemas

• Snowflake Schema• Constellation Schema

Page 21: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Time Key Month5 Dec

Time DimensionYearQuarterDay

431 20018 Jan 13 2002

10 J 15 2002Promotion Key Description

Promotion DimensionCost

1 t h 15 25...

.

.

.

.

.

.

.

.

.

.

.

.

10 Jan 15 2002...

.

.

.

.

.

.

1 watch promo 15.25

Purchase DimensionPurchase Key Category

1 Supermarket2 Travel & Entertainment

Purchase Dimension

3 Auto & Vehicle4 Retail5 Restarurant

Cardholder Key Purchase Key1 2

Purchase Fact TableAmountTime KeyLocation Key

101 14.5015 4 115 8.25

6 Miscellaneous

Cardholder Key Promotion Key1 1

Promotion Fact TableResponseTime Key

5 Yes2 1 5 No

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

1 2 103 22.40...

.

.

.

.

.

.

.

.

.

Location Key Street5 425 Church St

Location DimensionRegionStateCity

SCCharleston 3...

.

.

.

.

.

.

.

.

.

.

.

.

GenderMale

.

.

Female

Income Range50 - 70,000

.

.

70 - 90,000

Cardholder Key Name1 John Doe

.

...

2 Sara Smith

Cardholder Dimension

Figure 6.5 A constellation schema for credit card purchases and promotions

. .. .

Page 22: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Decision Support: Analyzing theDecision Support: Analyzing the Warehouse Data

• Reporting Data• Analyzing Data• Knowledge Discoveryg y

Page 23: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

6 3 On-line Analytical Processing6.3 On-line Analytical Processing

Page 24: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

OLAP O iOLAP Operations

• Slice – A single dimension operation• Dice – A multidimensional operationDice A multidimensional operation• Roll-up – A higher level of generalization

ill d A l l f d il• Drill-down – A greater level of detail• Rotation – View data from a new perspective

Page 25: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Month = Dec.

Region = TwoCategory = Vehicle

Count = 110Amount = 6,720Region Two

Dec.

Sep.

Oct.

Nov.

May

Jun.

Jul.

Aug.

pM

onth

Mar.

Feb.

Apr.

y

Jan.

FourThreeTwo

Supe

rmar

ket

Mis

cella

neou

s

Res

taur

ant

Trav

el

Ret

ail

Vehi

cle

Category

RegionOne

wo

Figure 6.6 A multidimensional cube for credit card purchases

Category

Page 26: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Concept Hierarchy

A mapping that allows attributes to beA mapping that allows attributes to beviewed from varying levels of detail.

Page 27: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Region

State

CityCity

Street Address

Figure 6.7 A concept hierarchy for location

Page 28: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Month = Oct./Nov/Dec.Category = SupermarketRegion = OneCategory = Supermarket

Q4

Q2

Q3

Tim

e

On

FourThreeTwoQ1

Supe

rmar

ket

isce

llane

ous

Res

taur

ant

Trav

el

Ret

ail

Vehi

cle

RegionOne

S

Mi

Category

Figure 6.8 Rolling up from months to quarters

Page 29: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

6.4 Excel Pivot Tables for Data AnalysisAnalysis

Page 30: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Creating a Simple Pivot Table

Page 31: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.9 A pivot table template

Page 32: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.10 A summary report for income range

Page 33: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.11 A pie chart for income range

Page 34: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

i bl f h iPivot Tables for HypothesisTestingTesting

Page 35: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.12 A pivot table showing age and credit card insurance choice

Page 36: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.13 Grouping the credit card promotion data by age

Page 37: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.14 PivotTable Layout Wizard

Page 38: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Creating a Multidimensional Pivot Table

Page 39: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Watch Promo = No

Magazine Promo = YesLife Insurance Promo = Yes

No

ch P

rom

o

YesWat

c

No

agazineo

No

Yes

Yes

Life Insurance Promo

MagaPromo

Figure 6.15 A credit card promotion cube

Page 40: The Data WarehouseThe Data Warehousesmiertsc/4397cis/Roiger_Chapter06.pdf · The Data WarehouseThe Data Warehouse Chapter 6. 6.1 Operational Databases. ... Data Warehouse ETL Routine

Figure 6.16 A pivot table with page variables for credit card promotions