best practices in data warehousing design and · pdf fileusing the best practices in any field...

32
Copyright ' 2003, SAS Institute Inc. All rights reserved. Best Practices in Data Warehousing Design and Implementation Rein Mertens SAS Netherlands

Upload: dinhdieu

Post on 28-Mar-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Copyright © 2003, SAS Institute Inc. All rights reserved.

Best Practices inData WarehousingDesign and ImplementationRein MertensSAS Netherlands

Copyright © 2003, SAS Institute Inc. All rights reserved. 2

Best Practices�

! A best practice is a technique or methodology that, through experience and research, has proven to reliably lead to a desired result. A commitment to using the best practices in any field is a commitment to using all the knowledge and technology at one's disposal to ensure success. The term is used frequently in the fields of health care, government administration, the education system, project management, hardware and software product development, and elsewhere.

! Using past experiences to plan future developments

Copyright © 2003, SAS Institute Inc. All rights reserved. 3

Data Warehousing approaches over time

! Use operational data direct? (�virtual dwh�)

! Monolithic data warehouse? (�big bang�)

! Data marts?

! Evolutionary approach? (�think big, start small�)

Copyright © 2003, SAS Institute Inc. All rights reserved. 4

Inventory

Product

Orders

Contacts

Billing

Operational sourcesystems

Departmental datawarehouses and

business unit specificdata marts

Marketing

Finance

Sales

Personal datamarts

External data

Metadata

WEB

WAP telephone

PDA

Workstation

WWW browser

Operational datastore

(b) An Information Architecture based on Data Martswithout an Enterprise Layer

Metadata

Metadata

?

Data MartsIndependent

Copyright © 2003, SAS Institute Inc. All rights reserved. 5

Data Warehousing in 2003 requires an Architected Environment! Supports all forms of information use:

� Suppliers/Organization/Customer/Enterprise� Query and reporting� Analytics / Data mining� Multi-dimensional analysis and OLAP

! Ensures end-to-end consistency:� Co-ordinates access to operational data� Foundation layer� Metadata support− Maintenance− User view

Copyright © 2003, SAS Institute Inc. All rights reserved. 6

Copyright © 2003, SAS Institute Inc. All rights reserved. 7

Data Warehousing in 2003

SAS

SAS

SAS

Staging

Step #1 - Extract and Transform from source data into Staging Area

SASSAS

SAS

Central Warehouse

Step #2 - Data QualityData ValidationSlowly changing dimensions

Finance

Customers

HCM

Suppliers

Step #3 - Transform into dimensional model

Excel

SAS

SAP

Oracle

PeopleSoft

Source

ETL StudioData Surveyors

ETL Studio transformationsData Quality plugins

More ETL flows

Copyright © 2003, SAS Institute Inc. All rights reserved. 8

Data Warehousing in 2003

SAS

SAS

SAS

Staging

SASSAS

SAS

Central Warehouse

Step #2 - Data QualityData ValidationSlowly changing dimensions

Finance

Customers

HCM

Suppliers

Step #3 - Transform into dimensional model

Excel

SAS

SAP

Oracle

PeopleSoft

Source

ETL Studio transformationsData Quality plugins

More ETL flows

Step #1 - Extract and Transform from source data into Staging Area

ETL StudioData Surveyors

Copyright © 2003, SAS Institute Inc. All rights reserved. 9

Working with operational data

! How do you know about your operational world?

! External metadata� Operational databases� Operational systems− May need navigational assistance

! (Near)realtime?

! Data Profiling!

Copyright © 2003, SAS Institute Inc. All rights reserved. 10

Data Analysis

Copyright © 2003, SAS Institute Inc. All rights reserved. 11

Data Warehousing in 2003

SAS

SAS

SAS

Staging

SASSAS

SAS

Central Warehouse

Step #2 - Data QualityData ValidationSlowly changing dimensions

Finance

Customers

HCM

Suppliers

Step #3 - Transform into dimensional model

Excel

SAS

SAP

Oracle

PeopleSoft

Source

ETL Studio transformationsData Quality plugins

More ETL flows

Step #1 - Extract and Transform from source data into Staging Area

ETL StudioData Surveyors

Copyright © 2003, SAS Institute Inc. All rights reserved. 12

The importance of staging areas

! Support an incremental approach

! A place for various data validation� Data cleansing� Data content validation� Data integrity� Key validation

! Slowly changing dimension management

Copyright © 2003, SAS Institute Inc. All rights reserved. 13

Copyright © 2003, SAS Institute Inc. All rights reserved. 14

SAS

SAS

SAS

Staging

SASSAS

SAS

Central Warehouse

Step #2 - Data QualityData ValidationSlowly changing dimensions

Finance

Customers

HCM

Suppliers

Step #3 - Transform into dimensional model

Excel

SAS

SAP

Oracle

PeopleSoft

Source

ETL Studio transformationsData Quality plugins

More ETL flows

Data Warehousing in 2003

Step #1 - Extract and Transform from source data into Staging Area

ETL StudioData Surveyors

Copyright © 2003, SAS Institute Inc. All rights reserved. 15

What is a central data model?

! Normalized relational information

! Foundation Layer

! Example: Orion Star sample data model

Copyright © 2003, SAS Institute Inc. All rights reserved. 16

ContinentContinent_Id: INTEGER

Continent_Name: CHARACTER(30

CountryCountry: CHARACTER(2)

Country_Name: CHARACTER(45)Population: NUMERIC(6)Office: CHARACTER(2)Dir: CHARACTER(3)Country_Id: INTEGERContinent_Id: INTEGER (FK)Country_Former_Nam: CHARACTER(45

CustomerCustomer_Id: INTEGER

Country: CHARACTER(2)Gender: CHARACTER(1)Personal_Id: CHARACTER(15)Customer_Name: CHARACTER(40)Customer_Firstname: CHARACTER(20Customer_Lastname: CHARACTER(30Birthday: DATECustomer_Address: CHARACTER(40)Street_Id: INTEGER (FK)Street_Number: CHARACTER(8)Customer_Type_Id: INTEGER (FK)

Customer_TypeCustomer_Type_Id: INTEGER

Customer_Type: CHARACTER(40)Customer_Group_Id: INTEGERCustomer_Group: CHARACTER(40

OrderOrder_Id: INTEGER

Employee_Id: INTEGER (FKCustomer_Id: INTEGER (FKOrder_Date: DATEDelivery_Date: DATEOrder_Type: INTEGER

Order_ItemOrder_Id: INTEGER (FK)Order_Item_No: INTEGER

Product_Id: INTEGER (FK)Amount: SMALLINTPrice: DECIMAL(12,2)Unit_Cost_Price: DECIMAL(12,2Promotion: DECIMAL(5,2)

OrganizationEmployee_Id: INTEGER

Org_Name: CHARACTER(40)Country: CHARACTER(2)Org_Level_Id: INTEGER (FK)Start_Date: DATEEnd_Date: DATEOrg_Ref_Id: INTEGER (FK)

Org_LevelOrg_Level_Id: INTEGER

Org_Text: CHARACTER(40)Price_List

Product_Id: INTEGER (FK)Start_Date: DATE

End_Date: DATEUnit_Cost_Price: DECIMAL(12,2)Unit_Sales_Price: DECIMAL(12,2

ProductProduct_Id: INTEGER

Product_Name: CHARACTER(45Supplier_Id: INTEGER (FK)Product_Level_Id: INTEGER (FK)Product_Level: NUMERIC(3)Product_Ref_Id: INTEGER (FK)

Street_CodeStreet_Id: INTEGER

Country: CHARACTER(2)Street_Name: CHARACTER(30Zip_Code: CHARACTER(10)From_Street_No: NUMERIC(8)To_Street_No: NUMERIC(8)City: CHARACTER(22)County: CHARACTER(25)State: CHARACTER(2)City_Id: INTEGERCounty_Id: INTEGERState_Id: INTEGER

SupplierSupplier_Id: INTEGER

Supplier_Name: CHARACTER(30)Street_Id: INTEGER (FK)Supplier_Address: CHARACTER(30Supplier_Street_Nu: NUMERIC(3)Country: CHARACTER(2)

Product_LevelProduct_Level_Id: INTEGER

Product_Level_Name: CHARACTER(30

StaffEmployee_Id: INTEGER (FKStart_Date: DATE

Salary: DECIMAL(12,2)Birthday: DATEEnd_Date: DATEEmp_Hire_Date: DATEGender: CHARACTER(1)Emp_Term_Date: DATEJob_Title: CHARACTER(25)

PromotionProduct_Id: INTEGER (FK)Start_Date: DATE

End_Date: DATESales_Price: DECIMAL(12,2Promotion: DECIMAL(5,2)

Physical Normalized Model

Copyright © 2003, SAS Institute Inc. All rights reserved. 17

Metadata import

! Common Warehouse Metamodel (CWM)

! Meta Integration Model Bridge (MIMB)

! Registers lots of metadata at once

Copyright © 2003, SAS Institute Inc. All rights reserved. 18

Copyright © 2003, SAS Institute Inc. All rights reserved. 19

Data Warehousing in 2003

SAS

SAS

SAS

Staging

SASSAS

SAS

Central Warehouse

Step #2 - Data QualityData ValidationSlowly changing dimensions

Finance

Customers

HCM

Suppliers

Step #3 - Transform into dimensional model

Excel

SAS

SAP

Oracle

PeopleSoft

Source

ETL Studio transformationsData Quality plugins

More ETL flows

Step #1 - Extract and Transform from source data into Staging Area

ETL StudioData Surveyors

Copyright © 2003, SAS Institute Inc. All rights reserved. 20

Working with dimensional models

! Purpose-oriented data mart

! Dimension tables

! Fact tables with measures

Copyright © 2003, SAS Institute Inc. All rights reserved. 21

Time_DimDate: INTEGER

Weekday_No: SMALLINTWeekday_Eu: SMALLINTWeekday_Name: VARCHAR(20)Week_No: SMALLINTWeek_Name: CHARACTER(7)Month_No: SMALLINTMonth_Name: VARCHAR(20)Quarter: CHARACTER(6)Year: CHARACTER(4)Holiday_Us: VARCHAR(45)

Product_DimProduct_Id: INTEGER

Product_Name: CHARACTER(45)Supplier_Id: INTEGERProduct_Group: CHARACTER(40)Product_Category: CHARACTER(40)Product_Line: CHARACTER(40)

Customer_DimCustomer_Id: INTEGER

Customer_Country: CHARACTER(2)Gender: CHARACTER(1)Customer_Name: CHARACTER(40)Customer_Firstname: CHARACTER(20)Customer_Lastname: CHARACTER(30)Birthday: DATECustomer_Type: CHARACTER(40)Customer_Group: CHARACTER(40)Age: DECIMAL(3) Order_Fact

Customer_Id: INTEGER (FK)Employee_Id: INTEGER (FK)Street_Id: INTEGER (FK)Product_Id: INTEGER (FK)Order_Date: DATE (FK)Order_Id: INTEGER

Order_Type: SMALLINTDelivery_Date: DATEAmount: DECIMAL(5)Price: NUMERIC(8,2)Cost_Price_Pr_Unit: DECIMAL(12,2)Promotion: DECIMAL(5,2)

Organization_DimEmployee_Id: INTEGER

Country: CHARACTER(2)Employee_Name: CHARACTER(40)Group: CHARACTER(40)Section: CHARACTER(40)Department: CHARACTER(40)Company: CHARACTER(40)Job_Title: CHARACTER(25)Gender: CHARACTER(1)Salary: DECIMAL(12,2)Birthday: DATEEmp_Hire_Date: DATEEmp_Term_Date: DATEStart_Date: DATEEnd_Date: DATE

Geography_DimStreet_Id: INTEGER

Street_Name: CHARACTER(40)Zipcode: CHARACTER(10)City: CHARACTER(40)County: VARCHAR(40)Province: CHARACTER(40)Region: CHARACTER(40)State: VARCHAR(30)Country: CHARACTER(2)Continent: CHARACTER(40)

Physical Data Model of Order Star Schema

Copyright © 2003, SAS Institute Inc. All rights reserved. 22

Next Steps

! OLAP� Design cubes for common queries/reports� Star schema is good basis for this

! Semantic layer for end user reporting� SAS Information Map Studio

Copyright © 2003, SAS Institute Inc. All rights reserved. 23

Copyright © 2003, SAS Institute Inc. All rights reserved. 24

Copyright © 2003, SAS Institute Inc. All rights reserved. 25

SAS® Information Map StudioHR Information Map

HR Data

Copyright © 2003, SAS Institute Inc. All rights reserved. 26

How to develop

! Document it!

! Multi-developer projects

! Check-in, check-out

! Audit history

! Dev, Test, Prod environments

! Reusable added code

Copyright © 2003, SAS Institute Inc. All rights reserved. 27

How to deploy

! Execute ETL in a production environment

! Easily managed deployment

! Schedule for periodic runs� Uses integrated LSF Scheduler from Platform Computing� Ready for use with standard schedulers too− cron (unix), at (Windows)

Copyright © 2003, SAS Institute Inc. All rights reserved. 28

Copyright © 2003, SAS Institute Inc. All rights reserved. 29

Successful Data Warehouse implementations

! Shift corporate culture to rely upon DWH

! Align planning with corporate strategy

! Provide adequate resources to maintain/expand

! Demonstrate return-on-investment

! Build executive support

! Maintain complete and up-to-date content

! Employ correct technology/Upgrade at best freq.

! Plan sufficient training

Copyright © 2003, SAS Institute Inc. All rights reserved. 30

Successful Data Warehouse implementations

! Shift corporate culture to rely upon DWH 92% ! Align planning with corporate strategy 75%! Provide adequate resources to maintain/expand 75%! Demonstrate return-on-investment 50% ! Build executive support 42% ! Maintain complete and up-to-date content 42% ! Employ correct technology/Upgrade at best freq. 42% ! Plan sufficient training 33%

Copyright © 2003, SAS Institute Inc. All rights reserved. 31

Summary

! Planning is key

! Architected environment

! Don�t be afraid of ERP data

! Integrate cleansing and validation

! Don�t underestimate the value of data modeling

! Metadata

! Be sure the DWH is used

Copyright © 2003, SAS Institute Inc. All rights reserved. 32Copyright © 2003, SAS Institute Inc. All rights reserved. 32