govindarajan data vault

29
1 Data Vault Modeling The Next Generation DW Approach http://www.globytes.com http://www.globytes.com http://www.globytes.com http://www.globytes.com [email protected] [email protected] [email protected] [email protected] Presented by: Siva Govindarajan 

Upload: vamshig999

Post on 07-Apr-2018

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 1/29

1

Data Vault ModelingThe Next Generation DW Approach

http://www.globytes.comhttp://www.globytes.comhttp://www.globytes.comhttp://www.globytes.com

[email protected]@[email protected]@globytes.com

Presented by: Siva Govindarajan 

Page 2: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 2/29

2

Introduction

Data Vault Place in Evolution

Data Warehousing Architecture

Data Vault Components Case Study

Typical 3NF / Star Schema

Data Vault Approach

Temporal Data + Auditability

Implementation of Data Vault

Conclusion

Data Vault ModelingAgenda

Page 3: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 3/29

3

Data Vault concept originally conceived

 by Dan Linstedt Enterprise Data Warehouse Modeling

approach.

Hybrid approach - 3NF & Star Schema Adaptable to changes

Auditable

Timed right with Technology

Introduction

Page 4: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 4/29

4

1960s - Codd, Date et. al Normal Forms

1980s - Normal Forms adapted to DWs

1985+ - Star Schema for OLAP

1990s - Data Vault concept developedDan Linstedt

2000+ - Data Vault concept published by

Dan Linstedt

Data Vault Place in Evolution

Page 5: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 5/29

5

Data Warehousing Architecture

Bill Inmon’s Data Warehouse Architecture

Data Mart

Data Mart

Data Marts

Data Mart

Data Mart

Data MartsEnterprise

DW

Staging(optional)

OLTP

SystemsStaging

& ODS

  A d  a  p

  t  i  v e

  M  a  r  t  s

Kimball’s Data Warehouse Bus Architecture

Data Mart

OLTPSystems Staging

Data Mart

Data Mart

Data Mart

Page 6: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 6/29

6

Hub Entities

Candidate Keys + Load Time + Source

Link Entities

FKs from Hub + Load Time + Source

Satellite Entities

Descriptive Data + Load Time + Source + End Time

Dimension = Hub+ Satellite

Fact = Satellite + Link [+ Hub ]

Data Vault Components

Page 7: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 7/29

7

Modeling Approaches

Typical 3NF / Star Schema Data Vault approach

Scenario

Typical Fact and Dimension  New Dimension data

Reference Data change - 1:M to M:M

Additional attributes to Dimension

Case Study

Page 8: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 8/29

8

Typical 3F / Star Schema

TRORDER

tr_order_id

dim_product_id

systemorderid

systemrootorderid

globalrootorderid

orderquantity

totalexecutedqty

eventtimestamp

DIMPRODUCT

dim_product_id

product_id

product_code

productdesc

product_name

1. Typical Fact and Dimension

Page 9: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 9/29

9

Typical 3F / Star Schema

DIMPRODUCT

dim_product_id

product_id

product_code

productdesc

product_name

product_category_i dproduct_category_i dproduct_category_i dproduct_category_i d

product_category_nameproduct_category_nameproduct_category_nameproduct_category_name

p r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d e

p r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s c

i s_strategyi s_strategyi s_strategyi s_strategy

TRORDER

tr_order_id

dim_product_id

systemorderid

systemrootorderid

globalrootorderid

orderquantity

totalexecutedqty

eventtimestamp

RE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYproduct_category_i dproduct_category_i dproduct_category_i dproduct_category_i d

product_category_nameproduct_category_nameproduct_category_nameproduct_category_name

p r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d e

p r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s c

i s_strategyi s_strategyi s_strategyi s_strategy

2. ew Dimension Data

Page 10: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 10/29

10

Typical 3F / Star Schema

DIMPRODUCT

dim_product_id

product_id

product_code

productdesc

product_name

TRORDER

tr_order_id

dim_product_id

systemorderid

systemrootorderid

globalrootorderid

orderquantity

totalexecutedqty

eventtimestampdi m_product_category_i ddi m_product_category_i ddi m_product_category_i ddi m_product_category_i d

RE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORY

product_category_i dproduct_category_i dproduct_category_i dproduct_category_i d

product_category_nameproduct_category_nameproduct_category_nameproduct_category_name

p r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d e

p r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s c

i s_strategyi s_strategyi s_strategyi s_strategy

DIMP RODUCTCA TE GORYDIMP RODUCTCA TE GORYDIMP RODUCTCA TE GORYDIMP RODUCTCA TE GORY

di m_product_category_i ddi m_product_category_i ddi m_product_category_i ddi m_product_category_i d

product_category_code

product_category_name

product_category_desc

is_strategy

2. ew Dimension Data

Page 11: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 11/29

Page 12: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 12/29

12

Typical 3F / Star Schema

DIMPRODUCT

dim_product_id

product_id

product_code

productdesc

product_name

lengthlengthlengthlength

heightheightheightheight

weightweightweightweightp a c k a g i n gp a c k a g i n gp a c k a g i n gp a c k a g i n g

c o l o rc o l o rc o l o rc o l o r

sizesizesizesize

TRORDER

tr_order_id

dim_product_id

systemorderid

systemrootorderid

globalrootorderid

orderquantity

totalexecutedqty

eventtimestampdim_product_category_iddim_product_category_iddim_product_category_iddim_product_category_id

4. Additional attributes to Dimension

Page 13: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 13/29

13

Data Vault Approach1. Typical Hub, Satellite and Link 

(Replacing Fact/Dimension)

HUB_PRODUCT

hub_product_idhub_product_idhub_product_idhub_product_id

hub_record_sourcehub_record_sourcehub_record_sourcehub_record_source

hub_load_datehub_load_datehub_load_datehub_load_date

product_id

product_code

SAT_PRODUCT01SAT_PRODUCT01SAT_PRODUCT01SAT_PRODUCT01

sat_product01_idsat_product01_idsat_product01_idsat_product01_id

hub_product_idhub_product_idhub_product_idhub_product_id

sat_load_datesat_load_datesat_load_datesat_load_date

sat_end_datesat_end_datesat_end_datesat_end_date

sat_record_sourcesat_record_sourcesat_record_sourcesat_record_source

productdesc

product_name

SATORDER01SATORDER01SATORDER01SATORDER01

sat_order01_idsat_order01_idsat_order01_idsat_order01_id

hub_order_idhub_order_idhub_order_idhub_order_id

sat_load_datesat_load_datesat_load_datesat_load_date

sat_end_datesat_end_datesat_end_datesat_end_date

sat_record_sourcesat_record_sourcesat_record_sourcesat_record_source

orderquantity

totalexecutedqty

LNK_ORDER

lnk_order_idlnk_order_idlnk_order_idlnk_order_id

lnk_load_datelnk_load_datelnk_load_datelnk_load_date

lnk_record_sourcelnk_record_sourcelnk_record_sourcelnk_record_source

hub_product_idhub_product_idhub_product_idhub_product_id

hub_order_idhub_order_idhub_order_idhub_order_id

HUB_ORDERhub_order_idhub_order_idhub_order_idhub_order_id

hub_record_sourcehub_record_sourcehub_record_sourcehub_record_source

hub_load_datehub_load_datehub_load_datehub_load_date

tr_order_id

systemorderid

systemrootorderid

globalrootorderid

Page 14: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 14/29

Page 15: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 15/29

Page 16: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 16/29

16

Data Vault Approach4. Additional attributes to Dimension

HUB_PRODUCT

hub_product_idhub_product_idhub_product_idhub_product_id

hub_ r ecor d_ sour cehub_ r ecor d_ sour cehub_ r ecor d_ sour cehub_ r ecor d_ sour ce

hub_ l oad_ dat ehub_ l oad_ dat ehub_ l oad_ dat ehub_ l oad_ dat e

product_id

product_code

SAT_PRODUCT01

sat_product01_idsat_product01_idsat_product01_idsat_product01_id

hub_product_idhub_product_idhub_product_idhub_product_id

sat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat e

sat _ end_ dat esat _ end_ dat esat _ end_ dat esat _ end_ dat e

sat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour ce

productdesc

product_name

S A T _ P R O D U C T 0 2S A T _ P R O D U C T 0 2S A T _ P R O D U C T 0 2S A T _ P R O D U C T 0 2

sat_product02_idsat_product02_idsat_product02_idsat_product02_id

hub_product_idhub_product_idhub_product_idhub_product_id

sat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat e

sat _ end_ dat esat _ end_ dat esat _ end_ dat esat _ end_ dat e

sat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour ce

length

height

weight

packaging

color

Page 17: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 17/29

Page 18: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 18/29

18

Make it Simple

View for Dimension

View For Fact

Data Loading

Take advantage of Technology

DW Appliances

High-throughput storage devices

RDBMS Features

Implementation of Data Vault

Page 19: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 19/29

19

Create views for Dimension

From our Demo model, Join Product related Hubs and

Satellites to build View for Product Dimension as shown

 below :

HUB_PRODUCT

SAT_PRODUCT01

SAT_PRODUCT02

Make it Simple

Page 20: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 20/29

20

Create Views For Fact

Join Order related Hubs, Satellites and Links to buildView for ORDER Fact using ORDER related Hubs/Satellites and Links.

HUB_ORDER 

SAT_ORDER01

LNK_ORDER 

Make it Simple

Page 21: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 21/29

21

Typically data loading for a Data Vault is in thefollowing sequence.

Hubs for Dimensions Links for Dimensions

Satellites for Dimensions

Hubs for Fact (if any ) Links for Fact

Satellites for Fact

Refer to Data Vault Series article 5 of Linstedt for 

further details

Data Loading

Page 22: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 22/29

22

 Next wave of IT Data Warehouse hardware solutions are

the Data Warehouse Appliances. Take advantage of thetechnology where possible. Few DW Appliances in the

market :

Oracle Exadata

 Netezza

Teradata

RedBrick 

GreenPlum

DW Appliances

Page 23: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 23/29

23

Utilize Storage devices designed for DW / OLAP

applications. Following are some examples only andwould change over time. Please do your research based

on your sizing requirements.

EMC CLARiiON Storage family

Texas Memory systems RamSan – Solid State

devices

Hitachi USP / VSP Storage family

Other Hybrid solutions with High performancestorage and Solid State devices

High-throughput storage devices

Page 24: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 24/29

24

Some Oracle Features catering DW / OLAP

applications: Exadata Smart Scans

OLAP Based Materialized views

Partitioning Reference partition Advanced data compression

Automatic degree of parallelism (ADOP).

Star query optimization

Oracle OLAP/DWH Features.

RDBMS Features

Page 25: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 25/29

25

ConclusionIn this presentation we have addressed high level overview of Data

Vault Architecture. Topics discussed are

Data Vault concept and architecture

Data Vault Components such as Hubs, Satellites and Link tables

Typical modeling challenges with traditional modeling

approaches

How those challenges could be handled using Data VaultModeling Approach.

Auditing and Temporal data capture using DV Approach.

And finally, some implementation details

If interested in learning more, tryDan Linstedt's special coaching area at:

http://danLinstedt.com/my-coach.

Page 26: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 26/29

26

Questions ?

?

Page 27: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 27/29

27

References

Data Vault Series articles by Dan E. Linstedt --

http://www.tdan.com

Referred few articles from Genesee Academy --

http://geneseeacademy.com

Articles and books by Ralph Kimball and Bill Inmon

from multiple sources.

Technical Documents from

http://www.oracle.com/technetwork 

Page 28: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 28/29

28

 Special Thanks to

Dan Linstedt for review and corrections to the article.

My colleague Mark Bruscke for his assistance in preparing

the article.

Page 29: Govindarajan Data Vault

8/3/2019 Govindarajan Data Vault

http://slidepdf.com/reader/full/govindarajan-data-vault 29/29

29

The End

Contact Email :

[email protected]