govindarajan data vault
TRANSCRIPT
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 1/29
1
Data Vault ModelingThe Next Generation DW Approach
http://www.globytes.comhttp://www.globytes.comhttp://www.globytes.comhttp://www.globytes.com
[email protected]@[email protected]@globytes.com
Presented by: Siva Govindarajan
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 2/29
2
Introduction
Data Vault Place in Evolution
Data Warehousing Architecture
Data Vault Components Case Study
Typical 3NF / Star Schema
Data Vault Approach
Temporal Data + Auditability
Implementation of Data Vault
Conclusion
Data Vault ModelingAgenda
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 3/29
3
Data Vault concept originally conceived
by Dan Linstedt Enterprise Data Warehouse Modeling
approach.
Hybrid approach - 3NF & Star Schema Adaptable to changes
Auditable
Timed right with Technology
Introduction
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 4/29
4
1960s - Codd, Date et. al Normal Forms
1980s - Normal Forms adapted to DWs
1985+ - Star Schema for OLAP
1990s - Data Vault concept developedDan Linstedt
2000+ - Data Vault concept published by
Dan Linstedt
Data Vault Place in Evolution
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 5/29
5
Data Warehousing Architecture
Bill Inmon’s Data Warehouse Architecture
Data Mart
Data Mart
Data Marts
Data Mart
Data Mart
Data MartsEnterprise
DW
Staging(optional)
OLTP
SystemsStaging
& ODS
A d a p
t i v e
M a r t s
Kimball’s Data Warehouse Bus Architecture
Data Mart
OLTPSystems Staging
Data Mart
Data Mart
Data Mart
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 6/29
6
Hub Entities
Candidate Keys + Load Time + Source
Link Entities
FKs from Hub + Load Time + Source
Satellite Entities
Descriptive Data + Load Time + Source + End Time
Dimension = Hub+ Satellite
Fact = Satellite + Link [+ Hub ]
Data Vault Components
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 7/29
7
Modeling Approaches
Typical 3NF / Star Schema Data Vault approach
Scenario
Typical Fact and Dimension New Dimension data
Reference Data change - 1:M to M:M
Additional attributes to Dimension
Case Study
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 8/29
8
Typical 3F / Star Schema
TRORDER
tr_order_id
dim_product_id
systemorderid
systemrootorderid
globalrootorderid
orderquantity
totalexecutedqty
eventtimestamp
DIMPRODUCT
dim_product_id
product_id
product_code
productdesc
product_name
1. Typical Fact and Dimension
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 9/29
9
Typical 3F / Star Schema
DIMPRODUCT
dim_product_id
product_id
product_code
productdesc
product_name
product_category_i dproduct_category_i dproduct_category_i dproduct_category_i d
product_category_nameproduct_category_nameproduct_category_nameproduct_category_name
p r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d e
p r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s c
i s_strategyi s_strategyi s_strategyi s_strategy
TRORDER
tr_order_id
dim_product_id
systemorderid
systemrootorderid
globalrootorderid
orderquantity
totalexecutedqty
eventtimestamp
RE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYproduct_category_i dproduct_category_i dproduct_category_i dproduct_category_i d
product_category_nameproduct_category_nameproduct_category_nameproduct_category_name
p r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d e
p r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s c
i s_strategyi s_strategyi s_strategyi s_strategy
2. ew Dimension Data
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 10/29
10
Typical 3F / Star Schema
DIMPRODUCT
dim_product_id
product_id
product_code
productdesc
product_name
TRORDER
tr_order_id
dim_product_id
systemorderid
systemrootorderid
globalrootorderid
orderquantity
totalexecutedqty
eventtimestampdi m_product_category_i ddi m_product_category_i ddi m_product_category_i ddi m_product_category_i d
RE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORYRE FP RODUCTCA TE GORY
product_category_i dproduct_category_i dproduct_category_i dproduct_category_i d
product_category_nameproduct_category_nameproduct_category_nameproduct_category_name
p r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d ep r o d u c t _ c a t e g o r y _ c o d e
p r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s cp r o d u c t _ c a t e g o r y _ d e s c
i s_strategyi s_strategyi s_strategyi s_strategy
DIMP RODUCTCA TE GORYDIMP RODUCTCA TE GORYDIMP RODUCTCA TE GORYDIMP RODUCTCA TE GORY
di m_product_category_i ddi m_product_category_i ddi m_product_category_i ddi m_product_category_i d
product_category_code
product_category_name
product_category_desc
is_strategy
2. ew Dimension Data
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 11/29
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 12/29
12
Typical 3F / Star Schema
DIMPRODUCT
dim_product_id
product_id
product_code
productdesc
product_name
lengthlengthlengthlength
heightheightheightheight
weightweightweightweightp a c k a g i n gp a c k a g i n gp a c k a g i n gp a c k a g i n g
c o l o rc o l o rc o l o rc o l o r
sizesizesizesize
TRORDER
tr_order_id
dim_product_id
systemorderid
systemrootorderid
globalrootorderid
orderquantity
totalexecutedqty
eventtimestampdim_product_category_iddim_product_category_iddim_product_category_iddim_product_category_id
4. Additional attributes to Dimension
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 13/29
13
Data Vault Approach1. Typical Hub, Satellite and Link
(Replacing Fact/Dimension)
HUB_PRODUCT
hub_product_idhub_product_idhub_product_idhub_product_id
hub_record_sourcehub_record_sourcehub_record_sourcehub_record_source
hub_load_datehub_load_datehub_load_datehub_load_date
product_id
product_code
SAT_PRODUCT01SAT_PRODUCT01SAT_PRODUCT01SAT_PRODUCT01
sat_product01_idsat_product01_idsat_product01_idsat_product01_id
hub_product_idhub_product_idhub_product_idhub_product_id
sat_load_datesat_load_datesat_load_datesat_load_date
sat_end_datesat_end_datesat_end_datesat_end_date
sat_record_sourcesat_record_sourcesat_record_sourcesat_record_source
productdesc
product_name
SATORDER01SATORDER01SATORDER01SATORDER01
sat_order01_idsat_order01_idsat_order01_idsat_order01_id
hub_order_idhub_order_idhub_order_idhub_order_id
sat_load_datesat_load_datesat_load_datesat_load_date
sat_end_datesat_end_datesat_end_datesat_end_date
sat_record_sourcesat_record_sourcesat_record_sourcesat_record_source
orderquantity
totalexecutedqty
LNK_ORDER
lnk_order_idlnk_order_idlnk_order_idlnk_order_id
lnk_load_datelnk_load_datelnk_load_datelnk_load_date
lnk_record_sourcelnk_record_sourcelnk_record_sourcelnk_record_source
hub_product_idhub_product_idhub_product_idhub_product_id
hub_order_idhub_order_idhub_order_idhub_order_id
HUB_ORDERhub_order_idhub_order_idhub_order_idhub_order_id
hub_record_sourcehub_record_sourcehub_record_sourcehub_record_source
hub_load_datehub_load_datehub_load_datehub_load_date
tr_order_id
systemorderid
systemrootorderid
globalrootorderid
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 14/29
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 15/29
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 16/29
16
Data Vault Approach4. Additional attributes to Dimension
HUB_PRODUCT
hub_product_idhub_product_idhub_product_idhub_product_id
hub_ r ecor d_ sour cehub_ r ecor d_ sour cehub_ r ecor d_ sour cehub_ r ecor d_ sour ce
hub_ l oad_ dat ehub_ l oad_ dat ehub_ l oad_ dat ehub_ l oad_ dat e
product_id
product_code
SAT_PRODUCT01
sat_product01_idsat_product01_idsat_product01_idsat_product01_id
hub_product_idhub_product_idhub_product_idhub_product_id
sat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat e
sat _ end_ dat esat _ end_ dat esat _ end_ dat esat _ end_ dat e
sat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour ce
productdesc
product_name
S A T _ P R O D U C T 0 2S A T _ P R O D U C T 0 2S A T _ P R O D U C T 0 2S A T _ P R O D U C T 0 2
sat_product02_idsat_product02_idsat_product02_idsat_product02_id
hub_product_idhub_product_idhub_product_idhub_product_id
sat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat esat _ l oad_ dat e
sat _ end_ dat esat _ end_ dat esat _ end_ dat esat _ end_ dat e
sat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour cesat _ r ecor d_ sour ce
length
height
weight
packaging
color
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 17/29
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 18/29
18
Make it Simple
View for Dimension
View For Fact
Data Loading
Take advantage of Technology
DW Appliances
High-throughput storage devices
RDBMS Features
Implementation of Data Vault
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 19/29
19
Create views for Dimension
From our Demo model, Join Product related Hubs and
Satellites to build View for Product Dimension as shown
below :
HUB_PRODUCT
SAT_PRODUCT01
SAT_PRODUCT02
Make it Simple
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 20/29
20
Create Views For Fact
Join Order related Hubs, Satellites and Links to buildView for ORDER Fact using ORDER related Hubs/Satellites and Links.
HUB_ORDER
SAT_ORDER01
LNK_ORDER
Make it Simple
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 21/29
21
Typically data loading for a Data Vault is in thefollowing sequence.
Hubs for Dimensions Links for Dimensions
Satellites for Dimensions
Hubs for Fact (if any ) Links for Fact
Satellites for Fact
Refer to Data Vault Series article 5 of Linstedt for
further details
Data Loading
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 22/29
22
Next wave of IT Data Warehouse hardware solutions are
the Data Warehouse Appliances. Take advantage of thetechnology where possible. Few DW Appliances in the
market :
Oracle Exadata
Netezza
Teradata
RedBrick
GreenPlum
DW Appliances
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 23/29
23
Utilize Storage devices designed for DW / OLAP
applications. Following are some examples only andwould change over time. Please do your research based
on your sizing requirements.
EMC CLARiiON Storage family
Texas Memory systems RamSan – Solid State
devices
Hitachi USP / VSP Storage family
Other Hybrid solutions with High performancestorage and Solid State devices
High-throughput storage devices
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 24/29
24
Some Oracle Features catering DW / OLAP
applications: Exadata Smart Scans
OLAP Based Materialized views
Partitioning Reference partition Advanced data compression
Automatic degree of parallelism (ADOP).
Star query optimization
Oracle OLAP/DWH Features.
RDBMS Features
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 25/29
25
ConclusionIn this presentation we have addressed high level overview of Data
Vault Architecture. Topics discussed are
Data Vault concept and architecture
Data Vault Components such as Hubs, Satellites and Link tables
Typical modeling challenges with traditional modeling
approaches
How those challenges could be handled using Data VaultModeling Approach.
Auditing and Temporal data capture using DV Approach.
And finally, some implementation details
If interested in learning more, tryDan Linstedt's special coaching area at:
http://danLinstedt.com/my-coach.
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 26/29
26
Questions ?
?
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 27/29
27
References
Data Vault Series articles by Dan E. Linstedt --
http://www.tdan.com
Referred few articles from Genesee Academy --
http://geneseeacademy.com
Articles and books by Ralph Kimball and Bill Inmon
from multiple sources.
Technical Documents from
http://www.oracle.com/technetwork
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 28/29
28
Special Thanks to
Dan Linstedt for review and corrections to the article.
My colleague Mark Bruscke for his assistance in preparing
the article.
8/3/2019 Govindarajan Data Vault
http://slidepdf.com/reader/full/govindarajan-data-vault 29/29
29
The End
Contact Email :