ensemble modeling and data vault 2014

18
Data Vault Modeling DW2.0 & Unstructured Data Big Data Agile DW Ensemble Modeling Ensemble Modeling & Data Vault © 2014 Genesee Academy, LLC USA +1 303 526 0340 Sweden 072 736 8700 [email protected] www.GeneseeAcademy.com 2014

Upload: hans-hultgren

Post on 29-Nov-2014

512 views

Category:

Technology


0 download

DESCRIPTION

Data Modeling for the Agile Data Warehouse means breaking down business concepts into lower level tables. This process of Unified Decomposition results in concept Ensembles - like the popular Data Vault modeling approach.

TRANSCRIPT

Page 1: Ensemble Modeling and Data Vault 2014

Data Vault Modeling DW2.0 & Unstructured Data Big Data Agile DW Ensemble Modeling

Ensemble Modeling

& Data Vault

© 2014 Genesee Academy, LLC USA +1 303 526 0340 Sweden 072 736 8700 [email protected] www.GeneseeAcademy.com

2014

Page 2: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Ensemble Modeling & Data Vault

A G E N D A • Ensemble Modeling • &Unified Decomposition • Data Vault Ensemble • Colors of Data Vault • Data Vault Hubs, Links and

Satellites • More Information

2

Author, Advisor, Speaker & Industry Analyst; President Genesee Academy LLC, Principal at

About Hans Hultgren:

Book available on Amazon.com

gohansgo

Page 3: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

A Saga of Data Warehousing

Once upon a time data warehousing was becoming more popular and everyone was eager to build their own. But whenever they tried they failed. They called upon their best to fix this but they just couldn’t solve the problem.

They discovered that meeting the needs of the data warehouse meant that the tables got too big and too hard to work with. They just could not handle changes over time. If the smallest thing changed it always meant they had to change the entire table. When just a single attribute was updated they had to insert a record for all of the attributes. All seemed lost.

But around the world there were rebels who questioned the conventional wisdom. And their voices were finally heard: Why not separate the things that change from the things that don’t change?

3

Page 4: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Ensemble Modeling™

4

All the parts of a thing taken together, so that each part is considered only in relation to the whole.

• The constellation of component parts acts as a whole – an Ensemble.

• With Ensemble Modeling the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts.

Page 5: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Based on Unified Decomposition™

5

• With the EDW, we break things out into parts for flexibility, agility, and generally to facilitate the capture of things that are either interpreted in different ways or changing independently of each other.

• At the same time a core premise of data warehousing is integration and moving to a common standard view of unified concepts. So we also want to tie things together – Unify.

Page 6: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

THE DATA VAULT ENSEMBLE: APPLYING THE ENSEMBLE

6

Page 7: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

The Data Vault Ensemble

7

• The Data Vault Ensemble conforms to a single key – embodied in the Hub construct.

• The component parts for the Data Vault Ensemble include: – Hub The Natural Business Key – Link The Natural Business Relationships – Satellite All Context, Descriptive Data and History

Page 8: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

The Data Vault modeling approach

HUB

LINK

SAT

SAT

3NF Data Vault Dimensional

8

Details / Context

Core Concept Business Keys

Associations / Relationships

Entity Dim

Page 9: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Modeling Comparison

Facts contain all three types of data… Dimensions can also contain all types

*** Requires complex loading routines for key dependencies…

Region

Sale Fact

Customer

Product Vendor Employee

Store

Details

Business Keys

Associations

Start Schema and Snow Flake Models:

9

Page 10: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Modeling Comparison

Region Store

Sale

Sale LI

Customer

Product

Vendor

Employee

3rd Normal Form has the same issue: each construct – or Entity – typically contains a business key, one or more associations and also details (context, descriptive data)…

10

Page 11: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Link

Colors of the Data Vault

Region

Vendor

Link Store

Sat

Product

Link

Customer

Employee

Sale

Sat

Sat Sat Sat Sat Sat Sat Sat

Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat

Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat Sat

Sat Sat Sat

Sat Sat Sat Sat Sat Sat

Link

11

Sat Sat Sat Sat

Sat Sat

Sat Sat

Sat Sat

Sat Sat Sat

Sat Sat Sat Sat Sat Sat

Page 12: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Data Vault means thinking differently

12

Customer

Customer

• The minimal construct then for an “entity” such as “Customer” is now a

Hub with a set of Satellites

Page 13: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Data Vault Modeling Process

• The Modeling Process for creating a Data Vault model includes three primary steps:

1) Identify and Model your Core Business Concepts • Business Interviews is at the heart of this step

What do you do? What are the main things you work with?

• Also find best/target Natural Business Key 2) Identify and Model your Natural Business Relationships

• Specific Unique Relationships • Be considerate of the Unit of Work and Grain

3) Analyze and Design your Context Satellites • Consider Rate of Change, Type of Data

and also the Sources of your data during design process

13

Page 14: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Hubs

– A Hub Construct in Data Vault • contains Business Key • only the Business Key • contains No Context • is always 1:1 with EWBK

– A Hub Table contains only • Business Key • Surrogate Key (Data Warehouse) • Load Date / Time Stamp • Record Source

14

Record source

Date/Time Stamp

Business Key

H_Customer_SID

H_Customer

Page 15: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Links

– A Link Construct in Data Vault • contains Relationship • only a Relationship • contains No Context • is always 1:1 with Relationship

– A Link Table contains only • 2-n FKs for the Relationship • Surrogate Key (Data Warehouse) • Load Date / Time Stamp • Record Source

15

L_Cust_Class_SID

H_Customer_SID

H_Sequence2_SID

Date/Time Stamp

Record source

L_Cust_Class Record source

Date/Time Stamp

Business Key

H_Customer_SID

H_Customer

– Unique – Specific – Natural

Business Relationship

Page 16: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

Satellites

– A Satellite Construct in Data Vault • contains Context only • has no FKs (no relationships) • Designed by * Rate of Change

* Type of Data * System…

– A Satellite Table contains only • Business Key FK + • Load Date / Time Stamp • Context Data… • Record Source

16

Context A Context B Context C

H_Customer_SID

Record source Context D

Date/Time Stamp

S_Customer

Record source

Date/Time Stamp

Business Key

H_Customer_SID

H_Customer

Page 17: Ensemble Modeling and Data Vault 2014

© 2014 Genesee Academy, LLC

About Data Vault Ensemble

17

Estimated 800 Data Vault based Data Warehouses around the world