Data Vault Overview

Data Vault Model & Methodology © Dan Linstedt, 2011-2012, all rights reserved

Uploaded by: empowered-holdings-llc

Posted on 28-May-2015


DESCRIPTION

This is the Data Vault Modeling and Methodology introduction that I presented at an event in Montreal in September 2011. It covers an introduction to and overview of the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of Data Vault Modeling and Methodology. If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me). Thank you kindly, Daniel Linstedt

TRANSCRIPT

Page 1: Data Vault Overview

Data Vault Model & Methodology

© Dan Linstedt, 2011-2012, all rights reserved

Page 2: Data Vault Overview

Agenda
• Introduction – why are you here?
• What is a Data Vault? Where does it come from?
• Star Schema, 3NF, and Data Vault pros and cons AS AN EDW solution
• When is a Data Vault a good fit?
  o Benefits of Data Vault Modeling & Methodology
• <BREAK>
• When to NOT use a Data Vault
• Fundamental Paradigm Shift
• Business Keys & Business Processes
• Technical Review
• Query Performance (PIT & Bridge)
• What wasn't covered in this presentation…

Page 3: Data Vault Overview

A bit about me…
• Author, Inventor, Speaker – and part-time photographer…
• 25+ years in the IT industry
• Worked in DoD, US Gov't, Fortune 50, and so on…
• Find out more about the Data Vault:
  o http://www.youtube.com/LearnDataVault
  o http://LearnDataVault.com
• Full profile on http://www.LinkedIn.com/dlinstedt

Page 4: Data Vault Overview

Why Are YOU Here?
• Your Expectations?
• Your Questions?
• Your Background?
• Areas of Interest?
• Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?

Page 5: Data Vault Overview

What is it? Where did it come from?

Defining the Data Vault Space

Page 6: Data Vault Overview

Data Vault Time Line

• E.F. Codd invented relational modeling
• Chris Date and Hugh Darwen maintained and refined modeling
• 1976 – Dr Peter Chen created E-R diagramming
• Early 70's – Bill Inmon began discussing data warehousing
• Mid 60's – Dimension & fact modeling presented by General Mills and Dartmouth University
• Mid 70's – AC Nielsen popularized the dimension & fact terms
• Mid to Late 80's – Dr Kimball popularizes the star schema
• Mid 80's – Bill Inmon popularizes data warehousing
• Late 80's – Barry Devlin and Dr Kimball release "Business Data Warehouse"
• 1990 – Dan Linstedt begins R&D on Data Vault Modeling
• 2000 – Dan Linstedt releases the first 5 articles on Data Vault Modeling

Page 7: Data Vault Overview

Data Vault Modeling…

Took 10 years of research and design, including TESTING, to become flexible, consistent, and scalable.

Page 8: Data Vault Overview

What IS a Data Vault? (Business Definition)

• Data Vault Model
  o Detail oriented
  o Historical traceability
  o Uniquely linked set of normalized tables
  o Supports one or more functional areas of business
• Data Vault Methodology
  – CMMI, Project Plan
  – Risk, Governance, Versioning
  – Peer Reviews, Release Cycles
  – Repeatable, Consistent, Optimized
  – Complete with Best Practices for BI/DW

Diagram: the functional areas of business (Procurement, Sales, Delivery, Contracts, Finance, Planning, Operations); business keys span / cross the lines of business.

Page 9: Data Vault Overview

The Data Vault Model
• The Data Vault model is a data modeling approach, so it fits into the family of modeling approaches alongside 3rd Normal Form and Star Schema.
• While 3rd Normal Form is optimal for Operational Systems, and Star Schema is optimal for OLAP Delivery / Data Marts, the Data Vault is optimal for the Data Warehouse (EDW).

Page 10: Data Vault Overview

Supply Chain Analogy

Source Systems → Data Vault (EDW) → Data Marts

Page 11: Data Vault Overview

What Does One Look Like?

Diagram: a Customer Hub, a Product Hub, and an Order Hub, each surrounded by three Satellites; a Link ties the Hubs together and carries its own Satellite, which records a history of the interaction.

Elements:
• Hub = List of Unique Business Keys
• Link = List of Relationships, Associations
• Satellite = Descriptive Data
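To make the three elements concrete, here is a minimal sketch of one row of each, written as Python dataclasses. The field names are assumptions loosely modeled on the Hub, Link, and Satellite structures shown later in the Technical Review; the order key "ORD-0001" and its sequence are hypothetical. The point is the separation: the Hub row holds only the business key, the Link row holds only references to Hub sequences, and all descriptive data lives in the Satellite row.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical row shapes for the three Data Vault elements. Field names are
# illustrative, loosely following the Hub/Link/Satellite structures in the
# Technical Review slides; they are not a prescribed schema.

@dataclass(frozen=True)
class HubRow:
    sqn: int            # surrogate sequence for the business key
    business_key: str   # the unique business key itself
    load_dts: date
    record_src: str

@dataclass(frozen=True)
class LinkRow:
    sqn: int
    hub_sqns: tuple     # sequences of the Hubs this association ties together
    load_dts: date
    record_src: str

@dataclass(frozen=True)
class SatRow:
    parent_sqn: int     # sequence of the parent Hub (or Link)
    load_dts: date
    record_src: str
    attributes: tuple   # descriptive data as (name, value) pairs

customer = HubRow(1, "ABC123", date(2000, 10, 14), "SALES")
order = HubRow(7, "ORD-0001", date(2000, 10, 14), "SALES")  # hypothetical order key
customer_order = LinkRow(1, (customer.sqn, order.sqn), date(2000, 10, 14), "SALES")
customer_name = SatRow(customer.sqn, date(2000, 10, 14), "SALES", (("name", "Dan L"),))
```

Note that nothing descriptive (such as the customer name) appears in the Hub or the Link; adding a new attribute would mean adding a Satellite, not altering either of them.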

Page 12: Data Vault Overview

Colorized Perspective… Data Vault

Diagram: the HUB holds Business Keys, the LINK holds Associations, and the Satellites hold Details. In 3rd NF & Star Schema these are combined in the same tables; in the Data Vault they are separated.

The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links), and both of these from the Details that describe them and provide context (Satellites).

(Colors Concept Originated By: Hans Hultgren)

Page 13: Data Vault Overview

Star Schemas, 3NF, Data Vault: Pros & Cons

Defining the Data Vault Space

Why NOT use Star Schemas as an EDW? Why NOT use 3NF as an EDW?

Why NOT use Data Vault as a Data Delivery Model?

Page 14: Data Vault Overview

Star Schema Pros/Cons as an EDW

PROS
• Good for multi-dimensional analysis
• Subject oriented answers
• Excellent for aggregation points
• Rapid development / deployment
• Great for some historical storage

CONS
• Not cross-business functional
• Use of junk / helper tables
• Trouble with VLDW
• Unable to provide integrated enterprise information
• Can't handle ODS or exploration warehouse requirements
• Trouble with data explosion in near-real-time environments
• Trouble with updates to type 2 dimension primary keys
• Trouble with late arriving data in dimensions to support real-time arriving transactions
• Not granular enough information to support real-time data integration

Page 15: Data Vault Overview

3NF Pros/Cons as an EDW

PROS
• Many to many linkages
• Handle lots of information
• Tightly integrated information
• Highly structured
• Conducive to near-real time loads
• Relatively easy to extend

CONS
• Time driven PK issues
• Parent-child complexities
• Cascading change impacts
• Difficult to load
• Not conducive to BI tools
• Not conducive to drill-down
• Difficult to architect for an enterprise
• Not conducive to spiral/scope controlled implementation
• Physical design usually doesn't follow business processes

Page 16: Data Vault Overview

Data Vault Pros/Cons as an EDW

PROS
• Supports near-real time and batch feeds
• Supports functional business linking
• Extensible / flexible
• Provides rapid build / delivery of star schemas
• Supports VLDB / VLDW
• Designed for EDW
• Supports data mining and AI
• Provides granular detail
• Incrementally built

CONS
• Not conducive to OLAP processing
• Requires business analysis to be firm
• Introduces many join operations

Page 17: Data Vault Overview

Analogy: The Porsche, the SUV and the Big Rig
• Which would you use to win a race?
• Which would you use to move a house?
• Would you adapt the truck and enter a race with Porsches and expect to win?

Page 18: Data Vault Overview

A Quick Look at Methodology Issues

Business Rule Processing, Lack of Agility, and Future-Proofing Your New Solution

Page 19: Data Vault Overview

EDW Architecture: Generation 1

• Quality routines
• Cross-system dependencies
• Source data filtering
• In-process data manipulation
• High risk of incorrect data aggregation
• Larger system = increased impact
• Often re-engineered at the SOURCE
• History can be destroyed (completely re-computed)

Diagram: sources (Sales, Finance, Contracts) pass through complex business rules + dependencies into Staging (EDW), which holds staging + history; a second set of complex business rules (#2) then loads the Star Schemas (conformed dimensions, junk tables, helper tables, factless facts) that serve the Enterprise BI Solution (batch).

Page 20: Data Vault Overview

#1 Cause of BI Initiative Failure

Re-Engineering for Every Change! Anyone?

Let's take a look at one example…

Page 21: Data Vault Overview

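Re-Engineering

Diagram: the current sources (Customer and Customer Transactions, from Sales and Finance) are joined at the source and run through business rules in a single data flow (mapping). When a ** NEW SYSTEM ** (Customer Purchases) arrives, it has to be worked into that same join and those same rules: IMPACT!!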

Page 22: Data Vault Overview

Federated Star Schema Inhibiting Agility

Chart: effort & cost (low to high) plotted against time, from the start through the point where the maintenance cycle begins. Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.

RESULT: The business builds its own data marts (Data Mart 1, Data Mart 2, Data Mart 3)!

The main driver for this is the maintenance cost, together with the re-engineering of the existing system that occurs for each new "federated/conformed" effort. This increases delivery time, difficulty, and maintenance costs.

Page 23: Data Vault Overview

EDW Architecture: Generation 2

Diagram: sources (Sales, Finance, Contracts, and unstructured data) load into Staging and the EDW (Data Vault) in batch and in real time (SOA). The complex business rules sit between the EDW and the delivery layer of Star Schemas, Error Marts, and Report Collections (batch), which serve the Enterprise BI Solution.

The business rules are moved closer to the business, improving IT reaction time, reducing cost, and minimizing impacts to the enterprise data warehouse (EDW).

FUNDAMENTAL GOALS
• Repeatable
• Consistent
• Fault-tolerant
• Supports phased release
• Scalable
• Auditable

Page 24: Data Vault Overview

NO Re-Engineering

Diagram: each current source (Customer and Customer Transactions, from Sales and Finance) is loaded as a straight stage copy into the Data Vault (Hub Customer, Hub Acct, Hub Product, Link Transaction). The ** NEW SYSTEM ** (Customer Purchases) gets its own stage copy and loads into the same Hubs and Link.

NO IMPACT!!! NO RE-ENGINEERING!

Page 25: Data Vault Overview

Progressive Agility and Responsiveness of IT

Chart: effort & cost (low to high) plotted against time, from the start through the initial DV build-out (foundational base built) and into the maintenance cycle, during which new functional areas are added.

Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces the complexity of the existing architecture.

Page 26: Data Vault Overview

What's Wrong With the OLD METHODOLOGY?

Using Star Schemas as your Data Warehouse leads to….

Page 27: Data Vault Overview

Dimensionitis
• DimensionItis: an incurable disease whose symptom is the creation of new dimensions, because the cost and time to conform existing dimensions with new attributes rise beyond the business's ability to pay…

Business Says: Avoid the re-engineering costs, just "copy" the dimensions and create a new one for OUR department…

What can it hurt?

Page 28: Data Vault Overview

Deformed Dimensions
• Deformity: The URGE to continue "slamming data" into an existing conformed dimension until it simply cannot sustain any further changes. The result: a deformed dimension and a HUGE re-engineering cost / nightmare.

Re-Engineering the Load Processes EACH TIME!

Business Wants a Change! Business said: "Just add that to the existing Dimension, it will be easy, right?"
• V1 Complex Load: 90 days, $125k → Business Change
• V2 Complex Load: 120 days, $200k → Business Change
• V3 Complex Load: 180 days, $275k → Business Change

Page 29: Data Vault Overview

Silo Building / IT Non-Agility
• Business Says: Take the dimension you have, copy it, and change it… This should be cheap and easy, right?

First Star: a Customer dimension (Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Phone, Customer_Type) with a fact table (Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT).

Copies of the same Customer dimension (Customer_ID … Customer_Type), one per department:
• SALES: "We built our own because IT costs too much…"
• FINANCE: "We built our own because IT took too long…"
• MARKETING: "We built our own because we needed customized dimension data…"

Business Change to Modify Existing Star = 180 days, $275k

Page 30: Data Vault Overview


Why is Data Vault a Good Fit?

Page 31: Data Vault Overview

What are the top business obstacles in your data warehouse today?

Page 32: Data Vault Overview


Poor Agility

Inconsistent Answer Sets

Needs Accountability

Demands Auditability

Desires IT Transparency

Are you feeling Pinned Down?

Page 33: Data Vault Overview

What are the top technology obstacles in your data warehouse today?

Page 34: Data Vault Overview


Complex Systems

Real-Time Data Arrival

Unimaginable Data Growth

Master Data Alignment

Bad Data Quality

Late Delivery/Over Budget

Are your systems CRUMBLING?

Page 35: Data Vault Overview

Existing Solutions

Have led you down a painful path…

Image: the Yugo, "World's Worst Car"

Page 36: Data Vault Overview

Projects Cancelled & Restarted

Re-engineering required to absorb new systems

Complexity drives maintenance cost sky high

Disparate silo solutions provide inaccurate answers!

Severe lack of accountability

Page 37: Data Vault Overview

There must be a better way…

There IS a better way!

How can you overcome these obstacles?

Page 38: Data Vault Overview

It's Called the Data Vault Model and Methodology

Page 39: Data Vault Overview

What is it?

It's a simple, easy-to-use plan to build your valuable Data Warehouse!

Page 40: Data Vault Overview


Uncomplicated Design

Simple Build-out

Rapid Adaptability

Understandable Standards

Effortless Scalability

Painless Auditability

Pursue Your Goals!

What’s the Value?

Page 41: Data Vault Overview


Why Bother With Something New?

Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'

Page 42: Data Vault Overview


What Are the Issues?

This is NOT what you want happening to your project!

THE GAP!!

Page 43: Data Vault Overview


What Are the Foundational Keys?

Flexibility

Scalability

Productivity

Page 44: Data Vault Overview


Key: Flexibility

Enabling rapid change on a massive scale without downstream impacts!

Page 45: Data Vault Overview


Key: Scalability

Providing no foreseeable barrier to increased size and scope

People, Process, & Architecture!

Page 46: Data Vault Overview

Key: Productivity

Enabling low-complexity systems with high-value output at a rapid pace

Page 47: Data Vault Overview


< BREAK TIME >

Page 48: Data Vault Overview

How does it work?

Bringing the Data Vault to Your Project

Page 49: Data Vault Overview

Key: Flexibility

Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts

No Re-Engineering!

Page 50: Data Vault Overview

Case In Point:

The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days: ALL systems, ALL DATA!

Page 51: Data Vault Overview

Key: Scalability in Architecture

Scaling is easy; it's based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks

Page 52: Data Vault Overview

Case In Point:

The result of this scalability was a Data Vault model that has scaled to 3 petabytes in size, and is still growing today!

Page 53: Data Vault Overview

Key: Scalability in Team Size

You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can:

Scale your team when desired, at different points in the project!

Page 54: Data Vault Overview

Case In Point: (Dutch Tax Authority)

The result of this scalability: ETL developers were added for each new source system and reassigned once that system was completely loaded into the Data Vault.

Page 55: Data Vault Overview

Key: Productivity

Increasing productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, managing and optimizing processes

Page 56: Data Vault Overview

Case in Point:

Result of productivity: 2 people in 2 weeks merged 3 systems and built a full Data Vault EDW, 5 star schemas, and 3 reports.

These individuals generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW Data Model
• 75% of the star schema data model

Page 57: Data Vault Overview

The Competing Bid?

The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)

Our total cost? $30k and 2 weeks!

Page 58: Data Vault Overview

Results?

Changing the direction of the river takes less effort than stopping the flow of water

Page 59: Data Vault Overview

When NOT to use the Data Vault Model & Methodology

Page 60: Data Vault Overview

When NOT to Use the Data Vault

• You have:
  o a small set of point solution requirements
  o a very short time-frame for delivery
  o to use the data one time, then throw it away
  o a single source system, single source application
  o a single business analyst in the entire company
• You do NOT have:
  o audit requirements forcing you to keep history
  o multiple data center consolidation efforts
  o near-real-time to worry about
  o massive batch data to integrate
  o external data feeds outside your control
  o requirements to do trend analysis of all your data
  o pain that forces you to re-engineer every time you ask for a change to your current data warehousing systems

Page 61: Data Vault Overview


Fundamental Paradigm Shift

Exploring differences in the architecture, implementation, and process design.

Page 62: Data Vault Overview

It's Not Just a Data Model…

Model + Methodology = SUCCESS!

Page 63: Data Vault Overview

Different From ANYTHING ELSE!

• The Business Rules go after the Data Warehouse!
• Data is interpreted on the way OUT!
• Hold on… We do distinguish between HARD and SOFT business rules…

Ok, now tell me WHY this is important?

Page 64: Data Vault Overview

EDW: The Old Way of Loading

Corporate Fraud Accountability: Title XI consists of seven sections. Section 1101 recommends a name for this title as "Corporate Fraud Accountability Act of 2002". It identifies corporate fraud and records tampering as criminal offenses and joins those offenses to specific penalties. It also revises sentencing guidelines and strengthens their penalties. This enables the SEC to temporarily freeze large or unusual payments.

Diagram: Source 1, Source 2, and Source 3 pass through business rules that CHANGE DATA on the way into Staging and the EDW, which then feeds the HR Mart, Sales Mart, and Finance Mart.

Are changes to data ON THE WAY IN to the EDW equivalent to records tampering?

Page 65: Data Vault Overview

EDW: The New Compliant Way

1. Implement a Raw Data Vault Data Warehouse
2. Move the business rules "downstream"
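A minimal sketch of what "raw in, rules downstream" can look like, using two of the customer account keys from the Hub examples in the Technical Review. The slides do not define HARD versus SOFT rules, so the split here is an assumption for illustration: the load step only aligns types (parsing the date text) and stores every value exactly as received, while a hypothetical soft rule that standardizes account keys runs only when building a mart. The function names are made up.

```python
from datetime import date

# Assumption for illustration: a "hard" rule only changes the form of the data
# (here, parsing MM-DD-YYYY text into a date), never its meaning; a "soft" rule
# interprets the data and is applied on the way OUT, when building a mart.

def hard_rule_load(source_row: dict) -> dict:
    """Load step into the raw Data Vault: keep values as received, align types only."""
    month, day, year = (int(p) for p in source_row["load_dts"].split("-"))
    return {
        "cust_acct": source_row["cust_acct"],   # stored exactly as the source sent it
        "load_dts": date(year, month, day),
        "record_src": source_row["record_src"],
    }

def soft_rule_cleanse(vault_row: dict) -> str:
    """Downstream (mart) step: hypothetical business rule that standardizes account keys."""
    return "".join(ch for ch in vault_row["cust_acct"] if ch.isalnum()).upper()

raw = [hard_rule_load(r) for r in [
    {"cust_acct": "ABC123", "load_dts": "10-14-2000", "record_src": "SALES"},
    {"cust_acct": "*ABC-123", "load_dts": "10-14-2000", "record_src": "FINANCE"},
]]
mart_keys = {soft_rule_cleanse(r) for r in raw}
```

The raw layer still holds "*ABC-123" exactly as Finance sent it, so the record can be audited; only the mart sees the interpreted key "ABC123". If the business later changes the cleansing rule, the mart is rebuilt from the unchanged raw data.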

Page 66: Data Vault Overview


Business Keys & Business Processes

Page 67: Data Vault Overview

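Business Keys & Business Processes

Diagram: over time, a business key travels through the business processes (Customer Contact, Sales, Procurement, Manufacturing, Planning, Delivery, Contracts, Finance) that produce $$ Revenue. Sales records the key as SLS123; by the time it reaches Procurement it has become *P123MFG, after passing through an Excel spreadsheet / manual process with NO VISIBILITY!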

Page 68: Data Vault Overview

Technical Review

Hub, Link, Satellite - Definitions

Page 69: Data Vault Overview

HUB Data Examples

HUB_CUST_ACCT
SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
1    ABC123     10-14-2000  SALES
2    ABC-123    10-14-2000  SALES
3    *ABC-123   10-14-2000  FINANCE
4    123,ABCD   10-15-2000  CONTRACTS
5    PEF-2956   10-16-2000  CONTRACTS

HUB_PART_NUMBER
SQN  PART_NUM   LOAD_DTS    RECORD_SRC
1    MFG-25862  10-14-2000  MANUFACT
2    MFG*25266  10-14-2000  MANUFACT
3    *P25862    10-14-2000  PLANNING
4    MFG_25862  10-15-2000  DELIVERY
5    CN*25266   10-16-2000  DELIVERY

Hub Structure
SEQUENCE
<BUSINESS KEY>     } Unique Index
{LAST SEEN DATE}   } Optional
<LOAD DATE>
<RECORD SOURCE>
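The Hub structure can be illustrated with a small in-memory loader, fed the HUB_CUST_ACCT rows above. This is a sketch under assumptions, not a prescribed load routine: a Python dict stands in for the unique index on the business key, sequences are assigned incrementally, and the optional LAST SEEN DATE is the only column updated when a known key shows up again. The later sighting of ABC123 on 10-20-2000 is hypothetical.

```python
from datetime import date

class Hub:
    """Minimal in-memory Hub sketch: one row per unique business key."""

    def __init__(self):
        self.rows = {}      # business key -> row (the dict plays the role of the unique index)
        self.next_sqn = 1

    def load(self, business_key: str, load_dts: date, record_src: str) -> int:
        """Insert the key if it is new; return its sequence either way."""
        row = self.rows.get(business_key)
        if row is None:
            # New key: assign the next sequence; LOAD DATE and RECORD SOURCE
            # record the first sighting and are never overwritten.
            row = {"sqn": self.next_sqn, "business_key": business_key,
                   "load_dts": load_dts, "record_src": record_src,
                   "last_seen_dts": load_dts}
            self.rows[business_key] = row
            self.next_sqn += 1
        else:
            # Known key: only the optional LAST SEEN DATE moves forward.
            row["last_seen_dts"] = max(row["last_seen_dts"], load_dts)
        return row["sqn"]

hub_cust_acct = Hub()
for key, dts, src in [
    ("ABC123", date(2000, 10, 14), "SALES"),
    ("ABC-123", date(2000, 10, 14), "SALES"),
    ("*ABC-123", date(2000, 10, 14), "FINANCE"),
    ("123,ABCD", date(2000, 10, 15), "CONTRACTS"),
    ("PEF-2956", date(2000, 10, 16), "CONTRACTS"),
]:
    hub_cust_acct.load(key, dts, src)

# Hypothetical later sighting of an existing key: no new row is created.
sqn = hub_cust_acct.load("ABC123", date(2000, 10, 20), "SALES")
```

As in the example table, ABC123, ABC-123, and *ABC-123 each get their own row; the Hub records keys as the sources present them and leaves any matching of look-alike keys to downstream rules.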

Page 70: Data Vault Overview

Link Structures

Link_Product_Supplier
LPS_SQN, PRODUCT_SQN, SUPPLIER_SQN, LPS_LOAD_DTS, LPS_REC_SOURCE, LPS_ENCR_KEY

Link_Customer_Account_Employee
LCAE_SQN, CUSTOMER_SQN, ACCOUNT_SQN, EMPLOYEE_SQN, LCAE_LOAD_DTS, LCAE_REC_SOURCE

Link Structure
SEQUENCE
<HUB KEY SQN 1>    } Unique Index
<HUB KEY SQN 2>    }
<HUB KEY SQN N>    }
{LAST SEEN DATE}   } Optional
{CONFIDENCE}       }
{STRENGTH}         }
<LOAD DATE>
<RECORD SOURCE>

Dynamic Link
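A matching sketch for the Link structure, under the same assumptions as any in-memory illustration: the unique index covers the combination of Hub sequences, so loading the same Product/Supplier pair twice returns the existing Link row. The optional LAST SEEN DATE, CONFIDENCE, and STRENGTH columns are left out, and the Hub sequence values (10, 7, 8) are made up.

```python
from datetime import date

class Link:
    """Minimal in-memory Link sketch: one row per unique combination of Hub sequences."""

    def __init__(self):
        self.rows = {}      # tuple of hub sqns -> row (the dict plays the role of the unique index)
        self.next_sqn = 1

    def load(self, hub_sqns: tuple, load_dts: date, record_src: str) -> int:
        """Insert the association if it is new; return its Link sequence either way."""
        # The tuple is positional: (PRODUCT_SQN, SUPPLIER_SQN) here.
        row = self.rows.get(hub_sqns)
        if row is None:
            row = {"sqn": self.next_sqn, "hub_sqns": hub_sqns,
                   "load_dts": load_dts, "record_src": record_src}
            self.rows[hub_sqns] = row
            self.next_sqn += 1
        return row["sqn"]

# Link_Product_Supplier with hypothetical Hub sequence values.
link_product_supplier = Link()
first = link_product_supplier.load((10, 7), date(2000, 10, 14), "MANUFACT")
again = link_product_supplier.load((10, 7), date(2000, 10, 15), "PLANNING")
other = link_product_supplier.load((10, 8), date(2000, 10, 15), "PLANNING")
```

Because the Link stores nothing but Hub sequences plus load metadata, adding a third Hub to an association would mean a new Link table rather than a change to the Hubs.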

Page 71: Data Vault Overview

Satellites Split By Source System

SAT_SALES_CUST
PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, Name, Phone Number, Best time of day to reach, Do Not Call Flag

SAT_FINANCE_CUST
PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, First Name, Last Name, Guardian Full Name, Co-Signer Full Name, Phone Number, Address, City, State/Province, Zip Code

SAT_CONTRACTS_CUST
PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, Contact Name, Contact Email, Contact Phone Number

Satellite Structure
PARENT SEQUENCE    } Primary Key
LOAD DATE          }
<LOAD-END-DATE>
<RECORD-SOURCE>
{user defined descriptive data}
{or temporal based timelines}
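A sketch of how a Satellite such as SAT_SALES_CUST might be loaded, assuming the common delta-detection approach: a new row is written only when the descriptive data differs from the current row, and the previous row is then end-dated. The end-dating convention (close the prior row at the new row's LOAD DATE, treated as an exclusive end) is an assumption; the names echo the PIT example later in the deck, and the unchanged reload on 10-20-2000 is hypothetical.

```python
from datetime import date

class Satellite:
    """Minimal in-memory Satellite sketch; rows are identified by (parent sqn, load date)."""

    def __init__(self):
        self.rows = []

    def current(self, parent_sqn: int):
        """Return the open (not end-dated) row for a parent, or None."""
        for row in self.rows:
            if row["parent_sqn"] == parent_sqn and row["load_end_dts"] is None:
                return row
        return None

    def load(self, parent_sqn: int, load_dts: date, record_src: str, attrs: dict) -> bool:
        """Insert a new row only when the descriptive data changed; return True if inserted."""
        cur = self.current(parent_sqn)
        if cur is not None and cur["attrs"] == attrs:
            return False
        if cur is not None:
            # Assumed convention: close the prior row at the new row's load date
            # (an exclusive end), so versions neither overlap nor leave gaps.
            cur["load_end_dts"] = load_dts
        self.rows.append({"parent_sqn": parent_sqn, "load_dts": load_dts,
                          "load_end_dts": None, "record_src": record_src,
                          "attrs": dict(attrs)})
        return True

sat_sales_cust = Satellite()
sat_sales_cust.load(1, date(2000, 10, 14), "SALES", {"name": "Dan L"})
skipped = sat_sales_cust.load(1, date(2000, 10, 20), "SALES", {"name": "Dan L"})
sat_sales_cust.load(1, date(2000, 11, 1), "SALES", {"name": "Dan Linedt"})
```

Splitting Satellites by source, as above, means each source system gets its own loader like this one; a new source adds a new Satellite instead of new columns in an existing one.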

Page 72: Data Vault Overview


Why do we build Links this way?

Page 73: Data Vault Overview

History Teaches Us… If we model for ONE relationship in the EDW, we BREAK the others!

Diagram: the Portfolio/Customer relationship changes over time.
• 10 years ago: Portfolio M : 1 Customer (X, does not fit the model)
• Today: Portfolio 1 : M Customer, modeled in the EDW as Hub Portfolio 1 : M Hub Customer
• 5 years from now: Portfolio M : M Customer (X, does not fit the model)

The EDW is designed to handle TODAY'S relationship; as soon as history is loaded, it breaks the model!

This situation forces re-engineering of the model, load routines, and queries!

Page 74: Data Vault Overview

History Teaches Us… If we model with a LINK table, we can handle ALL the requirements!

Diagram: the same Portfolio/Customer relationship over time.
• 10 years ago: Portfolio M : 1 Customer
• Today: Portfolio 1 : M Customer
• 5 years from now: Portfolio M : M Customer

Model: Hub Portfolio 1 : M LNK Cust-Port M : 1 Hub Customer

This design is flexible, and handles past, present, and future relationship changes with NO RE-ENGINEERING!
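One way to see why the Link absorbs all three cardinalities: the Link table always has the same shape (pairs of Hub sequences), and the cardinality is just a property of the rows loaded into it. The sketch below derives the observed Portfolio:Customer cardinality from Link rows; the sequence values are hypothetical, chosen to reproduce the three periods on the slide.

```python
from collections import defaultdict

def observed_cardinality(pairs) -> str:
    """Describe Portfolio:Customer from (portfolio_sqn, customer_sqn) Link rows."""
    customers_per_portfolio = defaultdict(set)
    portfolios_per_customer = defaultdict(set)
    for portfolio, customer in pairs:
        customers_per_portfolio[portfolio].add(customer)
        portfolios_per_customer[customer].add(portfolio)
    # Portfolio side is "M" if some customer belongs to more than one portfolio.
    left = "M" if any(len(s) > 1 for s in portfolios_per_customer.values()) else "1"
    # Customer side is "M" if some portfolio holds more than one customer.
    right = "M" if any(len(s) > 1 for s in customers_per_portfolio.values()) else "1"
    return f"{left}:{right}"

# Hypothetical Link rows for each period; the table structure never changes.
ten_years_ago = [(1, 100), (2, 100)]
today = [(1, 100), (1, 101)]
five_years_from_now = [(1, 100), (1, 101), (2, 100)]

for label, rows in [("10 years ago", ten_years_ago), ("today", today),
                    ("5 years from now", five_years_from_now)]:
    print(label, observed_cardinality(rows))
```

With a foreign key embedded in one of the Hubs, each change in cardinality would require a schema change; with the Link, the same table holds all three periods.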

Page 75: Data Vault Overview

Applying the Data Vault to Global DW2.0

Diagram: a base EDW created in Corporate Financials in the USA (Hubs, Links, and their Satellites) is extended by a Manufacturing EDW in China and by Planning in Brazil, each a further set of Hubs, Links, and Satellites tied to the base through Links.

Page 76: Data Vault Overview

Extreme Data Vault Partitioning

Diagram: Hub Customer, Hub Order, Lnk Cust-Order, Sat Customer, Sat Order, and a second Sat Order each sit on their own DASD (RAID 0+1).

Each table receives its own I/O channel and its own RAID 0+1 disk.

Page 77: Data Vault Overview

Query Performance

Point-in-time and Bridge Tables, overcoming query issues

Page 78: Data Vault Overview

Purpose Of PIT & Bridge
• To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.
• Together, these two allow "direct table match", as well as table elimination in the queries, to occur.
• These tables are not necessary for the entire model; only when:
  o Massive amounts of data are found
  o Large numbers of Satellites surround a Hub or Link
  o A large query across multiple Hubs & Links is necessary
  o Real-time data is flowing in, uninterrupted
• What are they?
  o Snapshot tables, specifically built for query speed

Page 79: Data Vault Overview

PIT Table Architecture

Diagram: Hub Customer, Hub Order, and Hub Product are joined by Link Line Item, which has its own Satellite Line Item. Two of the Hubs each carry Satellites 1–4 plus a PIT Sat; a further pair of Satellites (Sat 1, Sat 2) completes the picture.

Satellite: Point In Time
PARENT SEQUENCE    } Primary Key
LOAD DATE          }
{Satellite 1 Load Date}
{Satellite 2 Load Date}
{Satellite 3 Load Date}
{…}
{Satellite N Load Date}

Page 80: Data Vault Overview

PIT Table Example

SAT_CUST_CONTACT_NAME
SQN  LOAD_DTS    NAME
1    10-14-2000  Dan L
1    11-01-2000  Dan Linedt
1    12-31-2000  Dan Linstedt

SAT_CUST_CONTACT_CELL
SQN  LOAD_DTS    CELL
1    10-14-2000  999-555-1212
1    10-15-2000  999-111-1234
1    10-16-2000  999-252-2834
1    10-17-2000  999.257-2837
1    10-18-2000  999-273-5555

SAT_CUST_CONTACT_ADDR
SQN  LOAD_DTS    ADDR
1    08-01-2000  26 Prospect
1    09-29-2000  26 Prosp St.
1    12-17-2000  28 November
1    01-01-2001  26 Prospect St

PIT table (LOAD_DTS = Snapshot Date)
SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
1    08-01-2000  NULL           NULL           08-01-2000
1    09-01-2000  NULL           NULL           08-01-2000
1    10-01-2000  NULL           NULL           09-29-2000
1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
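Reading the example, each PIT column holds the most recent load date of that Satellite on or before the snapshot date, or NULL if the Satellite had no row yet. The sketch below rebuilds the PIT rows from the three Satellites' load dates using that rule; the rule is inferred from the example data rather than stated on the slide, and the function names are illustrative.

```python
from bisect import bisect_right
from datetime import date

def d(text: str) -> date:
    """Parse the slide's MM-DD-YYYY dates."""
    month, day, year = (int(p) for p in text.split("-"))
    return date(year, month, day)

# Load dates per Satellite for parent sequence 1, taken from the example tables.
satellite_load_dates = {
    "SAT_NAME_LDTS": [d("10-14-2000"), d("11-01-2000"), d("12-31-2000")],
    "SAT_CELL_LDTS": [d("10-14-2000"), d("10-15-2000"), d("10-16-2000"),
                      d("10-17-2000"), d("10-18-2000")],
    "SAT_ADDR_LDTS": [d("08-01-2000"), d("09-29-2000"), d("12-17-2000"),
                      d("01-01-2001")],
}

def build_pit(snapshots, sat_dates):
    """One PIT row per snapshot: the latest load date in each Satellite on or before it."""
    pit = []
    for snap in snapshots:
        row = {"LOAD_DTS": snap}
        for column, dates in sat_dates.items():
            ordered = sorted(dates)
            i = bisect_right(ordered, snap)               # number of load dates <= snapshot
            row[column] = ordered[i - 1] if i else None   # None stands in for NULL
        pit.append(row)
    return pit

snapshots = [d(s) for s in ["08-01-2000", "09-01-2000", "10-01-2000",
                            "11-01-2000", "12-01-2000", "01-01-2001"]]
pit = build_pit(snapshots, satellite_load_dates)
```

A query can then join each Satellite on equality (SQN and the PIT column's load date) instead of searching for the latest row at or before a date, which is the "direct table match" the PIT & Bridge slide refers to.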

Page 81: Data Vault Overview

Bridge Table Architecture

Diagram: Hub Seller and Hub Product are joined by a Link (with its Satellite), and Hub Product and Hub Parts are joined by a second Link (with its Satellite); Satellites 1–4 also surround the structure, and a Bridge table spans the Hubs and Links.

Satellite: Bridge
UNIQUE SEQUENCE    } Primary Key
LOAD DATE          }
{Hub 1 Sequence #}
{Hub 2 Sequence #}
{Hub 3 Sequence #}
{Link 1 Sequence #}
{Link 2 Sequence #}
{…}
{Link N Sequence #}
{Hub 1 Business Key}
{Hub 2 Business Key}
{…}
{Hub N Business Key}

Page 82: Data Vault Overview

Bridge Table Data Example

Bridge Table: Seller by Product by Part (LOAD_DTS = Snapshot Date)
SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
2    09-01-2000  16        CO*24    2654      DEF-847-0L  324       MN*5-2
3    10-01-2000  16        CO*24    82374     PPA-252-2A  9938      DD*2*3
4    11-01-2000  24        AZ*25    25222     UIF-525-88  7         UF*9*0
5    12-01-2000  99        NM*5     81        DAN-347-7F  16        KI*9-2
6    01-01-2001  99        NM*5     81        DAN-347-7F  24        DL*0-5
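A sketch of how a Seller by Product by Part bridge could be populated: walk the Seller-Product Link and the Product-Part Link as of a snapshot date and store the resulting Hub sequences together with their business keys, so a query can skip the Hub-Link-Hub joins. Hub and Link contents below are hypothetical (values borrowed from the first three example rows), the Link sequence columns are omitted for brevity, and this sketch emits every path valid at the snapshot, whereas the example above shows one row per snapshot, so the deck's exact population rule may differ.

```python
from datetime import date

# Hypothetical Hub contents (sequence -> business key), values taken from the example rows.
hub_seller = {15: "NY*1", 16: "CO*24"}
hub_product = {2756: "ABC-123-9K", 2654: "DEF-847-0L", 82374: "PPA-252-2A"}
hub_part = {525: "JK*2*4", 324: "MN*5-2", 9938: "DD*2*3"}

# Hypothetical Link rows: (hub sqn, hub sqn, link load date).
link_seller_product = [(15, 2756, date(2000, 8, 1)), (16, 2654, date(2000, 9, 1)),
                       (16, 82374, date(2000, 10, 1))]
link_product_part = [(2756, 525, date(2000, 8, 1)), (2654, 324, date(2000, 9, 1)),
                     (82374, 9938, date(2000, 10, 1))]

def build_bridge(snapshot: date):
    """Pre-join Seller -> Product -> Part as of a snapshot date, keeping sequences and keys."""
    parts_by_product = {}
    for prod, part, loaded in link_product_part:
        if loaded <= snapshot:
            parts_by_product.setdefault(prod, []).append(part)
    rows = []
    for seller, prod, loaded in link_seller_product:
        if loaded > snapshot:
            continue                      # association not yet known at this snapshot
        for part in parts_by_product.get(prod, []):
            rows.append({"LOAD_DTS": snapshot,
                         "SELL_SQN": seller, "SELL_ID": hub_seller[seller],
                         "PROD_SQN": prod, "PROD_NUM": hub_product[prod],
                         "PART_SQN": part, "PART_NUM": hub_part[part]})
    return rows

bridge = build_bridge(date(2000, 9, 15))
```

Carrying both the sequences and the business keys lets a query filter on a key such as SELL_ID and go straight to the Satellites, without touching the Hubs or Links.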

Page 83: Data Vault Overview

What WASN'T Covered
• ETL Automation
• ETL Implementation
• SQL Query Logic
• Balanced MPP design
• Data Vault Modeling on Appliances
• Deep Dive on Structures (Hubs, Links, Satellites)
• What happens when you break the rules?
• Project management, Risk management & mitigation, methodology & approach
• Automation: Automated DV modeling, Automated ETL production
• Change Management
• Temporal Data Modeling Concerns
… And so on…

Page 84: Data Vault Overview


Conclusions

Page 85: Data Vault Overview


Who’s Using It?

Page 86: Data Vault Overview

The Experts Say…

"The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework." Bill Inmon

"The Data Vault is foundationally strong and exceptionally scalable architecture." Stephen Brobst

"The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing...." Doug Laney

Page 87: Data Vault Overview

More Notables…

"This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before." Howard Dresner

"[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from." Scott Ambler

Page 88: Data Vault Overview

Where To Learn More
• The Technical Modeling Book: http://LearnDataVault.com
• The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions
• Contact me: http://DanLinstedt.com (web)
• Worldwide User Group (Free): http://dvusergroup.com