agile data engineering - intro to data vault modeling (2016)

Post on 14-Jan-2017


KENT GRAZIANO

AGILE DATA ENGINEERING: INTRODUCTION TO DATA VAULT DATA MODELING

@KentGraziano kentgraziano.com

2

Agenda

Bio

What do we mean by Agile?

What is a Data Vault?

Where does it fit in a DW/BI architecture?

How to design a Data Vault model

Being “agile” with Data Vault

What’s new in DV 2.0

3

My Bio

› Senior Technical Evangelist, Snowflake Computing
› Oracle ACE Director (BI/DW)
› Certified Data Vault Master and DV 2.0 Practitioner
› Data Modeling, Data Architecture and Data Warehouse Specialist
  • 30+ years in IT
  • 25+ years of Oracle-related work
  • 20+ years of data warehousing experience
› Member – DAMA Houston
› Former-Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/)
› Author & Co-Author of a bunch of books
  • The Business of Data Vault Modeling
  • The Data Model Resource Book (1st Edition)
› Blogger: The Data Warrior
› Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group

4

Manifesto for Agile Software Development

“We are uncovering better ways of developing software by doing it and helping others do it.

Through this work we have come to value:

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.”

http://agilemanifesto.org/

5

Applying the Agile Manifesto to DW

(C) Kent Graziano

User Stories instead of requirements documents

Time-boxed iterations
› Iteration has a standard length
› Choose one or more user stories to fit in that iteration

Rework is part of the game
› There are no “missed requirements”... only those that haven’t been delivered or discovered yet.

6

Data Vault Definition

TDAN.com Article

The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.

Architected specifically to meet the needs of today’s enterprise data warehouses

DAN LINSTEDT: Defining the Data Vault

7

What is Data Vault Trying to Solve?

What are our other Enterprise Data Warehouse options?
› Third-Normal Form (3NF): Complex primary keys (PKs) with cascading snapshot dates
› Star Schema (Dimensional): Difficult to reengineer fact tables for granularity changes

Difficult to get it right the first time

Not adaptable to rapid business change

NOT AGILE!

8

Data Vault Time Line

© LearnDataVault.com

[Timeline axis: 1960, 1970, 1980, 1990, 2000]

E.F. Codd invented relational modeling

Chris Date and Hugh Darwen maintained and refined modeling

1976 – Dr Peter Chen created E-R Diagramming

Early 70’s – Bill Inmon began discussing Data Warehousing

Mid 60’s – Dimension & Fact Modeling presented by General Mills and Dartmouth University

Mid 70’s – AC Nielsen popularized Dimension & Fact terms

Mid – Late 80’s – Dr Kimball popularizes Star Schema

Mid 80’s – Bill Inmon popularizes Data Warehousing

Late 80’s – Barry Devlin and Dr Kimball release “Business Data Warehouse”

1990 – Dan Linstedt begins R&D on Data Vault Modeling

2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling

9

Data Vault Evolution

The work on the Data Vault approach began in the early 1990s and was completed around 1999.

Throughout 1999, 2000, and 2001, the Data Vault design was tested, refined, and deployed into specific customer sites.

In 2002, the industry thought leaders were asked to review the architecture.

This is when I attended my first DV seminar in Denver and met Dan!

In 2003, Dan began teaching the modeling techniques to the general public.

In 2014, Dan introduced DV 2.0!

10

Where does a Data Vault Fit?

© LearnDataVault.com

[Diagram: STAGING → EDW (DATA VAULT) → DATA MARTS (STAR SCHEMAS); three data marts are shown]

11

Where does Data Vault fit?

©Oracle Corp

Data Vault goes here

12

Data Vault: 3 Simple Structures

© LearnDataVault.com

EDW (DATA VAULT)

01 HUB

02 LINK

03 SATELLITE

13

Data Vault Core Architecture

© LearnDataVault.com

HUBS

Unique List of Business Keys

LINKS

Unique List of Relationships across keys

SATELLITES

Descriptive Data

› Satellites have one and only one parent table
› Satellites cannot be “Parents” to other tables
› Hubs cannot be child tables

14

Common Attributes

© LearnDataVault.com

Required – all structures

› Primary key – PK
› Load date time stamp – DTS
› Record source – REC_SRC

Required – Satellites only

› Load end date time stamp – LEDTS

› Optional in DV 2.0

Optional – Hubs & Links only

› Last seen dates – LSDTs

› MD5KEY (REQUIRED IN DV 2.0)

Optional – Satellites only

› Load sequence ID – LDSEQ_ID

› Update user – UPDT_USER

› Update DTS – UPDT_DTS

› MD5DIFF
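
The attribute lists above can be sketched as three record types. This is only an illustration, not part of the original deck: the class and field names are assumptions chosen to mirror the slide's abbreviations, and the satellite key is taken to be the parent key plus the load date time stamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class HubRow:
    pk: str                                   # Primary key (PK)
    business_key: str                         # the business key the Hub tracks
    load_dts: datetime                        # Load date time stamp (DTS)
    rec_src: str                              # Record source (REC_SRC)
    last_seen_dts: Optional[datetime] = None  # optional last seen date


@dataclass(frozen=True)
class LinkRow:
    pk: str                                   # Primary key (PK)
    hub_pks: Tuple[str, ...]                  # keys of the Hubs being related
    load_dts: datetime
    rec_src: str
    last_seen_dts: Optional[datetime] = None  # optional last seen date


@dataclass
class SatelliteRow:
    parent_pk: str                            # exactly one parent (a Hub or a Link)
    load_dts: datetime                        # with parent_pk, identifies the row
    rec_src: str
    attributes: Dict[str, str]                # the descriptive data
    load_end_dts: Optional[datetime] = None   # LEDTS, optional in DV 2.0
    md5_diff: Optional[str] = None            # optional MD5DIFF


# Hypothetical sample rows.
now = datetime(2017, 1, 14)
hub = HubRow(pk="1", business_key="CUST-001", load_dts=now, rec_src="CRM")
sat = SatelliteRow(parent_pk=hub.pk, load_dts=now, rec_src="CRM",
                   attributes={"name": "Acme"})
```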

15

1. Hub = Business Keys

Hubs = Unique Lists of Business Keys

Business Keys are used to TRACK and IDENTIFY key information

New: DV 2.0 uses MD5 Hash of the BK for the PK
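
A minimal sketch of deriving a Hub primary key from a business key with MD5. The function name is hypothetical, and the trim-and-uppercase normalization is an assumption (a common convention so that the same key from different sources hashes identically), not something the slide specifies.

```python
import hashlib


def hub_hash_key(business_key: str) -> str:
    """Return an MD5-based key for a business key (BK)."""
    # Assumed normalization: ignore surrounding whitespace and letter case.
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


# The same business key arriving in two formats maps to one Hub key.
key_a = hub_hash_key(" cust-001 ")
key_b = hub_hash_key("CUST-001")
```

Because the key is computed from the business key alone, it can be derived independently in each load process without looking up a sequence in the Hub first.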

16

2: Links = Associations

Links = Transactions and Associations

They are used to hook together multiple sets of information

In DV 2.0 the BK attributes may migrate to the Links for faster query

17

Modeling Links - 1:1 or 1:M?

Today Tomorrow With a Link in The Data Vault

Relationship is a 1:1 so why model a Link?

The business rule can change to a 1:M.

You discover new data later.

No need to change the EDW structure.

Existing data is fine.

New data is added.
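
The point of this slide can be sketched with an in-memory SQLite database. The table names, column names and sample keys are hypothetical; the only claim is that the same Link table holds the relationship both before and after the rule changes from 1:1 to 1:M.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (cust_key TEXT PRIMARY KEY, cust_num TEXT NOT NULL UNIQUE);
CREATE TABLE hub_account  (acct_key TEXT PRIMARY KEY, acct_num TEXT NOT NULL UNIQUE);
CREATE TABLE link_customer_account (
    link_key TEXT PRIMARY KEY,
    cust_key TEXT NOT NULL REFERENCES hub_customer (cust_key),
    acct_key TEXT NOT NULL REFERENCES hub_account (acct_key),
    load_dts TEXT NOT NULL,
    rec_src  TEXT NOT NULL,
    UNIQUE (cust_key, acct_key)
);
""")

# Today: each customer has exactly one account (1:1).
conn.execute("INSERT INTO hub_customer VALUES ('C1', 'CUST-001')")
conn.execute("INSERT INTO hub_account VALUES ('A1', 'ACCT-100')")
conn.execute(
    "INSERT INTO link_customer_account VALUES ('L1', 'C1', 'A1', '2016-01-01', 'CRM')"
)

# Tomorrow: the business rule becomes 1:M. No DDL changes, only new rows.
conn.execute("INSERT INTO hub_account VALUES ('A2', 'ACCT-200')")
conn.execute(
    "INSERT INTO link_customer_account VALUES ('L2', 'C1', 'A2', '2016-06-01', 'CRM')"
)

accounts_for_c1 = [
    row[0]
    for row in conn.execute(
        "SELECT a.acct_num FROM link_customer_account l "
        "JOIN hub_account a ON a.acct_key = l.acct_key "
        "WHERE l.cust_key = 'C1' ORDER BY a.acct_num"
    )
]
print(accounts_for_c1)  # ['ACCT-100', 'ACCT-200']
```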

18

3. Satellites = Descriptors

Satellites provide context for the Hubs and the Links

Tracks changes over time - Like SCD 2

In DV 2.0 use HASH_DIFF to detect changes
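
One way HASH_DIFF change detection might look, as a sketch under stated assumptions: the function names, the `||` delimiter, and the rule of sorting attribute names before hashing are all illustrative choices, not taken from the deck.

```python
import hashlib
from typing import Mapping, Optional

DELIMITER = "||"  # assumed separator so that ("ab", "c") and ("a", "bc") differ


def hash_diff(attributes: Mapping[str, Optional[str]]) -> str:
    """Hash all descriptive attributes of a Satellite row into one value."""
    # Sort by attribute name so the result does not depend on dict order;
    # treat missing values as empty strings.
    ordered_values = [(attributes[name] or "").strip() for name in sorted(attributes)]
    payload = DELIMITER.join(ordered_values)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()


def is_changed(stored_hash_diff: Optional[str],
               incoming: Mapping[str, Optional[str]]) -> bool:
    """True when a new Satellite row should be written for the incoming data."""
    return stored_hash_diff != hash_diff(incoming)


stored = hash_diff({"name": "Acme", "city": "Denver"})
same = is_changed(stored, {"city": "Denver", "name": "Acme"})      # False
moved = is_changed(stored, {"name": "Acme", "city": "Boulder"})    # True
first_load = is_changed(None, {"name": "Acme", "city": "Denver"})  # True
```

Comparing one hash per row replaces a column-by-column comparison, which is why the load pattern stays the same no matter how many attributes a Satellite carries.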

19

Data Vault Model Flexibility (Agility)

Goes beyond standard 3NF

Based on natural business keys
› Not system surrogate keys

Hyper normalized
› Hubs and Links only hold keys and meta data
› Satellites split by rate of change and/or source

Enables Agile data modeling
› Easy to add to model without having to change existing structures and load routines
  • Relationships (links) can be dropped and created on-demand.
› No more reloading history because of a missed requirement

Allows for integrating data across functions and source systems more easily
› All data relationships are key driven

20

Data Vault Extensibility

(C) LearnDataVault.com

Adding new components to the EDW has NEAR ZERO impact to:

› Existing Loading Processes
› Existing Data Model
› Existing Reporting & BI Functions
› Existing Source Systems
› Existing Star Schemas and Data Marts

21

Data Vault Productivity

› Standardized modeling rules
  • Highly repeatable and learnable modeling technique
  • Can standardize load routines
    o Delta Driven process
    o Re-startable, consistent loading patterns.
  • Can standardize extract routines
    o Rapid build of new or revised Data Marts
  • Can be automated
  • Can use a BI-meta layer to virtualize the reporting structures
    o Example: OBIEE Business Model and Mapping tool
    o Example: BOBJ Universe Business Layer
  • Can put views on the DV structures as well
    o Simulate ODS/3NF or Star Schemas
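
As a rough illustration of a delta-driven, re-startable load routine, here is a Hub load that only inserts business keys it has not seen. The in-memory dictionary stands in for a Hub table, and all names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Business key -> (load date time stamp, record source); stands in for a Hub table.
HubTable = Dict[str, Tuple[datetime, str]]


def load_hub(hub: HubTable, staged_keys: Iterable[str],
             load_dts: datetime, rec_src: str) -> int:
    """Insert only business keys not already in the Hub; return the insert count."""
    inserted = 0
    for business_key in staged_keys:
        if business_key not in hub:  # the delta check
            hub[business_key] = (load_dts, rec_src)
            inserted += 1
    return inserted


hub: HubTable = {}
batch = ["CUST-001", "CUST-002", "CUST-001"]  # duplicates in staging are harmless

first_run = load_hub(hub, batch, datetime(2016, 1, 1), "CRM")   # 2 new keys
second_run = load_hub(hub, batch, datetime(2016, 1, 2), "CRM")  # 0, safe to re-run
```

Because a re-run inserts nothing and never overwrites the original load date, a failed batch can simply be restarted, and the same pattern applies to every Hub.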

22

Data Vault Adaptability

› The Data Vault holds granular historical relationships.
  • Holds all history for all time, allowing any source system feeds to be reconstructed on-demand
    o Easy generation of Audit Trails for data lineage and compliance.
    o Data Mining can discover new relationships between elements
    o Patterns of change emerge from the historical pictures and linkages.
› The Data Vault can be accessed by power-users
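
Reconstructing what a source looked like on a given date can be sketched as an "as of" lookup over Satellite history. The row layout, function name and sample data are assumptions for illustration only.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# One Satellite row reduced to (load date time stamp, customer name).
SatRow = Tuple[datetime, str]


def as_of(history: List[SatRow], when: datetime) -> Optional[str]:
    """Return the value that was in effect at `when`, or None if not yet loaded."""
    effective = [row for row in history if row[0] <= when]
    if not effective:
        return None
    # The latest row loaded on or before `when` is the one in effect.
    return max(effective, key=lambda row: row[0])[1]


history = [
    (datetime(2015, 1, 1), "Acme Corp"),
    (datetime(2016, 3, 1), "Acme Corporation"),
]

mid_2015 = as_of(history, datetime(2015, 6, 1))    # "Acme Corp"
end_2016 = as_of(history, datetime(2016, 12, 31))  # "Acme Corporation"
before = as_of(history, datetime(2014, 1, 1))      # None
```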

23

Other Benefits of a Data Vault

› Modeling it as a DV forces integration of the Business Keys upfront
  • Good for organizational alignment
› An integrated data set with raw data extends its value beyond BI:
  • Source for data quality projects
  • Source for master data
  • Source for data mining
  • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).

24

Other Benefits of a Data Vault

› Upfront Hub integration simplifies the data integration routines required to load data marts.
  • Helps divide the work a bit.

› It is much easier to implement security on these granular pieces.

› Granular, re-startable processes enable pin-point failure correction.

› It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods).

25

How to be Agile using DV

Model iteratively
› Use Data Vault data modeling technique
› Create basic components, then add over time

Virtualize the Access Layer
› Don’t waste time building facts and dimensions up front
› ETL and testing take too long
› “Project” objects using pattern-based DV model with database views (or BI meta layer)

Users see real reports with real data
› Can always build out for performance in another iteration
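
A sketch of "projecting" a dimension with a database view instead of building it with ETL, using in-memory SQLite. The view name `dim_customer`, its column aliases and the sample rows are hypothetical; treating a NULL load end date as "current row" is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    hub_cust_seq_id INTEGER PRIMARY KEY,
    hub_cust_num    TEXT NOT NULL UNIQUE
);
CREATE TABLE sat_customer_name (
    hub_cust_seq_id       INTEGER NOT NULL REFERENCES hub_customer (hub_cust_seq_id),
    sat_cust_load_dts     TEXT NOT NULL,
    sat_cust_load_end_dts TEXT,           -- NULL marks the current row (assumption)
    sat_cust_name         TEXT NOT NULL,
    PRIMARY KEY (hub_cust_seq_id, sat_cust_load_dts)
);

-- Virtual dimension: no data is copied, the view reads the Data Vault directly.
CREATE VIEW dim_customer AS
SELECT h.hub_cust_seq_id AS customer_key,
       h.hub_cust_num    AS customer_number,
       s.sat_cust_name   AS customer_name
FROM hub_customer h
JOIN sat_customer_name s ON s.hub_cust_seq_id = h.hub_cust_seq_id
WHERE s.sat_cust_load_end_dts IS NULL;

INSERT INTO hub_customer VALUES (1, 'CUST-001');
INSERT INTO sat_customer_name VALUES (1, '2015-01-01', '2016-03-01', 'Acme Corp');
INSERT INTO sat_customer_name VALUES (1, '2016-03-01', NULL, 'Acme Corporation');
""")

dim_rows = conn.execute(
    "SELECT customer_number, customer_name FROM dim_customer"
).fetchall()
print(dim_rows)  # [('CUST-001', 'Acme Corporation')]
```

If the view proves too slow, it can be replaced by a physical table with the same columns in a later iteration without changing the reports that read it.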

26

WHAT IS THE WORLD'S SMALLEST DATA VAULT?

27

World's Smallest Data Vault

© LearnDataVault.com

Hub Customer
› Hub_Cust_Seq_ID
› Hub_Cust_Num
› Hub_Cust_Load_DTS
› Hub_Cust_Rec_Src

Satellite Customer Name
› Hub_Cust_Seq_ID
› Sat_Cust_Load_DTS
› Sat_Cust_Load_End_DTS
› Sat_Cust_Name
› Sat_Cust_Rec_Src

› The Data Vault doesn’t have to be “BIG”.

› A Data Vault can be built incrementally.

› Reverse engineering one component of the existing models is not uncommon.

› Building one part of the Data Vault, then changing the marts to feed from that vault is a best practice.

› The smallest Enterprise Data Warehouse consists of two tables:
  • One Hub
  • One Satellite
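
The two-table model above can be sketched in SQLite, with history kept by end-dating the previous Satellite row. Column names follow the slide; the data types, the Satellite primary key (Hub key plus load date) and the helper `record_name` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    hub_cust_seq_id   INTEGER PRIMARY KEY,
    hub_cust_num      TEXT NOT NULL UNIQUE,
    hub_cust_load_dts TEXT NOT NULL,
    hub_cust_rec_src  TEXT NOT NULL
);
CREATE TABLE sat_customer_name (
    hub_cust_seq_id       INTEGER NOT NULL REFERENCES hub_customer (hub_cust_seq_id),
    sat_cust_load_dts     TEXT NOT NULL,
    sat_cust_load_end_dts TEXT,
    sat_cust_name         TEXT NOT NULL,
    sat_cust_rec_src      TEXT NOT NULL,
    PRIMARY KEY (hub_cust_seq_id, sat_cust_load_dts)
);
""")


def record_name(conn, seq_id, name, load_dts, rec_src):
    """Close the currently open Satellite row, then insert the new version."""
    conn.execute(
        "UPDATE sat_customer_name SET sat_cust_load_end_dts = ? "
        "WHERE hub_cust_seq_id = ? AND sat_cust_load_end_dts IS NULL",
        (load_dts, seq_id),
    )
    conn.execute(
        "INSERT INTO sat_customer_name VALUES (?, ?, NULL, ?, ?)",
        (seq_id, load_dts, name, rec_src),
    )


conn.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001', '2015-01-01', 'CRM')")
record_name(conn, 1, "Acme Corp", "2015-01-01", "CRM")
record_name(conn, 1, "Acme Corporation", "2016-03-01", "CRM")

sat_rows = conn.execute(
    "SELECT sat_cust_load_dts, sat_cust_load_end_dts, sat_cust_name "
    "FROM sat_customer_name ORDER BY sat_cust_load_dts"
).fetchall()
```

After the two calls the Satellite holds both versions of the name: the first row is end-dated at the second load date and the second row stays open.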

28

Notably…

› In 2008 Bill Inmon stated that the “Data Vault is the optimal approach for modeling the EDW in the DW2.0 framework.” (DW2.0)

› The number of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv-customers/)

29

Organizations using Data Vault

› WebMD Health Services

› Anthem Blue-Cross Blue Shield

› MD Anderson Cancer Center

› Denver Public Schools

› Independent Purchasing Cooperative (IPC, Miami)
  • Owner of Subway

› Kaplan

› US Defense Department

› Colorado Springs Utilities

› State Court of Wyoming

› Federal Express

› US Dept. Of Agriculture

30

What’s New in DV2.0?

© LearnDataVault.com

Modeling Structure Includes…

› NoSQL, and Non-Relational DB systems, Hybrid Systems

› Minor Structure Changes to support NoSQL

New ETL Implementation Standards

› For true real-time support

› For NoSQL support

New Architecture Standards

› To include support for NoSQL data management systems

New Methodology Components

› Including CMMI, Six Sigma, and TQM

› Including Project Planning, Tracking, and Oversight

› Agile Delivery Mechanisms

› Standards, and templates for Projects

31

What’s New in DV2.0?

This model is fully compliant with Hadoop, needs NO changes to work properly

Note: Business Keys replicated to the Link structure for “join” capabilities on the way out to Data Marts.

© LearnDataVault.com

32

Summary

Data Vault provides a data modeling technique that allows:

01 Model Agility
› Enabling rapid changes and additions

02 Productivity
› Enabling low complexity systems with high value output at a rapid pace

03 So? Agile Data Warehousing?
› Easy projections of dimensional models

33

Shameless Plug:

› Available on Amazon: http://www.amazon.com/Better-Data-Modeling-Introduction-Engineering-ebook/dp/B018BREV1C/

34

Super Charge Your Data Warehouse

› Available on Amazon.com
› Soft Cover or Kindle Format
› Now also available in PDF at LearnDataVault.com
› Hint: Kent is the Technical Editor

35

New DV 2.0 Book

› Available on Amazon: http://www.amazon.com/Building-Scalable-Data-Warehouse-Vault/dp/0128025107/

36

Register at wwdvc.com

37

Data Vault References

www.youtube.com/LearnDataVault

www.facebook.com/learndatavault

www.learndatavault.com

www.danlinstedt.com

38

QUESTIONS?

39

Contact Information

KENT GRAZIANO

Snowflake Computing

www.snowflake.net

kent.graziano@snowflake.net

@KentGraziano

http://kentgraziano.com