(otw13) agile data warehousing: introduction to data vault modeling

40
Agile Data Warehouse Modeling: Introduction to Data Vault Modeling Kent Graziano Data Warrior LLC Twitter @KentGraziano

Upload: kent-graziano

Post on 14-May-2015

3.041 views

Category:

Technology


3 download

DESCRIPTION

This is the presentation I gave at OakTable World 2013 in San Francisco. #OTW13 was held at the Children's Creativity Museum next to the Moscone Convention Center and was in parallel with Oracle OpenWorld 2013. The session discussed our attempts to be more agile in designing enterprise data warehouses and how the Data Vault Data Modeling technique helps in that approach.

TRANSCRIPT

Page 1: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Agile Data Warehouse Modeling:

Introduction to Data Vault Modeling

Kent Graziano

Data Warrior LLC

Twitter @KentGraziano

Page 2: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Agenda

Bio

What do we mean by Agile?

What is a Data Vault?

Where does it fit in an Oracle BI architecture

How to design a Data Vault model

Being “agile”

Page 3: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

My Bio

Oracle ACE Director

Certified Data Vault Master and DV 2.0 Architect

Blogger: Oracle Data Warrior

Data Architecture and Data Warehouse Specialist ● 30+ years in IT

● 20+ years of Oracle-related work

● 15+ years of data warehousing experience

Co-Author of ● The Business of Data Vault Modeling

● The Data Model Resource Book (1st Edition)

Editor of “The” Data Vault Book

Past-President of ODTUG and Rocky Mountain Oracle User Group

Page 4: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Manifesto for Agile Software Development

“We are uncovering better ways of developing

software by doing it and helping others do it.

Through this work we have come to value:

Individuals and interactions over processes and

tools

Working software over comprehensive

documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

That is, while there is value in the items on the right,

we value the items on the left more.”

http://agilemanifesto.org/

Page 5: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Applying the Agile Manifesto to DW

User Stories instead of

requirements documents

Time-boxed iterations ● Iteration has a standard length

● Choose one or more user stories to fit in that

iteration

Rework is part of the game ● There are no “missed requirements”... only

those that haven’t been delivered or

discovered yet.

Page 6: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault Definition

The Data Vault is a detail oriented, historical tracking

and uniquely linked set of normalized tables that

support one or more functional areas of business.

It is a hybrid approach encompassing the best of

breed between 3rd normal form (3NF) and star

schema. The design is flexible, scalable, consistent

and adaptable to the needs of the enterprise.

Dan Linstedt: Defining the Data Vault TDAN.com Article

Architected specifically to meet the needs

of today’s enterprise data warehouses

Page 7: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

What is Data Vault Trying to Solve?

What are our other Enterprise

Data Warehouse options?

● Third-Normal Form (3NF): Complex

primary keys (PK’s) with cascading

snapshot dates

● Star Schema (Dimensional): Difficult to

reengineer fact tables for granularity

changes

Difficult to get it right the first

time

Not adaptable to rapid

business change

NOT AGILE!

(C) Kent Graziano

Page 8: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault Time Line

2000 1960 1970 1980 1990

E.F. Codd invented

relational modeling

Chris Date and

Hugh Darwen

Maintained and

Refined

Modeling

1976 Dr Peter Chen

Created E-R

Diagramming

Early 70’s Bill

Inmon Began

Discussing Data

Warehousing

Mid 60’s Dimension & Fact

Modeling presented by

General Mills and Dartmouth

University

Mid 70’s AC Nielsen

Popularized

Dimension & Fact Terms

Mid – Late 80’s Dr Kimball

Popularizes Star Schema

Mid 80’s Bill Inmon

Popularizes Data

Warehousing

Late 80’s – Barry

Devlin and Dr Kimball

Release “Business

Data Warehouse”

1990 – Dan Linstedt

Begins R&D on Data

Vault Modeling

2000 – Dan Linstedt

releases first 5

articles on Data Vault

Modeling

© LearnDataVault.com

Page 9: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault Evolution

The work on the Data Vault approach began in the

early 1990s, and completed around 1999.

Throughout 1999, 2000, and 2001, the Data Vault

design was tested, refined, and deployed into specific

customer sites.

In 2002, the industry thought leaders were asked to

review the architecture.

● This is when I attend my first DV seminar in Denver and met

Dan!

In 2003, Dan began teaching the modeling techniques

to the mass public.

(C) Kent Graziano

Page 10: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Where does a Data Vault Fit?

© LearnDataVault.com

Page 11: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Oracle Information Management Reference Architecture

Staging Layer ● Change tables

● Reject tables for Data Quality

● External tables for file feeds

Foundation Layer ● Transactional granularity

maintained

● Process neutral: no user or

business requirements

● Just recording what happened

Access and Performance

Layer ● Dimensional model

● “Star Schemas”

● Process specific: targeting user

and business requirements

Page 12: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Where does Data Vault fit?

Data Vault goes here

Page 13: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

What is a Foundation Layer?

Basis for long term enterprise scale data

warehouse

Must be atomic level data

● A historical source of facts

Not based on any one data source or system

Single point of integration

Flexible

Extensible

Provides data to the access/reporting layer

(C) Kent Graziano

Page 14: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

How to be Agile using DV and Oracle

Model iteratively ● Use Data Vault data modeling technique

● Create basic components, then add over time

Virtualize the Access Layer ● Don’t waste time building facts and dimensions up front

● ETL and testing takes too long

● “Project” objects using pattern-based DV model with OBIEE

BMM or Oracle Views

Users see real reports with real data

(C) Kent Graziano

Page 15: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault: 3 Simple Structures

© LearnDataVault.com

Page 16: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault Core Architecture

Hubs = Unique List of Business Keys

Links = Unique List of Relationships across

keys

Satellites = Descriptive Data

Satellites have one and only one parent table

Satellites cannot be “Parents” to other tables

Hubs cannot be child tables

© LearnDataVault.com

Page 17: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

1. Hub = Business Keys

Hubs = Unique Lists of Business Keys Business Keys are used to TRACK and IDENTIFY key information

(C) Kent Graziano

Page 18: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Hub Definition

What Makes a Hub Key? ● A Hub is based on an identifiable business key.

● An identifiable business key is an attribute that is used in the source systems to locate data.

● The business key has a very low propensity to change, and usually is not editable on the source systems.

● The business key has the same semantic meaning, and the same granularity across the company, but not necessarily the same format.

Attributes and Ordering ● All attributes are mandatory.

● Sequence ID 1st, Busn. Key 2nd , Load Date 3rd ,Record Source Last (4th).

● All attributes in the Business Key form a UNIQUE Index.

© LearnDataVault.com

Page 19: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

2: Links = Associations

Links = Transactions and Associations They are used to hook together multiple sets of information

(C) Kent Graziano

Page 20: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Link Definitions

What Makes a Link? ● A Link is based on identifiable business element

relationships. ● Otherwise known as a foreign key,

● AKA a business event or transaction between business keys,

● The relationship shouldn’t change over time ● It is established as a fact that occurred at a specific point in time

and will remain that way forever.

● The link table may also represent a hierarchy.

Attributes ● All attributes are mandatory

(C) LearnDataVault.com

Page 21: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Modeling Links - 1:1 or 1:M?

Today:

● Relationship is a 1:1 so why model a Link?

Tomorrow:

● The business rule can change to a 1:M.

● You discover new data later.

With a Link in the Data Vault:

● No need to change the EDW structure.

● Existing data is fine.

● New data is added.

(C) Kent Graziano

Page 22: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

3. Satellites = Descriptors

Satellites provide context for the Hubs and the Links

(C) Kent Graziano

Page 23: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Satellite Definitions

What Makes a Satellite? ● A Satellite is based on an non-identifying business

elements.

● The Satellite data changes, sometimes rapidly, sometimes slowly.

● The Satellite is dependent on the Hub or Link key as a parent,

● Satellites are never dependent on more than one parent table.

● The Satellite is never a parent table to any other table (no snow flaking).

Attributes and Ordering ● All attributes are mandatory – EXCEPT END DATE.

● Parent ID 1st, Load Date 2nd, Load End Date 3rd,Record Source Last.

(C) LearnDataVault.com

Page 24: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Satellite Entity- Details

A Satellite has only 1 foreign key; it is dependent on the parent table (Hub or Link)

A Satellite may or may not have an “Item Numbering” attribute.

A Satellite’s Load Date represents the date the EDW saw the data (must be a delta set).

● This is not Effective Date from the Source!

A Satellite’s Record Source represents the actual source of the row (unit of work).

To avoid Outer Joins, you must ensure that every satellite has at least 1 entry for every Hub Key.

(C) LearnDataVault.com

Page 25: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault Model Flexibility (Agility)

Goes beyond standard 3NF • Hyper normalized

● Hubs and Links only hold keys and meta data

● Satellites split by rate of change and/or source

• Enables Agile data modeling

● Easy to add to model without having to change existing structures and load routines

• Relationships (links) can be dropped and created on-demand.

● No more reloading history because of a missed requirement

Based on natural business keys • Not system surrogate keys

• Allows for integrating data across functions and source systems more easily

● All data relationships are key driven.

© LearnDataVault.com

Page 26: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault Extensibility

Adding new components to

the EDW has NEAR ZERO

impact to:

• Existing Loading

Processes

• Existing Data Model

• Existing Reporting & BI

Functions

• Existing Source Systems

• Existing Star Schemas

and Data Marts

© LearnDataVault.com

Page 27: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Standardized modeling rules • Highly repeatable and learnable modeling technique

• Can standardize load routines

● Delta Driven process

● Re-startable, consistent loading patterns.

• Can standardize extract routines

● Rapid build of new or revised Data Marts

• Can be automated

‣ Can use a BI-meta layer to virtualize the reporting structures

‣ Example: OBIEE Business Model and Mapping tool

‣ Can put views on the DV structures as well

‣ Simulate ODS/3NF or Star Schemas

Data Vault Productivity

(C) Kent Graziano

Page 28: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

• The Data Vault holds granular historical relationships.

• Holds all history for all time, allowing any source system feeds to be reconstructed on-demand

• Easy generation of Audit Trails for data lineage and compliance.

• Data Mining can discover new relationships between elements

• Patterns of change emerge from the historical pictures and linkages.

• The Data Vault can be accessed by power-users

© LearnDataVault.com

Data Vault Adaptability

Page 29: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Other Benefits of a Data Vault

Modeling it as a DV forces integration of the Business Keys upfront. • Good for organizational alignment.

An integrated data set with raw data extends it’s value beyond BI: • Source for data quality projects

• Source for master data

• Source for data mining

• Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).

Upfront Hub integration simplifies the data integration routines required to load data marts.

• Helps divide the work a bit.

It is much easier to implement security on these granular pieces.

Granular, re-startable processes enable pin-point failure correction.

It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods).

© LearnDataVault.com

Page 30: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
Page 31: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Worlds Smallest Data Vault

The Data Vault doesn’t have to be “BIG”.

An Data Vault can be built incrementally.

Reverse engineering one component of the existing models is not uncommon.

Building one part of the Data Vault, then changing the marts to feed from that vault is a best practice.

The smallest Enterprise Data Warehouse consists of two tables: ● One Hub,

● One Satellite

Hub_Cust_Seq_ID

Hub_Cust_Num

Hub_Cust_Load_DTS

Hub_Cust_Rec_Src

Hub Customer

Hub_Cust_Seq_ID

Sat_Cust_Load_DTS

Sat_Cust_Load_End_DTS

Sat_Cust_Name

Sat_Cust_Rec_Src

Satellite Customer Name

© LearnDataVault.com

Page 32: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Notably…

In 2008 Bill Inmon stated that the “Data Vault

is the optimal approach for modeling the EDW

in the DW2.0 framework.” (DW2.0)

The number of Data Vault users in the US

surpassed 500 in 2010 and grows rapidly

(http://danlinstedt.com/about/dv-customers/)

Page 33: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Organizations using Data Vault

WebMD Health Services

Anthem Blue-Cross Blue Shield

MD Anderson Cancer Center

Denver Public Schools

Independent Purchasing Cooperative (IPC, Miami) • Owner of Subway

Kaplan

US Defense Department

Colorado Springs Utilities

State Court of Wyoming

Federal Express

US Dept. Of Agriculture

Page 34: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

What’s New in DV2.0?

Modeling Structure Includes… ● NoSQL, and Non-Relational DB systems, Hybrid Systems

● Minor Structure Changes to support NoSQL

New ETL Implementation Standards ● For true real-time support

● For NoSQL support

New Architecture Standards ● To include support for NoSQL data management systems

New Methodology Components ● Including CMMI, Six Sigma, and TQM

● Including Project Planning, Tracking, and Oversight

● Agile Delivery Mechanisms

● Standards, and templates for Projects

© LearnDataVault.com

Page 35: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Conclusion?

Changing the direction of the river takes less

effort than stopping the flow of water © LearnDataVault.com

Page 36: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Summary

• Data Vault provides a data

modeling technique that

allows:

‣ Model Agility

‣ Enabling rapid changes and additions

‣ Productivity

‣ Enabling low complexity systems with high

value output at a rapid pace

‣ Easy projections of dimensional models

‣ So? Agile Data Warehousing?

Page 37: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Super Charge Your Data Warehouse

Available on Amazon.com

Soft Cover or Kindle Format

Now also available in PDF at

LearnDataVault.com

Hint: Kent is the Technical

Editor

Page 38: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Data Vault References

www.learndatavault.com

www.danlinstedt.com

On LinkedIn:

http://www.linkedin.com/groups?gid=44926

On YouTube:

www.youtube.com/LearnDataVault

On Facebook:

www.facebook.com/learndatavault

Page 39: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
Page 40: (OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling

Contact Information

Kent Graziano

The Oracle Data Warrior

Data Warrior LLC

[email protected]

Visit my blog at

http://kentgraziano.com