operational data vault

Data Vault: What’s Next? © Dan Linstedt, 2011-2012 all rights reserved 1

Upload: empowered-holdings-llc

Post on 19-Jan-2015


DESCRIPTION

I gave this presentation at the Advanced Architecture Conference, Bill Inmon, 2011 in Evergreen, Colorado. This presentation covers a new breed of data warehousing called Operational Data Warehousing. These are the next steps in business intelligence towards self-service BI and enabling users to do more with their enterprise data warehouse solution. Specifically, it talks about how the Data Vault model fits in to this picture. If you would like to use the slides, please e-mail me first, I'd be happy to discuss it with you.

TRANSCRIPT

Page 1: Operational Data Vault

1

Data Vault: What’s Next?

© Dan Linstedt, 2011-2012 all rights reserved

Page 2: Operational Data Vault

2

Agenda
• Introduction – why are you here?
• Short Data Vault Review
• What’s Next? Advanced Architecture…
• Defining Operational Data Warehousing
• Why is Data Vault a Good Fit?
• <BREAK>
• Fundamental Paradigm Shift
• Business Keys & Business Processes
• Technical Review
• Query Performance (PIT & Bridge)
• What wasn’t covered in this presentation…

Page 3: Operational Data Vault

3

A bit about me…
• Author, Inventor, Speaker – and part time photographer…
• 25+ years in the IT industry
• Worked in DoD, US Gov’t, Fortune 50, and so on…
• Find out more about the Data Vault:
  o http://YouTube.com/LearnDataVault
  o http://LearnDataVault.com
• Slides available:
  o http://SlideShare.net
  o Search: “Advanced Architecture Data Vault”
• Full profile on http://www.LinkedIn.com/dlinstedt

Page 4: Operational Data Vault

4

Why Are You Here?
• Your Expectations?
• Your Questions?
• Your Background?
• Areas of Interest?
• Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?

Page 5: Operational Data Vault

5

Short Data Vault Review
What is it and where did it come from?

Page 6: Operational Data Vault

Data Warehousing Timeline (1960 – 2010)

• E.F. Codd invented relational modeling
• Chris Date and Hugh Darwen maintained and refined modeling
• Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
• Early 70’s: Bill Inmon began discussing Data Warehousing
• Mid 70’s: AC Nielsen popularized Dimension & Fact terms
• 1976: Dr Peter Chen created E-R Diagramming
• Mid 80’s: Bill Inmon popularizes Data Warehousing
• Mid – Late 80’s: Dr Kimball popularizes Star Schema
• Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
• 1990: Dan Linstedt begins R&D on Data Vault Modeling
• 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
• 2010: DV alive and well around the world

Page 7: Operational Data Vault

7

Data Vault Modeling…

Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable

Page 8: Operational Data Vault

8

What IS a Data Vault? (Business Definition)

• Data Vault Model
  o Detail oriented
  o Historical traceability
  o Uniquely linked set of normalized tables
  o Supports one or more functional areas of business

[Diagram: Functional Areas (Procurement, Sales, Delivery, Contracts, Finance, Planning, Operations); Business Keys Span / Cross Lines of Business]

• Data Vault Methodology
  – CMMI, Project Plan
  – Risk, Governance, Versioning
  – Peer Reviews, Release Cycles
  – Repeatable, Consistent, Optimized
  – Complete with Best Practices for BI/DW

Page 9: Operational Data Vault

9

Supply Chain Analogy

Data Vault(EDW)

Source Systems

Data Marts

Page 10: Operational Data Vault

10

What Does One Look Like?

[Diagram: three Hubs (Customer, Product, Order), each surrounded by three Satellites (Sat) and an F(x) marker, joined by a Link that carries its own F(x) marker and Sat. The Link records a history of the interaction.]

Elements:
• Hub
• Link
• Satellite

Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellites = Descriptive Data
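The three element types can be pictured as tables. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names (hub_customer, link_customer_order, sat_customer and so on) are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# Table and column names are illustrative assumptions, not taken from the slides.
DDL = """
CREATE TABLE hub_customer (
    customer_sqn INTEGER PRIMARY KEY,
    customer_key TEXT NOT NULL UNIQUE,      -- the business key
    load_dts     TEXT NOT NULL,
    record_src   TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_sqn  INTEGER PRIMARY KEY,
    order_key  TEXT NOT NULL UNIQUE,        -- the business key
    load_dts   TEXT NOT NULL,
    record_src TEXT NOT NULL
);
CREATE TABLE link_customer_order (          -- relationship between two hubs
    link_sqn     INTEGER PRIMARY KEY,
    customer_sqn INTEGER NOT NULL REFERENCES hub_customer (customer_sqn),
    order_sqn    INTEGER NOT NULL REFERENCES hub_order (order_sqn),
    load_dts     TEXT NOT NULL,
    record_src   TEXT NOT NULL,
    UNIQUE (customer_sqn, order_sqn)
);
CREATE TABLE sat_customer (                 -- descriptive data, versioned by load date
    customer_sqn INTEGER NOT NULL REFERENCES hub_customer (customer_sqn),
    load_dts     TEXT NOT NULL,
    record_src   TEXT NOT NULL,
    name         TEXT,
    PRIMARY KEY (customer_sqn, load_dts)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The hubs hold only keys, the link holds only references to hubs, and every descriptive column lives in a satellite keyed by its parent and load date.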

Page 11: Operational Data Vault

11

Colorized Perspective… Data Vault

[Diagram: 3rd NF & Star Schema compared with the Data Vault (separation): HUB = Business Keys, LINK = Associations, Satellite = Details]

The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites).

(Colors Concept Originated By: Hans Hultgren)

Page 12: Operational Data Vault

12

A Quick Look at Methodology Issues
Business Rule Processing, Lack of Agility, and Future proofing your new solution

Page 13: Operational Data Vault

13

EDW Architecture: Generation 1

• Quality routines
• Cross-system dependencies
• Source data filtering
• In-process data manipulation

• High risk of incorrect data aggregation
• Larger system = increased impact
• Often re-engineered at the SOURCE
• History can be destroyed (completely re-computed)

[Diagram labels: Sales, Finance, Contracts; Staging; Staging + History; Complex Business Rules + Dependencies; Complex Business Rules #2; Star Schemas (EDW): Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts; Enterprise BI Solution; (batch)]

Page 14: Operational Data Vault

14

#1 Cause of BI Initiative Failure

Re-Engineering For Every Change!

Anyone?

Let’s take a look at one example…

Page 15: Operational Data Vault

15

Re-Engineering

[Diagram labels: Current Sources (Sales, Finance); Customer; Customer Transactions; Data Flow (Mapping): Source, Join, Business Rules; ** NEW SYSTEM **: Customer Purchases]

IMPACT!!

Page 16: Operational Data Vault

16

Federated Star Schema Inhibiting Agility

[Chart: Effort & Cost (Low to High) over Time; Start; Maintenance Cycle Begins; Data Mart 1, Data Mart 2, Data Mart 3]

Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.

RESULT: Business builds their own Data Marts!

The main driver for this is the maintenance cost and the re-engineering of the existing system, which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.

Page 17: Operational Data Vault

17

EDW Architecture: Generation 2

[Diagram labels: Sales, Finance, Contracts; Unstructured Data; Staging; EDW (Data Vault); Complex Business Rules; Star Schemas; Error Marts; Report Collections; Enterprise BI Solution; SOA; (real-time); (batch)]

The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW)

FUNDAMENTAL GOALS
• Repeatable
• Consistent
• Fault-tolerant
• Supports phased release
• Scalable
• Auditable

Page 18: Operational Data Vault

18

NO Re-Engineering

[Diagram labels: Current Sources (Sales, Finance); Customer; Customer Transactions; Stage Copy (one per source); Data Vault: Hub Customer, Hub Acct, Hub Product, Link Transaction; ** NEW SYSTEM **: Customer Purchases, with its own Stage Copy]

IMPACT!!

NO IMPACT!!! NO RE-ENGINEERING!

Page 19: Operational Data Vault

19

Progressive Agility and Responsiveness of IT

[Chart: Effort & Cost (Low to High) over Time; Start; Foundational Base Built; Initial DV Build Out; New Functional Areas Added; Maintenance Cycle Begins]

Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down, and maintenance easy. It also reduces complexity of the existing architecture.

Page 20: Operational Data Vault

20

Why is Data Vault a Good Fit?

Page 21: Operational Data Vault

21

What are the top business obstacles in your data warehouse today?

Page 22: Operational Data Vault

22

Poor Agility

Inconsistent Answer Sets

Needs Accountability

Demands Auditability

Desires IT Transparency

Are you feeling Pinned Down?

Page 23: Operational Data Vault

23

What are the top technology obstacles in your data warehouse today?

Page 24: Operational Data Vault

24

Complex Systems

Real-Time Data Arrival

Unimaginable Data Growth

Master Data Alignment

Bad Data Quality

Late Delivery/Over Budget

Are your systems CRUMBLING?

Page 25: Operational Data Vault

25

Existing Solutions have led you down a painful path…

[Image: Yugo, World’s Worst Car]

Page 26: Operational Data Vault

26

Projects Cancelled & Restarted
Re-engineering required to absorb new systems
Complexity drives maintenance cost sky high
Disparate Silo Solutions provide inaccurate answers!
Severe lack of Accountability

Page 27: Operational Data Vault

27

There must be a better way…

There IS a better way!

How can you overcome

these obstacles?

Page 28: Operational Data Vault

28

It’s Called the

Data Vault Model

and Methodology

Page 29: Operational Data Vault

29

What is it?

It’s a simple, easy-to-use plan to build your valuable Data Warehouse!

Page 30: Operational Data Vault

30

Uncomplicated Design

Simple Build-out

Rapid Adaptability

Understandable Standards

Effortless Scalability

Painless Auditability

Pursue Your Goals!

What’s the Value?

Page 31: Operational Data Vault

31

Why Bother With Something New?

Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'

Page 32: Operational Data Vault

32

What Are the Issues?

This is NOT what you want happening to your project!

THE GAP!!

Page 33: Operational Data Vault

33

What Are the Foundational Keys?

Flexibility

Scalability

Productivity

Page 34: Operational Data Vault

34

Key: Flexibility

Enabling rapid change on a massive scale without downstream impacts!

Page 35: Operational Data Vault

35

Key: Scalability

Providing no foreseeable barrier to increased size and scope

People, Process, & Architecture!

Page 36: Operational Data Vault

36

Key: Productivity

Enabling low complexity systems with high value output at a rapid pace

Page 37: Operational Data Vault

37

How does it work?Bringing the Data Vault to Your Project

Page 38: Operational Data Vault

38

Key: Flexibility

Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts

No Re-Engineering!

Page 39: Operational Data Vault

39

Case In Point:

The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA!

Page 40: Operational Data Vault

40

Key: Scalability in Architecture

Scaling is easy; it’s based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks

Page 41: Operational Data Vault

41

Case In Point:

Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!

Page 42: Operational Data Vault

42

Key: Scalability in Team Size

You should be able to SCALE your TEAM as well!
With the Data Vault methodology, you can:

Scale your team when desired, at different points in the project!

Page 43: Operational Data Vault

43

Case In Point: (Dutch Tax Authority)

Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault

Page 44: Operational Data Vault

44

Key: Productivity

Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, managing and optimizing processes

Page 45: Operational Data Vault

45

Case in Point:
Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.

These individuals generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW data Model
• 75% of the star schema data model

Page 46: Operational Data Vault

46

The Competing Bid?

The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system)

Our total cost? $30k and 2 weeks!

Page 47: Operational Data Vault

47

Results?

Changing the direction of the river takes less effort than stopping the flow of water

Page 48: Operational Data Vault

48

< BREAK TIME >

Page 49: Operational Data Vault

49

What’s Next?
A look at what’s around the corner for Data Warehousing and Business Intelligence. Believe me, it’s going to get interesting fast.

Page 50: Operational Data Vault

50

Operational Data Vault

Data Co-Location:
• Transactions & Transaction History
• Master Data & Master Data History
• Metadata & Metadata History
• External Data & External Data History
• Business Rules & Business Rule History
• Security / Access data & History
• Unstructured Data Ties & History
• Real-time Data Feeds DIRECTLY in to the data store

Operational Applications ON TOP of the warehouse!

Page 51: Operational Data Vault

51

Extreme Automation!

Automated Creation of Data Models:
• Staging Models
• Data Vault Models
• Star Schema Models
• Cube Models
• Excel Models (spreadsheets)
• Data Mining Models (table structures)

Automated Creation of ETL Processes:
• Staging Loads
• Data Vault (Data Warehouse Loads)
• Star Schema Loads (80% solutions)
• Cube Loads (80% solutions)
• Excel Loads / Queries (80% solutions)
• Data Mining Queries (80% solutions)

Other Automated Components:
• Initial Metadata Population
• Initial Master Data Population
• Generated Testing Scripts

http://www.jmorganmarketing.com/should-social-crm-be-automated/

Page 52: Operational Data Vault

52

Results of all of this?

EDW Will:
• become BACK OFFICE!!
• become SELF-RELIANT / SELF-HEALING
• adapt to new structures, new hardware, and new data
• automatically back up and remove old data

Self-Reliance

http://images.businessweek.com/ss/06/10/bestunder25/source/1.htm

Page 53: Operational Data Vault

53

How Long Will it Take?

My milestone predictions:
• 1 yr: Operational Data Vault
• 2 yrs: Beginning automation of business rules
• 3 yrs: Beginning dynamic restructuring in the DV
• 4 yrs: Oper Apps contain BI & metadata & Master data GUI’s in a single place
• 5 yrs: the “all-in-one” appliance, containing 75% of what we need at the firmware levels to do all these things

http://thypolarlife.wordpress.com/2011/08/02/this-moment-in-time/

Page 54: Operational Data Vault

54

Why Should I Care?

• Because the Data Warehouse, combined with the operational applications on top, makes for a self-service BI environment
• Because this technology is the heart of Data Warehousing!
• Because the future is now
• Because it will happen with or without you… You do want a job, right?

Page 55: Operational Data Vault

55

What About Tooling?

[Diagram labels: Automation; Ontology; Cross-Reference; Config; Templates; Source DDL; Target DDL; New Models; ETL Code; Documentation; Test Data; SQL Code; Data Patterns]

Page 56: Operational Data Vault

56

Who’s Tooling Today?

• WhereScape
• Quipu
• RapidACE
• BI-Ready
• Centennium
• AnalytixDS
• Nexus

Page 57: Operational Data Vault

57

What Does It Add Up To?

Page 58: Operational Data Vault

58

What’s the Key Ingredient?

Page 59: Operational Data Vault

59

Defining Operational Data Warehousing
What is an ODW and How did we get here?

Page 60: Operational Data Vault

60

What IS An Operational DW?

• A raw, time-variant, integrated, non-volatile data warehouse, on top of which sits an operational application – “editing and changing data”.

• However, instead of updates and deletes in place, deleted data is “marked” as deleted and updates are turned into inserts, creating a delta audit trail along the way.

• Yes, it’s an operational application on top of the integrated data warehouse (or in this case, Data Vault model).
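The insert-only behaviour described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the class name InsertOnlySatellite, the SatRow fields and the ISO-formatted dates are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SatRow:
    """One immutable version of a record (hypothetical layout)."""
    parent_sqn: int
    load_dts: str            # ISO-8601 text, so string order is date order
    name: Optional[str]
    deleted: bool = False


class InsertOnlySatellite:
    """Edits never change existing rows; they only append new ones."""

    def __init__(self):
        self.rows = []       # append-only list of SatRow

    def current(self, parent_sqn):
        versions = [r for r in self.rows if r.parent_sqn == parent_sqn]
        return max(versions, key=lambda r: r.load_dts) if versions else None

    def update(self, parent_sqn, load_dts, name):
        latest = self.current(parent_sqn)
        if latest and not latest.deleted and latest.name == name:
            return           # nothing changed, so no delta is stored
        self.rows.append(SatRow(parent_sqn, load_dts, name))

    def delete(self, parent_sqn, load_dts):
        latest = self.current(parent_sqn)
        if latest is None or latest.deleted:
            return
        # The record is "marked" deleted by a new row; history stays intact.
        self.rows.append(SatRow(parent_sqn, load_dts, latest.name, deleted=True))


sat = InsertOnlySatellite()
sat.update(1, "2000-10-14", "Dan L")
sat.update(1, "2000-11-01", "Dan Linstedt")
sat.update(1, "2000-11-02", "Dan Linstedt")   # identical, ignored
sat.delete(1, "2000-12-31")
```

After these four calls the list holds three rows: two versions of the name plus one delete marker, which together form the audit trail.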

Page 61: Operational Data Vault

61

Oper/Active DW Timeline (1980 – 2010)

• Data Warehouses split from Operational Systems
• Mid 90’s: “Active” DW becomes important, but has to wait for technology to catch up!
• Real-Time & Oper BI make the scene (users want direct control & up-to-the-minute data)
• Teradata makes real advances in Active DW
• “Appliances” begin appearing on-scene
• 2002: Cendant-TRG creates world’s first Operational Data Vault

Page 62: Operational Data Vault

62

How Did We Get Here?

Parts are © Teradata – Stephen Brobst, CTO

1. Report: What Happened? (Primarily Batch)
2. Analyze: WHY did it happen? (Increase in Ad-Hoc Queries)
3. Predict: What WILL Happen? (Analytical Modeling Grows)
4. Operationalize: What IS happening? (Continuous Update & Time Sensitive Queries Become Important)
5. Activate: What do you WANT to Happen? (Event Based Triggering Takes Hold)
6. ODW: Can you change what is happening? (Application Direct Edits to Data in the EDW)
7. DDW: How do you dynamically adapt to business? (Dynamic Alterations To Structure, System Of Record)

Page 63: Operational Data Vault

63

ODV Overview

ODV: Direct Inserts, NO STAGING AREA

• Web-Services (Direct Feeds)
• Applications (Direct edits)
• Virtual Marts (Direct Access)
• Metadata Rules (Direct Edits)
• Batch Loads (Direct Feeds)
• Unstructured Feeds (Indirect Feeds)

Page 64: Operational Data Vault

64

What is the architecture?

[Diagram labels: Operational Systems; Unstructured, Semi-Structured; Non-SOR Batch Data; Real-Time Collector; Web Interface (usually); SOR Real-Time Data; Staging Area; Non-S.O.R. Historical Batch Data; Data Vault EDW (Stored, Analyzed / Scored); Real-Time Mining Engine; Virtual Marts; Operational Systems; Operational Alerts; Strategic Reports & OLAP; Operational Applications, Master Data (Direct Edits); Operational Metadata Management (Direct Edits); Master Data]

• Flexible
• Accountable
• Compliant
• Scalable

• Normalized
• Dynamic
• Granular
• Historic
• Integrated by business key

Page 65: Operational Data Vault

65

What must an ODW have?

• Operational Application(s) on-top of the single data store

• All the up-time and maintenance requirements of a standard operational application (24x7x365, six 9’s reliability, etc…)

• Inflow and outflow of information; bi-directional data flow to & from the service bus (SOA/ESB, etc..)

• Capacity to incorporate and store existing batch loads and accept real-time data from other feeds

• Ability to interface with unstructured data sets

• All the inherent design necessities of an EDW

Page 66: Operational Data Vault

66

Why should I care?
TWO REASONS:

Page 67: Operational Data Vault

67

Under the Covers…

Hub Seller

Hub Product

Link

Satellite

Sat 1

Sat 2

Sat 3

Sat 4

Hub Parts

Link

Satellite

Application

Data Access Control Layer

Operational Data Vault (ODW) Layer

1. Read Data for Edit

2. Lock Business Key Rows

3. Present in GUI

4. Accept Ins, Upd, Del

5. Perform Insert / Status change

6. Release Lock On Business Key Rows

Presents Data to User in Conformed Screens
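The six numbered steps can be sketched as a small edit cycle. This is a toy, in-memory illustration under stated assumptions: the class OperationalVault, its methods and the per-key threading.Lock are hypothetical stand-ins for the data access control layer, and the sketch takes the lock before reading (rather than after) so the read and the insert see the same state.

```python
import threading


class OperationalVault:
    """Toy in-memory stand-in for the data access control layer (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._locks = {}                 # business key -> lock
        self._versions = {}              # business key -> list of row versions

    def _lock_for(self, business_key):
        with self._guard:
            return self._locks.setdefault(business_key, threading.Lock())

    def history(self, business_key):
        return list(self._versions.get(business_key, []))

    def edit(self, business_key, change):
        """Run one edit cycle; `change` plays the role of the GUI."""
        with self._lock_for(business_key):       # 2. lock (6. released on exit)
            versions = self._versions.setdefault(business_key, [])
            current = dict(versions[-1]) if versions else {}   # 1. read for edit
            proposed = change(dict(current))     # 3./4. present, accept the edit
            if proposed != current:
                versions.append(proposed)        # 5. insert, never update in place
            return proposed


vault = OperationalVault()
vault.edit("ABC-123", lambda row: {**row, "name": "Dan L"})
vault.edit("ABC-123", lambda row: {**row, "name": "Dan Linstedt"})
vault.edit("ABC-123", lambda row: row)           # no change, nothing inserted
```

Locking per business key, instead of one global lock, lets edits to unrelated keys proceed at the same time.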

Page 68: Operational Data Vault

68

Dropping by the Way-Side

• No…
  o ETL
  o BATCH DRIVEN PROCESSING
  o “Synchronization” with the Source System
  o missing source data
  o No scalability problems
  o No ODS needed!
  o No “Master Data” system needed
  o No Staging area needed

Page 69: Operational Data Vault

69

Positives
• Data in the ODW can be governed
• Audit trail built in
• Only deltas are stored
• NEW applications can be created to “automatically” generate Cubes/Star Schemas – these apps can be run by the users…
• Self-Service BI is enabled!
• Master data can be “marked, scored, stored” in the same place as the EDW

Page 70: Operational Data Vault

70

Old Components Still There?

• Staging areas will exist as long as there is external data to load and integrate

• ODS areas may still exist as long as there are other legacy applications existing as source systems

• Master Data areas may still exist as long as the logic is not built directly in to the “operational DW application”

Page 71: Operational Data Vault

71

Secure ODV Technical Layers

[Diagram labels]

Common Data Object Area; Component Groups; Services; Visible Objects; Vault Accessibility Subject Area API

Interfaces:
• Database Interface
• Local DB Interface
• Global DB Interface
• Persistence Cache DB Interface
• Security Interface (Encryption Too)
• Logging Interface
• Scheduling Interface
• Notification Interface
• File Management Interface
• Format Interface

APIs:
• Inbound API
• Outbound API
• Authentication API
• Pedigree API
• Aggregation API
• Security Key Mgr API
• Kit API
• Transaction API
• Packaging API
• Master Data API
• Busn. Intelligence API

Machines: Web Server Locally Based Persistent DB Cache for Joining; Global DB; Local DB1; Local DB2

Page 72: Operational Data Vault

72

What are the benefits?
• Simplified Architecture
• Single Copy of the data!
• No “intermediate” IT work to do
• Users become empowered, with direct access to data sets
• Of course, using the Data Vault model, you gain ALL the benefits of the Data Vault (Scalability, flexibility, etc…)
• NOTE: Two or more “users” can actually EDIT different parts of the same record at the same time!
• Integrating external data basically makes it all available to the application immediately!
• NO NEED TO BUILD A SEPARATE EDW!!

Page 73: Operational Data Vault

73

What are the drawbacks?
• No current “application” is using the Data Vault for operational data
• In other words, off-the-shelf apps in this area do not yet exist – you have to “build it” yourself
• Self-Service BI application technology is nascent or non-existent today
• Master Data & Metadata Applications are not currently available on top of Data Vault

Page 74: Operational Data Vault

74

Technical Review
Hub, Link, Satellite - Definitions

Page 75: Operational Data Vault

75

HUB Data Examples

HUB_CUST_ACCT
SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
1    ABC123     10-14-2000  SALES
2    ABC-123    10-14-2000  SALES
3    *ABC-123   10-14-2000  FINANCE
4    123,ABCD   10-15-2000  CONTRACTS
5    PEF-2956   10-16-2000  CONTRACTS

HUB_PART_NUMBER
SQN  PART_NUM   LOAD_DTS    RECORD_SRC
1    MFG-25862  10-14-2000  MANUFACT
2    MFG*25266  10-14-2000  MANUFACT
3    *P25862    10-14-2000  PLANNING
4    MFG_25862  10-15-2000  DELIVERY
5    CN*25266   10-16-2000  DELIVERY

Hub Structure
SEQUENCE
<BUSINESS KEY>      } Unique Index
{LAST SEEN DATE}    } Optional
<LOAD DATE>
<RECORD SOURCE>
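A Hub load only ever adds business keys it has not seen before, which is what the unique index on the business key enforces. The sketch below assumes a plain list of dictionaries as the hub and a hypothetical helper named load_hub; the keys and sources are the ones from the HUB_CUST_ACCT example, with dates rewritten in ISO form.

```python
def load_hub(hub, staged_keys, load_dts, record_src):
    """Append unseen business keys to `hub`; return how many rows were inserted.

    `hub` is a list of dicts standing in for the hub table (an assumption
    made for this sketch). Keys are compared exactly as they arrive.
    """
    known = {row["business_key"] for row in hub}
    next_sqn = max((row["sqn"] for row in hub), default=0) + 1
    inserted = 0
    for key in staged_keys:
        if key in known:
            continue                      # already in the hub, nothing to do
        hub.append({
            "sqn": next_sqn,
            "business_key": key,
            "load_dts": load_dts,
            "record_src": record_src,     # first source that supplied the key
        })
        known.add(key)
        next_sqn += 1
        inserted += 1
    return inserted


hub_cust_acct = []
first = load_hub(hub_cust_acct, ["ABC123", "ABC-123"], "2000-10-14", "SALES")
second = load_hub(hub_cust_acct, ["*ABC-123", "ABC-123"], "2000-10-14", "FINANCE")
```

Because keys are compared verbatim, ABC123, ABC-123 and *ABC-123 become three separate hub rows, as in the example table; the FINANCE load inserts only the one key SALES had not already delivered.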

Page 76: Operational Data Vault

76

Link Structures

Link_Product_Supplier
LPS_SQN
PRODUCT_SQN         } Unique Index
SUPPLIER_SQN        } Unique Index
LPS_LOAD_DTS
LPS_REC_SOURCE
LPS_ENCR_KEY

Link_Customer_Account_Employee
LCAE_SQN
CUSTOMER_SQN        } Unique Index
ACCOUNT_SQN         } Unique Index
EMPLOYEE_SQN        } Unique Index
LCAE_LOAD_DTS
LCAE_REC_SOURCE

Link Structure
SEQUENCE
<HUB KEY SQN 1>     } Unique Index
<HUB KEY SQN 2>     } Unique Index
<HUB KEY SQN N>     } Unique Index
{LAST SEEN DATE}    } Optional
{CONFIDENCE}        } Optional
{STRENGTH}          } Optional
<LOAD DATE>
<RECORD SOURCE>

Dynamic Link

Page 77: Operational Data Vault

77

Satellites Split By Source System

SAT_SALES_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
Name
Phone Number
Best time of day to reach
Do Not Call Flag

SAT_FINANCE_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
First Name
Last Name
Guardian Full Name
Co-Signer Full Name
Phone Number
Address
City
State/Province
Zip Code

SAT_CONTRACTS_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
Contact Name
Contact Email
Contact Phone Number

Satellite Structure
PARENT SEQUENCE     } Primary Key
LOAD DATE           } Primary Key
<LOAD-END-DATE>
<RECORD-SOURCE>
{user defined descriptive data}
{or temporal based timelines}
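The slides list a LOAD-END-DATE column but do not say how it is filled. One common reading, assumed here, is that a row's end date is the load date of the next version for the same parent, and that the current version stays open. The helper name end_date and the dictionary layout are hypothetical.

```python
def end_date(rows):
    """Return copies of satellite rows with `load_end_dts` filled in.

    Assumption for this sketch: a version ends when the next version of
    the same parent is loaded; the latest version keeps None (still open).
    Dates are ISO-8601 strings so that sorting text sorts chronologically.
    """
    closed = []
    latest_by_parent = {}
    for row in sorted(rows, key=lambda r: (r["parent_sqn"], r["load_dts"])):
        version = dict(row, load_end_dts=None)
        previous = latest_by_parent.get(row["parent_sqn"])
        if previous is not None:
            previous["load_end_dts"] = row["load_dts"]   # close the older version
        latest_by_parent[row["parent_sqn"]] = version
        closed.append(version)
    return closed


sat_sales_cust = [
    {"parent_sqn": 1, "load_dts": "2000-11-01", "name": "Dan Linstedt"},
    {"parent_sqn": 1, "load_dts": "2000-10-14", "name": "Dan L"},
    {"parent_sqn": 2, "load_dts": "2000-10-15", "name": "Another Customer"},
]
closed = end_date(sat_sales_cust)
```

With end dates in place, "what did this row look like on date X" becomes a simple range check instead of a search for the latest earlier load date.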

Page 78: Operational Data Vault

78

Why do we build Links this way?

Page 79: Operational Data Vault

79

History Teaches Us…
If we model for ONE relationship in the EDW, we BREAK the others!

[Diagram: Portfolio to Customer cardinality over time]
• 10 years ago: Portfolio M, Customer 1 (X)
• Today: Portfolio 1, Customer M (Hub Portfolio 1, Hub Customer M)
• 5 years from now: Portfolio M, Customer M (X)

The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!

This situation forces re-engineering of the model, load routines, and queries!

Page 80: Operational Data Vault

80

History Teaches Us…
If we model with a LINK table, we can handle ALL the requirements!

[Diagram: Portfolio to Customer cardinality over time]
• 10 years ago: Portfolio M, Customer 1
• Today: Portfolio 1, Customer M
• 5 years from now: Portfolio M, Customer M

[Diagram: Hub Portfolio (1) to LNK Cust-Port (M); Hub Customer (1) to LNK Cust-Port (M)]

This design is flexible, handles past, present, and future relationship changes with NO RE-ENGINEERING!
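Because a Link is just a set of key pairs, the same structure holds every cardinality shown above. The sketch below is an illustration with made-up sequence numbers; the function name cardinality is hypothetical, and it reports the shape as Portfolio:Customer.

```python
from collections import Counter


def cardinality(link_rows):
    """Classify (portfolio_sqn, customer_sqn) pairs as Portfolio:Customer."""
    pairs = set(link_rows)
    customers_per_portfolio = Counter(portfolio for portfolio, _ in pairs)
    portfolios_per_customer = Counter(customer for _, customer in pairs)
    # "M" on the Portfolio side means some customer is tied to several portfolios.
    portfolio_side = "M" if any(n > 1 for n in portfolios_per_customer.values()) else "1"
    # "M" on the Customer side means some portfolio is tied to several customers.
    customer_side = "M" if any(n > 1 for n in customers_per_portfolio.values()) else "1"
    return f"{portfolio_side}:{customer_side}"


# Sequence numbers are invented for illustration.
ten_years_ago = [(10, 1), (11, 1)]            # one customer, many portfolios
today = [(10, 1), (10, 2)]                    # one portfolio, many customers
five_years_on = [(10, 1), (10, 2), (11, 1)]   # many to many

shapes = [cardinality(rows) for rows in (ten_years_ago, today, five_years_on)]
print(shapes)
```

All three data sets fit the same two-column link, so a change in the business relationship changes the rows, not the table.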

Page 81: Operational Data Vault

81

Applying the Data Vault to Global DW2.0

[Diagram: groups of Hubs, Links and Satellites spread across sites: Base EDW Created in Corporate; Financials in USA; Manufacturing EDW in China; Planning in Brazil]

Page 82: Operational Data Vault

82

Extreme Data Vault Partitioning

[Diagram: Hub Customer, Lnk Cust-Order, Hub Order, Sat Customer, Sat Order, Sat Order; each table on its own DASD – Raid 0+1]

Each table receives its own I/O channel, and its own Raid 0+1 Disk

Page 83: Operational Data Vault

83

Query Performance
Point-in-time and Bridge Tables, overcoming query issues

Page 84: Operational Data Vault

84

Purpose Of PIT & Bridge
• To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.
• Together, these two allow “direct table match”, as well as table elimination, to occur in the queries.
• These tables are not necessary for the entire model; only when:
  o Massive amounts of data are found
  o Large numbers of Satellites surround a Hub or Link
  o A large query across multiple Hubs & Links is necessary
  o Real-time data is flowing in, uninterrupted
• What are they?
  o Snapshot tables – specifically built for query speed

Page 85: Operational Data Vault

85

PIT Table Architecture

[Diagram: Hub Customer, Hub Order and Hub Product joined by Link Line Item (with Satellite Line Item); Satellites Sat 1 to Sat 4 around the Hubs, with a PIT Sat next to each group of Satellites]

Satellite: Point In Time
PARENT SEQUENCE     } Primary Key
LOAD DATE           } Primary Key
{Satellite 1 Load Date}
{Satellite 2 Load Date}
{Satellite 3 Load Date}
{…}
{Satellite N Load Date}

Page 86: Operational Data Vault

86

PIT Table Example

SAT_CUST_CONTACT_NAME
SQN  LOAD_DTS    NAME
1    10-14-2000  Dan L
1    11-01-2000  Dan Linedt
1    12-31-2000  Dan Linstedt

SAT_CUST_CONTACT_CELL
SQN  LOAD_DTS    CELL
1    10-14-2000  999-555-1212
1    10-15-2000  999-111-1234
1    10-16-2000  999-252-2834
1    10-17-2000  999.257-2837
1    10-18-2000  999-273-5555

SAT_CUST_CONTACT_ADDR
SQN  LOAD_DTS    ADDR
1    08-01-2000  26 Prospect
1    09-29-2000  26 Prosp St.
1    12-17-2000  28 November
1    01-01-2001  26 Prospect St

PIT table (LOAD_DTS is the Snapshot Date)
SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
1    08-01-2000  NULL           NULL           08-01-2000
1    09-01-2000  NULL           NULL           08-01-2000
1    10-01-2000  NULL           NULL           09-29-2000
1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
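The PIT rows above can be derived mechanically: for each snapshot date, take the latest load date of each Satellite that is on or before the snapshot. The sketch below reproduces the example using only the load dates from the three contact Satellites; the function name build_pit is hypothetical and the dates are rewritten as ISO strings so that they sort correctly.

```python
from bisect import bisect_right


def build_pit(snapshots, satellites):
    """For each snapshot date, pick each satellite's latest load date <= snapshot.

    `satellites` maps a satellite name to its list of load dates. A satellite
    with no row yet at the snapshot date yields None (NULL in the table).
    """
    pit = []
    for snapshot in snapshots:
        row = {"snapshot": snapshot}
        for name, load_dates in satellites.items():
            ordered = sorted(load_dates)
            position = bisect_right(ordered, snapshot)
            row[name] = ordered[position - 1] if position else None
        pit.append(row)
    return pit


satellites = {
    "name": ["2000-10-14", "2000-11-01", "2000-12-31"],
    "cell": ["2000-10-14", "2000-10-15", "2000-10-16", "2000-10-17", "2000-10-18"],
    "addr": ["2000-08-01", "2000-09-29", "2000-12-17", "2001-01-01"],
}
snapshots = ["2000-08-01", "2000-09-01", "2000-10-01",
             "2000-11-01", "2000-12-01", "2001-01-01"]

pit = build_pit(snapshots, satellites)
for row in pit:
    print(row)
```

A query for a given snapshot can then join each Satellite on an exact (parent, load date) match instead of searching every Satellite for its most recent row.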

Page 87: Operational Data Vault

87

Bridge Table Architecture

[Diagram: Hub Seller, Hub Product and Hub Parts joined by two Links (each with a Satellite); Sat 1 to Sat 4; a Bridge table alongside the Hubs and Links]

Satellite: Bridge
UNIQUE SEQUENCE     } Primary Key
LOAD DATE           } Primary Key
{Hub 1 Sequence #}
{Hub 2 Sequence #}
{Hub 3 Sequence #}
{Link 1 Sequence #}
{Link 2 Sequence #}
{…}
{Link N Sequence #}
{Hub 1 Business Key}
{Hub 2 Business Key}
{…}
{Hub N Business Key}

Page 88: Operational Data Vault

88

Bridge Table Data Example

Bridge Table: Seller by Product by Part (LOAD_DTS is the Snapshot Date)

SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
2    09-01-2000  16        CO*24    2654      DEF-847-0L  324       MN*5-2
3    10-01-2000  16        CO*24    82374     PPA-252-2A  9938      DD*2*3
4    11-01-2000  24        AZ*25    25222     UIF-525-88  7         UF*9*0
5    12-01-2000  99        NM*5     81        DAN-347-7F  16        KI*9-2
6    01-01-2001  99        NM*5     81        DAN-347-7F  24        DL*0-5
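A Bridge row is the result of walking the Links once and storing the sequence numbers and business keys side by side. The sketch below assumes dictionaries for the Hubs and lists of pairs for the two Links; the function name build_bridge and the link contents are illustrative, while the key values are borrowed from the first two rows of the example.

```python
def build_bridge(hub_seller, hub_product, hub_part,
                 link_seller_product, link_product_part, load_dts):
    """Flatten Seller -> Product -> Part into one row per combination.

    Hubs are dicts of sequence number -> business key; links are lists of
    (sqn, sqn) pairs. This layout is an assumption made for the sketch.
    """
    rows = []
    for seller_sqn, product_sqn in link_seller_product:
        for linked_product_sqn, part_sqn in link_product_part:
            if linked_product_sqn != product_sqn:
                continue                  # this part belongs to another product
            rows.append({
                "sqn": len(rows) + 1,
                "load_dts": load_dts,
                "sell_sqn": seller_sqn,
                "sell_id": hub_seller[seller_sqn],
                "prod_sqn": product_sqn,
                "prod_num": hub_product[product_sqn],
                "part_sqn": part_sqn,
                "part_num": hub_part[part_sqn],
            })
    return rows


hub_seller = {15: "NY*1", 16: "CO*24"}
hub_product = {2756: "ABC-123-9K", 2654: "DEF-847-0L"}
hub_part = {525: "JK*2*4", 324: "MN*5-2"}
link_seller_product = [(15, 2756), (16, 2654)]
link_product_part = [(2756, 525), (2654, 324)]

bridge = build_bridge(hub_seller, hub_product, hub_part,
                      link_seller_product, link_product_part, "2000-08-01")
```

Queries that need seller, product and part together can read this one table rather than joining three Hubs and two Links each time.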

Page 89: Operational Data Vault

89

What WASN’T Covered
• ETL Automation
• ETL Implementation
• SQL Query Logic
• Balanced MPP design
• Data Vault Modeling on Appliances
• Deep Dive on Structures (Hubs, Links, Satellites)
• What happens when you break the rules?
• Project management, Risk management & mitigation, methodology & approach
• Automation: Automated DV modeling, Automated ETL production
• Change Management
• Temporal Data Modeling Concerns… And so on…

Page 90: Operational Data Vault

90

Conclusions

Page 91: Operational Data Vault

Who’s Using It?

Page 92: Operational Data Vault

The Experts Say…

“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.”
Bill Inmon

“The Data Vault is foundationally strong and exceptionally scalable architecture.”
Stephen Brobst

“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....”
Doug Laney

Page 93: Operational Data Vault

More Notables…

“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.”
Howard Dresner

“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..”
Scott Ambler

Page 94: Operational Data Vault

94

Where To Learn More
• The Technical Modeling Book:
  o http://LearnDataVault.com
• The Discussion Forums & events:
  o http://LinkedIn.com – Data Vault Discussions
• Contact me:
  o http://DanLinstedt.com - web
  o [email protected] - email
• World wide User Group (Free)
  o http://dvusergroup.com
• Certification Training:
  o Contact me, or learn more at: http://GeneseeAcademy.com

Page 95: Operational Data Vault

95

ODV – Case Study
Operational Data Vault – IN THE REAL WORLD!

Page 96: Operational Data Vault

96

E-Pedigree, Drug Track & Trace

[Diagram labels: Product Authenticator; Secure Integration Services; Corporate Serialization Vault; Serialization Analytics Engine; Product Returns And Recalls; E-Pedigree Management; Manufacturer; Product Packager; Supply Chain; 3rd Party Logistics; Distribution Warehouse; Packaging Orders; Product Packaging; Corp Site Server; SSI Reporting, Analytics, and Data Mining]

Page 97: Operational Data Vault

97

Label Serialization Vault

[Diagram labels: Serialization Vault; Cust Pkg Line; Warehouse (WMS); ERP; E-Pedigree; EPC Global Standards; WS/SOAP; Shipping Reasons; Flat Files; ASN; Product Master Data; Serialization Marts; Corp Domain; Corp Applications]

Data

Master Data
• Products
• Locations
• Trading Partners
• Users

Shipping Data
• Transactions

Serialization/Packaging Data
• Serial #’s
• Hierarchical Relationships
• Containers

Serialization Vault
Global – Master Data
Local – Private Data

Page 98: Operational Data Vault

98

Corporate Security


Tracking #, Machine Info

Pros
• Unique Logins Limit Access
• Physical Data Separation in Logical “Database” units
• No single login has 100% data access.
• Customers can be CHARGED for disk space, indexing, utilization

Cons
• Maintenance, Backup and Restore
• Changes to the data model ripple (larger impacts) as more customers are signed up.
• Each “support call” requires separate login to see the data set.

[Diagram labels: Manufacturer and Shipper, each with its own Data Vault, SQL View Layer, Mart1, Mart2, Mart3, Customer Login, Corp Login and Encrypt Key; Employee Validation; Admin Login; Web-Services and Flat File Delivery; Data Exchange/Sharing Through Code Only; Global]

Page 99: Operational Data Vault

99

Web Services File Delivery

[Diagram labels: Web-Services and Flat File Delivery Machine; Global DB Machine; Local DB Machine; Local DB Machine]

• Encryption at multiple levels
• Multi-machine Utilization
• RAM Based encryption decryption through services

Page 100: Operational Data Vault

100

Secure Machine Transfers

Encrypt / Decrypt

Web-Services and Flat File Delivery Machine

Encrypt / Decrypt

https layer

Encrypted / Compressed

Storage

DBMS Machine

VPN Tunnel

Encrypted Local

Director Database

External IP Cards

Page 101: Operational Data Vault

101

Secure Client Data Interchange

[Diagram labels: Customer Login; Corp Login; Corp Encrypt Key; Web Services; Encrypted Flat Files; Decryption Key; + HTTPS; + SFTP; Corp Managed / Owned Copy; Customer Local Copy; Customer Copy]

• Decrypt using Corp Key, then Re-Encrypt with Customer Unique Key before storing
• Customer Owned Key (Dictated by Customer)
• Corporate Owned Key (Encrypts data internally)

Page 102: Operational Data Vault

102

Security: ODV Web Services

[Diagram labels: Global DB; Customer Login; Corp Login; Corporate Encrypt Key; Web Services; Java Script Or PHP; Web Browser; Web Site / Server; Corp Managed / Owned Copy; Corporate Owned Encryption Key]

Page 103: Operational Data Vault

103

Inflow/Outflow Applications

Inflow: Customer to Corporation (Web Service Sender to Web Service Collector)
• Source Machine Encrypts Data Using Customer Key
• Transmit Encrypted Data over HTTPS
• Corp Decrypts Data According to Customer Key
• Corp Re-Encrypts Data According to Internal Key For Specific Customer
• DB

Outflow: Corporation to Customer
• DB
• Corp Decrypts Data According to Internal Key For Specific Customer
• Corp Encrypts Data According to Customer Key
• Transmit Encrypted Data over HTTPS
• Customer Decrypts Data According to Customer Key

Page 104: Operational Data Vault

104

ODV: Secure File Request

Customer Corporation

Customer Decrypts File According to Customer Key

Transmit Encrypted Data over FTPS

Encrypted File

** Note: Each Customer DB is encrypted via an internally owned Corp key which is unique to EACH customer.

Page 105: Operational Data Vault

105

ODV: Front-End Ping Request

[Diagram labels: Customer; Corporation; DBMS; Corp One-Way Hash of key Number To Execute Ping; Web-Based PING Validation; Unencrypted Data Transfer; Login / Auth]
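The ping slide only names a one-way hash of the key number; it does not say which hash is used or what is stored. The sketch below is one possible reading, with every detail assumed: SHA-256 as the hash, a set of stored digests on the corporate side, and hypothetical function names one_way_hash and ping.

```python
import hashlib
import hmac


def one_way_hash(key_number):
    """Hash a key number so the raw value never has to be stored or compared.

    SHA-256 is an assumption for this sketch; the slide names no algorithm.
    """
    return hashlib.sha256(key_number.encode("utf-8")).hexdigest()


def ping(key_number, known_hashes):
    """Return True when the hashed key number matches a stored digest."""
    candidate = one_way_hash(key_number)
    # compare_digest avoids leaking how many leading characters matched.
    return any(hmac.compare_digest(candidate, stored) for stored in known_hashes)


# Hypothetical key numbers, invented for illustration.
known_hashes = {one_way_hash("SN-0001"), one_way_hash("SN-0002")}
valid = ping("SN-0001", known_hashes)
invalid = ping("SN-9999", known_hashes)
```

Under this reading the ping answers only "known or unknown", so the response can travel unencrypted without exposing the key number itself.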