Page 1: Data Vault & Ladeperformance - DOAG Deutsche ORACLE ... · Data Vault & Ladeperformance No. 7 What is a Data Vault?1 Data Vault Modelling basics Hubs, Satellites, Links and a construct

© CGI Group Inc. CONFIDENTIAL

Data Vault & Ladeperformance (Load Performance)


No. 2

About me

Markus Kollas

BI consultant since 1998

With CGI since 01/2008

Executive Consultant

BI (Framework) Trainer


Data Vault & Ladeperformance

No. 3

1. What is a Data Vault?
2. Data Vault Modelling Basics
3. Hubs, Satellites, Links and a construct
4. Performance Tuning your Data Vault
5. Loading data into your Data Vault
6. Retrieving data from your Data Vault


1.1 What is a Data Vault?

Author: Dan Linstedt

Real name: Common Foundation of Data Warehouse modelling

“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.”

“It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s Enterprise Data Warehouse.”

No. 4


1.3 Where does Data Vault fit in?

No. 5

[Figure: Data Flow between the Transaction systems, the Data Warehouse and Analytical use]


1.8 Data modelling techniques applied!

No. 6

[Figure: the example entities Station, Journey, Ticket, Ticket Type, Client, Line, Zone, Route and Calendar shown in normalization modelling (3NF), Data Vault modelling (Hub, Link, Sat) and dimensional modelling, placed across the Transaction, Data Warehouse and Analytical layers]


Data Vault & Ladeperformance

No. 7


2.1 What are the Data Vault primary components?

The Data Vault consists of three primary components:

• Hubs contain the core business keys.

• Links form all associations between the Hubs.

• Satellites provide all detail information for Hubs and Links.

The Hubs and Links together form the skeletal structure of the model, while the Satellites add all the descriptive details.

No. 8


2.3 Separation of data types in a DV structure

Rule: Each component contains either Business Keys (Hub), Associations (Link) or Details (Satellite).

No. 9

[Figure: example model built from Hubs (H), Links (L) and Satellites (S) with the labels Product, Name, Address, Vendor, Customer, Delivery, Order, Orderline and Producttype]


2.4 In comparison: A dimensional data model

No. 10

[Figure: star schema with Fact_Sale and the dimensions Dim_Region, Dim_Time, Dim_Product and Dim_Customer]

Fact tables contain all three types of data; dimension tables contain Business Keys and Details.


2.5 In comparison: A normalized data model

No. 11

[Figure: normalized model with the entities Customer, Order, Order Line, Store, Region, Vendor and Product]

Entities (tables) typically contain all three types of data.


2.6 The advantage of data type separation

No. 12

• Each data component can be managed without impacting the other components.

• Changes in the data constraints (relationships) of the source data often do not impact the Data Vault.

• All components are decoupled, which makes the Data Vault model easy to extend.

• The load procedures of the components are uniform.


Data Vault & Ladeperformance

No. 13


3.4 Hub characteristics

No. 14

• Primary Key: the PK is a unique hash key (the Business Key Hash).

• Business Key: a Hub’s business key is the actual value the Hub stores; it has a unique index.

• Load DTS: a Hub’s Load Date Time Stamp represents the first time the EDW saw the business key.

• Record Source: a Hub’s Record Source represents the source (system) of the business key value.

Hub columns: Business Key Hash (PK), Business_Key, Load DTS, Record_Source
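A minimal Python sketch of a Hub load, assuming a common Data Vault 2.0 convention that the slides do not spell out: the business key is trimmed and upper-cased, then hashed with MD5 to form the Business Key Hash. The Hub is modelled as a plain dict keyed by that hash; `hash_key`, `HubRow` and `load_hub` are hypothetical names. Only keys the Hub has never seen are inserted, so Load DTS keeps its meaning of "first time the EDW saw the key".

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Hash one or more business keys into a 32-character hex key (assumed MD5 convention)."""
    # Normalize first (trim + upper-case) so " c-100" and "C-100" land on the same Hub row.
    normalized = delimiter.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    business_key_hash: str  # PK: unique hash key
    business_key: str       # the value the Hub stores (unique index)
    load_dts: datetime      # first time the EDW saw this business key
    record_source: str      # source system that delivered the key


def load_hub(hub: dict, business_keys: list, record_source: str) -> int:
    """Insert business keys the Hub has never seen; return the number of new rows."""
    load_dts = datetime.now(timezone.utc)
    inserted = 0
    for key in business_keys:
        bk_hash = hash_key(key)
        if bk_hash not in hub:  # known keys keep their original Load DTS and Record Source
            hub[bk_hash] = HubRow(bk_hash, key.strip().upper(), load_dts, record_source)
            inserted += 1
    return inserted


hub_customer: dict = {}
print(load_hub(hub_customer, ["C-100", "C-200", " c-100"], "CRM"))  # 2 new rows
print(load_hub(hub_customer, ["C-200", "C-300"], "WEBSHOP"))        # 1 new row
```

Because the hash is computed from the business key alone, any process (Hub, Link or Satellite load) can derive the same key independently, without reading the Hub first.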


3.10 Link characteristics

No. 15

Link columns: Link_Key Hash (PK), Business Key Hashes, Load DTS, Record_Source

• Primary Key: the PK is a unique hash key.

• Foreign Keys: a Link has two or more foreign keys (the Business Key Hashes of the corresponding Hubs) and implements an n:n relationship between two or more Hubs. Together they carry a composite unique index.

• Load DTS: a Link’s load date timestamp represents the first time the EDW saw the association.

• Record Source: a Link’s Record Source represents the source (system) of the association.
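The following Python sketch shows how a Link row could be built under the same assumed hashing convention as for Hubs (trim, upper-case, join with `||`, MD5). The Link key hashes all participating business keys in a fixed order, and the foreign keys are simply the Hubs' hash keys recomputed from the staged business keys. `LNK_CUSTOMER_PRODUCT`, `load_link` and the dict-based table are hypothetical illustrations.

```python
import hashlib


def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Same assumed convention as for Hubs: trim, upper-case, join with a delimiter, MD5."""
    normalized = delimiter.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_link(link: dict, pairs: list, load_dts: str, record_source: str) -> int:
    """Load a hypothetical LNK_CUSTOMER_PRODUCT from (customer, product) business-key pairs."""
    inserted = 0
    for customer_key, product_key in pairs:
        # Link key: hash over all participating business keys in a fixed order.
        link_hash = hash_key(customer_key, product_key)
        if link_hash in link:
            continue  # the association is already known; keep its first Load DTS
        link[link_hash] = {
            # Foreign keys are computed, not looked up: they equal the Hubs' hash keys.
            "customer_hash": hash_key(customer_key),
            "product_hash": hash_key(product_key),
            "load_dts": load_dts,
            "record_source": record_source,
        }
        inserted += 1
    return inserted


lnk: dict = {}
print(load_link(lnk, [("C-100", "P-1"), ("C-100", "P-2"), ("c-100", "p-1")], "2024-01-01", "ERP"))  # 2
```

The delimiter matters: without it, the key pairs ("AB", "C") and ("A", "BC") would hash to the same Link key.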


3.16 Satellite characteristics

No. 16

• Detail(s): a Satellite can hold one or more detail values per record.

• Detail Hash Diff: a hash over the detail values that makes it easy to compare a new record with the older ones.

• End DTS: a Satellite’s end date timestamp represents the time the EDW saw the new data that replaced the old record. This column is optional.

• Record Source: a Satellite’s Record Source represents the source (system) of the detail values.

• Note: avoid outer joins by keeping at least one Satellite row for every row in the Hub or Link.

Satellite columns: Business Key Hash (PK), Load DTS (PK), Details 1-n, Detail Hash Diff (optional), End_DTS (optional), Record_Source

• Business Key Hash: a foreign key to the unique key of the Hub or Link.

• Load DTS: a Satellite’s load date timestamp represents the first time the EDW saw the data (it is part of the primary key).

• Together, both columns form the primary key of the Satellite.
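A minimal sketch of how the Detail Hash Diff and the optional End DTS could work together during a Satellite load. It assumes the hash diff is an MD5 over the detail columns sorted by name, with NULL mapped to an empty string. The Satellite is a plain list of dicts, and `hash_diff` and `load_satellite` are hypothetical names. A new version is stored only when the hash diff differs from the current (open) version.

```python
import hashlib


def hash_diff(details: dict) -> str:
    """Hash all descriptive columns of one Satellite record (assumed MD5 convention)."""
    # Sort by column name so the result does not depend on dict order; NULL becomes "".
    parts = [f"{col}={'' if details[col] is None else str(details[col]).strip()}" for col in sorted(details)]
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()


def load_satellite(sat: list, bk_hash: str, details: dict, load_dts: str, record_source: str) -> bool:
    """Insert a new version only when the details changed; return True if a row was added."""
    new_diff = hash_diff(details)
    open_rows = [r for r in sat if r["bk_hash"] == bk_hash and r["end_dts"] is None]
    if open_rows and open_rows[-1]["hash_diff"] == new_diff:
        return False  # identical to the current version: nothing to store
    for row in open_rows:
        row["end_dts"] = load_dts  # optional End DTS: the new data replaces the old record
    sat.append({"bk_hash": bk_hash, "load_dts": load_dts, **details,
                "hash_diff": new_diff, "end_dts": None, "record_source": record_source})
    return True


sat_customer: list = []
print(load_satellite(sat_customer, "h1", {"name": "Ann", "city": "Bonn"}, "2024-01-01", "CRM"))     # True
print(load_satellite(sat_customer, "h1", {"city": "Bonn", "name": "Ann"}, "2024-02-01", "CRM"))     # False
print(load_satellite(sat_customer, "h1", {"name": "Ann", "city": "Cologne"}, "2024-03-01", "CRM"))  # True
```

Comparing one hash instead of every detail column keeps the change check cheap. Updating End DTS in place is one option. Because the column is optional, an insert-only load could instead derive the end date at query time from the next row's Load DTS.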


1.4 Data Vault physical structure

No. 17

[Figure: two Hubs (Business Key Hash, Business_Key, Load_DTS, Record_Source), each with a Satellite (Business Key Hash, Load_DTS, Detail(s), End_DTS, Record_Source), connected by a Link (Link Key Hash, Business Key Hashes, Load_DTS, Record_Source)]


Data Vault & Ladeperformance

No. 18


3.1 Performance tuning your Data Vault

No. 19

• Once all functional modelling is done, the performance of the Data Vault can be tuned.

• Tuning the performance does not change the functionality.

• Performance is only tuned when necessary.

• There are two options for tuning:

1. Performance tuning that the Data Vault modeller builds into the model.

2. Performance tuning done by the DBA of the database (tablespaces, indexing, partitioning, etc.).


Data Vault Modelling part 2

No. 20


4.1 Data Vault – Loading sequence?

No. 21

[Figure: loading sequence from the Transaction systems through Staging and the EDW (DV) to the Data Mart (Dim), in three phases: Staging Loads, Data Vault Loads, Dimensional Loads]


4.2 Data Vault – Loading Source and Stage

No. 22

[Figure: phases Staging Loads, Data Vault 2.0 Loads, Dimensional Loads; steps shown: Sources, Stage]

Parallel loading of the sources, followed by staging (staging can be virtual or non-existent).


4.3 Data Vault 1.0 – Loading Data Vault

No. 23

[Figure: phases Staging Loads, Data Vault Loads, Dimensional Loads; steps shown: Sources, Stage, Hubs (DV 1.0)]

First up is the parallel loading of the Hubs.


4.4 Data Vault 1.0 – Loading Data Vault

No. 24

[Figure: phases Staging Loads, Data Vault Loads, Dimensional Loads; steps shown: Sources, Stage, Hubs, Hub-Sat, Links (DV 1.0)]

Then the Satellites belonging to the Hubs are loaded in parallel, as are the Links between the Hubs.
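Why does DV 1.0 load in waves? The slides do not say. A common explanation, assumed here, is that DV 1.0 Hubs used sequence-generated surrogate keys, as an Oracle SEQUENCE would produce. A Link or Satellite row can then only be written after the Hub has assigned its key, and finding that key takes a lookup. The sketch below is a hypothetical in-memory illustration of that dependency: `SequenceHub` and `load_link` are invented names.

```python
from itertools import count


class SequenceHub:
    """Hypothetical DV 1.0-style Hub whose PK comes from a sequence."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.ids: dict = {}  # business key -> surrogate key

    def load(self, business_keys: list) -> None:
        for key in business_keys:
            if key not in self.ids:
                self.ids[key] = next(self._next_id)  # the key exists only after this load


def load_link(customers: SequenceHub, products: SequenceHub, pairs: list) -> list:
    # Each Link row needs the surrogate keys of both Hubs. The lookup raises KeyError
    # if a Hub has not been loaded yet, so the Link load must wait for the Hub loads.
    return [(customers.ids[c], products.ids[p]) for c, p in pairs]


hub_customer, hub_product = SequenceHub(), SequenceHub()
hub_customer.load(["C-100", "C-200"])  # wave 1: Hubs (may run in parallel with each other)
hub_product.load(["P-1"])
print(load_link(hub_customer, hub_product, [("C-200", "P-1")]))  # wave 2: [(2, 1)]
```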


4.5 Data Vault 2.0 – Loading Data Vault

No. 25

[Figure: phases Staging Loads, Data Vault 2.0 Loads, Dimensional Loads; steps shown: Sources, Stage, Hubs, Sats, Links (DV 2.0)]

All DV structures are loaded in parallel, as far as hardware restrictions allow.


4.6 Data Vault – Loading Dimensions and Facts

No. 26

[Figure: phases Staging Loads, Data Vault 2.0 Loads, Dimensional Loads; steps shown: Sources, Stage, Hubs, Sats, Links, Dims, Facts]

Finally the Dimensions are loaded, followed by the Facts of the dimensional model.


4.7 Data Vault – Loading

No. 27

• The start of the Hub, Link and Satellite loads may still be a major synchronization point.

• Within that set, all loading is done simultaneously, thanks to the use of Hash Keys.

• Each set of loading jobs (staging, Data Vault, dimensional) “waits” for the previous set to complete.

• Loads are started as soon as data is ready.

• No other “waiting” time is required.

• Load dependencies are greatly reduced.
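A small Python sketch of the DV 2.0 idea from the slides above: every Hub, Link and Satellite job computes its keys from the staged business keys, so all four jobs of the Data Vault set can start together. Only the set as a whole is a synchronization point. The staged rows, the job functions and the MD5-based `hash_key` are hypothetical assumptions, and in-memory sets stand in for the target tables. In a real database the Link may finish before a Hub, so enforced foreign keys would need to be deferred or disabled (also an assumption).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*parts: str) -> str:
    # Assumed convention: trim, upper-case, join with "||", MD5.
    return hashlib.md5("||".join(p.strip().upper() for p in parts).encode("utf-8")).hexdigest()


# Hypothetical staged order lines: each links a customer to a product.
STAGE = [
    {"customer": "C-100", "product": "P-1", "qty": 2},
    {"customer": "C-200", "product": "P-1", "qty": 5},
]


def load_hub_customer(stage):
    return {hash_key(r["customer"]) for r in stage}


def load_hub_product(stage):
    return {hash_key(r["product"]) for r in stage}


def load_link_order(stage):
    # The Link computes its Hub foreign keys itself: no lookup, no wait for the Hubs.
    return {(hash_key(r["customer"], r["product"]), hash_key(r["customer"]), hash_key(r["product"]))
            for r in stage}


def load_sat_order(stage):
    return {(hash_key(r["customer"], r["product"]), r["qty"]) for r in stage}


def load_data_vault(stage):
    jobs = {"hub_customer": load_hub_customer, "hub_product": load_hub_product,
            "link_order": load_link_order, "sat_order": load_sat_order}
    # All jobs of the Data Vault set start at once; the dimensional set would start after this returns.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(job, stage) for name, job in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


result = load_data_vault(STAGE)
print({name: len(rows) for name, rows in result.items()})
```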


Data Vault & Ladeperformance

No. 28


4.2 EDW: Data Vault requires an Architectural shift

No. 29

[Figure: Transaction source, Staging, EDW (DV) and Data Mart (Dim); only “hard” rules are applied on the way into the EDW, and complex business rules are applied coming out of the EDW (“the lens” filter)]

Complex business rules are only applied downstream, which allows traceability, auditability and uniform/homogeneous loading.


4.3 EDW: The Business Data Vault

No. 30

[Figure: Transaction source, Staging, EDW (Raw DV and Business DV) and Data Mart (Dim)]

Raw DV: the Raw Data Vault is the vault as described up to this point. It supports “one version of the facts”.

Business DV: the Business Data Vault holds transformed and calculated values. It supports “business transformations”.


4.4 Business Data Vault Definition

• The Business Data Vault stores data processed by (soft) business rules.

• Data in the Business Data Vault is always derived from the Raw Data Vault (also called the “Operational Vault”).

• Preferred design choice: separate models (Raw/Business Vault).

• Practical choice: Business Hubs, Links and Satellites are added to the Raw Data Vault model as needed.

No. 31


4.5 Business Data Vault Example: SAT_INV_CUR

No. 32

HUB_INVOICE: Invoice_Hash, Invoice_Number, Load_DTS, Record_Source

SAT_INV_AMT: Invoice_Hash, Load_DTS, Amount_Billed, Amount_Payed, End_DTS, Record_Source

SAT_INV_DT: Invoice_Hash, Load_DTS, Billed_Date, Paid_Date, End_DTS, Record_Source

SAT_INV_CUR: Invoice_Hash, Load_DTS, Currency, Exchange_Rate, Amount_Payed, End_DTS, Record_Source

SAT_INV_CUR is a derived calculation based on Amount_Payed from SAT_INV_AMT and the Exchange_Rate.
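A hedged Python sketch of the derived calculation behind SAT_INV_CUR. The slide does not give the exact rule, so this assumes the converted Amount_Payed equals the raw Amount_Payed times Exchange_Rate, rounded half-up to two decimals. It also assumes the Business Satellite row reuses the raw row's Load_DTS. The `record_source` label is hypothetical, as are the class and function names. `Decimal` avoids binary floating-point rounding on money.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class InvAmt:        # one row of SAT_INV_AMT (Raw Data Vault)
    invoice_hash: str
    load_dts: str
    amount_billed: Decimal
    amount_payed: Decimal


@dataclass(frozen=True)
class InvCur:        # one row of SAT_INV_CUR (Business Data Vault)
    invoice_hash: str
    load_dts: str
    currency: str
    exchange_rate: Decimal
    amount_payed: Decimal
    record_source: str


def derive_inv_cur(raw: InvAmt, currency: str, exchange_rate: Decimal) -> InvCur:
    # Assumed soft business rule: converted amount = raw Amount_Payed * Exchange_Rate, rounded to cents.
    converted = (raw.amount_payed * exchange_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return InvCur(raw.invoice_hash, raw.load_dts, currency, exchange_rate, converted,
                  record_source="BUSINESS_RULE:FX_CONVERSION")  # hypothetical label


raw_row = InvAmt("a1b2", "2024-05-01", Decimal("120.00"), Decimal("100.00"))
print(derive_inv_cur(raw_row, "USD", Decimal("1.0850")).amount_payed)  # 108.50
```

Because the rule only reads Raw Data Vault rows, it can be re-run whenever the rule or the exchange rate changes, without touching the raw data.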


4.6 Business Data Vault Performance

Optimal choices for performance or real-time loading:

• An integrated Raw Data Vault and Business Data Vault.

• Business Hubs, Links or Satellites added to Raw Hubs, Links or Satellites.

• Example: a Customer Hub has two address Satellites, one for each of two separate source systems. After the raw data is loaded, business rules calculate the active address and store the result in a Business Satellite attached to the Customer Hub.

No. 33
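The active-address example could be sketched like this. The business rule is a hypothetical assumption, since the slide does not state it: the most recently loaded address wins, and on a tie the first source system (A) wins. `AddressRow` and `active_address` are invented names, and Load DTS is held as an ISO-8601 string, which sorts correctly as text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressRow:
    customer_hash: str
    load_dts: str       # ISO-8601 strings compare correctly as text
    street: str
    city: str
    record_source: str


def active_address(sat_a: list, sat_b: list, customer_hash: str) -> Optional[AddressRow]:
    """Hypothetical rule: the most recently loaded address wins; ties go to source A."""
    candidates = [r for r in sat_a + sat_b if r.customer_hash == customer_hash]
    if not candidates:
        return None
    # max() returns the first maximal element, so on equal load_dts the source A row wins.
    return max(candidates, key=lambda r: r.load_dts)


sat_addr_crm = [AddressRow("h1", "2024-01-10", "Main St 1", "Bonn", "CRM")]
sat_addr_shop = [AddressRow("h1", "2024-02-01", "Ring 5", "Cologne", "SHOP")]
print(active_address(sat_addr_crm, sat_addr_shop, "h1"))
```

The result would then be written to the Business Satellite on the Customer Hub, for example with the same hash-diff check used for raw Satellites.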


4.7 Retrieving data is to “know your data”

No. 34

Example:

• The relation between Customer and Product has always been 1:n.

• Then, on 01-01-2010, the transaction system changes and the relation between Customer and Product becomes n:1.

• The Link can handle this change, so loading is not a problem.

• But how can the reporting environment know about this change? It is invisible in the Data Vault model, which has not changed.

Conclusion: hence the need for metadata!
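One way to produce such metadata is to profile the Link data per period, since the model itself does not change. The sketch below is a hypothetical approach, not the method from the slides. Link rows are `(customer, product, load_date)` tuples. For one period, it counts the maximum number of products per customer and customers per product, then labels the observed cardinality.

```python
from collections import defaultdict
from datetime import date


def max_fanout(link_rows, start: date, end: date):
    """Return (max products per customer, max customers per product) for rows loaded in [start, end)."""
    products_per_customer = defaultdict(set)
    customers_per_product = defaultdict(set)
    for customer, product, load_date in link_rows:
        if start <= load_date < end:
            products_per_customer[customer].add(product)
            customers_per_product[product].add(customer)
    return (max((len(s) for s in products_per_customer.values()), default=0),
            max((len(s) for s in customers_per_product.values()), default=0))


def describe(fanout) -> str:
    """Label the observed Customer:Product cardinality."""
    per_customer, per_product = fanout
    if per_customer > 1 and per_product > 1:
        return "n:m"
    if per_customer > 1:
        return "1:n"
    if per_product > 1:
        return "n:1"
    return "1:1"


link_rows = [("C1", "P1", date(2009, 5, 1)), ("C1", "P2", date(2009, 6, 1)),
             ("C1", "P3", date(2010, 2, 1)), ("C2", "P3", date(2010, 3, 1))]
print(describe(max_fanout(link_rows, date(2009, 1, 1), date(2010, 1, 1))))  # 1:n
print(describe(max_fanout(link_rows, date(2010, 1, 1), date(2011, 1, 1))))  # n:1
```

Storing the label per period in a metadata table would let the reporting layer detect the change on 01-01-2010.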


CDR in the EDW model – Storing time-variant data

• Tracking complete history at a detailed level.

• 100% versioning and audit trail.

• Implicit implementation of MDM.

• Parallel processing of Satellite loading, optimizing performance.

• A true “Single Source of Facts”.

[Figure: call detail record (CDR) model with the Hubs Phone Number, Customer, Facilities and Exchange; the Links Call Detail and Contract; the Satellites Call Detail, Phone Number, Customer, Contract, Facilities and Exchange; relationships labelled making, received, charged, writing, used, owns, part of and defining]


CDR in the Subject Areas or Data Marts – Identifying dimensions

• Automated identification of candidate dimensions.

• A dimension originates from a Hub.

• It is combined with related Links and Satellites based on information requirements.

[Figure: the CDR Data Vault model with a Subject Area containing Dimension Customer, Dimension Contracts, Dimension Exchanges and Dimension Facilities]
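A minimal sketch of "a dimension originates from a Hub", assuming a simple current-values (type 1) dimension. It takes each Hub row and joins the open Satellite version (End_DTS is None), dropping the Data Vault housekeeping columns. Reusing the Hub hash key as the dimension key is an assumption, as are the function name and the dict-based tables.

```python
# Data Vault housekeeping columns that do not belong in the dimension (assumed names).
EXCLUDED = {"bk_hash", "load_dts", "end_dts", "record_source", "hash_diff"}


def build_dimension(hub: dict, sat: list) -> list:
    """hub: business key hash -> business key; sat: Satellite rows with 'bk_hash' and 'end_dts'."""
    # Keep only the current (open) Satellite version per key: a type 1 view.
    current = {r["bk_hash"]: r for r in sat if r["end_dts"] is None}
    rows = []
    for bk_hash, business_key in hub.items():
        details = current.get(bk_hash, {})
        rows.append({"dim_key": bk_hash, "business_key": business_key,
                     **{k: v for k, v in details.items() if k not in EXCLUDED}})
    return rows


hub_customer = {"h1": "C-100", "h2": "C-200"}
sat_customer = [
    {"bk_hash": "h1", "load_dts": "2024-01-01", "name": "Ann", "end_dts": "2024-03-01", "record_source": "CRM"},
    {"bk_hash": "h1", "load_dts": "2024-03-01", "name": "Ann Smith", "end_dts": None, "record_source": "CRM"},
    {"bk_hash": "h2", "load_dts": "2024-01-01", "name": "Bob", "end_dts": None, "record_source": "CRM"},
]
print(build_dimension(hub_customer, sat_customer))
```

A history-keeping (type 2) dimension would instead emit one row per Satellite version, using Load_DTS and End_DTS as the validity range.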


CDR in Subject Areas or Marts – Identifying facts

• Automated identification of candidate facts.

• A fact originates from a Link with its related Hubs.

• It is combined with related Satellites based on information requirements.

• Optimized for analytical query performance.

[Figure: the CDR Data Vault model with a Subject Area containing Fact CDR and Dimension Customer, Dimension Facilities, Dimension Exchanges and Dimension Contracts]

Data Vault – Deutsche Bank June 2012
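A matching sketch for "a fact originates from a Link with its related Hubs". It builds one Fact CDR row per Link Call Detail row and takes measures from the current Satellite Call Detail version. The Hub hash keys in the Link are assumed to serve directly as dimension keys. The column names (`duration_s`, `charge_cents`) and the function name are hypothetical. The join is an inner join, which relies on the rule that every Link row has at least one Satellite row.

```python
def build_fact_cdr(link_call: list, sat_call: list) -> list:
    """Hypothetical Fact CDR: one row per Link Call Detail, measures from the current Satellite version."""
    # Current (open) Satellite version per Link key.
    current = {r["link_hash"]: r for r in sat_call if r["end_dts"] is None}
    facts = []
    for link in link_call:
        detail = current[link["link_hash"]]  # inner join: assumes every Link row has a Satellite row
        facts.append({
            "phone_number_key": link["phone_number_hash"],  # assumed to equal the dimension keys
            "customer_key": link["customer_hash"],
            "duration_s": detail["duration_s"],
            "charge_cents": detail["charge_cents"],
        })
    return facts


link_call_detail = [
    {"link_hash": "l1", "phone_number_hash": "p1", "customer_hash": "h1"},
    {"link_hash": "l2", "phone_number_hash": "p2", "customer_hash": "h2"},
]
sat_call_detail = [
    {"link_hash": "l1", "duration_s": 60, "charge_cents": 9, "end_dts": "2024-01-02"},
    {"link_hash": "l1", "duration_s": 60, "charge_cents": 12, "end_dts": None},
    {"link_hash": "l2", "duration_s": 30, "charge_cents": 5, "end_dts": None},
]
print(build_fact_cdr(link_call_detail, sat_call_detail))
```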


Thank you

de.cgi.com/BI
