PS-3C: A new ensemble modelling technique
Data Modelling Zone, Berlin

Uploaded by: rogier-werschkull
Posted on: 18-Jan-2017
Category: Data & Analytics

About Me
'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, rwerschkull

Ensemble data modelling…

WHY another Ensemble? We've got loads already
http://topofminds.se/wp/2014/ensemble-modeling-forelasningar-och-lars-ronnback

Ensemble popularity
[Bar chart: search hits on Google, 31-8-2016, on an axis from 0 to 16000, for the queries "Data Vault modeling" + "data warehouse", "Anchor modeling" + "data warehouse", "Hyper agility" + "datawarehouse", "Focal point modeling" + "data warehouse" and "Head version modeling" + "data warehouse". Data labels as extracted: 13600, 9970, 37073 5.]

Ran into problems using: Data Vault, Head-Version, Anchor Modelling

WHAT problems?
Photo: my own……

Problem
Not built for Big Data lake data centricity
Photo credit: Lake, Public Domain; http://www.writeups.org/star-trek-brent-spiner-data

'Data may first be stored in a data lake, so that it can be explored, cleaned and prepared. If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure, it may go into a data warehouse. If it stops being used frequently, it may go back to a HDFS (Hadoop Distributed File System)-based archive.'
Data centric = data first. Thomas H. Davenport, Wall Street Journal, 3-6-2015
http://blogs.wsj.com/cio/2015/06/03/the-shift-to-a-new-data-architecture

Systems like

Data flood
Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)

The possible result…
Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn't Data Vault v2 'made for big data centric systems'?

In DV2 you still do this in one go:
Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Being data centric conflicts with the complex data MODELLING work
Coding = a lot like modelling (http://xkcd.com/844)

Problem
Less mature JOIN optimizers
Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins
Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)
[Diagram: Data Vault HUB, SAT and LINK entities, with the Link Satellite singled out]

Problem
a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing……
Loses statistical information regarding the data distribution. Query optimizers do not like this…
Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
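The difference between a concatenated business key and its hash can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the UTF-8 encoding is an assumption, so the digest printed here need not match the one on the slide.

```python
import hashlib

def concatenated_business_key(*parts, sep="|"):
    """Join the business key parts into one natural key string."""
    return sep.join(str(p) for p in parts)

def md5_hash_key(business_key):
    """Return the hexadecimal MD5 digest of a business key (assumes UTF-8)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

key = concatenated_business_key("8739", "2015-01-22 01:34:27", "72A1_FINV")
hashed = md5_hash_key(key)

print(key, len(key))        # the natural key keeps its readable parts
print(hashed, len(hashed))  # an MD5 hex digest is always 32 characters

# A partial (prefix) lookup still works on the natural key,
# but has no meaning on the hash.
print(key.startswith("8739|2015-01-22"))
```

The natural key still carries the rolling stock number and the date as a usable prefix, which is the property the sharding-key remark relies on.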

b) Surrogate business keys
Surrogate keys require centralized coordination
…and thus can impact the overall system's scalability and availability
A lot of MPP NoSQL databases simply do not have them…

Then some inspiration
http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'
'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So:
[Diagram: the 'Subject Oriented, Integrated, Time Variant, Non-Volatile' EDW split into a 'Time Variant & Non Volatile' part and a 'Subject Oriented & Integrated' EDW part]

Time Variant & Non Volatile: could be a Data LAKE
Subject Oriented & Integrated EDW: a virtualised Ensemble tier

How does PS-3C work?
Photo credit: Public Domain

Focus of current ensemble EDWs
Staging Area -> EDW -> Information Marts

Splitting the work
Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts

PS-3C: Persistent Staging - Business Concept, Concept Context, Connector

modelling

Persistent Staging - how
Identify the Primary or Unique Key of the source event stream; use source metadata for this
Automate the building of a PS 'around this key'; take all columns
Historize using an SCD-2 approach
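The historization step can be sketched as follows. This is a minimal in-memory illustration of SCD-2 around a source key, not the speaker's implementation: the row layout, the function name and the convention of closing a version one second before the next one starts are assumptions (the last one mirrors the 22:00:00 / 21:59:59 pattern in the example data of this deck).

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended 'load end' marker

def historize(ps_rows, key, delivered, load_dts):
    """Apply one delivered source row to the persistent staging rows (SCD-2).

    ps_rows   : list of dicts with 'key', 'attrs', 'load_dts', 'load_end_dts'
    key       : value of the source primary/unique key
    delivered : dict holding all source columns of the delivered row
    """
    current = next((r for r in ps_rows
                    if r["key"] == key and r["load_end_dts"] == HIGH_DATE), None)
    if current is not None:
        if current["attrs"] == delivered:
            return ps_rows  # nothing changed: history stays as it is
        # close the current version one second before the new one starts
        current["load_end_dts"] = load_dts - timedelta(seconds=1)
    ps_rows.append({"key": key, "attrs": dict(delivered),
                    "load_dts": load_dts, "load_end_dts": HIGH_DATE})
    return ps_rows

ps = []
historize(ps, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
historize(ps, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
historize(ps, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 6, 22))  # no change
```

Because every column is taken along and compared as a whole, no modelling decision is needed at this stage; that work is deferred to the 3C layer.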

Persistent Staging Metadata-1
Entity level:
Unique key
Functional Description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…

Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]: possible but difficult… requires updates
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)

Updates in Hive (isn't HDFS append only?)
ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting: when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
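The idea of 'updates as new data' followed by compaction can be illustrated conceptually. The sketch below is plain Python and does not use or reproduce any Hive API; the function names and the delta record layout are hypothetical.

```python
def read_current(base, deltas):
    """Merge base rows with appended delta records; the last write per key wins."""
    merged = dict(base)
    for op, key, value in deltas:  # deltas are only ever appended, never edited
        if op == "delete":
            merged.pop(key, None)
        else:                      # "insert" or "update"
            merged[key] = value
    return merged

def compact(base, deltas):
    """Fold all deltas into a new base and start again with an empty delta log."""
    return read_current(base, deltas), []

base = {"Ut": "3500GJ", "Asd": "1012 AB"}
deltas = []
deltas.append(("update", "Ut", "3511 CE"))  # an update is stored as new data
deltas.append(("delete", "Asd", None))

view = read_current(base, deltas)    # readers reconcile base + deltas
base, deltas = compact(base, deltas)  # done when idle or on request
```

Readers pay the reconciliation cost until compaction has run, which is why the compaction moment matters.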

What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… it only has one data type (byte), the schema is 'applied'
The schema can be different for every row

modelling

3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modelling
NOT the primary location of data & history
Virtualised (only if performance allows); should be deterministic
No Link Satellites
No surrogate or hash keys, only 'concatenated natural business keys'
Explicit helper entities

3C - Details: Business Concept (BC)
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]
A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide

Why not 'enterprise wide'?
[Diagram: Company; Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; … Customer]

Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes
Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)
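Since a Business Concept holds little more than the distinct business keys, it can in principle be derived from persistent staging on the fly. The sketch below shows that derivation under stated assumptions: the function name, the staging column names `type` and `nr`, and the source label `NTR_q` are hypothetical, and a real virtualisation would more likely be a database view.

```python
from datetime import datetime

def business_concept_view(ps_rows, key_columns, source, sep="|"):
    """Derive Business Concept rows from persistent staging rows.

    Returns one row per distinct concatenated business key, carrying the
    earliest load timestamp at which that key was seen.
    """
    first_seen = {}
    for row in ps_rows:
        bk = sep.join(str(row[c]) for c in key_columns)
        dts = row["META_Load_dts"]
        if bk not in first_seen or dts < first_seen[bk]:
            first_seen[bk] = dts
    return [{"Business Key": bk, "META_Source": source, "META_Load_dts": dts}
            for bk, dts in sorted(first_seen.items())]

ps = [
    {"type": "IC", "nr": 855, "META_Load_dts": datetime(2015, 6, 6, 22)},
    {"type": "IC", "nr": 855, "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"type": "Sp", "nr": 7455, "META_Load_dts": datetime(2015, 6, 5, 22)},
]
bc = business_concept_view(ps, ["type", "nr"], "NTR_q")
```

The result is deterministic for a given staging content, which matches the requirement that a virtualised 3C layer should be deterministic.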

3C - Details: Concept Context (CC)
Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts:
[valuation], source: NSS2 table p
[description], source: NSS1 table x
[adres], source: NSS1 table y
[description], source: NTR table q
[ovchip_personal], source: NSR table r
[ovchip_on-usage], source: NSR table s
[personal_details], source: NSR table t
[adres_data], source: NSR table t

Example-data: NSR-Station [adres], source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes
More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…

3C - Details: Connector (C)
Relations between Concepts + context, in a historical way
…a merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new 'connection'
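One possible reading of the driving-key rule is sketched below: when a delivery arrives for a driving key that already has an open row, the old row is end-dated rather than left open next to the new one. The function name, the column names and the one-second end-dating convention are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def load_connector(rows, delivered, driving_key, load_dts):
    """Load one delivered relation row into a Connector.

    rows        : list of dicts (the Connector) with META_Load_dts / META_Load_end_dts
    delivered   : dict with the business keys and the relation context
    driving_key : column names that identify the relation at one point in time
    """
    dk = tuple(delivered[c] for c in driving_key)
    for row in rows:
        is_open = row["META_Load_end_dts"] == HIGH_DATE
        if is_open and tuple(row[c] for c in driving_key) == dk:
            payload = {k: v for k, v in row.items() if not k.startswith("META_")}
            if payload == delivered:
                return rows  # unchanged: nothing to register
            # same driving key, other values: end-date the old version
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    rows.append({**delivered, "META_Load_dts": load_dts,
                 "META_Load_end_dts": HIGH_DATE})
    return rows

driving = ["travelcard", "checkin"]
checkin = datetime(2016, 4, 5, 8, 49, 32)
first = {"travelcard": "3528 0234 2073 1234", "checkin": checkin,
         "station_from": "Asd", "station_to": None, "checkout": None}
later = {**first, "station_to": "Ut", "checkout": datetime(2016, 4, 5, 9, 40, 12)}

movement = []
load_connector(movement, first, driving, datetime(2016, 4, 5, 22))
load_connector(movement, later, driving, datetime(2016, 4, 6, 22))
open_rows = [r for r in movement if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key as metadata, the loader could not tell that the second delivery describes the same travel movement as the first.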

Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving key: NS-Travelcard + Checkin timestamp

Example-Data: NSR-Travelmovement (Checkin timestamp, from, to)
Related Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No
We add two mandatory HELPER entities

modelling

3C - Details: Business Alias
To help switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key

Example
NSR-Station: [valuation], source: NSS2 table p; [description], source: NSS1 table x; [adres], source: NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details
Has a 1-to-(0,1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
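A Business Alias is essentially a small lookup from a source surrogate key to a business key, so an in-memory dictionary is a natural way to picture it. The sketch below reuses the BA-NSS1 values from the example data; the function name and the source column `station_id` are hypothetical.

```python
# BA-NSS1 held in memory: NSS1 surrogate key -> business key
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_row, alias, surrogate_column):
    """Replace a source surrogate key by the business key from the alias lookup."""
    translated = {k: v for k, v in source_row.items() if k != surrogate_column}
    translated["Business Key"] = alias[source_row[surrogate_column]]
    return translated

row = {"station_id": 123522, "Postadres_postcode": "3500GJ"}
result = to_business_key(row, ba_nss1, "station_id")
```

Keeping the lookup this small is what makes the 'in memory' preference realistic.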

3C - Details: BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach

Example: BC-Timeline (BCT) for NSR-Station

NSR-Station [valuation], source: NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

NSR-Station [adres], source: NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

modelling
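The combined timeline in the example can be reproduced by taking the union of the load timestamps of all Concept Contexts of one business key and end-dating each interval one second before the next one starts. The sketch below does this for station Ut; the function name is hypothetical and the approach is one plausible way to build the timeline, not necessarily the speaker's.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def bc_timeline(*context_load_dts):
    """Combine the load timestamps of several Concept Contexts of one business key.

    Each argument is the list of META_Load_dts values of one Concept Context.
    Returns a list of (Combined_Load_dts, Combined_Load_end_dts) pairs.
    """
    starts = sorted({dts for ctx in context_load_dts for dts in ctx})
    ends = [nxt - timedelta(seconds=1) for nxt in starts[1:]] + [HIGH_DATE]
    return list(zip(starts, ends))

# Load timestamps of the two Concept Contexts of station 'Ut' (day-month-year in the slides)
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
adres = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

timeline_ut = bc_timeline(valuation, adres)
for start, end in timeline_ut:
    print(start, end)
```

A query can then join each Concept Context to this timeline on the business key and an interval overlap, instead of comparing every context with every other context.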

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Image: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins
Relation + technical validity timeline + relation context: together in one entity
Photo credit: Public Domain

4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)
Photo credit: Public Domain

Hope to AVOID this… (http://xkcd.com/927)

PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?


About Me

lsquoHead of BIrsquo Spilgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Ensemble data Modellinghellip

rwerschkull

nllinkedincominrogierwerschkull

WHY

Another Ensemble

Wersquove got loads already

httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnbackrwerschkull

nllinkedincominrogierwerschkull

13600

9970

37073 5

0

2000

4000

6000

8000

10000

12000

14000

16000

Data Vault modeling+data warehouse

Anchor modeling+data warehouse

Hyper agility +datawarehouse

Focal point modeling+data warehouse

Head version modeling+data warehouse

Search Hits on Google 31-8-2016

Ensemble

Popularity

Data Vault

Ran into Problems using

Head-Version

Anchor Modelling

rwerschkull

nllinkedincominrogierwerschkull

WHAT problemS

Photo My ownhelliphelliprwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

[Diagram: Connector NSR-Travelmovement (Checkin timestamp, from, to) between NSR-Station, NS-Travelcard and NS-Trainseries]

Connector NSR-Travelmovement:

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities.


Business Alias (BA)

Helps in switching from sources that are tied together by technical (surrogate) keys to a Business Key based model.

It is a LOOKUP table that translates the technical key to the Business Key.

Example

[Diagram: Business Concept NSR-Station with Concept Contexts [valuation] (source NSS2, table p), [description] (source NSS1, table x) and [adres] (source NSS1, table y), plus two Business Alias entities]

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables):

Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
... | ...

NSR-Station:

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
... | ... | ...
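A small sketch of the Business Alias as an in-memory lookup, assuming the BA-NSS1 values from the example data. The helper `resolve_keys` and the source row with `from_station_id` / `to_station_id` are hypothetical names chosen for illustration; the slides do not prescribe where or how the lookup is executed.

```python
# Business Alias for source NSS1: surrogate key -> Business Key
# (values taken from the example data above)
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def resolve_keys(alias, row, key_columns):
    """Return a copy of a source row with surrogate keys replaced by Business Keys.

    Unknown surrogate keys resolve to None, so the caller can decide
    how to treat keys that have not been registered yet.
    """
    out = dict(row)
    for column in key_columns:
        out[column] = alias.get(row[column])
    return out


# Hypothetical source row that refers to stations by NSS1 surrogate key.
movement = {"from_station_id": 666323, "to_station_id": 123522}
resolved = resolve_keys(BA_NSS1, movement, ["from_station_id", "to_station_id"])
```

Keeping the alias as a small dictionary matches the idea of a compact lookup table that can be held in memory.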

BA: important details

Has a 1-to-(0,1) relation with a Business Concept.

More difficult to virtualise: the lookup table should be kept small.

Therefore, DO NOT do the key lookup in the Concept Context entity.

Load / generate together with the BC.

Preferably 'in memory' somehow...


BC-Timeline (BCT)

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.

Like a Data Vault Point-in-Time construct, but mandatory, and with a clearly defined and performant approach.

Example

[Diagram: Business Concept NSR-Station with Concept Contexts [valuation] (source NSS2, table p) and [adres] (source NSS1, table y), and its BC-Timeline]

Concept Context [valuation]:

BK_NSR-Station | WOZ value | Value rating agency X | META_Laad_dts | META_Laad_eind_dts
Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

Concept Context [adres]:

BK_NSR-Station | Postadres_postcode | GPS | ... | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 52.08954, 5.11064 | ... | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 52.37269, 4.89299 | ... | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 52.37269, 4.89299 | ... | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | ... | ... | ... | ... | ... | ...

BC-Timeline for NSR-Station:

BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
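One possible way to derive a BC-Timeline is to collect every load timestamp of the Concept Contexts of a Business Concept and turn the sorted change points into consecutive validity slices. The sketch below does this for the NSR-Station example; the function `build_bc_timeline`, the helper `ts` and the tuple layout are assumptions for illustration, and the slides do not specify the actual algorithm.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def build_bc_timeline(*contexts):
    """Combine the load timestamps of several Concept Contexts per business key.

    Each context is an iterable of (business_key, load_dts) pairs.
    Returns a list of (business_key, combined_load_dts, combined_load_end_dts).
    """
    change_points = {}
    for context in contexts:
        for bk, load_dts in context:
            change_points.setdefault(bk, set()).add(load_dts)

    timeline = []
    for bk in sorted(change_points):
        starts = sorted(change_points[bk])
        for i, start in enumerate(starts):
            # A slice ends one second before the next starts; the last stays open.
            if i + 1 < len(starts):
                end = starts[i + 1] - timedelta(seconds=1)
            else:
                end = OPEN_END
            timeline.append((bk, start, end))
    return timeline


def ts(day, month, year):
    """Load timestamp at 22:00:00 on the given date (d-m-yyyy order, as in the tables)."""
    return datetime(year, month, day, 22)


valuation = [("Ut", ts(1, 1, 2014)), ("Ut", ts(1, 1, 2015)), ("Ut", ts(1, 3, 2016))]
adres = [("Ut", ts(5, 6, 2013)), ("Ut", ts(4, 7, 2015)), ("Asd", ts(5, 6, 2013))]
bct = build_bc_timeline(valuation, adres)
```

For station Ut this yields the same five slices as the BC-Timeline table, and a single open-ended slice for Asd; the rows come out ordered by business key.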

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones.

Photo credit: www.cannabisculture.com
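To illustrate the difference, here is a short sketch that builds concatenated natural business keys and contrasts them with an MD5 hash of the same keys. The `|` delimiter, the function `concat_business_key` and the sample train series values are assumptions for the example.

```python
import hashlib

DELIMITER = "|"  # assumed delimiter between key parts


def concat_business_key(*parts):
    """Build a concatenated natural business key from its parts."""
    parts = [str(p) for p in parts]
    if any(DELIMITER in p for p in parts):
        raise ValueError("key part contains the delimiter")
    return DELIMITER.join(parts)


keys = sorted(concat_business_key(*p) for p in [("IC", 855), ("Sp", 7455), ("IC", 8852)])

# The natural key stays readable and keeps its ordering, so a partial
# (prefix) lookup such as "all IC train series" remains possible.
ic_series = [k for k in keys if k.startswith("IC" + DELIMITER)]

# A hash of the same key has a fixed length, but the prefix and the
# ordering of the original key are no longer visible in it.
hashed = [hashlib.md5(k.encode("utf-8")).hexdigest() for k in keys]
```

The check on the delimiter is one simple way to keep two different part combinations from producing the same concatenated key.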

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity.

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline (BC-Timeline)

+ explicitly defined driving key(s)

Hope to AVOID this...

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?


About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

WHY

Another Ensemble

Wersquove got loads already

httptopofmindssewp2014ensemble-modeling-forelasningar-och-lars-ronnbackrwerschkull

nllinkedincominrogierwerschkull

13600

9970

37073 5

0

2000

4000

6000

8000

10000

12000

14000

16000

Data Vault modeling+data warehouse

Anchor modeling+data warehouse

Hyper agility +datawarehouse

Focal point modeling+data warehouse

Head version modeling+data warehouse

Search Hits on Google 31-8-2016

Ensemble

Popularity

Data Vault

Ran into Problems using

Head-Version

Anchor Modelling

rwerschkull

nllinkedincominrogierwerschkull

WHAT problemS

Photo My ownhelliphelliprwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Concept Context (CC)

Contains context about a Concept, in a historical way

...like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, surrounded by their Concept Contexts, one per source table:]

[valuation], source NSS2 table p
[description], source NSS1 table x
[adres], source NSS1 table y
[description], source NTR table q
[ovchip_personal], source NSR table r
[ovchip_on-usage], source NSR table s
[personal_details], source NSR table t
[adres_data], source NSR table t

Example data: NSR-Station [adres], source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               ...  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  ...  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  ...  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             ...                 ...               ...  ...          ...                ...
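The load end timestamps in a Concept Context can be derived rather than stored, which keeps the entity deterministic and virtualisable. The sketch below is one possible way to do that, assuming each version ends one second before the next version of the same business key starts (as in the example data). The function name `add_load_end_dts` and the dictionary field names are hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended record, as in the example data


def add_load_end_dts(rows):
    """Derive a load end timestamp from the next load of the same key.

    rows: list of dicts with at least 'bk' and 'load_dts'.
    Returns new dicts sorted by (bk, load_dts); the input is not modified.
    """
    ordered = sorted(rows, key=lambda r: (r["bk"], r["load_dts"]))
    result = []
    for i, row in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt["bk"] == row["bk"]:
            end = nxt["load_dts"] - timedelta(seconds=1)
        else:
            end = OPEN_END
        result.append({**row, "load_end_dts": end})
    return result


address_cc = add_load_end_dts([
    {"bk": "Ut", "postcode": "3500GJ", "load_dts": datetime(2013, 6, 5, 22)},
    {"bk": "Ut", "postcode": "3511 CE", "load_dts": datetime(2015, 6, 5, 22)},
    {"bk": "Asd", "postcode": "1012 AB", "load_dts": datetime(2013, 6, 5, 22)},
])
for row in address_cc:
    print(row["bk"], row["postcode"], row["load_dts"], row["load_end_dts"])
```

Deletes are left out of this sketch; they would need either a deleted flag or a delete registered as a new record, as listed in the metadata.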

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier...

Connector (C)

Relations between Concepts + context, in a historical way

...a merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata...

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries, so that a change (on the driving key) is not registered as a new 'connection'
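A sketch of how a driving key could steer delta loading of a Connector. It assumes an in-memory list of dictionaries and a hypothetical scenario in which a travel movement is first delivered without its destination and later delivered complete; `apply_delta` and all field names are illustrative, not part of PS-3C.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # marks the currently valid version


def apply_delta(connector_rows, delta, driving_key, load_dts):
    """Apply one delta record to a Connector (a list of dicts), in place.

    Returns 'new', 'changed' or 'unchanged'.
    """
    wanted = tuple(delta[col] for col in driving_key)
    status = "new"
    for row in connector_rows:
        same_connection = tuple(row[col] for col in driving_key) == wanted
        if same_connection and row["load_end_dts"] == OPEN_END:
            if all(row.get(col) == value for col, value in delta.items()):
                return "unchanged"
            # Same driving key, different values: close the old version and
            # add a new version of the SAME connection below.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            status = "changed"
            break
    connector_rows.append(
        {**delta, "load_dts": load_dts, "load_end_dts": OPEN_END}
    )
    return status


DRIVING_KEY = ("card", "checkin")
movements = []
first = {
    "card": "3528 0234 2073 1234",
    "checkin": datetime(2016, 4, 5, 8, 49, 32),
    "station_from": "Asd",
    "station_to": None,
}
second = {**first, "station_to": "Ut"}

s1 = apply_delta(movements, first, DRIVING_KEY, datetime(2016, 4, 5, 22))
s2 = apply_delta(movements, second, DRIVING_KEY, datetime(2016, 4, 6, 22))
s3 = apply_delta(movements, second, DRIVING_KEY, datetime(2016, 4, 7, 22))
print(s1, s2, s3, len(movements))
```

Without the driving key, the second delivery could not be told apart from a brand-new connection; with it, the loader knows it is looking at a new version of an existing one.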

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the example model extended with the Connector NSR-Travelmovement (carrying the Checkin timestamp), which relates NSR-Station (from and to), NS-Trainseries and NS-Travelcard]

Driving key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp   Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32    5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09   5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

Business Alias (BA)

Helps switching from sources that are tied together by technical (surrogate) keys...

...to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

[Diagram: NSR-Station with its Concept Contexts [valuation] (source NSS2 table p), [description] (source NSS1 table x) and [adres] (source NSS1 table y), plus two Business Aliases:]

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
...           ...

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

BA important details

Has a 1 : (0..1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow...
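A small sketch of a Business Alias used as an in-memory lookup, using the BA-NSS1 values from the example data. The source column name `station_sk`, the function `with_business_key` and the choice to return `None` for unknown surrogate keys are assumptions for illustration only.

```python
# Business Alias for source system NSS1:
# technical surrogate key -> Business Key (values from the example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def with_business_key(source_rows, alias, surrogate_col):
    """Yield source rows with the surrogate key replaced by a Business Key.

    Assumption: an unknown surrogate key yields bk=None, so the caller
    can decide whether to reject or park such rows.
    """
    for row in source_rows:
        translated = {k: v for k, v in row.items() if k != surrogate_col}
        translated["bk"] = alias.get(row[surrogate_col])
        yield translated


nss1_rows = [
    {"station_sk": 123522, "postcode": "3500GJ"},
    {"station_sk": 666323, "postcode": "1012 AB"},
]
keyed = list(with_business_key(nss1_rows, BA_NSS1, "station_sk"))
print(keyed)
```

Keeping the alias as a small dictionary per source system mirrors the advice to keep the lookup table small and preferably in memory.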

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example: BC-Timeline for NSR-Station

NSR-Station [valuation], source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station [adres], source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               ...  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  ...  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  ...  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             ...                 ...               ...  ...          ...                ...

BC-Timeline (BCT) for NSR-Station
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
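The combined timeline for station Ut can be reproduced with a few lines of code. This is a sketch under simplifying assumptions: each Concept Context is contiguous (no gaps or deletes), a slice ends one second before the next one starts, and only the load timestamps of one business key are passed in. The function name `bc_timeline` is hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended slice, as in the example data


def bc_timeline(*context_load_dts):
    """Merge the load timestamps of several Concept Contexts of ONE
    business key into a single list of (start, end) validity slices."""
    starts = sorted(set().union(*context_load_dts))
    slices = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = OPEN_END
        slices.append((start, end))
    return slices


# Load timestamps for station 'Ut', taken from the two example tables.
adres_loads = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
valuation_loads = [
    datetime(2014, 1, 1, 22),
    datetime(2015, 1, 1, 22),
    datetime(2016, 3, 1, 22),
]

ut_timeline = bc_timeline(adres_loads, valuation_loads)
for start, end in ut_timeline:
    print(start, "->", end)
```

Each resulting slice identifies exactly one version per Concept Context, so a point-in-time query can join every context on the slice boundaries instead of comparing all validity ranges with each other.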

What makes PS-3C a different ensemble?

1) Explicitly splitting the work

Data + history (Persistent Staging)

Subjects + integration (3C)

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

Photo: httpwwwcannabisculturecomfilesimages6hashbrickJPG
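One possible illustration of the trade-off: a concatenated key still supports partial (prefix) lookups, while its hash does not. The train series keys come from the example data; the helper `md5_key` and the use of MD5 are assumptions chosen to mirror a typical hashed-key setup.

```python
import hashlib

keys = ["IC|855", "IC|8852", "Sp|7455", "St|16050"]


def md5_key(key):
    """Return the 32-character hexadecimal MD5 digest of a business key."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


hashed = {key: md5_key(key) for key in keys}

# Partial key lookup on the natural key: all Intercity train series.
intercity = [key for key in keys if key.startswith("IC|")]
print(intercity)

# Every hash has the same fixed length, whatever the length of the key,
# and the 'IC|' prefix is no longer recognisable in it.
for key, digest in hashed.items():
    print(key, len(key), digest, len(digest))
```

The hash is also longer than most of these keys, so it costs storage without adding information that the concatenated key does not already carry.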

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly defined driving key(s)

Hope to AVOID this...

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?



About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Data Vault

Ran into Problems using

Head-Version

Anchor Modelling

rwerschkull

nllinkedincominrogierwerschkull

WHAT problemS

Photo My ownhelliphelliprwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…

Persistent Staging Metadata-1

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

Persistent Staging Metadata-2

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compacting: when idle or at user command

Use ORC files

PLUS changing the Hive configuration…

Updates in Hive (isn't HDFS append-only?)
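The idea of 'updates registered as new data' plus later compaction can be illustrated with a toy model. The sketch below is not Hive's implementation or API; `read_current` and `compact` are hypothetical names, and the base and delta structures are plain Python objects chosen only to show the principle on an append-only store.

```python
def read_current(base, deltas):
    """Merge base rows with later delta records; the latest write per key wins."""
    merged = dict(base)
    for operation, key, value in deltas:
        if operation == "delete":
            merged.pop(key, None)
        else:  # "insert" or "update" are both just new data
            merged[key] = value
    return merged


def compact(base, deltas):
    """Fold all deltas into a new base; afterwards the delta list is empty."""
    return read_current(base, deltas), []


base = {"Ut": "3500GJ"}
deltas = [("update", "Ut", "3511 CE"), ("insert", "Asd", "1012 AB")]

# Readers always see base + deltas, before and after compaction.
before = read_current(base, deltas)
new_base, new_deltas = compact(base, deltas)
after = read_current(new_base, new_deltas)
```

Nothing is ever overwritten in place: the original `base` stays untouched, and compaction only replaces "base plus many small deltas" by one new base.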

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: Avro (ORC in development)

Or HBase… It only has one data type (byte); the schema is 'applied'

Schema can be different for every row

What about semi-structured data?

3C - how

Always starts with conceptual data modelling

NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic

No Link Satellites

No surrogate or hash keys, only 'concatenated natural business keys'

Explicit helper entities

Like Data Vault (2), BUT

[Diagram: 3C overview. Business Concept X with Concept Context X-A and X-B; Business Concept Y with Concept Context Y-A, Y-B and Y-C; a Connector; helper entities Business Alias A, Business Alias B and BC-Timeline X]

3C - Details

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise wide

Business Concept (BC)

Why not 'enterprise wide'?

Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Business Concept Metadata

Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate business keys (not by default, at least)

BC important notes
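To make the 'concatenated, not hashed' choice concrete, here is a small Python sketch. The function names are hypothetical and the `|` separator is taken from the example keys such as `IC|855`; it assumes the key parts themselves never contain the separator.

```python
import hashlib


def concatenated_business_key(parts, separator="|"):
    """Build a natural business key by joining its parts, e.g. ('IC', 855) -> 'IC|855'."""
    return separator.join(str(part) for part in parts)


def md5_hash_key(parts, separator="|"):
    """The hashed alternative (as used in Data Vault 2), shown only for comparison."""
    key = concatenated_business_key(parts, separator)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


keys = [concatenated_business_key(p) for p in (("IC", 855), ("IC", 8852), ("Sp", 7455))]

# A partial (prefix) lookup still works on the natural key...
intercity = [k for k in keys if k.startswith("IC|")]

# ...whereas the hash is a fixed 32 hex characters with no usable prefix or ordering.
hashed = [md5_hash_key(p) for p in (("IC", 855), ("IC", 8852), ("Sp", 7455))]
```

The concatenated key keeps the value distribution and the leading key parts visible to the database, which is what partial key lookups and sharding on a natural key rely on.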

3C - Details

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context (CC)

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation]         Source: NSS2, table p
[description]       Source: NSS1, table x
[adres]             Source: NSS1, table y
[description]       Source: NTR, table q
[ovchip_personal]   Source: NSR, table r
[ovchip_on-usage]   Source: NSR, table s
[personal_details]  Source: NSR, table t
[adres_data]        Source: NSR, table t

Example data

NSR-Station, [adres] (Source: NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…

CC important notes

3C - Details

Relations between Concepts + context, in a historical way

…a merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector (C)

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key) is not registered as a new 'connection'

Connector Driving key
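A minimal Python sketch of how a driving key could steer a Connector load is given below. It is an illustration only: `load_connector`, the column names and the half-open end-dating are assumptions, not the author's implementation. A delta record whose driving key matches an open row closes that row and opens a new version, instead of appearing as a second, independent connection.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # hypothetical "still valid" marker
META = ("load_dts", "load_end_dts")


def load_connector(rows, delta, driving_key, load_dts):
    """Apply a delta delivery to Connector rows, versioning on the driving key."""
    rows = [dict(r) for r in rows]  # work on copies, leave the input untouched

    def key_of(record):
        return tuple(record[column] for column in driving_key)

    open_rows = {key_of(r): r for r in rows if r["load_end_dts"] == HIGH_DATE}
    for record in delta:
        current = open_rows.get(key_of(record))
        if current is not None:
            business = {k: v for k, v in current.items() if k not in META}
            if business == record:
                continue                       # nothing changed
            current["load_end_dts"] = load_dts  # same connection, new version
        new_row = {**record, "load_dts": load_dts, "load_end_dts": HIGH_DATE}
        rows.append(new_row)
        open_rows[key_of(record)] = new_row
    return rows


driving = ("travelcard", "checkin")
trip = {"travelcard": "3528 0234 2073 1234", "checkin": "5-4-2016 8:49:32",
        "station_from": "Asd", "station_to": "Ut"}
first = load_connector([], [trip], driving, datetime(2016, 4, 6, 22))
# A later correction of the destination for the same card + check-in:
second = load_connector(first, [{**trip, "station_to": "Asa"}], driving,
                        datetime(2016, 4, 7, 22))
```

Without the driving key, the corrected record would differ from the first one on `station_to` and would simply be added as a second open travel movement.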

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Connector Metadata

Example

Connector: NSR-Travelmovement (Checkin timestamp), linking NSR-Station (from), NSR-Station (to), NS-Trainseries and NS-Travelcard

Driving key: NS-Travelcard + Checkin timestamp

Example data

NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory helper entities

3C - Details

Helps switching from sources that are tied together by technical (surrogate) keys…

…to a business-key-based model

It's a LOOKUP table that translates the technical key to the business key

Business Alias

Example

NSR-Station
[valuation]    Source: NSS2, table p
[description]  Source: NSS1, table x
[adres]        Source: NSS1, table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

Has a 1-to-(0,1) relation with a Business Concept

More difficult to virtualise. Lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow…

BA important details
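As a rough illustration of a small, in-memory Business Alias, the sketch below holds the BA-NSS1 example rows in a Python dictionary. The names `build_alias` and `to_business_key` are hypothetical, and a plain dict is only one possible reading of 'in memory'.

```python
def build_alias(pairs):
    """Build a surrogate-key -> business-key lookup from (business_key, surrogate_key) rows."""
    return {surrogate: business for business, surrogate in pairs}


def to_business_key(alias, surrogate_key):
    """Translate a technical (surrogate) key; returns None when the key is unknown."""
    return alias.get(surrogate_key)


# Rows taken from the BA-NSS1 example data above.
ba_nss1 = build_alias([("Ut", 123522), ("Asd", 666323), ("Asa", 222443)])

station = to_business_key(ba_nss1, 666323)
unknown = to_business_key(ba_nss1, 999999)
```

Keeping only two columns per source system is what keeps the lookup small enough to hold in memory.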

3C - Details

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault point-in-time construct

But mandatory

And with a clearly defined and performant approach

BC-Timeline

Example

NSR-Station, [valuation] (Source: NSS2, table p)

BK_NSR-Station  WOZ value   Value (rating agency X)  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million               1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million               1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million               1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station, [adres] (Source: NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
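The combined timeline in the BCT example can be derived mechanically from the load timestamps of the Concept Contexts. The Python sketch below shows one way to do that; `bc_timeline` is a hypothetical name, and it uses half-open intervals (each end equals the next start) instead of the 'one second earlier' end timestamps shown in the table.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # hypothetical "still valid" marker


def bc_timeline(context_load_dts, high_date=HIGH_DATE):
    """Combine the load timestamps of all Concept Contexts per business key.

    context_load_dts: {cc_name: {business_key: [load_dts, ...]}}
    returns:          {business_key: [(combined_load_dts, combined_load_end_dts), ...]}
    """
    starts = {}
    for per_key in context_load_dts.values():
        for business_key, load_dts_list in per_key.items():
            starts.setdefault(business_key, set()).update(load_dts_list)
    timeline = {}
    for business_key, dts in starts.items():
        ordered = sorted(dts)
        ends = ordered[1:] + [high_date]   # each slice ends where the next begins
        timeline[business_key] = list(zip(ordered, ends))
    return timeline


# Load timestamps from the NSR-Station example above.
contexts = {
    "valuation": {"Ut": [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22),
                         datetime(2016, 3, 1, 22)]},
    "adres": {"Ut": [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)],
              "Asd": [datetime(2013, 6, 5, 22)]},
}
bct = bc_timeline(contexts)
```

For `Ut` this yields five time slices starting on 5-6-2013, 1-1-2014, 1-1-2015, 4-7-2015 and 1-3-2016, matching the BCT rows; each slice can then be joined back to exactly one version of every Concept Context.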

What makes PS-3C a different ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) No hashed business keys or surrogate keys

Only concatenated ones

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

4) Explicit helper entities

Business Alias

Business Concept Timeline

+ explicitly defined driving key(s)

Hope to avoid this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

problem

rwerschkull

nllinkedincominrogierwerschkull

Not Build for

BiG data lake

Data CentrICITY

Photo credit Lake Public Domain httpwwwwriteupsorgstar-trek-brent-spiner-datarwerschkull

nllinkedincominrogierwerschkull

lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems Like

rwerschkull

nllinkedincominrogierwerschkull

Data Flood

Photo credit Kurayba (httpswwwflickrcomphotos48503330N0828564454666 )

under cc licence (httpscreativecommonsorglicensesby-sa20)

rwerschkull

nllinkedincominrogierwerschkull

The possible resulthellip

Photo credit httpshighfiveexportswordpresscom201006253000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-

parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

rwerschkull

nllinkedincominrogierwerschkull

But isNrsquot Data Vault v2

lsquomade for

Big data centric

systemsrsquorwerschkull

nllinkedincominrogierwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

Problem
a) HASHING of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing loses statistical information regarding the data distribution. Query optimizers do not like this ...

Hashing ...
Column Family, Document and Key-Value databases need a good (natural) sharding key for (partial) key lookups
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
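A minimal Python sketch of the contrast drawn on the two hashing slides, assuming the '|' separator used in the table. The helper names `concat_key` and `md5_key` are hypothetical and not part of PS-3C. Sorted concatenated keys keep the rows of one rolling stock number next to each other, so a partial (prefix) lookup is possible. The MD5 digests are always 32 characters and carry no usable prefix.

```python
import hashlib

def concat_key(*parts):
    """Concatenated natural business key, '|' separated (assumed separator)."""
    return "|".join(str(p) for p in parts)

def md5_key(key):
    """MD5 hex digest of a concatenated key (always 32 hex characters)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()

rows = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
]

# Natural keys sort by rolling stock number first, then by datetime.
keys = sorted(concat_key(*r) for r in rows)

# A partial key lookup is a simple prefix match on the natural key.
prefix_hits = [k for k in keys if k.startswith("8739|")]

# The hashed keys have a fixed length and no relation to the sort order above.
hashes = [md5_key(k) for k in keys]
```

The digests are not compared with the values in the table, because the exact input strings behind the slide's hashes are not known.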

b) Surrogate BUSINESS keys
Surrogate keys require centralized coordination
... and thus can impact the overall system's scalability and availability
A lot of MPP / NoSQL databases simply do not have them ...

Then: some inspiration
http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the Persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So:
[Diagram: the Subject Oriented, Integrated, Time Variant, Non-Volatile EDW split into a 'Time Variant & Non Volatile' part and a 'Subject Oriented & Integrated' EDW part]

[Diagram: 'Time Variant & Non Volatile' could be a Data LAKE; 'Subject Oriented & Integrated' could be a virtualised Ensemble tier (EDW)]

How does PS-3C work?

Focus of current ensemble EDWs
Staging Area -> EDW -> Information Marts

Splitting the work
Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts

Persistent Staging - Business Concept, Concept Context, Connector

Persistent Staging - how
• Identify the source / event stream's Primary or Unique Key. Use source metadata for this
• Automate the building of a PS 'around this key'. Take all columns
• Historize using an SCD-2 approach
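The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the function name `scd2_load`, the in-memory list standing in for a persistent staging table, and the half-open validity interval (a version is closed at the load timestamp of its successor) are all hypothetical choices. Deletes are not handled here.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # open-ended "load end" marker (assumption)

def scd2_load(history, delivery, key_cols, load_dts):
    """Append SCD-2 versions for a delivery of source rows.

    history  : list of dicts already in persistent staging (mutated in place)
    delivery : list of dicts, one per source row, holding all source columns
    key_cols : names of the source primary / unique key columns
    """
    def key_of(row):
        return tuple(row[c] for c in key_cols)

    def payload(row):
        # Source columns only; the META_ columns are added by the load.
        return {k: v for k, v in row.items() if not k.startswith("META_")}

    current = {key_of(r): r for r in history
               if r["META_Load_end_dts"] == HIGH_DATE}
    for row in delivery:
        old = current.get(key_of(row))
        if old is not None and payload(old) == row:
            continue  # unchanged: no new version needed
        if old is not None:
            old["META_Load_end_dts"] = load_dts  # close the previous version
        history.append({**row, "META_Load_dts": load_dts,
                        "META_Load_end_dts": HIGH_DATE})
    return history
```

Closing the previous version is the step that requires an update, which is why the end-date column is harder to maintain on append-only storage.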

Persistent Staging Metadata (1)
Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation, ...)
• ...

Persistent Staging Metadata (2)
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
• [Source system] on table / file level (lowest possible)
Load End Date Timestamp: possible but difficult ... it requires updates

UPDATES IN HIVE (isn't HDFS append only?)
• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration ...

What about SEMI-STRUCTURED data?
• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
• Or HBase ... It only has one data type (byte); the schema is 'applied', so the schema can be different for every row
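The MAP option can be illustrated with a short Python sketch. It only shows the idea of keeping the stable columns as real columns and folding everything else into one key/value map; the function name `split_semi_structured` and the `attributes` column are hypothetical. Values are cast to strings on the assumption that the map column has a single value type.

```python
def split_semi_structured(record, fixed_cols):
    """Split a record into its fixed columns plus a map of the variable ones."""
    fixed = {c: record.get(c) for c in fixed_cols}
    # Everything that is not a fixed column goes into one key/value map.
    variable = {k: str(v) for k, v in record.items() if k not in fixed_cols}
    return {**fixed, "attributes": variable}
```

For example, `split_semi_structured({"id": 1, "colour": "red"}, ["id"])` keeps `id` as a column and moves `colour` into the map.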

3C - how
Like Data Vault (2), BUT:
• Always starts with conceptual data modeling
• NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic
• No Link Satellites
• No surrogate or hash keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities

3C - Details
[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector between the Business Concepts; Business Alias A and B; BC-Timeline X]

Business Concept (BC)
• A UNIQUE, domain-specific point of integration
• ... a business entity
• ... within its own domain
• ... does not necessarily need to be enterprise wide

Why not 'enterprise wide'?
[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer ...]

Business Concept Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Example data
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
...

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
...

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
...

BC important notes
• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default, at least)
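Why a Business Concept is easy to virtualise can be shown with a small Python sketch: it is a deterministic function of the persistent staging rows, so it can behave like a view. The function name `business_concept_view`, the column names, and the choice to stamp each key with its earliest load timestamp are assumptions made for this illustration.

```python
def business_concept_view(ps_rows, key_cols, source):
    """Derive Business Concept rows from persistent staging rows.

    Returns one row per distinct concatenated natural business key,
    stamped with the earliest load timestamp seen for that key.
    The result depends only on the input, not on load order.
    """
    first_seen = {}
    for row in ps_rows:
        bk = "|".join(str(row[c]) for c in key_cols)
        dts = row["META_Load_dts"]
        if bk not in first_seen or dts < first_seen[bk]:
            first_seen[bk] = dts
    return [
        {"Business_Key": bk, "META_Source": source, "META_Load_dts": dts}
        for bk, dts in sorted(first_seen.items())
    ]
```

Timestamps are assumed to be comparable values, for example ISO-formatted strings or `datetime` objects.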

3C - Details
Concept Context (CC)
• Contains context about a Concept, in a historical way
• ... like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
NSR-Station
• [valuation] Source: NSS2 table p
• [description] Source: NSS1 table x
• [adres] Source: NSS1 table y
NS-Trainseries
• [description] Source: NTR table q
NS-Travelcard
• [ovchip_personal] Source: NSR table r
• [ovchip_on-usage] Source: NSR table s
NS-Traveller
• [personal_details] Source: NSR table t
• [adres_data] Source: NSR table t

Example data
NSR-Station [adres], Source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               ...  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  ...  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  ...  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             ...                 ...               ...  ...          ...                ...

CC important notes
• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier ...

3C - Details
Connector (C)
• Relations between Concepts + Context, in a historical way
• ... a merger of the Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key
Explicitly defining a driving key as metadata ...
• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries ... so that a change (on the driving key) is not registered as a new 'connection'
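How a driving key lets a Connector absorb delta deliveries can be sketched in Python. This is a hedged illustration only: the function `load_connector`, the string timestamps, and the column names in the usage example are hypothetical. A delivered row whose driving-key values match an open row end-dates that row and takes its place, instead of being added as an extra connection.

```python
HIGH_DATE = "9999-12-31 00:00:00"  # open-ended end marker (assumption)

def load_connector(connector, delta, driving_key, load_dts):
    """Apply a delta delivery to a Connector using its driving key."""
    def dk(row):
        return tuple(row[c] for c in driving_key)

    def business(row):
        # Everything except the load metadata.
        return {k: v for k, v in row.items() if not k.startswith("META_")}

    open_rows = {dk(r): r for r in connector
                 if r["META_Load_end_dts"] == HIGH_DATE}
    for row in delta:
        old = open_rows.get(dk(row))
        if old is not None:
            if business(old) == row:
                continue  # nothing changed for this driving key
            old["META_Load_end_dts"] = load_dts  # end-date the old version
        new = {**row, "META_Load_dts": load_dts,
               "META_Load_end_dts": HIGH_DATE}
        connector.append(new)
        open_rows[dk(row)] = new
    return connector
```

With a driving key of travel card plus check-in timestamp, a later delivery that adds the destination station to the same check-in closes the first row and leaves exactly one open row for that journey.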

Connector Metadata
Entity level:
• [Description]
• [Owner / Responsible]
Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, linked by the Connector NSR-Travelmovement (Checkin timestamp; from and to NSR-Station)]
Driving Key: NS-Travelcard + Checkin timestamp

Example data
Connector NSR-Travelmovement (Checkin timestamp; from / to) between NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.
We add two mandatory HELPER entities

3C - Details
Business Alias
• Helps switching from sources that are tied together by technical (surrogate) keys ... to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example: NSR-Station
• [valuation] Source: NSS2 table p -> BA-NSS2: key lookup for NSS2 source tables
• [description] Source: NSS1 table x and [adres] Source: NSS1 table y -> BA-NSS1: key lookup for NSS1 source tables

Example data
BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
...           ...

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

BA important details
• Has a 1-to-(0,1) relation with a Business Concept
• More difficult to virtualise: the lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow ...
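A Business Alias is, at its core, a small key-translation table. The Python sketch below shows only that lookup, held in memory as a dictionary with the example values from the BA-NSS1 table. The function `to_business_key` and the column names in the usage example are hypothetical, and the point in the load process where the lookup is applied is left open.

```python
# Business Alias for source NSS1: technical (surrogate) key -> Business Key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_rows, alias, surrogate_col, bk_col):
    """Swap a source surrogate key for the Business Key via an alias lookup.

    Rows whose surrogate key is unknown to the alias are returned
    separately instead of being dropped silently.
    """
    resolved, unresolved = [], []
    for row in source_rows:
        bk = alias.get(row[surrogate_col])
        if bk is None:
            unresolved.append(row)
            continue
        out = {k: v for k, v in row.items() if k != surrogate_col}
        out[bk_col] = bk
        resolved.append(out)
    return resolved, unresolved
```

Returning the unresolved rows makes late-arriving or missing alias entries visible, which is one reason to load the alias together with its Business Concept.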

3C - Details
BC-Timeline
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example
NSR-Station [valuation], Source: NSS2 table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT (BC-Timeline) for NSR-Station

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

NSR-Station [adres], Source: NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               ...  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  ...  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  ...  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             ...                 ...               ...  ...          ...                ...
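The combined timeline in the example can be derived mechanically from the load timestamps of the individual Concept Contexts. The Python sketch below is one possible way to do it, not the author's stated method: the function `bc_timeline` is hypothetical and it assumes each Concept Context history is gap-free, so only the start timestamps are needed. Each slice ends one second before the next one starts, matching the 21:59:59 end values in the tables.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # end of the last, still-open slice

def bc_timeline(*context_histories):
    """Merge the load timestamps of several Concept Contexts per business key.

    Each history is a list of (business_key, load_dts) pairs. The result
    maps every business key to a list of consecutive (start, end) slices.
    """
    starts = {}
    for history in context_histories:
        for bk, load_dts in history:
            starts.setdefault(bk, set()).add(load_dts)
    timeline = {}
    for bk, points in starts.items():
        ordered = sorted(points)
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [HIGH_DATE]
        timeline[bk] = list(zip(ordered, ends))
    return timeline

# Load timestamps taken from the valuation and address examples.
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
address = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
           ("Asd", datetime(2013, 6, 5, 22))]
slices = bc_timeline(valuation, address)
```

For station Ut this yields five slices and for Asd a single open slice, the same row counts as in the BCT example table.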

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work
Data + History | Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Image: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity

4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline
+ explicitly define Driving key(s)

Hope to AVOID this ... (http://xkcd.com/927)

PS-3C: a new PROPOSED ensemble modelling technique
Help needed

Questions?



lsquoData may first be stored in a

data lake so that it can be explored cleaned and prepared

If it can be structured in a relational format (basically rows and columns) and needs to be used frequently and kept highly secure it may go into a

data warehouse

If it stops being used frequently it may go back to a HDFS

(Hadoop Distributed File System)-based archiversquo

Data Centric data first THOMAS H DAVENPORT WALL STEET JOURNAL OF 3-6-2015

httpblogswsjcomcio20150603the-shift-to-a-new-data-architecture rwerschkull

nllinkedincominrogierwerschkull

Systems like…

Data flood

Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0)

The possible result…

Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn't Data Vault v2 'made for big data centric systems'?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Coding = a lot like modelling

Being data centric conflicts with the complex data modelling work

http://xkcd.com/844

Problem

Less mature JOIN optimizers

Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions

Do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally…

[Diagram: a Data Vault model of Hubs, Satellites and a Link]

This one too (Link Satellite)

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing…

Loses statistical information regarding the data distribution. Query optimizers do not like this…

Column family, document and key-value databases need a good (natural) sharding key for (partial) key lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
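The difference between a concatenated and a hashed business key can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the '|' delimiter follows the table above, and the timestamps assume the 'YYYY-MM-DD HH:MM:SS' notation. It shows that a partial key lookup (all readings of one rolling stock number) works on the concatenated key, while the MD5 value has a fixed length of 32 and no longer carries that prefix.

```python
import hashlib

DELIMITER = "|"  # delimiter taken from the example table


def concatenated_key(*parts: str) -> str:
    """Join the natural key parts with a delimiter (no hashing, no surrogate)."""
    return DELIMITER.join(parts)


def md5_key(*parts: str) -> str:
    """Hash the concatenated key, as a hashed-key ensemble would do."""
    return hashlib.md5(concatenated_key(*parts).encode("utf-8")).hexdigest()


# (rolling stock nr, datetime, sensor id); illustrative sensor readings
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = sorted(concatenated_key(*r) for r in readings)
hashed_keys = sorted(md5_key(*r) for r in readings)

# Partial key lookup: everything for rolling stock 8739.
prefix = "8739" + DELIMITER
stock_8739 = [k for k in natural_keys if k.startswith(prefix)]
```

With the natural key, the three readings of rolling stock 8739 sit next to each other in sort order; the hashed keys are spread over the whole key space, so the same question would need a full scan or a join back to the unhashed columns.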

b) Surrogate business keys

Surrogate keys require centralized coordination

…and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So

[Diagram: the 'Subject Oriented, Integrated, Time Variant, Non-Volatile' EDW split into two parts: 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated']

[Diagram: the EDW split in two]

Time Variant & Non Volatile: could be a Data Lake

Subject Oriented & Integrated: a virtualised Ensemble tier

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs

Staging Area, EDW, Information Marts

Splitting the work

Persistent Staging Area (HSA) = Data Library, EDW, Information Marts

PS-3C = Persistent Staging - 3C (Business Concept, Concept Context, Connector)

modelling

Persistent Staging - how

Identify the source / event stream primary or unique key. Use source metadata for this.

Automate the building of a PS 'around this key'. Take all columns.

Historize using an SCD-2 approach.
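The 'historize using SCD-2' step could look roughly like the sketch below. It is a minimal in-memory illustration, not the author's implementation: the function name, the column names other than the META_ timestamps, and the one-second gap between an end date and the next start date (taken from the example data elsewhere in this deck) are assumptions. Deletes are not handled here.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def historize(history, delivery, key_columns, load_dts):
    """Apply one source delivery to an SCD-2 history (a list of dicts), in place."""
    def key_of(row):
        return tuple(row[c] for c in key_columns)

    # Index the currently valid version of every key.
    current = {key_of(r): r for r in history if r["META_Load_end_dts"] == HIGH_DATE}

    for row in delivery:
        old = current.get(key_of(row))
        if old is not None:
            if all(old.get(c) == v for c, v in row.items()):
                continue  # nothing changed, keep the current version
            # Close the previous version one second before the new one starts.
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**row, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        history.append(new_row)
        current[key_of(row)] = new_row
    return history


history = []
historize(history, [{"station": "Ut", "postcode": "3500GJ"}], ["station"],
          datetime(2013, 6, 5, 22))
historize(history, [{"station": "Ut", "postcode": "3511 CE"}], ["station"],
          datetime(2015, 6, 5, 22))
# Delivering the same row again should not create a new version.
historize(history, [{"station": "Ut", "postcode": "3511 CE"}], ["station"],
          datetime(2015, 6, 6, 22))
```

Because the function only needs the key columns and treats every other column the same way, it can be generated from source metadata, which is the point of automating the PS 'around this key'.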

Persistent Staging Metadata-1

Entity level:

Unique key

Functional description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

Updates in Hive (isn't HDFS append only?)

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation (compacting) when idle or at user command

Use ORC files

PLUS changing the Hive configuration…

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: Avro (ORC in development)

Or HBase… It only has one data type (byte), schema is 'applied'

Schema can be different for every row

modelling

3C - how

Like Data Vault (2), BUT:

Always starts with conceptual data modelling

NOT the primary location of data & history: virtualised (only if performance allows), should be deterministic

No Link Satellites

No surrogate or hash keys, only 'concatenated natural business keys'

Explicit helper entities

3C - Details

[Diagram: PS-3C entity overview]

Business Concept (BC)

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

[Diagram: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate business keys (not by default at least)
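A virtualised Business Concept can be thought of as a deterministic function over the persistent staging rows. The sketch below is a hypothetical Python stand-in for such a view: the function name, the `type` and `number` column names and the `NTR_q` source label are assumptions made for the example. It builds the concatenated business key and keeps the first load timestamp per key.

```python
def business_concept(ps_rows, key_columns, delimiter="|"):
    """Derive Business Concept rows (one per business key) from persistent staging rows."""
    concept = {}
    # Sort on load timestamp so the result does not depend on the input order
    # of rows with different timestamps.
    for row in sorted(ps_rows, key=lambda r: r["META_Load_dts"]):
        business_key = delimiter.join(str(row[c]) for c in key_columns)
        # Keep only the first time a business key was seen.
        concept.setdefault(business_key, {
            "Business_Key": business_key,
            "META_Source": row["META_Source"],
            "META_Load_dts": row["META_Load_dts"],
        })
    return list(concept.values())


# Illustrative persistent staging rows for train series (ISO timestamps sort as text).
ps_trainseries = [
    {"type": "IC", "number": 855, "META_Source": "NTR_q", "META_Load_dts": "2015-06-06 22:00:00"},
    {"type": "IC", "number": 855, "META_Source": "NTR_q", "META_Load_dts": "2015-06-05 22:00:00"},
    {"type": "Sp", "number": 7455, "META_Source": "NTR_q", "META_Load_dts": "2015-06-06 22:00:00"},
]

ns_trainseries = business_concept(ps_trainseries, ["type", "number"])
```

Since nothing is hashed or generated, running the function again over the same staging data yields the same rows, which is what makes this entity a good candidate for a view instead of a stored table.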

3C - Details

[Diagram: PS-3C entity overview]

Concept Context (CC)

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data: [Load End Date Timestamp]; [Deleted Flag] OR register a delete as a new record

Example

NSR-Station
  [valuation] Source: NSS2 table p
  [description] Source: NSS1 table x
  [adres] Source: NSS1 table y

NS-Trainseries
  [description] Source: NTR table q

NS-Travelcard
  [ovchip_personal] Source: NSR table r
  [ovchip_on-usage] Source: NSR table s

NS-Traveller
  [personal_details] Source: NSR table t
  [adres_data] Source: NSR table t

Example-data

NSR-Station, [adres] (source: NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer:

Exposing all columns is not necessary

Refactoring is easier…

3C - Details

[Diagram: PS-3C entity overview]

Connector (C)

Relations between Concepts + context

In a historical way

…merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for the Connector to correctly handle delta data deliveries… so that a change (on the driving key) is not registered as a new 'connection'
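To make the effect of a driving key concrete, here is a small, hypothetical Python sketch (the function and column names are illustrative, not part of PS-3C itself). The same delta, a corrected destination station for one travel movement, is loaded twice: once matching on the driving key (travelcard + check-in timestamp) and once matching on all relation keys.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def apply_delta(rows, delta, match_on, load_dts):
    """Append delta rows; end-date any open row that matches on `match_on`."""
    for new in delta:
        for old in rows:
            same_key = all(old[c] == new[c] for c in match_on)
            if same_key and old["META_Load_end_dts"] == HIGH_DATE:
                old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        rows.append({**new, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE})
    return rows


def open_rows(rows):
    """Rows that are currently valid."""
    return [r for r in rows if r["META_Load_end_dts"] == HIGH_DATE]


DRIVING_KEY = ["card", "checkin"]
ALL_KEYS = ["card", "checkin", "from", "to"]

first = {"card": "3528 0234 2073 1234", "checkin": "5-4-2016 8:49:32", "from": "Asd", "to": "Ut"}
corrected = {**first, "to": "Asa"}  # the source later corrects the destination
t1, t2 = datetime(2016, 4, 6, 22), datetime(2016, 4, 7, 22)

with_driving_key = apply_delta(apply_delta([], [first], DRIVING_KEY, t1), [corrected], DRIVING_KEY, t2)
without_driving_key = apply_delta(apply_delta([], [first], ALL_KEYS, t1), [corrected], ALL_KEYS, t2)
```

With the driving key, the correction closes the old row and one open connection remains; matching on all keys leaves two open rows, so the change would be read as a second travel movement.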

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data: [Load End Date Timestamp]; [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the example Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement]

NSR-Travelmovement

Attributes: Checkin timestamp, from, to

Driving key: NS-Travelcard + Checkin timestamp

Example-Data

Connected Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries

NSR-Travelmovement (Checkin timestamp, from, to)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory helper entities

modelling

3C - Details

[Diagram: PS-3C entity overview]

Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…

…to a business-key-based model

It's a LOOKUP table that translates the technical key to the business key

Example

NSR-Station
  [valuation] Source: NSS2 table p
  [description] Source: NSS1 table x
  [adres] Source: NSS1 table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1 to (0..1) relation with a Business Concept

More difficult to be virtualised

Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load / generate together with BC

Preferably 'in memory' somehow…
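As an illustration of the 'small, preferably in-memory lookup' idea, the sketch below keeps a Business Alias as a plain Python dict and uses it to swap a source surrogate key for the business key. The function name, the `station_id` column and the sample source rows are hypothetical; the surrogate values follow the BA-NSS1 example.

```python
# Business Alias for source NSS1: surrogate key -> business key.
business_alias_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_column, business_key_column):
    """Replace a source surrogate key with the business key via the alias lookup."""
    translated, unresolved = [], []
    for row in source_rows:
        surrogate = row[surrogate_column]
        if surrogate in alias:
            rest = {c: v for c, v in row.items() if c != surrogate_column}
            translated.append({business_key_column: alias[surrogate], **rest})
        else:
            unresolved.append(row)  # no alias yet: keep apart for later handling
    return translated, unresolved


nss1_rows = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 999999, "postcode": "…"},
]
translated, unresolved = to_business_key(nss1_rows, business_alias_nss1,
                                         "station_id", "BK_NSR-Station")
```

Keeping the alias this narrow (two columns per source system) is what keeps the lookup cheap enough to hold in memory.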

3C - Details

[Diagram: PS-3C entity overview]

BC-Timeline

Integrate the validity timelines of Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station
  [valuation] Source: NSS2 table p
  [adres] Source: NSS1 table y

[valuation]

BK_NSR-Station  WOZ value  Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 mln     18 mln                 1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                 1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                 1-3-2016 22:00:00  31-12-9999 00:00:00

BC-Timeline

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres]

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …
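The BC-Timeline rows for station 'Ut' in the example can be reproduced with a short sketch: take the load timestamps of every Concept Context of the Business Concept, sort the distinct values, and end each interval one second before the next one starts. The function name is made up for this illustration and the day-month-year reading of the dates is an assumption.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)


def bc_timeline(*context_load_timestamps):
    """Merge the load timestamps of several Concept Contexts into one timeline."""
    starts = sorted(set().union(*context_load_timestamps))
    # Every interval ends one second before the next one starts; the last stays open.
    ends = [s - timedelta(seconds=1) for s in starts[1:]] + [HIGH_DATE]
    return list(zip(starts, ends))


# META_Load_dts values for business key 'Ut' in the two Concept Contexts.
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
address = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

timeline_ut = bc_timeline(valuation, address)
```

Three valuation versions and two address versions give five combined intervals, so a point-in-time query only has to pick one timeline row and then look up the matching version in each Concept Context.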





Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) No hashed Business Keys or surrogate keys

Only concatenated ones

Photo credit: httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit helper entities

Business Alias

Business Concept Timeline

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to avoid this…

https://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Data Flood

Photo credit: Kurayba (https://www.flickr.com/photos/48503330@N08/28564454666), under CC licence (https://creativecommons.org/licenses/by-sa/2.0/)

The possible result…

Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur

But isn't Data Vault v2 'made for Big Data centric systems'?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile EDW

Coding = a lot like modelling

Being data centric conflicts with the complex data MODELLING work

https://xkcd.com/844

Problem

Less mature JOIN optimizers

Photo credit: Public Domain

Key-Value, Document and Column Family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

Photo credit: Public Domain

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault model with HUB, SAT and LINK entities, including a Link Satellite]

Problem

a) Hashing of Business Keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing loses statistical information regarding the data distribution. Query optimizers do not like this…
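
The trade-off in the table can be tried out with a few lines of Python. This is only an illustrative sketch: the helper names `concat_key` and `md5_key` are assumptions, and the UTF-8 encoding used before hashing is a guess, so the digests are not claimed to match the ones on the slide. It shows that the concatenated key is about as long as the 32-character hash, while only the concatenated key still supports a partial (prefix) lookup such as "all readings of rolling stock 8739".

```python
import hashlib


def concat_key(*parts, sep="|"):
    """Build a concatenated natural Business Key from its parts."""
    return sep.join(str(p) for p in parts)


def md5_key(business_key):
    """Hash a Business Key to a 32-character hex digest (encoding is an assumption)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()


readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concat_key(*r) for r in readings]
hashes = [md5_key(k) for k in keys]

# A partial key lookup works on the natural key: everything for rolling stock 8739.
for_train_8739 = [k for k in keys if k.startswith("8739|")]

# The same question cannot be answered from the hashes alone: the digest of a
# key says nothing about its prefix, its ordering or its distribution.
```

The natural keys also sort by rolling stock number and timestamp, which is the kind of distribution information an optimizer or a sharding scheme can exploit.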

Column Family, Document and Key-Value databases need a good (natural) sharding key for (partial) key-lookups

Hashing……

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/

b) Surrogate Business Keys

Surrogate keys require centralized coordination

…and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the Persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse'

'The Historical Staging Area effectively "acts" as Data Lake, but in a better defined form, as data deltas and event date/times are taken into account'

So

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) shown as two parts: 'Time Variant & Non-Volatile' and 'Subject Oriented & Integrated']

Could be a Data Lake (Time Variant & Non-Volatile) + a virtualised Ensemble tier (Subject Oriented & Integrated)

How does PS-3C work?

Photo credit: Public Domain

[Diagram: Staging Area, EDW, Information Marts]

Focus of current Ensemble EDWs

Splitting the work

[Diagram: Persistent Staging Area (HSA) = Data Library, EDW, Information Marts]

PS-3C modelling: Persistent Staging - Concept Context, Connector, Business Concept

Persistent Staging - how

Identify the Primary or Unique Key of the source event stream; use source metadata for this

Automate the building of a PS 'around this key': take all columns

Historize using an SCD-2 approach
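
The SCD-2 step can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the author's tooling: the function name `historize`, the row layout and the column names `station` and `postcode` are hypothetical, and deletes are not handled. Each delivery is compared with the currently open row for the same key; a changed record closes the open row one second before the new load timestamp and adds a new open row.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def historize(history, delivery, key_columns, load_dts):
    """Apply one delivery of full source rows to an SCD-2 history list (in place)."""
    # Index the currently open row per key.
    open_rows = {
        tuple(row["data"][c] for c in key_columns): row
        for row in history
        if row["load_end_dts"] == END_OF_TIME
    }
    for record in delivery:
        key = tuple(record[c] for c in key_columns)
        current = open_rows.get(key)
        if current is not None and current["data"] == record:
            continue  # unchanged: keep the open row as it is
        if current is not None:
            # Changed: end-date the previous version.
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {"data": dict(record), "load_dts": load_dts, "load_end_dts": END_OF_TIME}
        history.append(new_row)
        open_rows[key] = new_row
    return history


ps_station_adres = []
historize(
    ps_station_adres,
    [{"station": "Ut", "postcode": "3500GJ"}, {"station": "Asd", "postcode": "1012 AB"}],
    key_columns=["station"],
    load_dts=datetime(2013, 6, 5, 22),
)
historize(
    ps_station_adres,
    [{"station": "Ut", "postcode": "3511 CE"}, {"station": "Asd", "postcode": "1012 AB"}],
    key_columns=["station"],
    load_dts=datetime(2015, 7, 4, 22),
)
```

After the second delivery the history holds three rows: the closed and the open version of Ut, and the unchanged open row of Asd.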

Persistent Staging Metadata-1

Entity level:

Unique key

Functional description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… requires updates

UPDATES IN HIVE (isn't HDFS append only?)

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compacting: when idle or at user command

Use ORC files

PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: AVRO (ORC in development)

Or HBase… It only has one data type (byte); schema is 'applied'

Schema can be different for every row

3C - how

Like Data Vault (2), BUT:

Always starts with conceptual data modeling

NOT the primary location of data & history: virtualised (only if performance allows); should be deterministic

No Link Satellites

No surrogate or hash keys, only 'Concatenated Natural Business Keys'

Explicit helper entities

Business Concept (BC)

3C - Details

[Diagram: 3C model overview]

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

[Diagram: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station:

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries:

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard:

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller:

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate BUSINESS KEYS (not by default at least)
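
One way to read 'virtualised' here is that the Business Concept is not stored but derived on demand from the Persistent Staging rows. The sketch below illustrates that reading only; the function name `business_concept_view`, the row layout and the column name `station_code` are assumptions. It returns one row per Business Key with the source and load timestamp of its first appearance, and it is deterministic: the same staging rows always give the same result.

```python
from datetime import datetime


def business_concept_view(ps_rows, key_column):
    """Derive one Business Concept row per Business Key from Persistent Staging rows."""
    first_seen = {}
    for row in ps_rows:
        key = row[key_column]
        known = first_seen.get(key)
        if known is None or row["META_Load_dts"] < known["META_Load_dts"]:
            first_seen[key] = {
                "business_key": key,
                "META_Source": row["META_Source"],
                "META_Load_dts": row["META_Load_dts"],
            }
    # Sort on the Business Key so the output does not depend on input order.
    return [first_seen[key] for key in sorted(first_seen)]


# Hypothetical Persistent Staging rows for the station source table.
ps_rows = [
    {"station_code": "Ut", "META_Source": "NSS1_y", "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"station_code": "Asd", "META_Source": "NSS1_y", "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"station_code": "Ut", "META_Source": "NSS1_y", "META_Load_dts": datetime(2015, 7, 4, 22)},
]
nsr_station = business_concept_view(ps_rows, "station_code")
```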

Concept Context (CC)

3C - Details

[Diagram: 3C model overview]

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts]

[valuation] Source: NSS2, table p

[description] Source: NSS1, table x

[adres] Source: NSS1, table y

[description] Source: NTR, table q

[ovchip_personal] Source: NSR, table r

[ovchip_on-usage] Source: NSR, table s

[personal_details] Source: NSR, table t

[adres_data] Source: NSR, table t

Example-data

NSR-Station, [adres] (Source: NSS1, table y):

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…

Connector (C)

3C - Details

[Diagram: 3C model overview]

Relations between Concepts + Context, in a historical way

…a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…

• so that a change (on the driving key)

• is not registered as a new 'connection'
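
The effect of a driving key can be shown by loading the same two deliveries twice: once treating all key columns as the identity of a connection, and once using only the driving key. The sketch is illustrative: `load_connector`, the column names and the correction of the train series in the second delivery are made-up assumptions, with NS-Travelcard + Checkin timestamp taken as the driving key.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def load_connector(rows, delivery, unique_on, load_dts):
    """Load a delta delivery; `unique_on` lists the columns that identify a connection."""
    open_rows = {
        tuple(r["keys"][c] for c in unique_on): r
        for r in rows
        if r["load_end_dts"] == END_OF_TIME
    }
    for keys in delivery:
        identity = tuple(keys[c] for c in unique_on)
        current = open_rows.get(identity)
        if current is not None and current["keys"] == keys:
            continue  # nothing changed
        if current is not None:
            # Same connection, changed keys: end-date the old version.
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {"keys": dict(keys), "load_dts": load_dts, "load_end_dts": END_OF_TIME}
        rows.append(new_row)
        open_rows[identity] = new_row
    return rows


def open_count(rows):
    return sum(1 for r in rows if r["load_end_dts"] == END_OF_TIME)


movement = {
    "station_from": "Asd", "station_to": "Ut", "trainseries": "IC|855",
    "travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
}
corrected = dict(movement, trainseries="IC|8852")  # hypothetical later correction
t1, t2 = datetime(2016, 4, 6, 22), datetime(2016, 4, 7, 22)

all_columns = list(movement)
driving_key = ["travelcard", "checkin"]

without_dk = load_connector(load_connector([], [movement], all_columns, t1), [corrected], all_columns, t2)
with_dk = load_connector(load_connector([], [movement], driving_key, t1), [corrected], driving_key, t2)
```

Without a driving key the correction shows up as a second open connection; with it, the old version is end-dated and only one connection stays open.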

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp) with 'from' and 'to' relations]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

Connector NSR-Travelmovement (relates NSR-Station 'from' and 'to', NS-Trainseries and NS-Travelcard):

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

Business Alias

3C - Details

[Diagram: 3C model overview]

Helps switching from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station

[valuation] Source: NSS2, table p

[description] Source: NSS1, table x

[adres] Source: NSS1, table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables):

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station:

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1-to-(0,1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow…
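
A Business Alias kept 'in memory' can be pictured as a plain dictionary from the source's surrogate key to the Business Key. The sketch below uses the BA-NSS1 example values; the function name `to_business_key` and the source column names `station_id` and `postcode` are hypothetical, and where exactly such a translation runs in the load process is left open by the slides.

```python
# Business Alias BA-NSS1: NSS1 surrogate key -> Business Key (values from the example).
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_column):
    """Replace a source surrogate key by the Business Key, using a Business Alias."""
    translated = []
    for row in source_rows:
        surrogate = row[surrogate_column]
        if surrogate not in alias:
            # An unknown surrogate means the Business Alias is not up to date yet.
            raise KeyError(f"no Business Key known for surrogate key {surrogate!r}")
        out = {c: v for c, v in row.items() if c != surrogate_column}
        out["business_key"] = alias[surrogate]
        translated.append(out)
    return translated


# Hypothetical NSS1 rows that only carry the technical key.
nss1_rows = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 666323, "postcode": "1012 AB"},
]
with_business_keys = to_business_key(nss1_rows, ba_nss1, "station_id")
```

Keeping the alias small, as the slide asks, is what makes a dictionary (or a broadcast lookup in a distributed engine) a realistic option.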



But isn't Data Vault v2 'made for big data centric systems'?

In DV2 you still do this in one go:

Subject Oriented, Integrated, Time Variant, Non-Volatile: all in one EDW

Being data centric conflicts with the complex data modelling work

Coding = a lot like modelling (http://xkcd.com/844)

Problem

Less mature JOIN optimizers

Key-Value, Document and Column Family NoSQL databases, plus SQL-on-Hadoop solutions, do NOT like joins

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault model with HUB, SAT and LINK entities; the Link Satellite is marked 'this one too']

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing...

Loses statistical information regarding the data distribution. Query optimizers do not like this...

Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/
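The difference between a concatenated and a hashed business key can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the exact input the slide's MD5 values were computed from (separator, encoding, trimming) is not known, so the hashes printed below are not claimed to match the ones in the table.

```python
import hashlib

def concat_business_key(*parts, sep="|"):
    """Join natural key parts into one readable, concatenated business key."""
    return sep.join(str(p) for p in parts)

def md5_hash_key(business_key):
    """Hash a business key to a fixed-width hex string (32 characters for MD5)."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

# Sample rows modelled on the slide: (rolling stock nr, datetime, sensor id).
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concat_business_key(*r) for r in readings]
hashes = [md5_hash_key(k) for k in keys]

# A partial key lookup still works on the natural key:
# all readings that belong to rolling stock 8739.
hits_8739 = [k for k in keys if k.startswith("8739|")]

for k, h in zip(keys, hashes):
    print(f"{k:<40} len={len(k):<3} {h} len={len(h)}")
print("partial lookup on '8739|':", len(hits_8739), "rows")
```

The concatenated key keeps its natural prefix (rolling stock number, then time), so it can serve as a sharding or range key. The hash has a fixed width but carries no usable ordering or prefix.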

b) Surrogate business keys

Surrogate keys require centralized coordination
...and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them...

Then: some inspiration (http://roelantvos.com/blog/?p=1119)

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So...

[Diagram: the single EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) is split into two parts: 'Time Variant & Non-Volatile' and 'Subject Oriented & Integrated']

[Diagram: the 'Time Variant & Non-Volatile' part could be a Data Lake; the 'Subject Oriented & Integrated' part could be a virtualised Ensemble tier; together they form the EDW]

How does PS-3C work?

Focus of current ensemble EDWs: Staging Area -> EDW -> Information Marts

Splitting the work: Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts

PS-3C = Persistent Staging - 3C: Business Concept, Concept Context, Connector

Persistent Staging - how

Identify the source / event stream Primary or Unique Key. Use source metadata for this.
Automate the building of a PS 'around this key'. Take all columns.
Historize using an SCD-2 approach.
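The three steps above can be sketched as a small SCD-2 loader. This is a minimal sketch under assumptions, not the author's implementation: rows are plain dictionaries, the snapshot holds at most one row per key, deletes are ignored, and `historize` plus the column names `station` and `postcode` are hypothetical. The metadata columns follow the naming used in the example tables (`META_Source`, `META_Load_dts`, `META_Load_end_dts`).

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the slides

def historize(history, snapshot, key_cols, source, load_dts):
    """Apply one source snapshot to a persistent staging table (SCD type 2).

    Takes all source columns, closes the current version of a key when any
    column changed, and appends the new version. Deletes are not handled.
    """
    current = {
        tuple(r[c] for c in key_cols): r
        for r in history
        if r["META_Load_end_dts"] == HIGH_DATE
    }
    for row in snapshot:
        key = tuple(row[c] for c in key_cols)
        old = current.get(key)
        if old is not None:
            if all(old.get(col) == val for col, val in row.items()):
                continue  # nothing changed for this key
            # Close the old version one second before the new one starts.
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new = {**row, "META_Source": source,
               "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        history.append(new)
        current[key] = new
    return history

ps_station = []
historize(ps_station,
          [{"station": "Ut", "postcode": "3500GJ"},
           {"station": "Asd", "postcode": "1012 AB"}],
          ["station"], "NSS1_y", datetime(2013, 6, 5, 22))
historize(ps_station,
          [{"station": "Ut", "postcode": "3511 CE"},
           {"station": "Asd", "postcode": "1012 AB"}],
          ["station"], "NSS1_y", datetime(2015, 7, 4, 22))

for r in ps_station:
    print(r)
```

Note that closing the old version is an update, which is why the end date is harder to maintain on append-only storage.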

Persistent Staging Metadata-1

Entity level:
  Unique key
  Functional description
  Delivering party
  Owner / Responsible
  MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation...)
  ...

Persistent Staging Metadata-2

Column level:
  [Load Date Timestamp]
  [Load End Date Timestamp]
  [Deleted Flag] OR delete as new record
  [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult... Requires updates

Updates in Hive (isn't HDFS append only?)

ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the Hive configuration...

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase... It only has one data type (byte); schema is 'applied'
Schema can be different for every row

3C - how

Like Data Vault (2), BUT:

Always starts with conceptual data modelling
NOT the primary location of Data & History
Virtualised (only if performance allows); should be deterministic
No Link Satellites
No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
Explicit helper entities

3C - Details: Business Concept (BC)

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]

A UNIQUE, domain specific point of integration
...a business entity
...within its own domain
...does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

[Diagram: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer ...]

Business Concept Metadata

Entity level:
  [Description]
  [Owner / Responsible]

Column level:
  [Load Date Timestamp]
  [Source system] on table / file level (lowest possible)

Example-Data: Business Concepts

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
...

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
...

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
...

BC important notes

Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)
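A virtualised Business Concept can be read as a deterministic derivation over the persistent staging rows: one row per concatenated natural business key, stamped with the first load in which that key was seen. The sketch below assumes this reading. The function name, the source column names `type` and `number`, and the source label `NTR_q` are hypothetical.

```python
from datetime import datetime

def business_concept(ps_rows, key_cols, sep="|"):
    """Derive BC rows from persistent staging rows.

    Returns one row per concatenated natural business key, carrying the
    source and load timestamp of the first PS row that delivered the key.
    """
    bc = {}
    for row in sorted(ps_rows, key=lambda r: r["META_Load_dts"]):
        bk = sep.join(str(row[c]) for c in key_cols)
        if bk not in bc:
            bc[bk] = {
                "Business_Key": bk,
                "META_Source": row["META_Source"],
                "META_Load_dts": row["META_Load_dts"],
            }
    return list(bc.values())

ps_trainseries = [
    {"type": "IC", "number": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"type": "IC", "number": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 6, 22)},
    {"type": "Sp", "number": 7455, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 6, 22)},
]

bc_trainseries = business_concept(ps_trainseries, ["type", "number"])
for r in bc_trainseries:
    print(r)
```

Because the result depends only on the PS contents, rerunning the derivation gives the same Business Concept rows, which is what makes virtualising it feasible.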

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way
...like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata

Entity level:
  [Description]
  [Owner / Responsible]

Column level:
  [Load Date Timestamp]
  [Source system] on table / file level (lowest possible)
  Not mandatory for streaming data:
    [Load End Date Timestamp]
    [Deleted Flag] OR register a delete as a new record

Example: Concept Contexts per Business Concept

NSR-Station
  [valuation]         Source NSS2, table p
  [description]       Source NSS1, table x
  [adres]             Source NSS1, table y

NS-Trainseries
  [description]       Source NTR, table q

NS-Travelcard
  [ovchip_personal]   Source NSR, table r
  [ovchip_on-usage]   Source NSR, table s

NS-Traveller
  [personal_details]  Source NSR, table t
  [adres_data]        Source NSR, table t

Example-data: NSR-Station, [adres] (Source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                ...  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  ...  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  ...  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             ...                 ...                ...  ...          ...                ...

CC important notes

More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier...

3C - Details: Connector (C)

Relations between Concepts + Context, in a historical way
...merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata...
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries...
  so that a change (on the driving key) is not registered as a new 'connection'
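Once the driving key is recorded as metadata, the rule that it makes a Connector 'unique at one point in time' can be checked mechanically. The sketch below is one possible check, not part of the original deck: it reports every driving-key value that has overlapping validity periods. The function name and the dictionary layout are assumptions; the sample values are taken from the NSR-Travelmovement example.

```python
from collections import defaultdict
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)

def driving_key_violations(rows, driving_key):
    """Return driving-key values whose validity periods overlap."""
    periods = defaultdict(list)
    for r in rows:
        dk = tuple(r[c] for c in driving_key)
        periods[dk].append((r["META_Load_dts"], r["META_Load_end_dts"]))
    bad = []
    for dk, spans in periods.items():
        spans.sort()
        # Two consecutive versions overlap when the earlier one is still
        # valid at the moment the later one starts.
        if any(prev_end >= start
               for (_, prev_end), (start, _) in zip(spans, spans[1:])):
            bad.append(dk)
    return bad

DRIVING_KEY = ["BK_NS-Travelcard", "Checkin_timestamp"]
loaded = datetime(2016, 4, 6, 22)
movements = [
    {"BK_NS-Travelcard": "3528 0234 2073 1234",
     "Checkin_timestamp": datetime(2016, 4, 5, 8, 49, 32),
     "BK_NSR-Station-from": "Asd", "BK_NSR-Station-to": "Ut",
     "META_Load_dts": loaded, "META_Load_end_dts": HIGH_DATE},
    {"BK_NS-Travelcard": "3528 0234 2073 1234",
     "Checkin_timestamp": datetime(2016, 4, 5, 18, 10, 9),
     "BK_NSR-Station-from": "Ut", "BK_NSR-Station-to": "Asd",
     "META_Load_dts": loaded, "META_Load_end_dts": HIGH_DATE},
]
ok = driving_key_violations(movements, DRIVING_KEY)

# A corrected destination loaded as a brand-new open row, without closing
# the previous version: exactly what the driving key should prevent.
faulty = movements + [{**movements[0], "BK_NSR-Station-to": "Asa",
                       "META_Load_dts": datetime(2016, 4, 7, 22)}]
bad = driving_key_violations(faulty, DRIVING_KEY)
print(ok, bad)
```

Without a driving key, the changed destination would look like a second, independent connection. With it, the loader knows the existing row must be closed first.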

Connector Metadata

Entity level:
  [Description]
  [Owner / Responsible]

Column level:
  [Load Date Timestamp]
  [Source system] on table / file level (lowest possible)
  Not mandatory for streaming data:
    [Load End Date Timestamp]
    [Deleted Flag] OR register a delete as a new record

Example: Connector NSR-Travelmovement

Connects NSR-Station (from), NSR-Station (to), NS-Trainseries and NS-Travelcard; carries the Checkin timestamp
Driving Key: NS-Travelcard + Checkin timestamp

Example-Data: NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory helper entities

3C - Details: Business Alias

To help switching from sources that are tied together by technical (surrogate) keys...
to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key

Example: NSR-Station
  [valuation]    Source NSS2, table p
  [description]  Source NSS1, table x
  [adres]        Source NSS1, table y

  BA-NSS1: key lookup for NSS1 source tables
  BA-NSS2: key lookup for NSS2 source tables

Example-data: NSR-Station

BA-NSS1 (key lookup for NSS1 source tables)
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
...           ...

NSR-Station (Business Concept)
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
...           ...          ...

BA important details

Has a 1 on (0,1) relation with a Business Concept
More difficult to virtualise; the lookup table should be kept small
Therefore DO NOT do key lookups in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow...
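The Business Alias idea can be sketched as a small in-memory lookup that swaps a source system's surrogate key for the business key. The mapping values come from the BA-NSS1 example. The function name and the source column names `station_id` and `postcode` are hypothetical, and where in the load process this translation runs is left open here.

```python
# Business Alias for source system NSS1: surrogate key -> business key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(rows, alias, surrogate_col, bk_col):
    """Replace the technical key of each row by its business key.

    Rows whose surrogate key is not in the alias are returned separately,
    so they can be reported instead of being silently dropped.
    """
    translated, unresolved = [], []
    for row in rows:
        bk = alias.get(row[surrogate_col])
        if bk is None:
            unresolved.append(row)
            continue
        out = {k: v for k, v in row.items() if k != surrogate_col}
        out[bk_col] = bk
        translated.append(out)
    return translated, unresolved

nss1_rows = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 666323, "postcode": "1012 AB"},
    {"station_id": 999999, "postcode": "unknown"},
]
resolved, unresolved = to_business_key(
    nss1_rows, BA_NSS1, "station_id", "BK_NSR-Station")
print(resolved)
print(unresolved)
```

Keeping the alias as its own small table, here a dictionary, is what makes an 'in memory' lookup realistic.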

3C - Details: BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach

Example: BC-Timeline (BCT) for NSR-Station

[valuation] (Source NSS2, table p)
BK_NSR-Station  WOZ value   Value Rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

[adres] (Source NSS1, table y)
BK_NSR-Station  Postadres_postcode  GPS                ...  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  ...  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  ...  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  ...  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             ...                 ...                ...  ...          ...                ...

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
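The combined timeline in the example can be reproduced by collecting every load timestamp of every Concept Context per business key and then cutting the time axis at each of them. The sketch below is one way to do that, with a hypothetical function name and each context reduced to `(business key, load timestamp)` pairs. End timestamps use the same 'one second before the next start' convention as the tables.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def bc_timeline(*contexts):
    """Merge the validity timelines of several Concept Contexts.

    Each context is an iterable of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) rows.
    """
    starts = defaultdict(set)
    for context in contexts:
        for bk, load_dts in context:
            starts[bk].add(load_dts)
    timeline = []
    for bk in sorted(starts):
        points = sorted(starts[bk])
        for start, nxt in zip(points, points[1:] + [None]):
            end = HIGH_DATE if nxt is None else nxt - timedelta(seconds=1)
            timeline.append((bk, start, end))
    return timeline

# Load timestamps from the example (day-month-year in the slides).
valuation = [("Ut", datetime(2014, 1, 1, 22)),
             ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
adres = [("Ut", datetime(2013, 6, 5, 22)),
         ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]

bct = bc_timeline(valuation, adres)
for row in bct:
    print(row)
```

With the timeline materialised, each Concept Context can be joined to it with a simple range condition on the business key and load timestamps.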

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work
   Data + History
   Subjects + Integration

2) NO hashed business keys or surrogate keys
   Only concatenated ones
   (Photo: http://www.cannabisculture.com/files/images/6/hashbrick.JPG)

3) Fewer joins
   Relation + technical validity timeline + relation context, together in one entity

4) Explicit helper entities
   Business Alias
   Business Concept Timeline (BC-Timeline)
   + explicitly define driving key(s)

Hope to AVOID this... (http://xkcd.com/927)

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, @rwerschkull

In

DV2you still

do thisin one go

Subject Oriented

Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Coding =

A lot like modelling

Being Data Centric

conflictswith thecomplex Data

MODELLINGwork

httpxkcdcom844

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000


Being data centric conflicts with the complex data MODELLING work

Coding = a lot like modelling (http://xkcd.com/844)

Problem

Less mature JOIN optimizers

Key-value, document and column-family NoSQL databases + SQL-on-Hadoop solutions do NOT like joins

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)
[Diagram: Data Vault example with HUB, SAT and LINK entities; the Link Satellite is annotated "this one too"]

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Loses statistical information regarding the data distribution. Query optimizers do not like this…

Column-family, document and key-value databases need a good (natural) sharding key for (partial) key lookups. Hashing……

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
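The contrast between a concatenated and a hashed business key can be sketched in a few lines of Python. This is only an illustration: the helper names are made up for this example, the timestamps are assumed to have been written with colons on the original slide, and MD5 is used because the slide's table shows MD5 hashes.

```python
import hashlib


def concatenated_business_key(*parts: str, sep: str = "|") -> str:
    """Join the natural key parts into one readable business key."""
    return sep.join(parts)


def hashed_business_key(key: str) -> str:
    """MD5 hex digest of the concatenated key (always 32 characters)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# (rolling stock nr, datetime, sensor id) taken from the sensor example
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8674", "2015-01-22 01:34:26", "16A1_HSVEROND"),
]

keys = [concatenated_business_key(*r) for r in readings]
hashes = [hashed_business_key(k) for k in keys]

# Natural keys of the same rolling stock share a prefix, so a partial
# (prefix) lookup stays possible; a hash carries no such structure.
same_stock = [k for k in keys if k.startswith("8739|")]
```

The concatenated keys keep their natural ordering and prefix, which is what a sharding key for partial lookups relies on; the fixed-length hashes trade that away.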

b) Surrogate business keys

Surrogate keys require centralized coordination…
…and thus can impact the overall system’s scalability and availability
A lot of MPP NoSQL databases simply do not have them…

Then some inspiration: http://roelantvos.com/blog/?p=1119

‘In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.’

‘The Historical Staging Area effectively ‘acts’ as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.’

So the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) can be split into:
Time Variant & Non Volatile
Subject Oriented & Integrated

Time Variant & Non Volatile: could be a Data Lake
Subject Oriented & Integrated: virtualised ensemble tier
Together: the EDW

How does PS-3C work?

Focus of current ensemble EDWs
Staging Area, EDW, Information Marts

Splitting the work
Persistent Staging Area (HSA) = Data Library, EDW, Information Marts

PS-3C = Persistent Staging - 3C (Concept Context, Connector, Business Concept) modelling

Persistent Staging - how

Identify the source / event stream Primary or Unique Key. Use source metadata for this.
Automate the building of a PS ‘around this key’. Take all columns.
Historize using an SCD-2 approach.
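The SCD-2 historization step above can be sketched as follows. This is a minimal in-memory illustration, not the author's implementation: the function name, the list-of-dicts representation and the one-second gap between end and start timestamps (mirroring the 21:59:59 / 22:00:00 pattern in the example data) are assumptions, and deletes are not handled.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # "still valid" marker, as in the example data


def scd2_apply(history, key_cols, delivered, load_dts):
    """Add a new version to `history` for every new or changed source row."""
    current = {
        tuple(row[c] for c in key_cols): row
        for row in history
        if row["META_Load_end_dts"] == OPEN_END
    }
    for src in delivered:
        key = tuple(src[c] for c in key_cols)
        old = current.get(key)
        if old is not None:
            if all(old[c] == v for c, v in src.items()):
                continue  # nothing changed, keep the open version
            # close the previous version one second before the new one starts
            old["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        new = {**src, "META_Load_dts": load_dts, "META_Load_end_dts": OPEN_END}
        history.append(new)
        current[key] = new
    return history


history = []
scd2_apply(history, ["station"], [{"station": "Ut", "postcode": "3500GJ"}],
           datetime(2013, 6, 5, 22))
scd2_apply(history, ["station"], [{"station": "Ut", "postcode": "3511 CE"}],
           datetime(2015, 6, 5, 22))
# an unchanged delivery does not create a new version
scd2_apply(history, ["station"], [{"station": "Ut", "postcode": "3511 CE"}],
           datetime(2015, 6, 6, 22))
```

Because all source columns are taken, the same routine can be generated for every source table once its primary or unique key is known from the source metadata.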

Persistent Staging Metadata-1

Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
…

Persistent Staging Metadata-2

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR delete as new record
[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

Updates in Hive (isn’t HDFS append only?)

ACID is possible in Hive
ACID makes updates possible by registering updates as ‘new data’
Reconciliation / compacting when idle or at user command
Use ORC files
PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase… It only has one data type (byte); schema is ‘applied’
Schema can be different for every row

3C - how

Like Data Vault (2), BUT:
Always starts with conceptual data modelling
NOT the primary location of data & history
Virtualised (only if performance allows)
Should be deterministic
No Link Satellites
No surrogate or hash keys, only ‘concatenated natural business keys’
Explicit helper entities

[Diagram, 3C overview: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

3C - Details: Business Concept (BC)

A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise wide

Why not ‘enterprise wide’?

Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example data

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…
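How a Business Concept like the ones above could be derived from persistent staging rows is sketched below. It is an assumption-laden illustration: the function name, the source column names (type, nr) and the source label NTR_q are hypothetical, and the rule "keep source and load timestamp of the first delivery" is one possible choice, not something the slides prescribe.

```python
from datetime import datetime


def build_business_concept(ps_rows, key_cols, sep="|"):
    """Return one BC row per distinct concatenated natural business key."""
    bc = {}
    # oldest delivery first, so the first-seen source and timestamp win
    for row in sorted(ps_rows, key=lambda r: r["META_Load_dts"]):
        bk = sep.join(str(row[c]) for c in key_cols)
        bc.setdefault(bk, {
            "Business Key": bk,
            "META_Source": row["META_Source"],
            "META_Load_dts": row["META_Load_dts"],
        })
    return list(bc.values())


# hypothetical persistent staging rows for train series
ps_rows = [
    {"type": "IC", "nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"type": "IC", "nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 6, 22)},
    {"type": "Sp", "nr": 7455, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 6, 22)},
]

trainseries_bc = build_business_concept(ps_rows, ["type", "nr"])
```

Since the result depends only on the persistent staging rows, the same logic could equally be expressed as a view, which fits the idea of virtualising the Business Concept when performance allows.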

BC important notes

Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example: Business Concepts and their Concept Contexts

NSR-Station: [valuation] source NSS2 table p; [description] source NSS1 table x; [adres] source NSS1 table y
NS-Trainseries: [description] source NTR table q
NS-Travelcard: [ovchip_personal] source NSR table r; [ovchip_on-usage] source NSR table s
NS-Traveller: [personal_details] source NSR table t; [adres_data] source NSR table t

Example data: NSR-Station, Concept Context [adres] (source NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source
But do make it virtual when ‘streaming data’ is necessary
Because we have the PS layer:
• exposing all columns is not necessary
• refactoring is easier…

3C - Details: Connector (C)

Relations between Concepts + context, in a historical way
…merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new ‘connection’

Connector Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Connector NSR-Travelmovement (Checkin timestamp): from NSR-Station, to NSR-Station, NS-Travelcard, NS-Trainseries
Driving key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00
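Why the driving key matters for delta deliveries can be shown with a small sketch. The scenario is hypothetical: it assumes a travel movement is first delivered at check-in (destination still unknown) and delivered again after check-out. The function and column names are made up for the illustration.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def apply_delta(rows, delta, unique_on, load_dts):
    """Close the open row matching `delta` on `unique_on`, then add the new version."""
    for row in rows:
        same_key = all(row[c] == delta[c] for c in unique_on)
        if same_key and row["META_Load_end_dts"] == OPEN_END:
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
    rows.append({**delta, "META_Load_dts": load_dts, "META_Load_end_dts": OPEN_END})
    return rows


def open_rows(rows):
    """Rows that are currently valid."""
    return [r for r in rows if r["META_Load_end_dts"] == OPEN_END]


ALL_KEYS = ["station_from", "station_to", "trainseries", "travelcard", "checkin"]
DRIVING_KEY = ["travelcard", "checkin"]

checkin = {"station_from": "Asd", "station_to": None, "trainseries": "IC|855",
           "travelcard": "3528 0234 2073 1234",
           "checkin": datetime(2016, 4, 5, 8, 49, 32)}
checkout = {**checkin, "station_to": "Ut"}  # same movement, destination now known


def run(unique_on):
    rows = []
    apply_delta(rows, checkin, unique_on, datetime(2016, 4, 5, 22))
    apply_delta(rows, checkout, unique_on, datetime(2016, 4, 6, 22))
    return rows


naive = run(ALL_KEYS)      # station_to changed, so the old row stays open
driven = run(DRIVING_KEY)  # same card + check-in: the old version is closed
```

Matching on every key column leaves two open "connections" for one journey; matching on the driving key leaves exactly one, with the earlier version end-dated.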

Are we there yet? No

We add two mandatory HELPER entities

3C - Details: Business Alias

To help switching from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model
It’s a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station: [valuation] source NSS2 table p; [description] source NSS1 table x; [adres] source NSS1 table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station (Business Concept)
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …
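A Business Alias used as a small in-memory lookup might look like the sketch below. The dictionary holds the BA-NSS1 example values; the source row and its column names (from_station_id, to_station_id) are hypothetical and only illustrate translating technical keys to business keys.

```python
# BA-NSS1 as an in-memory lookup: NSS1 surrogate key -> business key
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate a technical key; returns None when the key is unknown."""
    return alias.get(surrogate_key)


# hypothetical source row that refers to stations by surrogate key
source_row = {"from_station_id": 666323, "to_station_id": 123522}

translated = {
    "BK_NSR-Station-from": to_business_key(BA_NSS1, source_row["from_station_id"]),
    "BK_NSR-Station-to": to_business_key(BA_NSS1, source_row["to_station_id"]),
}
```

Keeping the alias as a plain key-to-key mapping keeps it small enough to hold in memory, so the translation does not need a join.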

BA important details

Has a 1 to (0,1) relation with a Business Concept
More difficult to virtualise
Lookup table should be kept small
Therefore DO NOT do key lookup in the Concept Context entity
Load / generate together with the BC
Preferably ‘in memory’ somehow…

3C - Details: BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault Point-in-time construct
But mandatory
And with a clearly defined and performant approach

Example: NSR-Station

Concept Context [valuation] (source NSS2 table p)
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

Concept Context [adres] (source NSS1 table y)
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BC-Timeline (BCT)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
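The BC-Timeline rows for station Ut can be reproduced by merging the load timestamps of its two Concept Contexts. The sketch below is one way to do that under stated assumptions: the function name is invented, each combined interval ends one second before the next one starts (as in the example data), and dates are read as day-month-year.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def bc_timeline(*context_load_dts):
    """Merge the load timestamps of several Concept Contexts into one timeline."""
    starts = sorted(set().union(*context_load_dts))
    # every interval ends one second before the next one starts
    ends = [nxt - timedelta(seconds=1) for nxt in starts[1:]] + [OPEN_END]
    return list(zip(starts, ends))


# load timestamps of the two Concept Contexts of station "Ut"
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
address = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

timeline = bc_timeline(valuation, address)
for start, end in timeline:
    print("Ut", start, end)
```

With the timeline precomputed per Business Concept, a point-in-time query can pick one combined interval and then equi-join each Concept Context on its load timestamp, instead of evaluating overlapping date ranges for every context.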

What makes PS-3C a different ensemble?

1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones

3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity

4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly define driving key(s)

Hope to AVOID this… http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

‘Head of BI’, Spil Games
Certified Data Vault modeler since 2009
Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, @rwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Less Mature

JOINoptimizers

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly splitting the work

Data + History
Subjects + Integration

2) No hashed business keys or surrogate keys

Only concatenated ones

Image credit: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

4) Explicit helper entities

Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly defined driving key(s)

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI' at Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Less mature JOIN optimizers

Key-value, document and column-family NoSQL databases, plus SQL-on-Hadoop solutions, do NOT like joins

This REALLY complicates Anchor Modeling

And personally: this one too (Link Satellite)

[Diagram: Data Vault model with Hubs, Satellites, a Link and a Link Satellite]

Problem

a) Hashing of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Hash Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing loses the statistical information regarding the data distribution. Query optimizers do not like this…

Column-family, document and key-value databases need a good (natural) sharding key for (partial) key lookups. Hashing does not provide one…

Source: http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/
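The difference between the two key styles can be made concrete with a small Python sketch. It is an illustration only: the helper names `concatenated_key` and `hashed_key` are assumptions, and the exact string encoding behind the hashes on the slide is unknown, so the hash values produced here are not claimed to match the table. The point shown is that a concatenated natural key still supports a partial (prefix) lookup, while its MD5 hash does not.

```python
import hashlib

def concatenated_key(*parts, sep="|"):
    """Build a concatenated natural business key (hypothetical helper)."""
    return sep.join(str(part) for part in parts)

def hashed_key(business_key):
    """MD5 hex digest of a business key; always 32 characters long."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

natural_keys = sorted(concatenated_key(*reading) for reading in readings)
hashed_keys = sorted(hashed_key(key) for key in natural_keys)

# Partial key lookup: all readings of rolling stock 8739 on one day.
prefix = "8739|2015-01-22"
partial_hits = [key for key in natural_keys if key.startswith(prefix)]

# The same prefix is useless against the hashes: related keys are scattered.
hash_hits = [key for key in hashed_keys if key.startswith(prefix)]

print(partial_hits)
print(hash_hits)
```

Sorting the natural keys keeps readings of the same rolling stock next to each other, which is the property a sharding or clustering key relies on; the sorted hashes have no such locality.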

b) Surrogate business keys

Surrogate keys require centralized coordination…
…and thus can impact the overall system's scalability and availability.

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event datetimes are taken into account.'

So the four properties of an EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) can be split into two groups:

Time Variant & Non-Volatile
Subject Oriented & Integrated

Time Variant & Non-Volatile: could be a Data Lake
Subject Oriented & Integrated: could be a virtualised ensemble tier

How does PS-3C work?

Focus of current ensemble EDWs

Staging Area -> EDW -> Information Marts

Splitting the work

Persistent Staging Area (HSA) = Data Library -> EDW -> Information Marts

PS-3C: Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

Identify the primary or unique key of the source / event stream; use source metadata for this
Automate the building of a PS 'around this key'; take all columns
Historize using an SCD-2 approach
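The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's loader: the function `load_delivery`, the column names `load_dts` and `load_end_dts`, and the use of in-memory lists of dictionaries are all hypothetical. Every delivery is compared with the currently open version per key; a changed row closes the old version and opens a new one (SCD type 2). Deletes are not handled in this sketch.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def load_delivery(staging, delivery, key_columns, load_dts):
    """Historize one source delivery into a persistent staging table (SCD-2)."""
    # Index the currently open version of every key.
    current = {
        tuple(row[c] for c in key_columns): row
        for row in staging
        if row["load_end_dts"] == END_OF_TIME
    }
    for source_row in delivery:
        key = tuple(source_row[c] for c in key_columns)
        existing = current.get(key)
        if existing is not None:
            if all(existing[c] == v for c, v in source_row.items()):
                continue  # nothing changed, keep the open version
            # Close the old version one second before the new one starts.
            existing["load_end_dts"] = load_dts - timedelta(seconds=1)
        version = {**source_row, "load_dts": load_dts, "load_end_dts": END_OF_TIME}
        staging.append(version)
        current[key] = version
    return staging

staging = []
load_delivery(
    staging,
    [{"station": "Ut", "postcode": "3500GJ"}, {"station": "Asd", "postcode": "1012 AB"}],
    key_columns=["station"],
    load_dts=datetime(2013, 6, 5, 22),
)
load_delivery(
    staging,
    [{"station": "Ut", "postcode": "3511 CE"}, {"station": "Asd", "postcode": "1012 AB"}],
    key_columns=["station"],
    load_dts=datetime(2015, 7, 4, 22),
)
for row in staging:
    print(row)
```

Because the loader only needs the key columns and takes all other columns as they come, it can be generated from source metadata, which is the automation the slide asks for.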

Persistent Staging Metadata (1)

Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operations…)
…

Persistent Staging Metadata (2)

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]: possible but difficult… it requires updates
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)

Updates in Hive (isn't HDFS append only?)

ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting: when idle or at user command
Use ORC files
PLUS changing the Hive configuration…
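The idea of 'updates registered as new data, reconciled later' can be illustrated with a small Python sketch. It is a conceptual model only and does not reproduce Hive's actual file layout or API: the functions `read_current` and `compact` and the record structure are assumptions. Readers merge an immutable base with an append-only list of delta records, and compaction folds the deltas into a new base.

```python
def read_current(base, deltas):
    """Merge base rows with append-only delta records; the newest record per key wins."""
    merged = {row["key"]: row for row in base}
    for delta in deltas:  # deltas are stored in arrival order
        if delta["op"] == "delete":
            merged.pop(delta["key"], None)
        else:  # "insert" or "update": the delta carries the full new row
            merged[delta["key"]] = {"key": delta["key"], **delta["values"]}
    return sorted(merged.values(), key=lambda row: row["key"])

def compact(base, deltas):
    """Fold all deltas into a new base so later reads have nothing left to reconcile."""
    return read_current(base, deltas), []

base = [
    {"key": "Ut", "postcode": "3500GJ"},
    {"key": "Asd", "postcode": "1012 AB"},
]
# The 'update' is appended as new data; the base itself is never modified.
deltas = [{"op": "update", "key": "Ut", "values": {"postcode": "3511 CE"}}]

before = read_current(base, deltas)
base, deltas = compact(base, deltas)
after = read_current(base, deltas)
print(before == after, after)
```

The result of a read is the same before and after compaction; compaction only moves the reconciliation cost from read time to a moment when the system is idle or a user asks for it.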

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type
Or use a data storage type that supports schema evolution: Avro (ORC in development)
Or HBase… it only has one data type (byte); the schema is 'applied', so it can be different for every row

modelling

3C - how

Like Data Vault (2), BUT:
Always starts with conceptual data modeling
NOT the primary location of data & history
Virtualised (only if performance allows)
Should be deterministic
No Link Satellites
No surrogate or hash keys, only 'concatenated natural business keys'
Explicit helper entities

3C - Details: Business Concept (BC)

A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…that does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

[Diagram: 'Customer' as seen by different domains: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, … Customer]

Business Concept Metadata

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Example data: Business Concepts

NSR-Station:

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries (Business Key): IC|855, IC|8852, Sp|7455, St|16050, …
NS-Travelcard (Business Key): 3528 0234 2073 1234, 3528 0234 2073 5678, …
NS-Traveller (Business Key): CRM-RW123456, CRM-LAS224466, …

BC important notes

Easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default at least)

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts (one per source table):
[valuation]: source NSS2, table p
[description]: source NSS1, table x
[adres]: source NSS1, table y
[description]: source NTR, table q
[ovchip_personal]: source NSR, table r
[ovchip_on-usage]: source NSR, table s
[personal_details]: source NSR, table t
[adres_data]: source NSR, table t

Example data: Concept Context [adres] of NSR-Station (source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes

More difficult to virtualise: it depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer, exposing all columns is not necessary
Refactoring is easier…

3C - Details: Connector (C)

Relations between Concepts + Context, in a historical way
…a merger of the Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector driving key

Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for a Connector to handle delta data deliveries correctly… so that a change (on the driving key) is not registered as a new 'connection'
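The effect of the driving key on delta deliveries can be shown with a short Python sketch. It is an illustration under assumptions, not the author's loading logic: the function `apply_delta`, the column names and the corrected destination station are hypothetical. A delivered row that matches an open row on the driving key supersedes it; with no proper driving key (here simulated by using all key columns), the same change would leave two open 'connections'.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def apply_delta(connector, delta_rows, driving_key, load_dts):
    """Apply a delta delivery to a Connector, versioning rows on the driving key."""
    open_rows = {
        tuple(row[c] for c in driving_key): row
        for row in connector
        if row["load_end_dts"] == END_OF_TIME
    }
    for new in delta_rows:
        key = tuple(new[c] for c in driving_key)
        current = open_rows.get(key)
        if current is not None:
            if all(current[c] == v for c, v in new.items()):
                continue  # unchanged
            # Same driving key, different content: close the old version.
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
        version = {**new, "load_dts": load_dts, "load_end_dts": END_OF_TIME}
        connector.append(version)
        open_rows[key] = version
    return connector

trip_v1 = {
    "station_from": "Asd",
    "station_to": "Ut",
    "trainseries": "IC 855",
    "travelcard": "3528 0234 2073 1234",
    "checkin": datetime(2016, 4, 5, 8, 49, 32),
}
# Hypothetical correction delivered a day later: the destination changes.
trip_v2 = {**trip_v1, "station_to": "Asa"}

def open_connections(driving_key):
    connector = []
    apply_delta(connector, [trip_v1], driving_key, datetime(2016, 4, 6, 22))
    apply_delta(connector, [trip_v2], driving_key, datetime(2016, 4, 7, 22))
    return sum(1 for row in connector if row["load_end_dts"] == END_OF_TIME)

with_driving_key = open_connections(("travelcard", "checkin"))
without_driving_key = open_connections(
    ("station_from", "station_to", "trainseries", "travelcard", "checkin")
)
print(with_driving_key, without_driving_key)
```

With the driving key (travelcard + check-in timestamp) one open row remains and the earlier version is end-dated; without it, both versions stay open and the correction looks like a second journey.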

Connector Metadata

Entity level:
[Description]
[Owner / Responsible]

Column level:
[Load Date Timestamp]
[Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record

Example

Connector NSR-Travelmovement (check-in timestamp, from, to) relates NSR-Station, NS-Travelcard and NS-Trainseries

Driving key: NS-Travelcard + check-in timestamp

Example data: Connector NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling

3C - Details: Business Alias

Helps switching from sources that are tied together by technical (surrogate) keys…
…to a business-key-based model
It's a LOOKUP table that translates the technical key to the Business Key

Example: NSR-Station with Concept Contexts [valuation] (source NSS2, table p), [description] (source NSS1, table x) and [adres] (source NSS1, table y)

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data: BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station (Business Concept):

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1 to (0,1) relation with a Business Concept
More difficult to virtualise
Lookup table should be kept small
Therefore DO NOT do the key lookup in the Concept Context entity
Load / generate together with the BC
Preferably 'in memory' somehow…
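A Business Alias can be pictured as a small in-memory dictionary from source surrogate key to business key. The Python sketch below is an illustration only: the function `to_business_key`, the column name `station_id` and the unknown surrogate 999999 are assumptions, while the three lookup entries come from the BA-NSS1 example data.

```python
# BA-NSS1: surrogate key of source system NSS1 -> business key of NSR-Station.
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_rows, alias, surrogate_column):
    """Replace a source surrogate key with the business key from a Business Alias.

    Rows whose surrogate is unknown are reported rather than silently dropped.
    """
    translated, unresolved = [], []
    for row in source_rows:
        row = dict(row)  # do not mutate the caller's rows
        surrogate = row.pop(surrogate_column)
        business_key = alias.get(surrogate)
        if business_key is None:
            unresolved.append(surrogate)
            continue
        translated.append({"business_key": business_key, **row})
    return translated, unresolved

source_rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 666323, "postcode": "1012 AB"},
    {"station_id": 999999, "postcode": "0000 XX"},  # hypothetical unknown key
]

translated, unresolved = to_business_key(source_rows, ba_nss1, "station_id")
print(translated)
print(unresolved)
```

Keeping the alias as a separate, small structure means the translation happens once while loading, so the Concept Context entities themselves never need a key lookup.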

3C - Details: BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
Like a Data Vault point-in-time construct
But mandatory
And with a clearly defined and performant approach

Key-Value

Document

Column Family

NoSQL Databases

+SQL on Hadoop solutions

Do NOT like joins

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

This

REALLYcomplicates

AnchoRModeling

rwerschkull

nllinkedincominrogierwerschkull

And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING of business keys

Rolling Stock Nr  Datetime             Sensor Id        Value  Concatenated Business Key                 Key Len  MD5 Hash                          Key Len
8739              2015-01-22 01:34:27  72A1_FINV        123    8739|2015-01-22 01:34:27|72A1_FINV        34       86ae4c6b0e2e2d5a13a0d11440529aeb  32
8739              2015-01-22 01:34:27  72A1_SLDET       100    8739|2015-01-22 01:34:27|72A1_SLDET       35       51ce9bc292eef407bd7c91a52eebcf2e  32
8739              2015-01-22 01:34:32  72A1_FINV        126    8739|2015-01-22 01:34:32|72A1_FINV        34       9482a41c1fecc4c64b8c437af6cc85e8  32
8739              2015-01-22 01:34:32  13A8_MW_UBAT_VT  5      8739|2015-01-22 01:34:32|13A8_MW_UBAT_VT  42       e4160914ee55ce0b93f87b23366a0ce3  32
8674              2015-01-22 01:34:26  72A1_FINV        6      8674|2015-01-22 01:34:26|72A1_FINV        34       fcb3e7c8c91e44ce396d908a4948ca65  32
8674              2015-01-22 01:34:26  16A1_HSVEROND    7      8674|2015-01-22 01:34:26|16A1_HSVEROND    38       fe9098c8c291ad56af5c8afae5169196  32

Hashing… loses statistical information regarding the data distribution. Query optimizers do not like this…

Column family, document and key-value databases need a good (natural) sharding key for (partial) key lookups.

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
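To make the contrast concrete, here is a minimal Python sketch (not part of the original slides). It builds concatenated business keys from sensor readings like those in the table and compares them with their MD5 digests. The helper names `concatenated_key` and `hashed_key` are hypothetical, and the digests it produces are not claimed to match the values on the slide.

```python
import hashlib

def concatenated_key(*parts, sep="|"):
    """Join the natural key parts into one readable business key."""
    return sep.join(str(p) for p in parts)

def hashed_key(business_key):
    """MD5 hex digest of a business key: always 32 characters long."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

# (rolling stock nr, datetime, sensor id), taken from the example table
readings = [
    ("8739", "2015-01-22 01:34:27", "72A1_FINV"),
    ("8739", "2015-01-22 01:34:27", "72A1_SLDET"),
    ("8739", "2015-01-22 01:34:32", "72A1_FINV"),
    ("8674", "2015-01-22 01:34:26", "72A1_FINV"),
]

keys = [concatenated_key(*r) for r in readings]
hashes = [hashed_key(k) for k in keys]

# A natural key still supports a partial (prefix) lookup,
# e.g. "all readings of rolling stock 8739".
prefix_hits = [k for k in keys if k.startswith("8739|")]

# The same lookup on the hashes is impossible: nothing in a digest
# reveals which rolling stock number it was derived from.
```

The concatenated key is only slightly longer than the hash here, but it keeps the prefix and the ordering that a sharding key or a query optimizer can use.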

b) SURROGATE business keys

Surrogate keys require centralized coordination

…and thus can impact the overall system's scalability and availability

A lot of MPP NoSQL databases simply do not have them…

Then some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake but in a better defined form, as data deltas and event date/times are taken into account.'

So

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts: 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated']

[Diagram: the same split of the EDW, with the labels 'Could be a Data LAKE' and 'Virtualised Ensemble Tier']

How does PS-3C work?

[Diagram: tiers Staging Area, EDW, Information Marts]

Focus of current ensemble EDWs

Splitting the work

[Diagram: tiers Persistent Staging Area / HSA (= Data Library), EDW, Information Marts]

Persistent Staging - Concept Context, Connector, Business Concept

modelling

Persistent Staging - how

• Identify the source / event stream Primary or Unique Key. Use source metadata for this.
• Automate the building of a PS 'around this key'. Take all columns.
• Historize using an SCD-2 approach.
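The three steps above can be sketched as follows. This is a minimal, in-memory illustration under stated assumptions, not the author's implementation. The function name `historize`, the list-of-dicts table and the full-row deliveries are all hypothetical. The end-dating convention (one second before the next load) is copied from the example tables in this deck. Deletes are not handled.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # "open" end date, as in the example tables
META = {"META_Load_dts", "META_Load_end_dts"}

def historize(staging, delivery, key_columns, load_dts):
    """Apply one delivery to a persistent staging table, SCD-2 style.

    staging     : list of dicts (the PS table), modified in place
    delivery    : list of dicts holding all source columns
    key_columns : the source primary/unique key the PS is built around
    """
    # Index the currently valid version of every key.
    current = {
        tuple(row[k] for k in key_columns): row
        for row in staging
        if row["META_Load_end_dts"] == HIGH_DATE
    }
    for record in delivery:
        key = tuple(record[k] for k in key_columns)
        existing = current.get(key)
        if existing is not None:
            payload = {c: v for c, v in existing.items() if c not in META}
            if payload == record:
                continue  # unchanged: no new version
            # Changed: close the old version one second before the new load.
            existing["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        staging.append(
            {**record, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        )
    return staging

ps_station = []
historize(ps_station, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2013, 6, 5, 22))
historize(ps_station, [{"station": "Ut", "postcode": "3500GJ"}],
          ["station"], datetime(2014, 6, 5, 22))  # no change, no new row
historize(ps_station, [{"station": "Ut", "postcode": "3511 CE"}],
          ["station"], datetime(2015, 6, 5, 22))
```

Closing the old version is the step that needs an update, which is why the Load End Date Timestamp is harder to maintain on append-only storage.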

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR delete as new record
• [Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates

UPDATES IN HIVE (isn't HDFS append only?)

• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
• Or HBase… It only has one data type (byte); the schema is 'applied'. The schema can be different for every row
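The first option can be illustrated with a small, storage-agnostic Python sketch. It assumes a hypothetical helper `to_fixed_plus_map` and an `attributes` column name, neither of which comes from the slides. Known columns stay as regular columns and everything else is collected into one string-to-string map, which is the shape a MAP column would hold.

```python
def to_fixed_plus_map(event, fixed_columns):
    """Split an event into fixed columns plus one map of variable attributes."""
    row = {c: event.get(c) for c in fixed_columns}
    # Everything not declared as a fixed column ends up in the map,
    # so a new attribute in the source never requires a schema change.
    row["attributes"] = {
        k: str(v) for k, v in event.items() if k not in fixed_columns
    }
    return row

row = to_fixed_plus_map(
    {"event_id": 1, "event_dts": "2016-04-05 08:49:32", "colour": "red", "level": 3},
    ["event_id", "event_dts"],
)
```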

modelling

3C - how

Like Data Vault (2), BUT:

• Always starts with conceptual data modeling
• NOT the primary location of Data & History
  • Virtualised (only if performance allows)
  • Should be deterministic
• No Link Satellites
• No surrogate or hash keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities

3C - Details

Business Concept (BC)

• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system] on table / file level (lowest possible)

Example data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
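As an illustration of why a Business Concept is easy to virtualise, the sketch below derives it deterministically from persistent staging rows. It produces one row per concatenated natural business key, with the source and first load timestamp. The function `business_concept`, the column names `type` and `nr`, and the source label `NTR_q` are assumptions made for this example only.

```python
def business_concept(ps_rows, key_columns, sep="|"):
    """Derive Business Concept rows from persistent staging rows.

    Returns one row per concatenated natural business key, keeping the
    source and load timestamp of the first time the key was seen.
    """
    concept = {}
    for row in sorted(ps_rows, key=lambda r: r["META_Load_dts"]):
        bk = sep.join(str(row[c]) for c in key_columns)
        concept.setdefault(bk, {
            "Business Key": bk,
            "META_Source": row["META_Source"],
            "META_Load_dts": row["META_Load_dts"],
        })
    return list(concept.values())

ps_trainseries = [
    {"type": "IC", "nr": 855, "META_Source": "NTR_q", "META_Load_dts": "2015-06-06T22:00:00"},
    {"type": "IC", "nr": 855, "META_Source": "NTR_q", "META_Load_dts": "2015-06-05T22:00:00"},
    {"type": "Sp", "nr": 7455, "META_Source": "NTR_q", "META_Load_dts": "2015-06-05T22:00:00"},
]

bc_trainseries = business_concept(ps_trainseries, ["type", "nr"])
```

Because the result depends only on the staging rows, it can be recomputed at any time (for instance as a view) and will always yield the same keys.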

3C - Details

Concept Context (CC)

• Contains context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
• [valuation] Source NSS2 table p
• [description] Source NSS1 table x
• [adres] Source NSS1 table y
• [description] Source NTR table q
• [ovchip_personal] Source NSR table r
• [ovchip_on-usage] Source NSR table s
• [personal_details] Source NSR table t
• [adres_data] Source NSR table t

Example data

NSR-Station [adres], Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer:
  • Exposing all columns is not necessary
  • Refactoring is easier…

3C - Details

Connector (C)

• Relations between Concepts + Context, in a historical way
• …merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

• Gives business understanding
• Makes it possible for the Connector to correctly handle delta data deliveries… so that a change (on the driving key) is not registered as a new 'connection'
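A minimal sketch of what the driving key buys during a delta load, under stated assumptions. The function `load_connector` and the column names are hypothetical. For brevity the old version is end-dated with the new load timestamp itself, rather than one second earlier as in the example tables. A delivery whose driving key already has an open row is treated as a new version of the same connection, not as a new connection.

```python
HIGH_DATE = "9999-12-31 00:00:00"

def load_connector(connector, delivery, driving_key, load_dts):
    """Apply a delta delivery to a Connector (list of dicts), using its driving key."""
    open_rows = {
        tuple(r[k] for k in driving_key): r
        for r in connector
        if r["META_Load_end_dts"] == HIGH_DATE
    }
    for rec in delivery:
        dk = tuple(rec[k] for k in driving_key)
        current = open_rows.get(dk)
        if current is not None:
            if all(current[c] == v for c, v in rec.items()):
                continue  # identical redelivery
            # Same driving key, different content: end-date the old version.
            current["META_Load_end_dts"] = load_dts
        new_row = {**rec, "META_Load_dts": load_dts, "META_Load_end_dts": HIGH_DATE}
        connector.append(new_row)
        open_rows[dk] = new_row
    return connector

DRIVING_KEY = ["travelcard", "checkin"]
travelmovement = []
# First delivery: the traveller has checked in but not yet checked out.
load_connector(travelmovement, [{
    "travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
    "station_from": "Asd", "station_to": None,
}], DRIVING_KEY, "2016-04-05 22:00:00")
# Delta delivery: same travelcard + check-in, now with a destination.
load_connector(travelmovement, [{
    "travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
    "station_from": "Asd", "station_to": "Ut",
}], DRIVING_KEY, "2016-04-06 22:00:00")

open_movements = [r for r in travelmovement if r["META_Load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a second, unrelated travel movement, because the full set of related keys differs.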

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Connector NSR-Travelmovement (Checkin timestamp; from / to), relating NSR-Station, NS-Travelcard and NS-Trainseries

Driving Key: NS-Travelcard + Checkin timestamp

Example data: NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC|855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC|855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

modelling

3C - Details

Business Alias

• To help switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station, with Concept Contexts [valuation] (Source NSS2 table p), [description] (Source NSS1 table x) and [adres] (Source NSS1 table y)

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

• Has a 1-to-(0,1) relation with a Business Concept
• More difficult to virtualise
• Lookup table should be kept small; therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
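The lookup itself is simple, as this minimal sketch shows. It holds the BA-NSS1 example data in a plain in-memory dictionary and replaces a source surrogate key with the business key. The function `to_business_key` and the source column names `station_id` and `postcode` are hypothetical.

```python
# Business Alias for NSS1: surrogate key -> business key (values from the example table)
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_rows, alias, surrogate_column):
    """Replace the technical (surrogate) key of each row with its business key."""
    translated = []
    for row in source_rows:
        rest = {c: v for c, v in row.items() if c != surrogate_column}
        # A surrogate key missing from the alias raises KeyError, which
        # signals that the Business Alias must be loaded first.
        translated.append({"Business Key": alias[row[surrogate_column]], **rest})
    return translated

nss1_rows = [{"station_id": 123522, "postcode": "3500GJ"}]
station_rows = to_business_key(nss1_rows, ba_nss1, "station_id")
```

Keeping the alias this small (two columns, one row per key) is what makes an 'in memory' lookup realistic.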

3C - Details

BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station, with Concept Contexts [valuation] (Source NSS2 table p) and [adres] (Source NSS1 table y)

[valuation]

BK_NSR-Station  WOZ value  Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 mln     18 mln                 1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                 1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                 1-3-2016 22:00:00  31-12-9999 00:00:00

[adres]

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (BC-Timeline)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
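The combined timeline in this example can be reproduced with a short sketch. It collects every load timestamp of a business key across its Concept Contexts, sorts them, and closes each interval one second before the next one starts. The function `bc_timeline` and the reduced row layout (`bk`, `load_dts`) are assumptions made for illustration, not the author's implementation.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

def bc_timeline(business_key, *contexts):
    """Integrate the validity timelines of several Concept Contexts for one key."""
    starts = sorted({
        row["load_dts"]
        for cc in contexts
        for row in cc
        if row["bk"] == business_key
    })
    timeline = []
    for i, start in enumerate(starts):
        is_last = i + 1 == len(starts)
        end = HIGH_DATE if is_last else starts[i + 1] - timedelta(seconds=1)
        timeline.append((business_key, start, end))
    return timeline

# Load timestamps taken from the [adres] and [valuation] example tables
adres = [
    {"bk": "Ut", "load_dts": datetime(2013, 6, 5, 22)},
    {"bk": "Ut", "load_dts": datetime(2015, 7, 4, 22)},
    {"bk": "Asd", "load_dts": datetime(2013, 6, 5, 22)},
]
valuation = [
    {"bk": "Ut", "load_dts": datetime(2014, 1, 1, 22)},
    {"bk": "Ut", "load_dts": datetime(2015, 1, 1, 22)},
    {"bk": "Ut", "load_dts": datetime(2016, 3, 1, 22)},
]

timeline_ut = bc_timeline("Ut", adres, valuation)
timeline_asd = bc_timeline("Asd", adres, valuation)
```

For 'Ut' this yields the five intervals of the BCT table, and for 'Asd' a single open interval.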

modelling



And Personally

HUB

SAT

LINK

This one

too(Link Satellite)

HUB

HUB

SAT

SAT

rwerschkull

nllinkedincominrogierwerschkull

problem

rwerschkull

nllinkedincominrogierwerschkull

a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

[Diagram: within the EDW, the "Time Variant & Non-Volatile" part could be a Data Lake; the "Subject Oriented & Integrated" part could be a virtualised Ensemble tier]

How does PS-3C work?

Focus of current ensemble EDWs

[Diagram: Staging Area, EDW, Information Marts]

Splitting the work

[Diagram: Persistent Staging Area / HSA (= Data Library), EDW, Information Marts]

PS-3C modelling: Persistent Staging - Business Concept, Concept Context, Connector

Persistent Staging - how

Identify the Primary or Unique Key of the source / event stream. Use source metadata for this.

Automate the building of a PS table 'around this key'. Take all columns.

Historize using an SCD-2 approach.
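
The SCD-2 step can be sketched as follows. This is a minimal in-memory Python illustration, not the author's loader: it assumes every delivery is a full snapshot keyed on the source key, registers a delete as a new record, and closes a version with the load timestamp of its successor (half-open intervals). The names PsRow and historize are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

OPEN_END = datetime(9999, 12, 31)  # "still valid" marker


@dataclass
class PsRow:
    key: Tuple                      # primary / unique key of the source
    attrs: Dict[str, object]        # all remaining source columns
    load_dts: datetime              # [Load Date Timestamp]
    load_end_dts: datetime = OPEN_END
    deleted: bool = False           # [Deleted Flag]


def historize(ps: List[PsRow], snapshot: Dict[Tuple, Dict[str, object]],
              load_dts: datetime) -> None:
    """Apply one full source snapshot to the persistent staging table."""
    current = {r.key: r for r in ps if r.load_end_dts == OPEN_END}

    for key, attrs in snapshot.items():
        cur = current.get(key)
        if cur is not None and not cur.deleted and cur.attrs == attrs:
            continue                        # unchanged: nothing to register
        if cur is not None:
            cur.load_end_dts = load_dts     # close the previous version
        ps.append(PsRow(key, dict(attrs), load_dts))

    for key, cur in current.items():
        if key not in snapshot and not cur.deleted:
            cur.load_end_dts = load_dts
            # register the delete as a new record
            ps.append(PsRow(key, dict(cur.attrs), load_dts, deleted=True))


ps: List[PsRow] = []
historize(ps, {("Ut",): {"postcode": "3500GJ"}}, datetime(2013, 6, 5, 22))
historize(ps, {("Ut",): {"postcode": "3511 CE"}}, datetime(2015, 6, 5, 22))
for row in ps:
    print(row)
```

On an append-only store the end-dating update would have to be replaced by deriving the end date at query time, which is the difficulty the metadata slides point at.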

Persistent Staging Metadata - 1

Entity level:

Unique key

Functional description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata - 2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… Requires updates.

Updates in Hive (isn't HDFS append-only?)

ACID is possible in Hive.

ACID makes updates possible by registering updates as 'new data'.

Reconciliation / compacting when idle or at user command.

Use ORC files.

PLUS changing the Hive configuration…

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type.

OR use a data storage type that supports schema evolution: AVRO (ORC in development).

Or HBase… It only has one data type (byte); schema is 'applied'.

Schema can be different for every row.

3C - how

Like Data Vault (2), BUT:

Always starts with conceptual data modelling.

NOT the primary location of data & history. Virtualised (only if performance allows). Should be deterministic.

No Link Satellites.

No surrogate or hash keys, only 'Concatenated Natural Business Keys'.

Explicit helper entities.

3C - Details: Business Concept (BC)

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A, Business Alias B and BC-Timeline X]

A UNIQUE, domain-specific point of integration…

…a business entity…

…within its own domain…

…that does not necessarily need to be enterprise-wide.

Why not 'enterprise wide'?

[Diagram: 'Company Customer' broken down into domain-specific concepts: Sales Customer (International Sales Customer, Local Sales Customer), Marketing Customer, … Customer]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to virtualise (if performance allows).

No hashing & no surrogate BUSINESS KEYS (not by default, at least).
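
A virtualised Business Concept can be pictured as a deterministic derivation over the persistent staging rows. The Python sketch below is only an illustration under assumptions: the column names train_type and series_nr, the source label NTR_q and the function business_concept are hypothetical, and a real implementation would more likely be a database view.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Sequence

SEP = "|"  # separator of the concatenated natural business key


def business_concept(ps_rows: Iterable[Dict],
                     key_columns: Sequence[str]) -> List[Dict]:
    """Derive the Business Concept rows: one row per distinct business key,
    with the source and load timestamp of its first appearance."""
    first_seen: Dict[str, Dict] = {}
    for row in ps_rows:
        bk = SEP.join(str(row[c]) for c in key_columns)
        known = first_seen.get(bk)
        if known is None or row["META_Load_dts"] < known["META_Load_dts"]:
            first_seen[bk] = {
                "Business Key": bk,
                "META_Source": row["META_Source"],
                "META_Load_dts": row["META_Load_dts"],
            }
    # Sorting makes the result independent of the input order (deterministic).
    return [first_seen[bk] for bk in sorted(first_seen)]


# Hypothetical persistent staging rows for the train series source.
ps_trainseries = [
    {"train_type": "IC", "series_nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 6, 22)},
    {"train_type": "IC", "series_nr": 855, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
    {"train_type": "Sp", "series_nr": 7455, "META_Source": "NTR_q",
     "META_Load_dts": datetime(2015, 6, 5, 22)},
]

bc_trainseries = business_concept(ps_trainseries, ["train_type", "series_nr"])
for row in bc_trainseries:
    print(row)
```

Because no hash or surrogate key is generated, the same input always yields the same keys, which is what makes virtualising this entity straightforward.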

3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way…

…like a Data Vault Satellite.

Every CC belongs to only one BC.

Separate entity per source system table / stream.

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation]         Source NSS2, table p
[description]       Source NSS1, table x
[adres]             Source NSS1, table y
[description]       Source NTR, table q
[ovchip_personal]   Source NSR, table r
[ovchip_on-usage]   Source NSR, table s
[personal_details]  Source NSR, table t
[adres_data]        Source NSR, table t

Example data: NSR-Station [adres], Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source.

But do make it virtual when 'streaming data' is necessary.

Because we have the PS layer, exposing all columns is not necessary.

Refactoring is easier…

3C - Details: Connector (C)

Relations between Concepts + Context, in a historical way…

…a merger of Data Vault Link + Link Satellite.

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time.

Connector: Driving key

Explicitly defining a driving key as metadata…

Gives business understanding.

Makes it possible for a Connector to correctly handle delta data deliveries…

…so that a change (on the driving key) is not registered as a new 'connection'.
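
How a driving key lets a Connector absorb delta deliveries can be sketched in Python. This is an illustration under assumptions rather than the author's loading logic: the function load_connector and the underscore column names are hypothetical, and the intermediate delivery of a check-in without a check-out is invented to show a change on an existing driving key.

```python
from datetime import datetime
from typing import Dict, List, Sequence

OPEN_END = datetime(9999, 12, 31)  # "still valid" marker


def load_connector(connector: List[Dict], delta: List[Dict],
                   driving_key: Sequence[str], load_dts: datetime) -> None:
    """Apply a delta delivery; a change for a known driving key becomes a new
    version of the same connection instead of a new connection."""
    open_rows = {
        tuple(row[c] for c in driving_key): row
        for row in connector
        if row["META_Load_end_dts"] == OPEN_END
    }
    for record in delta:
        dk = tuple(record[c] for c in driving_key)
        current = open_rows.get(dk)
        if current is not None:
            if all(current.get(k) == v for k, v in record.items()):
                continue                            # nothing changed
            current["META_Load_end_dts"] = load_dts  # close the old version
        version = dict(record, META_Load_dts=load_dts,
                       META_Load_end_dts=OPEN_END)
        connector.append(version)
        open_rows[dk] = version


DRIVING_KEY = ["BK_NS_Travelcard", "Checkin_timestamp"]

checkin = {
    "BK_NSR_Station_from": "Asd", "BK_NSR_Station_to": None,
    "BK_NS_Travelcard": "3528 0234 2073 1234",
    "Checkin_timestamp": datetime(2016, 4, 5, 8, 49, 32),
    "Checkout_timestamp": None,
}
checkout = dict(checkin, BK_NSR_Station_to="Ut",
                Checkout_timestamp=datetime(2016, 4, 5, 9, 40, 12))

travelmovement: List[Dict] = []
load_connector(travelmovement, [checkin], DRIVING_KEY, datetime(2016, 4, 5, 22))
load_connector(travelmovement, [checkout], DRIVING_KEY, datetime(2016, 4, 6, 22))

# Two versions, but still one connection (one distinct driving key value).
connections = {tuple(r[c] for c in DRIVING_KEY) for r in travelmovement}
print(len(travelmovement), "versions,", len(connections), "connection")
```

Without the driving key metadata, the changed destination station would make the second delivery look like a second travel movement.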

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the example Business Concepts (NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller) with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (between NSR-Station from / to, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No…

We add two mandatory HELPER entities.

3C - Details: Business Alias

To help switch from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model.

It's a LOOKUP table that translates the technical key to the Business Key.

Example

NSR-Station
[valuation]    Source NSS2, table p
[description]  Source NSS1, table x
[adres]        Source NSS1, table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1-to-(0..1) relation with a Business Concept.

More difficult to virtualise. The lookup table should be kept small.

Therefore DO NOT do the key lookup in the Concept Context entity.

Load / generate together with the BC.

Preferably 'in memory' somehow…
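
The role of a Business Alias as a small, in-memory lookup can be sketched in Python with the BA-NSS1 example values. The function rekey and the column names station_id, description and BK_NSR_Station are hypothetical names chosen for this illustration.

```python
from typing import Dict, Iterable, List, Tuple

# BA-NSS1 from the example data: NSS1 surrogate key -> business key.
BA_NSS1: Dict[int, str] = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def rekey(rows: Iterable[Dict], alias: Dict[int, str], surrogate_column: str,
          business_key_column: str) -> Tuple[List[Dict], List[Dict]]:
    """Replace the technical key of each source row by its business key.
    Rows without a known alias are returned separately, not dropped."""
    resolved: List[Dict] = []
    unresolved: List[Dict] = []
    for row in rows:
        business_key = alias.get(row[surrogate_column])
        if business_key is None:
            unresolved.append(row)
            continue
        rekeyed = {k: v for k, v in row.items() if k != surrogate_column}
        rekeyed[business_key_column] = business_key
        resolved.append(rekeyed)
    return resolved, unresolved


# Hypothetical NSS1 rows that only carry the source's surrogate key.
nss1_rows = [
    {"station_id": 123522, "description": "first station"},
    {"station_id": 999999, "description": "unknown station"},
]

resolved, unresolved = rekey(nss1_rows, BA_NSS1, "station_id", "BK_NSR_Station")
print(resolved)
print(unresolved)
```

Keeping the alias as a plain key-to-key mapping, separate from the historized Concept Context, is what keeps the lookup small enough to hold in memory.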

3C - Details: BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.

Like a Data Vault Point-in-time construct…

…but mandatory…

…and with a clearly defined and performant approach.

Example

NSR-Station [valuation], Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station [adres], Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (BC-Timeline) for NSR-Station
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
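
One way to derive such a combined timeline is to collect every load timestamp of every Concept Context per business key and cut the timeline at each of them. The Python sketch below is a simplified illustration, not the approach the slides call 'clearly defined and performant': it assumes gap-free context timelines, uses half-open intervals (each slice ends where the next begins, rather than one second earlier), and the function name bc_timeline is hypothetical. Dates are read as day-month-year.

```python
from datetime import datetime
from typing import Dict, List, Set, Tuple

OPEN_END = datetime(9999, 12, 31)  # "still valid" marker

# (business key, load_dts, load_end_dts) per Concept Context
ContextRow = Tuple[str, datetime, datetime]


def bc_timeline(*contexts: List[ContextRow]) -> List[Tuple[str, datetime, datetime]]:
    """Combine the validity timelines of several Concept Contexts into one
    timeline per business key, with a new slice at every change."""
    starts: Dict[str, Set[datetime]] = {}
    for context in contexts:
        for business_key, load_dts, _ in context:
            starts.setdefault(business_key, set()).add(load_dts)

    timeline = []
    for business_key in sorted(starts):
        points = sorted(starts[business_key])
        for i, start in enumerate(points):
            end = points[i + 1] if i + 1 < len(points) else OPEN_END
            timeline.append((business_key, start, end))
    return timeline


valuation = [
    ("Ut", datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22)),
    ("Ut", datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)),
    ("Ut", datetime(2016, 3, 1, 22), OPEN_END),
]
adres = [
    ("Ut", datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)),
    ("Ut", datetime(2015, 7, 4, 22), OPEN_END),
    ("Asd", datetime(2013, 6, 5, 22), OPEN_END),
]

for row in bc_timeline(valuation, adres):
    print(row)
```

With the example data this gives five slices for Ut and one for Asd, the same slicing as the BCT table; each slice can then be joined back to the one context version that was valid during it.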

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO hashed business keys or surrogate keys

Only concatenated ones

Photo: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity.

4) Explicit HELPER entities

Business Alias

Business Concept Timeline (BC-Timeline)

+ explicitly defined driving key(s)

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?


a) HASHING OF Business keys

Rolling

Stock Nr Datetime Sensor Id Value Concatenated Business Key

Key

Len MD5 Hash

Key

Len

8739

2015-01-22

013427 72A1_FINV 123 8739|2015-01-22 013427|72A1_FINV 34 86ae4c6b0e2e2d5a13a0d11440529aeb 32

8739

2015-01-22

013427 72A1_SLDET 100 8739|2015-01-22 013427|72A1_SLDET 35 51ce9bc292eef407bd7c91a52eebcf2e 32

8739

2015-01-22

013432 72A1_FINV 126 8739|2015-01-22 013432|72A1_FINV 34 9482a41c1fecc4c64b8c437af6cc85e8 32

8739

2015-01-22

013432

13A8_MW_UB

AT_VT 5 8739|2015-01-22 013432|13A8_MW_UBAT_VT 42 e4160914ee55ce0b93f87b23366a0ce3 32

8674

2015-01-22

013426 72A1_FINV 6 8674|2015-01-22 013426|72A1_FINV 34 fcb3e7c8c91e44ce396d908a4948ca65 32

8674

2015-01-22

013426

16A1_HSVER

OND 7 8674|2015-01-22 013426|16A1_HSVEROND 38 fe9098c8c291ad56af5c8afae5169196 32

Loses statistical Informationregarding the data distributionQuery optimizers do not like thishellip

Column family Document and Key-value databases need a

good (natural) sharding key for (partial) key-

lookups

Hashinghelliphellip

httpwwwebaytechblogcom20120814cassandra-data-modeling-best-practices-part-2rwerschkull

nllinkedincominrogierwerschkull

Surrogates keys require

centralized coordination

hellipand thus can impact the overall systemrsquos scalability and availability

A lot of MPP NoSQL databases simply do not have themhellip

B) Surrogate BuSINESS keys

rwerschkull

nllinkedincominrogierwerschkull

Then Some Inspiration

httproelantvoscomblogp=1119

rwerschkull

nllinkedincominrogierwerschkull

lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

3C - how

Like Data Vault (2), BUT:

- Always starts with conceptual data modelling
- NOT the primary location of data & history
- Virtualised (only if performance allows)
- Should be deterministic
- No Link Satellites
- No surrogate or hash keys, only 'Concatenated Natural Business Keys'
- Explicit helper entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; Connector; Business Alias A and Business Alias B; BC-Timeline X]

Business Concept (BC)

- A UNIQUE, domain-specific point of integration
- …a business entity
- …within its own domain
- …does not necessarily need to be enterprise wide

Why not 'enterprise wide'?

[Diagram labels: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; Customer …]

Business Concept Metadata

Entity level:
- [Description]
- [Owner / Responsible]

Column level:
- [Load Date Timestamp]
- [Source system] on table / file level (lowest possible)

Example-Data

NSR-Station
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

- The easiest entity to virtualise (if performance allows)
- No hashing & no surrogate BUSINESS KEYS (not by default at least)
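A concatenated natural business key such as `IC|855` could be built along these lines. This is only a sketch: the `|` separator is taken from the example data, while `business_key`, the trimming of whitespace and the refusal of parts that already contain the separator are assumptions of this illustration.

```python
SEPARATOR = "|"


def business_key(*parts):
    """Concatenate natural key parts into one readable business key."""
    cleaned = [str(p).strip() for p in parts]
    for part in cleaned:
        if SEPARATOR in part:
            # Otherwise two different keys could concatenate to the same string.
            raise ValueError(f"key part contains separator: {part!r}")
    return SEPARATOR.join(cleaned)
```

Because the function is deterministic and needs no lookup, the same key can be derived independently wherever the natural key parts are available.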

3C - Details

Concept Context (CC)

- Contains context about a Concept, in a historical way
- …like a Data Vault Satellite
- Every CC belongs to only one BC
- Separate entity per source system table / stream

Concept Context Metadata

Entity level:
- [Description]
- [Owner / Responsible]

Column level:
- [Load Date Timestamp]
- [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
- [Load End Date Timestamp]
- [Deleted Flag] OR register a delete as a new record

Example

NSR-Station
- [valuation] Source NSS2 table p
- [description] Source NSS1 table x
- [adres] Source NSS1 table y

NS-Trainseries
- [description] Source NTR table q

NS-Travelcard
- [ovchip_personal] Source NSR table r
- [ovchip_on-usage] Source NSR table s

NS-Traveller
- [personal_details] Source NSR table t
- [adres_data] Source NSR table t

Example-data

NSR-Station [adres], Source NSS1 table y
BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0
Asa | … | … | … | … | … | …

CC important notes

- More difficult to virtualise: depends on the semantic gap with the source
- But do make it virtual when 'streaming data' is necessary
- Because we have the PS layer, exposing all columns is not necessary
- Refactoring is easier…
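Since not every PS column has to be exposed, a virtual Concept Context can be thought of as a projection over the PS history in which versions that only differ in hidden columns collapse into one. The sketch below shows that idea in Python; `project_context` and the dictionary row layout are assumptions for illustration, and deletes are ignored.

```python
def project_context(ps_rows, key_col, exposed_cols):
    """Derive Concept Context versions from PS rows, exposing only some columns."""
    result = []
    last_seen = {}
    for row in sorted(ps_rows, key=lambda r: (r[key_col], r["load_dts"])):
        values = tuple(row[c] for c in exposed_cols)
        if last_seen.get(row[key_col]) == values:
            continue  # the change was in a column that is not exposed
        last_seen[row[key_col]] = values
        result.append({key_col: row[key_col],
                       **dict(zip(exposed_cols, values)),
                       "load_dts": row["load_dts"]})
    return result
```

Because the result is computed from the PS layer each time, exposing an extra column later only changes `exposed_cols`; no history has to be reloaded.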

3C - Details

Connector (C)

- Relations between Concepts + context, in a historical way
- …a merger of Data Vault Link + Link Satellite
- Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
- Gives business understanding
- Makes it possible for a Connector to correctly handle delta data deliveries… so that a change (on the driving key) is not registered as a new 'connection'
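One way to read the driving-key rule is shown in the Python sketch below: a delivered row that matches an open Connector row on the driving key replaces that row (the old version is end-dated) instead of appearing next to it as a second open connection. `load_connector` and the dictionary row layout are hypothetical, and the slides do not prescribe this exact algorithm.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)


def load_connector(rows, delivery, driving_key, load_dts):
    """Apply a delta delivery to a Connector, versioning per driving key."""
    open_rows = {tuple(r[c] for c in driving_key): r
                 for r in rows if r["load_end_dts"] == HIGH_DATE}
    for rec in delivery:
        dk = tuple(rec[c] for c in driving_key)
        old = open_rows.get(dk)
        if old is not None:
            if all(old.get(c) == v for c, v in rec.items()):
                continue                      # nothing changed
            old["load_end_dts"] = load_dts    # same connection, new version
        new = {**rec, "load_dts": load_dts, "load_end_dts": HIGH_DATE}
        rows.append(new)
        open_rows[dk] = new
    return rows
```

With the driving key NS-Travelcard + Checkin timestamp, a later delivery that adds the destination station to an existing travel movement closes the first version rather than creating a second movement.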

Connector Metadata

Entity level:
- [Description]
- [Owner / Responsible]

Column level:
- [Load Date Timestamp]
- [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
- [Load End Date Timestamp]
- [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to)]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

NSR-Travelmovement (Connector between NSR-Station, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

3C - Details

Business Alias

- Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
- It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station
- [valuation] Source NSS2 table p
- [description] Source NSS1 table x
- [adres] Source NSS1 table y
- BA-NSS1: key lookup for NSS1 source tables
- BA-NSS2: key lookup for NSS2 source tables

Example-data

NSR-Station
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

BA-NSS1 (key lookup for NSS1 source tables)
Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

BA important details

- Has a 1-to-(0,1) relation with a Business Concept
- More difficult to virtualise
- Lookup table should be kept small
- Therefore DO NOT do the key lookup in the Concept Context entity
- Load / generate together with the BC
- Preferably 'in memory' somehow…
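A Business Alias is essentially a small dictionary from a source's surrogate key to the Business Key. The Python sketch below uses the BA-NSS1 values from the example data; `build_alias`, `to_business_key` and the column name `station_id` are hypothetical, and where exactly the lookup runs in a load process is left open by the slides.

```python
def build_alias(pairs):
    """Build the lookup: source surrogate key -> business key."""
    alias = {}
    for surrogate, bk in pairs:
        if alias.setdefault(surrogate, bk) != bk:
            raise ValueError(f"surrogate {surrogate!r} maps to two business keys")
    return alias


def to_business_key(row, alias, surrogate_col, bk_col):
    """Replace the technical key of a source row by its business key."""
    translated = {k: v for k, v in row.items() if k != surrogate_col}
    translated[bk_col] = alias[row[surrogate_col]]
    return translated


ba_nss1 = build_alias([(123522, "Ut"), (666323, "Asd"), (222443, "Asa")])
```

Keeping the structure this small is what makes holding it 'in memory' realistic.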

3C - Details

BC-Timeline

- Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
- Like a Data Vault Point-in-time construct
- But mandatory
- And with a clearly defined and performant approach

Example

NSR-Station
- [valuation] Source NSS2 table p
- [adres] Source NSS1 table y

[valuation]
BK_NSR-Station | WOZ value | Value rating agency X | META_Laad_dts | META_Laad_eind_dts
Ut | 20 mln | 18 mln | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 mln | 18 mln | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 mln | 23 mln | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

[adres]
BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | … | … | … | … | … | …

BCT
BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
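The combined timeline in the BCT table can be reproduced by collecting every load timestamp of every Concept Context per business key and cutting the time axis at each of them. The Python sketch below does exactly that; `bc_timeline` is a hypothetical name, and intervals are half-open (each end equals the next start) rather than ending at 21:59:59 as in the example.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)


def bc_timeline(*contexts):
    """Merge the load timestamps of several Concept Contexts per business key.

    Each context is a list of (business_key, load_dts) pairs.
    Returns {business_key: [(start, end), ...]} ordered by start.
    """
    starts = {}
    for context in contexts:
        for bk, load_dts in context:
            starts.setdefault(bk, set()).add(load_dts)
    timeline = {}
    for bk, points in starts.items():
        ordered = sorted(points)
        ends = ordered[1:] + [HIGH_DATE]
        timeline[bk] = list(zip(ordered, ends))
    return timeline


adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]
timeline = bc_timeline(adres, valuation)
```

For `Ut` this yields five consecutive intervals and for `Asd` a single open-ended one, matching the row counts of the BCT example.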

What makes PS-3C a different ensemble?

1) Explicitly splitting the work

- Data + History
- Subjects + Integration

2) NO HASHED BUSINESS KEYS

or surrogate keys

Only concatenated ones

http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

4) Explicit HELPER entities

- Business Alias
- Business Concept Timeline (BC-Timeline)
- + explicitly define driving key(s)

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spilgames

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

@rwerschkull

Hashing……

- Loses statistical information regarding the data distribution. Query optimizers do not like this…
- Column family, Document and Key-value databases need a good (natural) sharding key for (partial) key-lookups

http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2
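The point about partial key-lookups can be made concrete with a small Python sketch. It compares a prefix scan over sorted natural keys with the same scan over their hashes; MD5 is used purely as an example hash function, and `md5_key` and `prefix_scan` are hypothetical helper names.

```python
import hashlib
from bisect import bisect_left


def md5_key(natural_key):
    """Hash a natural key to a fixed-length hexadecimal string."""
    return hashlib.md5(natural_key.encode("utf-8")).hexdigest()


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix, using the sort order."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


natural = sorted(["IC|855", "IC|8852", "Sp|7455", "St|16050"])
hashed = sorted(md5_key(k) for k in natural)
```

`prefix_scan(natural, "IC|")` finds both intercity series in one contiguous range, whereas the hashed keys of those two series are scattered and no prefix of the natural key can be used to locate them.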

B) Surrogate BUSINESS keys

- Surrogate keys require centralized coordination
- …and thus can impact the overall system's scalability and availability
- A lot of MPP / NoSQL databases simply do not have them…

Then: some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the Persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake but in a better defined form, as data deltas and event date/times are taken into account.'

So

[Diagram: the EDW (Subject Oriented, Integrated, Time Variant, Non-Volatile) split into two parts: 'Time Variant & Non Volatile' and 'Subject Oriented & Integrated']

[Diagram: within the EDW, the 'Time Variant & Non Volatile' part could be a Data LAKE; the 'Subject Oriented & Integrated' part could be a virtualised ensemble tier]



How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

3C - Details (diagram)

• Business Concept X, with Concept Contexts X-A and X-B
• Business Concept Y, with Concept Contexts Y-A, Y-B and Y-C
• Connector
• Helper entities: Business Alias A, Business Alias B, BC-Timeline X

Connector (C)

• Relations between Concepts + Context, in a historical way
• …a merger of Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…
• Gives business understanding
• Makes it possible for a Connector to handle delta data deliveries correctly, so that a change (on the driving key) is not registered as a new 'connection'
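A minimal Python sketch of what the driving key buys you when a delta delivery arrives. The function name, the dict-based rows and the column names are illustrative assumptions, not something the slides prescribe; for brevity the previous version is closed at the new load timestamp rather than one second before it.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 31)
META = ("load_dts", "load_end_dts")

def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Apply one delivered row to a Connector (a list of dicts, hypothetical layout).

    driving_key is the tuple of column names that identifies one connection
    at one point in time.
    """
    key = tuple(delta_row[c] for c in driving_key)
    for row in connector_rows:
        is_open = row["load_end_dts"] == END_OF_TIME
        if is_open and tuple(row[c] for c in driving_key) == key:
            payload = {c: v for c, v in row.items() if c not in META}
            if payload == delta_row:
                return connector_rows       # unchanged: nothing to register
            row["load_end_dts"] = load_dts  # close the previous version
            break
    # New version of an existing connection, or a genuinely new connection.
    connector_rows.append(
        {**delta_row, "load_dts": load_dts, "load_end_dts": END_OF_TIME}
    )
    return connector_rows
```

With the driving key set to (travelcard, check-in timestamp), a corrected destination station closes the old row and opens a new version of the same connection; without it, the corrected row would look like a second, unrelated journey.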

Connector Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

Same Business Concepts and Concept Contexts as in the Concept Context example, plus a Connector:

Connector NSR-Travelmovement
• Connects NSR-Station (from), NSR-Station (to), NS-Trainseries and NS-Travelcard
• Carries the Checkin timestamp
• Driving Key: NS-Travelcard + Checkin timestamp

Example-Data: Connector NSR-Travelmovement

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities


Business Alias

• Helps switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station, with Concept Contexts:
• [valuation]: source NSS2, table p
• [description]: source NSS1, table x
• [adres]: source NSS1, table y

Business Aliases:
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

• Has a 1 on (0,1) relation with a Business Concept
• More difficult to virtualise: the lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
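As an illustration of the lookup idea, here is a small Python sketch that keeps a Business Alias in memory as a plain dictionary. The surrogate-key values come from the BA-NSS1 example data; the function name and the `station_id` column are hypothetical.

```python
# Business Alias for source NSS1: technical (surrogate) key -> Business Key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}

def to_business_key(source_rows, alias, key_column="station_id"):
    """Replace the source's surrogate key by the Business Key.

    Rows whose surrogate key is unknown to the alias are returned separately,
    so they can be handled explicitly instead of being silently dropped.
    """
    translated, unresolved = [], []
    for row in source_rows:
        surrogate = row[key_column]
        if surrogate in alias:
            rest = {c: v for c, v in row.items() if c != key_column}
            translated.append({"business_key": alias[surrogate], **rest})
        else:
            unresolved.append(row)
    return translated, unresolved
```

Keeping the alias this small (one key pair per Business Concept instance, no context columns) is what makes an in-memory lookup realistic.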


BC-Timeline

• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station, Concept Context [valuation] (source NSS2, table p)

BK_NSR-Station  WOZ value  Value rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 mln     18 mln                 1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 mln     18 mln                 1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 mln     23 mln                 1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station, Concept Context [adres] (source NSS1, table y)

BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

BCT (BC-Timeline) for NSR-Station

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
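The combined timeline in this example can be reproduced with a few lines of Python. This is only a sketch of the merging idea for a single business key; the function name is an assumption, and the end-of-time sentinel and the 'one second before the next start' convention are read off the example data rather than stated as rules on the slides.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)

def build_timeline(*context_start_lists):
    """Merge the version start timestamps of several Concept Contexts
    (all belonging to ONE business key) into one validity timeline.

    Returns (start, end) pairs: each end is one second before the next
    start, and the last end is END_OF_TIME.
    """
    starts = sorted({s for lst in context_start_lists for s in lst})
    ends = [nxt - timedelta(seconds=1) for nxt in starts[1:]] + [END_OF_TIME]
    return list(zip(starts, ends))

# Load timestamps for business key 'Ut', taken from the example tables.
adres_starts = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
valuation_starts = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22),
                    datetime(2016, 3, 1, 22)]
timeline = build_timeline(adres_starts, valuation_starts)
```

For 'Ut' this yields five intervals, starting 5-6-2013, 1-1-2014, 1-1-2015, 4-7-2015 and 1-3-2016, which matches the BCT rows of the example.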

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO hashed Business Keys or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context: together in one entity

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define Driving key(s)

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

• 'Head of BI', Spil Games
• Certified Data Vault modeler since 2009
• Contact details: nl.linkedin.com/in/rogierwerschkull, rogierwerschkull@gmail.com, @rwerschkull

Then: some inspiration

http://roelantvos.com/blog/?p=1119

'In my opinion the answer lies in the adoption of the persistent (Historical) Staging Area concept (also known as Historical Staging or the History Area). This basically adopts the fundamentals of a Data Warehouse.'

'The Historical Staging Area effectively 'acts' as Data Lake, but in a better defined form, as data deltas and event date/times are taken into account.'

So the EDW can be split in two

EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile

• Time Variant & Non Volatile: could be a Data LAKE
• Subject Oriented & Integrated: Virtualised Ensemble Tier

How does PS-3C work?

Focus of current ensemble EDWs

Staging Area → EDW → Information Marts

Splitting the work

Persistent Staging Area / HSA (= Data Library) → EDW → Information Marts

PS-3C: Persistent Staging - Concept Context, Connector, Business Concept modelling

Persistent Staging - how

• Identify the source / event stream Primary or Unique Key; use source metadata for this
• Automate the building of a PS 'around this key'; take all columns
• Historize using an SCD-2 approach
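The SCD-2 historization step could look roughly like the Python sketch below. It assumes every delivery is a full snapshot of the source table, models the persistent staging table as a list of dicts, and registers a delete as a new record with a deleted flag; all names are illustrative, and versions are closed at the new load timestamp to keep the example short.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 31)
META = ("load_dts", "load_end_dts", "deleted")

def historize(ps_rows, delivery, key_columns, load_dts):
    """Append a full-snapshot delivery to a persistent staging table, SCD-2 style."""
    def key_of(row):
        return tuple(row[c] for c in key_columns)

    current = {key_of(r): r for r in ps_rows if r["load_end_dts"] == END_OF_TIME}
    delivered = {key_of(r): r for r in delivery}

    for k, new in delivered.items():
        old = current.get(k)
        if old is not None:
            payload = {c: v for c, v in old.items() if c not in META}
            if payload == new and not old["deleted"]:
                continue                    # unchanged: keep the open version
            old["load_end_dts"] = load_dts  # close the previous version
        ps_rows.append({**new, "load_dts": load_dts,
                        "load_end_dts": END_OF_TIME, "deleted": False})

    for k, old in current.items():
        if k not in delivered and not old["deleted"]:
            old["load_end_dts"] = load_dts
            payload = {c: v for c, v in old.items() if c not in META}
            # Register the delete as a new record instead of removing history.
            ps_rows.append({**payload, "load_dts": load_dts,
                            "load_end_dts": END_OF_TIME, "deleted": True})
    return ps_rows
```

Because the routine only needs the key columns and takes every other column as-is, it can be generated per source table from source metadata, which is the automation the slide asks for.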

Persistent Staging Metadata-1

Entity level:
• Unique key
• Functional description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2

Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record
• [Source system], on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… it requires updates

UPDATES IN HIVE (isn't HDFS append only?)

• ACID is possible in HIVE
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting: when idle or at user command
• Use ORC files
• PLUS changing the HIVE configuration…

What about SEMI-STRUCTURED data?

• Hive: put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
• Or HBASE… it only has one data type (byte); the schema is 'applied', so the schema can be different for every row

3C - how

Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of Data & History
• Virtualised (only if performance allows); should be deterministic
• No Link Satellites
• No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
• Explicit helper entities


Business Concept (BC)

• A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

Within one Company, 'Customer' exists in several domain-specific forms:
• Sales Customer
• International Sales Customer
• Local Sales Customer
• Marketing Customer
• Customer …

Business Concept Metadata

Entity level:
• [Description]
• [Owner / Responsible]

Column level:
• [Load Date Timestamp]
• [Source system], on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

• Easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default at least)
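A concatenated natural Business Key such as 'IC|855' in the example data needs nothing more than a deterministic join of its parts. The helper below is a hypothetical Python sketch; the pipe separator follows the example data, while the trimming and the validation rules are assumptions.

```python
def business_key(*parts, sep="|"):
    """Build a concatenated natural Business Key, e.g. 'IC|855'.

    No hashing and no surrogate: the key stays readable and can be
    recomputed from the source columns at any time.
    """
    cleaned = [str(p).strip() for p in parts]
    if any(p == "" for p in cleaned):
        raise ValueError("empty key part")
    if any(sep in p for p in cleaned):
        # A separator inside a part would make two different keys collide.
        raise ValueError(f"key part contains the separator {sep!r}")
    return sep.join(cleaned)
```

Rejecting parts that contain the separator is the one safeguard a concatenated key really needs: without it, ('a|b', 'c') and ('a', 'b|c') would produce the same key.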


lsquoIn my opinion the answer lies in the adoption of the

persistent (Historical) Staging Area concept

(also known as Historical Staging or the History Area)

This basically adopts the fundamentals of a Data Warehousersquo

lsquoThe Historical Staging Area effectively lsquoactsrsquo as

Data Lake but in a better defined form as data deltas and

event datetimes are taken into accountrsquo

So

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

Subject Oriented Integrated

Time Variant Non-Volatile

EDW

rwerschkull

nllinkedincominrogierwerschkull

Could be a

Data LAKE

VirtualisedEnsemble

Tier

EDW

Time Variant

amp

Non Volatile

Subject Oriented

amp

Integrated

EDW

rwerschkull

nllinkedincominrogierwerschkull

How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

[Section divider slide: modelling]

What makes PS-3C a different ensemble?

[Diagram: 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector between them; helper entities Business Alias A, Business Alias B and BC-Timeline X]

1) Explicitly splitting the work

Data + history
Subjects + integration

2) No hashed business keys or surrogate keys

Only concatenated ones
Image: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain

4) Explicit helper entities

Business Alias
Business Concept Timeline (BC-Timeline)
+ explicitly defined driving key(s)
Photo credit: Public Domain

Hope to avoid this…
http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?


So…

EDW = Subject Oriented, Integrated, Time Variant, Non-Volatile
Split into two halves: "Time Variant & Non-Volatile" and "Subject Oriented & Integrated"

Could be a…

Time Variant & Non-Volatile: could be a data lake
Subject Oriented & Integrated: could be a virtualised ensemble tier
Together they form the EDW

How does PS-3C work?

Photo credit: Public Domain

Focus of current ensemble EDWs

Staging Area, EDW, Information Marts

Splitting the work

Persistent Staging Area / HSA = Data Library
EDW
Information Marts

PS-3C: Persistent Staging - Concept Context, Connector, Business Concept modelling

Persistent Staging - how

Identify the source / event stream primary or unique key; use source metadata for this
Automate the building of a PS 'around this key'; take all columns
Historize using an SCD-2 approach
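The three steps above can be sketched as a small SCD-2 loader. This is a minimal illustration under stated assumptions, not the author's tooling: the names `PsRow` and `historize` are hypothetical, each delivery is assumed to be a full snapshot keyed by the source unique key, deletes are registered as new records, and the end timestamp of a closed version is set equal to the load timestamp of its successor (the deck's example data instead ends one second earlier).

```python
from dataclasses import dataclass
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # open-ended "end of time" marker


@dataclass
class PsRow:
    key: tuple                     # source primary / unique key
    attrs: dict                    # all remaining source columns
    load_dts: datetime
    load_end_dts: datetime = HIGH_DATE
    deleted: bool = False


def historize(ps, snapshot, load_dts):
    """Apply one full source snapshot {key: attrs} to the PS table in place."""
    current = {r.key: r for r in ps if r.load_end_dts == HIGH_DATE}

    for key, attrs in snapshot.items():
        row = current.get(key)
        if row is not None and not row.deleted and row.attrs == attrs:
            continue                       # unchanged: nothing to register
        if row is not None:
            row.load_end_dts = load_dts    # close the previous version
        ps.append(PsRow(key, dict(attrs), load_dts))

    # Keys that disappeared from the source: register the delete as a new record.
    for key, row in current.items():
        if key not in snapshot and not row.deleted:
            row.load_end_dts = load_dts
            ps.append(PsRow(key, dict(row.attrs), load_dts, deleted=True))
```

Because the loader only needs the key and "all other columns", it can be generated from source metadata without modelling decisions, which is what makes this layer easy to automate.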

Persistent Staging Metadata - 1

Entity level:
Unique key
Functional description
Delivering party
Owner / responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operations…)
…

Persistent Staging Metadata - 2

Column level:
[Load Date Timestamp]
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record
[Source system] on table / file level (lowest possible)

Load End Date Timestamp: possible but difficult… it requires updates

Updates in Hive (isn't HDFS append-only?)

ACID is possible in Hive
ACID makes updates possible by registering updates as 'new data'
Reconciliation / compacting happens when idle or at user command
Use ORC files
PLUS changing the Hive configuration…

What about semi-structured data?

Hive: put semi-structured data (= variable columns) in a MAP data type
OR use a data storage type that supports schema evolution: Avro (ORC in development)
Or HBase… it only has one data type (byte); the schema is 'applied'
The schema can be different for every row
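The first option, keeping variable columns in a map next to a few fixed columns, can be illustrated independently of Hive. The sketch below is plain Python and only mimics the idea of a string-to-string map column; the column names in `FIXED_COLUMNS` and the helper `to_row` are hypothetical.

```python
# Columns every record is assumed to have (hypothetical names).
FIXED_COLUMNS = ("event_id", "event_type")


def to_row(event):
    """Split an incoming record into fixed columns plus one map of extras."""
    row = {col: event.get(col) for col in FIXED_COLUMNS}
    # Everything else goes into a single string-to-string map, so new
    # attributes can appear without changing the table structure.
    row["attributes"] = {
        key: str(value)
        for key, value in event.items()
        if key not in FIXED_COLUMNS
    }
    return row


print(to_row({"event_id": 1, "event_type": "checkin", "station": "Ut"}))
print(to_row({"event_id": 2, "event_type": "checkout", "station": "Asd", "gate": 7}))
```

The trade-off is that the extra attributes lose their types and have to be interpreted when the data is read.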

[Section divider slide: modelling]

3C - how

Like Data Vault (2), BUT

Always starts with conceptual data modelling
NOT the primary location of data & history; virtualised (only if performance allows); should be deterministic
No Link Satellites
No surrogate or hash keys, only 'concatenated natural business keys'
Explicit helper entities

3C - Details

[Diagram: 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector between them; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Concept (BC)

A UNIQUE, domain-specific point of integration
…a business entity
…within its own domain
…does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?

[Diagram: several "Customer" concepts side by side: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, … Customer]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example data

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NSR-Station
Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC: important notes

The easiest entity to virtualise (if performance allows)
No hashing & no surrogate BUSINESS KEYS (not by default, at least)
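A concatenated natural business key, such as the train series key IC|855 in the example data, can be derived directly from source columns without a lookup table. The helper below is a hedged sketch: the function name `business_key`, the trimming of whitespace, and the rejection of parts that contain the separator are assumptions, not rules stated in the deck.

```python
def business_key(*parts, sep="|"):
    """Build a concatenated business key from its natural key parts."""
    cleaned = []
    for part in parts:
        text = str(part).strip()
        if not text:
            raise ValueError("business key parts must not be empty")
        if sep in text:
            # A separator inside a part would make two different keys collide.
            raise ValueError(f"key part {text!r} contains the separator {sep!r}")
        cleaned.append(text)
    return sep.join(cleaned)


print(business_key("IC", 855))
print(business_key("St", 16050))
```

Because the key is a pure function of the source values, every loader computes the same key independently, which is what keeps the entity deterministic and easy to virtualise.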

3C - Details

[Diagram: 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector between them; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Concept Context (CC)

Contains context about a Concept, in a historical way
…like a Data Vault Satellite
Every CC belongs to only one BC
Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation], source: NSS2 table p
[description], source: NSS1 table x
[adres], source: NSS1 table y
[description], source: NTR table q
[ovchip_personal], source: NSR table r
[ovchip_on-usage], source: NSR table s
[personal_details], source: NSR table t
[adres_data], source: NSR table t

Example data

NSR-Station, Concept Context [adres], source: NSS1 table y

BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0
Asa | … | … | … | … | … | … | …

CC: important notes

More difficult to virtualise; it depends on the semantic gap with the source
But do make it virtual when 'streaming data' is necessary
Because we have the PS layer: exposing all columns is not necessary, and refactoring is easier…

3C - Details

[Diagram: 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector between them; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Connector (C)

Relations between Concepts + context, in a historical way
…a merger of Data Vault Link + Link Satellite
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector driving key

Explicitly defining a driving key as metadata…
Gives business understanding
Makes it possible for the Connector to correctly handle delta data deliveries…
so that a change (on the driving key) is not registered as a new 'connection'
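The effect of a driving key on delta loads can be sketched as follows. This is an illustration under assumptions, not the author's loader: the function name `load_connector`, the dictionary-based rows and the column names `travelcard`, `checkin`, `station_from` and `station_to` are hypothetical, and the end timestamp of a closed version is set equal to the new load timestamp.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # open-ended "end of time" marker


def load_connector(rows, delta, driving_key, load_dts):
    """Apply a delta delivery to a Connector, versioning per driving key.

    rows  : existing Connector rows (dicts incl. META_Load_dts / META_Load_end_dts)
    delta : newly delivered records (dicts without META columns)
    """
    def dk(record):
        return tuple(record[col] for col in driving_key)

    current = {dk(r): r for r in rows if r["META_Load_end_dts"] == HIGH_DATE}

    for record in delta:
        old = current.get(dk(record))
        if old is not None:
            if all(old.get(col) == value for col, value in record.items()):
                continue                        # nothing changed
            old["META_Load_end_dts"] = load_dts  # close the old version
        new = dict(record, META_Load_dts=load_dts, META_Load_end_dts=HIGH_DATE)
        rows.append(new)
        current[dk(record)] = new
    return rows


rows = []
key = ("travelcard", "checkin")
load_connector(rows, [{"travelcard": "3528 0234 2073 1234",
                       "checkin": "5-4-2016 8:49:32",
                       "station_from": "Asd", "station_to": None}],
               key, datetime(2016, 4, 5, 22))
# A later delivery completes the same travel movement with its destination.
load_connector(rows, [{"travelcard": "3528 0234 2073 1234",
                       "checkin": "5-4-2016 8:49:32",
                       "station_from": "Asd", "station_to": "Ut"}],
               key, datetime(2016, 4, 6, 22))
```

After the second delivery there is still only one open row for this travelcard and check-in: the change is stored as a new version of the same connection, not as a second connection.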

Connector Metadata

Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
[Load End Date Timestamp]
[Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation], source: NSS2 table p
[description], source: NSS1 table x
[adres], source: NSS1 table y
[description], source: NTR table q
[ovchip_personal], source: NSR table r
[ovchip_on-usage], source: NSR table s
[personal_details], source: NSR table t
[adres_data], source: NSR table t

Connector: NSR-Travelmovement (check-in timestamp; from station, to station)
Driving key: NS-Travelcard + check-in timestamp

Example data

Connector NSR-Travelmovement, relating NSR-Station (from, to), NS-Trainseries and NS-Travelcard

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

[Section divider slide: modelling]

3C - Details

[Diagram: 3C overview. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector between them; helper entities Business Alias A, Business Alias B and BC-Timeline X]

Business Alias

Helps switching from sources that are tied together by technical (surrogate) keys…
to a Business Key based model
It's a LOOKUP table that translates the technical key to the Business Key
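The lookup role of a Business Alias can be sketched with an in-memory dictionary. This is a minimal illustration, not the author's implementation: the helper names `to_business_key` and `rekey` and the column name `station_sk` are hypothetical; the surrogate key values are the ones shown in the deck's BA-NSS1 example.

```python
# Business Alias for source system NSS1: technical (surrogate) key -> Business Key.
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(alias, surrogate_key):
    """Translate one technical key; fail loudly if the alias has no entry."""
    try:
        return alias[surrogate_key]
    except KeyError:
        raise KeyError(
            f"no Business Key registered for surrogate key {surrogate_key!r}"
        ) from None


def rekey(source_rows, alias, surrogate_column, bk_column):
    """Replace the surrogate key column of each source row by its Business Key."""
    result = []
    for row in source_rows:
        new_row = {k: v for k, v in row.items() if k != surrogate_column}
        new_row[bk_column] = to_business_key(alias, row[surrogate_column])
        result.append(new_row)
    return result


source = [{"station_sk": 123522, "postcode": "3511 CE"},
          {"station_sk": 666323, "postcode": "1012 AB"}]
print(rekey(source, ba_nss1, "station_sk", "BK_NSR-Station"))
```

Keeping the alias as a small, separate key-to-key mapping is what allows it to be held in memory while the much larger context tables are loaded.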

Example

NSR-Station
[valuation], source: NSS2 table p
[description], source: NSS1 table x
[adres], source: NSS1 table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables

Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

NSR-Station

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …



How

Does PS-3C Work

rwerschkull

nllinkedincominrogierwerschkullPhoto credit Public Domain

StagingArea

EDWInformation

Marts

Focus of Current ensemble EDWrsquos

rwerschkull

nllinkedincominrogierwerschkull

Persistent StagingArea HSA =

Data LibraryEDW

Information Marts

Splitting the work

rwerschkull

nllinkedincominrogierwerschkull

Persitent

Staging

-

Concept

Context

Connector

Business

Concept

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts, each with its source:

[valuation] Source NSS2 table p

[description] Source NSS1 table x

[adres] Source NSS1 table y

[description] Source NTR table q

[ovchip_personal] Source NSR table r

[ovchip_on-usage] Source NSR table s

[personal_details] Source NSR table t

[adres_data] Source NSR table t

Example-data: NSR-Station, [adres] Source NSS1 table y

BK_NSR-Station   Postadres_postcode   GPS                 …   META_source   META_Load_dts       META_Load_end_dts     META_Deleted_Ind
Ut               3500GJ               52.08954, 5.11064   …   NSS1_y        5-6-2013 22:00:00   5-6-2015 21:59:59     0
Ut               3511 CE              52.37269, 4.89299   …   NSS1_y        5-6-2015 22:00:00   31-12-9999 00:00:00   0
Asd              1012 AB              52.37269, 4.89299   …   NSS1_y        5-6-2013 22:00:00   31-12-9999 00:00:00   0
Asa              …                    …                   …   …             …                   …
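
A minimal sketch of how the load and load-end timestamps in such a Concept Context could be used to find the row that was valid at a given moment. The dictionary layout and the name valid_at are assumptions for illustration; the rows are taken from the example data.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # open-ended marker, as in the example data

# Rows of the [adres] Concept Context of NSR-Station (subset of columns).
ADRES_ROWS = [
    {"bk": "Ut", "postcode": "3500GJ",
     "load_dts": datetime(2013, 6, 5, 22), "load_end_dts": datetime(2015, 6, 5, 21, 59, 59)},
    {"bk": "Ut", "postcode": "3511 CE",
     "load_dts": datetime(2015, 6, 5, 22), "load_end_dts": OPEN_END},
    {"bk": "Asd", "postcode": "1012 AB",
     "load_dts": datetime(2013, 6, 5, 22), "load_end_dts": OPEN_END},
]


def valid_at(rows, business_key, moment):
    """Return the row of one business key that was valid at 'moment', or None."""
    for row in rows:
        if row["bk"] == business_key and row["load_dts"] <= moment <= row["load_end_dts"]:
            return row
    return None


print(valid_at(ADRES_ROWS, "Ut", datetime(2014, 1, 1))["postcode"])
```

The sketch ignores the deleted indicator; a fuller version would probably skip rows that are flagged as deleted.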

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…

Connector (C)

Relations between Concepts + Context, in a historical way

…a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…

• so that a change (on the driving key) is not registered as a new 'connection'
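
The role of the driving key in a delta delivery can be sketched as follows. This is an illustration under assumptions, not the author's loader: the column names and the scenario (a check-out arriving after the check-in was already loaded) are hypothetical.

```python
def driving_key_of(row, driving_key):
    """Extract the driving key values of a Connector row as a tuple."""
    return tuple(row[column] for column in driving_key)


def classify_delta(open_rows, delta_rows, driving_key):
    """Label each delta row as 'new', 'changed' or 'unchanged'.

    A row whose driving key already exists is a change to an existing
    connection, not a new connection.
    """
    current = {driving_key_of(row, driving_key): row for row in open_rows}
    labelled = []
    for row in delta_rows:
        existing = current.get(driving_key_of(row, driving_key))
        if existing is None:
            label = "new"
        elif existing == row:
            label = "unchanged"
        else:
            label = "changed"
        labelled.append((label, row))
    return labelled


DRIVING_KEY = ("travelcard", "checkin")  # as in the Travelmovement example

open_rows = [
    {"travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
     "station_from": "Asd", "station_to": None, "checkout": None},
]
delta = [
    # Same driving key, check-out now known: a change, not a new connection.
    {"travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 08:49:32",
     "station_from": "Asd", "station_to": "Ut", "checkout": "2016-04-05 09:40:12"},
    # Different check-in timestamp: a new connection.
    {"travelcard": "3528 0234 2073 1234", "checkin": "2016-04-05 18:10:09",
     "station_from": "Ut", "station_to": None, "checkout": None},
]

for label, row in classify_delta(open_rows, delta, DRIVING_KEY):
    print(label, row["checkin"])
```

Without the driving key, the first delta row would differ from the stored row on its full set of keys and could be mistaken for a second travel movement.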

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:

• [Load End Date Timestamp]

• [Deleted Flag], OR register a delete as a new record

Example Connector: NSR-Travelmovement (Checkin timestamp, from, to)

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data: NSR-Travelmovement (relates NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from   BK_NSR-Station-to   BK_NS_Trainseries   BK_NS-Travelcard      Checkin timestamp   Checkout timestamp   META_Load_dts       META_Load_end_dts
Asd                   Ut                  IC 855              3528 0234 2073 1234   5-4-2016 8:49:32    5-4-2016 9:40:12     6-4-2016 22:00:00   31-12-9999 00:00:00
Ut                    Asd                 IC 855              3528 0234 2073 1234   5-4-2016 18:10:09   5-4-2016 18:55:20    6-4-2016 22:00:00   31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

Business Alias

Helps to switch from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station, with Concept Contexts [valuation] Source NSS2 table p, [description] Source NSS1 table x and [adres] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables

BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1 (key lookup for NSS1 source tables)

Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

NSR-Station

Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …

BA important Details

Has a 1 to (0..1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow…
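
A minimal sketch of a Business Alias used as an in-memory lookup, assuming a plain dictionary is an acceptable stand-in for 'in memory'. The surrogate values come from the BA-NSS1 example data; the column names station_id and bk_station are hypothetical.

```python
# BA-NSS1 from the example data: NSS1 surrogate key -> Business Key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, surrogate_col, bk_col):
    """Replace a technical key by its Business Key using the alias lookup.

    Returns (resolved, unresolved); rows without a known alias are kept
    apart instead of being dropped silently.
    """
    resolved, unresolved = [], []
    for row in source_rows:
        business_key = alias.get(row[surrogate_col])
        if business_key is None:
            unresolved.append(row)
            continue
        translated = {k: v for k, v in row.items() if k != surrogate_col}
        translated[bk_col] = business_key
        resolved.append(translated)
    return resolved, unresolved


rows = [
    {"station_id": 123522, "postcode": "3511 CE"},
    {"station_id": 999999, "postcode": "unknown"},
]
resolved, unresolved = to_business_key(rows, BA_NSS1, "station_id", "bk_station")
print(resolved)
print(unresolved)
```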

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station, [valuation] Source NSS2 table p

BK_NSR-Station   WOZ value    Value Rating agency X   META_Load_dts       META_Load_end_dts
Ut               20 million   18 million              1-1-2014 22:00:00   1-1-2015 21:59:59
Ut               22 million   18 million              1-1-2015 22:00:00   1-3-2016 21:59:59
Ut               22 million   23 million              1-3-2016 22:00:00   31-12-9999 00:00:00

NSR-Station, [adres] Source NSS1 table y

BK_NSR-Station   Postadres_postcode   GPS                 …   META_source   META_Load_dts       META_Load_end_dts
Ut               3500GJ               52.08954, 5.11064   …   NSS1_y        5-6-2013 22:00:00   4-7-2015 21:59:59
Ut               3511 CE              52.37269, 4.89299   …   NSS1_y        4-7-2015 22:00:00   31-12-9999 00:00:00
Asd              1012 AB              52.37269, 4.89299   …   NSS1_y        5-6-2013 22:00:00   31-12-9999 00:00:00
Asa              …                    …                   …   …             …                   …

BCT (BC-Timeline for NSR-Station)

BK_NSR-Station   Combined_Load_dts   Combined_Load_end_dts
Ut               5-6-2013 22:00:00   1-1-2014 21:59:59
Ut               1-1-2014 22:00:00   1-1-2015 21:59:59
Ut               1-1-2015 22:00:00   4-7-2015 21:59:59
Ut               4-7-2015 22:00:00   1-3-2016 21:59:59
Ut               1-3-2016 22:00:00   31-12-9999 00:00:00
Asd              5-6-2013 22:00:00   31-12-9999 00:00:00
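
One way the combined timeline in the BCT example could be derived is sketched below. It assumes that every load timestamp of any Concept Context starts a new combined interval and that an interval ends one second before the next one starts, as the example data suggests. The function name is hypothetical, and deletes or gaps are not handled.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended marker, as in the example data


def bc_timeline(*concept_contexts):
    """Merge the load timestamps of several Concept Contexts per business key.

    Each argument is a list of (business_key, load_dts) pairs.
    Returns (business_key, combined_load_dts, combined_load_end_dts) tuples.
    """
    starts = {}
    for rows in concept_contexts:
        for business_key, load_dts in rows:
            starts.setdefault(business_key, set()).add(load_dts)

    timeline = []
    for business_key, points in starts.items():
        ordered = sorted(points)
        for begin, following in zip(ordered, ordered[1:] + [None]):
            if following is None:
                end = OPEN_END
            else:
                end = following - timedelta(seconds=1)
            timeline.append((business_key, begin, end))
    return timeline


adres = [("Ut", datetime(2013, 6, 5, 22)), ("Ut", datetime(2015, 7, 4, 22)),
         ("Asd", datetime(2013, 6, 5, 22))]
valuation = [("Ut", datetime(2014, 1, 1, 22)), ("Ut", datetime(2015, 1, 1, 22)),
             ("Ut", datetime(2016, 3, 1, 22))]

for row in bc_timeline(adres, valuation):
    print(row)
```

For the two example Concept Contexts this yields five intervals for Ut and one for Asd, the same rows as in the BCT table.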

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

Photo: http://www.cannabisculture.com/files/images/6/hashbrick.JPG

3) Less joins

Relation + Technical validity timeline + Relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

Focus of current ensemble EDWs

Staging Area, EDW, Information Marts

Splitting the work

Persistent Staging Area / HSA = Data Library

EDW

Information Marts

PS-3C = Persistent Staging - Concept Context, Connector, Business Concept

Persistent Staging - how

Identify the Primary or Unique Key of the source / event stream; use source metadata for this

Automate the building of a PS 'around this key'; take all columns

Historize using an SCD-2 approach
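
The SCD-2 step can be sketched as follows. This is a simplified illustration, not the author's implementation: it assumes full row comparison on all delivered columns, does not handle deletes, and all names are hypothetical.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended marker, as in the example data


def historize(history, delivery, key_cols, load_dts):
    """Apply one source delivery to an SCD-2 history (list of dict rows).

    Unchanged rows are skipped; changed rows close the open version and
    add a new one; unknown keys are added as a first version.
    """
    open_rows = {
        tuple(row[c] for c in key_cols): row
        for row in history
        if row["load_end_dts"] == OPEN_END
    }
    for row in delivery:
        key = tuple(row[c] for c in key_cols)
        current = open_rows.get(key)
        if current is not None:
            if all(current.get(col) == value for col, value in row.items()):
                continue  # nothing changed for this key
            current["load_end_dts"] = load_dts - timedelta(seconds=1)
        new_row = {**row, "load_dts": load_dts, "load_end_dts": OPEN_END}
        history.append(new_row)
        open_rows[key] = new_row
    return history


history = []
historize(history, [{"station": "Ut", "postcode": "3500GJ"}], ["station"],
          datetime(2013, 6, 5, 22))
historize(history, [{"station": "Ut", "postcode": "3511 CE"}], ["station"],
          datetime(2015, 6, 5, 22))
historize(history, [{"station": "Ut", "postcode": "3511 CE"}], ["station"],
          datetime(2015, 6, 6, 22))
for row in history:
    print(row)
```

Because the key columns are passed in as a parameter, the same routine could be generated for every source table from its metadata, in line with building the PS 'around this key'.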

Persistent Staging Metadata-1

Entity level:

Unique key

Functional Description

Delivering party

Owner / Responsible

MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)

…

Persistent Staging Metadata-2

Column level:

[Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag], OR register a delete as a new record

[Source system] on table / file level (lowest possible)

Load End Date Timestamp is possible but difficult… it requires updates
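
Since a stored Load End Date Timestamp requires updates, one possible alternative is to derive it at read time from the next Load Date Timestamp of the same key. The sketch below is an assumption for illustration and not something the slides prescribe; it follows the convention of the example data, where an interval ends one second before the next one starts.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended marker, as in the example data


def add_load_end_dts(rows):
    """Return copies of append-only rows with a derived 'load_end_dts'.

    rows: list of dicts that each have a 'key' and a 'load_dts'.
    """
    ordered = sorted(rows, key=lambda row: (row["key"], row["load_dts"]))
    result = []
    for current, following in zip(ordered, ordered[1:] + [None]):
        same_key = following is not None and following["key"] == current["key"]
        if same_key:
            end = following["load_dts"] - timedelta(seconds=1)
        else:
            end = OPEN_END
        result.append({**current, "load_end_dts": end})
    return result


stored = [
    {"key": "Ut", "postcode": "3500GJ", "load_dts": datetime(2013, 6, 5, 22)},
    {"key": "Ut", "postcode": "3511 CE", "load_dts": datetime(2015, 6, 5, 22)},
    {"key": "Asd", "postcode": "1012 AB", "load_dts": datetime(2013, 6, 5, 22)},
]
for row in add_load_end_dts(stored):
    print(row["key"], row["load_dts"], row["load_end_dts"])
```

The stored rows stay append-only; the end dates exist only in the derived result.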

UPDATES IN HIVE (isn't HDFS APPEND ONLY?)

ACID is possible in Hive

ACID makes updates possible by registering updates as 'new data'

Reconciliation / compacting: when idle or at user command

Use ORC files

PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: AVRO (ORC in development)

Or HBase… it only has one data type (byte); the schema is 'applied'

The schema can be different for every row
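
The MAP option can be illustrated outside Hive with a small sketch: the columns that are always present stay as regular columns, and everything else goes into one string-to-string map, comparable to a MAP<STRING,STRING> column. The column names and the function name are hypothetical.

```python
FIXED_COLUMNS = ("event_id", "event_type")  # hypothetical fixed part of the record


def to_fixed_plus_map(record, fixed_columns=FIXED_COLUMNS):
    """Split a semi-structured record into fixed columns and one attribute map."""
    fixed = {column: record.get(column) for column in fixed_columns}
    variable = {
        name: str(value)
        for name, value in record.items()
        if name not in fixed_columns
    }
    return {**fixed, "attributes": variable}


print(to_fixed_plus_map({"event_id": 1, "event_type": "checkin", "station": "Ut"}))
print(to_fixed_plus_map({"event_id": 2, "event_type": "topup", "amount": 20}))
```

Both records end up with the same three columns even though their original attributes differ.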

3C - how

Like Data Vault (2) BUT…

Always starts with Conceptual data modeling

NOT the primary location of Data & History: virtualised (only if performance allows), should be deterministic

No Link Satellites

No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'

Explicit Helper entities

Business Concept (BC)

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer …

PS-3C modelling: Persistent Staging - Concept Context, Connector, Business Concept

Persistent Staging - how
Identify the primary or unique key of the source / event stream; use source metadata for this.
Automate the building of a PS 'around this key'; take all columns.
Historize using an SCD-2 approach.
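The historization step above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `scd2_apply`, the row layout and the half-open end date (the slides end a version at 21:59:59, one second before the next load) are assumptions made for the example.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)  # open-ended "end of time" marker


def scd2_apply(history, key, attributes, load_dts):
    """Apply one delivered source row to an SCD-2 history (list of dicts).

    Closes the current version and appends a new one only when the
    attributes actually changed; returns True if a new version was written.
    """
    current = next(
        (row for row in history
         if row["key"] == key and row["load_end_dts"] == HIGH_DATE),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return False  # nothing changed, keep the current version open
        current["load_end_dts"] = load_dts  # end-date the old version
    history.append({
        "key": key,
        "attributes": dict(attributes),
        "load_dts": load_dts,
        "load_end_dts": HIGH_DATE,
    })
    return True


history = []
scd2_apply(history, "Ut", {"postcode": "3500GJ"}, datetime(2013, 6, 5, 22))
scd2_apply(history, "Ut", {"postcode": "3500GJ"}, datetime(2014, 6, 5, 22))  # unchanged
scd2_apply(history, "Ut", {"postcode": "3511 CE"}, datetime(2015, 6, 5, 22))
```

After these three deliveries the history holds two versions for 'Ut': the unchanged delivery writes nothing, and the postcode change end-dates the first version.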

Persistent Staging Metadata-1
Entity level:
Unique key
Functional description
Delivering party
Owner / Responsible
MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation, …)
…

Persistent Staging Metadata-2
Column level:
[Load Date Timestamp]
[Load End Date Timestamp]: possible but difficult… it requires updates
[Deleted Flag] OR register a delete as a new record
[Source system] at table / file level (lowest possible)

Updates in Hive (isn't HDFS append-only?)
ACID is possible in Hive.
ACID makes updates possible by registering updates as 'new data'.
Reconciliation / compacting happens when idle or at user command.
Use ORC files.
PLUS change the Hive configuration…
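The idea of 'updates as new data' plus later compaction can be illustrated with a toy model. This is only a sketch of the principle, not how Hive implements it; `read_merged`, `compact` and the record layout are hypothetical names chosen for the example.

```python
def read_merged(base, deltas):
    """Return the current view: base rows overlaid with later delta records."""
    view = dict(base)
    for op, key, value in deltas:          # deltas are kept in write order
        if op == "delete":
            view.pop(key, None)
        else:                              # "insert" or "update"
            view[key] = value
    return view


def compact(base, deltas):
    """Fold all deltas into a new base; the delta log starts empty again."""
    return read_merged(base, deltas), []


base = {"Ut": "3500GJ", "Asd": "1012 AB"}
deltas = []
deltas.append(("update", "Ut", "3511 CE"))   # an update is only appended
deltas.append(("delete", "Asd", None))       # so is a delete
current = read_merged(base, deltas)
base, deltas = compact(base, deltas)
```

Nothing in the base is rewritten until compaction runs, which is what keeps the approach compatible with append-only storage.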

What about SEMI-STRUCTURED data?
Hive: put semi-structured data (= variable columns) in a MAP data type.
OR use a data storage type that supports schema evolution: Avro (ORC in development).
Or HBase… It has only one data type (byte); the schema is 'applied'.
The schema can be different for every row.

3C - how
Like Data Vault (2), BUT:
Always starts with conceptual data modelling.
NOT the primary location of data & history: virtualised (only if performance allows), and should be deterministic.
No Link Satellites.
No surrogate or hash keys, only 'concatenated natural business keys'.
Explicit helper entities.

[Diagram: overview of the 3C entities. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X.]
3C - Details
Business Concept (BC)
A UNIQUE, domain-specific point of integration…
…a business entity
…within its own domain
…that does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?
[Diagram: Company Customer; Sales Customer; International Sales Customer; Local Sales Customer; Marketing Customer; … Customer]

Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] at table / file level (lowest possible)

Example data

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Trainseries
Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Travelcard
Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller
Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes
The easiest entity to virtualise (if performance allows).
No hashing & no surrogate BUSINESS KEYS (not by default, at least).
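A concatenated natural business key such as 'IC|855' can be built with a few lines of code. The sketch below is illustrative only: the helper name `business_key` and the rule that a key part may not contain the separator are assumptions, not something the slides prescribe.

```python
def business_key(*parts, sep="|"):
    """Concatenate natural key parts into one readable business key."""
    cleaned = [str(part).strip() for part in parts]
    if any(sep in part for part in cleaned):
        # Without this check ("A|B", "C") and ("A", "B|C") would collide.
        raise ValueError("key part contains the separator")
    return sep.join(cleaned)


key = business_key("IC", 855)
other = business_key(" Sp ", "7455")
```

The explicit separator keeps keys such as ('IC', '8852') and ('IC8', '852') distinct, and unlike a hash the result stays readable for business users.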

3C - Details
Concept Context (CC)
Contains context about a Concept, in a historical way…
…like a Data Vault Satellite.
Every CC belongs to only one BC.
Separate entity per source system table / stream.

Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] at table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller
Concept Contexts (one per source table):
[valuation]         Source NSS2, table p
[description]       Source NSS1, table x
[adres]             Source NSS1, table y
[description]       Source NTR, table q
[ovchip_personal]   Source NSR, table r
[ovchip_on-usage]   Source NSR, table s
[personal_details]  Source NSR, table t
[adres_data]        Source NSR, table t

Example data: NSR-Station, [adres], Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

CC important notes
More difficult to virtualise: it depends on the semantic gap with the source.
But do make it virtual when 'streaming data' is necessary.
Because we have the PS layer, exposing all columns is not necessary.
Refactoring is easier…

3C - Details
Connector (C)
Relations between Concepts + Context, in a historical way…
…a merger of the Data Vault Link + Link Satellite.
Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time.

Connector Driving key
Explicitly defining a driving key as metadata…
Gives business understanding.
Makes it possible for a Connector to handle delta data deliveries correctly…
…so that a change (on the driving key) is not registered as a new 'connection'.
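How a driving key lets a Connector absorb delta deliveries can be sketched as below. This is one possible reading of the slide, under stated assumptions: the function `load_connector`, the row layout and the column names are hypothetical, and the driving key is taken from the travel movement example (travel card + check-in timestamp).

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)

# Assumption: driving key of the travel movement Connector.
DRIVING_KEY = ("travelcard", "checkin_ts")


def load_connector(rows, delivered, load_dts, driving_key=DRIVING_KEY):
    """Load one delivered row; a change under the same driving key
    end-dates the open row instead of leaving two open 'connections'."""
    dk = tuple(delivered[col] for col in driving_key)
    for row in rows:
        same_dk = tuple(row["data"][col] for col in driving_key) == dk
        if same_dk and row["load_end_dts"] == HIGH_DATE:
            if row["data"] == delivered:
                return  # identical delivery, nothing to do
            row["load_end_dts"] = load_dts
    rows.append({"data": dict(delivered), "load_dts": load_dts,
                 "load_end_dts": HIGH_DATE})


rows = []
checkin = {"travelcard": "3528 0234 2073 1234", "checkin_ts": "5-4-2016 8:49:32",
           "station_from": "Asd", "station_to": None}
load_connector(rows, checkin, datetime(2016, 4, 5, 22))
checkout = dict(checkin, station_to="Ut")          # same driving key, new context
load_connector(rows, checkout, datetime(2016, 4, 6, 22))
load_connector(rows, checkout, datetime(2016, 4, 7, 22))  # repeated delivery
open_rows = [r for r in rows if r["load_end_dts"] == HIGH_DATE]
```

Without the driving key, the second delivery would look like a second, separate connection, because its full column set differs from the first one.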

Connector Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] at table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, plus the Connector NSR-Travelmovement (Checkin timestamp, from, to).]
Driving Key: NS-Travelcard + Checkin timestamp

Example data: Connector NSR-Travelmovement (between NSR-Station from / to, NS-Travelcard and NS-Trainseries)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.
We add two mandatory HELPER entities.

3C - Details
Business Alias
Helps to switch from sources that are tied together by technical (surrogate) keys…
…to a Business Key based model.
It's a LOOKUP table that translates the technical key to the Business Key.

Example
NSR-Station
[valuation]    Source NSS2, table p
[description]  Source NSS1, table x
[adres]        Source NSS1, table y
BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details
Has a 1-on-(0,1) relation with a Business Concept.
More difficult to virtualise: the lookup table should be kept small.
Therefore DO NOT do the key lookup in a Concept Context entity.
Load / generate it together with the BC.
Preferably 'in memory' somehow…
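The lookup role of a Business Alias can be sketched with a small in-memory mapping. The surrogate values come from the BA-NSS1 example data; the function `to_business_key` and the source column name `station_id` are hypothetical and only serve the illustration.

```python
# Business Alias BA-NSS1: surrogate key of source NSS1 -> Business Key.
ba_nss1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column="station_id"):
    """Return a copy of the row keyed on the Business Key instead of the
    source's technical key; raises KeyError for an unknown surrogate."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_column)
    row["BK_NSR-Station"] = alias[surrogate]
    return row


translated = to_business_key({"station_id": 666323, "postcode": "1012 AB"}, ba_nss1)
```

Keeping the alias as a narrow two-column mapping is what makes it small enough to hold in memory, in line with the notes above.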

3C - Details
BC-Timeline
Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.
Like a Data Vault Point-in-time construct,
but mandatory,
and with a clearly defined and performant approach.

Example: NSR-Station

[valuation], Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT (BC-Timeline)
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres], Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …
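The combined timeline for station 'Ut' in the example can be derived from the load timestamps of its two Concept Contexts. The sketch below is one simple way to do that, not necessarily the 'clearly defined and performant approach' the slides refer to; `bc_timeline` is a hypothetical name, and end dates are half-open (equal to the next start) rather than one second earlier as on the slide.

```python
from datetime import datetime

HIGH_DATE = datetime(9999, 12, 31)


def bc_timeline(*context_timelines):
    """Combine the load timestamps of several Concept Contexts of one
    Business Concept into a single list of (start, end) periods."""
    starts = sorted({dts for timeline in context_timelines for dts in timeline})
    ends = starts[1:] + [HIGH_DATE]
    return list(zip(starts, ends))


# Load timestamps for station 'Ut' (dates on the slide are day-month-year).
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22), datetime(2016, 3, 1, 22)]
adres = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]
timeline_ut = bc_timeline(valuation, adres)
```

Each resulting period is one in which no Concept Context of the station changes, so a point-in-time query needs only one equality join per context.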

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work
Data + History
Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys
Only concatenated ones
Photo credit: cannabisculture.com (hashbrick.JPG)

3) Fewer joins
Relation + technical validity timeline + relation context, together in one entity
Photo credit: Public Domain

4) Explicit HELPER entities
Business Alias
Business Concept Timeline
+ explicitly defined driving key(s)
Photo credit: Public Domain

Hope to AVOID this… http://xkcd.com/927
PS-3C: a new PROPOSED ensemble modelling technique
Help needed
Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull


Identify source event stream Primary or Unique KeyUse source metadata for this

Automate the building of a PS lsquoaround this keyrsquo Take all columns

Historize using SCD-2 approach

Persistent Staging - how

rwerschkull

nllinkedincominrogierwerschkull

Entity levelUnique key

Functional Description

Delivering party

Owner Responsible

MULTIPLE tags describing POTENTIAL business domains (sales support marketing operationhellip)

hellip

Persistent Staging Metadata-1

rwerschkull

nllinkedincominrogierwerschkull

Column level [Load Date Timestamp]

[Load End Date Timestamp]

[Deleted Flag] OR delete as new record

[Source system] on table file level (lowest possible)

Load End Date Timestamp possible but difficulthellipRequires updates

Persistent Staging Metadata-2

rwerschkull

nllinkedincominrogierwerschkull

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data
Connector NSR-Travelmovement (Checkin timestamp, from, to) between NSR-Station, NS-Travelcard and NS-Trainseries:

BK_NSR-Station-from | BK_NSR-Station-to | BK_NS_Trainseries | BK_NS-Travelcard | Checkin timestamp | Checkout timestamp | META_Load_dts | META_Load_end_dts
Asd | Ut | IC 855 | 3528 0234 2073 1234 | 5-4-2016 8:49:32 | 5-4-2016 9:40:12 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00
Ut | Asd | IC 855 | 3528 0234 2073 1234 | 5-4-2016 18:10:09 | 5-4-2016 18:55:20 | 6-4-2016 22:00:00 | 31-12-9999 00:00:00

Are we there yet? No
We add two mandatory HELPER entities


Business Alias
• Helps in switching from sources that are tied together by technical (surrogate) keys…
• …to a Business Key based model
• It's a LOOKUP table that translates the technical key to the Business Key

Example
Business Concept NSR-Station, with Concept Contexts:
• [valuation], source NSS2, table p
• [description], source NSS1, table x
• [adres], source NSS1, table y
Business Aliases:
• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example-data
BA-NSS1 (key lookup for NSS1 source tables):

Business Key | NSS1_Surrogate_key
Ut | 123522
Asd | 666323
Asa | 222443
… | …

NSR-Station:

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

BA important Details
• Has a 1-to-(0,1) relation with a Business Concept
• More difficult to virtualise
• The lookup table should be kept small
• Therefore DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…
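A small sketch of what such a lookup could look like when held 'in memory', written in Python. The dictionary values mirror the BA-NSS1 example data; the function name `to_business_key` and the source column name `station_id` are assumptions made for illustration only, and the sketch leaves open where in the load this translation should run.

```python
# Business Alias for source NSS1: technical (surrogate) key -> Business Key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column="station_id"):
    """Return a copy of a source row with the surrogate key replaced by the Business Key."""
    row = dict(source_row)                    # do not mutate the caller's row
    surrogate = row.pop(surrogate_column)
    # .get() leaves unknown surrogate keys visible as None instead of raising
    row["bk_nsr_station"] = alias.get(surrogate)
    return row
```

Keeping the alias as a plain key-to-key mapping is what keeps the lookup table small: it carries no context columns of its own.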


BC-Timeline
• Integrates the validity timelines of the Concept Contexts belonging to a Business Concept
• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example
NSR-Station, Concept Context [valuation] (source NSS2, table p):

BK_NSR-Station | WOZ value | Value rating agency X | META_Load_dts | META_Load_end_dts
Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00

BCT (BC-Timeline) for NSR-Station:

BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts
Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59
Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59
Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59
Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59
Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00
Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00

NSR-Station, Concept Context [adres] (source NSS1, table y):

BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00
Asa | … | … | … | … | … | …
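The combined timeline in this example can be reproduced with a short Python sketch. It is one possible way to derive a BC-Timeline, not necessarily the approach the author has in mind: the function name `bc_timeline`, the helper `at_22` and the choice to end each interval one second before the next start are assumptions read off the example data.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)   # open-ended validity, as in the example data


def bc_timeline(*concept_contexts):
    """Merge the load timestamps of several Concept Contexts per Business Key.

    Each Concept Context is an iterable of (business_key, load_dts) pairs.
    Returns {business_key: [(combined_load_dts, combined_load_end_dts), ...]}.
    """
    starts = {}
    for cc in concept_contexts:
        for business_key, load_dts in cc:
            starts.setdefault(business_key, set()).add(load_dts)

    timeline = {}
    for business_key, dts in starts.items():
        ordered = sorted(dts)
        # every interval ends one second before the next one starts
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [OPEN_END]
        timeline[business_key] = list(zip(ordered, ends))
    return timeline


def at_22(year, month, day):
    """Load moment used in the example: 22:00:00 on the given day."""
    return datetime(year, month, day, 22)


# (business key, load timestamp) pairs taken from the two Concept Contexts above
valuation = [("Ut", at_22(2014, 1, 1)), ("Ut", at_22(2015, 1, 1)), ("Ut", at_22(2016, 3, 1))]
adres = [("Ut", at_22(2013, 6, 5)), ("Ut", at_22(2015, 7, 4)), ("Asd", at_22(2013, 6, 5))]
timeline = bc_timeline(valuation, adres)
```

For `Ut` this yields five consecutive intervals and for `Asd` a single open-ended one, matching the BCT rows of the example.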

What makes PS-3C a different Ensemble?


1) Explicitly splitting the work
• Data + History
• Subjects + Integration

2) NO HASHED BUSINESS KEYS
or surrogate keys
Only concatenated ones
httpwwwcannabisculturecomfilesimages6hashbrickJPG
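As an illustration of a concatenated key, here is a minimal Python sketch. The `|` separator follows the train-series keys in the example data (such as `IC|855`); the function name `concat_business_key` and the two validation rules are assumptions added for the sketch.

```python
def concat_business_key(*parts, sep="|"):
    """Build a concatenated natural business key, e.g. ("IC", 855) -> "IC|855"."""
    cleaned = [str(p).strip() for p in parts]
    if any(p == "" for p in cleaned):
        raise ValueError("every part of a business key must be filled")
    if any(sep in p for p in cleaned):
        # otherwise ("A|B", "C") and ("A", "B|C") would collide
        raise ValueError(f"separator {sep!r} must not occur inside a key part")
    return sep.join(cleaned)
```

Unlike a hash or a surrogate, the resulting key stays readable and can be rebuilt from the source values alone.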

3) Fewer joins
Relation + Technical validity timeline + Relation context
Together in one entity
Photo credit: Public Domain

4) Explicit HELPER entities
• Business Alias
• Business Concept Timeline
+ explicitly defined Driving key(s)
Photo credit: Public Domain

Hope to AVOID this…
http://xkcd.com/927

PS-3C: A new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Persistent Staging Metadata-1
Entity level:
• Unique key
• Functional Description
• Delivering party
• Owner / Responsible
• MULTIPLE tags describing POTENTIAL business domains (sales, support, marketing, operation…)
• …

Persistent Staging Metadata-2
Column level:
• [Load Date Timestamp]
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record
• [Source system] on table / file level (lowest possible)
A Load End Date Timestamp is possible but difficult… it requires updates
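The 'register a delete as a new record' option avoids updates altogether. The Python sketch below shows how the current state could be read back from such append-only records; the record layout (`key`, `load_dts`, `deleted`) and the function name `current_rows` are assumptions for illustration only.

```python
def current_rows(records):
    """Return the latest non-deleted version per unique key from append-only records.

    Each record is a dict with at least 'key', 'load_dts' and 'deleted' (bool).
    """
    latest = {}
    for rec in sorted(records, key=lambda r: r["load_dts"]):
        latest[rec["key"]] = rec          # later loads overwrite earlier ones
    # a key whose latest record is a delete marker is no longer current
    return {k: r for k, r in latest.items() if not r["deleted"]}
```

Nothing already written is ever changed: a delete is simply one more record with a later load timestamp.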

UPDATES IN HIVE (ISN'T HDFS APPEND ONLY?)
• ACID is possible in Hive
• ACID makes updates possible by registering updates as 'new data'
• Reconciliation / compacting happens when idle or at user command
• Use ORC files
• PLUS changing the Hive configuration…

What about SEMI-STRUCTURED data?
Hive:
• Put semi-structured data (= variable columns) in a MAP data type
• OR use a data storage type that supports schema evolution: AVRO (ORC in development)
Or HBase…
• It only has one data type (byte); the schema is 'applied'
• The schema can be different for every row


3C - how


Like Data Vault (2), BUT:
• Always starts with conceptual data modelling
• NOT the primary location of Data & History: virtualised (only if performance allows), and it should be deterministic
• No Link Satellites
• No Surrogate or Hash Keys, only 'Concatenated Natural Business Keys'
• Explicit Helper entities


Business Concept (BC)
A UNIQUE, domain-specific point of integration
• …a business entity
• …within its own domain
• …does not necessarily need to be Enterprise Wide

Why not 'enterprise wide'?
[Diagram of customer concepts: Company Customer, Sales Customer, International Sales Customer, Local Sales Customer, Marketing Customer, Customer …]

Business Concept Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Example-Data
NS-Trainseries:

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NSR-Station:

Business Key | META_Source | META_Load_dts
Ut | NSS1_y | 5-6-2015 22:00:00
Asd | NSS1_y | 5-6-2015 22:00:00
Asa | NSS2_p | 6-6-2015 22:00:00
… | … | …

NS-Travelcard:

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Traveller:

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes
• The easiest entity to virtualise (if performance allows)
• No hashing & no surrogate BUSINESS KEYS (not by default, at least)
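One possible reading of a 'virtualised' Business Concept is a deterministic derivation over the Persistent Staging rows, with no data of its own. The Python sketch below assumes PS rows as dicts with `source` and `load_dts` fields and keeps the earliest load per Business Key; the function name `business_concept` and that earliest-load rule are assumptions, not something the slides prescribe.

```python
def business_concept(ps_rows, key_column):
    """Derive Business Concept rows (one per Business Key) from Persistent Staging rows."""
    concept = {}
    for row in ps_rows:
        bk = row[key_column]
        first = concept.get(bk)
        # keep the earliest load in which this Business Key was seen
        if first is None or row["load_dts"] < first["META_Load_dts"]:
            concept[bk] = {"business_key": bk,
                           "META_Source": row["source"],
                           "META_Load_dts": row["load_dts"]}
    # sorting makes the result independent of the input order (deterministic)
    return sorted(concept.values(), key=lambda r: r["business_key"])
```

Because the result depends only on the PS rows, it can be recomputed at any time, which is what makes this entity a natural candidate for a view rather than a stored table.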


Concept Context (CC)
• Contains Context about a Concept, in a historical way
• …like a Data Vault Satellite
• Every CC belongs to only one BC
• Separate entity per source system table / stream

Concept Context Metadata
Entity level: [Description], [Owner / Responsible]
Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)
Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example
[Diagram: Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller, with Concept Contexts [valuation] (source NSS2, table p), [description] (source NSS1, table x), [adres] (source NSS1, table y), [description] (source NTR, table q), [ovchip_personal] (source NSR, table r), [ovchip_on-usage] (source NSR, table s), [personal_details] (source NSR, table t) and [adres_data] (source NSR, table t)]

Example-data
NSR-Station, Concept Context [adres] (source NSS1, table y):

BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind
Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0
Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0
Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0
Asa | … | … | … | … | … | … | …

CC important notes
• More difficult to virtualise: it depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…

ACID is possible in HIVE

ACID Makes Updates possibleBy registering updates as lsquonew datarsquo

Reconciliation compacting when idle at user command

Use ORC files

PLUS changing the HIVE configurationhellip

UPDATES IN HIVE (iSNrsquot HDFS APPEND ONLY)

rwerschkull

nllinkedincominrogierwerschkull

HivePut semi structured data = variable columns in MAP data type

OR use Data storage type that supports schema-evolutionAVRO (ORC in development)

Or HBASEhellip It only has one data type (byte) schema is lsquoappliedrsquo

Schema can be different for every row

What about SEMI-STRUCTURED Data

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Example-data

NSR-Station [adres], Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                  …  …            …                  …

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when ‘streaming data’ is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…

CC important notes


[Diagram: the 3C model. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

3C - Details

Relations between Concepts + Context

In a historical way

…a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector (C)

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for a Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new ‘connection’

Connector Driving key
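
The delta handling described above can be sketched as follows. This is an illustration under assumptions, not the author's implementation: the function names, the underscore column names and the in-memory list of dicts are hypothetical, and the convention of closing a version one second before the new load timestamp is copied from the example data.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def driving_key_of(row, driving_key):
    """Return the driving key values of a row as a tuple."""
    return tuple(row[column] for column in driving_key)


def apply_delta(rows, record, driving_key, load_dts):
    """Apply one delta record to a Connector, versioning per driving key."""
    key = driving_key_of(record, driving_key)
    for row in rows:
        if row["META_Load_end_dts"] != HIGH_DATE:
            continue  # only the open version can be superseded
        if driving_key_of(row, driving_key) != key:
            continue
        current = {k: v for k, v in row.items() if not k.startswith("META_")}
        if current == record:
            return rows  # nothing changed, so no new version
        # Same connection, changed content: close the current version.
        row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
        break
    rows.append({**record, "META_Load_dts": load_dts,
                 "META_Load_end_dts": HIGH_DATE})
    return rows


if __name__ == "__main__":
    dk = ("BK_NS_Travelcard", "Checkin_timestamp")
    trip = {"BK_NS_Travelcard": "3528 0234 2073 1234",
            "Checkin_timestamp": "5-4-2016 8:49:32",
            "BK_NSR_Station_from": "Asd", "BK_NSR_Station_to": None}
    rows = apply_delta([], trip, dk, datetime(2016, 4, 5, 22))
    apply_delta(rows, dict(trip, BK_NSR_Station_to="Ut"), dk,
                datetime(2016, 4, 6, 22))
    for row in rows:
        print(row)
```

Because both versions share one driving key value, a consumer counting distinct driving keys still sees a single connection with two historical versions.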


Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Connector Metadata

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Connector: NSR-Travelmovement (Checkin timestamp, from, to)
Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries

Connector: NSR-Travelmovement (Checkin timestamp, from, to)
BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

[Diagram: the 3C model. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

3C - Details

Helps to switch from sources that are tied together by technical (surrogate) keys…

…to a Business Key based model

It’s a LOOKUP table that translates the technical key to the Business Key

Business Alias

Example

NSR-Station
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1: key lookup for NSS1 source tables
Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station
Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

Has a 1-to-(0..1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably ‘in memory’ somehow…

BA important Details
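
A minimal sketch of a Business Alias held in memory as a plain lookup, using the BA-NSS1 values from the example data. The function names and the decision to raise on a duplicate are assumptions for illustration; the uniqueness check is one way to enforce the 1-to-(0..1) relation with the Business Concept.

```python
def build_alias(pairs):
    """Build a surrogate key -> Business Key lookup from (bk, sk) pairs."""
    alias = {}
    seen_business_keys = set()
    for business_key, surrogate_key in pairs:
        # Each Business Key gets at most one alias, and vice versa.
        if surrogate_key in alias or business_key in seen_business_keys:
            raise ValueError(
                f"duplicate alias for {business_key!r} / {surrogate_key!r}")
        alias[surrogate_key] = business_key
        seen_business_keys.add(business_key)
    return alias


def to_business_key(alias, surrogate_key):
    """Translate a source surrogate key; returns None when it is unknown."""
    return alias.get(surrogate_key)


if __name__ == "__main__":
    ba_nss1 = build_alias([("Ut", 123522), ("Asd", 666323), ("Asa", 222443)])
    print(to_business_key(ba_nss1, 666323))
```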


[Diagram: the 3C model. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

3C - Details

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

BC-Timeline
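
One way to integrate the validity timelines is sketched below, assuming every Concept Context row contributes its load timestamp as a cut point and that there are no deletes or gaps. The function name `build_timeline` and the input shape are hypothetical; the timestamps are taken from the NSR-Station example, and the end of each interval is set one second before the next start, as in the example data.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)  # open-ended validity


def build_timeline(*contexts):
    """Merge the load timestamps of several Concept Contexts per business key.

    Each context is a list of (business_key, load_dts) pairs. Returns
    {business_key: [(combined_load_dts, combined_load_end_dts), ...]}.
    """
    starts = {}
    for context in contexts:
        for business_key, load_dts in context:
            starts.setdefault(business_key, set()).add(load_dts)
    timeline = {}
    for business_key, stamps in starts.items():
        ordered = sorted(stamps)
        # Every interval ends one second before the next one starts.
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]]
        ends.append(HIGH_DATE)
        timeline[business_key] = list(zip(ordered, ends))
    return timeline


if __name__ == "__main__":
    valuation = [("Ut", datetime(2014, 1, 1, 22)),
                 ("Ut", datetime(2015, 1, 1, 22)),
                 ("Ut", datetime(2016, 3, 1, 22))]
    adres = [("Ut", datetime(2013, 6, 5, 22)),
             ("Ut", datetime(2015, 7, 4, 22)),
             ("Asd", datetime(2013, 6, 5, 22))]
    for key, intervals in build_timeline(valuation, adres).items():
        for start, end in intervals:
            print(key, start, end)
```

Joining each Concept Context to this timeline on business key and interval then yields one consistent row per period.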


Example

NSR-Station [valuation], Source NSS2 table p
BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT
BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

NSR-Station [adres], Source NSS1 table y
BK_NSR-Station  Postadres_postcode  GPS                …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954, 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269, 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269, 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                  …  …            …                  …

What makes PS-3C a different Ensemble?

[Diagram: the 3C model. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only concatenated ones

3) Fewer joins

Relation + Technical validity timeline + Relation context

Together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: A new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

‘Head of BI’, Spil Games

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Hive: put semi-structured data (= variable columns) in a MAP data type

OR use a data storage type that supports schema evolution: AVRO (ORC in development)

Or HBASE… It only has one data type (byte); the schema is ‘applied’

Schema can be different for every row

What about SEMI-STRUCTURED Data
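
As a rough illustration of the first option, the sketch below splits a record into a fixed set of columns plus a single map holding the variable columns, comparable in shape to a Hive `MAP<STRING, STRING>` column. The column names, the `attributes` field and the function name are hypothetical.

```python
FIXED_COLUMNS = ("business_key", "load_dts")  # hypothetical fixed schema


def to_map_row(record, fixed_columns=FIXED_COLUMNS):
    """Split a record into fixed columns plus one map of variable columns."""
    row = {column: record.get(column) for column in fixed_columns}
    # Everything outside the fixed schema goes into a string-to-string map.
    row["attributes"] = {key: str(value) for key, value in record.items()
                         if key not in fixed_columns}
    return row


if __name__ == "__main__":
    print(to_map_row({"business_key": "Ut",
                      "load_dts": "5-6-2015 22:00:00",
                      "platforms": 16}))
```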


3C - how


Always starts with conceptual data modeling

NOT the primary location of Data & History

Virtualised (only if performance allows)

Should be deterministic

No Link Satellites

No surrogate or hash keys, only ‘Concatenated Natural Business Keys’

Explicit helper entities

Like Data Vault (2) BUT

[Diagram: the 3C model. Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and B; BC-Timeline X]

3C - Details

A UNIQUE, domain-specific point of integration

…a business entity

…within its own domain

…does not necessarily need to be enterprise wide

Business Concept (BC)

Why not ‘enterprise wide’?

Company Customer

Sales Customer

International Sales Customer

Local Sales Customer

Marketing Customer

Customer …


modelling

rwerschkull

nllinkedincominrogierwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only

Concatenatedones

rwerschkull

nllinkedincominrogierwerschkull

3) Less

joinsRelation

+Technical validity timeline

+ Relation context

Together in one entityPhoto credit Public Domain

rwerschkull

nllinkedincominrogierwerschkull

4) Explicit

HelpEREntities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Hope to

AVOIDThishellip

httpxkcdcom927

PS-3CA new PROPOSEDensemble modelling technique

Help

needed

Questions

About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

3C - how

rwerschkull

nllinkedincominrogierwerschkull

Always starts with Conceptual data modeling

NOT the primary location of Data amp History Virtualised (only if performance allows) Should be deterministic

No Link Satellites

No Surrogate or Hash Keys only lsquoContatenated Natural Business Keysrsquo

Explicit Helper entities

Like Data Vault(2) BUT

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

a UNIQUE Domain specific point of integration

hellipa business entity

hellipwithin itrsquos own domain

hellipdoes not necessarily need to be Enterprise Wide

Business Concept (BC)

rwerschkull

nllinkedincominrogierwerschkull

Why not lsquoenterprise widersquoCompany

Customer

Sales Customer

International

Sales Customer

Local

Sales Customer

Marketing Customer

Customer hellip

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Example data

NSR-Station [adres] Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts    META_Deleted_Ind
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  5-6-2015 21:59:59    0
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       5-6-2015 22:00:00  31-12-9999 00:00:00  0
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00  0
Asa             …                   …                 …  …            …                  …
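In the example data each closed record ends one second before the next record for the same business key is loaded, and the current record ends at 31-12-9999. A minimal sketch of deriving the load end timestamps from the load timestamps follows. The function name `end_date` and the day-month-year reading of the slide dates are assumptions; the slides do not show how the end dates are computed.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # "31-12-9999 00:00:00" on the slide


def end_date(load_timestamps):
    """Pair each load timestamp of ONE business key with its load end timestamp."""
    ordered = sorted(load_timestamps)
    # Every record ends one second before its successor starts; the last stays open.
    ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [OPEN_END]
    return list(zip(ordered, ends))


# The two 'Ut' address records, dates read as day-month-year (assumption).
ut_loads = [datetime(2015, 6, 5, 22), datetime(2013, 6, 5, 22)]
for start, end in end_date(ut_loads):
    print(start, "->", end)
```

Since the end timestamp can be derived this way, it does not have to be stored, which fits the remark that it is not mandatory for streaming data.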

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier …


Connector (C)

Relations between Concepts + context, in a historical way

… a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata …

Gives business understanding

Makes it possible for a Connector to handle delta data deliveries correctly …
• so that a change (on the driving key) is not registered as a new 'connection'
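How a driving key lets a delta delivery update an existing connection instead of adding a second one can be sketched roughly as below. This is an illustration under stated assumptions, not the author's loader: the function `apply_delta`, the dict-based rows, the column names and the scenario of a check-out station arriving one day later are all hypothetical. Only the driving key itself (travelcard + check-in timestamp) comes from the example slide.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)
DRIVING_KEY = ("travelcard", "checkin")  # driving key from the example slide
META = ("load_dts", "load_end_dts")


def apply_delta(connector_rows, delta, load_dts):
    """Apply one delivered record to the Connector history (a list of dicts), in place."""
    driving = tuple(delta[k] for k in DRIVING_KEY)
    for row in connector_rows:
        same_key = tuple(row[k] for k in DRIVING_KEY) == driving
        if same_key and row["load_end_dts"] == OPEN_END:
            payload = {k: v for k, v in row.items() if k not in META}
            if payload == delta:
                return  # unchanged delivery, nothing to register
            # Same driving key, different content: close the current version.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector_rows.append({**delta, "load_dts": load_dts, "load_end_dts": OPEN_END})


rows = []
trip = {"travelcard": "3528 0234 2073 1234", "checkin": datetime(2016, 4, 5, 8, 49, 32),
        "station_from": "Asd", "station_to": None, "trainseries": "IC 855"}
apply_delta(rows, trip, datetime(2016, 4, 5, 22))
apply_delta(rows, {**trip, "station_to": "Ut"}, datetime(2016, 4, 6, 22))
open_rows = [r for r in rows if r["load_end_dts"] == OPEN_END]
print(len(rows), len(open_rows))  # 2 1
```

Without the driving key, the second delivery would be indistinguishable from a new trip and both rows would stay open.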

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, joined by the Connector NSR-Travelmovement]

Connector NSR-Travelmovement: Checkin timestamp, from, to

Driving Key: NS-Travelcard + Checkin timestamp

Example data

NSR-Travelmovement (Connector between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp   Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32    5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09   5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling


Business Alias

Helps to switch from sources that are tied together by technical (surrogate) keys … to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

NSR-Station, with Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1 (key lookup for NSS1 source tables)
Business Key   NSS1_Surrogate_key
Ut             123522
Asd            666323
Asa            222443
…              …

NSR-Station
Business Key   META_Source   META_Load_dts
Ut             NSS1_y        5-6-2015 22:00:00
Asd            NSS1_y        5-6-2015 22:00:00
Asa            NSS2_p        6-6-2015 22:00:00
…              …             …
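A Business Alias used as a lookup can be sketched with the BA-NSS1 rows above held in an in-memory mapping. The function `to_business_key`, the source column name `station_id` and the sample source row are hypothetical; only the surrogate-to-business-key pairs come from the slide.

```python
# BA-NSS1 rows from the slide: NSS1 surrogate key -> business key
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, surrogate_column="station_id"):
    """Return a copy of a source row keyed on the business key instead of the surrogate."""
    row = dict(source_row)
    surrogate = row.pop(surrogate_column)
    try:
        row["BK_NSR-Station"] = alias[surrogate]
    except KeyError:
        # An unknown surrogate means the alias was not loaded together with the BC.
        raise LookupError(f"no business key registered for surrogate key {surrogate}") from None
    return row


print(to_business_key({"station_id": 666323, "postcode": "1012 AB"}, BA_NSS1))
# {'postcode': '1012 AB', 'BK_NSR-Station': 'Asd'}
```

A plain dictionary is only one way to keep the lookup small and in memory; a broadcast table or cache would serve the same purpose.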

BA important details

Has a 1-to-(0..1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow …


BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station [valuation] Source NSS2 table p

BK_NSR-Station  WOZ value   Value rating agency X  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

NSR-Station [adres] Source NSS1 table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT (BC-Timeline for NSR-Station)

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00
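The combined timeline for station Ut can be reproduced by taking every load timestamp from both Concept Contexts and closing each interval one second before the next one starts. The sketch below is an illustration under assumptions rather than the author's 'clearly defined and performant approach', which the slides do not spell out. The names `bc_timeline` and `at` are hypothetical, and the slide dates are read as day-month-year.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # "31-12-9999 00:00:00" on the slide


def bc_timeline(*cc_load_timestamps):
    """Merge the load timestamps of several Concept Contexts of ONE business key
    into consecutive (combined_load_dts, combined_load_end_dts) intervals."""
    starts = sorted(set().union(*cc_load_timestamps))
    ends = [nxt - timedelta(seconds=1) for nxt in starts[1:]] + [OPEN_END]
    return list(zip(starts, ends))


def at(year, month, day):
    return datetime(year, month, day, 22)  # every load on the slide is at 22:00:00


adres_ut = [at(2013, 6, 5), at(2015, 7, 4)]
valuation_ut = [at(2014, 1, 1), at(2015, 1, 1), at(2016, 3, 1)]
timeline = bc_timeline(adres_ut, valuation_ut)
for start, end in timeline:
    print(start, "->", end)
```

This yields five intervals for Ut, starting at 5-6-2013, 1-1-2014, 1-1-2015, 4-7-2015 and 1-3-2016, which agrees with the BCT rows on the slide.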

modelling

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG

3) Fewer joins

Relation + technical validity timeline + relation context, together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline (BC-Timeline)

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to AVOID this …

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Like Data Vault (2), BUT:

Always starts with conceptual data modeling

NOT the primary location of Data & History
• Virtualised (only if performance allows)
• Should be deterministic

No Link Satellites

No surrogate or hash keys, only 'Concatenated Natural Business Keys'

Explicit helper entities


Business Concept (BC)

A UNIQUE, domain-specific point of integration

… a business entity

… within its own domain

… does not necessarily need to be enterprise-wide

Why not 'enterprise wide'?

Company Customer
Sales Customer
International Sales Customer
Local Sales Customer
Marketing Customer
Customer …





Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Connector Driving key

Explicitly defining a driving key as metadata…

• Gives business understanding
• Makes it possible for a Connector to correctly handle delta data deliveries… so that a change (on the driving key) is not registered as a new 'connection'
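
How a declared driving key might steer delta loading can be sketched as below, using the driving key of the NSR-Travelmovement example (NS-Travelcard + Checkin timestamp). The function `load_delta`, the column names and the scenario (a check-in delivered first, the completed trip a day later) are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)
# Driving key declared as metadata (hypothetical column names).
DRIVING_KEY = ("bk_travelcard", "checkin_ts")
META_COLUMNS = ("load_dts", "load_end_dts")


def load_delta(connector, delta_row, load_dts, driving_key=DRIVING_KEY):
    """Load one delta row into the Connector and report what happened."""
    key = tuple(delta_row[c] for c in driving_key)
    for row in connector:
        same_key = tuple(row[c] for c in driving_key) == key
        if same_key and row["load_end_dts"] == OPEN_END:
            payload = {k: v for k, v in row.items() if k not in META_COLUMNS}
            if payload == delta_row:
                return "unchanged"
            # Same driving key, different content: version the existing
            # connection instead of registering a new one.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            connector.append({**delta_row, "load_dts": load_dts, "load_end_dts": OPEN_END})
            return "new version"
    connector.append({**delta_row, "load_dts": load_dts, "load_end_dts": OPEN_END})
    return "new connection"


connector = []
trip = {
    "bk_station_from": "Asd", "bk_station_to": None, "bk_trainseries": "IC 855",
    "bk_travelcard": "3528 0234 2073 1234",
    "checkin_ts": datetime(2016, 4, 5, 8, 49, 32), "checkout_ts": None,
}
first = load_delta(connector, trip, datetime(2016, 4, 5, 22))
completed = {**trip, "bk_station_to": "Ut", "checkout_ts": datetime(2016, 4, 5, 9, 40, 12)}
second = load_delta(connector, completed, datetime(2016, 4, 6, 22))
```

Without the driving key, the second delivery would look like a different combination of keys and end up as a second, unrelated connection.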

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system], on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag], OR register a delete as a new record

Example

[Diagram: the Business Concepts NSR-Station, NS-Travelcard, NS-Trainseries and NS-Traveller with their Concept Contexts, now joined by the Connector NSR-Travelmovement, which carries a Checkin timestamp and a 'from' and 'to' relation]

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

Connector NSR-Travelmovement (between NSR-Station, NS-Travelcard and NS-Trainseries)

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No.

We add two mandatory HELPER entities

modelling

Business Alias

To help switching from sources that are tied together by technical (surrogate) keys… to a Business Key based model

It's a LOOKUP table that translates the technical key to the Business Key

Example

[Diagram: NSR-Station with Concept Contexts [valuation] (Source: NSS2, table p), [description] (Source: NSS1, table x) and [adres] (Source: NSS1, table y), plus two Business Alias entities]

BA-NSS1: key lookup for NSS1 source tables
BA-NSS2: key lookup for NSS2 source tables

Example-data

BA-NSS1: key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA important details

Has a 1-to-(0..1) relation with a Business Concept

More difficult to be virtualised

Lookup table should be kept small; therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably 'in memory' somehow…
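
One possible reading of the Business Alias as a small in-memory lookup is sketched below. The surrogate-key values come from the BA-NSS1 example data; the function `to_business_key`, the source column name `station_id` and the sample source rows are hypothetical.

```python
# Hypothetical in-memory Business Alias for source NSS1:
# source surrogate key -> Business Key of NSR-Station.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_rows, alias, key_column="station_id"):
    """Replace the source's surrogate key with the Business Key.

    Rows whose surrogate key is unknown to the alias are returned
    separately instead of being dropped silently.
    """
    resolved, unresolved = [], []
    for row in source_rows:
        bk = alias.get(row[key_column])
        if bk is None:
            unresolved.append(row)
            continue
        rest = {k: v for k, v in row.items() if k != key_column}
        resolved.append({"bk_nsr_station": bk, **rest})
    return resolved, unresolved


# Made-up source rows: one known surrogate key, one unknown.
rows = [
    {"station_id": 123522, "postcode": "3500GJ"},
    {"station_id": 999999, "postcode": "unknown"},
]
resolved, unresolved = to_business_key(rows, BA_NSS1)
```

Doing the translation once, while loading, keeps the surrogate key out of the Concept Context itself, which is how the note about not doing the key lookup in the Concept Context entity is interpreted here.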

BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept

Like a Data Vault Point-in-Time construct

But mandatory

And with a clearly defined and performant approach

Example

NSR-Station with Concept Contexts [valuation] (Source: NSS2, table p) and [adres] (Source: NSS1, table y)

[valuation], Source: NSS2, table p

BK_NSR-Station  WOZ value   Value (rating agency X)  META_Load_dts      META_Load_end_dts
Ut              20 million  18 million               1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million               1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million               1-3-2016 22:00:00  31-12-9999 00:00:00

BC-Timeline for NSR-Station

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

[adres], Source: NSS1, table y

BK_NSR-Station  Postadres_postcode  GPS               …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              52.08954 5.11064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             52.37269 4.89299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             52.37269 4.89299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …                 …  …            …                  …

BCT modelling
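
The combined timeline in the example can be reproduced by collecting every load timestamp of the two Concept Contexts per Business Key and turning consecutive timestamps into intervals. The sketch below is one way to do that, not the approach prescribed by the slides; the function name `build_bc_timeline` and the tuple layout are hypothetical, and it assumes each context history is gap-free (no deletes).

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)


def build_bc_timeline(*context_histories):
    """Merge the load timestamps of several Concept Contexts into one
    validity timeline per Business Key.

    Each history is a list of (business_key, load_dts) tuples.
    """
    starts = {}
    for history in context_histories:
        for bk, load_dts in history:
            starts.setdefault(bk, set()).add(load_dts)
    timeline = []
    for bk in sorted(starts):
        points = sorted(starts[bk])
        for start, nxt in zip(points, points[1:] + [None]):
            # Each interval ends one second before the next change point.
            end = OPEN_END if nxt is None else nxt - timedelta(seconds=1)
            timeline.append((bk, start, end))
    return timeline


# Load timestamps taken from the [adres] and [valuation] example tables.
adres = [
    ("Ut", datetime(2013, 6, 5, 22)),
    ("Ut", datetime(2015, 7, 4, 22)),
    ("Asd", datetime(2013, 6, 5, 22)),
]
valuation = [
    ("Ut", datetime(2014, 1, 1, 22)),
    ("Ut", datetime(2015, 1, 1, 22)),
    ("Ut", datetime(2016, 3, 1, 22)),
]
timeline = build_bc_timeline(adres, valuation)
```

For these inputs the result has five intervals for Ut and one for Asd, matching the Combined_Load_dts / Combined_Load_end_dts rows of the example.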

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

Data + History

Subjects + Integration

2) NO HASHED BUSINESS KEYS or surrogate keys

Only concatenated ones

Photo: httpwwwcannabisculturecomfilesimages6hashbrickJPG
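
A concatenated business key such as the IC|855 value in the NS-Trainseries example data could be built along these lines. The pipe delimiter is taken from that example; the function name `concat_business_key`, the trimming of whitespace and the check for a delimiter inside a key part are assumptions added for illustration.

```python
DELIMITER = "|"  # as seen in example keys such as IC|855


def concat_business_key(*parts, delimiter=DELIMITER):
    """Join key parts into one readable business key, without hashing."""
    cleaned = [str(p).strip() for p in parts]
    for part in cleaned:
        if delimiter in part:
            # A delimiter inside a part would make two different keys collide.
            raise ValueError(f"key part {part!r} contains the delimiter {delimiter!r}")
    return delimiter.join(cleaned)


key = concat_business_key("IC", 855)
```

Unlike a hash, the resulting key stays human-readable and can be split back into its parts.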

3) Less joins

Relation + Technical validity timeline + Relation context: together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Concept Timeline

+ explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

Why not 'enterprise wide'?

[Diagram: Company Customer; Sales Customer International; Sales Customer Local; Sales Customer; Marketing Customer; Customer …]

Business Concept Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system], on table / file level (lowest possible)

Example-Data

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

NS-Travelcard

Business Key
3528 0234 2073 1234
3528 0234 2073 5678
…

NS-Trainseries

Business Key
IC|855
IC|8852
Sp|7455
St|16050
…

NS-Traveller

Business Key
CRM-RW123456
CRM-LAS224466
…

BC important notes

Easiest entity to be virtualised (if performance allows)

No hashing & no surrogate BUSINESS KEYS (not by default at least)

Concept Context (CC)

Contains Context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only

Concatenatedones

rwerschkull

nllinkedincominrogierwerschkull

3) Less

joinsRelation

+Technical validity timeline

+ Relation context

Together in one entityPhoto credit Public Domain

rwerschkull

nllinkedincominrogierwerschkull

4) Explicit

HelpEREntities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Hope to

AVOIDThishellip

httpxkcdcom927

PS-3CA new PROPOSEDensemble modelling technique

Help

needed

Questions

About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Entity level [Description]

[Owner Responsible]

Column level [Load Date Timestamp]

[Source system] on table file level (lowest possible)

Business Concept Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only

Concatenatedones

rwerschkull

nllinkedincominrogierwerschkull

3) Less

joinsRelation

+Technical validity timeline

+ Relation context

Together in one entityPhoto credit Public Domain

rwerschkull

nllinkedincominrogierwerschkull

4) Explicit

HelpEREntities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Hope to

AVOIDThishellip

httpxkcdcom927

PS-3CA new PROPOSEDensemble modelling technique

Help

needed

Questions

About Me

lsquoHead of BIrsquo Spillgames

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Example-Data

NSR-Station

NS-

Travelcard

NS-

Trainseries

Business Key

IC|855

IC|8852

Sp|7455

St|16050

hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

NS-

Traveller

Business Key

3528 0234 2073 1234

3528 0234 2073 5678

hellip

Business Key

CRM-RW123456

CRM-LAS224466

hellip

rwerschkull

nllinkedincominrogierwerschkull

Most easy entity to be virtualised(if performance allows)

No Hashing amp No surrogateBUSINESSKEYS

(not by default at least)

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Containts Context about a Concept In a historical way

hellipLike a Data Vault Satellite

Every CC belongs to only one BC

Seperate entity per source system table stream

Concept Context (CC)

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

Example

Business Concept: NSR-Station

[valuation] Source NSS2 table p

BK_NSR-Station   WOZ value    Value rating agency X   META_Laad_dts        META_Laad_eind_dts
Ut               20 million   18 million              1-1-2014 22:00:00    1-1-2015 21:59:59
Ut               22 million   18 million              1-1-2015 22:00:00    1-3-2016 21:59:59
Ut               22 million   23 million              1-3-2016 22:00:00    31-12-9999 00:00:00

BCT (BC-Timeline)

BK_NSR-Station   Combined_Load_dts    Combined_Load_end_dts
Ut               5-6-2013 22:00:00    1-1-2014 21:59:59
Ut               1-1-2014 22:00:00    1-1-2015 21:59:59
Ut               1-1-2015 22:00:00    4-7-2015 21:59:59
Ut               4-7-2015 22:00:00    1-3-2016 21:59:59
Ut               1-3-2016 22:00:00    31-12-9999 00:00:00
Asd              5-6-2013 22:00:00    31-12-9999 00:00:00

[adres] Source NSS1 table y

BK_NSR-Station   Postadres_postcode   GPS                 …   META_source   META_Load_dts        META_Load_end_dts
Ut               3500GJ               52.08954, 5.11064   …   NSS1_y        5-6-2013 22:00:00    4-7-2015 21:59:59
Ut               3511 CE              52.37269, 4.89299   …   NSS1_y        4-7-2015 22:00:00    31-12-9999 00:00:00
Asd              1012 AB              52.37269, 4.89299   …   NSS1_y        5-6-2013 22:00:00    31-12-9999 00:00:00
Asa              …                    …                   …   …             …                    …
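How the combined timeline in the BCT example follows from the two Concept Contexts can be sketched in Python. This is a minimal illustration under assumptions, not the "clearly defined and performant approach" itself, which the slides do not spell out: it takes the load timestamps of every Concept Context row per Business Key, sorts the distinct values, and closes each interval one second before the next one starts. The function name and the input shape are hypothetical.

```python
from datetime import datetime, timedelta

# Open-ended rows carry this high date, as in the example tables.
HIGH_DATE = datetime(9999, 12, 31)


def build_bc_timeline(context_load_dts):
    """Combine Concept Context load timestamps into one timeline per key.

    context_load_dts maps a Business Key to the load timestamps of all
    rows, across all Concept Contexts, that belong to that key.
    """
    timeline = []
    for business_key, starts in context_load_dts.items():
        ordered = sorted(set(starts))
        for i, start in enumerate(ordered):
            if i + 1 < len(ordered):
                # Close the interval one second before the next change.
                end = ordered[i + 1] - timedelta(seconds=1)
            else:
                end = HIGH_DATE
            timeline.append((business_key, start, end))
    return timeline


# Load timestamps from the [adres] and [valuation] example rows.
timeline = build_bc_timeline({
    "Ut": [
        datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22),   # [adres]
        datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22),   # [valuation]
        datetime(2016, 3, 1, 22),                             # [valuation]
    ],
    "Asd": [datetime(2013, 6, 5, 22)],                        # [adres]
})
```

With the example data this yields five intervals for Ut and one for Asd, matching the BCT rows.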


What makes PS-3C a different Ensemble?


1) Explicitly splitting the work

Data + History

Subjects + Integration

2) No hashed Business Keys or surrogate keys

Only concatenated ones
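A concatenated Business Key, as opposed to a hashed or surrogate one, could look like the following Python sketch. The slides do not prescribe a delimiter or any cleaning rules, so the `|` separator, the whitespace trimming and the function name are assumptions for illustration only.

```python
def concat_business_key(*parts, sep="|"):
    """Build a readable composite Business Key by concatenating its parts.

    The separator must not occur inside a part, otherwise two different
    keys could produce the same concatenated value.
    """
    cleaned = []
    for part in parts:
        text = "" if part is None else str(part).strip()
        if sep in text:
            raise ValueError(f"key part {text!r} contains the separator {sep!r}")
        cleaned.append(text)
    return sep.join(cleaned)


# Example: a travelcard number combined with a check-in timestamp.
key = concat_business_key("3528 0234 2073 1234", "2016-04-05 08:49:32")
```

Unlike a hash, the resulting key stays human-readable and can be split back into its parts.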

3) Less joins

Relation + Technical validity timeline + Relation context

Together in one entity

4) Explicit helper entities

Business Alias

Business Concept Timeline (BC-Timeline)

+ explicitly define Driving key(s)

Hope to AVOID this…

http://xkcd.com/927

PS-3C: A new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details: nl.linkedin.com/in/rogierwerschkull

rogierwerschkull@gmail.com

@rwerschkull

BC important notes

Easiest entity to virtualise (if performance allows)

No hashing & no surrogate Business Keys (not by default at least)


3C - Details: Concept Context (CC)

Contains context about a Concept, in a historical way

…like a Data Vault Satellite

Every CC belongs to only one BC

Separate entity per source system table / stream

Concept Context Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Example-data

NSR-Station: [adres] Source NSS1 table y

BK_NSR-Station   Postadres_postcode   GPS                 …   META_source   META_Load_dts        META_Load_end_dts      META_Deleted_Ind
Ut               3500GJ               52.08954, 5.11064   …   NSS1_y        5-6-2013 22:00:00    5-6-2015 21:59:59      0
Ut               3511 CE              52.37269, 4.89299   …   NSS1_y        5-6-2015 22:00:00    31-12-9999 00:00:00    0
Asd              1012 AB              52.37269, 4.89299   …   NSS1_y        5-6-2013 22:00:00    31-12-9999 00:00:00    0
Asa              …                    …                   …   …             …                    …

CC important notes

More difficult to virtualise: depends on the semantic gap with the source

But do make it virtual when 'streaming data' is necessary

Because we have the PS layer, exposing all columns is not necessary

Refactoring is easier…


3C - Details: Connector (C)

Relations between Concepts + Context

In a historical way

…a merger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point in time

Connector Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for the Connector to correctly handle delta data deliveries…
• so that a change (on the driving key) is not registered as a new 'connection'
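The role of the driving key in delta handling can be sketched in Python as follows. This is an assumed, simplified loader rather than the author's method: a delta row that matches a current Connector row on the driving key is treated as a new version of the same connection (the old row is end-dated), not as an additional connection. The column names, the `apply_delta` helper and the intermediate check-in row without a destination are hypothetical.

```python
from datetime import datetime, timedelta

HIGH_DATE = datetime(9999, 12, 31)

# Driving key from the travel movement example: travelcard + check-in timestamp.
DRIVING_KEY = ("travelcard", "checkin")


def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Apply one delta row to the Connector rows (modified in place)."""
    key = tuple(delta_row[c] for c in driving_key)
    for row in connector_rows:
        is_current = row["load_end_dts"] == HIGH_DATE
        same_key = tuple(row[c] for c in driving_key) == key
        if is_current and same_key:
            if all(row[c] == v for c, v in delta_row.items()):
                return connector_rows  # nothing changed, nothing to register
            # Same connection, changed content: end-date the old version.
            row["load_end_dts"] = load_dts - timedelta(seconds=1)
            break
    connector_rows.append({**delta_row, "load_dts": load_dts, "load_end_dts": HIGH_DATE})
    return connector_rows


rows = []
first = {"station_from": "Asd", "station_to": None,
         "travelcard": "3528 0234 2073 1234",
         "checkin": datetime(2016, 4, 5, 8, 49, 32), "checkout": None}
apply_delta(rows, first, DRIVING_KEY, datetime(2016, 4, 5, 22))

# The check-out arrives in a later delivery for the same driving key.
second = {**first, "station_to": "Ut", "checkout": datetime(2016, 4, 5, 9, 40, 12)}
apply_delta(rows, second, DRIVING_KEY, datetime(2016, 4, 6, 22))
```

Without the driving key, the second delivery would look like a second, unrelated travel movement; with it, exactly one version per connection stays current.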

Connector Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation] Source NSS2 table p
[description] Source NSS1 table x
[adres] Source NSS1 table y
[description] Source NTR table q
[ovchip_personal] Source NSR table r
[ovchip_on-usage] Source NSR table s
[personal_details] Source NSR table t
[adres_data] Source NSR table t

Connector: NSR-Travelmovement (Checkin timestamp; station from, station to)

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

Connector NSR-Travelmovement (Checkin timestamp; from, to) between NSR-Station, NS-Travelcard and NS-Trainseries

BK_NSR-Station-from   BK_NSR-Station-to   BK_NS_Trainseries   BK_NS-Travelcard      Checkin timestamp    Checkout timestamp   META_Load_dts        META_Load_end_dts
Asd                   Ut                  IC 855              3528 0234 2073 1234   5-4-2016 8:49:32     5-4-2016 9:40:12     6-4-2016 22:00:00    31-12-9999 00:00:00
Ut                    Asd                 IC 855              3528 0234 2073 1234   5-4-2016 18:10:09    5-4-2016 18:55:20    6-4-2016 22:00:00    31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities





Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Concept Context Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

Example-dataBK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts META_Deleted_Ind

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 5-6-2015 215959 0

Ut 3511 CE 5237269 489299 hellip NSS1_y 5-6-2015 220000 31-12-9999 000000 0

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000 0

Asa hellip hellip hellip hellip hellip hellip

NSR-Station

[adres]Source NSS1 table y

More difficult to be virtualisedDepends on semantic gap with source

But do make virtual when lsquostreaming datarsquo is necessary

Because we have PS layerExposing all columns not necessary

Refactoring is more easyhellip

BC important notes

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]

Business Alias

Helps to switch from sources that are tied together by technical (surrogate) keys to a Business Key based model.

It's a LOOKUP table that translates the technical key to the Business Key.

Example

NSR-Station, with Concept Contexts [valuation] (source NSS2, table p), [description] (source NSS1, table x) and [adres] (source NSS1, table y)

• BA-NSS1: key lookup for NSS1 source tables
• BA-NSS2: key lookup for NSS2 source tables

Example data

BA-NSS1 (key lookup for NSS1 source tables):

| Business Key | NSS1_Surrogate_key |
| --- | --- |
| Ut | 123522 |
| Asd | 666323 |
| Asa | 222443 |
| … | … |

NSR-Station:

| Business Key | META_Source | META_Load_dts |
| --- | --- | --- |
| Ut | NSS1_y | 5-6-2015 22:00:00 |
| Asd | NSS1_y | 5-6-2015 22:00:00 |
| Asa | NSS2_p | 6-6-2015 22:00:00 |
| … | … | … |
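As an illustration of what such a lookup does, here is a small Python sketch using the BA-NSS1 values from the example data. The function `to_business_key` and the source column name `station_id` are hypothetical; the slides do not show how the source tables name their surrogate key.

```python
# Business Alias for source NSS1: technical (surrogate) key -> Business Key.
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(source_row, alias, key_column="station_id"):
    """Return a copy of source_row keyed by Business Key instead of surrogate key."""
    surrogate = source_row[key_column]
    if surrogate not in alias:
        raise KeyError(f"no Business Alias entry for surrogate key {surrogate!r}")
    translated = {k: v for k, v in source_row.items() if k != key_column}
    translated["BK_NSR-Station"] = alias[surrogate]
    return translated


# Hypothetical source row from an NSS1 table.
row = {"station_id": 666323, "Postadres_postcode": "1012 AB"}
keyed = to_business_key(row, BA_NSS1)
```

Holding the alias as an in-memory dictionary is only one way to read the 'in memory' preference; any small, fast key-value structure would serve the same purpose.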

BA: important details

• Has a 1 to (0..1) relation with a Business Concept
• More difficult to virtualise; the lookup table should be kept small
• Therefore, DO NOT do the key lookup in the Concept Context entity
• Load / generate together with the BC
• Preferably 'in memory' somehow…


BC-Timeline

Integrates the validity timelines of the Concept Contexts belonging to a Business Concept.

• Like a Data Vault Point-in-time construct
• But mandatory
• And with a clearly defined and performant approach

Example

NSR-Station, Concept Context [valuation] (source NSS2, table p):

| BK_NSR-Station | WOZ value | Value rating agency X | META_Load_dts | META_Load_end_dts |
| --- | --- | --- | --- | --- |
| Ut | 20 million | 18 million | 1-1-2014 22:00:00 | 1-1-2015 21:59:59 |
| Ut | 22 million | 18 million | 1-1-2015 22:00:00 | 1-3-2016 21:59:59 |
| Ut | 22 million | 23 million | 1-3-2016 22:00:00 | 31-12-9999 00:00:00 |

NSR-Station, Concept Context [adres] (source NSS1, table y):

| BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts |
| --- | --- | --- | --- | --- | --- | --- |
| Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 4-7-2015 21:59:59 |
| Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 4-7-2015 22:00:00 | 31-12-9999 00:00:00 |
| Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 |
| Asa | … | … | … | … | … | … |

BCT (the combined timeline of both Concept Contexts):

| BK_NSR-Station | Combined_Load_dts | Combined_Load_end_dts |
| --- | --- | --- |
| Ut | 5-6-2013 22:00:00 | 1-1-2014 21:59:59 |
| Ut | 1-1-2014 22:00:00 | 1-1-2015 21:59:59 |
| Ut | 1-1-2015 22:00:00 | 4-7-2015 21:59:59 |
| Ut | 4-7-2015 22:00:00 | 1-3-2016 21:59:59 |
| Ut | 1-3-2016 22:00:00 | 31-12-9999 00:00:00 |
| Asd | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 |
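The combined timeline for station Ut can be reproduced with a short Python sketch. This is one possible way to derive it, not necessarily the 'clearly defined and performant approach' the slides have in mind: the function name `bc_timeline` is hypothetical, and the sketch assumes each Concept Context has a gapless history without deletes, so that only the load timestamps matter.

```python
from datetime import datetime, timedelta

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data


def bc_timeline(*context_load_dts):
    """Combine the load timestamps of several Concept Contexts of one
    Business Key into consecutive validity intervals."""
    starts = sorted({dts for context in context_load_dts for dts in context})
    intervals = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # Close each interval one second before the next one starts.
            end = starts[i + 1] - timedelta(seconds=1)
        else:
            end = OPEN_END
        intervals.append((start, end))
    return intervals


# META_Load_dts values for Business Key 'Ut' from the two Concept Contexts.
valuation = [datetime(2014, 1, 1, 22), datetime(2015, 1, 1, 22),
             datetime(2016, 3, 1, 22)]
adres = [datetime(2013, 6, 5, 22), datetime(2015, 7, 4, 22)]

timeline_ut = bc_timeline(valuation, adres)
for start, end in timeline_ut:
    print(start, end)
```

The five intervals produced for Ut match the five BCT rows in the example: every load timestamp in either Concept Context opens a new combined interval.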

modelling

What makes PS-3C a different Ensemble?

1) Explicitly splitting the work

• Data + History
• Subjects + Integration

2) NO hashed Business Keys or surrogate keys

Only concatenated ones.

httpwwwcannabisculturecomfilesimages6hashbrickJPG
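A concatenated key keeps the original key parts readable, where a hash would hide them. The sketch below shows one way to build such a key in Python. The function name `concat_business_key`, the `|` separator and the ISO timestamp format are assumptions made for illustration; the slides do not prescribe a delimiter or format.

```python
def concat_business_key(*parts, sep="|"):
    """Build a readable concatenated key from its parts."""
    cleaned = []
    for part in parts:
        text = str(part).strip()
        if sep in text:
            # A separator inside a part would make the key ambiguous.
            raise ValueError(f"separator {sep!r} occurs in key part {text!r}")
        cleaned.append(text)
    return sep.join(cleaned)


# Example: the driving key NS-Travelcard + Checkin timestamp as one key.
key = concat_business_key("3528 0234 2073 1234", "2016-04-05T08:49:32")
print(key)
```

Rejecting parts that contain the separator is a design choice in this sketch: it avoids two different key combinations collapsing into the same string.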

3) Fewer joins

Relation + technical validity timeline + relation context: together in one entity.

4) Explicit HELPER entities

• Business Alias
• Business Concept Timeline (BC-Timeline)
• + explicitly define driving key(s)

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?


Example data

NSR-Station, Concept Context [adres] (source NSS1, table y):

| BK_NSR-Station | Postadres_postcode | GPS | … | META_source | META_Load_dts | META_Load_end_dts | META_Deleted_Ind |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ut | 3500GJ | 52.08954, 5.11064 | … | NSS1_y | 5-6-2013 22:00:00 | 5-6-2015 21:59:59 | 0 |
| Ut | 3511 CE | 52.37269, 4.89299 | … | NSS1_y | 5-6-2015 22:00:00 | 31-12-9999 00:00:00 | 0 |
| Asd | 1012 AB | 52.37269, 4.89299 | … | NSS1_y | 5-6-2013 22:00:00 | 31-12-9999 00:00:00 | 0 |
| Asa | … | … | … | … | … | … | … |
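The load and load-end timestamps make it possible to ask which version of a Concept Context was valid at a given moment. A minimal Python sketch of such an 'as of' lookup over the [adres] example rows follows. The function name `as_of` and the tuple layout are hypothetical, and treating the end timestamp as inclusive is an assumption based on the 21:59:59 / 22:00:00 pattern in the data.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # open-ended validity, as in the example data

# (BK_NSR-Station, Postadres_postcode, META_Load_dts, META_Load_end_dts,
#  META_Deleted_Ind), taken from the example rows.
ADRES = [
    ("Ut", "3500GJ", datetime(2013, 6, 5, 22), datetime(2015, 6, 5, 21, 59, 59), 0),
    ("Ut", "3511 CE", datetime(2015, 6, 5, 22), OPEN_END, 0),
    ("Asd", "1012 AB", datetime(2013, 6, 5, 22), OPEN_END, 0),
]


def as_of(rows, business_key, moment):
    """Return the non-deleted row valid for business_key at moment, or None."""
    for row in rows:
        bk, _postcode, load_dts, load_end_dts, deleted = row
        if bk == business_key and load_dts <= moment <= load_end_dts and not deleted:
            return row
    return None


old = as_of(ADRES, "Ut", datetime(2014, 1, 1))
new = as_of(ADRES, "Ut", datetime(2016, 1, 1))
print(old[1], new[1])
```

A moment before the first load timestamp of a Business Key returns `None`, since no version was known yet.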

BC: important notes

• More difficult to virtualise: depends on the semantic gap with the source
• But do make it virtual when 'streaming data' is necessary
• Because we have the PS layer, exposing all columns is not necessary
• Refactoring is easier…


Connector (C)

• Relations between Concepts + Context, in a historical way
• …a merger of the Data Vault Link + Link Satellite
• Must ALWAYS have a driving key defined: a (sub)set of keys that makes a Connector unique at one point in time

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Relations between Concepts + Context

In a historical way

hellipMerger of Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that make a Connector unique at one point

in time

Connector (C)

rwerschkull

nllinkedincominrogierwerschkull

Explicitly defining a driving key as metadatahellip

Gives business understanding

Makes it possible Connector can correctly handle delta data deliverieshellipbull so that a change (on the driving key)

bull is not registered as a new lsquoconnectionrsquo

Connector Driving key

rwerschkull

nllinkedincominrogierwerschkull

Entity level [Description] [Owner Responsible]

Column level [Load Date Timestamp] [Source system] on table file level (lowest possible)

Not mandatory for streaming databull [Load End Date Timestamp]bull [Deleted Flag] OR register a delete as a new record

Connector Metadata

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-StationNS-

Travelcard

NS-

TrainseriesNS-

Traveller

[valuation]Source NSS2 table p

[description]SourceNSS1 table x

[adres]Source NSS1 table y

[description]Source NTR table q

[ovchip_

personal]Source NSR table r

[ovchip_

on-usageSource NSR table s

[personal_

details]Source NSR table t

[adres_

data]Source NSR table t rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

Driving Key

NS-Travelcard

+Checkin timestamp

Example-Data

NSR-StationNS-

Travelcard

NS-

Trainseries

rwerschkull

nllinkedincominrogierwerschkull

NSR-

TravelmovementCheckin timestamp

from

to

BK_NSR-

Station-from

BK_NSR-

Station-to

BK_NS_

Trainseries

BK_NS-Travelcard Checkin

timestamp

Checkout

timestamp

META_Load_dts META_Load_end_dts

Asd Ut IC 855 3528 0234 2073

1234

5-4-2016 84932 5-4-2016 94012 6-4-2016

220000

31-12-9999

000000

Ut Asd IC 855 3528 0234 2073

1234

5-4-2016 181009 5-4-2016 185520 6-4-2016

220000

31-12-9999

000000

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO hashed Business Keys or surrogate keys

Only concatenated ones

httpwwwcannabisculturecomfilesimages6hashbrickJPG
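A concatenated Business Key could be built along the following lines. This is a minimal sketch, not a rule taken from the slides: the function name `concat_business_key`, the `|` separator and the backslash escaping are all assumptions chosen for illustration. The escaping is there so that two different combinations of key parts can never produce the same concatenated string.

```python
def concat_business_key(*parts, sep="|", esc="\\"):
    """Join the parts of a composite Business Key into one readable string.

    Hypothetical helper: separator and escape character are assumptions.
    """
    cleaned = []
    for part in parts:
        if part is None:
            raise ValueError("every part of a business key must have a value")
        text = str(part)
        # Escape the escape character first, then the separator, so that
        # ("a|b", "c") and ("a", "b|c") stay distinguishable after joining.
        text = text.replace(esc, esc + esc).replace(sep, esc + sep)
        cleaned.append(text)
    return sep.join(cleaned)


# Example: a travelcard number combined with a check-in timestamp.
key = concat_business_key("3528 0234 2073 1234", "2016-04-05 08:49:32")
print(key)
```

Unlike a hash, the resulting key stays human-readable and can be traced back to its source values without a lookup.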

3) Fewer joins

Relation + technical validity timeline + relation context

Together in one entity

Photo credit: Public Domain

4) Explicit HELPER entities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit: Public Domain

Hope to AVOID this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?

About Me

‘Head of BI’, Spil Games

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull

Connector (C)

Relations between Concepts + Context

In a historical way…

Merger of the Data Vault Link + Link Satellite

Must ALWAYS have a driving key defined = a (sub)set of keys that makes a Connector unique at one point in time

Connector: Driving key

Explicitly defining a driving key as metadata…

Gives business understanding

Makes it possible for the Connector to correctly handle delta data deliveries…
• so that a change (on the driving key)
• is not registered as a new ‘connection’
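How a driving key lets a Connector tell a changed relation from a new one can be sketched as follows. This is an assumption-laden illustration rather than the author's loading procedure: the function `apply_delta`, the in-memory list of dicts and the returned status strings are hypothetical, and the first delivery without a checkout timestamp is invented to show the effect.

```python
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)


def apply_delta(connector_rows, delta_row, driving_key, load_dts):
    """Apply one delivered row to a Connector kept as a list of dicts.

    Hypothetical sketch: delta_row holds only business columns (no META_ columns).
    """
    key = tuple(delta_row[col] for col in driving_key)
    new_row = {**delta_row, "META_Load_dts": load_dts, "META_Load_end_dts": END_OF_TIME}
    for row in connector_rows:
        is_current = row["META_Load_end_dts"] == END_OF_TIME
        if is_current and tuple(row[col] for col in driving_key) == key:
            payload = {k: v for k, v in row.items() if not k.startswith("META_")}
            if payload == delta_row:
                return "unchanged"
            # Same driving key, different content: close the old version
            # instead of treating the delivery as a new connection.
            row["META_Load_end_dts"] = load_dts - timedelta(seconds=1)
            connector_rows.append(new_row)
            return "new version"
    connector_rows.append(new_row)
    return "new connection"


driving_key = ("BK_NS-Travelcard", "Checkin timestamp")
travelmovement = []

# Hypothetical first delivery: the traveller has checked in but not yet out.
first = {
    "BK_NSR-Station-from": "Asd", "BK_NSR-Station-to": None,
    "BK_NS-Travelcard": "3528 0234 2073 1234",
    "Checkin timestamp": datetime(2016, 4, 5, 8, 49, 32),
    "Checkout timestamp": None,
}
# Later delivery of the same movement, now completed.
second = {**first, "BK_NSR-Station-to": "Ut",
          "Checkout timestamp": datetime(2016, 4, 5, 9, 40, 12)}

r1 = apply_delta(travelmovement, first, driving_key, datetime(2016, 4, 5, 22))
r2 = apply_delta(travelmovement, second, driving_key, datetime(2016, 4, 6, 22))
r3 = apply_delta(travelmovement, second, driving_key, datetime(2016, 4, 7, 22))
print(r1, r2, r3)
```

Without the driving key, the second delivery would look like an unrelated row; with it, the Connector ends up holding one closed version and one current version of the same travel movement.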

Connector: Metadata

Entity level: [Description], [Owner / Responsible]

Column level: [Load Date Timestamp], [Source system] on table / file level (lowest possible)

Not mandatory for streaming data:
• [Load End Date Timestamp]
• [Deleted Flag] OR register a delete as a new record

Example

Business Concepts: NSR-Station, NS-Travelcard, NS-Trainseries, NS-Traveller

Concept Contexts:
[valuation]         Source NSS2, table p
[description]       Source NSS1, table x
[adres]             Source NSS1, table y
[description]       Source NTR, table q
[ovchip_personal]   Source NSR, table r
[ovchip_on-usage]   Source NSR, table s
[personal_details]  Source NSR, table t
[adres_data]        Source NSR, table t

Connector: NSR-Travelmovement (Checkin timestamp), from / to

Driving Key: NS-Travelcard + Checkin timestamp

Example-Data

Connector NSR-Travelmovement (Checkin timestamp), relating NSR-Station (from, to), NS-Travelcard and NS-Trainseries:

BK_NSR-Station-from  BK_NSR-Station-to  BK_NS_Trainseries  BK_NS-Travelcard     Checkin timestamp  Checkout timestamp  META_Load_dts      META_Load_end_dts
Asd                  Ut                 IC 855             3528 0234 2073 1234  5-4-2016 8:49:32   5-4-2016 9:40:12    6-4-2016 22:00:00  31-12-9999 00:00:00
Ut                   Asd                IC 855             3528 0234 2073 1234  5-4-2016 18:10:09  5-4-2016 18:55:20   6-4-2016 22:00:00  31-12-9999 00:00:00

Are we there yet? No

We add two mandatory HELPER entities

3C - Details

[Diagram: Business Concept X with Concept Contexts X-A and X-B; Business Concept Y with Concept Contexts Y-A, Y-B and Y-C; a Connector; Business Alias A and Business Alias B; BC-Timeline X]

Business Alias

Helps to switch from sources that are tied together by technical (surrogate) keys…

To a Business Key based model

It’s a LOOKUP table that translates the technical key to the Business Key

Example

Business Concept: NSR-Station
[valuation]    Source NSS2, table p
[description]  Source NSS1, table x
[adres]        Source NSS1, table y

BA-NSS1: Key lookup for NSS1 source tables

BA-NSS2: Key lookup for NSS2 source tables

Example-data

BA-NSS1: Key lookup for NSS1 source tables

Business Key  NSS1_Surrogate_key
Ut            123522
Asd           666323
Asa           222443
…             …

NSR-Station

Business Key  META_Source  META_Load_dts
Ut            NSS1_y       5-6-2015 22:00:00
Asd           NSS1_y       5-6-2015 22:00:00
Asa           NSS2_p       6-6-2015 22:00:00
…             …            …

BA: important details

Has a 1-on-(0,1) relation with a Business Concept

More difficult to virtualise: the lookup table should be kept small

Therefore DO NOT do the key lookup in the Concept Context entity

Load / generate together with the BC

Preferably ‘in memory’ somehow…
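The lookup role of a Business Alias can be illustrated with a small in-memory sketch. The surrogate key values come from the BA-NSS1 example data; everything else (the function `to_business_key`, the column name `station_id`, the incoming rows) is hypothetical and only serves to show the translation step.

```python
# BA-NSS1 held in memory: NSS1 surrogate key -> Business Key (values from the example data).
BA_NSS1 = {123522: "Ut", 666323: "Asd", 222443: "Asa"}


def to_business_key(rows, alias, surrogate_col, bk_col):
    """Swap a source system's surrogate key for the Business Key.

    Returns (translated_rows, unknown_rows); rows whose surrogate key is
    not in the alias are set aside rather than guessed at.
    """
    translated, unknown = [], []
    for row in rows:
        surrogate = row[surrogate_col]
        if surrogate in alias:
            out = {k: v for k, v in row.items() if k != surrogate_col}
            out[bk_col] = alias[surrogate]
            translated.append(out)
        else:
            unknown.append(row)
    return translated, unknown


# Hypothetical incoming rows that still reference stations by surrogate key.
incoming = [
    {"station_id": 123522, "platforms": 16},
    {"station_id": 999999, "platforms": 2},
]
translated, unknown = to_business_key(incoming, BA_NSS1, "station_id", "BK_NSR-Station")
print(translated, unknown)
```

A plain dictionary stands in here for the ‘in memory’ preference; keeping the alias small is what makes this kind of lookup cheap.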

We add Two mandatory

HELPEREntities

Are we there yet No

rwerschkull

nllinkedincominrogierwerschkull

modelling

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

To help switching from sources that are Tied together by technical (surrogate) keyshellip

To a Business Key based model

Itrsquos a LOOKUP table that translates the technical to the

Business Key

Business Alias

Example

NSR-Station

[valuation]Source NSS2 table p

[description]SourceNSS1 table x[adres]

Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

BA-NSS2

Key Lookup voor NSS1 source tables

Key Lookup voor NSS2 source tables

Example-data

NSR-Station

rwerschkull

nllinkedincominrogierwerschkull

BA-NSS1

Key Lookup voor NSS1 source tables

Business Key NSS1_Surrogate_key

Ut 123522

Asd 666323

Asa 222443

hellip hellip

Business

Key

META_Source META_Load_dts

Ut NSS1_y 5-6-2015 220000

Asd NSS1_y 5-6-2015 220000

Asa NSS2_p 6-6-2015 220000

hellip hellip hellip

Has a 1 on (01) relation with a Business Concept

More difficult to be virtualised Lookup table should be kept small

Therefore DO NOT do key lookup in Concept Context entity

Load generate together with BC

Preferably lsquoin memoryrsquo somehowhellip

BA important Details

rwerschkull

nllinkedincominrogierwerschkull

rwerschkull

nllinkedincominrogierwerschkull

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

3C - Details

Integrate the validity timelines of Concept Contextsbelonging to a Business Concept

Like a Data Vault Point-in-time construct

But Mandatory

And with a clearly defined and performantapproach

BC-Timeline

rwerschkull

nllinkedincominrogierwerschkull

Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

Valuation (source NSS2 table p)

BK_NSR-Station  WOZ value   Value Rating agency X  META_Laad_dts      META_Laad_eind_dts
Ut              20 million  18 million             1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              22 million  18 million             1-1-2015 22:00:00  1-3-2016 21:59:59
Ut              22 million  23 million             1-3-2016 22:00:00  31-12-9999 00:00:00

BCT

BK_NSR-Station  Combined_Load_dts  Combined_Load_end_dts
Ut              5-6-2013 22:00:00  1-1-2014 21:59:59
Ut              1-1-2014 22:00:00  1-1-2015 21:59:59
Ut              1-1-2015 22:00:00  4-7-2015 21:59:59
Ut              4-7-2015 22:00:00  1-3-2016 21:59:59
Ut              1-3-2016 22:00:00  31-12-9999 00:00:00
Asd             5-6-2013 22:00:00  31-12-9999 00:00:00

Address (source NSS1 table y)

BK_NSR-Station  Postadres_postcode  GPS             …  META_source  META_Load_dts      META_Load_end_dts
Ut              3500GJ              5208954 511064  …  NSS1_y       5-6-2013 22:00:00  4-7-2015 21:59:59
Ut              3511 CE             5237269 489299  …  NSS1_y       4-7-2015 22:00:00  31-12-9999 00:00:00
Asd             1012 AB             5237269 489299  …  NSS1_y       5-6-2013 22:00:00  31-12-9999 00:00:00
Asa             …                   …               …  …            …                  …
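
The BCT rows in the example can be reproduced by collecting every load timestamp of the valuation and address Concept Contexts per Business Key and cutting the timeline at each of them. The sketch below shows that idea only. It is a naive in-memory version, not the "clearly defined and performant approach" the slides refer to, which is not spelled out here. The function names, the day-month-year reading of the dates, and the one-second gap between slices are assumptions taken from the example data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31)  # open-ended marker used in the example data


def build_bc_timeline(*contexts):
    """Merge the load timestamps of several Concept Contexts per Business Key.

    Each context is a list of (business_key, load_dts) tuples.
    Returns {business_key: [(combined_load_dts, combined_load_end_dts), ...]}.
    """
    starts = defaultdict(set)
    for context in contexts:
        for business_key, load_dts in context:
            starts[business_key].add(load_dts)

    timeline = {}
    for business_key, points in starts.items():
        ordered = sorted(points)
        # Each slice ends one second before the next slice starts.
        ends = [nxt - timedelta(seconds=1) for nxt in ordered[1:]] + [END_OF_TIME]
        timeline[business_key] = list(zip(ordered, ends))
    return timeline


def dts(day, month, year):
    """Load timestamp at 22:00:00, assuming day-month-year dates."""
    return datetime(year, month, day, 22, 0, 0)


# Load timestamps copied from the valuation and address example tables.
valuation = [("Ut", dts(1, 1, 2014)), ("Ut", dts(1, 1, 2015)), ("Ut", dts(1, 3, 2016))]
address = [("Ut", dts(5, 6, 2013)), ("Ut", dts(4, 7, 2015)), ("Asd", dts(5, 6, 2013))]

bct = build_bc_timeline(valuation, address)
for start, end in bct["Ut"]:
    print("Ut", start, end)
```

For "Ut" this yields five slices, one for every change in either context, and for "Asd" a single open-ended slice, which matches the BCT table.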

modelling


What makes PS-3C a different Ensemble


1) Explicitly splitting the work

Data + History

Subjects + Integration


2) No hashed business keys or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only concatenated ones
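
To make "concatenated" concrete, here is one possible way to build such a key from its parts without hashing. The slides do not give a rule for this, so everything in the sketch is an assumption: the function name `concatenated_business_key`, the `|` delimiter, the trimming and upper-casing, and the sample parts.

```python
def concatenated_business_key(*parts: str, delimiter: str = "|") -> str:
    """Join business key parts into one readable key (no hashing)."""
    cleaned = [part.strip().upper() for part in parts]
    for part in cleaned:
        # A part containing the delimiter would make the key ambiguous.
        if delimiter in part:
            raise ValueError(f"key part {part!r} contains the delimiter {delimiter!r}")
    return delimiter.join(cleaned)


# Hypothetical two-part key; the parts are made up for illustration.
key = concatenated_business_key("NSS1", " Ut ")
print(key)
```

Unlike a hash, the resulting key stays human-readable and can be traced back to its parts.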


3) Fewer joins

Relation + technical validity timeline + relation context: together in one entity

Photo credit: Public Domain


4) Explicit helper entities

Business Alias

Business Component Timeline

+ explicitly define driving key(s)

Photo credit: Public Domain

Hope to avoid this…

http://xkcd.com/927

PS-3C: a new PROPOSED ensemble modelling technique

Help needed

Questions?










Example

NSR-Station

[valuation]Source NSS2 table p

[adres]Source NSS1 table y

rwerschkull

nllinkedincominrogierwerschkull

BK_NSR-

Station

WOZ

waarde

Waarde

Ratingbureau X

META_Laad_dts META_Laad_eind_dts

Ut 20 milj 18 milj 1-1-2014 220000 1-1-2015 215959

Ut 22 milj 18 milj 1-1-2015 220000 1-3-2016 215959

Ut 22 milj 23 milj 1-3-2016 220000 31-12-9999 000000

BK_NSR-

Station

Combined_Load_dts Combined_Load_end_dts

Ut 5-6-2013 220000 1-1-2014 215959

Ut 1-1-2014 220000 1-1-2015 215959

Ut 1-1-2015 220000 4-7-2015 215959

Ut 4-7-2015 220000 1-3-2016 215959

Ut 1-3-2016 220000 31-12-9999 000000

Asd 5-6-2013 220000 31-12-9999 000000

BK_NSR-

Station

Postadres_

postcode

GPS hellip META_

source

META_Load_dts META_Load_end_dts

Ut 3500GJ 5208954 511064 hellip NSS1_y 5-6-2013 220000 4-7-2015 215959

Ut 3511 CE 5237269 489299 hellip NSS1_y 4-7-2015 220000 31-12-9999 000000

Asd 1012 AB 5237269 489299 hellip NSS1_y 5-6-2013 220000 31-12-9999 000000

Asa hellip hellip hellip hellip hellip hellip

BCT

modelling

rwerschkull

nllinkedincominrogierwerschkull

What

Makes PS-3Ca Different Ensemble

Business

Concept

X

Concept

ContextX-A

Concept

ContextX-B

Business

Concept

Y

Concept

Context Y-A

Concept

ContextY-B

Concept

ContextY-C

Connector

Business Alias A

Business Alias B

BC-Timeline X

rwerschkull

nllinkedincominrogierwerschkull

1) Explicitly Splitting The work

Data

+History

Subjects

+Integration

rwerschkull

nllinkedincominrogierwerschkull

2) NO HASHED BUSINeSS KEYS

or surrogate keys

httpwwwcannabisculturecomfilesimages6hashbrickJPG

Only

Concatenatedones

rwerschkull

nllinkedincominrogierwerschkull

3) Less

joinsRelation

+Technical validity timeline

+ Relation context

Together in one entityPhoto credit Public Domain

rwerschkull

nllinkedincominrogierwerschkull

4) Explicit

HelpEREntities

Business Alias

Business Component Timeline

+ explicitly define Driving key(s)

Photo credit Public Domainrwerschkull

nllinkedincominrogierwerschkull

Hope to

AVOIDThishellip

httpxkcdcom927

PS-3CA new PROPOSEDensemble modelling technique

Help needed

Questions

About Me

'Head of BI', Spil Games

Certified Data Vault modeler since 2009

Contact details nllinkedincominrogierwerschkull

rogierwerschkullgmailcom

rwerschkull
