Data Vault Modeling

Upload: rajnish-kumar-ravi

Post on 08-Apr-2018


  • 8/7/2019 Data Vault Modeling



    Contents

    • Introduction
    • History and philosophy of Data Vault
    • Implementation
    • Loading practices
    • Data Vault and dimensional modelling
    • Data Vault scheme as neural network
    • Origin
    • References


    Introduction

    • Data Vault Modeling is a method of designing a database to provide historical storage of data coming in from multiple operational systems, with complete tracing of where all the data in the database came from.
    • The method is designed to be resilient to change in the environment.
    • This purpose is mainly achieved by taking the business organisation as the starting point for the data model, since it is assumed that this will change less often than the operational systems used to support the business.


    History and philosophy of Data Vault

    • In data warehouse modelling there are two well-known competing options for modelling the layer where the data is stored. Either you model according to Kimball, with conformed dimensions and an enterprise data bus, or you model according to Inmon, with the database in third normal form. Both techniques have issues when dealing with changes in the systems feeding the data warehouse. For conformed dimensions you also have to cleanse data (to conform it), and this is undesirable in a number of cases.


    • Data Vault is designed to avoid or minimize the impact of those issues.
    • Dan Linstedt, the creator of the method, describes the resulting database as follows:
    • "The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise."


    • An alternative name for the method is "Common Foundational Integration Modelling Architecture."
    • Data Vault's philosophy is that all data is relevant data, even if it is "wrong". Data being wrong is a business problem and usually not a technical problem. This means you have to be able to capture all the data.
    • Another issue to which Data Vault is a response is the growing need for complete auditability and traceability of all the data in the data warehouse. Due to Sarbanes-Oxley in the USA and similar measures in Europe, this is a relevant topic for many business intelligence implementations.


    Implementation

    • Data Vault attempts to solve the problem of dealing with change in the environment by separating the business keys (that do not mutate as often, because they uniquely identify a business entity) from the attributes of those keys.
    • Attributes change at different rates, so you can group attributes together in small tables called Satellites and link those to the business keys that are in tables called Hubs.
    • Associations or transactions between business keys (relating Hubs such as Customer and Product through the Purchase transaction) are modelled using Link tables, which also have Satellites describing the attributes of the relation.
    • In other words, the Satellites provide the context for the business processes that are captured in Hubs and Links.
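
The three table types can be sketched as plain record types. This is only an illustrative sketch, not part of the Data Vault method itself: the class and field names (`surrogate_id`, `hub_ids`, `parent_table` and so on) and the sample keys are assumptions chosen for the Customer/Product/Purchase example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass
class Hub:
    """One row per business key, e.g. a customer number."""
    surrogate_id: int
    business_key: str


@dataclass
class Link:
    """An association between Hubs, e.g. a Purchase."""
    surrogate_id: int
    hub_ids: Tuple[int, ...]  # surrogate ids of the Hubs being related


@dataclass
class Satellite:
    """Descriptive attributes attached to one Hub or Link row."""
    parent_table: str          # name of the owning Hub or Link (hypothetical)
    parent_id: int             # surrogate id of the owning row
    load_date: date            # date on which this version became valid
    attributes: Dict[str, object]


# Hypothetical sample data: a customer buys a product.
customer = Hub(1, "CUST-001")
product = Hub(2, "PROD-042")
purchase = Link(10, (customer.surrogate_id, product.surrogate_id))

customer_details = Satellite("hub_customer", customer.surrogate_id,
                             date(2019, 8, 7), {"name": "Alice"})
purchase_details = Satellite("link_purchase", purchase.surrogate_id,
                             date(2019, 8, 7), {"quantity": 3})
```

Note that the Hubs carry nothing but the key: everything that can change lives in a Satellite, so a changed attribute adds a Satellite row and leaves the Hub and Link untouched.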


    Implementation (continued)

    • Links can link to other Links, to deal with changes in granularity (for instance, adding a new key to a database table would change the grain of the database table). Adding a Link between two Hubs to another Link that references a third Hub is similar to adding those three Hubs to a single Link. See the section on Loading practices for why the latter method is preferred.
    • Links sometimes link to only one Hub, a construct called 'peg-legged Link' by Dan Linstedt. This occurs when one of the business keys associated by the Link is not a real business key. As an example, take an order form with "order number" as key, and order lines that are keyed with a semi-random number to make them unique. Let's say, "unique number". The latter key is not a real business key, so it gets no Hub. However, we do need to use it in order to guarantee the correct granularity for the Link. In this case, we do not use a Hub with a surrogate key, but add the business key "unique number" itself to the Link. This is done only when there is no possibility of ever using the business key for another Link or as key for attributes in a Satellite.
    • All the tables contain metadata, describing at least the source system and the date on which this entry became valid, giving a complete historical view of the data as it enters the data warehouse. Data is never deleted, unless you have a technical error while loading data.
    • The Data Vault does not maintain referential integrity between tables, but instead all Link tables are many-to-many relationships. This means that missing data is not an issue and also that Hubs, Links and Satellites can be loaded independently of each other, in parallel.
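
The metadata and "never delete" rules can be illustrated with an append-only Satellite. This is a minimal sketch under stated assumptions: `SatelliteRow`, `record_change`, `as_of`, the `city` attribute and the source system names are all hypothetical and only serve the example.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class SatelliteRow(NamedTuple):
    hub_id: int
    load_date: date       # date on which this entry became valid
    record_source: str    # source system that delivered the entry
    city: str             # hypothetical descriptive attribute


def record_change(history: List[SatelliteRow], row: SatelliteRow) -> None:
    """Append a new version; existing rows are never updated or deleted."""
    history.append(row)


def as_of(history: List[SatelliteRow], hub_id: int,
          day: date) -> Optional[SatelliteRow]:
    """Return the version of hub_id that was valid on the given day."""
    candidates = [r for r in history
                  if r.hub_id == hub_id and r.load_date <= day]
    return max(candidates, key=lambda r: r.load_date, default=None)


history: List[SatelliteRow] = []
record_change(history, SatelliteRow(1, date(2018, 1, 1), "crm", "Utrecht"))
record_change(history, SatelliteRow(1, date(2019, 6, 1), "billing", "Delft"))

print(as_of(history, 1, date(2018, 12, 31)).city)  # Utrecht
print(as_of(history, 1, date(2019, 8, 7)).city)    # Delft
```

Because old versions stay in place, any past state can be reconstructed, and the `record_source` column shows which system each value came from.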


    Loading practices

    • The ETL for updating a Data Vault model is fairly straightforward (see [3]). First you have to update all the Hubs. Having done that, you can now resolve all business keys to surrogate IDs. The second step is to add all attributes on the key to the Hub Satellites and at the same time create and update all Links between Hubs. This resolves all the Link surrogate keys, enabling you to then add all the Link Satellites in step three.
    • It is easy to verify that the updates inside each step are unrelated and can be done in parallel. The ETL is quite straightforward and lends itself to easy automation or templating. Problems occur only with Links relating to other Links, because resolving the business keys in the Link only leads to another Link that has to be resolved as well. Due to the equivalence of this situation with a Link to multiple Hubs, this difficulty can be avoided by remodelling such cases.
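
The three loading steps described above can be sketched with in-memory dictionaries standing in for the tables. This is a simplified illustration, not a real ETL job: the table names, the `resolve` helper and the sample batch are assumptions, and the steps run sequentially here even though the work inside each step could run in parallel.

```python
from itertools import count
from typing import Dict, List, Tuple

hub_customer: Dict[str, int] = {}               # business key -> surrogate id
hub_product: Dict[str, int] = {}                # business key -> surrogate id
link_purchase: Dict[Tuple[int, int], int] = {}  # (customer id, product id) -> link id
sat_customer: List[Tuple[int, str]] = []        # (customer id, name)
sat_purchase: List[Tuple[int, int]] = []        # (link id, quantity)
_ids = count(1)                                 # shared surrogate id generator


def resolve(table, key):
    """Return the surrogate id for key, inserting the key if it is new."""
    if key not in table:
        table[key] = next(_ids)
    return table[key]


def load(batch):
    # Step 1: update all Hubs so every business key has a surrogate id.
    for row in batch:
        resolve(hub_customer, row["customer"])
        resolve(hub_product, row["product"])
    # Step 2: Hub Satellites and Links; both only need the Hub ids.
    for row in batch:
        cust_id = hub_customer[row["customer"]]
        version = (cust_id, row["name"])
        if version not in sat_customer:  # store only new attribute versions
            sat_customer.append(version)
        resolve(link_purchase, (cust_id, hub_product[row["product"]]))
    # Step 3: Link Satellites, now that the Link ids exist.
    for row in batch:
        key = (hub_customer[row["customer"]], hub_product[row["product"]])
        sat_purchase.append((link_purchase[key], row["quantity"]))


load([
    {"customer": "C1", "name": "Alice", "product": "P1", "quantity": 2},
    {"customer": "C1", "name": "Alice", "product": "P2", "quantity": 1},
])
print(link_purchase)  # {(1, 2): 4, (1, 3): 5}
print(sat_purchase)   # [(4, 2), (5, 1)]
```

Each step depends only on ids produced by the previous step, which is why a Link that points at another Link breaks the pattern: its keys cannot be resolved in step 2.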


    Data Vault and dimensional modelling

    • The Data Vault modelled layer is normally used to store all the data, all the time. A lot of end-user computing tools expect their data to be contained in a dimensional model, so a conversion is needed. However, the Hubs and related Satellites can be considered as Dimensions and the Links and related Satellites as Fact tables in a dimensional model. This enables you to quickly prototype a dimensional model out of a Data Vault model using views.
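
Prototyping with views might look like the following sketch, which uses Python's built-in `sqlite3` module and an in-memory database. The table, column and view names and the sample rows are assumptions made for the example; the dimension view picks the most recent Satellite version per Hub row, which is one possible choice among several.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_customer (customer_id INTEGER PRIMARY KEY, customer_no TEXT);
CREATE TABLE sat_customer (customer_id INTEGER, load_date TEXT, name TEXT);
CREATE TABLE hub_product (product_id INTEGER PRIMARY KEY, product_no TEXT);
CREATE TABLE link_purchase (purchase_id INTEGER PRIMARY KEY,
                            customer_id INTEGER, product_id INTEGER);
CREATE TABLE sat_purchase (purchase_id INTEGER, load_date TEXT, quantity INTEGER);

INSERT INTO hub_customer VALUES (1, 'C1');
INSERT INTO sat_customer VALUES (1, '2018-01-01', 'Alice'),
                                (1, '2019-06-01', 'Alice B.');
INSERT INTO hub_product VALUES (1, 'P1');
INSERT INTO link_purchase VALUES (1, 1, 1);
INSERT INTO sat_purchase VALUES (1, '2019-08-07', 3);

-- Dimension: Hub plus the most recent version of its Satellite.
CREATE VIEW dim_customer AS
SELECT h.customer_id, h.customer_no, s.name
FROM hub_customer h
JOIN sat_customer s ON s.customer_id = h.customer_id
WHERE s.load_date = (SELECT MAX(load_date) FROM sat_customer
                     WHERE customer_id = h.customer_id);

-- Fact: Link plus its Satellite.
CREATE VIEW fact_purchase AS
SELECT l.customer_id, l.product_id, s.quantity
FROM link_purchase l
JOIN sat_purchase s ON s.purchase_id = l.purchase_id;
""")

# Query the prototype star schema exactly as a reporting tool would.
rows = con.execute("""
    SELECT d.name, f.quantity
    FROM fact_purchase f JOIN dim_customer d USING (customer_id)
""").fetchall()
print(rows)  # [('Alice B.', 3)]
```

Since the views are only definitions, they can be dropped and redefined freely while the underlying Data Vault tables keep the full history.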


    Data Vault scheme as neural network

    • The data model is patterned off a simplistic view of neurons, dendrites, and synapses, where neurons are associated with Hubs and Hub Satellites, Links are dendrites (vectors of information), and other Links are synapses (vectors in the opposite direction). By utilizing a set of data mining algorithms, Links can be scored with confidence and strength ratings. They can be created and dropped on the fly in accordance with learning about relationships that currently don't exist. The model can be automatically morphed, adapted, and adjusted as it is used and fed new structures.


    Origin

    • Data Vault Modeling was originally conceived by Dan Linstedt in 1990 and was released in 2000. Data Vault Modeling is public domain and has been freely available since 2000.
    • In a series of five articles in The Data Administration Newsletter the basic rules of the Data Vault method are expanded and explained. These contain a general overview[4], an overview of the components[5], a discussion about end dates and joins[6], link tables[7] and an article on loading practices.


    References

    http://en.wikipedia.org/wiki/Data_Vault_Modeling


    Thank you