Further in agility with Data Vault - trivadis.com
[email protected] . www.trivadis.com . Info-Tel. 0800 87 482 347 . Datum 07.07.2013 . Page 2 / 12
Contents
1. Why agility?
2. Where is the data vault modeling located in the global BI Architecture
3. Understanding Data Vault Modeling
4. Why this split?
1) The core concept of the model is the hub!
2) Business Associations
3) Descriptive Data?
5. How can this split increase agility of an EDW structure?
4) Source system is changing
5) New descriptive information needed by the business
6) Need historization only on some information
6. Conclusion
7. Source and Links…
1. Why agility?
Agility has become one of the most important criteria for reporting information systems
in recent years. Data analysis is increasingly important for steering a business the right
way and for adapting quickly to trends. To do so, reporting information systems need
to be in line with the present and able to restore the past; analyzing both gives the
ability to drive the future.
The concept of agility is linked to the present moment.
The source information systems are evolving very fast to keep up with the development
of the business. For this reason, reporting information systems have to adapt their
structure to reflect those changes with minimum latency.
For humans, adaptability is called intelligence.
For information systems, it’s called agility.
In this article, I will try to explain how Data Vault can affect the agility of an
information system.
2. Where is the data vault modeling located in the global BI Architecture
Here is the most common architecture, with the modeling approach used at each step.
Everything is based on the “operational systems (OPR)”, which contain the real-time
data: all the transactions of the organization. Those systems are volatile and are
generally modeled in third normal form (3NF).
The concept of the Enterprise Data Warehouse (EDW) describes a way for information
systems to consolidate and organize data across a complete organization. The Enterprise
Data Warehouse is the place where global enterprise data is stored.
Data Vault is an alternative to dimensional modeling or 3NF in the Enterprise
Data Warehouse (sometimes also called the “DWH Core”).
Data marts are used to answer a specific business need for one or more entities of an
organization. A data mart is not a global view of the enterprise but only a specific
business view. This part of the architecture is not impacted by Data Vault modeling.
In a Data Vault architecture, this layer is built on top of the Data Vault EDW and
keeps its dimensional structure.
Schema 1: the most common Data Warehouse architecture
The Data Vault modeling technique applies to the Enterprise Data Warehouse, not
to the data marts. Data marts keep the dimensional modeling methodology.
3. Understanding Data Vault Modeling
I believe the Data Vault technique is easiest to understand when starting from a
dimensional design. The same exercise can be done starting from a third normal form
architecture.
The traditional dimensional (Kimball) way to design the data warehouse is to split the
business into 2 types of entities:
- Facts containing the numbers to measure the trends (amounts, number of
clients, number of contracts)
- Dimensions containing the descriptive information (client, product, country, …)
In a dimension, we can find 2 types of information:
- Business key of the entity (in blue)
- Descriptive data and the history of the changes across time (in yellow)
Schema 2: a customer dimension (columns: Client_ID, Client_BK, Client_Name,
Client_Address, History_Start_date, History_End_Date)
In a fact table, we can find 3 types of information:
- Descriptive data (measures)
- Relations to the entities (in red)
- Business fact keys (degenerate dimension attributes, in blue)
The idea of Data Vault modeling is to split those 3 concepts:
- Business keys become HUBS
- Relations become LINKS
- Descriptive data become SATELLITES
To understand how Data Vault can impact agility, I need to dive deeper into those
3 types of entity.
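The split above can be sketched in a few lines of Python. This is only a literal illustration of the mapping (business keys to hubs, relations to links, descriptive data to satellites), using the column names of the dimension and fact table shown in this section. The record types `HubRow`, `LinkRow` and `SatelliteRow` and the two functions are hypothetical names, and the sketch assumes the foreign key columns already carry business keys.

```python
from dataclasses import dataclass


# Hypothetical record types, named for illustration only.
@dataclass
class HubRow:
    business_key: str


@dataclass
class LinkRow:
    business_keys: tuple  # one key per related hub


@dataclass
class SatelliteRow:
    business_key: str  # the hub this description belongs to
    attributes: dict   # descriptive columns only


def split_dimension_row(row: dict) -> tuple[HubRow, SatelliteRow]:
    """Split one customer dimension row into a hub row and a satellite row."""
    hub = HubRow(row["Client_BK"])
    # Everything except the surrogate key and the business key is descriptive.
    descriptive = {k: v for k, v in row.items() if k not in ("Client_ID", "Client_BK")}
    return hub, SatelliteRow(row["Client_BK"], descriptive)


def split_fact_row(row: dict) -> tuple[HubRow, LinkRow, SatelliteRow]:
    """Split one fact row into a hub row, a link row and a satellite row."""
    sale_key = row["Sales_Order_Number"]
    # Assumption: the *_FK columns already hold business keys.
    link = LinkRow((sale_key, row["Client_FK"], row["Product_FK"], row["Time_FK"]))
    satellite = SatelliteRow(sale_key, {"Amount": row["Amount"]})
    return HubRow(sale_key), link, satellite
```

In this sketch the surrogate key `Client_ID` is simply dropped, since only the business key identifies the customer in the hub.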
Schema 3: the content of a fact table (columns: Client_FK, Product_FK, Time_FK,
Amount, Sales_Order_Number)
4. Why this split?
The idea of Data Vault is to physically separate data that does not change at the same
frequency.
Business keys, links and descriptive data are split because those concepts do not
change at the same frequency or for the same reasons.
1) The core concept of the model is the hub!
The business keys stored in hubs must not be technical surrogate keys coming from the
source systems. They must be the most descriptive, enterprise-wide business “code” for
one business entity. The physical structure of a hub doesn’t contain any link or any
descriptive information, only the business key!
Hubs are the core of Data Vault modeling. Every core business concept (sale, product,
customer, …) has its own hub. The existence of a hub is purely driven by the business!
Those core business concepts are the pillars of an enterprise: if you change or
remove all of them, you simply change or remove the reason for the enterprise to
exist.
If sale, customer and product are core business concepts for an enterprise, those hubs
will receive new content (you will add new sales, new customers and new products) but
will very rarely be deleted or modified.
For example: if we talk about the product “AXYZ-1223”, all business entities will
understand it and use it as the unique business key to identify this particular product.
This product hub will always have a meaning for an enterprise that is selling …
products. Even, if they add product types and they change sale strategy,
2) Business Associations
Business associations are modeled in physical structures called links.
Each link follows business reality and represents a natural association between core
business concepts. A link allows many-to-many relations.
For example:
The product hub contains only the product business keys
The sale hub contains only the sale business keys
The customer hub contains only the customer business keys
The link tells us that there is a natural relationship between a sale, a product and a
customer
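The example above can be sketched as tables with Python's standard `sqlite3` module. This is a simplified illustration, not a complete Data Vault implementation: the table and column names, the second product key “BXYZ-0001” and the sale key “SO-1001” are assumptions made for the example, and the technical columns a real model would usually carry (such as load date or record source) are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
-- Each hub holds nothing but business keys.
CREATE TABLE hub_product  (product_bk  TEXT PRIMARY KEY);
CREATE TABLE hub_sale     (sale_bk     TEXT PRIMARY KEY);
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);

-- The link holds only references to the hubs it associates.
CREATE TABLE link_sale_product_customer (
    sale_bk     TEXT NOT NULL REFERENCES hub_sale(sale_bk),
    product_bk  TEXT NOT NULL REFERENCES hub_product(product_bk),
    customer_bk TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    PRIMARY KEY (sale_bk, product_bk, customer_bk)
);
""")

conn.executemany("INSERT INTO hub_product VALUES (?)", [("AXYZ-1223",), ("BXYZ-0001",)])
conn.execute("INSERT INTO hub_sale VALUES ('SO-1001')")
conn.execute("INSERT INTO hub_customer VALUES ('CUS3457')")

# One sale covering two products: the link is many-to-many by construction.
conn.executemany(
    "INSERT INTO link_sale_product_customer VALUES (?, ?, ?)",
    [("SO-1001", "AXYZ-1223", "CUS3457"), ("SO-1001", "BXYZ-0001", "CUS3457")],
)

products_in_sale = [
    row[0]
    for row in conn.execute(
        "SELECT product_bk FROM link_sale_product_customer "
        "WHERE sale_bk = ? ORDER BY product_bk",
        ("SO-1001",),
    )
]
```

Because the relationship lives in its own table, a sale with several products needs no change to any hub.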
3) Descriptive Data?
Descriptive data is stored in satellites.
Each Satellite contains descriptive information about one and only one Hub.
A satellite is always linked to a hub (a business key) and has no meaning without this
particular hub.
The descriptive information is:
- The type of data that changes most often (add, remove, change, …)
- The data on which we need to track changes for certain columns
Figure: design of 3 hubs (Product, Sales, Customer) joined by a link in the database
For example:
This customer satellite contains the name, address, age and more of a certain customer
that is represented by one business key in its hub.
So if we look at the figure above, the customer with the business key “CUS3457” has its
whole description in the customer satellite. In this example there is only one satellite,
but there can be more than one. The attributes of each satellite are grouped either by
their frequency of change (for example, SCD1 attributes in one customer satellite and
SCD2 attributes in another) or by functional meaning (customer address, customer
description).
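A minimal sketch of such a satellite, again with Python's `sqlite3`, is shown here. The table names, the `load_date` column and the sample values are assumptions for illustration; the only points taken from the text are that the satellite holds the name, address and age of a customer and that it always hangs on one hub.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);

-- The satellite describes exactly one hub and refers to it by business key.
CREATE TABLE sat_customer (
    customer_bk TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    load_date   TEXT NOT NULL,
    name        TEXT,
    address     TEXT,
    age         INTEGER,
    PRIMARY KEY (customer_bk, load_date)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('CUS3457')")
conn.execute(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
    ("CUS3457", "2013-07-07", "Jane Doe", "1 Example Street", 42),
)

# A description without its hub has no meaning, so the database rejects it.
try:
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
        ("CUS0000", "2013-07-07", "Nobody", None, None),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

description = conn.execute(
    "SELECT name, address, age FROM sat_customer WHERE customer_bk = ?",
    ("CUS3457",),
).fetchone()
```

Grouping attributes by change frequency or by functional meaning would simply mean creating several tables of this shape on the same hub.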
5. How can this split increase agility of an EDW structure?
Data Vault makes it possible to adapt the model more easily without impacting the
historical model.
The best way to explain this is to give examples.
CONCRETE CASES
4) Source system is changing
If your business needs are changing and you need to add a new concept “Shop” to
your EDW, you simply add a new hub to your model and create a new link, without
impacting the existing model or the existing reporting system. You will not have to
recreate your old structure and remap it onto the new one: the old structure stays in
place and keeps its history in its original form. You will not have to retest your old
structures, and there is no direct impact on your downstream processes, so you have
time to adapt them.
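The additive nature of this change can be checked with a small `sqlite3` sketch. The table names and the shape of the new link (sale to shop) are assumptions for illustration; the helper `schema_snapshot` is a hypothetical function that reads the table definitions SQLite stores in `sqlite_master`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_sale     (sale_bk     TEXT PRIMARY KEY);
CREATE TABLE hub_product  (product_bk  TEXT PRIMARY KEY);
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
CREATE TABLE link_sale_product_customer (
    sale_bk     TEXT NOT NULL REFERENCES hub_sale(sale_bk),
    product_bk  TEXT NOT NULL REFERENCES hub_product(product_bk),
    customer_bk TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    PRIMARY KEY (sale_bk, product_bk, customer_bk)
);
""")


def schema_snapshot(connection):
    """Return {table name: DDL text} for every table in the database."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    )
    return dict(rows.fetchall())


before = schema_snapshot(conn)

# The new business concept arrives as new objects only: one hub and one link.
conn.executescript("""
CREATE TABLE hub_shop (shop_bk TEXT PRIMARY KEY);
CREATE TABLE link_sale_shop (
    sale_bk TEXT NOT NULL REFERENCES hub_sale(sale_bk),
    shop_bk TEXT NOT NULL REFERENCES hub_shop(shop_bk),
    PRIMARY KEY (sale_bk, shop_bk)
);
""")

after = schema_snapshot(conn)
existing_unchanged = all(after[name] == ddl for name, ddl in before.items())
new_tables = sorted(set(after) - set(before))
```

No `ALTER TABLE` is issued against the existing hubs or the old link, which is why the existing loads and reports keep working while the new structures are filled.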
5) New descriptive information needed by the business
This request is very frequent: a business entity needs additional fields in the reporting.
Data Vault lets you minimize the impact on your existing reporting, for example the
reporting used by the other entities. Instead of altering the existing customer satellite
structure, you add a new satellite on the same customer hub with the new
information. It’s very agile and has low impact.
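As a rough sketch of this case with `sqlite3`: the existing report query below returns the same rows before and after the second satellite is added. The table names, the query and the new field `loyalty_tier` are hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
CREATE TABLE sat_customer (
    customer_bk TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    name        TEXT,
    address     TEXT
);
INSERT INTO hub_customer VALUES ('CUS3457');
INSERT INTO sat_customer VALUES ('CUS3457', 'Jane Doe', '1 Example Street');
""")

EXISTING_REPORT = "SELECT customer_bk, name, address FROM sat_customer"
before = conn.execute(EXISTING_REPORT).fetchall()

# The new fields go into a second satellite on the same hub.
conn.executescript("""
CREATE TABLE sat_customer_2 (
    customer_bk  TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    loyalty_tier TEXT
);
INSERT INTO sat_customer_2 VALUES ('CUS3457', 'gold');
""")
after = conn.execute(EXISTING_REPORT).fetchall()

# Only the reports that need the new field join the new satellite.
new_report = conn.execute("""
    SELECT h.customer_bk, s.name, s2.loyalty_tier
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_bk = h.customer_bk
    LEFT JOIN sat_customer_2 s2 ON s2.customer_bk = h.customer_bk
""").fetchall()
```

The `LEFT JOIN` is a design choice for the sketch: customers that have no row yet in the new satellite still appear in the new report.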
Figure: the Product, Sale and Customer hubs joined by the old link, and the new Shop hub
attached through a new link
Figure: the Customer hub with its existing Customer satellite and the new Customer
satellite 2
6) Need historization only on some information
In this case, you historize only the attributes that need it and store the others
without history. If you need different types of historization in parallel, you simply
add more satellites.
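One possible way to sketch this with `sqlite3` is to give the same hub two satellites with different load rules. Everything named here is an assumption for illustration: the address is treated as the historized attribute, the name as the non-historized one, and `load_with_history` and `load_without_history` are hypothetical helper functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
-- Historized satellite: one row per version.
CREATE TABLE sat_customer_hist (
    customer_bk TEXT NOT NULL REFERENCES hub_customer(customer_bk),
    load_date   TEXT NOT NULL,
    address     TEXT,
    PRIMARY KEY (customer_bk, load_date)
);
-- Non-historized satellite: one row per customer.
CREATE TABLE sat_customer_nohist (
    customer_bk TEXT PRIMARY KEY REFERENCES hub_customer(customer_bk),
    name        TEXT
);
INSERT INTO hub_customer VALUES ('CUS3457');
""")


def load_with_history(connection, customer_bk, load_date, address):
    """Append a new version only when the address differs from the latest one."""
    latest = connection.execute(
        "SELECT address FROM sat_customer_hist WHERE customer_bk = ? "
        "ORDER BY load_date DESC LIMIT 1",
        (customer_bk,),
    ).fetchone()
    if latest is None or latest[0] != address:
        connection.execute(
            "INSERT INTO sat_customer_hist VALUES (?, ?, ?)",
            (customer_bk, load_date, address),
        )


def load_without_history(connection, customer_bk, name):
    """Overwrite the current value; earlier values are not kept."""
    connection.execute(
        "INSERT OR REPLACE INTO sat_customer_nohist VALUES (?, ?)",
        (customer_bk, name),
    )


load_with_history(conn, "CUS3457", "2013-01-01", "1 Example Street")
load_with_history(conn, "CUS3457", "2013-03-01", "1 Example Street")  # unchanged, skipped
load_with_history(conn, "CUS3457", "2013-07-07", "2 Sample Avenue")
load_without_history(conn, "CUS3457", "Jane Doe")
load_without_history(conn, "CUS3457", "Jane Smith")

history = conn.execute(
    "SELECT load_date, address FROM sat_customer_hist ORDER BY load_date"
).fetchall()
current = conn.execute("SELECT name FROM sat_customer_nohist").fetchall()
```

The historized satellite ends up with two address versions, while the other satellite keeps only the latest name.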
6. Conclusion
I believe that Data Vault modeling is more agile than the traditional way of designing
the Enterprise Data Warehouse.
It gives you the opportunity to modify your EDWH without touching the existing way of
storing history, and it offers easier solutions for integrating changes. The impact
on the downstream processes is low because information can be added or changed
without touching the existing model.
This method is really meaningful when it is applied to changing environments. When
you have a stable infrastructure with few change requests, or a single source to
integrate, it makes no sense to add this type of layer between your source and your data
marts. However, if your business is constantly evolving and agility is a requirement for
your information system, Data Vault can be a good choice for your EDWH design.
Before implementing this kind of architecture, I think that time has to be spent
on defining standards and global naming conventions: the split is more agile, but it
generates more objects (tables). When there is an evolution, real thought about
“HOW to implement it” (considering impact, business needs and design) has to
happen. Data Vault gives you a lot of possibilities, but I believe that in each case
one way makes more sense than the others.
Figure: the Customer hub with two satellites, one with history and one without history
7. Source and Links…
- Hans Hultgren: “Modeling the Agile Data Warehouse with Data Vault”
- http://hanshultgren.wordpress.com/
- www.trivadis.com