data vault introduction
TRANSCRIPT
- 1. Data Vault Fundamentals & Best Practices. Erik Fransen, managing consultant, +31 6 159 444 76, @erikfransen
- 2. Agenda: Introduction; Data Vault Basics; Benefits & Challenges; Best practices: Automation & Data Virtualization; Recommended reading
- 3. Founded in 1998, The Hague, NL. 40+ consultants. Business Intelligence, Data Vault, Data Warehousing, Data Warehouse Automation, Big Data, Data Virtualization. Business & technical consultancy, end-to-end implementation projects of Data Vault EDW, audits, training, certification. Wide range of customers (profit, non-profit) across various industries. Since 2009 Genesee Academy partner for Data Vault Day and Data Vault Certification in NL, B & D. Implementation partner of Cisco, MapR, Qlik & Tableau.
- 4. The Data Vault modeling approach. Data Vault is a data modeling approach, so it fits into the family of modeling approaches: 3rd Normal Form, Ensemble Modeling, Dimensional. While 3rd Normal Form is optimal for operational systems and Dimensional is optimal for data marts, Ensemble Modeling is optimal for the data warehouse, and Data Vault is the leading form of Ensemble Modeling.
- 5. Forms of Ensemble Modeling
- 6. Why do we use Data Vault for DWH? When we need a DWH that supports: Integration; Traceability; History; Incremental Build; Agility; Gracefully Adapts to New Sources; Full Auditability - Source to Mart; Enterprise View of Central Data; Ready for Automation. Data Vault is specifically designed for modelling the EDW.
- 7. The Data Vault Ensemble. The Data Vault Ensemble conforms to a single key embodied in the Hub construct. The parts of the Data Vault Ensemble only include: Hubs (the Natural Business Keys); Links (the Natural Business Relationships); Satellites (all Context, Descriptive Data and History of Links and Hubs). Separating things that change from things that don't change.
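The three constructs on this slide can be sketched in Python. This is a minimal illustration, not the presenter's implementation: the class names, the `name` and `load_date` fields, the `record_change` helper and the sample keys are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hub:
    """A core business concept, identified only by its natural business key."""
    name: str            # hypothetical table name, e.g. "H_Cust"
    business_key: str    # the natural business key, e.g. a customer number


@dataclass(frozen=True)
class Link:
    """A natural business relationship between two or more hubs."""
    name: str
    hubs: Tuple[Hub, ...]


@dataclass
class Satellite:
    """Context, descriptive data and history for one hub or link."""
    parent: object  # the Hub or Link being described
    history: List[Tuple[datetime, Dict[str, str]]] = field(default_factory=list)

    def record_change(self, load_date: datetime, attributes: Dict[str, str]) -> bool:
        """Append a new version only if the attributes differ from the latest one."""
        if self.history and self.history[-1][1] == attributes:
            return False  # nothing changed, so no new history row
        self.history.append((load_date, dict(attributes)))
        return True


# Things that do not change: keys and relationships.
customer = Hub("H_Cust", "C-1001")
sale = Hub("H_Sale", "S-9001")
purchase = Link("L_Cust_Sale", (customer, sale))

# Things that change: descriptive context, kept with full history.
customer_details = Satellite(parent=customer)
customer_details.record_change(datetime(2016, 1, 1), {"city": "The Hague"})
customer_details.record_change(datetime(2016, 1, 2), {"city": "The Hague"})  # no change
customer_details.record_change(datetime(2016, 2, 1), {"city": "Amsterdam"})
```

The hub and link stay fixed while the satellite accumulates versions, which is one way to read "separating things that change from things that don't change".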
- 8. The Data Vault modeling approach. As the scope of the EDW is expanded and new data sources are added, the Data Vault can adapt to these changes without impacting the existing model. This is what allows the EDW to be built incrementally and to adapt to change without the need for re-engineering. [Diagram: new area absorbed; hubs H_Cust, H_Sale, H_Empl, H_Store, H_Car.] Tools for DWH Automation update the Data Vault EDW (model + data) in a fast, agile & consistent way.
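A small sketch of what "absorbing a new area without impacting the existing model" can look like in practice, using Python's built-in `sqlite3` module. The hub names come from the slide; the column names, the link tables and the choice of H_Car as the newly absorbed area are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing model: two hubs and the link between them.
conn.executescript("""
CREATE TABLE H_Cust (cust_key TEXT PRIMARY KEY, load_date TEXT);
CREATE TABLE H_Sale (sale_key TEXT PRIMARY KEY, load_date TEXT);
CREATE TABLE L_Cust_Sale (
    cust_key  TEXT REFERENCES H_Cust(cust_key),
    sale_key  TEXT REFERENCES H_Sale(sale_key),
    load_date TEXT,
    PRIMARY KEY (cust_key, sale_key)
);
""")
conn.execute("INSERT INTO H_Cust VALUES ('C-1001', '2016-01-01')")


def schema(connection):
    """Return {table name: DDL text} for every table in the database."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    )
    return dict(rows.fetchall())


before = schema(conn)

# New area absorbed: only new tables are created, nothing is altered.
conn.executescript("""
CREATE TABLE H_Car (car_key TEXT PRIMARY KEY, load_date TEXT);
CREATE TABLE L_Sale_Car (
    sale_key  TEXT REFERENCES H_Sale(sale_key),
    car_key   TEXT REFERENCES H_Car(car_key),
    load_date TEXT,
    PRIMARY KEY (sale_key, car_key)
);
""")

after = schema(conn)
unchanged = all(after[name] == ddl for name, ddl in before.items())
added = sorted(set(after) - set(before))
existing_rows = conn.execute("SELECT COUNT(*) FROM H_Cust").fetchone()[0]
```

The extension consists solely of `CREATE TABLE` statements, so the definitions and data of the existing hubs and link are left as they were.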
- 9. Data Vault Benefits. Business benefits: ability to adapt quickly to new business needs; data is traceable, allowing for a fully auditable, integrated data store; allows the EDW to absorb all data all of the time; easily adapts to new data sources and changing business rules without expensive re-engineering; results in a data warehouse with lower total cost of ownership (TCO); automation: short time to market, consistent quality. Project/development benefits: ideal for agile development techniques, resulting in lower project risk and more frequent deliverables; can be built incrementally without compromising the core architecture; automation: fast and incremental sprints, predictable costs. Architectural benefits: parallel loading; data architecture that supports future expanded scope; can scale to virtually any size; ready for automation: forces standardization.
- 10. Data Vault Modeling Process. The modeling process for creating a Data Vault model includes three primary steps: 1) Identify and model the Core Business Concepts. Business interviews are at the heart of this step: What do you do? What are the main things you work with? Also find the best/target Natural Business Key. 2) Identify and model the Natural Business Relationships: specific unique relationships. 3) Analyze and design the Context Satellites: consider rate of change, type of data and also the sources of your data during the design process. Ideally the data vault is modelled based on business processes and business concepts.
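The outcome of the three modeling steps can be written down as a small mapping and applied to a source record. The sketch below is illustrative only: the source columns, the table names, the split of customer attributes into a slowly and a quickly changing satellite, and the `decompose` helper are all hypothetical.

```python
# Step 1: core business concepts and the column holding each natural business key.
HUB_KEYS = {"H_Cust": "customer_no", "H_Store": "store_no"}

# Step 2: natural business relationships between those concepts.
LINKS = {"L_Cust_Store": ("H_Cust", "H_Store")}

# Step 3: context satellites, split here by assumed rate of change.
SATELLITES = {
    "S_Cust_Profile": ("H_Cust", ["customer_name"]),   # assumed to change rarely
    "S_Cust_Loyalty": ("H_Cust", ["loyalty_points"]),  # assumed to change often
}


def decompose(row):
    """Split one flat source record into hub keys, link keys and satellite rows."""
    hubs = {hub: row[column] for hub, column in HUB_KEYS.items()}
    links = {
        link: tuple(hubs[member] for member in members)
        for link, members in LINKS.items()
    }
    satellites = {
        sat: {"parent_key": hubs[parent], **{c: row[c] for c in columns}}
        for sat, (parent, columns) in SATELLITES.items()
    }
    return hubs, links, satellites


# Hypothetical flat record from an operational source.
source_row = {
    "customer_no": "C-1001",
    "store_no": "ST-7",
    "customer_name": "Jansen",
    "loyalty_points": "120",
}
hubs, links, satellites = decompose(source_row)
```

Keeping the mapping as plain data, separate from the loading logic, is one simple way to keep a model like this open to automation.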
- 11. Getting data out of the Data Vault. Problem: the Data Vault EDW is about data decomposition, data registration and data integration. Data Vault is neither intended, designed nor optimized for data distribution and data consumption downstream of the EDW. This typically leads to many complex physical data marts (high maintenance, high cost). Solution: start thinking differently and focus on creating functional data products for the business. Stop loading and replicating data physically; start using data virtualization.
- 12. Eliminate the need for physical data marts: no data replication needed; real-time data refreshment; no redundant data storage; simple updates of data models; simple queries; short time to market; automatic updates; lower storage costs; high performance; ready for Big Data. [Diagram labels: CRM, ERP, Weblogs, Production Data; Data Copy; Data Vault EDW; Data Virtualization Tool + Data Abstraction Layers; No Data Copy at all; SQL; Steering information.]
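The idea of a virtual data mart can be sketched with an ordinary SQL view, which stands in here for a dedicated data virtualization tool. The example uses Python's built-in `sqlite3` module; the table, column and view names are assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE H_Cust (cust_key TEXT PRIMARY KEY);
CREATE TABLE S_Cust_Details (
    cust_key  TEXT REFERENCES H_Cust(cust_key),
    load_date TEXT,              -- ISO dates, so text order equals date order
    city      TEXT,
    PRIMARY KEY (cust_key, load_date)
);

-- Virtual data mart: the current version of every customer, defined as a view.
CREATE VIEW Dim_Customer AS
SELECT h.cust_key, s.city
FROM H_Cust AS h
JOIN S_Cust_Details AS s ON s.cust_key = h.cust_key
WHERE s.load_date = (
    SELECT MAX(s2.load_date)
    FROM S_Cust_Details AS s2
    WHERE s2.cust_key = h.cust_key
);
""")

conn.execute("INSERT INTO H_Cust VALUES ('C-1001')")
conn.execute(
    "INSERT INTO S_Cust_Details VALUES ('C-1001', '2016-01-01', 'The Hague')"
)
first = conn.execute("SELECT * FROM Dim_Customer").fetchall()

# A new satellite version arrives; the view reflects it without any reload.
conn.execute(
    "INSERT INTO S_Cust_Details VALUES ('C-1001', '2016-02-01', 'Amsterdam')"
)
second = conn.execute("SELECT * FROM Dim_Customer").fetchall()
```

Because the view stores no rows of its own, there is no second copy of the data to load or keep in sync; whether this performs well enough at scale depends on the virtualization platform in use.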
- 13. Data Layers for Data Virtualization. [Diagram labels: Data Vault data warehouse; any other source data; Physical Model; Data Virtualization with Virtual Physical Layer, Virtual Business Layer and Virtual Application Layer; Operational Data Model; SuperNova Data Model; Uniform Data Model; Views; Web services; "Automated step!"]
- 14. Wrap up. Data Vault Basics: Hubs, Links, Satellites; integration, history, incremental modelling, agility. Benefits: business, project, architecture; make use of automation tools for fast, agile and consistent delivery. Challenges: data downstream of the Data Vault EDW. Solution: use virtual data marts and automate SuperNova data models for reporting & analytics.
- 15. Recommended reading on SuperNova. Free download: http://www.cisco.com/web/services/enterprise-it-services/data-virtualization/documents/whitepaper-cisco-datavaul.pdf
- 16. Recommended reading on Data Vault. Free downloads: http://hanshultgren.wordpress.com/
- 17. Recommended reading on Ensemble & Data Vault: Modeling the Agile Data Warehouse with Data Vault. Data Vault Modeling; Agile Data Warehousing BI; Enterprise Data Warehousing; Data Integration and DWBI Architecture; Unified Decomposition; Ensemble Modeling. A complete book on Data Vault: An Introduction, a Guide and a Reference; Modeling, Architecture & the Data Warehousing Program; Data & Semantic Integration for Enterprise Central Meaning; Applying Concepts to a successful Agile DWBI Program.
- 18. Recommended reading on Data Virtualization: Data Virtualization in Business Intelligence Architectures. First independent book on data virtualization that explains in a product-independent way how data virtualization technology works. Illustrates concepts using examples developed with commercially available products. Shows you how to solve common data integration challenges such as data quality, system interference, and overall performance by following practical guidelines on using data virtualization. Apply data virtualization right away with three chapters full of practical implementation guidance. Understand the big picture of data virtualization and its relationship with data governance and information management.
- 19. Data Vault Training & Certification. CDVDM: March 31 and April 1, 2016, Amsterdam. DVD: March 2, 2016, Diegem. www.centennium-opleidingen.nl. For all questions: [email protected]
- 20. A short history of Data Vault. 2002: first papers published by Dan Linstedt. 2006: start of the CDVDM certification program by Genesee Academy. 2007: start of Data Vault EDW implementations, primarily in Europe (NL, S), some in the USA. 2008-2015: several books published on Data Vault by Dan Linstedt, Hans Hultgren and others. 2013: Data Vault on the radar in B, DACH, UK, USA, AUS, NZ, Asia. 2013: Data Vault EDW implementations going worldwide. 2015: over 900 CDVDM professionals and 750+ Data Vault EDWs worldwide.