Lean Data Modelstorming For The Agile Enterprise
Daniel Upton, www.DecisionLab.Net
[email protected]/in/danielupton
The Business Intelligence Promise: Smarter, more fact-based decision-making as an everyday routine.
Traditional Architectural Trade-Off: Do you want it quickly, fully featured,
or with high quality? Pick two.
Q: Why does a Data Warehouse Take So Long?
A: Many functional interdependencies result in mostly sequential tasking
Especially these tasks: Nothing delivered
for multiple sprints. Non-agile.
Sequential Development
Gantt View: End to End ModelStormed
Lean Data Hubs Enable Fast Delivery
Lean Core Principles
• Focus on customer
• Eliminate waste
• Deliver as fast as possible
• Decide as late as possible
• Optimize the whole
– Sub-optimize some parts
• Modularize, automate, and re-use
End to End ModelStorming
• Dimensional Modelstorming
• Goal: Quickly express a complete logical star schema data model proven to fully satisfy a user data story’s narrative and its detailed acceptance criteria.
– Reference: Agile Data Warehouse Design: from Whiteboard to Star Schema (book by Lawrence Corr & James Stagnitto)
• Lean Data Modelstorming
• Goal: Quickly express a complete lean data model, mapped upstream directly from actual source data and proven to then supply this source data downstream directly into a data presentation layer such as a Star Schema.
– An Original Concept by Daniel Upton (Presenter: 10/15/2016, SoTec Conference)
Dimensional ModelStorming
In stakeholder meeting…
• Question: Who does what?
– Who, what, where, when, how, why, how much?
• Answers: User Information Stories
• On a whiteboard, draw and refine…
– Event ModelStorm
– Dimension ModelStorm
– Hours Later… Event Matrix
Source: Agile Data Warehouse Design: from Whiteboard to Star Schema (book by Corr & Stagnitto)
User Information Stories
• Alumni Development specialists must know which donors donate how much and when, in order to ensure donors receive recognition and benefits.
• Institutional Research must know how many students enrolled weekly, with what program and standing, to ensure University meets qualifications for student loans.
• Auditors must know what family relationships exist between donors and students, and student standing, to monitor for conflicts of interest and uphold reputation.
Event ModelStorms
Dimension ModelStorm
Events Matrix
End to End ModelStorm: Lean Data Overview
Traditional Enterprise DW: Entities = Relations with
Dependencies
Star Schema: Facts w/ Business Rules Fully Dependent on Dimensions w/ Business
Rules
Perfect World Data Flow: Direct to Star Schema
Perfect World Data Flow: EDW and Data Marts
Real World Data Flow: EDW and Data Marts
Real World: Even Big Ships Are At Risk
EDW’s are at Risk from Business Volatility.
EDW Development Takes a Long Time
Long ago, many decided not to wait for an EDW
Traditional Data Mart (Star Schema)
Perfect World Data Flow: Direct to Star Schema
Standardizing Directly on Star Schema Forces Tight Restrictions on Incoming Data
Real World Data Flow: Direct to Star Schema
What’s wrong here?
Direct to Star: Begins well, but harder to sustain over time.
Single Version of the Truth (SVOT)
Source data is reinterpreted and massaged into a new data model that fixes core truths about the data, its relationships, and the business, so that one record, one field, one table, contains THE authoritative data.
• Is SVOT easy? …achievable? …aspirational?
• Tasks for SVOT: Lengthy requirements analysis and successful negotiation with many stakeholders across an enterprise, then intensive data modeling, then custom ETL coding, while hoping SVOT remains fixed.
Single Version of the Truth (SVOT)
Assumptions vs. Experience
• Perfect World Assumption:
– SVOT is universally accurate and stable. Given the amount of work to achieve SVOT, it needs to be.
• Real World Experience:
– SVOT accuracy can vary from ‘fair to excellent’, its scope is often far from universal across an entire business, and business changes occur faster than the DW can keep up.
Single Version of the Truth (SVOT)
Disclosure
• Seemingly trivial changes in the business may require non-trivial changes to the ETL code and reports.
• Non-trivial business changes often require changes to the underlying SVOT data model, the ETL code, and reports.
• Major business changes, or just many small ones over time, often leave an SVOT DW in a perpetual state of disarray, with a never-ending list of critical issues and an excessive focus on ‘break – fix – redeploy’.
Single Version of the Truth (SVOT)
SVOT is worth pursuing, but with a different playbook.
Resolution: The DW structure must not be dependent on a static interpretation of truth. It must not break when new rules and analyses need to be applied to data, when inaccuracies are discovered late, or when data sources or business processes change.
Lean Data Warehouse overcomes this big challenge directly by loosely coupling diverse source data to insulate the DW from changes, by easily storing all data as it changes over time, and by delaying decisions on business rules, SVOT, data quality, reporting or analytics until after the horses are in the barn.
Questions?
Lean Data Principles
• Eliminate waste: For in-scope source tables, instantiate and load all records from all attribute and key fields.
• Deliver Fast: Generic design pattern for quickly historizing source data.
• Decide Late:
– Write code for business rules just downstream of Lean Data Hubs, in order to avoid hard-coding business rules into the core data load.
• Focus on Customer (Pragmatic Design): Scope, design, and load tables purely based on business needs, regardless of functional constraints in data sources.
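The "Deliver Fast" and "Decide Late" principles above can be illustrated with a minimal sketch. It assumes a generic, append-only history list per source table, and a downstream function standing in for a business rule. The names `historize`, `current_rows`, and the `load_ts` control field are hypothetical and do not come from the slides.

```python
def historize(history, source_rows, key_field, load_ts):
    """Append a new version of every source row whose attributes changed.

    history     : list of dicts already loaded (append-only, in load order)
    source_rows : list of dicts as extracted from the source, left unchanged
    key_field   : name of the business key column
    load_ts     : load timestamp stamped onto each new version (hypothetical
                  control field name: "load_ts")
    Returns the number of versions appended.
    """
    latest = {}
    for row in history:
        latest[row[key_field]] = row  # later rows overwrite earlier ones

    added = 0
    for src in source_rows:
        prev = latest.get(src[key_field])
        prev_attrs = None
        if prev is not None:
            prev_attrs = {k: v for k, v in prev.items() if k != "load_ts"}
        if prev_attrs != dict(src):
            version = {**src, "load_ts": load_ts}
            history.append(version)
            latest[src[key_field]] = version
            added += 1
    return added


def current_rows(history, key_field):
    """A downstream 'business rule': expose only the latest version per key.

    It reads the history and never modifies it, so the rule can change
    later without reloading any data.
    """
    latest = {}
    for row in history:
        latest[row[key_field]] = row
    return list(latest.values())


student_history = []
historize(student_history, [{"student_id": 7, "standing": "freshman"}],
          "student_id", "2016-09-01")
historize(student_history, [{"student_id": 7, "standing": "sophomore"}],
          "student_id", "2017-09-01")
print(len(student_history))                                      # both versions kept
print(current_rows(student_history, "student_id")[0]["standing"])
```

The same `historize` function works for any source table because it makes no assumption about which attributes exist, which is one way to read "generic design pattern" here.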
Lean Data Principles
Here’s How: Optimize the Whole: The Lean Data Model must have…
• High Cohesion: Hubs have no functional dependencies on other Hubs, and thus can be scoped, designed, and loaded simultaneously, or months or years later.
• Loose Coupling: Hubs link to other Hubs by association, never by functional dependency (foreign key in a dependent table).
• Accept Some Suboptimized Components to Achieve It: Models are larger, and associative links require an added 1-2 table joins for querying across Hubs.
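As a rough illustration of these three points, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. All table and column names (`hub_donor`, `hub_student`, `link_donor_student`, and so on) are hypothetical. It shows two Hubs with no foreign key to each other, an associative link table between them, the extra joins needed to query across Hubs, and a third Hub added later without altering the existing tables.

```python
import sqlite3


def build_demo():
    """Create two free-standing hubs plus one associative link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hub_donor   (donor_key   INTEGER PRIMARY KEY, donor_bk   TEXT UNIQUE);
        CREATE TABLE hub_student (student_key INTEGER PRIMARY KEY, student_bk TEXT UNIQUE);
        -- The association lives in its own table; neither hub carries
        -- a foreign key to the other.
        CREATE TABLE link_donor_student (
            donor_key   INTEGER NOT NULL,
            student_key INTEGER NOT NULL,
            PRIMARY KEY (donor_key, student_key)
        );
    """)
    # Hubs can be loaded in either order because neither depends on the other.
    conn.execute("INSERT INTO hub_student VALUES (1, 'S-200')")
    conn.execute("INSERT INTO hub_donor   VALUES (1, 'D-100')")
    conn.execute("INSERT INTO link_donor_student VALUES (1, 1)")
    return conn


def donors_with_students(conn):
    """Query across hubs: costs two joins (hub -> link -> hub)."""
    return conn.execute("""
        SELECT d.donor_bk, s.student_bk
        FROM hub_donor d
        JOIN link_donor_student l ON l.donor_key = d.donor_key
        JOIN hub_student s        ON s.student_key = l.student_key
        ORDER BY d.donor_bk, s.student_bk
    """).fetchall()


def add_counselor_hub(conn):
    """Extend the model later: new tables only, no ALTER of existing ones."""
    conn.executescript("""
        CREATE TABLE hub_counselor (counselor_key INTEGER PRIMARY KEY, counselor_bk TEXT UNIQUE);
        CREATE TABLE link_student_counselor (
            student_key   INTEGER NOT NULL,
            counselor_key INTEGER NOT NULL,
            PRIMARY KEY (student_key, counselor_key)
        );
    """)


conn = build_demo()
print(donors_with_students(conn))
add_counselor_hub(conn)
print(donors_with_students(conn))  # existing query is unaffected
```

The design trade-off named in the last bullet is visible in `donors_with_students`: a foreign key on one of the tables would save a join, but would also make one Hub depend on the other.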
Lean Data Hubs
The Lean Data Hub is a critical architectural component in End-to-End ModelStorming. It is fundamentally based on Data Vault architecture, with the following specific Data Vault references:
– Super Charge Your Data Warehouse, by Dan Linstedt, co-edited by Kent Graziano (2008-2011) http://LearnDataVault.Com
– Modeling the Agile Data Warehouse with Data Vault, by Hans Hultgren (2012) New Hamilton Press
– Agile Data Warehousing for the Enterprise, by Ralph Hughes (2016) Elsevier / Morgan Kaufmann
Lean Data Hubs
Definition:
– “Pattern-based, history-tracking, modular data assets sourced from highly disparate data, loosely coupled by common business keys to join core business concepts (ensembles), and leaving source data otherwise unchanged. It remains flexible and cohesive, easily configured to support urgent changes in (a) data sources, (b) business rules, and (c) reporting or analytics requirements, and its loosely coupled design pattern inherently supports fast, highly parallelized loading by eliminating dependencies among core business concepts.”
- Daniel Upton, 10/15/2016, SoTec Conference
Lean Data Hubs
Definition (continued):
– Core Business Concept (CBC): Equivalent to an Entity in a third normal form (3NF) data model.
– Ensemble: Storage of a CBC in one Hub and all associated Satellites.
– Modularity and Cohesion: Attained on two levels:
• Between Ensembles (Hubs): an associative (loosely coupled) ensemble modeling pattern eliminates all functional dependencies between ensembles (Hubs).
• Isolation of Business Rules from Core Data Layer: The virtualization of analytic or business-rule transformations, downstream of the ensembles, and preferably as views, prevents changes in those analytics or business rules from compromising the core ensembles and their loading process.
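The second level, isolating business rules as views downstream of an ensemble, can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module. The ensemble (`hub_donor` plus one satellite `sat_donor_gift`), the view `v_major_donor`, and the "major donor" threshold rule are all hypothetical examples, not taken from the slides.

```python
import sqlite3


def build_ensemble():
    """One ensemble: a hub for the business key, a satellite for its history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hub_donor (donor_key INTEGER PRIMARY KEY, donor_bk TEXT UNIQUE);
        CREATE TABLE sat_donor_gift (
            donor_key INTEGER NOT NULL,
            load_ts   TEXT    NOT NULL,
            amount    REAL    NOT NULL,
            PRIMARY KEY (donor_key, load_ts)
        );
        INSERT INTO hub_donor VALUES (1, 'D-100'), (2, 'D-101');
        INSERT INTO sat_donor_gift VALUES
            (1, '2016-01-05', 500.0),
            (1, '2016-06-01', 700.0),
            (2, '2016-03-10', 300.0);
    """)
    return conn


def set_major_donor_rule(conn, threshold):
    """(Re)define the business rule as a view; stored data is not touched."""
    conn.execute("DROP VIEW IF EXISTS v_major_donor")
    # A view definition cannot take bound parameters, so the numeric
    # threshold is formatted into the SQL after coercion to float.
    conn.execute(f"""
        CREATE VIEW v_major_donor AS
        SELECT h.donor_bk, SUM(s.amount) AS total
        FROM hub_donor h
        JOIN sat_donor_gift s ON s.donor_key = h.donor_key
        GROUP BY h.donor_bk
        HAVING SUM(s.amount) >= {float(threshold)}
    """)


def major_donors(conn):
    rows = conn.execute("SELECT donor_bk FROM v_major_donor ORDER BY donor_bk")
    return [r[0] for r in rows]


conn = build_ensemble()
set_major_donor_rule(conn, 1000)
print(major_donors(conn))
set_major_donor_rule(conn, 250)   # the rule changes; no reload, no ETL change
print(major_donors(conn))
```

Changing the rule only replaces the view. The hub, the satellite, and whatever process loads them are left as they were.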
A Lean Data Model Protects The Repository from Volatile Business Rules That Cause
ETL to Break
Lean Data Hubs: High Level Architecture
High Level View: Lean Hubs Design is Modular
End to End ModelStorm: Lean Data Warehouse: Detail
High Level Summary of Lean Data Modeling Steps: Watch for the following details in the upcoming visual diagrams
* Addition of control fields
* Modification of primary keys
* Temporary duplication of tables
* Removal of excess fields
* Establishment of Hub vs. Satellite and the Hub-Satellite link
* Creation of Links
* Creation of Hub-Link relationships
Lean Data ModelStorming: Step 1: Source Data – Fast Profile
Lean Data ModelStorming: Step 2: Prepare New Tables
a. Add two fields to top, set new PK, add two fields to bottom, then duplicate tables
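Step 2a can be sketched generically. The transcript does not name the four added fields or the new primary key, so the function below takes them as parameters. The field names used in the example call (`student_key`, `load_ts`, `record_source`, `load_batch_id`) are assumptions borrowed from common Data Vault convention, and `prepare_table` and `duplicate_table` are hypothetical helpers.

```python
def prepare_table(source_columns, top_fields, new_pk, bottom_fields):
    """Return a new table definition: control fields added above and below
    the untouched source columns, with a new primary key."""
    columns = [*top_fields, *source_columns, *bottom_fields]
    if len(set(columns)) != len(columns):
        raise ValueError("duplicate column name")
    if new_pk not in columns:
        raise ValueError("primary key must be one of the columns")
    return {"columns": columns, "primary_key": new_pk}


def duplicate_table(table, copy_names):
    """Make independent copies of a prepared table, one per name."""
    return {
        name: {"columns": list(table["columns"]),
               "primary_key": table["primary_key"]}
        for name in copy_names
    }


# Hypothetical example: field names are assumptions, not from the slides.
prepared = prepare_table(
    source_columns=["student_id", "program", "standing"],
    top_fields=["student_key", "load_ts"],
    new_pk="student_key",
    bottom_fields=["record_source", "load_batch_id"],
)
copies = duplicate_table(prepared, ["student_copy_1", "student_copy_2"])
print(prepared["columns"])
print(sorted(copies))
```

The copies are what later steps would trim and reshape; this sketch stops at the duplication and does not attempt the removal of excess fields.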
Lean Data ModelStorming: Step 3: Modify for New Ensembles
Lean Data ModelStorming: Step 4: Working Data Model
Credits: The Lean Data Hubs model is based on the Data Vault (a.k.a. Hyper Normalized) design pattern, with credits to these authors…
Super Charge Your Data Warehouse, Dan Linstedt, 2008, LearnDataVault.com (and other books by Mr. Linstedt)
Modeling the Agile Data Warehouse with Data Vault, Hans Hultgren, 2012, New Hamilton
Agile Data Warehousing for the Enterprise, Ralph Hughes, 2016, Elsevier Inc. (Mr. Hughes originated the term “Hyper Normalized”.)
In 11th Hour, Two User Stories Change
• Institutional Research must know how many students enrolled weekly, with what program and standing and relationship to a Counselor, to ensure University meets qualifications for student loans.
• Auditors must know what family relationships exist between donors and students and Counselors, and student standing, to monitor for conflicts of interest and uphold reputation.
Lean Data ModelStorming: Revisit Step 1: Source Data – Fast Profile
Lean DW ModelStorming: Revisit Step 2: Prepare New Tables
11th Hour Lean ModelStorm Complete
High Cohesion and Loose Coupling
No functional dependencies between free-standing ensembles. Delivered in ½ the normal time. Load in parallel with existing loads.
Extensions and adaptations are all pattern-based. There are no dependencies between Hubs (core business concepts), so existing tables never need refactoring. Multiple teams can therefore co-develop simultaneously within one Lean Data Warehouse without interfering with each other.
“Warehouse Your Data Now. Add Rules and Relationships As Needed”
End to End ModelStorm: Lean Data Hubs
Lean Data Hubs: Detailed Architecture
With Lean Data Hubs as Infrastructure, we can easily keep up with ongoing change, delivering quickly…
…and building data stability just beneath the changes.
Waiting for the EDW
Gantt: Nothing delivered for multiple sprints. Non-agile.
Critical Path Delays
Traditional Method vs. ModelStormed Lean Data Hubs: Assume same resources, same levels of skill and effort.
Abbreviations denote chunks of work in Lean Data Hubs.
How should we sequence these little chunks of work? This way?
Let’s take some hints from Lean Data Principles…
* Loose coupling. Many dependencies are gone now.
* Deliver as fast as possible.
…Ideas?
This way is faster and reflects actual dependencies. Smaller chunks of work due to fewer data dependencies.
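One way to make "reflects actual dependencies" concrete is to group chunks of work into waves, where every chunk in a wave can proceed in parallel. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The chunk names and both dependency maps are hypothetical; the slides' own abbreviations are not reproduced in the transcript.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies):
    """Group chunks of work into waves that can each run in parallel.

    dependencies maps each chunk to the set of chunks it must wait for.
    Returns a list of waves; each wave is a sorted list of chunk names.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError on circular deps
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)               # unlock whatever waited on this wave
    return waves


# Hypothetical traditional plan: every chunk waits for the previous one.
traditional = {
    "model_student": set(),
    "model_donor": {"model_student"},
    "etl_all": {"model_donor"},
    "mart_donations": {"etl_all"},
}

# Hypothetical lean plan: hubs are independent, so only real dependencies remain.
lean = {
    "hub_student": set(),
    "hub_donor": set(),
    "link_donor_student": {"hub_donor", "hub_student"},
    "mart_donations": {"link_donor_student"},
}

print(len(plan_waves(traditional)), "waves, one chunk at a time")
for number, wave in enumerate(plan_waves(lean), start=1):
    print(number, wave)
```

With the same number of chunks, the lean plan needs fewer waves because the independent Hubs land in the same wave instead of queuing behind each other.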
Deliver Data Every Sprint: With same resources, skill levels and effort,
Modelstormed Lean Data Hubs support more rapid data delivery
End to End ModelStorming and Lean Data Hubs
Thank you!