holy warehouse, holy grail the quest for the single version of truth august 2, 2006 michael covert
TRANSCRIPT
Holy Warehouse, Holy Grail
The Quest for the Single Version of Truth
August 2, 2006
Michael Covert
2
Agenda
Warehouse scenarios– Why are they built?– What can we expect when building one?
Guiding principles– Leadership and Communication– Modeling your business– Incremental Delivery and Construction– Technology Decisions
Building a Data Management road map A Case Study
3
Warehouse Scenarios
Why do people build data warehouses?– To allow people to focus on analysis instead of collection and
assembly of data. This is in response to growing needs to understand a business holistically.
Finding it – where is that data? Integrating it – merging data from multiple sources. This is often a
source of error and inconsistency. Reporting on it – spreadsheets and expensive tools. Spreadsheets
continue to be where most analysis occurs. – To produce a single version of truth
Believing it – people and data errors have been corrected. Differing business meanings have been reconciled so that aggregations and variances make sense.
– To reduce internal friction and improve decision making Reports from non-integrated systems often produce inconsistent and
contradictory information. This leads to slowed decision making and in some cases, infighting.
4
Warehouse Scenarios
What can we expect when building a data warehouse?
– Data warehouses typically integrate data from multiple areas:
Finance, HR, Sales, Inventory, etc.
– This invariably leads to larger than normal decision making processes and to multi-departmental involvement
Politics will occur. Big picture (enterprise) issues will arise and they will be difficult
to solve.– Definitions, processes, frequency, and reliability
5
Warehouse Scenarios
Business data can vary greatly in complexity and in definition– Variation in attributes associated with a common entity type leads
to an explosion in number of tables Loans – auto loan, revolving credit, overdraft protection, mortgage, ad
infinitum Apparel – a myriad of categories, subcategories, and lower level
hierarchies of product-specific attributes– Integration and aggregation of these entities requires cross-
departmental alignment of the most basic definitions Risk elements – probability of default, loss given default, collateral
codes, service charge allocation Sales and marketing – promotion and ring code usage, campaign
phase definition, category definition, even SKU and package encoding
6
Warehouse Scenarios
The great technology debates– How will I organize my warehouse?
Should I build a star schema or a 3NF database?– How many technologies will I use?
The database itself– Warehouse, warehouse farms– Operational Data Store– Downstream data marts
The ETL Layer A metadata repository Master data management How will I cleanse my data? Usage of data marts and down stream systems How will I implement my analytics and reporting
layers?
7
Guiding Principles
Leadership– The best functioning data warehouse teams are cross-
departmental and collaborative. Leadership is essential.– A shared expense model is preferred, but again requires
leadership and cultural adoption. Project based expenses are tempting, but can lead to departmental development of an enterprise asset.
Communication– Communicate frequently. Advise on new data areas as they
become available. Strive for smaller but more frequent additions.– Track adoption and use communication to increase usage.– When sufficient adoption as been achieved, if possible TURN OFF
THE OLD SYSTEM(S)!!!
8
Guiding Principles
Model your business and maintain these models.
Use a data modeling tool to maintain your models.
9
Guiding Principles
Incremental delivery– Break the warehouse into subject areas that can be developed and
evolved incrementally.– Think multi-dimensionally.
Devise a multi-dimensional structure for each subject area. Identify overlaps where shared dimensions exist.
3NF versus Star schemas– In most cases, 3NF versus star schema decisions should be based
on: Skill sets In-place technology and processes Technology governance
– Many times, 3NF schemas have views that emulate star schemas. Beware of view join overhead.
* Build on successes
10
Guiding Principles
Technology selection– Limit technology through IT stewardship and
governance. Stay within product families where possible. Implement an ETL layer. Use it to reuse data
interfaces and reduce point-to-point data complexity. Use multi-dimensional systems to offload data
aggregation complexity.– Maintain conforming dimensions.
Strive for enterprise reporting, but realize that it is very difficult to achieve.
11
Guiding Principles
12
Guiding Principles
Layer your database to provide:– A data “landing zone” (also referred to as a staging area)– A data cleansing and integration layer
Assign data ownership to the owning line of business. Build in “data lineage” traceability. Program defensively from the very beginning. Put cleaning rules here. Avoid intermingling them into operational code at
all costs. Strive to reverse engineer them out of surviving systems.– A cleansed, integrated layer that is used to:
Feed downstream systems. Provide the primary reporting interface for end user systems. Resist allowing access to any other layer, specifically the landing zone! ***
– Use this layered technique to build in audit controls and “restartability”.
13
Building a Data Management Road Map
A data management roadmap defines all data management processes and control objectives. – CoBiT, ITIL, et al
14
A Case Study
This case study involves a large financial institution with s significant portion of their business involving collateral-secured loans. Loan reporting environment was manually intensive and heavily
driven off of Microsoft Access databases (approx. 400 Access databases)
End user queries were run against transactional systems. The data that was retrieved was then integrated in redundant, and often inconsistent processes
Limited scalability for future growth (Microsoft Access databases) Similar queries and duplicated analyses were performed by
Business Analysts Business Analysts spent approximately 80% of their time
gathering, re-keying and developing reports Major inconsistencies in these reports were producing political
pressures between sales and the financial analysts.
15
Case Study – Initial EnvironmentData Sources
DRA Loan Originators Dealers Locations Employees DSS
ILS TSER RMS EDS Other Sales Goals
Reporting(PDF , Prompt Rpts, Excel)
Indirect Dealer Finance AccessDatabase Environment
AccessDatabase
AccessDatabase
AccessDatabase
AccessDatabase
AccessDatabase
AccessDatabase
AccessDatabase
FinancialSpreadsheetsand Reports
FinancialSpreadsheetsand Reports
FinancialSpreadsheetsand Reports
FinancialSpreadsheetsand Reports
AccessDatabaseAccess
DatabaseAccessDatabaseAccess
Database
Excel
Additional 390+Access
Databases
AccessDatabase
*
*
FinancialSpreadsheetsand Reports *
Key:
* Various “Group By” Report views
AccessDatabase
Data Sources Origination Dealers Locations Employees Servicing Risk Securitization
16
Case Study – New Environment
Landing ZoneCleanse and Integrate
17
Case Study - Results
Excellent user acceptance of new system Consolidated database now reduces departmental time required to
access data Simplified and pre-integrated data has reduced many inconsistencies Excellent load and response times Microsoft Reporting Services is being used to produce increasingly
complex reports used by end users Microsoft Analysis Services produces cubes that are easy to access
and have nearly instantaneous response time The new system architecture is much easier to change since it is
simplified. New data sources have been added incrementally.
18
Conclusion
Leadership, Stewardship, and Communication Business modeling and mapping Data quality and ownership Technology Governance and Control
– Adoption of a data management roadmap– Data architecture– Technological risk management
Incremental delivery