Education Seminar: Self-Service BI, Logical Data Warehouse and Data Lakes
TRANSCRIPT
Speakers
Chuck DeVries
VP, Enterprise Architecture
Vizient
Ravi Shankar
CMO
Denodo
Chris Walters
Sr. Solutions Consultant
Denodo
Charles Yorek
VP, Business Analytics
iOLAP
Agenda
1. Customer Use Case: Powering Self-Service BI with Logical Data Warehouse and Operationalizing Logical Data Lakes
2. Logical Data Lakes/Warehouse: Architectural Patterns and Performance Considerations
3. Demo: Building Logical Data Lakes/Warehouse using Data Virtualization
4. Best Practices: Big Data Virtualization Deployment and Management
5. Panel: Self-Service BI, Logical Data Warehouse, Data Lakes
Powering Self Service BI with Logical Data Warehouses and Operationalizing Data Lakes
Chuck DeVries
December 2016
Vizient Presentation │ Date │ Confidential Information
Who is Vizient?
• Combination of VHA, University HealthSystem
Consortium, Novation, MedAssets Spend and
Clinical Resource Management and Sg2
• Experts with the purchasing power, insights
and connections that accelerate performance
for members
Purpose, mission, strategic aspirations
Purpose
To ensure our members deliver exceptional, cost-effective care
Mission
To connect members with the knowledge, solutions and expertise that accelerate performance
Strategic Aspirations
• Become an indispensable partner to health care organizations
• Become a leader in health care innovation
• Accelerate our growth rate
Vizient members span the care continuum
Vizient serves thousands of health care organizations
across the nation, from independent, community-based
organizations to large, integrated systems including
• Acute care hospitals
• Academic medical centers
• Non-acute community health care providers
• Pediatric facilities
Member-owned, member-driven
MEMBERSHIP BENEFITS
• Harness powerful insights
• Accelerate performance
• Achieve scale and efficiency
• Make innovative connections
• Be more agile
• Build knowledge
• Gain advocates on important policy issues
We measure our success by our members’ success. We fuel
powerful connections that help members focus on what they
do best: deliver exceptional, cost-effective care.
We deliver brilliant, data-driven resources and
insights — from benchmarking and predictive analytics
to cost-savings — to where they’re needed most.
Empowering brilliant connections
Unmatched insight and expertise
9 out of 10
of the U.S. News & World Report Best Hospitals 2014-2015 Honor Roll utilized our contracts and services.
~$100B
Vizient represents approximately $100 billion in annual purchasing volume, the largest in the industry.
200+
Vizient member hospitals have achieved remarkable improvements in quality and patient safety through our Hospital Engagement Network.
More than 1/3
Vizient provides services to more than one-third of the nation’s hospitals.
Information is inclusive of MedAssets Spend and Clinical Resource
Management segment, including Sg2.
Examples of powering self service discovery with a Logical Data Warehouse approach
Financial Data Mart
Primary Use Case: Unify disparate accounting and finance data marts
across various legacy organizations into a logical data warehouse
Secondary Use Cases
• Provide a unified source for key BI initiatives like the GPO Dashboard
• Support reporting needs as legacy systems are migrated or replaced during
integration of Vizient and L-MDAS (dbVision, etc.)
• Provide a final resting place for archived legacy sources like Solomon, Epicor,
etc.
[Diagram labels: VHA, MedAssets, UHC]
Financial Data Mart
Architectural Approach
• Denodo was selected as the data platform in
order to utilize the following features of the
software:
–Data Virtualization allows sources in various mediums and
locations to be integrated without physically moving the data
–Data Abstraction allows data to be represented consistently within
the datamart while data sources are moved or replaced behind
the scenes
–Data Integration allows for a single seamless view to be created
across a subject area (e.g. “Supplier Sales”) with varied data
transformation rules for each data source within the subject area
(PRS, dbVision) allowing a logical data warehouse to be created
without the need to instantiate a physical one
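The idea of one subject-area view with a different transformation rule per source can be sketched in a few lines. This is only an illustration in plain Python, not Denodo's own view definition language; the field names, sample rows and the helper names `from_prs`, `from_dbvision` and `supplier_sales_view` are all assumptions made for the example.

```python
# Hypothetical sample rows from two legacy sources (field names are invented).
PRS_ROWS = [{"supplier": "Acme", "sales_usd": 1200.0}]
DBVISION_ROWS = [{"SUPPLIER_NAME": "ACME ", "SALES_CENTS": 45000}]


def from_prs(row):
    """Transformation rule for the PRS source: already close to the target shape."""
    return {
        "supplier": row["supplier"].strip().title(),
        "sales_usd": float(row["sales_usd"]),
        "source": "PRS",
    }


def from_dbvision(row):
    """Transformation rule for the dbVision source: rename fields, convert cents."""
    return {
        "supplier": row["SUPPLIER_NAME"].strip().title(),
        "sales_usd": row["SALES_CENTS"] / 100,
        "source": "dbVision",
    }


# Each source is paired with its own rule; the data stays where it is
# and is only reshaped when the view is read.
SOURCES = [(PRS_ROWS, from_prs), (DBVISION_ROWS, from_dbvision)]


def supplier_sales_view():
    """Yield one consistent 'Supplier Sales' row shape across all sources."""
    for rows, rule in SOURCES:
        for row in rows:
            yield rule(row)


unified = list(supplier_sales_view())
```

If a source is later replaced, only its rule and its entry in `SOURCES` change; consumers of `supplier_sales_view` keep seeing the same row shape.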
GPO Dashboard
Primary Use Case: Provide a consolidated view of supplier sales data
across all customers of legacy Vizient & MedAssets organizations.
Architectural Approach
• Financial Datamart (on Denodo) for data source
• Denodo TDE Exporter Tool for daily data extracts to Tableau:
– Report Data
– Report User Security
• Tableau for report development and distribution
Over 400 active users across 6
departments
GPO Dashboard
Key Challenges
• Balance between data timeliness and report performance
– Tableau reports performed best utilizing the TDE format
(cached/extracted dataset) as opposed to a live connection
– This meant that the report caches required daily refreshes, and data
extraction had to be appropriately tuned
– Denodo features such as dataset statistics and indexing greatly
contributed to this performance tuning
• Provisioning user security at cell level
– The requirement for some internal report users to be restricted to the
members/customers to which they are assigned meant that a new
report security approach was needed
– Reliance on TDEs for report data necessitated the integration of security
in the reporting layer
– Tableau’s “data blending” feature allows user security to be specified
within a separate dataset
– This also supports reuse of the security view across logical data
warehouse views
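The security approach described above, where user entitlements live in a separate dataset that is combined with the report data, can be illustrated with a small sketch. This is not Tableau's data blending API; it only shows the filtering logic, and the row shapes, user names and the function `visible_rows` are assumptions.

```python
# Hypothetical report extract: one row per member/supplier combination.
REPORT_ROWS = [
    {"member": "M1", "supplier": "Acme", "sales": 100.0},
    {"member": "M2", "supplier": "Acme", "sales": 250.0},
    {"member": "M3", "supplier": "Bolt", "sales": 75.0},
]

# Hypothetical security dataset, kept separate so several reports can reuse it.
USER_SECURITY = [
    {"user": "alice", "member": "M1"},
    {"user": "alice", "member": "M3"},
    {"user": "bob", "member": "M2"},
]


def visible_rows(user, report_rows, security_rows):
    """Return only the report rows for members assigned to the given user."""
    allowed = {s["member"] for s in security_rows if s["user"] == user}
    return [r for r in report_rows if r["member"] in allowed]


alice_rows = visible_rows("alice", REPORT_ROWS, USER_SECURITY)
```

Because the entitlements are not baked into the report data, the same `USER_SECURITY` dataset can be applied to any other view that carries a `member` column.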
Contract Sales Actualizer Dashboard
Primary Use Case: Integrate Member Spend and Supplier Sales
data from all Vizient organizations to identify opportunities for
increasing contract utilization
Other Use Cases:
• Maintain consistency (Single Source Of Truth) with GPO
dashboard regarding:
– Supplier Sales Data
– Dimension Data
– User Security
Architectural Approach
• Data source utilizes Denodo to reuse overlapping datasets (sales,
dimensions, security) while allowing separate virtualized views to
be created for new datasets (member spend) which can also be
reused by future projects via a logical data warehouse
• Reporting components match approach used by GPO Dashboard
Contract Sales Actualizer Dashboard
Key Challenges
• Successful integration of Exadata RDM as a data source for Denodo
– Approach utilizes the strength of Exadata RDBMS for aggregating
large quantities of data quickly
– Denodo to integrate the data with similar legacy SQL Server data
sources to create a comprehensive view of Vizient member spend
• Scalability/Configuration Management
– Advances were made to support parallel development of this
project and continued efforts on GPO dashboard
– Compartmentalization features within Denodo allow for code
changes in each project to be version controlled and assessed for
dependencies
– Process guidelines are being authored to allow for multiple
development efforts on the same datasets
Virtual warehouse
Modern Data Architecture
[Diagram labels: Open Data, Purchase Data, RDBMS, Rules, RDBMS, ODS, Data warehouse]
Our central focus is helping members
apply data and insights in new ways to
achieve sustainable results. Our
success is ultimately defined by the
success of our members in serving their
patients and communities.
Curt Nonomaque, President and CEO, Vizient
Logical Data Lakes/Warehouse: Architectural Patterns and Performance Considerations
Ravi Shankar, CMO
December 2016
Agenda
1. The Logical Data Warehouse
2. Architectural Patterns
3. Performance Considerations
4. Customer Success Studies
Logical Data Warehouse
Description:
A semantic layer on top of the data warehouse that keeps the business data definition.
Allows the integration of multiple data sources including enterprise systems, the data warehouse, additional processing nodes (analytical appliances, Big Data, …), Web, Cloud and unstructured data.
Publishes data to multiple applications and reporting tools.
Logical Data Warehouse
Description:
“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy. The LDW will form a new best practice by the end of 2015.”
“The LDW is an evolution and augmentation of DW practices, not a replacement”
“A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
“The LDW permits an IT organization to make a large number of datasets available for analysis via query tools and applications.”
Gartner Definition
Gartner Hype Cycle for Enterprise Information Management, 2012
Data Virtualization as the Data Integration Layer
Data Virtualization as Data Integration/Semantic Layer
[Diagram labels: Data Virtualization, EDW, ODS]
• Move data integration and semantic layer to
independent Data Virtualization platform
• Purpose built for supporting data access
across multiple heterogeneous data sources
• Separate layer provides semantic models for
underlying data
• Physical to logical mapping
• Enforces common and consistent security
and governance policies
• Gartner’s recommended approach
[Diagram: Logical Data Warehouse over EDW, Hadoop Cluster, Sales HDFS Files, Document Collections, NoSQL Database, ERP Database, Excel]
What about the Logical Data Lake?
A Data Lake will not have a star or snowflake schema, but rather a more
heterogeneous collection of views with raw data from heterogeneous
sources
The virtual layer will act as a common umbrella under which these
different sources are presented to the end user as a single system
However, from the virtualization perspective, a Virtual Data Lake shares
many technical aspects with a LDW and most of these contents also
apply to a Logical Data Lake
Common Patterns for a Logical Data Warehouse
1. The Virtual Data Mart
2. DW + MDM
3. DW + Cloud
4. DW + DW
5. DW historical offloading
1. Virtual Data Marts
Business friendly models defined on top of one or multiple systems,
often “flavored” for a particular division
Motivation
Hide complexity of star schemas for business users
Simplify model for a particular vertical
Reuse semantic models and security across multiple reporting engines
Typical queries
Simple projections, filters and aggregations on top of curated “fat tables” that
merge data from facts and many dimensions
Simplified semantic models for business users
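A virtual data mart of this kind can be sketched with an ordinary SQL view. The example below uses Python's built-in `sqlite3` module as a stand-in for the EDW; the table names, columns and sample data are assumptions, and in a data virtualization platform the view would sit above the sources rather than inside one database.

```python
import sqlite3

# In-memory database standing in for the EDW star schema (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
    CREATE TABLE dim_retailer (retailer_id INTEGER PRIMARY KEY, retailer_name TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, retailer_id INTEGER, sale_date TEXT, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Bolt');
    INSERT INTO dim_retailer VALUES (10, 'North'), (20, 'South');
    INSERT INTO fact_sales VALUES
        (1, 10, '2016-01-05', 100.0),
        (1, 20, '2016-02-01', 50.0),
        (2, 10, '2016-03-01', 25.0);

    -- The "fat table": facts merged with their dimensions, no keys exposed.
    CREATE VIEW sales_mart AS
    SELECT f.sale_date, p.product_name, p.brand, r.retailer_name, f.amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_retailer AS r ON r.retailer_id = f.retailer_id;
    """
)

# A business user only needs a simple aggregation on the curated view.
sales_by_brand = conn.execute(
    "SELECT brand, SUM(amount) FROM sales_mart GROUP BY brand ORDER BY brand"
).fetchall()
```

The query against `sales_mart` contains no joins, which is the simplification the pattern aims for.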
1. Virtual Data Marts
[Diagram: EDW star schema (Time Dimension, Fact table (sales), Product, Retailer Dimension) and other sources (Prod. Details), exposed as Sales and Product views]
2. DW + MDM
Slim dimensions with extended information maintained in an external
MDM system
Motivation
Keep a single copy of golden records in the MDM that can be reused across
systems and managed in a single place
Typical queries
Join a large fact table (DW) with several MDM dimensions, aggregations on
top
Example
Revenue by customer, projecting the address from the MDM
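As a rough illustration of the "revenue by customer, projecting the address from the MDM" example, the sketch below aggregates facts from one system and enriches them with golden records from another. The two in-memory structures stand in for the DW and the MDM; their contents and the function name `revenue_by_customer` are assumptions.

```python
from collections import defaultdict

# Hypothetical fact rows from the DW: (customer_id, revenue).
DW_SALES = [("C1", 120.0), ("C2", 80.0), ("C1", 30.0)]

# Hypothetical golden records kept only in the MDM system.
MDM_CUSTOMERS = {
    "C1": {"name": "Northside Clinic", "address": "1 Main St"},
    "C2": {"name": "Lakeview Hospital", "address": "9 Shore Rd"},
}


def revenue_by_customer(sales, mdm):
    """Aggregate the facts first, then project the address from the MDM."""
    totals = defaultdict(float)
    for customer_id, amount in sales:
        totals[customer_id] += amount
    return [
        {"customer_id": cid, "address": mdm[cid]["address"], "revenue": total}
        for cid, total in sorted(totals.items())
    ]


report = revenue_by_customer(DW_SALES, MDM_CUSTOMERS)
```

Aggregating before the lookup means only one MDM record per customer is needed, rather than one per fact row.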
2. DW + MDM dimensions
[Diagram: Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension; systems: EDW, MDM]
3. DW + Cloud dimensional data
Fresh data from cloud systems (e.g. SFDC) is mixed with the EDW, usually
on the dimensions. DW is sometimes also in the cloud.
Motivation
Take advantage of “fresh” data coming straight from SaaS systems
Avoid local replication of cloud systems
Typical queries
Dimensions are joined with cloud data to filter based on some external attribute
not available (or not current) in the EDW
Example
Report on current revenue on accounts where the potential for an expansion is
higher than 80%
3. DW + Cloud dimensional data
[Diagram: Time Dimension, Fact table (sales), Product Dimension, Customer Dimension; systems: EDW, CRM (SFDC Customer)]
4. Multiple DW integration
Motivation
Mergers and acquisitions
Different DWs by department
Transition to new EDW Deployments (migration to Spark, Redshift, etc.)
Typical queries
Joins across fact tables in different DW with aggregations before or after the JOIN
Example
Get customers with purchases higher than 100 USD that do not have a fidelity
card (purchases and fidelity card data in different DW)
Use of multiple DW as if it was only one
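The fidelity card example can be sketched with two separate in-memory SQLite databases standing in for the two warehouses. The schemas and data are assumptions, and "purchases higher than 100 USD" is read here as total purchases per customer. The aggregation runs inside the first source, and only the small results are combined in the middle.

```python
import sqlite3

# Hypothetical "Finance" DW holding purchases.
finance = sqlite3.connect(":memory:")
finance.executescript(
    """
    CREATE TABLE purchases (customer_id TEXT, amount_usd REAL);
    INSERT INTO purchases VALUES ('C1', 80), ('C1', 40), ('C2', 60), ('C3', 150);
    """
)

# Hypothetical "Marketing" DW holding fidelity card data.
marketing = sqlite3.connect(":memory:")
marketing.executescript(
    """
    CREATE TABLE fidelity_cards (customer_id TEXT);
    INSERT INTO fidelity_cards VALUES ('C3');
    """
)


def big_spenders_without_card(min_total=100.0):
    """Aggregate before the join in one DW, then anti-join against the other."""
    spenders = finance.execute(
        "SELECT customer_id, SUM(amount_usd) FROM purchases "
        "GROUP BY customer_id HAVING SUM(amount_usd) > ?",
        (min_total,),
    ).fetchall()
    card_holders = {
        row[0] for row in marketing.execute("SELECT customer_id FROM fidelity_cards")
    }
    return sorted((cid, total) for cid, total in spenders if cid not in card_holders)


result = big_spenders_without_card()
```

Here `C3` spends the most but holds a card, so only `C1` is returned.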
4. Multiple DW integration
[Diagram: Finance EDW and Marketing EDW; labels: Time Dimension, Sales fact, Product Dimension, Region, City, Store, Customer Fidelity facts, Product Dimension]
*Real Examples: Nationwide POC, IBM tests
5. DW Historical Partitioning
Only the most current data (e.g. last year) is in the EDW. Historical data is offloaded to a Hadoop cluster
Motivations
Reduce storage cost
Transparently use the two datasets as if they were all together
Typical queries
Facts are defined as a partitioned UNION based on date
Queries join the “virtual fact” with dimensions and aggregate on top
Example
Queries on current date only need to go to the DW, but longer timespans need to merge with Hadoop
Horizontal partitioning
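A partitioned UNION with date-based pruning can be sketched as follows. The cutoff date, the sample rows and the function names are assumptions; the point is only that a query confined to recent dates never touches the historical store.

```python
from datetime import date

# Hypothetical boundary: rows on or after this date live in the EDW.
CUTOFF = date(2016, 1, 1)

# Hypothetical (sale_date, amount) rows held in each system.
EDW_SALES = [(date(2016, 3, 1), 100.0), (date(2016, 6, 15), 50.0)]
HADOOP_SALES = [(date(2014, 5, 1), 70.0), (date(2015, 11, 20), 30.0)]

PARTITIONS = {"edw": EDW_SALES, "hadoop": HADOOP_SALES}


def partitions_for(start, end):
    """Pick only the partitions whose date range overlaps the query range."""
    names = []
    if start < CUTOFF:
        names.append("hadoop")
    if end >= CUTOFF:
        names.append("edw")
    return names


def total_sales(start, end):
    """Sum sales over the 'virtual fact': a UNION of the selected partitions."""
    return sum(
        amount
        for name in partitions_for(start, end)
        for sale_date, amount in PARTITIONS[name]
        if start <= sale_date <= end
    )


recent_only = partitions_for(date(2016, 2, 1), date(2016, 12, 31))
long_span = partitions_for(date(2015, 1, 1), date(2016, 12, 31))
```

With this data, `recent_only` contains just the EDW partition, while `long_span` needs both.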
5. DW Historical offloading
Horizontal partitioning
[Diagram: Time Dimension, Fact table (sales), Product Dimension, Retailer Dimension; Current Sales and Historical Sales; EDW]
It is a common assumption that a virtualized solution will
be much slower than a persisted approach via ETL:
1. There is a large amount of data moved through the
network for each query
2. Network transfer is slow
But is this really true?
Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse
Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario.
Compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.
Datasets (same sizes on both sides of the comparison):
Customer Dim.: 2 M rows
Sales Facts: 290 M rows
Items Dim.: 400 K rows
* TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
Performance Comparison: Logical Data Warehouse vs. Physical Data Warehouse
Query Description | Returned Rows | Time Netezza | Time Denodo (Federated Oracle, Netezza & SQL Server) | Optimization Technique (automatically selected)
Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On the fly data movement
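Why aggregation push-down keeps network traffic small can be shown with a toy example. The sketch below uses an in-memory SQLite table as the remote source and compares how many rows cross the "network" with and without pushing the GROUP BY down; the table, data sizes and function names are assumptions and say nothing about how Denodo's optimizer is implemented.

```python
import sqlite3

# In-memory table standing in for a remote sales fact table (hypothetical data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(i % 10, 1.0) for i in range(1000)]
)


def totals_without_pushdown():
    """Fetch every fact row, then aggregate in the virtual layer."""
    rows = source.execute("SELECT customer_id, amount FROM sales").fetchall()
    totals = {}
    for customer_id, amount in rows:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals, len(rows)


def totals_with_pushdown():
    """Let the source aggregate and fetch only one row per customer."""
    rows = source.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall()
    return dict(rows), len(rows)


naive_totals, naive_rows_moved = totals_without_pushdown()
pushed_totals, pushed_rows_moved = totals_with_pushdown()
```

Both functions return the same totals, but the first moves 1,000 rows and the second moves 10. The same reasoning applies at the scale of the benchmark, where one row per customer is returned instead of the full fact table.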
Performance and optimizations in Denodo
Focused on 3 core concepts
Dynamic Multi-Source Query Execution Plans
Leverages processing power & architecture of data sources
Dynamic to support ad hoc queries
Uses statistics for cost-based query plans
Selective Materialization
Intelligent Caching of only the most relevant and often used information
Optimized Resource Management
Smart allocation of resources to handle high concurrency
Throttling to control and mitigate source impact
Resource plans based on rules
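The idea of caching only frequently used information can be illustrated with a small sketch. This is a hypothetical policy written for illustration, not Denodo's cache: a result is materialized only once the same key has been requested a given number of times. The class name `SelectiveCache` and its `threshold` parameter are assumptions.

```python
from collections import Counter


class SelectiveCache:
    """Cache a result only after it has been requested `threshold` times."""

    def __init__(self, fetch, threshold=2):
        self.fetch = fetch          # function that queries the underlying source
        self.threshold = threshold  # requests needed before a result is kept
        self.requests = Counter()   # how often each key went to the source
        self.store = {}             # materialized results

    def get(self, key):
        if key in self.store:
            return self.store[key]  # served from cache, source not touched
        self.requests[key] += 1
        value = self.fetch(key)
        if self.requests[key] >= self.threshold:
            self.store[key] = value
        return value


# Usage: record which keys actually reach the (hypothetical) source.
source_calls = []


def query_source(key):
    source_calls.append(key)
    return key.upper()


cache = SelectiveCache(query_source, threshold=2)
for key in ["a", "a", "a", "b"]:
    cache.get(key)
```

The third request for `"a"` is served from the cache, while the rarely used `"b"` is never materialized.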
Performance and optimizations in Denodo
Comparing optimizations in DV vs ETL
Although Data Virtualization is a data integration platform, architecturally speaking it is more similar to an RDBMS
Uses relational logic
Metadata is equivalent to that of a database
Enables ad hoc querying
Key difference between ETL engines and DV:
ETL engines are optimized for static bulk movements
Fixed data flows
Data virtualization is optimized for queries
Dynamic execution plan per query
Therefore, the performance architecture presented here resembles that of an RDBMS
Autodesk Overview
• Founded 1982 (NASDAQ: ADSK)
• Annual revenues (FY 2015) $2.5B
Over 8,800 employees
• 3D modeling and animation software
Flagship product is AutoCAD
• Market sectors:
Architecture, Engineering, and Construction
Manufacturing
Media and Entertainment
Recently started 3D Printing offerings
Business Drivers for Change
• Software consumption model is changing
Perpetual licenses to subscriptions
Users want more flexibility in how they use software
• Autodesk needed to transition to subscription pricing
2016 – some products will be subscription only
• Lifetime revenue higher with subscriptions
Over 3-5 years, subscriptions = more revenues
• Changing a licensing model is disruptive
Technology Challenges
• Current ‘traditional’ BI/EDW architecture not designed for data streams from online apps
Weblogs, Clickstreams, Cloud/Desktop apps, etc.
• Existing infrastructure can’t simply ‘go away’
Regulatory reporting (e.g. SEC)
Existing ‘perpetual’ customers
• ‘Subscription’ infrastructure work in parallel
Extend and enhance existing systems
With single access point to all data
• Solution – ‘Logical Data Warehouse’
Case Study: Autodesk Successfully Changes Their Revenue Model and Transforms Business
Problem
• Autodesk was changing their business revenue model from a conventional perpetual license model to subscription-based license model.
• Inability to deliver high quality data in a timely manner to business stakeholders.
• Evolution from traditional operational data warehouse to contemporary logical data warehouse deemed necessary for faster speed.
Solution
• General purpose platform to deliver data through logical data warehouse.
• Denodo Abstraction Layer helps live invoicing with SAP.
• Data virtualization enabled a culture of “see before you build”.
Results
• Successfully transitioned to subscription-based licensing.
• For the first time, Autodesk can do single point security enforcement and have uniform data environment for access.
Autodesk, Inc. is an American multinational software corporation that makes software for the architecture, engineering, construction, manufacturing, media, and entertainment industries.
IOLAP, INC. - PROPRIETARY AND CONFIDENTIAL
“Good work building ETL jobs this year”
- No CEO Ever…
BIGGER SURE ISN’T EASIER
• SKILLS
• EASY IN/HARD OUT
• ALL DATA SOURCES AREN’T EQUAL
VIRTUALIZATION PROVIDES EASE OF USE
How the data goes in… How it gets back out…
SOMEBODY BOUGHT SOMETHING BACK IN THE DAY
• WE HAVE TO DEAL WITH LEGACY
• HOMOGENEITY ISN’T REALISTIC
• ALL DATA SOURCES AREN’T EQUAL
WHAT NOW?
• POC USING DENODO EXPRESS OR AWS
• IOLAP CAN HELP BUILD A ROADMAP
IOLAP OVERVIEW
Focused solely on big data, data strategy, advanced analytics, and reporting
Founded in 2000: 16 years Delivering Success
Headquartered in Frisco, Texas: National Customer Base
Extended Workforce: U.S. Company with Offshore Capabilities
60 consultants in the U.S. (full-time, salaried)
50 consultants in Europe (Offshore – BIDC)
Onsite, Near Shore, Offshore
Next Steps
Attend the webinar “Realizing the Promise of Data Lakes” on December 15
Register at: www.denodo.com
Access Denodo on AWS
Visit: www.denodo.com/en/denodo-platform/denodo-platform-for-aws
Download Denodo Express
The free way to Data Virtualization!
Download from: www.denodo.com
Thanks!
www.denodo.com
© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.