Data Integration Alternatives
Paul Moxon, Senior Director, Product Management
Agenda
1. Three Key Trends Affecting IT
2. The Logical Data Warehouse
3. Data Integration Layer Alternatives
4. The Logical Data Warehouse Revisited
Three Key Trends Affecting IT
1. Reduce corporate data silos to gain efficiency and productivity
2. Towards a common data backbone for operational and informational use
3. Enterprises going with bimodal IT in their modernization efforts
Trend I – Consolidation
• Organizational structures create specialized data and application silos
• The proliferation of silos has inhibited access to, and sharing of, data across the organization
• Consolidating and opening up these silos (while retaining ownership and control) will promote efficiency and productivity
Trend II – Common Data Backbone
• Access to data via a logical layer gives a common and consistent view of data assets
  • Example: customer data
  • All analytics, reports, processes and applications (web, mobile, desktop) should see the same customer data
• Is this a Data Lake?
  • In reality, there will be more than one data lake (separate or refined)
Trend III – Bimodal IT
• Bimodal IT has two IT ‘flavors’
  • Type 1 – focused on stability and efficiency (traditional IT)
  • Type 2 – experimental and agile, focused on time to market (TTM) and rapid application evolution; aligned with the business
• Some have compared this to the distinction between Systems of Record (SoR) and Systems of Engagement (SoE)
  • The two need to live side by side and interact
  • New apps still need data from the SoR
What Does This Mean?
• A data access layer is needed to ‘open up’ data silos
  while retaining local ownership and control of the data
• The access layer must provide access to all data sources and support different modes of access
  Reporting/analytics, real-time application access (mobile/web and ‘traditional’), etc.
• New technologies will be an important part of the information infrastructure
  Hadoop ecosystem, NoSQL, streaming data, “Data Lakes”
• The traditional IT infrastructure is not going away soon
  ‘Systems of Record’ are still needed
• The new and the old need to work together
  Newer systems still need to interact with ‘Systems of Record’
How does this affect the ‘Information Architecture’?
Logical Data Warehouse
Definition:
“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.”
“The LDW is an evolution and augmentation of DW practices, not a replacement”
“A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
“The LDW permits an IT organization to make a large number of datasets available … via query tools and applications”
Gartner Hype Cycle for Enterprise Information Management, 2012.
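The point that a semantic layer can hold many business definitions of the same information can be illustrated with SQL views. A minimal sketch, assuming an in-memory SQLite table `customer_activity` and two made-up definitions of an "active customer"; the table, columns and rules are hypothetical and not from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table shared by every consumer (hypothetical schema).
conn.execute(
    "CREATE TABLE customer_activity ("
    "customer_id INTEGER, last_purchase_days INTEGER, open_tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_activity VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 200, 3), (3, 400, 0), (4, 30, 2)],
)

# Two business definitions of "active customer" over the same data.
# The rules are invented for illustration only.
conn.execute(
    "CREATE VIEW sales_active_customers AS "
    "SELECT customer_id FROM customer_activity WHERE last_purchase_days <= 90"
)
conn.execute(
    "CREATE VIEW support_active_customers AS "
    "SELECT customer_id FROM customer_activity WHERE open_tickets > 0"
)

sales_active = [
    r[0] for r in conn.execute(
        "SELECT customer_id FROM sales_active_customers ORDER BY customer_id")
]
support_active = [
    r[0] for r in conn.execute(
        "SELECT customer_id FROM support_active_customers ORDER BY customer_id")
]
```

Each department queries its own view, while the underlying data is stored once.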
Architecture of the Logical Data Warehouse
[Figure: data sources (sensor data, machine data/logs, social data, clickstream data, internet data, image and video, unstructured enterprise content, enterprise applications, cloud applications) are loaded via batch ETL, CDC, Sqoop and streaming ingestion (Flume, Kafka, …) into data stores: EDW, in-memory (SAP Hana, …), analytical appliances, cloud DW (Redshift, …), ODS, NoSQL and Hadoop (HDFS, YARN/workload management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk). A Data Integration/Semantic Layer sits above the stores and provides real-time (on-demand/streaming) data access to real-time decision management, alerts, scorecards and dashboards, reporting, self-service data discovery, search, predictive analytics, statistical analytics (R), text analytics and data mining. Metadata management, data governance and data security span the architecture.]
Autodesk Data Architecture
[Figure: Autodesk's data architecture, showing its Data Integration/Semantic Layer.]
Data Integration/Semantic Layer Alternatives
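Three Integration/Semantic Layer Alternatives
• Application/BI Tool as Data Integration/Semantic Layer
• EDW as Data Integration/Semantic Layer
• Data Virtualization as Data Integration/Semantic Layer
[Figure: the three alternatives side by side. In each, the integration/semantic layer (Application/BI Tool, EDW, or Data Virtualization) sits above the EDW and ODS; where the EDW is itself the layer, it sits above the ODS.]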
Application/BI Tool as the Data Integration Layer
• Integration is delegated to end-user tools and applications
  • e.g. BI tools with ‘data blending’
• Results in duplication of effort: the same integration is defined many times in different tools
• What is the impact of a change in a data schema? Every tool that embeds the affected integration logic must be updated
• End-user tools are not intended to be integration middleware
  • Integration is not their primary purpose or expertise
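A minimal sketch of the duplication problem, assuming two hypothetical tools (`dashboard_blend`, `report_blend`) that each embed their own copy of the same join over made-up EDW and ODS rows. Renaming one physical column breaks both copies independently, which is the schema-change risk raised above.

```python
# Hypothetical rows as the two sources return them (physical column names).
edw_rows = [{"cust_id": 1, "cust_nm": "Acme"}, {"cust_id": 2, "cust_nm": "Globex"}]
ods_rows = [{"customer": 1, "amount": 250.0}, {"customer": 2, "amount": 75.5}]


def dashboard_blend(edw, ods):
    # Tool A's own copy of the integration logic, hard-coding physical names.
    names = {r["cust_id"]: r["cust_nm"] for r in edw}
    return [(names[o["customer"]], o["amount"]) for o in ods]


def report_blend(edw, ods):
    # Tool B re-implements the same join independently.
    names = {r["cust_id"]: r["cust_nm"] for r in edw}
    return [(names[o["customer"]], o["amount"]) for o in ods]


def count_broken_tools(edw, ods):
    # Count how many tools fail against the given source schema.
    broken = 0
    for blend in (dashboard_blend, report_blend):
        try:
            blend(edw, ods)
        except KeyError:
            broken += 1
    return broken


# Simulated schema change in the EDW: cust_nm is renamed to customer_name.
renamed_edw_rows = [
    {"cust_id": r["cust_id"], "customer_name": r["cust_nm"]} for r in edw_rows
]

broken_before = count_broken_tools(edw_rows, ods_rows)
broken_after = count_broken_tools(renamed_edw_rows, ods_rows)
```

Both tools must be fixed separately; if the join were defined once in a shared layer, only that one mapping would change.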
EDW as the Data Integration Layer
• Access to ‘other’ data (query federation) via the EDW
  • Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
  • Often coupled with traditional ETL replication of data into the EDW
• The EDW becomes the ‘center of the data universe’
  • Provides the data integration and semantic layer
• Appears attractive to organizations heavily invested in their EDW
• Open questions: what if there is more than one EDW? What about EDW costs?
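As a loose analogy for EDW-centric federation (not how QueryGrid, FluidQuery or Smart Data Access actually work), SQLite's `ATTACH DATABASE` lets one connection, standing in for the EDW, query a second database within the same SQL statement. The sketch also shows the ETL alternative of copying the data into the EDW. All tables and data are hypothetical.

```python
import sqlite3

# The "EDW": the only connection the client ever opens.
edw = sqlite3.connect(":memory:")

# Stand-in for an external source reached through the EDW.
# Attached before any writes, since ATTACH should not run inside a transaction.
edw.execute("ATTACH DATABASE ':memory:' AS ods")

edw.execute("CREATE TABLE main.customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
edw.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
edw.execute("CREATE TABLE ods.orders (customer_id INTEGER, amount REAL)")
edw.executemany(
    "INSERT INTO ods.orders VALUES (?, ?)", [(1, 250.0), (1, 100.0), (2, 75.5)]
)

# Federated query: one statement issued to the EDW spans EDW and ODS tables.
federated = edw.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.customers AS c
    JOIN ods.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# ETL alternative: replicate the ODS data into the EDW, then query the copy.
edw.execute("CREATE TABLE main.orders_copy AS SELECT * FROM ods.orders")
replicated_rows = edw.execute("SELECT COUNT(*) FROM main.orders_copy").fetchone()[0]
```

In both variants every query passes through the EDW, which is what makes it the ‘center of the data universe’.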
Data Virtualization as the Data Integration Layer
• Moves the data integration and semantic layer to an independent Data Virtualization platform
  • Purpose-built for supporting data access across multiple heterogeneous data sources
• A separate layer provides semantic models for the underlying data
  • Physical-to-logical mapping
• Enforces common and consistent security and governance policies
• Gartner’s recommended approach
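A minimal sketch of the Data Virtualization pattern, assuming two independent SQLite databases as stand-ins for the EDW and ODS and a hypothetical `VirtualLayer` class (not any vendor's API). It shows physical-to-logical mapping via column aliases, a join performed in the layer rather than in either source, and one masking policy enforced for every consumer.

```python
import sqlite3


def make_edw() -> sqlite3.Connection:
    # Stand-in for the EDW, with its own physical naming conventions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_cust (cust_id INTEGER, cust_nm TEXT, email_addr TEXT)")
    conn.executemany(
        "INSERT INTO dim_cust VALUES (?, ?, ?)",
        [(1, "Acme", "ops@acme.example"), (2, "Globex", "it@globex.example")],
    )
    return conn


def make_ods() -> sqlite3.Connection:
    # Stand-in for the ODS: a separate database with different column names.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE open_orders (customer INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO open_orders VALUES (?, ?)", [(1, 250.0), (1, 100.0), (2, 75.5)]
    )
    return conn


class VirtualLayer:
    """Hypothetical virtualization layer: logical views over separate sources."""

    def __init__(self, sources: dict[str, sqlite3.Connection]):
        self.sources = sources

    def _fetch(self, source: str, sql: str) -> list[dict]:
        # Run a query against one physical source and return rows as dicts.
        cur = self.sources[source].execute(sql)
        names = [col[0] for col in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]

    def customer_view(self, role: str) -> list[dict]:
        # Physical-to-logical mapping: each source query renames its columns
        # to the shared logical names (customer_id, name, email, ...).
        customers = self._fetch(
            "edw",
            "SELECT cust_id AS customer_id, cust_nm AS name, email_addr AS email "
            "FROM dim_cust ORDER BY cust_id",
        )
        orders = self._fetch(
            "ods",
            "SELECT customer AS customer_id, SUM(amount) AS open_order_total "
            "FROM open_orders GROUP BY customer",
        )
        # Federation: the join happens in the layer, not in either source.
        totals = {o["customer_id"]: o["open_order_total"] for o in orders}
        rows = []
        for c in customers:
            row = {**c, "open_order_total": totals.get(c["customer_id"], 0.0)}
            # One security policy, applied for every consumer of the view.
            if role != "admin":
                row["email"] = "***"
            rows.append(row)
        return rows


layer = VirtualLayer({"edw": make_edw(), "ods": make_ods()})
analyst_rows = layer.customer_view("analyst")
admin_rows = layer.customer_view("admin")
```

Reports, dashboards and applications all call the same logical view, so a renamed physical column or a changed masking rule is handled once, in the layer.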
Logical Data Warehouse Revisited
Architecture of the Logical Data Warehouse
[Figure: the LDW architecture with Data Virtualization as the integration layer between the data stores (EDW, in-memory (SAP Hana, …), analytical appliances, cloud DW (Redshift, …), ODS, NoSQL, Hadoop) and the consumers (real-time decision management, alerts, scorecards and dashboards, reporting, self-service data discovery, search, predictive, statistical (R) and text analytics, data mining). Data Virtualization capabilities shown: data abstraction, data transformation, data federation, data caching, data services, data search & discovery, governance, security and optimization, with real-time (on-demand/streaming) and batch data access.]
Autodesk Data Architecture
Summary
1. The three trends will change your ‘information architecture’
2. The Logical Data Warehouse (LDW) is a key architectural pattern for addressing many of the challenges of the new information architecture
3. The LDW requires a data integration/semantic layer
4. Data Virtualization is the recommended approach for this critical layer
Thanks!
www.denodo.com
© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.