Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Upload: denodo

Post on 14-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Data Virtualization as the Enterprise Data Fabric

Data Ninja Webinar Series

Sessions covering data virtualization solutions for driving business value

Page 2: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Data Ninja Webinars

Five webinars over the next few months…

Page 3: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Speakers

Pablo Alvarez, Senior Engineer

Page 4: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Agenda

1. The Data Fabric
2. Evolution of the Data Fabric: A Historical Perspective
3. Benefits
4. Performance and Scalability
5. Going Beyond
6. Q&A

Page 5: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

In computing, a Fabric is a system of interconnected nodes that looks like a "weave" when viewed collectively from a distance.

In this context, a Data Fabric is a system that allows global access to all your data assets, and leverages storage and processing power from multiple heterogeneous nodes.

Page 6: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Data Virtualization as the Data Fabric

• Offers a common access point for consumers
• Allows specialized data stores to be used for what they are best at
• With other approaches, like Data Lakes, that are based on replication to a single large target system, this ability is lost

Data virtualization’s architecture relies on the underlying sources whenever possible. The result can be seen as a network of specialized processing and storage nodes that form the Data Fabric under the umbrella of a common virtual data model.

Page 7: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Successful Customer Use Cases

AGILE BUSINESS INTELLIGENCE: Replaced traditional BI with the Logical Data Warehouse that integrates multiple sources around a central EDW

360 VIEW APPLICATIONS: ‘Unified Desktop’ that provides integrated customer information

CLOUD INTEGRATION: Virtual layer to abstract access to SaaS applications and enable integration with the data center

DATA SERVICES: Services Layer (REST, OData) on top of Denodo’s data model with access to any data

Page 8: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Evolution of the Data Fabric: A Historical Perspective

Page 9: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

The Old Days: EDW Reporting

• Simple WYSIWYG reporting tools
• One-to-one reporting on top of a tailor-made Data Warehouse and Data Marts

Problems:

• Poor reusability
• Reports built on top of the Data Mart data model
• Excessive replication

[Diagram labels: Operational Data, Staging, EDW, Data Mart, SQL]

Page 10: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

The Dawn: Reporting with Semantic Layers

• More advanced reporting tools with a built-in semantic layer for easier use and better reusability
• One-to-one reporting on top of a tailor-made Data Warehouse

Problems:

• Limited to a single source
• Limited to a single reporting tool

[Diagram labels: Operational Data, Staging, EDW, SQL]

Page 11: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Reporting with Federation

• Reporting tools add a built-in federation engine that allows for multi-source reporting

Problems:

• Poor performance
• Limited cross-source security
• Limited to a single reporting tool

[Diagram labels: Operational Data, Staging, EDW, Other RDBMS, SQL]

Page 12: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Early Data Virtualization

• Data Virtualization as an independent semantic abstraction layer
• Reusable semantic model can be used by multiple reporting tools
• Engine specialized in federation (optimizer, caching, etc.)
• Integrated security

[Diagram labels: Operational Data, Staging, EDW, Other RDBMS, Other Sources, Integrated Security, Cache, SQL]

Page 13: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Mature Data Virtualization

[Diagram labels: Operational Data, EDW, Big Data, SaaS, Other Sources, Integrated Security, Cache, In-memory Fabric, Catalog & Data Exploration, Monitoring, Auditing, SQL, REST, OData]

Page 14: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Benefits

Page 15: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Benefits

Data Virtualization as the Enterprise Data Fabric

Abstracts access to disparate data sources

• Homogeneous data access regardless of back-end technology
• No need to deal with new languages and APIs: access to SFDC, Excel, Redshift, Oracle, Hadoop, other SaaS APIs, etc.

Acts as a single semantic repository

• Definition of a consistent business data model across all consumers and reporting tools
• Combination of data regardless of location and nature
• Avoids unnecessary replication

Page 16: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Benefits

Data Virtualization as the Enterprise Data Fabric

Centralized security layer

• Role-based authorization to all tables in the virtual layer
• Integration with AD/LDAP and Kerberos
• Security is moved outside the reporting layer to avoid security bypasses
• Centralized access point simplifies operations and auditing

Real-time fabric execution model

• Advanced optimizer designed specifically for virtualization
• Execution push-down to leverage source computing capabilities
• Data comes straight from the sources
• Cache layer to improve performance when needed

Page 17: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Performance & Scalability

Page 18: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

A mature virtualization engine like Denodo offers results comparable with single-source executions.

Let’s see how this is possible…

Page 19: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Performance: Denodo’s unique query optimizer

Denodo’s optimizer borrows many techniques from traditional RDBMSs:

• Cost-based query plans based on statistics and indexes
• Multiple JOIN methods
• Query rewriting to generate more optimal SQL

However, because a query executes in a distributed fashion across a processing fabric, Denodo has designed unique techniques to maximize performance in this environment:

• Dynamic rewriting focused on maximizing execution at the source and reducing network traffic
• Cost estimates also factor in:
    • Processing power of the sources (e.g. number of nodes in a Hadoop cluster)
    • Network and transfer rates
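To make the idea of a cost estimate that accounts for source processing power and transfer rates concrete, here is a minimal sketch. It is not Denodo's cost model: the `Plan` and `Source` types, the two-term formula, and every number are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One candidate plan against a single remote source (hypothetical model)."""
    name: str
    rows_scanned: int   # rows the source has to read
    rows_returned: int  # rows shipped over the network to the virtual layer


@dataclass(frozen=True)
class Source:
    """Assumed source characteristics; a real optimizer uses richer statistics."""
    scan_rows_per_sec: float      # stands in for the source's processing power
    transfer_rows_per_sec: float  # stands in for network and transfer rates


def estimated_seconds(plan: Plan, source: Source) -> float:
    """Estimated cost: time to process at the source plus time to move the result."""
    processing = plan.rows_scanned / source.scan_rows_per_sec
    transfer = plan.rows_returned / source.transfer_rows_per_sec
    return processing + transfer


def cheapest(plans: list[Plan], source: Source) -> Plan:
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=lambda p: estimated_seconds(p, source))


# Made-up numbers, chosen only to show how the two terms trade off.
source = Source(scan_rows_per_sec=500_000, transfer_rows_per_sec=100_000)
full_scan = Plan("full scan, aggregate in the virtual layer", 1_000_000, 1_000_000)
push_down = Plan("aggregate at the source", 1_000_000, 10_000)

best = cheapest([full_scan, push_down], source)
print(best.name, round(estimated_seconds(best, source), 2))
```

Under these assumed rates the transfer term dominates the full-scan plan, which is why a plan that returns fewer rows wins even though both plans scan the same data.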

Page 20: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Performance: DV Overhead, Direct vs. Denodo with a single source

TPC-DS benchmark tests using JDBC, with IBM Netezza as the data source over a 10 Gbps LAN. Results in seconds.

[Results chart not included in the transcript]

When queries hit only a single source, the data virtualization layer pushes the processing completely to the source with minimal overhead.

Note: since data needs to flow through the DV layer, the network between the sources and the DV layer should have enough bandwidth to avoid bottlenecks.

Page 21: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Performance

Benchmarks: Federating large data sets

Denodo has done extensive testing using queries from the standard benchmark TPC-DS*. The following scenario compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.

[Diagram: Customer Dim. (2 M rows), Sales Facts (290 M rows) and Items Dim. (400 K rows) federated through Denodo, vs. a single MPP system holding the same three tables]

* TPC-DS is the de facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.

Page 22: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Performance

Benchmarks: Federating large data sets

Query description | Returned rows | Netezza time | Denodo time (federated Oracle, Netezza & SQL Server) | Denodo optimization technique (automatically selected)
Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
Total sales by item where sale price is less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On-the-fly data movement

Execution times are comparable with single-source executions, based only on automatic optimizer decisions.
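As an illustration of what a partial aggregation push-down can look like for a query such as "total sales by item brand", here is a small sketch. It does not use Denodo: two in-memory SQLite databases stand in for two separate sources, and the table names, columns, and rows are invented. The aggregation is split in two: the sales source pre-aggregates by the join key, and the final grouping by brand happens after the join.

```python
import sqlite3
from collections import defaultdict

# Two in-memory databases act as two independent sources (assumption for the demo).
sales_db = sqlite3.connect(":memory:")
items_db = sqlite3.connect(":memory:")

sales_db.execute("CREATE TABLE sales (item_id INTEGER, amount REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 4.0), (1, 6.0), (2, 5.0), (3, 2.0), (3, 8.0)],
)
items_db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, brand TEXT)")
items_db.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(1, "acme"), (2, "acme"), (3, "zenith")],
)


def total_sales_by_brand():
    """Return (totals per brand, number of sales rows that left the sales source)."""
    # Step 1, pushed to the sales source: one subtotal per join key (item_id).
    partial = sales_db.execute(
        "SELECT item_id, SUM(amount) FROM sales GROUP BY item_id"
    ).fetchall()
    # Step 2: fetch the small dimension from the other source.
    brand_of = dict(items_db.execute("SELECT id, brand FROM item"))
    # Step 3, in the middle tier: join on item_id and finish the aggregation.
    totals = defaultdict(float)
    for item_id, subtotal in partial:
        if item_id in brand_of:
            totals[brand_of[item_id]] += subtotal
    return dict(totals), len(partial)


totals, rows_from_sales = total_sales_by_brand()
print(totals, rows_from_sales)
```

The sales source ships one row per item rather than one row per sale. The second aggregation step is still needed because several items can share a brand, which is what makes the push-down partial rather than full.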

Page 23: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Performance

Reporting tools are not optimized for federation across sources

    SELECT c.id, SUM(s.amount) AS total
    FROM customer c JOIN sales s
      ON c.id = s.customer_id
    GROUP BY c.id

System | Execution time | Data transferred | Optimization technique (automatically selected)
Denodo | 9 sec. | 4 M | Aggregation push-down
Tableau | 125 sec. | 292 M | None: full scan

[Diagram: two query plans. Without push-down, the Join reads 290 M rows from Sales and 2 M rows from Customer, and the Group By runs on top of the Join. With aggregation push-down, the Group By runs on Sales first, so the Join reads 2 M rows from each side.]
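The two plans for this query can be compared on toy data. The sketch below is an illustration under stated assumptions, not Denodo or Tableau behavior: two in-memory SQLite databases stand in for the Sales and Customer sources, the rows are invented, and `naive_plan` and `pushdown_plan` are hypothetical names. Pushing the GROUP BY below the join is valid here because `customer.id` is a key, so each sales row matches at most one customer.

```python
import sqlite3

# Two in-memory databases act as two independent sources (assumption for the demo).
sales_db = sqlite3.connect(":memory:")
customer_db = sqlite3.connect(":memory:")

sales_db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0), (3, 2.0), (3, 3.0)],
)
customer_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
customer_db.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])


def customer_ids():
    """Fetch the customer keys from the customer source."""
    return {row[0] for row in customer_db.execute("SELECT id FROM customer")}


def naive_plan():
    """Full scan: pull every sales row, then join and aggregate in the middle tier."""
    sales_rows = sales_db.execute("SELECT customer_id, amount FROM sales").fetchall()
    ids = customer_ids()
    totals = {}
    for customer_id, amount in sales_rows:
        if customer_id in ids:  # inner join on c.id = s.customer_id
            totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals, len(sales_rows)


def pushdown_plan():
    """Aggregation push-down: the sales source returns one row per customer."""
    grouped = sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
    ).fetchall()
    ids = customer_ids()
    totals = {cid: total for cid, total in grouped if cid in ids}
    return totals, len(grouped)


naive_totals, naive_transferred = naive_plan()
pushed_totals, pushed_transferred = pushdown_plan()
print(naive_totals == pushed_totals, naive_transferred, pushed_transferred)
```

Both plans produce the same totals; the difference is how many sales rows cross the network, which on the slide's data is 290 M versus 2 M.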

Page 24: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Scalability

• Denodo can be deployed in a cluster for HA and horizontal scaling
• “Shared-nothing” execution engine ensures linear scalability
• Based on the use of an external load balancer
• Supports auto-scaling for cloud deployments (like AWS)

[Diagram: a load balancer exposes a virtual server (SQL cluster at 192.168.0.10:9999, web container cluster at 192.168.0.10:9090) in front of four nodes, Denodo1 to Denodo4, each serving SQL on port 9999 and the web container on port 9090, alongside a shared cache server]

Page 25: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Going Beyond

Page 26: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Going Beyond

What’s cooking in the virtualization space

Holistic Operations Console

• Common operations web console to orchestrate monitoring, notifications, diagnosis, auditing, migration, license management, etc.

Web-based Self Service

• Advanced catalog enables a centralized “data marketplace”
• Keyword-based search
• Collaboration (tags, comments, requests for access, etc.)

Next-gen “Fabric” Execution Engine

• Tight integration with in-memory and data grids to move processing from the virtual layer to specialized execution engines

Page 27: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Q&A

Page 28: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Next Steps

Get Started!

• Download Denodo Express: www.denodoexpress.com
• Access Denodo Platform on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws

Denodo Platform 6.0 Whitepaper

• Download & Read: http://www.denodo.com/en/document/whitepaper/denodo-platform-60-whitepaper

Data Virtualization for Data Services

• Visit: http://www.denodo.com/en/solutions/horizontal-solutions/data-services

Page 29: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Data Ninja Webinar Series

Sessions covering data virtualization solutions for driving business value

Next Session: Realizing the Promise of Data Lakes

Thursday, December 15th, 2016

Page 30: Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Thanks!

www.denodo.com

© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.