Gain Insights From Unstructured Data Using Pivotal HD

© Copyright 2013 EMC Corporation. All rights reserved.

Upload: votuyen

Post on 29-Apr-2018


Page 1

Gain Insights From Unstructured Data Using Pivotal HD

Page 2

Traditional Enterprise Analytics Process

Page 3

The Fundamental Paradigm Shift

Internet age and exploding data growth

Enterprises leverage new data sources to identify emerging trends and opportunities

Traditional database tools not able to cope

Page 4

Hadoop: Platform for Big Data

Flexible

Scalable

Inexpensive

Fault-tolerant

Rapidly Adopted

Gain Insights from Unstructured Data

Page 5

The Analytics Process with Hadoop

Page 6

Economics Have Changed the Game

[Chart: Big Data Platform Price/TB, 2008 to 2013. Y-axis from $0 to $80,000. Series: Big Data DB, Hadoop.]

Big Data RDBMS pricing will ultimately converge with Hadoop pricing

Page 7

Our Big Bets With Hadoop

1. HDFS becomes the data substrate for the next generation of data infrastructures

2. A set of integrated, enterprise-scale services will evolve on top of HDFS

3. Provisioning flexibility and elasticity become critical capabilities for this data infrastructure

Page 8

Pivotal and Hadoop

Page 9

Analytical Query Operational Intelligence

In-Memory DB

Run-Time Applications

In-Memory Objects

Enterprise Data Warehouse

RDBMS

Continues to serve as system of record

HDFS

Data Staging Platform

Data Mgmt. Services

Data Visualization

Compliance and financial reporting

Traditional BI/Reporting

Pivotal Data Fabric

Data Visualization

Stream Ingestion

Streaming Services

Page 10

Flexible Deployment Model

Deploy to: Public Cloud, On Premise, Private Cloud

Page 11

PIVOTAL HD: The World’s Most Powerful Hadoop Distribution

Page 12

What Is Pivotal HD?

World’s first true SQL processing for enterprise-ready Hadoop

100% Apache Hadoop-based platform

Virtualization and cloud ready with VMware and Isilon

Available as a software-only or appliance-based solution

Page 13

Pivotal Hadoop Distributions

100% Open Source Compatible

Current Release: Apache Hadoop 1.x

Upcoming Release: Apache Hadoop 2.x

Page 14

Pivotal HD Architecture: Apache

Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper)

Page 15

Pivotal HD Architecture: Enterprise

Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper)

Pivotal HD Enterprise: Command Center; Hadoop Virtualization (HVE); Data Loader

Page 16

Data Loader Architecture

Cloud Infrastructure Platform

Streams

Push

Pull

Connectors

Flume

HDFS

Data Loader

Data Source Registration

Copy Strategy

Optimization

Web GUI and CLI

Data Destination Registration

Data Copy

Job Management

Data Processing

REST APIs

Files

HDFS

NFS

HTTP

FTP

Local

Page 17

Cluster Management With Command Center

Configure

Monitor

Manage

Analyze

Deploy

Page 18

Pivotal HD Architecture: HAWQ

Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, ZooKeeper)

Pivotal HD Enterprise: Command Center; Data Loader; Hadoop Virtualization (HVE)

HAWQ Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics

Page 19

HAWQ: A True SQL Engine for Hadoop

Scale and Performance

Fault Tolerance

Transaction Support

Data Management and Analysis

Page 20

Leveraging Greenplum DB On Top of Hadoop

HAWQ

Query Engine Catalog Service

HDFS

Resource Management

GPXF

Planner Optimizer

Executor

Transaction Manager

Page 21

GPXF: Xtension Framework

Enable custom connector development for other data sources

HDFS, HBase, Hive

Xtension Framework

Page 22

How HAWQ Works: Submit Query

[Diagram: clients (JDBC/ODBC, SQL console) submit a query to the HAWQ master host (query parser, query optimizer), shown with the HDFS NameNode. HAWQ segment hosts, each with a query executor, are shown with HDFS DataNodes.]

SELECT beer, price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco'
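
The query on this slide is an ordinary two-table join. As a rough illustration of what it returns, the sketch below runs the same SQL text against an in-memory SQLite database from Python. SQLite stands in for HAWQ purely so the example is runnable; the table definitions and sample rows are invented, since the slide shows only the query.

```python
import sqlite3

# Assumption: column names are taken from the query; column types and all
# sample rows are made up for illustration. SQLite is a stand-in, not HAWQ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bars  (name TEXT, city TEXT);
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Bars  VALUES ('Bar A', 'San Francisco'), ('Bar B', 'New York');
    INSERT INTO Sells VALUES ('Bar A', 'Lager', 5.0),
                             ('Bar A', 'Stout', 6.5),
                             ('Bar B', 'Lager', 8.0);
""")

# The query text from the slide, reformatted across lines.
QUERY = """
    SELECT beer, price
    FROM Bars b, Sells s
    WHERE b.name = s.bar AND b.city = 'San Francisco'
"""

# Sort so the result does not depend on the engine's row order.
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```

Only the beers sold by bars located in San Francisco come back; the row for the New York bar is filtered out by the join and city predicate.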

Page 23

How HAWQ Works: Optimizer

[Diagram: same cluster layout as on page 22. Additional labels: Cost Model, Resources, Parse Tree, Metadata.]

Page 24

HAWQ Query Plan

[Diagram: same cluster layout as on page 22, with the query plan operators listed below.]

Scan Bars b

Filter b.city = 'San Francisco'

Motion Redist(b.name)

Scan Sells s

HashJoin b.name = s.bar

Project s.beer, s.price

Motion Gather
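
The plan operators on this slide can be read as a small distributed pipeline: filter Bars locally, redistribute the surviving rows by name, join them with Sells on each segment, project, then gather. The sketch below simulates that flow in plain Python under several assumptions that are not on the slide: three segments, invented sample rows, a CRC32-based `segment_for` placement function, and Sells already distributed by `bar`. It is a toy model of the idea, not HAWQ's executor.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 3  # assumption: three segments, as drawn in the diagram


def segment_for(key: str) -> int:
    """Hypothetical placement rule: hash the key to pick a segment."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


# Invented sample data.
BARS = [("Bar A", "San Francisco"), ("Bar B", "New York"), ("Bar C", "San Francisco")]
SELLS = [("Bar A", "Lager", 5.0), ("Bar B", "Lager", 8.0),
         ("Bar C", "Stout", 6.5), ("Bar A", "Porter", 7.0)]

# Bars starts out spread round-robin; Sells is assumed to be placed by bar name.
bars_on = [BARS[i::NUM_SEGMENTS] for i in range(NUM_SEGMENTS)]
sells_on = [[r for r in SELLS if segment_for(r[0]) == seg] for seg in range(NUM_SEGMENTS)]


def run_plan():
    # Scan Bars b, then Filter b.city = 'San Francisco', locally on each segment.
    filtered = [[b for b in rows if b[1] == "San Francisco"] for rows in bars_on]

    # Motion Redist(b.name): send each surviving row to the segment owning its name.
    redistributed = [[] for _ in range(NUM_SEGMENTS)]
    for rows in filtered:
        for b in rows:
            redistributed[segment_for(b[0])].append(b)

    # On each segment: Scan Sells s, HashJoin b.name = s.bar, Project s.beer, s.price.
    per_segment = []
    for seg in range(NUM_SEGMENTS):
        build = defaultdict(list)  # hash table keyed on b.name
        for b in redistributed[seg]:
            build[b[0]].append(b)
        joined = [(beer, price)
                  for (bar, beer, price) in sells_on[seg]
                  for _ in build.get(bar, [])]
        per_segment.append(joined)

    # Motion Gather: collect every segment's output in one place.
    return [row for rows in per_segment for row in rows]


result = sorted(run_plan())
print(result)
```

Because both join inputs end up placed by the same key, each segment can join its own rows without consulting the others, which is the point of the redistribute step.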

Page 25

Query Plan Sent To HAWQ Segments

[Diagram: same cluster layout as on page 22. The query plan is shown three times: Scan Bars b; Filter b.city = 'San Francisco'; Motion Redist(b.name); Scan Sells s; HashJoin b.name = s.bar; Project s.beer, s.price; Motion Gather.]

Page 26

HAWQ Leverages Dynamic Pipelining

[Diagram: same cluster layout as on page 22, labeled Dynamic Pipelining™.]

Page 27

Aggregate Data: Sent To The Master & Client

[Diagram: same cluster layout as on page 22.]

Page 28

HAWQ Deployment Model

Dynamic Pipelining

Master Servers & Name Nodes: Query planning & dispatch

Segment Servers & Data Nodes: Query processing & data storage

External Sources: Loading, streaming, etc.

HDFS

ODBC/JDBC Driver

Page 29

HAWQ Benchmarks

User intelligence    4.2      198    47X
Sales analysis       8.7      161    19X
Click analysis       2.0      415   208X
Data exploration     2.7    1,285   476X
BI drill down        2.8    1,815   648X
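
The transcript does not label the two numeric columns of the benchmark figures above. One plausible reading, treated here purely as an assumption, is that each row gives two run times for the same workload and that the "X" figure is their ratio. The short check below shows that the printed multipliers are consistent with that reading when the ratio is rounded to the nearest whole number.

```python
# (workload, first value, second value, multiplier printed on the slide)
# Assumption: the multiplier is second / first, rounded half up.
BENCHMARKS = [
    ("User intelligence", 4.2, 198, 47),
    ("Sales analysis", 8.7, 161, 19),
    ("Click analysis", 2.0, 415, 208),
    ("Data exploration", 2.7, 1285, 476),
    ("BI drill down", 2.8, 1815, 648),
]


def rounded_ratio(first: float, second: float) -> int:
    """Ratio of the two values, rounded half up to a whole number."""
    return int(second / first + 0.5)


ratios = {name: rounded_ratio(first, second) for name, first, second, _ in BENCHMARKS}

for name, first, second, printed in BENCHMARKS:
    status = "matches" if ratios[name] == printed else "differs from"
    print(f"{name}: {second} / {first} = {ratios[name]}X ({status} the slide's {printed}X)")
```

All five rows agree with the slide, which supports pairing the multipliers with the workloads in the order listed.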

Page 30

HAWQ: The Foundation of Big Data

Analytical Query Operational Intelligence

In-Memory DB

Run-Time Applications

In-Memory Objects

HDFS

Data Staging Platform

Data Mgmt. Services

Pivotal Data Fabric

Stream Ingestion

Streaming Services

Page 31