Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Journey To Becoming a Data-Driven Enterprise: Pivotal Big Data Suite Technical Overview. Feras Alsamawi, Senior Field Engineer, Pivotal

Upload: pivotal

Post on 24-Jan-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Journey To Becoming a Data-Driven Enterprise:

Pivotal Big Data Suite Technical Overview

Feras Alsamawi, Senior Field Engineer, Pivotal

Page 2: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

2 © Copyright 2015 Pivotal. All rights reserved.

Agenda

•  Big Data Challenges

•  The Value of Data

•  Pivotal Big Data Suite

•  The Open Data Platform

•  Big Data Architectures

Page 3: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

THE POWER OF 1

Increasing Freight Utilization Rail

Predictive Maintenance Healthcare

Predictive Diagnostics Power

Driving Outcomes That Matter

One Percent Improvement Equals

$27B Industry Value by Reducing System Inefficiency

$63B Industry Value by Reducing Process Inefficiency

$66B Industry Value with Efficiency Improvements in Gas-fired Power Plant Fleets

Source: General Electric

Page 4: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

BIG DATA CHASM

70% of data generated by customers

80% of data stored

3% prepared for analysis

0.5% being analyzed

<0.5% being operationalized

THE DATA DIVIDE

Page 5: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Software Is Eating The World

Data Is Fueling Software

Page 6: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

JOURNEY TO A DATA-DRIVEN ENTERPRISE

Deploy analytic apps and automate at scale

Perform advanced analytics

Discover insights

Modernize data infrastructure

Page 7: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

The Value of Data

[Figure: value of information plotted against time; time axis ticks: µs, ms, s, hour, day, month, year, yr+]

Page 8: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Traditional Systems

The Value of Data

[Figure: value of information plotted against time; time axis ticks: µs, ms, s, hour, day, month, year, yr+]

Page 9: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Speed Layer

Traditional Systems

Pivotal’s λ Architecture

Serving Layer “Big Data”

[Figure: value of information plotted against time; time axis ticks: µs, ms, s, hour, day, month, year, yr+]

Spring XD

Pivotal HD

Pivotal GemFire

Batch Layer
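
As a rough illustration of how the speed, batch and serving layers named on this slide relate, the following Python sketch answers a query by combining a precomputed batch view with a view over recent events. It is not taken from the deck: the event data, the cutoff, and all function names are hypothetical, and plain in-memory counters stand in for the products listed on the slide.

```python
from collections import Counter

# Hypothetical events: (timestamp_seconds, sensor_id). Not from the slides.
EVENTS = [(1, "a"), (2, "b"), (3, "a"), (11, "a"), (12, "c")]

BATCH_CUTOFF = 10  # assumed: the last batch run covered events with ts <= 10


def batch_view(events, cutoff):
    """Batch layer: recompute counts from all data up to the cutoff."""
    return Counter(sensor for ts, sensor in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: count only the recent events the batch run has not seen."""
    return Counter(sensor for ts, sensor in events if ts > cutoff)


def serve(sensor, batch, speed):
    """Serving layer: answer a query by combining both views."""
    return batch[sensor] + speed[sensor]


batch = batch_view(EVENTS, BATCH_CUTOFF)
speed = speed_view(EVENTS, BATCH_CUTOFF)
print(serve("a", batch, speed))  # 3
```

The batch view can be slow to rebuild but covers everything; the speed view is small and fresh, which is what lets the combined answer stay current between batch runs.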

Page 10: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

WHY PIVOTAL FOR BIG DATA?

Complete platform

SQL on Hadoop leadership

Deployment options

Open source

Flexible licensing

Advanced data services

Pivotal Data Engineering

Pivotal Labs

Pivotal Data Science

Page 11: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Big Data Suite Components

Page 12: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

HDFS Data Lake

Expert System / Machine Learning

In-Memory Real-Time Data

Continuous Learning

Continuous Improvement

Continuous Adapting

Data Stream Pipeline

Multiple Data Sources

Real-Time Processing

Store Everything

Pro-Active, Self-Improving, Machine Learning Systems

Page 13: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Ingest Transform Sink

Spring XD GemFire

Data Stream Needs an Agile, Scalable and Fast Solution

HAWQ GPDB

Data Lake

Page 14: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Ingest Transform Sink

Spring XD

Distributed Computing

In-Memory Real-Time Data

Spring XD Orchestrates and Automates All the Steps of the Data Stream Pipeline

Expert System / Machine Learning

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable

HAWQ GPDB

Data Lake

Page 15: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Ingest / Sink | Process | Analyze

•  No coding required

•  Dozens of built-in connectors

•  Seamless integration with Kafka, Sqoop

•  Create new connectors easily using Spring

•  Call Spark, Reactor or RxJava

•  Built-in configurable filtering, splitting and transformation

•  Out-of-box configurable jobs for batch processing

•  Import and invoke PMML jobs easily

•  Call Python, R, MADlib and other tools

•  Built-in configurable counters and gauges

Spring XD: State-of-the-Art Data Pipeline Automation
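
To make the ingest, filter, transform and sink stages and the counter idea above concrete, here is a minimal Python sketch of such a pipeline built from generators. It is only an illustration of the pattern and does not use or reproduce Spring XD's own DSL or API; the record fields (`id`, `speed`), the unit conversion, and every function name are assumptions made for the example.

```python
import json


def ingest(lines):
    """Source: parse each incoming JSON line into a dict."""
    for line in lines:
        yield json.loads(line)


def keep_valid(records):
    """Filter: drop records that have no 'speed' field."""
    return (r for r in records if "speed" in r)


def to_kmh(records):
    """Transform: convert an assumed mph reading to km/h."""
    for r in records:
        yield {**r, "speed_kmh": round(r["speed"] * 1.609344, 1)}


def run_pipeline(lines, sink):
    """Wire the stages together and count what reaches the sink."""
    count = 0
    for record in to_kmh(keep_valid(ingest(lines))):
        sink.append(record)
        count += 1
    return count


raw = ['{"id": 1, "speed": 60}', '{"id": 2}', '{"id": 3, "speed": 30}']
sink = []
delivered = run_pipeline(raw, sink)
print(delivered, [r["id"] for r in sink])  # 2 [1, 3]
```

Because each stage only consumes and yields records, stages can be added, removed or reordered without touching the others, which is the property a configurable pipeline tool relies on.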

Page 16: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Ingest Transform Sink

Spring XD

Distributed Computing

GemFire Provides Scalable, Low-Latency Data Access, Storage and Event Processing

Expert System / Machine Learning

GemFire

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable

HAWQ GPDB

Data Lake

Page 17: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

GemFire

•  In-Memory Enterprise Data Grid

•  Horizontally Scalable, Consistent, Highly Available

•  Event handling

•  Continuous Queries

•  Enterprise Data Geo Distribution

In-memory Real Time Data
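
The event handling and continuous query bullets above can be pictured with a small Python sketch: a query is registered once, and a callback fires whenever newly stored data matches it. The `Region` class below is a hypothetical toy written for this illustration and is not the GemFire API; the trade records and the threshold are likewise made up.

```python
class Region:
    """Toy in-memory key/value region with continuous-query callbacks."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, callback) pairs

    def register_cq(self, predicate, callback):
        """Register a query that fires on every future matching put."""
        self._queries.append((predicate, callback))

    def put(self, key, value):
        """Store the value, then notify every query whose predicate matches."""
        self._data[key] = value
        for predicate, callback in self._queries:
            if predicate(value):
                callback(key, value)

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)


alerts = []
trades = Region()
trades.register_cq(lambda t: t["amount"] > 1000,
                   lambda key, t: alerts.append(key))
trades.put("t1", {"amount": 500})
trades.put("t2", {"amount": 5000})
print(alerts)  # ['t2']
```

The point of the pattern is that clients are pushed matching changes as they happen instead of polling the store.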

Page 18: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Ingest Transform Sink

Spring XD

Distributed Computing

Pivotal Provides SQL Based Advanced Analytics

GemFire

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable

Data Lake

HAWQ GPDB

Page 19: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

HAWQ

•  Massively Parallel Processing RDBMS on Hadoop

•  ANSI SQL on Hadoop

•  Extremely high performance for analytics (not like Hive)

•  Stores all data directly on HDFS

•  Functions in MADlib, R/Python/Java, Perl, PL/pgSQL or C languages

Advanced SQL analytics in Hadoop

Combining SQL with Hadoop is key for analytics

SQL remains #1 choice for Data Science
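
The idea behind massively parallel processing can be sketched in a few lines of Python: rows are distributed across segments by key, each segment aggregates its own share, and the partial results are merged. This is a conceptual illustration only and says nothing about how HAWQ is implemented; the table contents, the segment count, and the function names are assumptions, and a thread pool stands in for separate segment hosts.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_segments):
    """Distribute rows across segments by customer_id."""
    segments = [[] for _ in range(n_segments)]
    for customer_id, amount in rows:
        segments[customer_id % n_segments].append((customer_id, amount))
    return segments


def local_sum(segment):
    """Each segment aggregates only its own rows."""
    totals = {}
    for customer_id, amount in segment:
        totals[customer_id] = totals.get(customer_id, 0) + amount
    return totals


def parallel_group_sum(rows, n_segments=3):
    """Roughly: SELECT customer_id, SUM(amount) ... GROUP BY customer_id."""
    segments = partition(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    merged = {}
    for partial in partials:
        merged.update(partial)  # keys never overlap across segments
    return merged


rows = [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]
totals = parallel_group_sum(rows)
print(totals[1], totals[2], totals[3])  # 17 6 2
```

Distributing by the grouping key means every customer's rows land on one segment, so the final merge needs no further arithmetic.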

Page 20: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Data Streaming Reference Architecture

Data Feeds

Transactional Apps

Analytic Apps

Data Stream Pipeline

Distributed Computing

Real-Time Data

Expert Systems & Machine Learning

Advanced Analytics

HDFS Data Lake

Page 21: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Data Streaming Reference Architecture

Data Feeds

Transactional Apps

Analytic Apps

Data Stream Pipeline

HDFS Data Lake

GemFire HAWQ GPDB

Spring XD

Page 22: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

BUILT FOR THE SPEED OF BUSINESS

Page 23: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Financial Compliance

BUSINESS PROBLEM

•  Ensure compliance with Dodd-Frank and Basel Committee regulations

•  Identify underlying risk and fraud while reducing the compliance department’s overburdened

Emails, Chats, Trades, Transactions, Policy, Securities, Phone Calls, Watch Lists, …

Financial compliance Data Lake

Data integration

Data clean up Modeling

Classification and ranking

Analyst user interfaces Feedback

Analytics

Analyst feedback

Data integration: e.g., append trade information with email and chat communications

Data cleanup: e.g., identify newsletters and spam emails

Modeling:

•  Predictive modeling to flag messages and trades

•  Graph and cohort analysis

Analyst feedback: Reviewed fraud instances included in periodic model refreshes

SOLUTION

•  A data lake platform coupled with cutting-edge data science techniques

•  Flexible user interface to promote an adaptive, continuously learning compliance framework

Page 24: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Pivotal Topic & Sentiment Analysis Engine

External Tables

PXF

HDFS

Source: http Sink: hdfs

Parallel Parsing of JSON (PL/Python)

HAWQ

Nightly Cron Jobs

Topic Analysis through MADlib pLDA

Unsupervised Sentiment Analysis (PL/Python)

D3.js

Spring XD

Twitter Decahose (~55 million tweets/day)
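
The JSON parsing and sentiment scoring steps of this pipeline might look something like the Python sketch below. The slide does not say which unsupervised method was used, so a plain word-list score is swapped in purely for illustration; the lexicon, the sample tweets, and the function names are all assumptions, and the code runs as ordinary Python rather than inside PL/Python.

```python
import json

# Assumed toy lexicon; a real system would use a far larger one.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "hate", "broken"}


def parse_tweet(line):
    """Extract the text field from one JSON-encoded tweet; None if unusable."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    return obj.get("text") if isinstance(obj, dict) else None


def sentiment(text):
    """Score = positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


lines = [
    '{"text": "Love the fast checkout"}',
    '{"text": "App is slow and broken"}',
    "not json",
]
scores = [sentiment(t) for t in map(parse_tweet, lines) if t is not None]
print(scores)  # [2, -2]
```

Each tweet is parsed and scored independently of the others, which is what makes the work easy to spread across many parallel workers.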

Page 25: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

IoT Architectures – The Connected Car

Page 26: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

The Connected Car Architecture

INGESTION

JSON / HTTP

STREAM PROCESSING

Spring XD Transform Enrich

DATA LAKE

Pivotal HD Sink

ADVANCED ANALYTICS

HAWQ

REAL-TIME DATA INSIGHTS

GemFire

MOBILE SERVICES

MICROSERVICES

Pivotal CF Dashboard Analytics App Simulator

IoT APPS

RabbitMQ

PUSH

Page 27: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Horizontally Scalable, Fault-Tolerant, Extensible, Open-Source

STREAM PROCESSING

Spring XD

RabbitMQ

DATA LAKE

Pivotal HD

ADVANCED ANALYTICS

HAWQ

ENRICHER PREDICTIVE ANALYTICS

+ Timestamp & GUID

+ MPG, range & route

MOBILE APP

JSON

REAL-TIME DATA INSIGHTS

GemFire

CAR SENSOR

Sink

Tap

DASHBOARD
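
The two enrichment steps on this slide (adding a timestamp and GUID, then adding MPG and range) can be sketched in Python as follows. The JSON field names (`vin`, `miles_driven`, `gallons_used`, `gallons_left`) are hypothetical, simple arithmetic stands in for the predictive analytics step, and route prediction is left out entirely.

```python
import json
import time
import uuid


def enrich(message, now=None):
    """Add a timestamp and a GUID to a raw JSON sensor message."""
    record = json.loads(message)
    record["timestamp"] = time.time() if now is None else now
    record["guid"] = str(uuid.uuid4())
    return record


def add_mpg_and_range(record):
    """Derive MPG and remaining range from the assumed fuel fields."""
    mpg = record["miles_driven"] / record["gallons_used"]
    return {**record, "mpg": mpg, "range_miles": mpg * record["gallons_left"]}


raw = ('{"vin": "TEST123", "miles_driven": 120.0, '
       '"gallons_used": 4.0, "gallons_left": 6.0}')
result = add_mpg_and_range(enrich(raw, now=0))
print(result["mpg"], result["range_miles"])  # 30.0 180.0
```

Stamping each message with its own identifier and time at the first step means every later stage (data lake, real-time store, dashboard) can refer to the same record unambiguously.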

Page 28: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

FOR FURTHER INFO…

•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

•  Pivotal Academy @ https://pivotal.biglms.com

•  Or reach out to your local Pivotal Account Executive…

Page 29: Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Digital Transformation Forum

Disrupt or Be Disrupted

19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH