Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise

Post on 24-Jan-2017

Journey To Becoming a Data-Driven Enterprise:

Pivotal Big Data Suite Technical Overview

Feras Alsamawi, Senior Field Engineer, Pivotal

2 © Copyright 2015 Pivotal. All rights reserved.

Agenda

•  Big Data Challenges

•  The Value of Data

•  Pivotal Big Data Suite

•  The Open Data Platform

•  Big Data Architectures

THE POWER OF 1

Increasing Freight Utilization Rail

Predictive Maintenance Healthcare

Predictive Diagnostics Power

Driving Outcomes That Matter

One Percent Improvement Equals

•  $27B Industry Value by Reducing System Inefficiency

•  $63B Industry Value by Reducing Process Inefficiency

•  $66B Industry Value with Efficiency Improvements In Gas-fired Power Plant Fleets

Source: General Electric

BIG DATA CHASM

•  70% of data generated by customers

•  80% of data stored

•  3% prepared for analysis

•  0.5% being analyzed

•  <0.5% being operationalized

THE DATA DIVIDE

Software Is Eating The World

Data Is Fueling Software

JOURNEY TO A DATA-DRIVEN ENTERPRISE

Deploy analytic apps and automate at scale

Perform advanced analytics

Discover insights

Modernize data infrastructure

The Value of Data

[Figure: value of information plotted against time; time axis marked µs, ms, s, hour, day, month, year, yr+]

The Value of Data

[Figure: value of information plotted against time; time axis marked µs, ms, s, hour, day, month, year, yr+; labeled region: Traditional Systems]

Pivotal’s λ Architecture

[Figure: value of information plotted against time; time axis marked µs, ms, s, hour, day, month, year, yr+; labeled regions: Speed Layer, Serving Layer, Batch Layer, Traditional Systems, “Big Data”]

Products shown: Spring XD, Pivotal GemFire, Pivotal HD
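
The three layers named on this slide can be illustrated with a minimal Python sketch. It is not Pivotal code: the event shape, the function names and the counting task are assumptions chosen only for illustration. A batch view is recomputed from the full history, a speed view covers only events that arrived after the last batch run, and the serving step merges the two at query time.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute counts from the full, immutable history."""
    return Counter(e["key"] for e in events)


def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(e["key"] for e in recent_events)


def serve(key, batch, speed):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch.get(key, 0) + speed.get(key, 0)


# Hypothetical data: history already stored in the data lake, plus fresh events.
history = [{"key": "sensor-a"}, {"key": "sensor-b"}, {"key": "sensor-a"}]
fresh = [{"key": "sensor-a"}]

batch = batch_view(history)
speed = speed_view(fresh)
total_a = serve("sensor-a", batch, speed)
total_b = serve("sensor-b", batch, speed)
print(total_a, total_b)
```

In this sketch the speed view stays small because it only has to cover the gap since the last batch run; once a new batch view is built, the speed view can be discarded and restarted.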

WHY PIVOTAL FOR BIG DATA?

Complete platform

SQL on Hadoop leadership

Deployment options

Open source

Flexible licensing

Advanced data services

Pivotal Data Engineering

Pivotal Labs

Pivotal Data Science

Big Data Suite Components

Pro-Active, Self-Improving, Machine Learning Systems

[Diagram: labeled components: Data Stream Pipeline, In-Memory Real-Time Data, Expert System / Machine Learning, HDFS Data Lake]

Multiple Data Sources

Real-Time Processing

Store Everything

Continuous Learning

Continuous Improvement

Continuous Adapting

Data Stream Needs an Agile, Scalable and Fast Solution

[Diagram: Ingest, Transform, Sink pipeline; labeled components: SpringXD, GemFire, HAWQ, GPDB, Data Lake]

Spring XD Orchestrates and Automates All the Steps in Data Stream Pipelining

[Diagram: Ingest, Transform, Sink pipeline; labeled components: SpringXD, Distributed Computing, In-Memory Real-Time Data, Expert System / Machine Learning, HAWQ, GPDB, Data Lake]

Extensible

Open-Source

Fault-Tolerant

Horizontally Scalable

Spring XD: State of the Art Data Pipeline Automation

Ingest / Sink

•  No coding required

•  Dozens of built-in connectors

•  Seamless integration with Kafka, Sqoop

•  Create new connectors easily using Spring

Process

•  Call Spark, Reactor or RxJava

•  Built-in configurable filtering, splitting and transformation

•  Out-of-box configurable jobs for batch processing

Analyze

•  Import and invoke PMML jobs easily

•  Call Python, R, MADlib and other tools

•  Built-in configurable counters and gauges
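
The splitting, filtering, transformation and counter steps listed on this slide can be pictured with a small Python sketch built from generators. This is not the Spring XD DSL or API; every function name, the message shape and the temperature threshold are assumptions made for illustration.

```python
import json


def source(lines):
    """Ingest: yield raw text records, for example from an HTTP or file source."""
    for line in lines:
        yield line


def split_json(records):
    """Split: each record holds a JSON array; emit one message per element."""
    for record in records:
        for item in json.loads(record):
            yield item


def counted(messages, counters, name):
    """Counter: count messages flowing past without changing them."""
    for message in messages:
        counters[name] = counters.get(name, 0) + 1
        yield message


def keep(messages, predicate):
    """Filter: pass through only messages that satisfy the predicate."""
    return (m for m in messages if predicate(m))


def transform(messages, fn):
    """Transform: apply fn to every message."""
    return (fn(m) for m in messages)


def sink(messages, store):
    """Sink: write every message to the target store (a list in this sketch)."""
    for message in messages:
        store.append(message)


# Hypothetical input: two records, each a JSON array of temperature readings.
raw = ['[{"temp": 20}, {"temp": 95}]', '[{"temp": 101}]']
counters = {}
store = []

stream = source(raw)
stream = split_json(stream)
stream = counted(stream, counters, "ingested")
stream = keep(stream, lambda m: m["temp"] > 90)
stream = transform(stream, lambda m: {**m, "alert": True})
sink(stream, store)
print(store, counters)
```

Because each stage is a generator, messages flow through one at a time, which loosely mirrors how a stream is assembled from independent, configurable modules.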

GemFire Provides Scalable, Low-Latency Data Access, Storage and Event Processing

[Diagram: Ingest, Transform, Sink pipeline; labeled components: SpringXD, Distributed Computing, GemFire, Expert System / Machine Learning, HAWQ, GPDB, Data Lake]

Extensible

Open-Source

Fault-Tolerant

Horizontally Scalable

GemFire

•  In-Memory Enterprise Data Grid

•  Horizontally Scalable, Consistent, Highly Available

•  Event handling

•  Continuous Queries

•  Enterprise Data Geo Distribution

In-memory Real Time Data
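
The idea behind the event handling and continuous queries listed on this slide is that a client registers interest once and is then notified whenever matching data changes, instead of polling. The Python sketch below illustrates only that idea. It is not the GemFire API: the `Region` class, `register_cq` and the trade records are hypothetical.

```python
class Region:
    """A toy in-memory key-value region that supports continuous queries."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, listener) pairs

    def register_cq(self, predicate, listener):
        """Register a query; listener fires for every later matching put."""
        self._queries.append((predicate, listener))

    def put(self, key, value):
        """Store the value, then notify listeners whose predicate matches."""
        self._data[key] = value
        for predicate, listener in self._queries:
            if predicate(value):
                listener(key, value)

    def get(self, key):
        return self._data.get(key)


alerts = []
trades = Region()

# Hypothetical continuous query: notify on any trade above 10,000.
trades.register_cq(lambda t: t["amount"] > 10_000,
                   lambda key, value: alerts.append(key))

trades.put("t1", {"amount": 500})
trades.put("t2", {"amount": 25_000})
print(alerts)
```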

Pivotal Provides SQL-Based Advanced Analytics

[Diagram: Ingest, Transform, Sink pipeline; labeled components: SpringXD, Distributed Computing, GemFire, HAWQ, GPDB, Data Lake]

Extensible

Open-Source

Fault-Tolerant

Horizontally Scalable

HAWQ

•  Massively Parallel Processing RDBMS on Hadoop

•  ANSI SQL on Hadoop

•  Extremely high performance for analytics (not like Hive)

•  Stores all data directly on HDFS

•  Functions in MADlib, R/Python/Java, Perl, pgsql or C languages

Advanced SQL analytics in Hadoop

Combining SQL with Hadoop is key for analytics

SQL remains #1 choice for Data Science
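
As a reminder of what SQL-based analytics looks like in practice, the sketch below runs a plain aggregate query from Python. It does not use HAWQ: the `sqlite3` module from the standard library is only a local stand-in so the example runs anywhere, and the `trades` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("fx", 100.0), ("fx", 300.0), ("rates", 50.0)],
)

# Standard SQL aggregate: number of trades and average amount per desk.
rows = conn.execute(
    "SELECT desk, COUNT(*) AS n, AVG(amount) AS avg_amount "
    "FROM trades GROUP BY desk ORDER BY desk"
).fetchall()
conn.close()
print(rows)
```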

Data Streaming Reference Architecture

[Diagram: labeled components: Data Feeds, Transactional Apps, Analytic Apps, Data Stream Pipeline, Distributed Computing, Real-Time Data, Expert Systems & Machine Learning, Advanced Analytics, HDFS Data Lake]

Data Streaming Reference Architecture

[Diagram: labeled components: Data Feeds, Transactional Apps, Analytic Apps, Data Stream Pipeline (SpringXD), GemFire, HAWQ, GPDB, HDFS Data Lake]

BUILT FOR THE SPEED OF BUSINESS

Financial Compliance

BUSINESS PROBLEM

•  Ensure compliance with Dodd-Frank and Basel Committee regulations

•  Identify underlying risk and fraud while reducing the compliance department’s overburdened

[Diagram: data sources (Emails, Chats, Trades, Transactions, Policy, Securities, Phone Calls, Watch Lists, …) and a Financial compliance Data Lake; stages: Data integration, Data clean up, Modeling, Classification and ranking, Analyst user interfaces, Feedback, Analytics]

Data integration: e.g., append trade information with email and chat communications

Data cleanup: e.g., identify newsletters and spam emails

Modeling:

•  Predictive modeling to flag messages and trades

•  Graph and cohort analysis

Analyst feedback: Reviewed fraud instances included in periodic model refreshes

SOLUTION

•  A data lake platform coupled with cutting edge data science techniques

•  Flexible user interface to promote an adaptive, continuously learning compliance framework

Pivotal Topic & Sentiment Analysis Engine

•  Twitter Decahose (~55 million tweets/day)

•  Spring XD (Source: http, Sink: hdfs)

•  HDFS

•  External Tables (PXF)

•  HAWQ: Parallel Parsing of JSON (PL/Python)

•  Nightly Cron Jobs: Topic Analysis through MADlib pLDA; Unsupervised Sentiment Analysis (PL/Python)

•  D3.js
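
Two of the steps named on this slide, parsing JSON in parallel and scoring sentiment without labeled training data, can be sketched in plain Python. This is an illustration under stated assumptions, not the engine shown here: the slide does not say which sentiment method is used, so a tiny word-list score stands in for it, the word lists and sample tweets are hypothetical, and a thread pool merely stands in for parallel execution inside the database.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sentiment word lists, for illustration only.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "bad", "broken"}


def parse(line):
    """Parse one raw JSON line; return the tweet text, or None if malformed."""
    try:
        return json.loads(line).get("text")
    except (json.JSONDecodeError, AttributeError):
        return None


def score(text):
    """Score a text as positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


raw_lines = [
    '{"text": "love the new release, great work"}',
    '{"text": "dashboard is slow and broken"}',
    "not json",
]

# Parse lines concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    texts = [t for t in pool.map(parse, raw_lines) if t is not None]

scores = [score(t) for t in texts]
print(scores)
```

Malformed lines are dropped rather than failing the whole batch, which matters for a feed of tens of millions of messages per day.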

IoT Architectures – The Connected Car

The Connected Car Architecture

•  INGESTION: JSON / HTTP

•  STREAM PROCESSING: Spring XD (Transform, Enrich, Sink)

•  DATA LAKE: Pivotal HD

•  ADVANCED ANALYTICS: HAWQ

•  REAL-TIME DATA INSIGHTS: GemFire

•  MOBILE SERVICES

•  MICROSERVICES: Pivotal CF (Dashboard, Analytics, App, Simulator)

•  IoT APPS

•  Rabbit MQ

•  PUSH

Horizontally Scalable

Fault Tolerant

Extensible

Open-Source

STREAM PROCESSING

Spring XD

Rabbit MQ

DATA LAKE

Pivotal HD

ADVANCED ANALYTICS

HAWQ

ENRICHER: + Timestamp & GUID

PREDICTIVE ANALYTICS: + MPG, range & route

MOBILE APP

JSON

REAL-TIME DATA INSIGHTS

GemFire

CAR SENSOR

Sink

Tap

DASHBOARD
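
The enrichment steps on this slide, adding a timestamp and GUID and then MPG and range, can be sketched as two small Python functions applied to a JSON sensor message. This is a hedged illustration only: the field names (`vin`, `miles_driven`, `gallons_used`, `gallons_remaining`) are hypothetical, and simple arithmetic stands in for whatever predictive model the real pipeline uses. Route prediction is not covered.

```python
import json
import uuid
from datetime import datetime, timezone


def enrich(raw_json):
    """Enricher step: attach a GUID and a UTC timestamp to a sensor reading."""
    msg = json.loads(raw_json)
    msg["guid"] = str(uuid.uuid4())
    msg["timestamp"] = datetime.now(timezone.utc).isoformat()
    return msg


def add_predictions(msg):
    """Stand-in for predictive analytics: derive MPG and remaining range."""
    mpg = msg["miles_driven"] / msg["gallons_used"]
    msg["mpg"] = round(mpg, 1)
    msg["range_miles"] = round(mpg * msg["gallons_remaining"], 1)
    return msg


# Hypothetical reading as it might arrive from the car sensor over HTTP.
reading = (
    '{"vin": "TEST123", "miles_driven": 120.0, '
    '"gallons_used": 4.0, "gallons_remaining": 6.0}'
)

enriched = add_predictions(enrich(reading))
print(enriched["mpg"], enriched["range_miles"])
```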

FOR FURTHER INFO…

•  Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

•  Pivotal Blog @ http://blog.pivotal.io

•  Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

•  Pivotal Academy @ https://pivotal.biglms.com

•  Or reach out to your local Pivotal Account Executive…

Digital Transformation Forum

Disrupt or Be Disrupted 19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH
