simplifying big data analytics: unifying batch and … · simplifying big data analytics: unifying...

20
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, VP Product In-Memory Compute Summit June 30, 2015

Upload: dotuyen

Post on 13-May-2018

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

Simplifying Big Data Analytics:Unifying Batch and Stream ProcessingJohn Fanelli, !VP Product!In-Memory Compute Summit!June 30, 2015!!

Page 2: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

S S S

B   B   B  

D   G GG D  D  D  

Streaming  Analy.cs  

General-­‐purpose  data  processing  cluster  

Scale-­‐up  Database  

Data  And  Compute  Grid  

Clustered  Database  

Page 3: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

DataTorrent enables enterprises to process data in motion and take action in real-time !

Trend: Batch and streaming use cases!

Faster Time to Insight!Faster Time to Action!

Page 4: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Data Processing Categories in Big Data Use Cases!

Known   Unknown  

Ques.ons  known?  

Data  

Velocity  

Sta.c  

Streaming  

Batch    Processing  

Stream    Processing  

Ad-­‐hoc    Query  

N/A  

Page 5: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Transactional Data!Web Click Stream!Mobile Devices!Operational Log Files!Public Data!Sensor Data!!

Data Sources!

Page 6: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Real-Time Advertising!Customer Service !Operational !Fraud Detection!Predictive Maintenance!

Customer Uses!

Page 7: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Processing Data In Motion!

Ingest !Archive !

Transform Normalize !

Analyze Business Logic!

Alert !Action !

Visualize !Persist !

Page 8: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Online advertising dynamic inventory purchases!

High volume auto-scaling fault tolerant event stream.

Dimensional computing to identify performing ads.!Ad Server 1!

Ad Server 800!

Real-time Dashboard!

Ad Placement Strategy!

Oracle DB!

Fault-Tolerant Flume!

In-memory analytic cube!

Campaign Analysis!

Page 9: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

SmartGrid Connected Home!

Smart Grid provider with many partners has heterogeneous network sources, provides analytics to utilities

& customers and provide ISV platform!

SmartGrid Sensors!

Home Sensor(s)!

Enrichment Data!

Consumer Energy Audit!

ISV Applications!

Operational Safety/Costs!

Normalization! Analytics! Alert on error!

Tableau!Visualizations!

Page 10: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Batch Data!

Customer Information!

Historical Sales!

Support Data!

Product Configuration!

Corporate Info!

Page 11: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Batch Processing Data!

Ingest !Archive !

Transform Normalize !

Visualize !Persist !

Analyze Business Logic!

Alert !Action !

Page 12: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

The Enterprise Data Processing Problem!

ETL!

Business Analytics!

ETL!

Complex Event/!Event Streaming!

BI & Analytics! Platform!

Ingest !Archive !

Transform Normalize !

Analyze Business Logic!

Alert !Action !

Visualize !Persist !

Page 13: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

The Enterprise Data Processing Problem!

ETL!

Business Analytics!

ETL!

Complex Event/!Event Streaming!

BI & Analytics! Platform!

Ingest !Archive !

Transform Normalize !

Analyze Business Logic!

Alert !Action !

Visualize !Persist !

Page 14: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

The Enterprise Data Processing Problem!

ETL!

Business Analytics!

ETL!

Complex Event/!Event Streaming!

BI & Analytics! Platform!

Ingest !Archive !

Transform Normalize !

Analyze Business Logic!

Alert !Action !

Visualize !Persist !

Transformation team - Parse, Dedup, Transform, Encrypt !

Transmission team - Credit, Debit, ACH over Secure FTP !

Distribution team - Hadoop, MPP, DB !

Reports team – !Dashboards & Alerts!

●  Separate applications for each step in the end to end process.!o  4 to 5 batch jobs to complete the process end to end!o  1 to 2 runs a day. So typical time to value is around 12 hours!

Page 15: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Financial services big data fabric!

Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application

processing!

Financial Data!

SMTP Logs!

Historical!

Application n!

Application 1!

Persistent!

Encrypt! Compliance! Alert on error!

Archive!

Page 16: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Satellite Television Provider!

Automated, faster time to insight, driving accurate payment,

auditing and business planning!

Audit Reporting!Rate Rules!

Package Data!

Subscription!

Payment System!

Join! Data Prep! Data Compliance!

Dated Archive!

Rate Rules!

Package Data!

Subscription!Business Projection!

Dated Archive!

Page 17: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

Ingestion & Distribution for Hadoop!        Graphical Application Assembly!        Real-Time Data Visualization!

Re-Usable Java Operator Library!

Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform!

Hadoop 2.0 —YARN + HDFS!

Physical Virtual Cloud!

Ingest !Archive !

Transform Normalize !

Analyze Business Logic!

Alert !Action !

Visualize !Persist !

Man

agem

ent &

Mon

itorin

g !

DataTorrent RTS Architectural Overview!

Re-Usable Java Operator Library!

Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform!

Page 18: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

© 2015 DataTorrent Confidential – Do Not Distribute

DataTorrent - Project Apex!•  Industry’s first open source enterprise-class unified stream and !

batch processing platform!•  DataTorrent RTS 3 Core engine!•  Key features requested by open source developer community!ᵒ  Event processing guarantees!ᵒ  In-memory performance & scalability!ᵒ  Fault tolerance and state management!ᵒ  Native rolling and static window support!ᵒ  Hadoop-native YARN & HDFS implementation!

•  Apache 2.0 License!•  Complemented by open source Malhar operator library!

https://www.datatorrent.com/product/project-apex/!

Page 19: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

DataTorrent enables enterprises to process data in motion and take action in real-time !

Trend: Batch and streaming use cases!

Faster Time to Insight!Faster Time to Action!

Page 20: Simplifying Big Data Analytics: Unifying Batch and … · Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, ! VP Product! In-Memory Compute Summit!

Big Data. Big Actions. Now.!