imcsummit 2015 - day 2 general session - simplifying big data analytics: unifying batch and stream...

Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, VP Product In-Memory Compute Summit June 30, 2015

TRANSCRIPT

Simplifying Big Data Analytics: Unifying Batch and Stream Processing
John Fanelli, VP Product
In-Memory Compute Summit
June 30, 2015

© 2015 DataTorrent Confidential – Do Not Distribute

[Diagram: Streaming Analytics; General-purpose data processing cluster; Scale-up Database; Data and Compute Grid; Clustered Database]

DataTorrent enables enterprises to process data in motion and take action in real time.

Trend: Batch and streaming use cases

Faster Time to Insight
Faster Time to Action

Data Processing Categories in Big Data Use Cases

                      Questions known?
Data velocity    Known                Unknown
Static           Batch Processing     Ad-hoc Query
Streaming        Stream Processing    N/A

Data Sources

•  Transactional Data
•  Web Click Stream
•  Mobile Devices
•  Operational Log Files
•  Public Data
•  Sensor Data

Customer Uses

•  Real-Time Advertising
•  Customer Service
•  Operational
•  Fraud Detection
•  Predictive Maintenance

Processing Data In Motion

Ingest / Archive
Transform / Normalize
Analyze / Business Logic
Alert / Action
Visualize / Persist
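The stages on this slide (ingest and archive, transform and normalize, analyze, alert, persist) can be sketched as a chain of Python generators. This is only an illustration under stated assumptions, not DataTorrent's API: the function names, the "sensor,value" record format, and the threshold rule are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, Any]  # hypothetical event shape: sensor name, value, flag


def ingest(lines: Iterable[str], archive: List[str]) -> Iterator[str]:
    """Ingest raw records and keep an untouched copy in the archive."""
    for line in lines:
        archive.append(line)
        yield line


def transform(lines: Iterable[str]) -> Iterator[Event]:
    """Parse 'sensor,value' text and normalize the sensor name."""
    for line in lines:
        sensor, value = line.split(",")
        yield {"sensor": sensor.strip().lower(), "value": float(value)}


def analyze(events: Iterable[Event], limit: float) -> Iterator[Event]:
    """Business logic (assumed): flag readings above a threshold."""
    for event in events:
        yield {**event, "over_limit": event["value"] > limit}


def alert(events: Iterable[Event], notify: Callable[[Event], None]) -> Iterator[Event]:
    """Call notify for flagged events and pass every event downstream."""
    for event in events:
        if event["over_limit"]:
            notify(event)
        yield event


def run_pipeline(
    lines: Iterable[str], limit: float = 100.0
) -> Tuple[List[str], List[Event], List[Event]]:
    archive: List[str] = []
    alerts: List[Event] = []
    store: List[Event] = []
    flagged = analyze(transform(ingest(lines, archive)), limit)
    for event in alert(flagged, alerts.append):
        store.append(event)  # persist every processed event
    return archive, alerts, store
```

Because each stage consumes any iterable, passing a list models a batch run, while passing a generator that yields records as they arrive models a stream.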

Online advertising dynamic inventory purchases

High-volume, auto-scaling, fault-tolerant event stream. Dimensional computing to identify performing ads.

[Diagram: Ad Server 1 to Ad Server 800; Fault-Tolerant Flume; In-memory analytic cube; Oracle DB; Real-time Dashboard; Ad Placement Strategy; Campaign Analysis]
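One way to picture "dimensional computing" over an in-memory cube is to keep running counts for every combination of dimensions as ad events arrive. The sketch below is an assumption-laden illustration, not the customer's implementation: the dimension names, the event fields, and the click-through-rate metric are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions; the slide does not name the real ones.
DIMENSIONS = ("advertiser", "publisher", "region")


def new_cube():
    """Create an empty cube: one counter cell per dimension-value combination."""
    return defaultdict(lambda: {"impressions": 0, "clicks": 0})


def update_cube(cube, event):
    """Add one ad event to every dimension combination, including the grand total."""
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            key = tuple((d, event[d]) for d in dims)
            cell = cube[key]
            cell["impressions"] += 1
            cell["clicks"] += int(event["clicked"])


def click_through_rate(cube, **filters):
    """Return clicks / impressions for the given dimension values, or None if unseen."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    cell = cube.get(key)
    return cell["clicks"] / cell["impressions"] if cell else None
```

Each event updates 2^n cells for n dimensions, so a dashboard query such as `click_through_rate(cube, advertiser="a1")` becomes a single dictionary lookup.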

SmartGrid Connected Home

A Smart Grid provider with many partners has heterogeneous network sources, provides analytics to utilities and customers, and provides an ISV platform.

[Diagram: SmartGrid Sensors; Home Sensor(s); Enrichment Data; Normalization; Analytics; Alert on error; Consumer Energy Audit; ISV Applications; Operational Safety/Costs; Tableau Visualizations]

Batch Data

•  Customer Information
•  Historical Sales
•  Support Data
•  Product Configuration
•  Corporate Info

Batch Processing Data

Ingest / Archive
Transform / Normalize
Visualize / Persist
Analyze / Business Logic
Alert / Action

The Enterprise Data Processing Problem

[Diagram: ETL; Business Analytics; ETL; Complex Event/Event Streaming; BI & Analytics Platform]

Ingest / Archive
Transform / Normalize
Analyze / Business Logic
Alert / Action
Visualize / Persist

The Enterprise Data Processing Problem

Transformation team - Parse, Dedup, Transform, Encrypt
Transmission team - Credit, Debit, ACH over Secure FTP
Distribution team - Hadoop, MPP, DB
Reports team - Dashboards & Alerts

●  Separate applications for each step in the end-to-end process.
    o  4 to 5 batch jobs to complete the process end to end
    o  1 to 2 runs a day, so typical time to value is around 12 hours
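The 12-hour figure can be read as simple scheduling arithmetic. The sketch below is one possible reading, assuming evenly spaced runs and records that arrive at uniformly random times; it ignores how long the 4 to 5 jobs themselves take, and the function names are illustrative only.

```python
def hours_between_runs(runs_per_day: int) -> float:
    """Gap between evenly spaced batch runs; also the worst-case wait for a record."""
    return 24 / runs_per_day


def average_wait_hours(runs_per_day: int) -> float:
    """Mean wait until the next run for a record arriving at a uniformly random time."""
    return hours_between_runs(runs_per_day) / 2


for runs in (1, 2):
    print(runs, hours_between_runs(runs), average_wait_hours(runs))
```

Under these assumptions, two runs a day give a worst-case wait of 12 hours, and one run a day gives an average wait of 12 hours, both consistent with the slide's estimate.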

Financial services big data fabric

Secure, fault-tolerant data ingestion, formatting & archiving. Data access layer for application processing.

[Diagram: Financial Data; SMTP Logs; Historical; Encrypt; Compliance; Alert on error; Persistent; Archive; Application 1 to Application n]

Satellite Television Provider

Automated, faster time to insight, driving accurate payment, auditing and business planning.

[Diagram: Rate Rules; Package Data; Subscription; Payment System; Join; Data Prep; Data Compliance; Dated Archive; Audit Reporting; Business Projection]

DataTorrent RTS Architectural Overview

Ingestion & Distribution for Hadoop | Graphical Application Assembly | Real-Time Data Visualization
Re-Usable Java Operator Library
Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform
Hadoop 2.0: YARN + HDFS
Physical | Virtual | Cloud
Management & Monitoring

Ingest / Archive
Transform / Normalize
Analyze / Business Logic
Alert / Action
Visualize / Persist

DataTorrent - Project Apex

•  Industry’s first open source enterprise-class unified stream and batch processing platform
•  DataTorrent RTS 3 Core engine
•  Key features requested by open source developer community
    o  Event processing guarantees
    o  In-memory performance & scalability
    o  Fault tolerance and state management
    o  Native rolling and static window support
    o  Hadoop-native YARN & HDFS implementation
•  Apache 2.0 License
•  Complemented by open source Malhar operator library

https://www.datatorrent.com/product/project-apex/
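To illustrate the windowing feature listed above, the sketch below shows one common reading of the two terms: a "static" window as a fixed, non-overlapping group of events, and a "rolling" window as the most recent N events. This is plain Python for illustration, not the Apex API, and the function names and count-based window sizes are assumptions.

```python
from collections import deque
from typing import Iterable, Iterator, List


def static_windows(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping windows of `size` values."""
    window: List[float] = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield window
            window = []
    if window:  # emit the final partial window of a finite (batch) input
        yield window


def rolling_windows(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield the most recent `size` values each time a new value arrives."""
    window: deque = deque(maxlen=size)  # oldest value drops out automatically
    for value in values:
        window.append(value)
        if len(window) == size:
            yield list(window)
```

Both functions accept any iterable, so the same windowing logic applies to a finite batch and to an unbounded stream of events.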

DataTorrent enables enterprises to process data in motion and take action in real time.

Trend: Batch and streaming use cases

Faster Time to Insight
Faster Time to Action

Big Data. Big Actions. Now.