simplifying big data analytics: unifying batch and … · simplifying big data analytics: unifying...

Click here to load reader

Post on 13-May-2018

216 views

Category:

Documents

1 download

Embed Size (px)

TRANSCRIPT

  • Simplifying Big Data Analytics:Unifying Batch and Stream ProcessingJohn Fanelli, !VP Product!In-Memory Compute Summit!June 30, 2015!!

  • 2015 DataTorrent Confidential Do Not Distribute

    S S S

    B B B

    D G GG D D D

    Streaming Analy.cs

    General-purpose data processing cluster

    Scale-up Database

    Data And Compute Grid

    Clustered Database

  • DataTorrent enables enterprises to process data in motion and take action in real-time !

    Trend: Batch and streaming use cases!

    Faster Time to Insight!Faster Time to Action!

  • 2015 DataTorrent Confidential Do Not Distribute

    Data Processing Categories in Big Data Use Cases!

    Known Unknown

    Ques.ons known?

    Data

    Velocity

    Sta.c

    Streaming

    Batch Processing

    Stream Processing

    Ad-hoc Query

    N/A

  • 2015 DataTorrent Confidential Do Not Distribute

    Transactional Data!Web Click Stream!Mobile Devices!Operational Log Files!Public Data!Sensor Data!!

    Data Sources!

  • 2015 DataTorrent Confidential Do Not Distribute

    Real-Time Advertising!Customer Service !Operational !Fraud Detection!Predictive Maintenance!

    Customer Uses!

  • 2015 DataTorrent Confidential Do Not Distribute

    Processing Data In Motion!

    Ingest !Archive !

    Transform Normalize !

    Analyze Business Logic!

    Alert !Action !

    Visualize !Persist !

  • 2015 DataTorrent Confidential Do Not Distribute

    Online advertising dynamic inventory purchases!

    High volume auto-scaling fault tolerant event stream.

    Dimensional computing to identify performing ads.!Ad Server 1!

    Ad Server 800!

    Real-time Dashboard!

    Ad Placement Strategy!

    Oracle DB!

    Fault-Tolerant Flume!

    In-memory analytic cube!

    Campaign Analysis!

  • 2015 DataTorrent Confidential Do Not Distribute

    SmartGrid Connected Home!

    Smart Grid provider with many partners has heterogeneous network sources, provides analytics to utilities

    & customers and provide ISV platform!

    SmartGrid Sensors!

    Home Sensor(s)!

    Enrichment Data!

    Consumer Energy Audit!

    ISV Applications!

    Operational Safety/Costs!

    Normalization! Analytics! Alert on error!

    Tableau!Visualizations!

  • 2015 DataTorrent Confidential Do Not Distribute

    Batch Data!

    Customer Information!

    Historical Sales!

    Support Data!

    Product Configuration!

    Corporate Info!

  • 2015 DataTorrent Confidential Do Not Distribute

    Batch Processing Data!

    Ingest !Archive !

    Transform Normalize !

    Visualize !Persist !

    Analyze Business Logic!

    Alert !Action !

  • 2015 DataTorrent Confidential Do Not Distribute

    The Enterprise Data Processing Problem!

    ETL!

    Business Analytics!

    ETL!

    Complex Event/!Event Streaming!

    BI & Analytics! Platform!

    Ingest !Archive !

    Transform Normalize !

    Analyze Business Logic!

    Alert !Action !

    Visualize !Persist !

  • 2015 DataTorrent Confidential Do Not Distribute

    The Enterprise Data Processing Problem!

    ETL!

    Business Analytics!

    ETL!

    Complex Event/!Event Streaming!

    BI & Analytics! Platform!

    Ingest !Archive !

    Transform Normalize !

    Analyze Business Logic!

    Alert !Action !

    Visualize !Persist !

  • 2015 DataTorrent Confidential Do Not Distribute

    The Enterprise Data Processing Problem!

    ETL!

    Business Analytics!

    ETL!

    Complex Event/!Event Streaming!

    BI & Analytics! Platform!

    Ingest !Archive !

    Transform Normalize !

    Analyze Business Logic!

    Alert !Action !

    Visualize !Persist !

    Transformation team - Parse, Dedup, Transform, Encrypt !

    Transmission team - Credit, Debit, ACH over Secure FTP !

    Distribution team - Hadoop, MPP, DB !

    Reports team !Dashboards & Alerts!

    Separate applications for each step in the end to end process.!o 4 to 5 batch jobs to complete the process end to end!o 1 to 2 runs a day. So typical time to value is around 12 hours!

  • 2015 DataTorrent Confidential Do Not Distribute

    Financial services big data fabric!

    Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application

    processing!

    Financial Data!

    SMTP Logs!

    Historical!

    Application n!

    Application 1!

    Persistent!

    Encrypt! Compliance! Alert on error!

    Archive!

  • 2015 DataTorrent Confidential Do Not Distribute

    Satellite Television Provider!

    Automated, faster time to insight, driving accurate payment,

    auditing and business planning!

    Audit Reporting!Rate Rules!

    Package Data!

    Subscription!

    Payment System!

    Join! Data Prep! Data Compliance!

    Dated Archive!

    Rate Rules!

    Package Data!

    Subscription!Business Projection!

    Dated Archive!

  • 2015 DataTorrent Confidential Do Not Distribute

    Ingestion & Distribution for Hadoop! Graphical Application Assembly! Real-Time Data Visualization!

    Re-Usable Java Operator Library!

    Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform!

    Hadoop 2.0 YARN + HDFS!

    Physical Virtual Cloud!

    Ingest !Archive !

    Transform Normalize !

    Analyze Business Logic!

    Alert !Action !

    Visualize !Persist !

    Man

    agem

    ent &

    Mon

    itorin

    g !

    DataTorrent RTS Architectural Overview!

    Re-Usable Java Operator Library!

    Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform!

  • 2015 DataTorrent Confidential Do Not Distribute

    DataTorrent - Project Apex! Industrys first open source enterprise-class unified stream and !

    batch processing platform! DataTorrent RTS 3 Core engine! Key features requested by open source developer community! Event processing guarantees! In-memory performance & scalability! Fault tolerance and state management! Native rolling and static window support! Hadoop-native YARN & HDFS implementation!

    Apache 2.0 License! Complemented by open source Malhar operator library!

    https://www.datatorrent.com/product/project-apex/!

  • DataTorrent enables enterprises to process data in motion and take action in real-time !

    Trend: Batch and streaming use cases!

    Faster Time to Insight!Faster Time to Action!

  • Big Data. Big Actions. Now.!