introduction to big data analytics: batch, real-time, and the best of both worlds

Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Worlds Srinath Perera Director, Research, WSO2 Inc. Visiting Faculty, University of Moratuwa Member, Apache Software Foundation Research Scientist, Lanka Software Foundation

Post on 14-Jul-2015


TRANSCRIPT

  • Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Worlds

    Srinath Perera

    Director, Research, WSO2 Inc.

    Visiting Faculty, University of Moratuwa

    Member, Apache Software Foundation

    Research Scientist, Lanka Software Foundation

  • What Can We Do with Big Data?

    Optimize (the world is inefficient)
    o 30% of food is wasted from farm to plate
    o GE 1% initiative (http://goo.gl/eYC0QE)
      - A 1% saving in trains can save 2B/year
      - A 1% saving in US healthcare is 20B/year
      - In contrast, Sri Lanka's total exports are 9B/year

    Save lives
    o Weather, disease identification, personalized treatment

    Technology advancement
    o Most high-tech research is done via simulations

  • Big Data Architecture

  • Big Data Processing Technologies

  • WSO2 Analytics Platform

  • Big Data Analytics Offering

  • Combined Power

    Users can send events to both BAM and CEP via the same APIs

    CEP can combine output from batch processing and data from various storage (e.g. databases) with real-time processing
    o e.g. implementing the Lambda Architecture
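The Lambda Architecture mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WSO2 BAM or CEP code: the function names, the `userID` event field, and the in-memory lists are all hypothetical stand-ins for a batch store and a live event stream.

```python
from collections import Counter

def batch_view(historical_events):
    """Batch layer (hypothetical): recompute totals over the full history."""
    return Counter(event["userID"] for event in historical_events)

def speed_view(recent_events):
    """Speed layer (hypothetical): count only events newer than the last batch run."""
    return Counter(event["userID"] for event in recent_events)

def serve(batch, speed, user_id):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[user_id] + speed[user_id]

# Illustrative data: two stored events for alice, one still in flight.
historical = [{"userID": "alice"}, {"userID": "bob"}, {"userID": "alice"}]
recent = [{"userID": "alice"}]

total_alice = serve(batch_view(historical), speed_view(recent), "alice")
print(total_alice)  # 3
```

The design point is that the batch view can be slow but complete, while the speed view only has to cover the gap since the last batch run.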

  • Highly Pluggable Architecture

  • WSO2 CEP

  • WSO2 BAM

    Powered by Apache Hadoop with management and queries using Apache Hive

    Parallel, distributed processing based on the MapReduce programming model

    Runs on a local Hadoop node or can be delegated to a cluster of Hadoop nodes

    Scalable script-based analytics written using an easy-to-learn, SQL-like query language.

    Diagram components: Analyzer Engine, Hadoop Cluster, Data Store (Cassandra/RDBMS)
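The MapReduce programming model named on this slide can be illustrated with a single-process Python sketch. It is not Hadoop or Hive code; the record layout and all function names are assumptions chosen for illustration. It counts requests per service, which is the kind of grouping an SQL-like `GROUP BY` query would be translated into.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    yield record["serviceID"], 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

# Hypothetical request records.
records = [
    {"serviceID": "orders"},
    {"serviceID": "billing"},
    {"serviceID": "orders"},
]
counts = run_job(records)
print(counts)  # {'orders': 2, 'billing': 1}
```

In a real cluster the map and reduce functions run in parallel on different nodes, and the shuffle step moves data across the network; here all three steps run in one process.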

  • High Level Languages

    For both batch and real-time, we provide structured, SQL-like query languages.
    o No Java programming is required
    o Lowers the adoption entry point

    BAM
    o Relies on Apache Hive

    CEP
    o Implemented through our own solution, Siddhi.

  • CEP Operators with Siddhi

    Event table: (Map a database as an event stream)

    Filter: (Process a single transaction)

    Windows: (Track a window of events)

    define stream RequestStream ( correlationID string, serviceID string, userID string, tier string, requestTime long, ... );

    define table BlacklistedUserTable (userID string, time long, requestCount long);

    from RequestStream[tier == 'BRONZE']#window.time(1 min)
    select userID, requestTime as time, count(correlationID) as requestCount
    group by userID
    having requestCount > 5
    insert into BlacklistedUserTable;
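To make the combination of filter, time window, and group-by in the query above concrete, here is a rough Python equivalent. It is a sketch under assumptions, not how Siddhi is implemented: the function name, the tuple layout, and the millisecond timestamps are illustrative, and events are expired only when a new event arrives, whereas a real CEP engine also expires them on a timer.

```python
from collections import defaultdict, deque

WINDOW_MS = 60_000   # one-minute sliding window, as in the query
THRESHOLD = 5        # blacklist when more than five requests fall in the window

def blacklist(events):
    """Return {user_id: (time, request_count)} for users who exceed the threshold.

    `events` is an iterable of (user_id, tier, request_time_ms) tuples,
    assumed to be ordered by request time.
    """
    windows = defaultdict(deque)   # per-user timestamps inside the window
    table = {}                     # stands in for BlacklistedUserTable
    for user_id, tier, ts in events:
        if tier != "BRONZE":       # filter: only BRONZE requests
            continue
        window = windows[user_id]
        window.append(ts)
        # Drop timestamps that are a full window or more older than this event.
        while window and window[0] <= ts - WINDOW_MS:
            window.popleft()
        if len(window) > THRESHOLD:
            table[user_id] = (ts, len(window))
    return table

# Illustrative data: u1 sends 6 BRONZE requests in 5 seconds, u2 spreads
# 6 requests over 100 seconds, and u3 is not on the BRONZE tier.
events = (
    [("u1", "BRONZE", t) for t in range(0, 6_000, 1_000)]
    + [("u2", "BRONZE", t) for t in range(0, 120_000, 20_000)]
    + [("u3", "GOLD", t) for t in range(0, 6_000, 1_000)]
)
events.sort(key=lambda e: e[2])
result = blacklist(events)
print(result)  # {'u1': (5000, 6)}
```

Only u1 is blacklisted: u2 never has more than three requests inside any one-minute window, and u3 is removed by the filter.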

  • Smart Home

    DEBS (Distributed Event Based Systems) is a premier academic conference, which posts a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)

    Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events

    We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec of distributed throughput.

    The WSO2 CEP-based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)

    Only generic solution to become a finalist

  • Healthcare Data Monitoring

    Allows users to search, visualize, and analyze healthcare records (HL7) across 20 hospitals in Italy

    Used in combination with WSO2 ESB and BAM

    Custom toolbox tailored to the customer's requirements (to replace an existing system)

  • Cloud IDE Analytics

    Custom solution created in partnership with Codenvy to bring analytics to Codenvy management team and its customers

    Developed in less than a month, with a custom plug-in to MongoDB.

    Deployed in the codenvy.com platform.

  • Case Study: Realtime Soccer Analysis

    Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM

  • Additional Customer Use Cases

    Used in healthcare and parking monitoring
    o See "Solution patterns based approach to rapidly create IoE solutions across industries", http://us14.wso2con.com/videos/#Coumara-Radja

    Used by a large-scale IoT system provider for use cases including vehicle tracking, smart city, and building monitoring (CEP)
    o See "Internet of Big Things: The Story of Pacific Controls", http://us14.wso2con.com/videos/#Sajaad-Chaudry

    Transaction monitoring in a large bank (CEP)

    Knowledge mining and tracking prospective customers through natural language data sources (CEP)

    CEP embedded in edge devices
    o See "WSO2Con 2013 - Keynote: Emerging Foundations of Next-Generation Business Systems", https://www.youtube.com/watch?v=7CyG3JKUxWw

    Throttling and anomaly detection by a group of telecom companies

  • Extensions and Toolboxes

    Fraud and Anomaly Detection Toolbox (static rules, statistical outliers, Markov chains)

    Time Series Toolbox

    Natural Language Processing Plugin (entity extraction, POS tagging, sentiment analysis)

    GIS Toolbox (geo fencing, tracking, speed alarms)

    Running machine learning models exported as PMML with CEP (e.g. from R)

    Video monitoring with OpenCV

    For more info, see http://wso2.com/library/articles/2014/08/wso2-cep-in-action-an-analysis-of-use-in-real-world-applications-of-different-domains/

  • Geo Fencing and Tracking Toolbox

  • IoT Demos and Use Cases

    SolidCon Demo, http://wso2.com/library/articles/2014/09/demonstration-on-architecture-of-internet-of-things-an-analysis/

    IoT Reference Architecture, http://wso2.com/landing/internet-of-things-uk-2014/

    Internet of Big Things: The Story of Pacific Controls, http://us14.wso2con.com/videos/#Sajaad-Chaudry

    Federated Identity for IoT with OAuth, http://www.infoq.com/presentations/federated-identity-IoT-OAuth

  • Sentiment Analysis Demo

    Analyzing sentiments for the FIFA Twitter hashtag

  • Work in Progress

  • Predictive Analytics

  • Leveraging Apache Storm in CEP

  • BAM Enhancements

    Work underway to switch to Apache Spark and Shark

    SQL-like query support in BAM
    o Faster queries
    o Keeping the SQL-like language

    Use Hive on Spark for migration purposes

    Lower the adoption point of BAM by packaging an RDBMS by default instead of Cassandra
    o Architecture already scales from small deployments to Big Data

  • Questions?

  • Business Model