WSO2 Big Data Platform and Applications
Srinath Perera
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
What Can We Do with Big Data?
Optimize (the world is inefficient)
o 30% of food is wasted from farm to plate
o GE 1% initiative (http://goo.gl/eYC0QE)
- A 1% saving in trains can save 2B/year
- A 1% saving in US healthcare is 20B/year
- In contrast, Sri Lanka's total exports are 9B/year
Save lives
o Weather, disease identification, personalized treatment
Technology advancement
o Most high-tech research is done via simulations
Big Data Architecture
Big data Processing Technologies
WSO2 Analytics Platform
Big Data Analytics Offering
Combined Power
Users can send events to both BAM and CEP via the same APIs
CEP can combine output from batch processing and data from various storage (e.g. databases) with real-time processing
o e.g. implementing the Lambda Architecture
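To make the Lambda Architecture idea concrete, here is a minimal Python sketch, not WSO2 code. It assumes a batch layer that recomputes counts from stored history, a real-time layer that counts events arriving since the last batch run, and a serving step that merges the two. All names (`batch_view`, `SpeedLayer`, `serve`, the `key` field) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute per-key counts from the full stored history."""
    return Counter(event["key"] for event in historical_events)


class SpeedLayer:
    """Real-time layer: counts events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["key"]] += 1


def serve(key, batch, speed):
    """Serving step: merge the batch result with the real-time increments."""
    return batch[key] + speed.counts[key]


# Hypothetical data: three stored events, then one live event.
historical = [{"key": "serviceA"}, {"key": "serviceA"}, {"key": "serviceB"}]
batch = batch_view(historical)

speed = SpeedLayer()
speed.on_event({"key": "serviceA"})

total_a = serve("serviceA", batch, speed)  # 2 from batch + 1 from real-time
total_b = serve("serviceB", batch, speed)  # 1 from batch only
```

In the platform described here, the batch role would be played by BAM and the real-time role by CEP; the sketch only shows how the two results combine into one answer.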
Highly Pluggable Architecture
WSO2 CEP
WSO2 BAM
● Powered by Apache Hadoop with management and queries using Apache Hive
● Parallel, distributed processing based on the MapReduce programming model
● Runs on a local Hadoop node or can be delegated to a cluster of Hadoop nodes
● Scalable script-based analytics written using an easy-to-learn, SQL-like query language
Analyzer Engine
Hadoop Cluster
Data Store (Cassandra/RDBMS)
High Level Languages
For both batch and real-time, we provide structured, SQL-like query languages
o No Java programming is required
Lowers the adoption entry point
BAM
o Relies on Apache Hive
CEP
o Implemented through our own solution, Siddhi
CEP Operators with Siddhi
Event table: (map a database as an event stream)
Filter: (process a single transaction)
Windows: (track a window of events)
define stream RequestStream ( correlationID string, serviceID string, userID string, tier string, requestTime long, ... ) ;
define table BlacklistedUserTable ( userID string, time long, requestCount long ) ;
from RequestStream[tier == 'BRONZE']#window.time(1 min)
select userID, requestTime as time, count(correlationID) as requestCount
group by userID
having requestCount > 5
insert into BlacklistedUserTable ;
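The logic of the query above (filter, one-minute window, per-user count, threshold, write to a table) can be sketched in plain Python. This is an illustration of the operators, not how Siddhi is implemented. It assumes events arrive in timestamp order, treats the blacklist as a dictionary that keeps only the latest row per user, and uses the hypothetical names `BronzeThrottle` and `on_request`.

```python
from collections import defaultdict, deque

WINDOW_MS = 60_000  # one minute, mirroring #window.time(1 min)
THRESHOLD = 5       # mirroring "requestCount > 5"


class BronzeThrottle:
    """Sliding time window with a per-user count, assuming in-order events."""

    def __init__(self):
        self.window = deque()           # (request_time, user_id), oldest first
        self.counts = defaultdict(int)  # user_id -> requests currently in window
        self.blacklist = {}             # user_id -> (time, request_count)

    def on_request(self, user_id, tier, request_time):
        # Filter: only BRONZE-tier requests are considered.
        if tier != "BRONZE":
            return
        # Window: expire events older than one minute.
        while self.window and self.window[0][0] <= request_time - WINDOW_MS:
            _, old_user = self.window.popleft()
            self.counts[old_user] -= 1
        self.window.append((request_time, user_id))
        # Group by user and count.
        self.counts[user_id] += 1
        # Having + insert into table.
        if self.counts[user_id] > THRESHOLD:
            self.blacklist[user_id] = (request_time, self.counts[user_id])


throttle = BronzeThrottle()
# Six BRONZE requests within six seconds: the sixth one crosses the threshold.
for i in range(6):
    throttle.on_request("alice", "BRONZE", 1_000 * i)
# Requests from another tier are filtered out, however frequent.
for i in range(10):
    throttle.on_request("carol", "GOLD", 6_000 + i)
# BRONZE requests spaced 20 seconds apart never exceed five per minute.
for i in range(6):
    throttle.on_request("bob", "BRONZE", 10_000 + 20_000 * i)
```

The deque holds the window contents so that expiry costs only as much as the number of events leaving the window, which is the usual reason for keeping running counts instead of recounting on every event.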
Smart Home
DEBS (Distributed Event Based Systems) is a premier academic conference, which posts a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)
Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events
We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec in distributed throughput
The WSO2 CEP based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
Only generic solution to become a finalist
Healthcare Data Monitoring
Allows users to search, visualize, and analyze healthcare records (HL7) across 20 hospitals in Italy
Used in combination with WSO2 ESB and BAM
Custom toolbox tailored to the customer’s requirements (to replace an existing system)
Cloud IDE Analytics
Custom solution created in partnership with Codenvy to bring analytics to the Codenvy management team and its customers
Developed in less than a month, with a custom plug-in to MongoDB
Deployed in the codenvy.com platform
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
Additional Customer Use Cases
Used in healthcare and parking monitoring
o See “Solution patterns based approach to rapidly create IoE solutions across industries”, http://us14.wso2con.com/videos/#Coumara-Radja
Used by a large-scale IoT system provider for use cases including vehicle tracking, Smart City, and building monitoring (CEP)
o See “Internet of Big Things: The Story of Pacific Controls”, http://us14.wso2con.com/videos/#Sajaad-Chaudry
Transaction monitoring in a large bank (CEP)
Knowledge mining and tracking prospective customers through natural language data sources (CEP)
CEP embedded in edge devices
o See WSO2Con 2013 Keynote: “Emerging Foundations of Next-Generation Business Systems”, https://www.youtube.com/watch?v=7CyG3JKUxWw
Throttling and anomaly detection by a group of telecom companies
Extensions and Toolboxes
Fraud and Anomaly Detection Toolbox (static rules, statistical outliers, Markov chains)
Time Series Toolbox
Natural Language Processing Plugin (entity extraction, POS tagging, sentiment analysis)
GIS Toolbox (geo fencing, tracking, speed alarms)
Running machine learning models exported as PMML with CEP (e.g. from R)
Video monitoring with OpenCV
For more info, see http://wso2.com/library/articles/2014/08/wso2-cep-in-action-an-analysis-of-use-in-real-world-applications-of-different-domains/
Geo Fencing and Tracking Toolbox
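As a rough illustration of what geo fencing involves, the Python sketch below checks successive positions against a circular fence and reports boundary crossings. It is not the toolbox's implementation: the circular fence shape, the haversine distance, the function names, and the sample coordinates are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_fence(lat, lon, fence):
    """True when the point lies within the fence radius of the fence centre."""
    return haversine_m(lat, lon, fence["lat"], fence["lon"]) <= fence["radius_m"]


def fence_alerts(positions, fence):
    """Return (time, 'enter' | 'exit') for each crossing of the fence boundary."""
    alerts = []
    was_inside = None
    for time, lat, lon in positions:
        now_inside = inside_fence(lat, lon, fence)
        if was_inside is not None and now_inside != was_inside:
            alerts.append((time, "enter" if now_inside else "exit"))
        was_inside = now_inside
    return alerts


# Hypothetical fence: 1 km radius around an illustrative centre point.
fence = {"lat": 6.9271, "lon": 79.8612, "radius_m": 1_000}
# Hypothetical track: starts at the centre, moves about 2.5 km north, returns.
track = [(0, 6.9271, 79.8612), (1, 6.9500, 79.8612), (2, 6.9280, 79.8612)]
alerts = fence_alerts(track, fence)
```

A speed alarm could be built the same way by dividing the distance between consecutive positions by the elapsed time and comparing the result against a limit.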
IoT Demos and Use Cases
SolidCon Demo, http://wso2.com/library/articles/2014/09/demonstration-on-architecture-of-internet-of-things-an-analysis/
IoT Reference Architecture, http://wso2.com/landing/internet-of-things-uk-2014/
Internet of Big Things: The Story of Pacific Controls, http://us14.wso2con.com/videos/#Sajaad-Chaudry
Federated Identity for IoT with OAuth, http://www.infoq.com/presentations/federated-identity-IoT-OAuth
Sentiment Analysis Demo
Analyzing sentiment for the FIFA Twitter hashtag
Work in Progress
Predictive Analytics
Leveraging Apache Storm in CEP
BAM Enhancements
Work underway to switch BAM to Apache Spark, with support for Shark SQL-like queries
o Faster queries
o Keeping the SQL-like language
Use “Hive on Spark” for migration purposes
Lower the adoption point of BAM by packaging an RDBMS by default instead of Cassandra
o Architecture already scales from small deployments to Big Data
Questions?
Business Model