view, act, and react: shaping business activity with analytics, bigdata queries, and complex event...
DESCRIPTION
Sun Tzu said, “If you know your enemies and know yourself, you can win a hundred battles without a single loss.” Those words have never been truer than in our time. We are faced with an avalanche of data. Many believe the ability to process and gain insights from a vast array of available data will be the primary competitive advantage for organizations in the years to come. To make sense of data, you will have to face many challenges: how to collect, how to store, how to process, and how to react fast. Although you can build these systems from the bottom up, doing so is a significant undertaking. There are many technologies, both open source and proprietary, that you can put together to build your analytics solution, which will likely save you effort and provide a better result. In this session, Srinath will discuss WSO2’s middleware offering in BigData and explain how you can put the pieces together to build a solution that will make sense of your data. The session will cover technologies like Thrift for collecting data, Cassandra for storing data, Hadoop for analyzing data in batch mode, and complex event processing for analyzing data in real time.
TRANSCRIPT
View, Act, and React: Shaping Business Activity with Analytics,
BigData Queries, and Complex Event Processing
Srinath Perera, Director, Research
WSO2
Start
• In 1942, Asimov wrote a book called Foundation, in which the character Hari Seldon uses mathematical models to predict the future of civilization and then to save it.
• Paul Krugman (the Nobel laureate in economics) said his interest in economics began with Foundation.
• We are entering an era of our history where Mr. Asimov might have a point.
Image credit, CC licence: http://ansem315.deviantart.com/art/Asimov-Foundation-395188263
Consider a Day in Your Life
• What is the best road to take?
• Will there be any bad weather?
• What is the best way to invest the money?
• Should I take that loan?
• Is there a way to do this faster?
• What did others do in similar cases?
• Which product should I buy?
Bigdata Landscape
Big Data Architecture
Why is it hard?
• Systems are built of many computers (1000 nodes to store 1PB with 1TB each)
• They handle lots of data (10Gb network => 83 days to copy 1PB)
• They run complex logic (models can be as complex as the system)
• This pushes us to the frontier of distributed systems and databases
Image: http://www.flickr.com/photos/mariachily/5250487136, licensed CC
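The sizing figures on this slide come down to simple arithmetic, sketched here in Python. Decimal units (1 PB = 10^15 bytes) and an ideal, fully utilized link are assumptions, and the helper names are hypothetical. Under those assumptions a 10 Gb/s link moves 1 PB in roughly 9.3 days. The 83-day figure would correspond to an effective throughput near 1.1 Gb/s, and the slide does not state which assumptions it uses.

```python
# Back-of-the-envelope sizing; decimal units and ideal links are assumptions.
PB = 10**15  # bytes in a petabyte (decimal)
TB = 10**12  # bytes in a terabyte (decimal)


def nodes_needed(total_bytes: int, bytes_per_node: int) -> int:
    """Number of nodes required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // bytes_per_node)


def copy_days(total_bytes: int, link_bits_per_sec: float) -> float:
    """Days needed to copy total_bytes over a link running at full line rate."""
    seconds = total_bytes * 8 / link_bits_per_sec
    return seconds / 86400


def effective_gbps(total_bytes: int, days: float) -> float:
    """Throughput in Gb/s implied by copying total_bytes in the given days."""
    return total_bytes * 8 / (days * 86400) / 1e9


print(nodes_needed(PB, TB))            # nodes for 1 PB at 1 TB per node
print(round(copy_days(PB, 10e9), 2))   # days at an ideal 10 Gb/s
print(round(effective_gbps(PB, 83), 2))  # throughput implied by 83 days
```

Real transfers are usually slower than line rate because of protocol overhead, disk speed, and contention, so the ideal figure is a lower bound on copy time.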
Big Data Architecture with WSO2
Event Streams
• We view the world as event streams. An event stream is a series of events over time.
• We use SQL-like languages (Hive/CEP) to process event streams and create new event streams.
{'name':'PlayStream','version':'1.0.0', 'payloadData':[
'name':'sid', 'ts':'BIGINT','x':'DOUBLE',...
]}
Each stream has a name.
Each event has attributes, which have types.
Select from PlayStream[x>2500 and .. ] insert into NearGoalStream
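The idea of deriving one event stream from another can be sketched in plain Python. This is only an illustration, not the CEP engine: events are modeled as dictionaries, the function name `near_goal` is hypothetical, and only the `x > 2500` part of the filter is used because the rest of the condition is elided on the slide.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, object]  # attribute name -> value, as in a stream definition


def near_goal(play_stream: Iterable[Event], x_threshold: float = 2500) -> Iterator[Event]:
    """Yield the events of play_stream whose x attribute exceeds the threshold.

    The output is itself a stream that further queries could consume.
    """
    for event in play_stream:
        if event["x"] > x_threshold:
            yield event


# Made-up sample events for illustration only.
play_stream = [
    {"sid": "p1", "ts": 1, "x": 1000.0},
    {"sid": "p1", "ts": 2, "x": 2600.0},
    {"sid": "p2", "ts": 3, "x": 3000.0},
]
near_goal_stream = list(near_goal(play_stream))
print([e["ts"] for e in near_goal_stream])
```

Because the function is a generator, it processes events one at a time as they arrive, which matches the streaming model better than loading everything first.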
Demo Use Case (DEBS 2013)
• Football game; the players and the ball have sensors (DEBS Challenge 2013)
• Each event carries sid, ts, x, y, z, v, a
• Use cases: running analysis, ball possession and shots on goal, heatmap of activity
• Siddhi handled 100K+ events per second on each use case
• For this talk, we will look at user activity by region of the field.
Demo High-level Architecture
Data Collection
• Can receive events via SOAP, HTTP, JMS, ..
• WSO2 Events is a highly optimized version (400K events per second)
• Default agents are provided, and you can write custom agents.
Agent agent = new Agent(agentConfiguration);
publisher = new AsyncDataPublisher(
"tcp://localhost:7612", .. );
StreamDefinition definition = new
StreamDefinition(STREAM_NAME,VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
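The snippet above publishes through an asynchronous publisher. As a rough illustration of why that helps, the following Python sketch queues events and sends them from a background thread, so the caller never waits on the network. It is not the WSO2 agent implementation; the class `AsyncPublisher` and its `send` callback are hypothetical stand-ins.

```python
import queue
import threading


class AsyncPublisher:
    """Hypothetical sketch: publish() enqueues, a worker thread does the sending."""

    def __init__(self, send):
        self._send = send            # callable that actually delivers one event
        self._queue = queue.Queue()  # thread-safe buffer between caller and worker
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, stream, version, payload):
        """Return immediately; the event is delivered later by the worker."""
        self._queue.put((stream, version, payload))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:         # sentinel placed by close()
                break
            self._send(*item)

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._queue.put(None)
        self._worker.join()


received = []
publisher = AsyncPublisher(lambda s, v, p: received.append((s, v, p)))
publisher.publish("PlayStream", "1.0.0", {"sid": "p1"})
publisher.publish("PlayStream", "1.0.0", {"sid": "p2"})
publisher.close()
print(len(received))
```

A single worker preserves the order in which events were published, which matters when downstream queries depend on event order.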
Business Activity Monitor
BAM Hive Query
Find how much time is spent in each cell.
CREATE EXTERNAL TABLE IF NOT EXISTS PlayStream …
select sid,
ceiling((y+33000)*7/10000 + x/10000) as cell, count(sid)
from PlayStream
GROUP BY sid, ceiling((y+33000)*7/10000 + x/10000);
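To make the query's grouping concrete, here is a small Python sketch of the same cell formula and per-player count. It mirrors the expression from the query as written; the function names and the sample events are made up for illustration. Treating the event count as time spent assumes the sensors report at a fixed rate, which the slide does not state.

```python
import math
from collections import Counter


def cell_of(x: float, y: float) -> int:
    """Cell index, using the same expression as the Hive query."""
    return math.ceil((y + 33000) * 7 / 10000 + x / 10000)


def count_by_cell(events):
    """Count events per (sid, cell), like GROUP BY sid, cell with count(sid)."""
    counts = Counter()
    for e in events:
        counts[(e["sid"], cell_of(e["x"], e["y"]))] += 1
    return dict(counts)


# Made-up sample events for illustration only.
events = [
    {"sid": "p1", "x": 5000, "y": -23000},
    {"sid": "p1", "x": 5000, "y": -23000},
    {"sid": "p1", "x": 10000, "y": -33000},
    {"sid": "p2", "x": 0, "y": -33000},
]
print(count_by_cell(events))
```

In a batch job the same aggregation runs over the whole stored stream, which is why it suits Hive rather than the real-time engine.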
Complex Event Processor
CEP Query
define partition sidPrt by PlayStream.sid, LocBySecStream.sid
Calculate the mean location of each player every second:
from PlayStream#window.timeBatch(1sec) select sid, avg(x) as xMean, avg(y) as yMean, avg(z) as zMean insert into LocBySecStream partition by sidPrt
Detect a run of more than 10m:
from every e1 = LocBySecStream -> e2 = LocBySecStream [e1.yMean + 10000 < yMean or yMean + 10000 < e1.yMean] within 2sec select e1.sid insert into LongAdvStream partition by sidPrt ;
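The two steps described here, per-second mean location and detection of a run longer than 10 m, can be approximated in plain Python. This sketch is a simplification, not Siddhi's semantics: it assumes `ts` is in seconds and coordinates are in millimetres (so 10000 means 10 m), and it compares only consecutive per-second means for each player rather than every pair within the time window. All function names are hypothetical.

```python
from collections import defaultdict


def mean_location_per_second(events):
    """Average x, y, z per (sid, whole second), ordered by second then sid."""
    buckets = defaultdict(list)
    for e in events:
        buckets[(e["sid"], int(e["ts"]))].append(e)
    result = []
    for (sid, sec), batch in sorted(buckets.items(), key=lambda kv: (kv[0][1], kv[0][0])):
        n = len(batch)
        result.append({
            "sid": sid,
            "sec": sec,
            "xMean": sum(b["x"] for b in batch) / n,
            "yMean": sum(b["y"] for b in batch) / n,
            "zMean": sum(b["z"] for b in batch) / n,
        })
    return result


def long_advances(loc_by_sec, min_shift=10000, within=2):
    """Return sids whose yMean moved more than min_shift between two
    consecutive per-second means that are at most `within` seconds apart."""
    last_seen = {}
    hits = []
    for cur in loc_by_sec:
        prev = last_seen.get(cur["sid"])
        if (prev is not None
                and cur["sec"] - prev["sec"] <= within
                and abs(cur["yMean"] - prev["yMean"]) > min_shift):
            hits.append(cur["sid"])
        last_seen[cur["sid"]] = cur
    return hits


# Made-up sample events for illustration only.
events = [
    {"sid": "p1", "ts": 0.1, "x": 0, "y": 0, "z": 0},
    {"sid": "p1", "ts": 0.5, "x": 0, "y": 0, "z": 0},
    {"sid": "p2", "ts": 0.3, "x": 0, "y": 0, "z": 0},
    {"sid": "p1", "ts": 1.2, "x": 0, "y": 12000, "z": 0},
    {"sid": "p2", "ts": 1.4, "x": 0, "y": 5000, "z": 0},
]
print(long_advances(mean_location_per_second(events)))
```

Keeping state per `sid` plays the role of the partition in the query: each player's events are matched only against that player's earlier events.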
Run Demo
Visualization
Conclusion
Thank You