Real Time Analytics – Big Data
Case Study
Agenda
Big Data
Real Time Analytics
Why is it needed?
Case Study – Telecom Industry
2 Impetus Confidential
Big Data & Hadoop
What is Big Data?
Three dimensions of Big Data
• Volume
o Collecting terabytes of information or more
• Velocity
o Analyzing millions of trade events generated per day
• Variety
o Structured or unstructured data like text, sensor data, click
streams, audio, video and log files
Big Data
Data is key to business; it can be used for:
• User behavior analysis
• Ad targeting
• Trending topics
• Recommendations
How?
• Hadoop is the de facto standard for batch-processing data analytics
o Provides a parallel computation framework (MapReduce)
o Redundant, fault-tolerant data storage
o Designed to store data reliably on commodity machines
o Designed with hardware failures in mind
• Based on Google’s GFS and MapReduce designs
• Real-time analytics? No
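The MapReduce model mentioned above can be illustrated with a minimal single-process sketch. This is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, map_reduce) are hypothetical and only show the map, group-by-key and reduce steps that a real cluster would run in parallel across machines.

```python
from collections import defaultdict

def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run the three steps in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

word_counts = map_reduce(["big data", "Big Data analytics"])
```

Because the whole input must be available before the reduce step can finish, this style suits batch jobs rather than event-by-event processing.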
Real Time Analytics
What is Real Time Analytics?
What is it?
• Real-time analytics is a process of delivering information
about events as they occur
Some Examples
• Financial Industry - Fraud Detection, Trading
• E-commerce - Recommendations
• Telecom Industry - Machine to Machine communication
• Supply Chain Management
• Business Activity Monitoring
Why is it needed?
Time is money
• Intra-day risk analysis in real time could translate into
increased profits
Helps organizations stay ahead of the competition
• E-commerce – presenting information based on what a user is
browsing or interested in could improve sales and the user
experience
• Content creators could produce more relevant, higher-quality content
Case Study –
Telecommunication Industry
The Company, Challenge & Benefits
Company
• Telecom firm providing a wireless
network service designed to deliver
machine-to-machine communications
to millions of devices.
Challenge
• Design a near-real-time solution
for predicting patterns from
data generated by machine-to-machine
(M2M) communication
and sent over the wireless network.
• The solution should support
adding new near-real-time streams
with minimal change.
• Enable customers to get real-time
alerts for business-critical
situations
Benefits
• Enabled customers to react to their critical business needs in real time.
• Improved Customer Experience.
• Reduced operating cost.
Examples
Machine to Machine Communication
• Vineyards watering
o Spread over huge area
o Critical to maintain water level threshold
• Vehicle Tracking & Geo-fencing
o Mark the permitted radius of vehicle movement (for example, in valet parking)
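The geo-fencing example can be made concrete with a small sketch. It assumes each vehicle reports a latitude/longitude pair in degrees and that the fence is a circle around a fixed centre; the function names and the kilometre unit are illustrative assumptions, not details from the case study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def outside_fence(center, radius_km, position):
    """True when the reported position lies outside the permitted radius."""
    distance = haversine_km(center[0], center[1], position[0], position[1])
    return distance > radius_km

# Hypothetical usage: a 1 km fence around a parking location.
parking = (12.9716, 77.5946)
alert = outside_fence(parking, 1.0, (12.9716, 77.7000))
```

In a streaming setup, a check like this would run once per incoming position event and raise an alert on the first violation.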
Incoming Data Attributes
Continuous input streams
• Events as they happen
High data volume
• 1,000 to 100,000 events per second
Varied sources
• Data coming from multiple sources
Expected Goals
Identify patterns
• Devices sending incorrect or duplicate data
Reliability
• Events are processed as they happen
• Events are not missed in case of failure
Scalability
• Should be able to support increase in volume
Capability to Add more Queries
• Should be able to add more queries for a particular type of
incoming stream
Notification / Alerts System
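As an illustration of the first goal, spotting devices that send duplicate data, the sketch below flags any event whose device and sequence number have already been seen. The event fields device_id and seq are assumptions made for the example; the source does not describe the actual event schema.

```python
def find_duplicates(events):
    """Yield every event whose (device_id, seq) pair has already been seen.

    `device_id` and `seq` are hypothetical field names used for illustration.
    """
    seen = set()
    for event in events:
        key = (event["device_id"], event["seq"])
        if key in seen:
            yield event
        else:
            seen.add(key)

events = [
    {"device_id": "d1", "seq": 1, "value": 20.5},
    {"device_id": "d2", "seq": 1, "value": 18.0},
    {"device_id": "d1", "seq": 1, "value": 20.5},  # repeat of the first event
]
duplicates = list(find_duplicates(events))
```

On an unbounded stream the set of seen keys would need to be limited, for example to a recent time window, so that memory does not grow without bound.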
Technology Stack – What is needed?
Event Processing capability
• Esper
o Processing engine for data streams
o SQL-Like Support – run queries on data stream
o Sliding windows (time or length)
o Pattern Matching
o Executes large number of queries simultaneously
Technology Stack – Esper
Esper - Simple steps to get started
• Get an Esper instance
• Create a statement (Esper Query Language)
• Register the statement with the Esper engine
• Create a listener
• Attach listener to the statement
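The five steps above follow a register-and-callback pattern, which the toy sketch below mimics. It is deliberately not Esper's API: ToyEngine, ToyStatement, create_statement, add_listener and send_event are hypothetical names, and a plain predicate function stands in for a statement written in the Esper query language.

```python
class ToyStatement:
    """A registered query: a predicate plus the listeners to notify on a match."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

class ToyEngine:
    """Holds registered statements and evaluates each incoming event against them."""
    def __init__(self):
        self.statements = []

    def create_statement(self, predicate):
        statement = ToyStatement(predicate)
        self.statements.append(statement)  # creating also registers it
        return statement

    def send_event(self, event):
        for statement in self.statements:
            if statement.predicate(event):
                for listener in statement.listeners:
                    listener(event)

engine = ToyEngine()                                              # get an engine instance
statement = engine.create_statement(lambda e: e["price"] > 100)   # create and register
alerts = []
listener = alerts.append                                          # create a listener
statement.add_listener(listener)                                  # attach it

engine.send_event({"symbol": "ABC", "price": 120})
engine.send_event({"symbol": "ABC", "price": 80})
```

Only the first event satisfies the predicate, so only it reaches the listener.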
Technology Stack – Esper
Esper – Sample Queries
Time based window
select avg(price) from StockTickEvent.win:time(30 sec)
Length based window
select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol
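To show roughly what the two window types compute, here is a plain-Python sketch of a time-based and a length-based sliding average. It is not how Esper implements windows; the class names are hypothetical, timestamps are assumed to be in seconds and to arrive in order, and the expiry rule at the exact window boundary is an assumption.

```python
from collections import defaultdict, deque

class TimeWindowAverage:
    """Average of the values received within the last `span` seconds."""
    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value), oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        self.total += value
        # Drop events that have aged out of the window.
        while self.items[0][0] <= timestamp - self.span:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)

class LengthWindowGroupAverage:
    """Per-symbol average over the last `length` events of the whole stream."""
    def __init__(self, length):
        self.length = length
        self.items = deque()  # (symbol, price), oldest first
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def add(self, symbol, price):
        self.items.append((symbol, price))
        self.sums[symbol] += price
        self.counts[symbol] += 1
        if len(self.items) > self.length:
            # The oldest event leaves the window, whatever its symbol.
            old_symbol, old_price = self.items.popleft()
            self.sums[old_symbol] -= old_price
            self.counts[old_symbol] -= 1
        return self.sums[symbol] / self.counts[symbol]

time_window = TimeWindowAverage(30)
length_window = LengthWindowGroupAverage(100)
```

Both classes keep running totals so that each new event is handled in constant amortized time instead of rescanning the window.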
Technology Stack - Storm
Data Carrier for Esper
• Storm
o Facilitates data transfer
o Continuous Computation
o Distributed, Fault tolerant
o Scalable, No Data Loss
o Provides parallelism
o Acking & Replay capability
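The acking and replay idea can be sketched without Storm itself. In this simplified model a source keeps every emitted message pending until it is acknowledged and re-queues it on failure. ReplayingSource, run and flaky_handler are hypothetical names, and the single-threaded loop only illustrates the principle, not Storm's actual tuple-tracking mechanism.

```python
class ReplayingSource:
    """Keeps each emitted message pending until acked; failed ones are re-queued."""
    def __init__(self, messages):
        self.queue = list(enumerate(messages))  # (message id, payload)
        self.pending = {}

    def next_message(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.pop(0)
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Put the message back at the end of the queue so it is replayed later.
        self.queue.append((msg_id, self.pending.pop(msg_id)))

def run(source, handler):
    """Process messages until the source is drained, replaying any failures."""
    processed = []
    while True:
        item = source.next_message()
        if item is None:
            break
        msg_id, payload = item
        try:
            processed.append(handler(payload))
            source.ack(msg_id)
        except RuntimeError:
            source.fail(msg_id)
    return processed

remaining_failures = {"b": 1}  # "b" fails once, then succeeds

def flaky_handler(payload):
    if remaining_failures.get(payload, 0) > 0:
        remaining_failures[payload] -= 1
        raise RuntimeError("transient failure")
    return payload.upper()

source = ReplayingSource(["a", "b", "c"])
result = run(source, flaky_handler)
```

No message is lost, but a replayed message is processed later than its neighbours, so ordering is not preserved in this sketch.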
Technology Stack - Storm
Basic concept of Storm
• Streams, Spouts & Bolts
• A stream is an unbounded sequence
of tuples
• Spouts are data emitters that
retrieve data from outside the
Storm cluster
• Bolts are data processors that
receive one or more streams and
(potentially) emit one or more new streams
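The stream, spout and bolt roles can be mimicked with Python generators. This is only an analogy for the data flow, not Storm's API: reading_spout, filter_bolt and count_bolt are hypothetical names, and the "device_id,value" input format is an assumption.

```python
def reading_spout(raw_lines):
    """Spout: turns data from an external source into a stream of tuples."""
    for line in raw_lines:
        device_id, value = line.split(",")
        yield (device_id, float(value))

def filter_bolt(stream, minimum):
    """Bolt: consumes one stream and emits only readings at or above `minimum`."""
    for device_id, value in stream:
        if value >= minimum:
            yield (device_id, value)

def count_bolt(stream):
    """Bolt: consumes one stream and emits a running count per device."""
    counts = {}
    for device_id, _ in stream:
        counts[device_id] = counts.get(device_id, 0) + 1
        yield (device_id, counts[device_id])

# Wire the pieces together: spout -> filter bolt -> count bolt.
lines = ["d1,5", "d2,1", "d1,7"]
output = list(count_bolt(filter_bolt(reading_spout(lines), 2)))
```

Each stage pulls tuples lazily from the previous one, so the chain handles one tuple at a time rather than materializing the whole stream.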
Technology Stack - Storm
Storm Cluster
• Topology - A graph of spouts and bolts
that are connected with stream groupings
• Master Node – Runs a daemon called
Nimbus
o Distributes code across the cluster
o Assigns tasks to machines
o Monitors for failures
• Worker Node – Runs a daemon called
Supervisor
o Listens for assigned work
o Starts and stops worker processes
o Executes a subset of a topology
• Coordination between Nimbus and
the Supervisors is done through ZooKeeper
Technology Stack - Flume
Log Data Collection
• Flume
o Stream oriented data flow
o Log streaming from various sources
o Collect, aggregate & move data to centralized data
store
o Distributed, Reliable
o Failover and recovery mechanism
Technology Stack - Flume
Flume
• Agent - Receives data from
an application
• Collector – Writes data to
permanent storage
• Master – Separate service
controlling all the other
nodes
Technology Stack - Messaging
Bridging the gap between Flume & Storm
• Queue Messaging System
o Robust messaging
o Flexible routing
o Highly available
o Makes Flume & Storm integration loosely coupled
• RabbitMQ fits the requirement
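The loose coupling that a queue provides can be illustrated with Python's standard queue module standing in for the broker. This is not RabbitMQ's API; collector and processor are hypothetical stand-ins for the Flume side and the Storm side, and the None sentinel is just a convention chosen for this sketch.

```python
import queue
import threading

def collector(out_queue, lines):
    """Producer side: pushes collected log lines onto the queue."""
    for line in lines:
        out_queue.put(line)
    out_queue.put(None)  # sentinel: no more data

def processor(in_queue, results):
    """Consumer side: pulls lines off the queue and processes them."""
    while True:
        line = in_queue.get()
        if line is None:
            break
        results.append(line.upper())

def run_pipeline(lines):
    """Connect producer and consumer through a bounded in-memory queue."""
    buffer = queue.Queue(maxsize=100)  # bounded, so a slow consumer applies back-pressure
    results = []
    consumer = threading.Thread(target=processor, args=(buffer, results))
    consumer.start()
    collector(buffer, lines)
    consumer.join()
    return results

processed = run_pipeline(["event one", "event two"])
```

Neither side calls the other directly; each knows only the queue, which is what allows either end to be replaced or scaled independently.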
Fitting it all together
(Architecture diagram not recovered; the only surviving label is "Data Center".)
References
Esper
http://esper.codehaus.org/
Storm
https://github.com/nathanmarz/storm
https://github.com/tomdz/storm-esper
Flume
http://archive.cloudera.com/cdh/3/flume/UserGuide/#_architecture
Queue Messaging System
http://www.rabbitmq.com/
Thank You