Batch and Interactive Analytics: From Data to Insight

Anjana Fernando, Senior Technical Lead

Upload: wso2

Post on 08-Aug-2015



Agenda

๏ Batch and Interactive Processing Defined
๏ Technologies used for Batch/Interactive Analytics
๏ WSO2 Analytics Architecture
๏ Solutions
๏ Demo


Let’s Break It Down...

๏ Batch Analytics:

Data is first stored, then read back later to run a relatively time-consuming processing task.

๏ Interactive Analytics:

A stored data set is queried in an ad-hoc manner to find useful information quickly.

Source: http://themarketingblog.ecornell.com/


Where Can We Use It?

๏ Service Statistics Generation
  ๏ Extracting KPIs: average response time, maximum latency, etc.
๏ Log Analysis
  ๏ Efficiently store and analyse logs to support comprehensive search operations
๏ Activity Monitoring
  ๏ Trace a workflow of events throughout a system. Useful for finding failed transactions, performance issues, etc.
๏ Solving Optimization Problems
  ๏ Analysing large amounts of past data to optimize parameters for an existing algorithm

Source: http://www.axentas.com/
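As a rough illustration of the KPI extraction use case above, the following Python sketch computes average response time and maximum latency per service from stored events. It is not DAS code: the `ServiceCall` record, its field names and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ServiceCall:
    # Hypothetical statistics event; the field names are assumptions.
    service: str
    response_ms: float


def service_kpis(calls):
    """Return {service: {"count", "avg_ms", "max_ms"}} for a batch of events."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.service].append(call.response_ms)
    return {
        service: {
            "count": len(times),
            "avg_ms": sum(times) / len(times),
            "max_ms": max(times),
        }
        for service, times in grouped.items()
    }


# Illustrative sample data only.
calls = [
    ServiceCall("orders", 120.0),
    ServiceCall("orders", 80.0),
    ServiceCall("billing", 300.0),
]
kpis = service_kpis(calls)
```

This is the batch pattern in miniature: the events are stored first, and the aggregation runs over the whole stored set afterwards.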


Batch Analytics Technologies


Interactive Analytics (Indexing) Technologies

๏ Solr / SolrCloud

๏ ElasticSearch

๏ WSO2 DAS


WSO2 Analytics Platform



WSO2 DAS Architecture


Data Model

Data is published according to a strongly typed data stream definition:

{
  'name': 'stream.name',
  'version': '1.0.0',
  'nickName': 'stream nickname',
  'description': 'description of the stream',
  'metaData': [
    {'name': 'meta_data_1', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'correlation_data_1', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'payload_data_1', 'type': 'BOOL'},
    {'name': 'payload_data_2', 'type': 'LONG'}
  ]
}
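To show what "strongly typed" means in practice, here is a minimal Python sketch that checks an event against the stream definition above. It is only an illustration: the nested event layout, the `validate_event` helper and the mapping from stream types to Python types are assumptions, not the DAS publishing API.

```python
# Mapping from stream attribute types to Python types (an assumption of this sketch).
PYTHON_TYPES = {"STRING": str, "BOOL": bool, "LONG": int}

stream_definition = {
    "name": "stream.name",
    "version": "1.0.0",
    "metaData": [{"name": "meta_data_1", "type": "STRING"}],
    "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
    "payloadData": [
        {"name": "payload_data_1", "type": "BOOL"},
        {"name": "payload_data_2", "type": "LONG"},
    ],
}


def validate_event(definition, event):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in definition.get(section, []):
            name, expected = attr["name"], PYTHON_TYPES[attr["type"]]
            if name not in values:
                problems.append(f"{section}.{name}: missing")
                continue
            value = values[name]
            # bool is a subclass of int in Python, so reject it for LONG explicitly.
            wrong_type = not isinstance(value, expected)
            bool_as_long = expected is int and isinstance(value, bool)
            if wrong_type or bool_as_long:
                problems.append(f"{section}.{name}: expected {attr['type']}")
    return problems


# Hypothetical events in a nested layout chosen for this sketch.
good_event = {
    "metaData": {"meta_data_1": "node-1"},
    "correlationData": {"correlation_data_1": "req-42"},
    "payloadData": {"payload_data_1": True, "payload_data_2": 1024},
}
bad_event = {
    "metaData": {"meta_data_1": "node-1"},
    "correlationData": {"correlation_data_1": "req-42"},
    "payloadData": {"payload_data_1": "yes"},
}
good_problems = validate_event(stream_definition, good_event)
bad_problems = validate_event(stream_definition, bad_event)
```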


WSO2 DAS - Batch Processing

๏ Powered by Apache Spark: up to 10-100x faster than Hadoop MapReduce
๏ Parallel, distributed, with optimized in-memory processing
๏ Can run on top of Hadoop YARN, Mesos or in standalone mode
๏ Scalable script-based analytics written in an easy-to-learn, SQL-like query language powered by Spark SQL
๏ Interactive built-in web interface (Spark Console) for ad-hoc query execution
๏ HA/FO-supported scheduled query script execution
๏ Run Spark on a single node, on a Spark-embedded Carbon server cluster, or connect to an external Spark cluster
๏ Custom UDF support

e.g.:

INSERT INTO TABLE UserTable SELECT userName, COUNT(DISTINCT orderID), SUM(quantity) FROM PhoneSalesTable WHERE version = "1.0.0" GROUP BY userName;
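To see what the example query computes, the same aggregation can be run locally with Python's built-in `sqlite3` module. This is a sketch under stated assumptions, not a DAS or Spark SQL script: SQLite's dialect differs slightly (plain `INSERT INTO`, single-quoted string literal), and the `UserTable` column names and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PhoneSalesTable (
        userName TEXT, orderID INTEGER, quantity INTEGER, version TEXT
    );
    -- Column names below are assumptions; the slide does not define them.
    CREATE TABLE UserTable (
        userName TEXT, orderCount INTEGER, totalQuantity INTEGER
    );
""")
conn.executemany(
    "INSERT INTO PhoneSalesTable VALUES (?, ?, ?, ?)",
    [
        ("alice", 1, 2, "1.0.0"),
        ("alice", 1, 1, "1.0.0"),  # same order, second line item
        ("alice", 2, 5, "1.0.0"),
        ("bob", 3, 4, "1.0.0"),
        ("bob", 4, 9, "0.9.0"),    # excluded by the version filter
    ],
)

# Same shape as the slide's query, adapted to SQLite syntax.
conn.execute("""
    INSERT INTO UserTable
    SELECT userName, COUNT(DISTINCT orderID), SUM(quantity)
    FROM PhoneSalesTable
    WHERE version = '1.0.0'
    GROUP BY userName
""")
rows = conn.execute("SELECT * FROM UserTable ORDER BY userName").fetchall()
```

Each output row holds a user name, the number of distinct orders, and the total quantity across the rows that pass the version filter.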


Spark vs Hadoop MapReduce

๏ Hadoop MapReduce
  ๏ Supports only Map/Reduce, fine for single-pass computations
  ๏ High processing latency and inefficiencies from persisting intermediate results
  ๏ Hard to implement iterative algorithms
๏ Spark
  ๏ Resilient Distributed Dataset (RDD) based
  ๏ Supports more than just Map and Reduce functions
  ๏ Intermediate results kept in memory
  ๏ Lazy evaluation of data operations, allowing more optimization
  ๏ Allows developers to implement complex data operations in a DAG pattern
  ๏ In-memory/persisted mode operation, switch when required
  ๏ Simpler API
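The lazy evaluation point can be illustrated with a toy Python class. This is not Spark's API; `LazyDataset` and its methods are hypothetical names for this sketch. Transformations are only recorded, and nothing runs until an action (`collect`) is called, at which point all recorded steps are applied in one pass with no intermediate results written out.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset; not Spark's API."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # recorded transformations, not yet executed

    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        """Action: run every recorded step in a single pass over the source."""
        results = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                results.append(item)
        return results


calls = []


def double(x):
    calls.append(x)  # record when the function actually runs
    return x * 2


pipeline = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: nothing has executed yet
result = pipeline.collect()
```

Because the whole chain is known before execution starts, an engine is free to fuse or reorder steps, which is the optimization opportunity the slide refers to.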


WSO2 DAS - Interactive Analytics Features

๏ Full text data indexing support powered by Apache Lucene
๏ Drill-down search support
๏ Distributed data indexing
  ๏ Designed to support scalability
๏ Near real-time data indexing and retrieval
  ๏ Data indexed immediately as received
๏ Distributed indexing implementation for scalability
  ๏ Index sharding with Lucene indices
๏ Data storage scalability achieved with the underlying database, e.g. HBase, Cassandra, RDBMS, etc.

e.g.:

log:"ERROR" AND (ip:"192.168.4.33" OR ip:"192.168.4.34") AND type:"HTTPD"
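The boolean structure of the example query can be spelled out as a plain Python predicate over in-memory records. This is a simplified sketch: real Lucene matching goes through analyzers and an inverted index, and the record layout and sample data here are assumptions.

```python
def matches(record):
    """Apply the example query's AND/OR structure to one record (a dict)."""
    return (
        "ERROR" in record["log"].split()                        # log:"ERROR"
        and record["ip"] in {"192.168.4.33", "192.168.4.34"}    # ip:... OR ip:...
        and record["type"] == "HTTPD"                           # type:"HTTPD"
    )


# Illustrative records only.
records = [
    {"log": "ERROR upstream timed out", "ip": "192.168.4.33", "type": "HTTPD"},
    {"log": "INFO request served", "ip": "192.168.4.33", "type": "HTTPD"},
    {"log": "ERROR disk full", "ip": "192.168.4.40", "type": "HTTPD"},
    {"log": "ERROR login failed", "ip": "192.168.4.34", "type": "SSHD"},
    {"log": "ERROR bad gateway", "ip": "192.168.4.34", "type": "HTTPD"},
]
hits = [r["log"] for r in records if matches(r)]
```

An index answers the same question without scanning every record, which is what makes ad-hoc queries of this kind interactive on large data sets.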


WSO2 DAS - Mixing Real-time / Batch Processing


WSO2 DAS - Alerts

๏ Detecting conditions can be done via CEP queries
๏ Key is the "last mile"
  ๏ Email
  ๏ SMS
  ๏ Push notifications to a UI
  ๏ Pager
  ๏ Trigger a physical alarm
๏ How?
  ๏ Batch analytics: use WSO2's custom Analytics Provider for Spark SQL to send records directly as events to an event stream
  ๏ Select the email sender "Output Adaptor" from DAS, or send from DAS to ESB -> ESB connectors


Solutions Supported with Batch/Interactive Analytics: Service Statistics Monitoring


Solutions Supported with Batch/Interactive Analytics: Activity Monitoring

● Activity monitoring tracks events from multiple nodes in a flow to understand a specific activity
  ○ e.g.:
    ■ A client initiates a web service request that travels through multiple ESBs and application servers and then returns. This flow is uniquely identified and visualized in DAS
  ○ Used for tracing messages and finding performance hotspots in the flow
  ○ Implemented using a correlation-ID-based mechanism and indexing
  ○ Upcoming: mediator-level tracing and profiling in WSO2 ESB 5.0
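The correlation-ID mechanism described for activity monitoring can be sketched in Python as follows. This only illustrates the idea and is not the DAS implementation: the event fields (`correlation_id`, `node`, `timestamp`), the helper names and the sample events are all assumptions.

```python
from collections import defaultdict


def build_activities(events):
    """Group events by correlation ID and order each group by timestamp."""
    activities = defaultdict(list)
    for event in events:
        activities[event["correlation_id"]].append(event)
    for flow in activities.values():
        flow.sort(key=lambda e: e["timestamp"])
    return dict(activities)


def slowest_hop(flow):
    """Return (from_node, to_node, elapsed) for the largest gap in a flow.

    The flow must contain at least two events.
    """
    gaps = [
        (a["node"], b["node"], b["timestamp"] - a["timestamp"])
        for a, b in zip(flow, flow[1:])
    ]
    return max(gaps, key=lambda gap: gap[2])


# Events arrive interleaved and out of order, as they would from multiple nodes.
events = [
    {"correlation_id": "req-1", "node": "esb-1", "timestamp": 100},
    {"correlation_id": "req-2", "node": "esb-1", "timestamp": 105},
    {"correlation_id": "req-1", "node": "app-server", "timestamp": 130},
    {"correlation_id": "req-1", "node": "esb-2", "timestamp": 110},
    {"correlation_id": "req-1", "node": "client", "timestamp": 135},
]
activities = build_activities(events)
path = [e["node"] for e in activities["req-1"]]
hotspot = slowest_hop(activities["req-1"])
```

Grouping by the shared ID reconstructs the path of one request, and the largest gap between consecutive events points at a candidate performance hotspot.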


Solutions Supported with Batch/Interactive Analytics: Log Analysis

● Log analysis toolbox
● Log event indexing
  ○ Uses the new DAS v3.x indexing support
  ○ Event attributes can be indexed so that logs are searchable by server, cluster and log type, and the log messages themselves can be indexed for full text search
● Custom search queries using Lucene queries and regular expressions
● Logstash adaptor for log publishing
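The combination of attribute indexing and regular-expression search can be sketched with a toy in-memory index in Python. This is a stand-in for illustration only, not the Lucene-backed DAS indexing: the `LogIndex` class, its methods and the sample log lines are hypothetical, and only the server and log type attributes are indexed here.

```python
import re
from collections import defaultdict


class LogIndex:
    """Toy attribute index over log events (hypothetical, for illustration)."""

    def __init__(self):
        self._events = []
        self._by_attr = defaultdict(set)  # (field, value) -> event positions

    def add(self, server, log_type, message):
        position = len(self._events)
        self._events.append({"server": server, "type": log_type, "message": message})
        self._by_attr[("server", server)].add(position)
        self._by_attr[("type", log_type)].add(position)

    def search(self, pattern=None, **attrs):
        """Intersect attribute lookups, then apply an optional regex to messages."""
        positions = set(range(len(self._events)))
        for field, value in attrs.items():
            positions &= self._by_attr.get((field, value), set())
        regex = re.compile(pattern) if pattern else None
        return [
            self._events[p]["message"]
            for p in sorted(positions)
            if regex is None or regex.search(self._events[p]["message"])
        ]


index = LogIndex()
index.add("node-1", "HTTPD", "ERROR connection refused by upstream")
index.add("node-1", "HTTPD", "INFO request served in 12 ms")
index.add("node-2", "HTTPD", "ERROR upstream timeout after 30 s")
index.add("node-1", "AUDIT", "WARN connection refused for user guest")

hits = index.search(r"refused|timeout", server="node-1", type="HTTPD")
all_httpd = index.search(type="HTTPD")
```

The attribute lookups narrow the candidate set cheaply before the regular expression is applied, so the more expensive text match only touches events that already satisfy the indexed filters.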


Demo


Contact us!