Batch and Interactive Analytics: From Data to Insight

Post on 08-Aug-2015


Anjana Fernando, Senior Technical Lead

Batch and Interactive Analytics: From Data to Insight

Agenda

๏ Batch and Interactive Processing Defined
๏ Technologies used for Batch/Interactive Analytics
๏ WSO2 Analytics Architecture
๏ Solutions
๏ Demo

Let’s Break It Down...

๏ Batch Analytics:

In batch analytics, data is first stored and later read back to run relatively time-consuming data processing tasks.

๏ Interactive Analytics:

In interactive analytics, a stored data set is queried in an ad-hoc manner to find useful information quickly.

Source: http://themarketingblog.ecornell.com/

Where Can We Use It?

๏ Service Statistics Generation
  ๏ Extracting KPIs: average response time, maximum latency, etc.
๏ Log Analysis
  ๏ Efficiently store and analyse logs to support comprehensive search operations
๏ Activity Monitoring
  ๏ Trace a workflow of events throughout a system. Useful for finding failed transactions, performance issues, etc.
๏ Solving Optimization Problems
  ๏ Analysing large amounts of past data to optimize parameters for an existing algorithm

Source: http://www.axentas.com/

Batch Analytics Technologies

Interactive Analytics (Indexing) Technologies

๏ Solr / SolrCloud

๏ ElasticSearch

๏ WSO2 DAS

WSO2 Analytics Platform

WSO2 DAS Architecture


Data Model

Data is published according to a strongly typed data stream definition:

{
  'name': 'stream.name',
  'version': '1.0.0',
  'nickName': 'stream nickname',
  'description': 'description of the stream',
  'metaData': [
    {'name': 'meta_data_1', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'correlation_data_1', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'payload_data_1', 'type': 'BOOL'},
    {'name': 'payload_data_2', 'type': 'LONG'}
  ]
}
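As a rough illustration of what "strongly typed" means here, the sketch below checks an event payload against the payloadData section of a definition like the one above. The `validate_payload` helper, the `PYTHON_TYPES` mapping and the sample values are assumptions made for this example; they are not part of any WSO2 API.

```python
# Hypothetical sketch: check an event payload against a stream definition.
# The type names mirror the definition above; the mapping to Python types
# and the helper name are assumptions, not WSO2 DAS APIs.

STREAM_DEFINITION = {
    "name": "stream.name",
    "version": "1.0.0",
    "payloadData": [
        {"name": "payload_data_1", "type": "BOOL"},
        {"name": "payload_data_2", "type": "LONG"},
    ],
}

PYTHON_TYPES = {"STRING": str, "BOOL": bool, "LONG": int}


def validate_payload(definition, payload):
    """Return a list of problems; an empty list means the payload conforms."""
    problems = []
    fields = definition["payloadData"]
    if len(payload) != len(fields):
        problems.append(f"expected {len(fields)} values, got {len(payload)}")
        return problems
    for field, value in zip(fields, payload):
        expected = PYTHON_TYPES[field["type"]]
        # bool is a subclass of int in Python, so reject it for LONG explicitly.
        if isinstance(value, bool) and expected is not bool:
            problems.append(f"{field['name']}: expected {field['type']}, got bool")
        elif not isinstance(value, expected):
            problems.append(
                f"{field['name']}: expected {field['type']}, got {type(value).__name__}"
            )
    return problems
```

Calling `validate_payload(STREAM_DEFINITION, [True, 42])` returns an empty list, while a payload such as `["yes", 42]` yields one problem for `payload_data_1`.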

WSO2 DAS - Batch Processing

๏ Powered by Apache Spark: up to 10-100x higher performance than Hadoop MapReduce
๏ Parallel, distributed, with optimized in-memory processing
๏ Can run on top of Hadoop YARN, Mesos or in standalone mode
๏ Scalable script-based analytics written using an easy-to-learn, SQL-like query language powered by Spark SQL
๏ Interactive built-in web interface (Spark Console) for ad-hoc query execution
๏ Scheduled query script execution with high availability / failover (HA/FO) support
๏ Run Spark on a single node, on a Spark-embedded Carbon server cluster, or connect to an external Spark cluster
๏ Custom UDF support

e.g.:-

INSERT INTO TABLE UserTable SELECT userName, COUNT(DISTINCT orderID), SUM(quantity) FROM PhoneSalesTable WHERE version = "1.0.0" GROUP BY userName;
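To make the example query concrete, here is a minimal sketch that runs the same aggregation on a handful of rows. It uses Python's built-in SQLite as a stand-in for Spark SQL so that it runs without a cluster. The table and column names follow the slide; the column types, the `UserTable` column names `orders` and `totalQuantity`, and the sample rows are assumptions.

```python
import sqlite3

# Stand-in for the Spark SQL script above, using SQLite so it runs anywhere.
# Table names follow the slide; column types and sample rows are made up.


def summarise_sales(rows):
    """Load rows into PhoneSalesTable and fill UserTable with per-user totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PhoneSalesTable "
        "(userName TEXT, orderID INTEGER, quantity INTEGER, version TEXT)"
    )
    conn.execute(
        "CREATE TABLE UserTable "
        "(userName TEXT, orders INTEGER, totalQuantity INTEGER)"
    )
    conn.executemany("INSERT INTO PhoneSalesTable VALUES (?, ?, ?, ?)", rows)
    # Same shape as the slide's query: distinct orders and total quantity per user.
    conn.execute(
        "INSERT INTO UserTable "
        "SELECT userName, COUNT(DISTINCT orderID), SUM(quantity) "
        "FROM PhoneSalesTable WHERE version = '1.0.0' GROUP BY userName"
    )
    result = conn.execute(
        "SELECT userName, orders, totalQuantity FROM UserTable ORDER BY userName"
    ).fetchall()
    conn.close()
    return result


SAMPLE_ROWS = [
    ("alice", 1, 2, "1.0.0"),
    ("alice", 1, 1, "1.0.0"),  # same order, second line item
    ("alice", 2, 5, "1.0.0"),
    ("bob", 3, 4, "1.0.0"),
    ("bob", 4, 9, "2.0.0"),  # filtered out by the version predicate
]
```

With these rows, `summarise_sales(SAMPLE_ROWS)` gives alice two distinct orders with a total quantity of 8, and bob one order with a quantity of 4.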

Spark vs Hadoop MapReduce

๏ Hadoop MapReduce
  ๏ Supports only Map/Reduce, fine for single-pass computations
  ๏ High processing latency and inefficiencies because intermediate results are persisted
  ๏ Hard to implement iterative algorithms
๏ Spark
  ๏ Resilient Distributed Dataset (RDD) based
  ๏ Supports more than just Map and Reduce functions
  ๏ Intermediate results kept in memory
  ๏ Lazy evaluation of data operations, allowing more optimization
  ๏ Allows developers to implement complex data operations in a DAG pattern
  ๏ In-memory/persisted mode operation, switching when required
  ๏ Simpler API
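The lazy-evaluation point can be illustrated without Spark. The toy sketch below uses plain Python generators: building the pipeline does no work, and only the elements needed by the final "action" are ever processed. The helper names `lazy_map`, `lazy_filter` and `first` are invented for this example and are not Spark APIs.

```python
# Toy illustration of lazy evaluation with Python generators; not Spark code.
# Each stage records what it touches in `log` so the laziness is observable.


def lazy_map(fn, items, log):
    for item in items:
        log.append(f"map({item})")
        yield fn(item)


def lazy_filter(pred, items, log):
    for item in items:
        log.append(f"filter({item})")
        if pred(item):
            yield item


def first(n, items):
    """The 'action': pull items through the pipeline until n are collected (n >= 1)."""
    out = []
    for item in items:
        out.append(item)
        if len(out) == n:
            break
    return out


log = []
squares = lazy_map(lambda x: x * x, range(1_000_000), log)
even_squares = lazy_filter(lambda x: x % 2 == 0, squares, log)
work_before_action = len(log)  # nothing has been evaluated yet
result = first(2, even_squares)
```

Here `result` holds the first two even squares, and `log` shows that only the inputs 0, 1 and 2 were touched out of the million available.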

WSO2 DAS - Interactive Analytics Features

๏ Full-text data indexing support powered by Apache Lucene
๏ Drill-down search support
๏ Distributed data indexing
  ๏ Designed to support scalability
๏ Near real-time data indexing and retrieval
  ๏ Data indexed immediately as received
๏ Distributed indexing implementation for scalability
  ๏ Index sharding with Lucene indices
๏ Data storage scalability achieved with the underlying database, e.g. HBase, Cassandra, RDBMS, etc.

e.g.:-

log: "ERROR" AND (ip: "192.168.4.33" OR ip: "192.168.4.34") AND type: "HTTPD"
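The boolean query above can be read as set operations over an index: AND intersects the matching record sets and OR unions them. The toy inverted index below sketches that idea only. It treats each field as an exact value rather than tokenized full text, and it is not how Lucene or WSO2 DAS is implemented; the record layout and sample data are assumptions.

```python
from collections import defaultdict

# Toy inverted index: maps (field, value) to the set of matching record ids.
# Illustration only; real full-text indexing tokenizes and scores documents.


def build_index(records):
    index = defaultdict(set)
    for record_id, record in enumerate(records):
        for field, value in record.items():
            index[(field, value)].add(record_id)
    return index


def search(index):
    """Evaluate log:ERROR AND (ip:192.168.4.33 OR ip:192.168.4.34) AND type:HTTPD."""
    errors = index.get(("log", "ERROR"), set())
    hosts = index.get(("ip", "192.168.4.33"), set()) | index.get(
        ("ip", "192.168.4.34"), set()
    )
    httpd = index.get(("type", "HTTPD"), set())
    # AND is set intersection; the OR group was already unioned above.
    return sorted(errors & hosts & httpd)


RECORDS = [
    {"log": "ERROR", "ip": "192.168.4.33", "type": "HTTPD"},
    {"log": "INFO", "ip": "192.168.4.33", "type": "HTTPD"},
    {"log": "ERROR", "ip": "192.168.4.34", "type": "HTTPD"},
    {"log": "ERROR", "ip": "192.168.4.35", "type": "HTTPD"},
    {"log": "ERROR", "ip": "192.168.4.34", "type": "SSHD"},
]
```

`search(build_index(RECORDS))` returns the ids of the first and third records, the only ones that satisfy all three clauses.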

WSO2 DAS - Mixing Real-time / Batch Processing

WSO2 DAS - Alerts

๏ Detecting conditions can be done via CEP queries
๏ Key is the “Last Mile”
  ๏ Email
  ๏ SMS
  ๏ Push notifications to a UI
  ๏ Pager
  ๏ Trigger physical alarm
๏ How?
  ๏ Batch Analytics: Using WSO2’s custom Analytics Provider for Spark SQL to directly send records as events to an event stream
  ๏ Select the Email sender “Output Adaptor” from DAS, or send from DAS to ESB -> ESB Connectors

Solutions Supported with Batch/Interactive Analytics: Service Statistics Monitoring

Solutions Supported with Batch/Interactive Analytics: Activity Monitoring

● Activity monitoring tracks events from multiple nodes in a flow to understand a specific activity
  ○ e.g.:-
    ■ A client initiates a web service request that travels through multiple ESBs and application servers and then returns. This flow is uniquely identified and visualized in DAS
  ○ Used for tracing messages and finding performance hotspots in the flow
  ○ Implemented using a correlation-ID-based mechanism and indexing
  ○ Upcoming: Mediator-level tracing and profiling in WSO2 ESB 5.0
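A minimal sketch of the correlation-ID idea: events from different nodes that share a correlation id are grouped into one activity, ordered by time, and the largest gap between consecutive hops points at a likely performance hotspot. The event shape `(correlation_id, node, timestamp_ms)`, the helper names and the sample data are assumptions for illustration, not the DAS implementation.

```python
from collections import defaultdict

# Hypothetical event shape: (correlation_id, node, timestamp_ms).
# Field names and sample data are assumptions for illustration only.


def group_by_activity(events):
    """Group events by correlation id, ordered by timestamp within each activity."""
    activities = defaultdict(list)
    for correlation_id, node, timestamp_ms in events:
        activities[correlation_id].append((timestamp_ms, node))
    return {cid: sorted(hops) for cid, hops in activities.items()}


def slowest_hop(hops):
    """Return (from_node, to_node, elapsed_ms) for the largest gap.

    Requires at least two hops in the activity.
    """
    gaps = [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(hops, hops[1:])
    ]
    elapsed, src, dst = max(gaps)
    return src, dst, elapsed


EVENTS = [
    ("req-1", "esb-1", 1000),
    ("req-2", "esb-1", 1005),
    ("req-1", "app-server", 1020),
    ("req-1", "esb-2", 1180),
    ("req-2", "app-server", 1015),
    ("req-1", "client-response", 1190),
]

activities = group_by_activity(EVENTS)
```

For the sample data, `slowest_hop(activities["req-1"])` identifies the hop from `app-server` to `esb-2` (160 ms) as the slowest step of that request.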

Solutions Supported with Batch/Interactive Analytics: Log Analysis

● Log analysis toolbox
● Log event indexing
  ○ Uses the new DAS v3.x indexing support
  ○ Event attributes can be indexed so that logs are searchable by server, cluster and log type, and the log messages themselves can be indexed for full-text search
● Custom search queries using Lucene queries and regular expressions
● Logstash adaptor for log publishing

Demo

Contact us !
