Azure Stream Analytics


Upload: marco-parenzan

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Azure Stream Analytics

12.12.2015

Azure Stream Analytics

Marco Parenzan (@marco_parenzan)


Page 2: Azure Stream Analytics


Thank you to our AWESOME sponsors!

Page 3: Azure Stream Analytics

@marco_parenzan

• Microsoft MVP 2015 for Azure
• Develops modern distributed and cloud solutions
• Marco [dot] Parenzan [at] 1nn0va [dot] it
• Passion for speaking and inspiring programmers, students, people (www.innovazionefvg.net)
• SQL SATs organization addicted!
• I’m a developer!

Page 4: Azure Stream Analytics

Agenda

• Analytics in a modern world
• Why a developer talks about analytics
• Why cloud?
• Introduction to Azure Stream Analytics
• Azure Stream Analytics architecture
• Stream Analytics Query Language (SAQL)
• Handling time in Azure Stream Analytics
• Scaling Analytics
• Conclusions

Page 5: Azure Stream Analytics


ANALYTICS IN A MODERN WORLD

Page 6: Azure Stream Analytics

What is Analytics? (from Wikipedia)

Analytics is the discovery and communication of meaningful patterns in data.

Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance.

Analytics often favors data visualization to communicate insight.

Page 7: Azure Stream Analytics

Traditional analytics

Everything around us produces data: devices, sensors, infrastructures and applications.

Traditional Business Intelligence first collects data and analyzes it afterwards, typically with one day of latency (the day after).

But we live in a fast-paced world: social media, Internet of Things, just-in-time production.

Offline data is no longer enough: for many organizations, capturing and storing event data for later analysis does not suffice.

Data at Rest

Page 8: Azure Stream Analytics

Analytics in a modern world

We work with streaming data: we want to monitor and analyze data in near real time, typically with a few seconds up to a few minutes of latency.

So we don’t have the time to stop, copy the data and analyze it; we have to work with streams of data.

Data in motion

Page 9: Azure Stream Analytics

Event-based systems

An event is “something happened… …somewhere… …sometime!”

Events arrive at different times, i.e. they have unique timestamps.

Events arrive at different rates (events/sec): in any given period of time there may be 0, 1 or more events.

Page 10: Azure Stream Analytics


WHY A DEVELOPER TALKS ABOUT ANALYTICS

Page 11: Azure Stream Analytics


Analytics with IoT

Page 12: Azure Stream Analytics

Analytics with ASP.NET

API Apps, Logic Apps, world-wide distributed APIs (REST) are resource consuming (CPU, storage, network bandwidth).

Each request is logged, with Event Hub or in log files.

Evaluate how the API is doing with “real time” statistics.

E.g. an ASP.NET app logs directly to Event Hub.

Page 13: Azure Stream Analytics


WHY CLOUD?

Page 14: Azure Stream Analytics

Why Analytics in the Cloud?

Not all data is local: event data is already in the Cloud, and event data is globally distributed.

Bring the processing to the data, not the data to the processing.

Page 15: Azure Stream Analytics

Apply cloud principles

Focus on building solutions (PaaS or SaaS) without having to manage complex infrastructure and software.

No hardware or other up-front costs, and no time-consuming installation or setup.

Elastic scale, where resources are efficiently allocated and paid for as requested: scale to any volume of data while still achieving high throughput, low latency and guaranteed resiliency.

Up and running in minutes.

Page 16: Azure Stream Analytics

INTRODUCTION TO AZURE STREAM ANALYTICS

Page 17: Azure Stream Analytics

What is Azure Stream Analytics?

Azure Stream Analytics is a cost-effective event processing engine that is…

…described via a SQL-like syntax

…a stream processing engine integrated with a scalable event queuing system like Azure Event Hubs

…not alone, and not the only one

Page 18: Azure Stream Analytics

Microsoft Azure IoT Services

Devices → Device Connectivity → Storage → Analytics → Presentation & Action

• Device Connectivity: Event Hubs, Service Bus, IoT Hub, external data sources
• Storage: SQL Database, Table/Blob Storage, DocumentDB
• Analytics: Stream Analytics, Machine Learning, HDInsight, Data Factory
• Presentation & Action: App Service, Power BI, Notification Hubs, Mobile Services, BizTalk Services

Page 19: Azure Stream Analytics

Events handled by Azure Event Hubs

• Event producers: >1M producers, >1 GB/sec aggregate throughput
• Up to 32 partitions via the portal, more on request
• Events are routed to partitions directly or via a PartitionKey hash
• Throughput Units: 1 ≤ TUs ≤ partition count; 1 TU = 1 MB/s writes, 2 MB/s reads
• Consumer group(s) and receivers, e.g. the Event Processor Host (IEventProcessor)

Page 20: Azure Stream Analytics

Analytics by Azure Stream Analytics

Remember: analytics is the discovery and communication of meaningful patterns in data.

Azure Machine Learning does the same: where is the difference?

Stream Analytics | Machine Learning
Transform (stateless functions, GROUP BY) | Regression
Enrich (SELECT) | Classification
Correlate (JOIN) | Anomaly Detection

Page 21: Azure Stream Analytics

Real-time analytics

• Intake of millions of events per second (up to 1 GB/s)
• Scale that accommodates variable loads
• Low, auto-adaptive processing latency (sub-second to seconds)
• Transform, augment, correlate; temporal operations
• Correlate between different streams, or with reference data
• Find patterns, or lack of patterns, in data in real time

Page 22: Azure Stream Analytics

No challenges with scale

Elasticity of the cloud for scale out: spin up any number of resources on demand, scale from small to large when required, on a distributed, scale-out architecture.

Page 23: Azure Stream Analytics

Fully managed

• No hardware (PaaS offering): bypasses the need for deployment expertise
• No software provisioning and maintenance, no performance tuning
• Spin up any number of resources on demand
• Expand your business globally leveraging Azure regions

Page 24: Azure Stream Analytics

Mission critical availability

Guaranteed event delivery:
• Guaranteed not to lose events or produce incorrect output
• Guaranteed “once and only once” delivery of events
• Ability to replay events

Guaranteed business continuity:
• Guaranteed uptime (three nines of availability)
• Auto-recovery from failures
• Built-in state management for fast recovery

Effective audits:
• Privacy and security properties of solutions are evident
• Azure integration for monitoring and ops alerting

Page 25: Azure Stream Analytics

Lower costs

Efficiently pay only for usage: architected for multi-tenancy, not paying for idle resources.

Typical cloud expense model: low startup costs, ability to incrementally add resources, reduce costs when business needs change.

Page 26: Azure Stream Analytics

Rapid development

SQL-like language:
• High level: focus on the stream analytics solution
• Concise: less code to maintain
• First-class support for event streams and reference data

Built-in temporal semantics:
• Built-in temporal windowing and joining
• Simple policy configuration to manage out-of-order events and late arrivals

Page 27: Azure Stream Analytics


AZURE STREAM ANALYTICS ARCHITECTURE

Page 28: Azure Stream Analytics

Canonical Stream Analytics Pattern

Event production → Collection/Ingestion → Stream analysis → Storage and batch analysis → Presentation and action

• Event production: devices (IP-capable devices on Windows/Linux, low-power devices with an RTOS), applications, legacy IoT (custom protocols)
• Collection/Ingestion: Event Hubs, cloud gateways (web APIs), field gateways
• Stream analysis: Stream Analytics, Machine Learning
• Storage and batch analysis: SQL DB, Storage Tables, Storage Blobs, Event Hubs, search and query
• Presentation and action: data analytics (Power BI), web/thick client dashboards, devices to take action, more to come…

Page 29: Azure Stream Analytics

Stream Analytics implements a lambda architecture

A generic, scalable and fault-tolerant data processing architecture (http://lambda-architecture.net/), born from its author’s experience working on distributed data processing systems: a robust system that is fault-tolerant against both hardware failures and human mistakes.

All data entering the system is dispatched to both the batch layer and the speed layer for processing. The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data); (ii) pre-computing the batch views.

The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way. The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only. Any incoming query can be answered by merging results from batch views and real-time views.

Page 30: Azure Stream Analytics

Azure Stream Analytics

Data source → Collect → Process → Deliver → Consume

• Event inputs: Event Hub, IoT Hub, Azure Blob
• Reference data: Azure Blob
• Transform: temporal joins, filters, aggregates, projections, windows, etc.; enrich and correlate
• Outputs: SQL Azure, Azure Blobs, Event Hub, Service Bus Queue, Service Bus Topics, Table storage, PowerBI, DocumentDb
• Consumers: BI dashboards, predictive analytics, Azure Storage, Azure Data Factory

Guarantees: temporal semantics, guaranteed delivery, guaranteed up time.

Page 31: Azure Stream Analytics

Input sources for a Stream Analytics Job

• Currently supported input data streams are Azure Event Hub, Azure IoT Hub and Azure Blob Storage. Multiple input data streams are supported.
• Advanced options let you configure how the job will read data from the input blob (which folders to read from, when a blob is ready to be read, etc.).
• Reference data is usually static or changes very slowly over time. It must be stored in Azure Blob Storage and is cached for performance.

Page 32: Azure Stream Analytics

Defining Event Schema

• The serialization format and the encoding for the input data sources (both data streams and reference data) must be defined.
• Currently three formats are supported: CSV, JSON and Avro (binary JSON - https://avro.apache.org/docs/1.7.7/spec.html).
• For the CSV format a number of common delimiters are supported: comma (,), semicolon (;), colon (:), tab and space.
• For CSV and Avro you can optionally provide the schema for the input data.
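For instance, a single hypothetical toll-booth event (matching the example streams used later in this deck) serialized as JSON might look like:

```json
{
  "TollId": 1,
  "EntryTime": "2014-09-10T12:01:00.000Z",
  "LicensePlate": "JNB 7001"
}
```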

Page 33: Azure Stream Analytics

Output for Stream Analytics Jobs

Currently supported output data stores:

• Azure Blob storage: creates log files with temporal query results; ideal for archiving.
• Azure Table storage: more structured than blob storage, easier to set up than SQL database, and durable (in contrast to event hub).
• SQL database: stores results in an Azure SQL Database table; ideal as a source for traditional reporting and analysis.
• Event hub: sends an event to an event hub; ideal to generate actionable events such as alerts or notifications.
• Service Bus Queue: sends an event on a queue; ideal for sending events sequentially.
• Service Bus Topics: sends an event to subscribers; ideal for sending events to many consumers.
• PowerBI.com: ideal for near real-time reporting!
• DocumentDb: ideal if you work with JSON and object graphs.

Page 34: Azure Stream Analytics


STREAM ANALYTICS QUERY LANGUAGE (SAQL)

Page 35: Azure Stream Analytics

SAQL – Language & Library

DML: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE WHEN THEN ELSE, INNER/LEFT OUTER JOIN, UNION, CROSS/OUTER APPLY, CAST, INTO, ORDER BY ASC/DESC

Scaling extensions: WITH, PARTITION BY, OVER

Windowing extensions: TumblingWindow, HoppingWindow, SlidingWindow, Duration

Date and time functions: DateName, DatePart, Day, Month, Year, DateTimeFromParts, DateDiff, DateAdd

Aggregate functions: Sum, Count, Avg, Min, Max, StDev, StDevP, Var, VarP

String functions: Len, Concat, CharIndex, Substring, PatIndex

Temporal functions: Lag, IsFirst, CollectTop

Page 36: Azure Stream Analytics

Supported types

Type | Description
bigint | Integers in the range -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807).
float | Floating point numbers in the range -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308.
nvarchar(max) | Text values, comprised of Unicode characters. Note: a value other than max is not supported.
datetime | Defines a date combined with a time of day with fractional seconds, based on a 24-hour clock and relative to UTC (time zone offset 0).

Inputs will be cast into one of these types. We can control these types with a CREATE TABLE statement: this does not create a table, but just a data type mapping for the inputs.
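A minimal sketch of such a mapping, assuming an input named EntryStream with the toll-booth fields used in the examples that follow:

```sql
-- Maps the columns of the input named EntryStream to SAQL types.
-- This is only a type mapping for the input: no table is created.
CREATE TABLE EntryStream (
    TollId BIGINT,
    EntryTime DATETIME,
    LicensePlate NVARCHAR(MAX)
);
```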

Page 37: Azure Stream Analytics

INTO clause

Pipelining data from input to output: without an INTO clause we write to the destination named ‘output’.

We can have multiple outputs: with the INTO clause we can choose the appropriate destination for every SELECT.

E.g. send events to blob storage for big data analysis, but send special events to an event hub for alerting.

SELECT UserName, TimeZone
INTO Output
FROM InputStream
WHERE Topic = 'XBox'
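A hedged sketch of the multi-output scenario described above, assuming the job defines two outputs named ArchiveBlob and AlertHub:

```sql
-- Archive every event to a Blob output for later batch analysis...
SELECT *
INTO ArchiveBlob
FROM InputStream

-- ...and route only the interesting events to an Event Hub output for alerting.
SELECT UserName, TimeZone
INTO AlertHub
FROM InputStream
WHERE Topic = 'XBox'
```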

Page 38: Azure Stream Analytics

WHERE clause

Specifies the conditions for the rows returned in the result set of a SELECT statement, query expression or subquery.

There is no limit to the number of predicates that can be included in a search condition.

SELECT UserName, TimeZone
FROM InputStream
WHERE Topic = 'XBox'

Page 39: Azure Stream Analytics

JOIN

We can combine multiple event streams, or an event stream with reference data, via a join (inner join) or a left outer join.

In the join clause we can specify the time window in which we want the join to take place; we use a special version of DATEDIFF for this.

Page 40: Azure Stream Analytics

Reference Data

Seamless correlation of event streams with reference data: static or slowly-changing data stored in blobs.

CSV and JSON files in Azure Blobs, scanned for new snapshots on a settable cadence.

JOIN (INNER or LEFT OUTER) between streams and reference data sources.

Reference data appears like another input:

SELECT myRefData.Name, myStream.Value
FROM myStream
JOIN myRefData
ON myStream.myKey = myRefData.myKey

Page 41: Azure Stream Analytics

Reference data tips

Currently reference data cannot be refreshed automatically: you need to stop the job and specify a new snapshot with the reference data.

Reference data can only live in Blob storage. In practice you use services like Azure Data Factory to move data from Azure data sources to Azure Blob Storage.

Have you followed Francesco Diaz’s session?

Page 42: Azure Stream Analytics

UNION

Combines the results of two or more queries into a single result set that includes all the rows that belong to all the queries in the union.

The number and order of the columns must be the same in all queries, and the data types must be compatible.

If ‘ALL’ is not specified, duplicate rows are removed.

SELECT TollId, ENTime AS Time, LicensePlate
FROM EntryStream TIMESTAMP BY ENTime
UNION
SELECT TollId, EXTime AS Time, LicensePlate
FROM ExitStream TIMESTAMP BY EXTime

EntryStream:
TollId | EntryTime | LicensePlate
1 | 2014-09-10 12:01:00.000 | JNB 7001
1 | 2014-09-10 12:02:00.000 | YXZ 1001
3 | 2014-09-10 12:02:00.000 | ABC 1004

ExitStream:
TollId | ExitTime | LicensePlate
1 | 2009-06-25 12:03:00.000 | JNB 7001
1 | 2009-06-25 12:03:00.000 | YXZ 1001
3 | 2009-06-25 12:04:00.000 | ABC 1004

Result:
TollId | Time | LicensePlate
1 | 2014-09-10 12:01:00.000 | JNB 7001
1 | 2014-09-10 12:02:00.000 | YXZ 1001
3 | 2014-09-10 12:02:00.000 | ABC 1004
1 | 2009-06-25 12:03:00.000 | JNB 7001
1 | 2009-06-25 12:03:00.000 | YXZ 1001
3 | 2009-06-25 12:04:00.000 | ABC 1004

Page 43: Azure Stream Analytics


HANDLING TIME IN AZURE STREAM ANALYTICS

Page 44: Azure Stream Analytics

Traditional queries

Traditional querying assumes the data doesn’t change while you are querying it: we query a fixed state. If the data is changing, snapshots and transactions ‘freeze’ the data while we query it.

Since we query a finite state, our query should finish in a finite amount of time.

table → query → result table

Page 45: Azure Stream Analytics

A different kind of query

When analyzing a stream of data, we deal with a potentially infinite amount of data. As a consequence our query will never end! To solve this problem most queries will use time windows.

stream → temporal query → result stream

Page 46: Azure Stream Analytics

Arrival Time vs Application Time

Every event that flows through the system comes with a timestamp that can be accessed via System.Timestamp. This timestamp can be an application time, which the user can specify in the query; a record can have multiple timestamps associated with it.

The arrival time has different meanings based on the input source: for events from Azure Service Bus Event Hub, the arrival time is the timestamp given by the Event Hub; for Blob storage, it is the blob’s last modified time.

If the user wants to use an application time, they can do so using the TIMESTAMP BY keyword. Data are sorted by the timestamp column.
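As a sketch, reusing the toll-booth entry stream from the other slides: with TIMESTAMP BY, the query is evaluated on the application time carried in the payload instead of the arrival time.

```sql
-- Use the EntryTime field of the payload as the event timestamp.
SELECT TollId, LicensePlate, System.Timestamp AS EventTime
FROM EntryStream TIMESTAMP BY EntryTime
```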

Page 47: Azure Stream Analytics

Temporal Joins

Joins are used to combine events from two or more input sources. Joins are temporal in nature: each JOIN must provide limits on how far apart in time the matching rows can be. Time bounds are specified inside the ON clause using the DATEDIFF function. LEFT OUTER JOIN is supported, to select rows from the left table that do not meet the join condition: useful for pattern detection.

SELECT Make
FROM EntryStream ES TIMESTAMP BY EntryTime
JOIN ExitStream EX TIMESTAMP BY ExitTime
ON ES.Make = EX.Make
AND DATEDIFF(second, ES, EX) BETWEEN 0 AND 10

Example over a 25-second timeline:
Toll entry: {"Mazda",6} {"BMW",7} {"Honda",2} {"Volvo",3}
Toll exit: {"Mazda",3} {"BMW",7} {"Honda",2} {"Volvo",3}

“Honda” is not in the result because the event in the exit stream precedes the event in the entry stream. “BMW” is not in the result because the entry and exit stream events are more than 10 seconds apart. Query result = [Mazda, Volvo].

Page 48: Azure Stream Analytics

Windowing Concepts

A common requirement is to perform some set-based operation (count, aggregation, etc.) over events that arrive within a specified period of time.

GROUP BY returns data aggregated over a certain subset of data. How do we define a subset in a stream? Windowing functions! Each GROUP BY requires a windowing function.
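A minimal sketch of this rule, assuming a generic input named InputStream: the aggregate is computed per window, and System.Timestamp on the output carries the timestamp of the window.

```sql
-- Count events per 10-second window; the GROUP BY must name a window.
SELECT COUNT(*) AS EventCount, System.Timestamp AS WindowEnd
FROM InputStream
GROUP BY TumblingWindow(second, 10)
```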

Page 49: Azure Stream Analytics

Three types of windows

Every window operation outputs events at the end of the window. The output of the window is a single event based on the aggregate function used; the event has the timestamp of the window. All windows have a fixed length.

• Tumbling window: aggregate per time interval
• Hopping window: scheduled overlapping windows
• Sliding window: windows constantly re-evaluated

Page 50: Azure Stream Analytics

Tumbling Window

Tumbling windows:
• repeat
• are non-overlapping

An event can belong to only one tumbling window.

Query: count the total number of vehicles entering each toll booth every interval of 20 seconds.

SELECT TollId, COUNT(*)
FROM EntryStream TIMESTAMP BY EntryTime
GROUP BY TollId, TumblingWindow(second, 20)

[diagram: events on a 60-second timeline grouped into non-overlapping 20-second tumbling windows]

Page 51: Azure Stream Analytics

Hopping Window

Hopping windows:
• repeat
• can overlap
• hop forward in time by a fixed period

Same as a tumbling window if hop size = window size. Events can belong to more than one hopping window.

Query: count the number of vehicles entering each toll booth every interval of 20 seconds; update results every 10 seconds.

SELECT COUNT(*), TollId
FROM EntryStream TIMESTAMP BY EntryTime
GROUP BY TollId, HoppingWindow(second, 20, 10)

[diagram: a 20-second hopping window with a 10-second “hop” over a 60-second timeline]

Page 52: Azure Stream Analytics

Sliding Window

Sliding windows:
• continuously move forward by an ε (epsilon)
• produce an output only upon the occurrence of an event
• every window has at least one event

Events can belong to more than one sliding window.

Query: find all the toll booths which have served more than 10 vehicles in the last 20 seconds.

SELECT TollId, Count(*)
FROM EntryStream ES
GROUP BY TollId, SlidingWindow(second, 20)
HAVING Count(*) > 10

[diagram: a 20-second sliding window over a 60-second timeline, re-evaluated as each event enters and exits the window]

Page 53: Azure Stream Analytics

Demo: analyticsgames.azurewebsites.net

Mobile controller (HTML) → JSON tap event → Web API (MVC + Web API) → JSON → Event Hub (Stream Analytics input source) → Stream Analytics → Service Bus queue (output) → web worker → SignalR message / HTTP notification → remote display (HTML)

Page 54: Azure Stream Analytics


SCALING STREAM ANALYTICS

Page 55: Azure Stream Analytics

Streaming Unit

A Streaming Unit is a measure of the computing resources available for processing a job. A streaming unit can process up to 1 MB/second.

By default every job consists of 1 streaming unit. The total number of streaming units that can be used depends on: the rate of incoming events, and the complexity of the query.

Page 56: Azure Stream Analytics

Multiple steps, multiple outputs

A query can have multiple steps to enable pipelined execution. A step is a sub-query defined using WITH (a “common table expression”); the only query outside of the WITH keyword is also counted as a step.

Steps can be used to develop complex queries more elegantly by creating intermediary named results. Each step’s output can be sent to multiple output targets using INTO.

WITH Step1 AS (
    SELECT Count(*) AS CountTweets, Topic
    FROM TwitterStream PARTITION BY PartitionId
    GROUP BY TumblingWindow(second, 3), Topic, PartitionId
),
Step2 AS (
    SELECT Avg(CountTweets)
    FROM Step1
    GROUP BY TumblingWindow(minute, 3)
)
SELECT * INTO Output1 FROM Step1
SELECT * INTO Output2 FROM Step2
SELECT * INTO Output3 FROM Step2

Page 57: Azure Stream Analytics

Scaling Concepts – Partitions

When a query is partitioned, input events are processed and aggregated in separate partition groups, and output events are produced for each partition group. To read from Event Hubs, ensure that the number of partitions matches.

The query within the step must have the PARTITION BY keyword. If your input is a partitioned event hub, we can write partitioned queries and partitioned subqueries (WITH clause). A non-partitioned query with a 3-fold partitioned subquery can have (1 + 3) * 6 = 24 streaming units!

SELECT Count(*) AS Count, Topic
FROM TwitterStream PARTITION BY PartitionId
GROUP BY TumblingWindow(minute, 3), Topic, PartitionId

[diagram: Event Hub partitions 1-3, each feeding the partitioned Stream Analytics query and producing query results 1-3]

Page 58: Azure Stream Analytics

Out of order inputs

Event Hub guarantees monotonicity of the timestamp on each partition of the Event Hub. All events from all partitions are merged by timestamp order; there will be no out-of-order events.

When it’s important for you to use the sender’s timestamp, so a timestamp from the event payload is chosen using TIMESTAMP BY, there can be several sources of disorder: producers of the events have clock skews; network delay from the producers sending the events to Event Hub; clock skews between Event Hub partitions.

Do we skip out-of-order events (drop) or do we pretend they happened just now (adjust)?

Page 59: Azure Stream Analytics

Handling out of order events

On the configuration tab you will find the defaults. Using 0 seconds as the out-of-order tolerance window means you assert all events are in order all the time.

To allow ASA to correct the disorder, you can specify a non-zero out-of-order tolerance window size. ASA will buffer events up to that window and reorder them, using the user-chosen timestamp, before applying the temporal transformation. Because of the buffering, the side effect is that the output is delayed by the same amount of time.

As a result, you will need to tune the value to reduce the number of out-of-order events while keeping the latency low.

Page 60: Azure Stream Analytics


CONCLUSIONS

Page 61: Azure Stream Analytics

Summary

Azure Stream Analytics is the PaaS solution for analytics on streaming data. It is programmable with a SQL-like language. Handling time is a special and central feature. It scales with cloud principles: elastic, self-service, multitenant, pay per use.

More questions: other solutions, pricing, what to do with that data, futures.

Page 62: Azure Stream Analytics

Microsoft real-time stream processing options

(columns: complex event processing in SQL Server | Azure Stream Analytics | Apache Storm in HDInsight)

Focus: ease of development and operationalization (Stream Analytics) vs flexibility and customizability (Storm)

Deployment: On-premises or Azure IaaS | Azure PaaS | Azure PaaS
Open source: No | No | Yes
Programming: .NET/LINQ | SQL | SCP.NET, Java, Python
Tooling: Visual Studio | Web browser | Visual Studio

Page 63: Azure Stream Analytics

Apache Storm (in HDInsight)

Apache Storm is a distributed, fault-tolerant, open source real-time event processing solution. Storm was originally used by Twitter to process massive streams of data from the Twitter firehose.

Today, Storm is a project of the Apache Software Foundation.

Typically, Storm is integrated with a scalable event queuing system like Apache Kafka or Azure Event Hubs.

Page 64: Azure Stream Analytics

Stream Analytics vs Apache Storm

Storm:
• data transformation
• can handle more dynamic data (if you’re willing to program)
• requires programming

Stream Analytics:
• ease of setup
• JSON and CSV formats only
• can change queries within 4 minutes
• only takes inputs from Event Hub, Blob Storage
• only outputs to Azure Blob, Azure Tables, Azure SQL, PowerBI

Page 65: Azure Stream Analytics

Pricing

Pricing is based on volume per job: the volume of data processed, and the streaming units required to process the data stream.

• Volume of data processed by the streaming job (in GB): € 0.0009 per GB
• Streaming unit (a blended measure of CPU, memory and throughput): € 0.0262 per hour (≈ € 18.86 per month)

Page 66: Azure Stream Analytics

Azure Machine Learning

Understand the “sequence” of data in the history to predict the future: Azure can ‘learn’ which values preceded issues.

Page 67: Azure Stream Analytics

Power BI

Solutions to create real-time dashboards. A SaaS service, inside Office 365.

Page 68: Azure Stream Analytics

Futures

https://feedback.azure.com/forums/270577-azure-stream-analytics

[started] Native integration with Azure Machine Learning
(done this night!) Provide better ways to debug
[planned] Call to a REST endpoint to invoke custom code
[under review] Take input from DocumentDb; use SQL Azure as reference data

Page 69: Azure Stream Analytics

Thanks

Marco Parenzan

http://twitter.com/marco_parenzan
http://www.slideshare.net/marcoparenzan
http://www.github.com/marcoparenzan