Azure Stream Analytics by Nico Jacobs

TRANSCRIPT

  1. Azure Stream Analytics. Dr. Nico Jacobs, nico@ .be, @SQLWaldorf. Tweet and win an Ignite 2016 ticket #itproceed
  2. Why? Traditional Business Intelligence first collects data and analyzes it afterwards, typically with one day of latency. But we live in a fast-paced world: social media, the Internet of Things, just-in-time production. We want to monitor and analyze streams of data in near real time, typically with a latency of a few seconds up to a few minutes.
  3. A different kind of query. Traditional querying assumes the data doesn't change while you are querying it: we query a fixed state. If the data is changing, snapshots and transactions freeze the data while we query it. Since we query a finite state, our query should finish in a finite amount of time. [Diagram: table → query → result table]
  4. A different kind of query. When analyzing a stream of data, we deal with a potentially infinite amount of data. As a consequence our query will never end! To solve this problem most queries will use time windows. [Diagram: stream → temporal query → result stream, e.g. 12:15:00 → 1, 12:15:10 → 3, 12:15:20 → 2]
  5. Azure Stream Analytics. In Azure Stream Analytics we create, manage and run jobs. Every job has at least one input, one query and one output, but jobs can be more complex: a query can read from different inputs and write to multiple outputs. [Diagram: input → query → output]
  6. Inputs. Currently two types of input are supported. Data Stream: an Azure Event Hub or Azure Blob through which we receive a stream of data. Reference Data: an Azure Blob containing static reference data (a lookup table). There is no support for Azure databases or other cloud storage (yet).
  7. Temporal query. The query is written in SQL! No Java or .NET coding skills are needed. It is mainly a subset of T-SQL, with a few extra keywords added to deal with temporal queries.
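
  A minimal sketch of such a temporal query (the input name SensorStream and its fields are hypothetical; TumblingWindow is one of the extra temporal keywords, covered later):

      -- Count readings per sensor in fixed 10-second windows
      SELECT sensorId, COUNT(*) AS readings
      FROM SensorStream
      GROUP BY sensorId, TumblingWindow(second, 10)
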
  8. Output. Results are stored in one of five destinations. Azure Blob storage: creates log files with temporal query results, ideal for archiving. SQL database: stores results in an Azure SQL Database table, ideal as a source for traditional reporting and analysis. Event hub: sends an event to an event hub, ideal to generate actionable events such as alerts or notifications. Azure Table storage: more structured than blob storage, easier to set up than a SQL database and durable (in contrast to an event hub). PowerBI.com: ideal for near real-time reporting!
  9. Time for action! Online feedback on this talk: browse to itprofeed.azurewebsites.net. [Pipeline: event hub → Azure Stream Analytics → PowerBI.com]
  10. Demos: 1. Create an Azure Service Bus Event Hub. 2. Implement applications to send data into the Event Hub. 3. Create an Azure Stream Analytics job. 4. Link the input. 5. Create an output. 6. Write and test a query. 7. Start the job.
  11. Create an Azure Event Hub. The Azure event hub is the newest component in Azure Service Bus, typically used to collect sensor and app data. An event hub collects and temporarily stores thousands of events per second.
  12. Implement an application for sending events.
  13. Create an Azure Stream Analytics job. Currently this is only available in the old Azure portal. Preferably put the job in the same region as the Event Hub and data storage.
  14. Link the input. The event hub does not assume any data format, but Stream Analytics needs to parse the data. Three data formats are supported: JSON, CSV and Apache Avro (binary JSON). No columns are specified.
  15. Create an output. There are five output options: Azure Table or Blob, SQL Database, Event Hub or PowerBI.com. Blob and event hub do not require predefined meta-data; again CSV, JSON and Avro are supported. When storing information in a SQL Database or Azure Table storage we need to create the table in which we will store the results upfront: the meta-data is needed upfront.
  16. Create the query. In a query window we can write two types of statements. SELECT statements (required) extract a stream of results from one or more input streams; a WITH clause can be used to write more complex constructs or to increase parallelism. CREATE TABLE statements specify type information on our input stream(s).
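
  A minimal sketch combining both statement types (the input name SensorStream, its fields and the threshold are hypothetical):

      -- Type mapping for the input (does not create a real table)
      CREATE TABLE SensorStream (sensorId nvarchar(max), reading float, eventTime datetime);

      -- WITH names an intermediate result that the final SELECT consumes
      WITH HighReadings AS (
          SELECT sensorId, reading
          FROM SensorStream
          WHERE reading > 100.0
      )
      SELECT sensorId, reading
      FROM HighReadings
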
  17. Simple SELECT statement: SELECT <column list> | * FROM <input> [WHERE <predicate>]. This query simply produces a filtered output stream based on the input stream. In the SELECT statement and WHERE clause we can use functions such as DATEDIFF, but many functions from T-SQL are not available: e.g. we can use CAST but not CONVERT.
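
  A small illustrative filter under those constraints (stream and field names are hypothetical):

      -- CAST is available, CONVERT is not
      SELECT sensorId, CAST(reading AS float) AS reading
      FROM SensorStream
      WHERE reading IS NOT NULL
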
  18. Testing a query. Trial-and-error query development would be slow: starting a Stream Analytics job takes some minutes, inspecting the outcome of a job means checking tables or blobs, and we cannot modify a query while it is running. Luckily, when a job is stopped, we can run a query on data from a JSON text file and see the outcome in the browser. There is even a sample input option.
  19. Data types. Very simple type system: bigint, float, nvarchar(max) and datetime. Inputs will be cast into one of these types. We can control these types with a CREATE TABLE statement: this does not create a table, but just a data type mapping for the inputs.
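
  A sketch of such a mapping covering all four types (input and column names are hypothetical):

      -- Only declares how incoming fields should be typed
      CREATE TABLE SensorStream (
          sensorId nvarchar(max),
          reading float,
          sequenceNumber bigint,
          eventTime datetime
      );
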
  20. Group by. GROUP BY returns data aggregated over a certain subset of the data. How do we define a subset in a stream? Windowing functions! Each GROUP BY requires a windowing function. [Illustration from MSDN]
  21. Three windowing functions: tumbling, hopping and sliding, as sketched below.
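
  A sketch of the three window types in use (stream and field names are hypothetical; the built-in functions are TumblingWindow, HoppingWindow and SlidingWindow):

      -- Tumbling: fixed, non-overlapping 10-second windows
      SELECT sensorId, COUNT(*) AS readings
      FROM SensorStream
      GROUP BY sensorId, TumblingWindow(second, 10)

      -- Hopping: 10-second windows starting every 5 seconds (overlapping)
      SELECT sensorId, AVG(reading) AS avgReading
      FROM SensorStream
      GROUP BY sensorId, HoppingWindow(second, 10, 5)

      -- Sliding: at every event, the window covering the last 10 seconds
      SELECT sensorId, MAX(reading) AS maxReading
      FROM SensorStream
      GROUP BY sensorId, SlidingWindow(second, 10)
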
  22. Timestamp by. A record can have multiple timestamps associated with it, e.g. the time a phone call starts, ends, is submitted to the event hub, is processed by Azure Stream Analytics, and so on. By default the timestamp used in the temporal SQL queries is System.Timestamp: the event hub arrival time or the blob last-modified date. But we can include an explicit timestamp in the data we provide; in that case we must follow the FROM in our temporal query with TIMESTAMP BY <column>.
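
  A sketch using an explicit timestamp field (the CallStream input and its callStartTime and SwitchNum fields are hypothetical):

      -- Window on the application time in the payload,
      -- not on the event hub arrival time
      SELECT SwitchNum, COUNT(*) AS calls
      FROM CallStream TIMESTAMP BY callStartTime
      GROUP BY SwitchNum, TumblingWindow(minute, 1)
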
  23. JOIN. We can combine multiple event streams, or an event stream with reference data, via a join (inner join) or a left outer join. In the join clause we can specify the time window in which we want the join to take place; we use a special version of DATEDIFF for this.
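
  A sketch of such a temporal join (the CallStart and CallEnd streams and their fields are hypothetical):

      -- This special DATEDIFF takes the two stream aliases directly
      SELECT CS.callId, CS.switchNum
      FROM CallStart CS
      JOIN CallEnd CE
        ON CS.callId = CE.callId
        AND DATEDIFF(second, CS, CE) BETWEEN 0 AND 5
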
  24. INTO clause. We can have multiple outputs. Without an INTO clause we write to a destination named output; with an INTO clause we can choose the appropriate destination for every SELECT. E.g. send all events to blob storage for big data analysis, but send special events to an event hub for alerting.
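
  A sketch of that routing pattern (the output names archive and alerts, and the threshold, are hypothetical):

      -- Everything goes to blob storage for later analysis
      SELECT *
      INTO archive
      FROM SensorStream

      -- Only extreme readings go to the alerting event hub
      SELECT sensorId, reading
      INTO alerts
      FROM SensorStream
      WHERE reading > 100.0
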
  25. Out-of-order inputs. What if the event of 6:54:32 arrives after the event of 6:55:55? Trick: buffer your data for n minutes; all events that arrive less than n minutes late will be processed (tolerance window). What do we do with everything that arrives more than n minutes late? Do we skip them (drop) or do we pretend they happened just now (adjust)?
  26. Scaling. By default every job consists of 1 streaming unit, and a streaming unit can process up to 1 MB/second. When higher throughput is needed we can activate up to 6 streaming units per regular query. If our input is a partitioned event hub, we can write partitioned queries and partitioned subqueries (WITH clause). A non-partitioned query with a 3-fold partitioned subquery can have (1 + 3) × 6 = 24 streaming units!
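
  A sketch of a partitioned subquery (assuming the input event hub is partitioned and exposes a PartitionId field):

      -- Each partition of the subquery can scale independently
      WITH PartitionedCounts AS (
          SELECT PartitionId, COUNT(*) AS events
          FROM SensorStream PARTITION BY PartitionId
          GROUP BY PartitionId, TumblingWindow(second, 10)
      )
      SELECT PartitionId, events
      FROM PartitionedCounts
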
  27. Pricing. Azure Stream Analytics costs €0.55 per streaming unit per day (±€17/month) and €0.0008 per GB of throughput. So, when processing about 10 million events at a max rate of 1 MB/sec, this costs less than €18 a month.
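
  The arithmetic behind that claim, assuming one streaming unit, a 30-day month and roughly 1 KB per event (so 10 million events is about 10 GB of throughput):

      \[ 30 \times 0.55\,\text{€} + 10 \times 0.0008\,\text{€} = 16.50\,\text{€} + 0.008\,\text{€} \approx 16.51\,\text{€} < 18\,\text{€} \]
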
  28. Machine Learning. Sensor thresholds are not always constant, but Azure can learn which values preceded issues: Azure Machine Learning.
  29. Summary. Azure Stream Analytics is a PaaS version of StreamInsight. It processes streams of events via temporal queries, which are written in a SQL variant, supports multiple input and output formats, and scales to large volumes of events.
  30. Give me (more) feedback, and win a Lumia 635. The feedback form will be sent to you by email.
  31. Follow TechNet Belgium: @technetbelux. Subscribe to the TechNet newsletter: aka.ms/benews. Be the first to know.
  32. Thank you!
  33. Belgium's biggest IT PRO Conference.