Building Big Data Solutions on Azure

© Copyright SELA Software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com Eyal Ben Ivri Building Big Data Solutions on Azure

Upload: eyal-ben-ivri

Post on 16-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Building big data solutions on azure

© Copyright SELA Software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com

Eyal Ben Ivri

Building Big Data Solutions on Azure

Page 2: Building big data solutions on azure

About me

Eyal Ben Ivri
Big Data & Cloud Architect, Sela Group
Focus on the Hadoop ecosystem, Big Data and NoSQL solutions

Page 3: Building big data solutions on azure

Modern Data – The Big Picture

IoT

User Data

Media Files

Documents

Machine Data

Log Files

Page 4: Building big data solutions on azure
Page 5: Building big data solutions on azure

The Light Rail problem – TLV Railway

Imagine the new light rail maintenance company

• IoT – Internet of Trains (and cameras, and cash registers and carts and rails and more…)
• Analyze data in stream and in batch
• Dashboards
• Alerts

The perfect problem

Page 6: Building big data solutions on azure

What We Need

An integrated data solution that will be:

• Able to process events from external sources
• Able to walk data through different pipelines
• Fast and responsive
• Big-Data ready

Page 7: Building big data solutions on azure

In Other Words

Consume: BI, Dashboards, Applications

Process: ETL, Aggregations, Computation, Analysis, Querying

Persist: Hadoop, SQL, NoSQL

Ingest: IoT, Structured Data, Unstructured Data

Page 8: Building big data solutions on azure

Microsoft Azure Services for IoT and Big Data

Devices

Device Connectivity: Event Hubs, Service Bus, External Data Sources

Storage: SQL Database, Table/Blob Storage, DocumentDB, Data Lake Store, External Data Sources

Analytics: Machine Learning, Stream Analytics, HDInsight, Data Factory, Data Lake Analytics

Presentation & Action: App Service, Power BI, Notification Hubs, Mobile Services, BizTalk Services

Page 9: Building big data solutions on azure

Page 10: Building big data solutions on azure

Event Hub

Messages at scale

Why not throw it into a queue, and have a listener at the backend?

• Scaling limits, because of the architecture of queues and topics in a standard Service Bus
• Event Hub uses a partition model
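The partition model can be illustrated with a minimal sketch. It is not the Event Hubs implementation: the function names `assign_partition` and `route`, and the choice of SHA-256 for hashing, are assumptions made for illustration only. The point it shows is that events sharing a partition key land in the same partition and keep their order, while different partitions can be read by different consumers in parallel.

```python
import hashlib


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to an index in [0, partition_count).

    Hypothetical helper: the real service uses its own internal hashing.
    """
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count):
    """Group (key, payload) events into per-partition lists, preserving order."""
    partitions = [[] for _ in range(partition_count)]
    for key, payload in events:
        partitions[assign_partition(key, partition_count)].append((key, payload))
    return partitions
```

With a single queue, every consumer competes for the same stream; with partitions, each consumer can own a subset and scale out independently.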

Page 11: Building big data solutions on azure

Getting Started

Easy to set up

Two configurations:
• Partition Count – depends on the number of consumers (2-32)
• Message Retention (days) – between 1 and 7 days

Secured using SAS policies
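As a hedged illustration of what a SAS policy is used for, the sketch below builds a shared access signature token with the Python standard library, following the publicly documented token layout (`sr`, `sig`, `se`, `skn`) as best understood here. The namespace, hub name, policy name and key in the usage note are hypothetical placeholders, and the layout should be checked against current Azure documentation before relying on it.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, policy_name, policy_key, ttl_seconds=3600, now=None):
    """Build a SAS token string for the given resource URI.

    `now` is an optional epoch timestamp, exposed only to make the result reproducible.
    """
    issued_at = time.time() if now is None else now
    expiry = int(issued_at + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # The signed string is the URL-encoded resource URI and the expiry, newline-separated.
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return (
        f"SharedAccessSignature sr={encoded_uri}"
        f"&sig={signature}&se={expiry}&skn={policy_name}"
    )
```

A call such as `generate_sas_token("https://example.servicebus.windows.net/telemetry", "send-policy", "not-a-real-key")` would produce a token a device could present when publishing; all three arguments here are made up.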

Page 12: Building big data solutions on azure

IoT with Event Hubs

Diagram labels:
• Devices (RTOS, Linux, Windows, Android, iOS)
• Field Gateway
• Protocol Adaptation
• Cloud Gateway (Event Hubs)
• Device Connectivity & Management

Event Hubs
• High scale telemetry ingestion service
• HTTP/AMQP protocol support
• Each Event Hub supports 1 million publishers and 1GB/s ingress
• Generally available worldwide
• 18 billion messages per day
• 60+ TB ingested per day

Page 13: Building big data solutions on azure

IoT & Data Processing Patterns

Diagram labels:
• Devices (RTOS, Linux, Windows, Android, iOS)
• Field Gateway
• Protocol Adaptation
• Cloud Gateway (Event Hubs & IoT Hub)
• Device Connectivity & Management
• Analytics & Operationalized Insights
• IoT Scale Object Models & Business Logic

Processing paths:
• Batch Analytics & Visualizations: Azure HDInsight, AzureML, Power BI, Azure Data Factory
• Hot Path Analytics: Azure Stream Analytics, Azure HDInsight Storm
• Hot Path Business Logic: Service Fabric & Actor Framework

Find insights to
• Power new services
• Improve your “things”

Operationalize your insights in real time

Page 14: Building big data solutions on azure

TLV Railway

Can now ingest millions of messages each second

These messages carry data from:
• Devices
• End-Machines
• Servers

Next, we need to use this data to create real-time alerts when something goes wrong

Page 15: Building big data solutions on azure

Azure Stream Analytics

Mission critical reliability and scale

Enables rapid development

Fully managed real-time analytics

• Automatic recovery
• Monitoring and alerting
• Scale on demand

• Managed cloud service
• Each unit handles 1MB/s
• Can scale up to 1GB/s

• SQL-like language
• Temporal windowing semantics
• Support for reference data

Page 16: Building big data solutions on azure

Stream Analytics – Main Concepts

Inputs
• Can be stream or reference data (metadata)
• Stream data sources can be Event Hub, Blob Storage (using blobs with timestamps) or IoT Hub (preview)
• Supported serialization formats: CSV, JSON, and Avro

Query
• A SQL query that selects from input(s) and writes results to output(s)

Output
• Can be Blob, SQL, Event Hub (notification), Power BI (preview), Table storage, Service Bus or DocumentDB
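The input, query and output roles can be sketched in plain Python. This is only an analogy, not the Stream Analytics engine: a stream of JSON events is joined with a small reference data set and the enriched records are collected as output. The field names (`StationId`, `TrainId`, `StationName`) and the helper `enrich` are hypothetical.

```python
import json


def enrich(stream_events, reference_data, key):
    """Yield stream events joined with reference records on `key` (inner join)."""
    for event in stream_events:
        reference = reference_data.get(event.get(key))
        if reference is not None:
            # Merge the slowly changing metadata into the fast-moving event.
            yield {**event, **reference}


# Stream input: one JSON document per event (hypothetical sample data).
raw_input = [
    '{"StationId": "S1", "TrainId": "T7"}',
    '{"StationId": "S9", "TrainId": "T2"}',
]
# Reference input: metadata keyed by station id (hypothetical sample data).
stations = {"S1": {"StationName": "Central"}}

# "Query": join each event with its station record; unmatched events are dropped.
output = list(enrich((json.loads(line) for line in raw_input), stations, "StationId"))
```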

Page 17: Building big data solutions on azure

Tumbling Windows

How many trains entered each station every 5 minutes?

SELECT TrainId, COUNT(*)
FROM EntryStream
GROUP BY TrainId, TumblingWindow(minute, 5)
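What a tumbling-window aggregation computes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Stream Analytics executes the query: events are `(epoch_seconds, key)` pairs, windows are treated as half-open intervals `[start, start + size)`, and the service's exact boundary semantics may differ. The function name `tumbling_counts` is made up.

```python
from collections import Counter


def tumbling_counts(events, window_seconds=300):
    """Count events per (window_start, key) over fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Each event belongs to exactly one window: the one its timestamp falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

Because every event maps to exactly one window, the per-window counts always sum to the total number of events.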

Page 18: Building big data solutions on azure

Temporal Windows

Tumbling Window
A series of fixed-sized, non-overlapping and contiguous time intervals

Hopping Window
Scheduled overlapping windows

Sliding Window
Outputs events only for those points in time when the content of the window actually changes
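The difference between tumbling and hopping windows can be shown with a small Python sketch. It assumes integer timestamps, window starts aligned to multiples of the hop size, and half-open intervals `[start, start + size)`; the function name `hopping_windows` is hypothetical and the service's own boundary rules may differ. Window starts may be negative for timestamps smaller than the window size.

```python
def hopping_windows(timestamp, size, hop):
    """Return every (start, end) window of length `size`, hopping by `hop`, that contains `timestamp`."""
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    windows = []
    start = (timestamp // hop) * hop  # latest window start at or before the timestamp
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)
```

With `hop == size` each timestamp falls into exactly one window, which is the tumbling case; with `hop < size` the windows overlap and an event is counted in several of them.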

Page 19: Building big data solutions on azure

TLV Railway

• Can now respond in near-real-time to events as they happen
• Track and maintain malfunctioning equipment
• Receive real-time data regarding customers entering and leaving stations

Data can now be processed, so we need a place to save it, preferably at scale.

Page 20: Building big data solutions on azure

DocumentDB and Azure Data Services

Fully managed, scalable, queryable, schema-free JSON document database service for modern applications

• Fully featured RDBMS
• Transactional processing
• Rich query
• Managed as a service
• Elastic scale
• Internet accessible (HTTP/REST)
• Schema-free data model
• Arbitrary data formats

Page 21: Building big data solutions on azure

DocumentDB features

• JSON documents
• SQL support
• LINQ support
• REST API support
• JavaScript support (triggers, UDFs, stored procedures)
• Automatic indexing
• Multiple-document transactions
• Tunable consistency

Page 22: Building big data solutions on azure

DocumentDB Key Concept

Collection
• A collection of documents
• Not a table (different entities can go into the same collection)
• Collections = Partitions
• Not just logical containers, but physical ones
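The "not a table" point can be made concrete with an in-memory sketch. It does not use any DocumentDB SDK: the collection is a plain Python list, and the document fields and the `query` helper are hypothetical. It shows that documents with different shapes can sit side by side and still be filtered by whatever fields they happen to carry.

```python
def query(collection, predicate):
    """Return the documents in `collection` for which `predicate` is true."""
    return [document for document in collection if predicate(document)]


# One collection, several entity types, no shared schema (hypothetical sample data).
collection = [
    {"id": "1", "type": "train", "line": "red"},
    {"id": "2", "type": "station", "name": "Central", "platforms": 4},
    {"id": "3", "type": "train", "line": "green", "cars": 6},
]

# Select only train documents; a missing field simply fails the predicate.
trains = query(collection, lambda document: document.get("type") == "train")
```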

Page 23: Building big data solutions on azure

Demo

TLV Railway – Part 1

Page 24: Building big data solutions on azure

TLV Railway

Can now store its data in a highly scalable store

Great for interactive querying of any data:
• Messages from sensors
• Reference data

But this data (and other data) needs to move to other places (SQL, batch processing, ML). How?

Page 25: Building big data solutions on azure

What is Azure Data Factory?

Azure Data Factory is a managed service to produce trusted information from data stored in the cloud and on-premises. Easily create, orchestrate and schedule highly available, fault-tolerant workflows to move and transform your data at scale.

Page 26: Building big data solutions on azure

Evolving Approaches to Analytics

[Diagram: two approaches to analytics]

Traditional: OLTP, ERP, LOB, … -> ETL Tool (SSIS, etc.): Extract original data, Transform, Load transformed data -> EDW (SQL Server, Teradata, etc.) -> BI Tools

Modern: Devices, Web, Sensors, Social, streaming data -> Ingest original data -> Scale-out Storage & Compute (HDFS, Blob Storage, etc.), Data Lake(s) -> Transform & Load -> Data Marts -> Dashboards, Apps

Page 27: Building big data solutions on azure

Data Factory – Main concepts

Data Store
• A data source/sink component
• SQL (Azure or on-premises), Storage, DocumentDB and more

Data Set
• A defined data set that is contained inside a data store
• One data store can have many data sets

Compute
• A service for computation
• HDInsight, Azure Batch, Data Lake Analytics, Azure ML

Page 28: Building big data solutions on azure

Data Factory – Main concepts

Pipeline
• Set of instructions
• “Take data from data set A and move to compute, then store results in data set B”

Slices
• Everything is time sliced
• A data set (source) can declare on what time intervals the data can be sliced, and the pipeline will be activated when a new slice is ready

JSON
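The time-slicing idea can be sketched in Python. This is a conceptual illustration, not the Data Factory runtime or its JSON format: `time_slices` and `run_pipeline` are hypothetical names, the source is an in-memory list of `(timestamp, value)` pairs, and a trailing partial slice is simply dropped.

```python
from datetime import datetime, timedelta


def time_slices(start, end, interval):
    """Return consecutive [slice_start, slice_end) pairs of length `interval` within start..end."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    slices = []
    current = start
    while current + interval <= end:
        slices.append((current, current + interval))
        current += interval
    return slices


def run_pipeline(source, transform, slices):
    """Apply `transform` to the values of each slice; results are keyed by slice start.

    Mirrors "take data from data set A, move to compute, store in data set B".
    """
    results = {}
    for slice_start, slice_end in slices:
        values = [value for timestamp, value in source if slice_start <= timestamp < slice_end]
        results[slice_start] = transform(values)
    return results
```

Because each slice is processed independently, a failed slice can be re-run on its own without touching the others.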

Page 29: Building big data solutions on azure
Page 30: Building big data solutions on azure

Microsoft Azure Services for IoT and Big Data

Devices

Device Connectivity: Event Hubs, Service Bus, External Data Sources

Storage: SQL Database, Table/Blob Storage, DocumentDB, Data Lake Store, External Data Sources

Analytics: Machine Learning, Stream Analytics, HDInsight, Data Factory, Data Lake Analytics

Presentation & Action: App Service, Power BI, Notification Hubs, Mobile Services, BizTalk Services

Page 31: Building big data solutions on azure

Page 32: Building big data solutions on azure

TLV Railway

• Can now integrate different services and different data sources
• Move data with ease and as little hassle as possible

What about aggregations and a deeper dive into the data, for more complex analysis?

Page 33: Building big data solutions on azure
Page 34: Building big data solutions on azure

HDInsight

Hadoop-as-a-Service

Based on the Hortonworks distribution

A few flavors:
• Hadoop (Windows + Linux)
• Storm (Windows + Linux)
• HBase (Windows + Linux)
• Spark (Windows + Linux)

Page 35: Building big data solutions on azure

Hadoop vs. Relational DB

Compared on: Data size, Access, Updates, Structure, Integrity, Scaling

Page 36: Building big data solutions on azure

Demo

TLV Railway – Part 2

Page 37: Building big data solutions on azure

TLV Railway - Summary

• Can now perform advanced analytics on top of large amounts of data, in a variety of formats (not just structured, boring data)
• Can integrate all the loose ends of data coming in with data generated in “old-school” data platforms like SQL, collected from Line-of-Business applications
• We’ve covered data ingestion, responding in real time, querying, storing and processing

Azure Stack

Page 38: Building big data solutions on azure

Hadoop and OSS vs. Azure IoT and Big Data Ecosystem

Azure Ecosystem     | OSS
Event Hubs          | Kafka
Stream Analytics    | Storm
HDInsight           | Hadoop
Map Reduce          | Map Reduce
Hive                | Hive
Spark               | Spark
HBase               | HBase
Azure ML            | Mahout
Data Factory        | Pig
DocumentDB          | MongoDB / Couchbase

Page 39: Building big data solutions on azure

Data Lake (preview)

Page 40: Building big data solutions on azure

Is “TLV Railway” fake?

Page 41: Building big data solutions on azure

London did it first

Page 42: Building big data solutions on azure

Summary

[Diagram: end-to-end architecture]

Stages: Event production, Collection, Ingestion, Stream Analysis, Storage and Batch Analysis, Presentation and action

Components:
• Devices: IP-capable devices (Windows/Linux), Low-power devices (RTOS), Legacy IoT (custom protocols)
• Applications
• Field gateways
• Cloud gateways (web APIs)
• Event Hubs
• Stream Analytics
• SQL DB, DocumentDB, Storage
• HDInsight, Data Factory, Machine Learning
• Search and query
• Data analytics (Power BI)
• Web/thick client dashboards
• Devices to take action

Get started today at http://azure.microsoft.com

Page 43: Building big data solutions on azure

Questions