Microsoft Azure Big Data Analytics

Posted on 16-Apr-2017

Page 1: Microsoft Azure Big Data Analytics

Big Data Analytics in the Cloud: Microsoft Azure

Cortana Intelligence Suite

Mark Kromer, Microsoft Azure Cloud Data Architect

@kromerbigdata, @mssqldude

Page 2: Microsoft Azure Big Data Analytics

What is Big Data Analytics?

Tech Target: “… the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.”

Techopedia: “… the strategy of analyzing large volumes of data, or big data. This big data is gathered from a wide variety of sources, including social networks, videos, digital images, sensors, and sales transaction records. The aim in analyzing all this data is to uncover patterns and connections that might otherwise be invisible, and that might provide valuable insights about the users who created it. Through this insight, businesses may be able to gain an edge over their rivals and make superior business decisions.”

Requires a lot of data wrangling, and Data Engineers to do it

Requires Data Scientists to uncover patterns in complex raw data

Requires Business Analysts to derive business value from multiple data sources

Requires additional tools and infrastructure not provided by traditional database and BI technologies

Why Cloud for Big Data Analytics?

• Quick and easy to stand up new, large big data architectures
• Elastic scale
• Metered pricing
• Quickly evolve architectures to keep up with rapidly changing landscapes

Page 3: Microsoft Azure Big Data Analytics

Microsoft Azure Big Data Analytics

Cortana Intelligence Suite: Azure Data Platform at a glance

Page 4: Microsoft Azure Big Data Analytics

Cortana Intelligence Suite (diagram: Data → Intelligence → Action)

• Action: People; Automated Systems; Apps (Web, Mobile, Bots)
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & Visualizations: Power BI
• Information Management: Event Hubs, Data Catalog, Data Factory
• Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Data Sources: Apps, Sensors and devices

Page 5: Microsoft Azure Big Data Analytics

Microsoft Azure

What it is: Microsoft’s cloud platform, including IaaS, PaaS and SaaS
• Storage and Data
• Networking
• Security
• Services
• Virtual Machines
• On-demand Resources and Services

Page 6: Microsoft Azure Big Data Analytics

Azure Data Factory

What it is: A pipeline system to move data in, perform activities on data, move data around, and move data out

When to use it:
• Create solutions using multiple tools as a single process
• Orchestrate processes (scheduling)
• Monitor and manage pipelines
• Call and re-train Azure ML models

Page 7: Microsoft Azure Big Data Analytics

ADF Components

Page 8: Microsoft Azure Big Data Analytics

ADF Logical Flow

Page 9: Microsoft Azure Big Data Analytics

Example - Churn (diagram)

• Data Sources: call log files and a customer table in an on-premises data mart
• Ingest (Move): copy the call log files and the customer table to Azure Blob Storage
• Transform & Analyze: transform, combine, etc., then analyze customer and call details to find customers likely to churn
• Publish: write the results to a Customer Churn Table in Azure DB
• Act (Visualize)

Azure Data Factory:
• Data Set: a collection of files, a DB table, etc.
• Activity: a processing step (Hadoop job, custom code, ML model, etc.)
• Pipeline: a logical group of activities
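The Data Factory vocabulary above (data sets, activities, pipelines) maps naturally onto a small dependency graph. The following is a minimal Python sketch, not the ADF API: the `Activity` and `Pipeline` classes, the data set names and the churn rule ("more than 10 dropped calls") are all hypothetical. Each activity runs as soon as every data set it reads has been produced, which is the ordering a pipeline enforces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Activity:
    """A processing step, e.g. a copy, a Hive job, or an ML scoring call (hypothetical model)."""
    name: str
    inputs: List[str]    # names of the data sets this step reads
    outputs: List[str]   # names of the data sets this step writes
    run: Callable[[Dict[str, object]], Dict[str, object]]

@dataclass
class Pipeline:
    """A logical group of activities, run in data-dependency order."""
    activities: List[Activity] = field(default_factory=list)

    def execute(self, sources: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
        available = dict(sources)      # data sets that exist so far
        pending = list(self.activities)
        order: List[str] = []
        while pending:
            # An activity is ready once every input data set has been produced.
            ready = [a for a in pending if all(i in available for i in a.inputs)]
            if not ready:
                raise RuntimeError("unsatisfied inputs for: " + ", ".join(a.name for a in pending))
            for act in ready:
                produced = act.run({i: available[i] for i in act.inputs})
                available.update({o: produced[o] for o in act.outputs})
                order.append(act.name)
                pending.remove(act)
        return available, order

# Hypothetical churn flow: ingest (move) -> transform & analyze -> publish.
call_logs = [("c1", 12), ("c2", 1), ("c1", 9)]       # (customer id, dropped calls)
customer_table = {"c1": "Alice", "c2": "Bob"}

pipeline = Pipeline([
    # Declared out of order on purpose; execute() resolves the dependencies.
    Activity("publish", ["churn_scores"], ["churn_table"],
             lambda d: {"churn_table": sorted(d["churn_scores"])}),
    Activity("move", ["call_logs"], ["blob_call_logs"],
             lambda d: {"blob_call_logs": list(d["call_logs"])}),
    Activity("analyze", ["blob_call_logs", "customer_table"], ["churn_scores"],
             lambda d: {"churn_scores": [d["customer_table"][c]
                                         for c in {c for c, n in d["blob_call_logs"] if n > 10}]}),
])

data, order = pipeline.execute({"call_logs": call_logs, "customer_table": customer_table})
print(order)                # ['move', 'analyze', 'publish']
print(data["churn_table"])  # ['Alice']
```

Declaring dependencies through data sets, rather than listing steps in a fixed sequence, is what lets the same pipeline be rescheduled or re-run without hand-ordering its activities.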

Page 10: Microsoft Azure Big Data Analytics

Simple ADF:
• Business Goal: Transform and analyze web logs each month
• Design Process: Transform raw web logs using a Hive query, storing the results in Blob Storage

Flow: Web logs loaded to Blob → HDInsight Hive query to transform log entries → Files ready for analysis and use in Azure ML

Page 11: Microsoft Azure Big Data Analytics

PowerShell ADF Example
1. Add-AzureAccount and enter the user name and password.
2. Get-AzureSubscription to view all the subscriptions for this account.
3. Select-AzureSubscription to select the subscription that you want to work with.
4. Switch-AzureMode AzureResourceManager
5. New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US"
6. New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup -Name DataFactory(your alias)Pipeline -Location "West US"

Page 12: Microsoft Azure Big Data Analytics

Using Visual Studio
• Use in mature dev environments
• Use when integrated into a larger development process

Page 13: Microsoft Azure Big Data Analytics

SQL Data Warehouse

What it is: A scaling data warehouse service in the cloud

When to use it:
• When you need a large-data BI solution in the cloud
• MPP SQL Server in the cloud
• Elastic scale data warehousing
• When you need pausable scale-out compute

Page 14: Microsoft Azure Big Data Analytics

Elastic scale & performance: Real-time elasticity
• Resize in under 1 minute
• On-demand compute: expand or reduce as needed
• Pause the data warehouse to save on compute costs, e.g., pause during non-business hours
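A back-of-the-envelope sketch of why pausing matters: compute is billed for the hours the warehouse is running, so pausing outside business hours removes those hours from the bill. The hourly rate below is a made-up placeholder, not an Azure price, and the 10-hours-a-day, 5-days-a-week schedule is an assumption; storage is still charged while the warehouse is paused and is left out of this calculation.

```python
def weekly_compute_cost(hourly_rate: float, hours_per_day: float, days_per_week: int) -> float:
    """Compute-only cost for one week; storage is billed separately and ignored here."""
    return hourly_rate * hours_per_day * days_per_week

RATE = 10.0  # hypothetical cost units per running hour, not a real Azure price

always_on = weekly_compute_cost(RATE, 24, 7)        # 168 running hours
business_hours = weekly_compute_cost(RATE, 10, 5)   # paused nights and weekends: 50 hours
savings = 1 - business_hours / always_on

print(f"always on: {always_on:.0f}, paused off-hours: {business_hours:.0f}, savings: {savings:.0%}")
# always on: 1680, paused off-hours: 500, savings: 70%
```

Because the rate cancels out of the ratio, the roughly 70% compute saving depends only on the schedule (50 of 168 hours), not on the price tier.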

Page 15: Microsoft Azure Big Data Analytics

Elastic scale & performance: Scale
• Storage can be as big or small as required
• Users can execute niche workloads without re-scanning data

Page 16: Microsoft Azure Big Data Analytics

Logical overview (diagram: Control, Compute, Storage)

Page 17: Microsoft Azure Big Data Analytics

Distributed queries (diagram: Query → Control → Compute → Storage → Result)

Page 18: Microsoft Azure Big Data Analytics

Simple Example (diagram: Control and Compute)

SELECT COUNT_BIG(*) FROM dbo.[FactInternetSales];

The control node receives the query and the same statement runs on each compute node (the slide shows six copies of it); the partial counts are then combined into the final result.
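The control/compute split can be imitated in a few lines: rows are spread across distributions, each compute node counts only its own rows, and the control node sums the partial counts. This Python sketch assumes hash distribution on a hypothetical `customer_key` column and uses six nodes to match the six copies of the query on the slide; CRC32 stands in for the engine's hash, so it illustrates the idea rather than SQL Data Warehouse internals.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, float]  # hypothetical FactInternetSales row: (customer_key, sales_amount)

def distribute(rows: List[Row], nodes: int) -> List[List[Row]]:
    """Hash-distribute rows on the key column; CRC32 stands in for the engine's hash function."""
    buckets: List[List[Row]] = [[] for _ in range(nodes)]
    for row in rows:
        buckets[zlib.crc32(str(row[0]).encode()) % nodes].append(row)
    return buckets

def count_big(buckets: List[List[Row]]) -> int:
    """Control node: run the count on every compute node in parallel, then sum the partial counts."""
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        partial_counts = list(pool.map(len, buckets))
    return sum(partial_counts)

rows = [(key, key * 1.5) for key in range(1000)]
buckets = distribute(rows, nodes=6)
print([len(b) for b in buckets])  # uneven but non-overlapping shares of the rows
print(count_big(buckets))         # 1000, the same answer a single-node COUNT would give
```

COUNT is easy to distribute because partial counts can simply be added; aggregates such as AVG need the control node to combine partial sums and counts instead.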

Page 19: Microsoft Azure Big Data Analytics

Data Lake

What it is: Data storage (WebHDFS) and distributed data processing engines (Hive, Spark, HBase, Storm, U-SQL)

When to use it:
• Low-cost, high-throughput data store
• Non-relational data
• Larger storage limits than Blobs

Page 20: Microsoft Azure Big Data Analytics

The Data Lake approach
• Ingest all data regardless of requirements
• Store all data in native format without schema definition
• Do analysis using analytic engines like Hadoop and ADLA

(Diagram workloads: interactive queries, batch queries, machine learning, data warehouse, real-time analytics, devices)
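“Store all data in native format without schema definition” is usually called schema on read: files are written as-is, and a schema is applied only when a query reads them. A minimal Python sketch, assuming a comma-separated file and a hypothetical list of `(column name, converter)` pairs supplied by the reader; two readers apply different schemas to the same raw bytes.

```python
import csv
import io
from datetime import date
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Raw data lands in the lake untouched; no schema is declared at write time.
RAW = "c1,2017-04-01,19.99\nc2,2017-04-02,5.00\n"

Schema = List[Tuple[str, Callable[[str], Any]]]

def extract(raw: str, schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema at read time: name each field and convert it to the requested type."""
    for fields in csv.reader(io.StringIO(raw)):
        yield {name: convert(value) for (name, convert), value in zip(schema, fields)}

# One reader wants typed order data...
orders = list(extract(RAW, [("cid", str), ("odate", date.fromisoformat), ("amount", float)]))
# ...another reads the same bytes as plain strings.
as_text = list(extract(RAW, [("cid", str), ("odate", str), ("amount", str)]))

print(orders[0])   # {'cid': 'c1', 'odate': datetime.date(2017, 4, 1), 'amount': 19.99}
print(as_text[1])  # {'cid': 'c2', 'odate': '2017-04-02', 'amount': '5.00'}
```

`date.fromisoformat` requires Python 3.7 or later.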

Page 21: Microsoft Azure Big Data Analytics

Azure Data Lake (Store, HDInsight, Analytics)

(Diagram labels: Storage: Store, WebHDFS; YARN; Analytics: ADL Analytics with U-SQL, ADL HDInsight with Hive)

Page 22: Microsoft Azure Big Data Analytics

Introducing ADLS

Azure Data Lake Store: a hyper-scale repository for big data analytics workloads
• No limits to SCALE
• Store ANY DATA in its native format
• HADOOP FILE SYSTEM (HDFS) for the cloud
• Optimized for analytic workload PERFORMANCE
• ENTERPRISE GRADE authentication, access control, audit, encryption at rest

Page 23: Microsoft Azure Big Data Analytics

No limits to scale
• No fixed limits on:
  • Amount of data stored
  • How long data can be stored
  • Number of files
  • Size of the individual files
  • Ingestion/egress throughput
• Seamlessly scales from a few KBs to several PBs

Page 24: Microsoft Azure Big Data Analytics

No limits to storage
• Each file in ADL Store is sliced into blocks
• Blocks are distributed across multiple data nodes in the backend storage system
• With a sufficient number of backend storage data nodes, files of any size can be stored
• Backend storage runs in the Azure cloud, which has virtually unlimited resources
• Metadata is stored about each file; there is no limit to metadata either

(Diagram: an Azure Data Lake Store file split into Block 1, Block 2, …, with the blocks spread across data nodes in the backend storage)
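The slicing described above can be sketched as follows: cut a file into fixed-size blocks and place each block on a data node. The 1,024-byte block size, the round-robin placement and the node names are assumptions for illustration only; the deck does not describe ADL Store's actual block size or placement policy.

```python
from typing import Dict, List

def slice_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split a file's bytes into consecutive blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks: List[bytes], nodes: List[str]) -> Dict[str, List[int]]:
    """Assign block indexes to data nodes round-robin (a stand-in for the real placement policy)."""
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement

file_bytes = bytes(range(256)) * 40        # a 10,240-byte "file"
blocks = slice_into_blocks(file_bytes, block_size=1024)
nodes = [f"datanode-{n}" for n in range(6)]
placement = place_blocks(blocks, nodes)

print(len(blocks))              # 10
print(placement["datanode-0"])  # [0, 6]
```

No single node ever needs to hold a whole file, which is why adding data nodes raises the maximum file size rather than capping it.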

Page 25: Microsoft Azure Big Data Analytics

Massive throughput
• Through read parallelism, ADL Store provides massive throughput
• Each read operation on an ADL Store file results in multiple read operations executed in parallel against the backend storage data nodes

(Diagram: a read operation on an Azure Data Lake Store file fans out to the blocks held on multiple data nodes in the backend storage)
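Read parallelism follows directly from the block layout: one logical read fans out into one request per block, the requests run concurrently, and the results are stitched back together in block order. A Python sketch using a thread pool; `fetch_block` is a hypothetical stand-in for a request to a data node, with a short sleep simulating latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

BLOCKS: List[bytes] = [bytes([i]) * 4 for i in range(8)]  # 8 blocks held by "data nodes"

def fetch_block(index: int) -> bytes:
    """Hypothetical data-node request; the sleep simulates network and disk latency."""
    time.sleep(0.05)
    return BLOCKS[index]

def parallel_read(block_count: int, workers: int) -> bytes:
    """Issue one request per block concurrently; map() yields results in block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(fetch_block, range(block_count)))

start = time.perf_counter()
content = parallel_read(len(BLOCKS), workers=8)
elapsed = time.perf_counter() - start
# Sequentially, 8 fetches would take about 0.4 s; in parallel they overlap to about one fetch.
print(len(content), f"{elapsed:.2f}s")
```

`ThreadPoolExecutor.map` preserves input order, so the blocks are reassembled correctly even when they finish out of order.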

Page 26: Microsoft Azure Big Data Analytics

Enterprise grade security
• Enterprise-grade security permits even sensitive data to be stored securely
• Regulatory compliance can be enforced
• Integrates with Azure Active Directory for authentication
• Data is encrypted at rest and in flight
• POSIX-style permissions on files and directories
• Audit logs for all operations

Page 27: Microsoft Azure Big Data Analytics

Enterprise grade availability and reliability
• Azure maintains 3 replicas of each data object per region, across three fault and upgrade domains
• Each create or append operation on a replica is replicated to the other two
• Writes are committed to the application only after all replicas are successfully updated
• Read operations can go against any replica
• Provides ‘read-after-write’ consistency

Data is never lost or unavailable, even under failures

(Diagram: a write goes to Replica 1, is replicated to Replica 2 and Replica 3 in separate fault/upgrade domains, then committed)
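The write path described above (apply to every replica, acknowledge only after all succeed, read from any replica) can be modeled in a single process. This Python sketch uses three in-memory replicas; the `Replica` and `ReplicatedFile` classes are hypothetical, and rolling back a partially applied write is a simplification chosen here, since the deck does not describe how failures are repaired.

```python
import random
from typing import List

class Replica:
    """One in-memory copy of a file (a stand-in for a replica in its own fault domain)."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = bytearray()
        self.healthy = True

    def append(self, chunk: bytes) -> bool:
        if not self.healthy:
            return False
        self.data.extend(chunk)
        return True

class ReplicatedFile:
    """An append is committed (acknowledged) only after every replica has applied it."""
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.committed_length = 0

    def append(self, chunk: bytes) -> bool:
        applied = []
        for replica in self.replicas:
            if replica.append(chunk):
                applied.append(replica)
        if len(applied) == len(self.replicas):
            self.committed_length += len(chunk)
            return True                      # all replicas updated: acknowledge the write
        for replica in applied:              # otherwise roll back so replicas stay identical
            del replica.data[self.committed_length:]
        return False

    def read(self) -> bytes:
        """Any healthy replica can serve a read; only committed bytes are visible."""
        replica = random.choice([r for r in self.replicas if r.healthy])
        return bytes(replica.data[:self.committed_length])

replicas = [Replica(f"replica-{i}") for i in range(1, 4)]
f = ReplicatedFile(replicas)
f.append(b"hello ")
replicas[2].healthy = False
ok = f.append(b"lost")     # one replica is down, so this write is not committed
replicas[2].healthy = True
f.append(b"world")
print(ok, f.read())        # False b'hello world'
```

Because a write is acknowledged only once every replica holds it, whichever replica `read()` picks returns the same bytes, which is the read-after-write guarantee in miniature.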

Page 28: Microsoft Azure Big Data Analytics

Azure Data Lake Analytics
• Enterprise-grade
• Limitless scale
• Productivity from day one
• Easy and powerful data preparation
• All data

Page 29: Microsoft Azure Big Data Analytics

Developing big data apps
• Author, debug, & optimize big data apps in Visual Studio
• Multiple languages: U-SQL, Hive, & Pig
• Seamlessly integrate .NET

Page 30: Microsoft Azure Big Data Analytics

Work across all cloud data

Azure Data Lake Analytics works across: Azure SQL DW, Azure SQL DB, Azure Storage Blobs, Azure Data Lake Store, SQL DB in an Azure VM

Page 31: Microsoft Azure Big Data Analytics

Simplified management and administration
• Web-based management in Azure Portal
• Automate tasks using PowerShell
• Role-based access control with Azure AD
• Monitor service operations and activity

Page 32: Microsoft Azure Big Data Analytics

What is U-SQL?
• A hyper-scalable, highly extensible language for preparing, transforming and analyzing all data
• Allows users to focus on the what, not the how, of business problems
• Built on familiar languages (SQL and C#) and supported by a fully integrated development environment
• Built for data developers & scientists

Page 33: Microsoft Azure Big Data Analytics

U-SQL language philosophy

Declarative query and transformation language:
• Uses SQL’s SELECT FROM WHERE with GROUP BY/aggregation, joins, SQL Analytics functions
• Optimizable, scalable

Operates on unstructured & structured data:
• Schema on read over files
• Relational metadata objects (e.g. database, table)

Extensible from the ground up:
• Type system is based on C#
• Expression language is C#
• User-defined functions (U-SQL and C#)
• User-defined types (U-SQL/C#) (future)
• User-defined aggregators (C#)
• User-defined operators (UDO) (C#)

U-SQL provides the parallelization and scale-out framework for user code:
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER

Expression-flow programming style:
• Easy-to-use functional lambda composition
• Composable, globally optimizable

Federated query across distributed data sources (soon)

REFERENCE ASSEMBLY MyDB.MyAssembly;

CREATE TABLE T (cid int, first_order DateTime, last_order DateTime, order_count long, order_amount float,
    INDEX idx CLUSTERED (cid ASC) DISTRIBUTED BY HASH (cid));

@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();

@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

@j = SELECT c.cid,
            MIN(o.odate) AS first_order,
            MAX(o.odate) AS last_order,
            COUNT(o.oid) AS order_count,
            SUM(o.amount) AS order_amount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New") && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;

OUTPUT @j TO "/output/result.txt" USING new MyData.Write();
INSERT INTO T SELECT * FROM @j;

Page 34: Microsoft Azure Big Data Analytics

Expression-flow programming style
• Automatic "in-lining" of U-SQL expressions: the whole script leads to a single execution model
• Execution plan that is optimized out-of-the-box and without user intervention
• Per-job and user-driven parallelization
• Detail visibility into execution steps, for debugging
• Heat map functionality to identify performance bottlenecks

Page 35: Microsoft Azure Big Data Analytics

“Unstructured” Files
• Schema on Read
• Write to File
• Built-in and custom Extractors and Outputters
• ADL Storage and Azure Blob Storage

EXTRACT Expression
@s = EXTRACT a string, b int
     FROM "filepath/file.csv"
     USING Extractors.Csv(encoding: Encoding.Unicode);
• Built-in Extractors: Csv, Tsv, Text with lots of options
• Custom Extractors: e.g., JSON, XML, etc.

OUTPUT Expression
OUTPUT @s
TO "filepath/file.csv"
USING Outputters.Csv();
• Built-in Outputters: Csv, Tsv, Text
• Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io)

Filepath URIs
• Relative URI to default ADL Storage account: "filepath/file.csv"
• Absolute URIs:
  • ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv"
  • WASB: "wasb://container@account/filepath/file.csv"

Page 36: Microsoft Azure Big Data Analytics

Managing Assemblies
• Create assemblies
• Reference assemblies
• Enumerate assemblies
• Drop assemblies
• Visual Studio makes registration easy!

• CREATE ASSEMBLY db.assembly FROM @path;
• CREATE ASSEMBLY db.assembly FROM byte[];
• Can also include additional resource files
• REFERENCE ASSEMBLY db.assembly;
• Referencing .NET Framework assemblies:
  • Always accessible system namespaces:
    • U-SQL specific (e.g., for SQL.MAP)
    • All provided by system.dll, system.core.dll, system.data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
  • Add all other .NET Framework assemblies with: REFERENCE SYSTEM ASSEMBLY [System.XML];
• Enumerating assemblies:
  • PowerShell command
  • U-SQL Studio Server Explorer
• DROP ASSEMBLY db.assembly;

Page 37: Microsoft Azure Big Data Analytics

USING clause
'USING' csharp_namespace | Alias '=' csharp_namespace_or_class.

Examples:
DECLARE @input string = "somejsonfile.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

@data0 = EXTRACT IPAddresses string FROM @input USING new JsonExtractor("Devices[*]");

USING json = [Microsoft.Analytics.Samples.Formats.Json.JsonExtractor];

@data1 = EXTRACT IPAddresses string FROM @input USING new json("Devices[*]");

Allows shortening and disambiguating C# namespace and class names.

Page 38: Microsoft Azure Big Data Analytics

File Sets
• Simple patterns
• Virtual columns
• Only on EXTRACT for now (on OUTPUT by end of year)

Simple pattern language on filename and path:
DECLARE @pattern string = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
• Binds two columns, date and suffix
• Wildcards the filename
• Limits on number of files (current limit 800 and 3000 being increased in next refresh)

Virtual columns:
EXTRACT
    name string,
    suffix string,  // virtual column
    date DateTime   // virtual column
FROM @pattern
USING Extractors.Csv();
• Refer to virtual columns in query predicates to get partition elimination (otherwise you will get a warning)
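A file-set pattern behaves like a path template turned into a matcher: `{date:yyyy}`, `{date:MM}` and `{date:dd}` capture parts of one virtual `date` column, `{suffix}` captures another, and `{*}` matches any file name without binding a column. The Python sketch below only approximates that behavior and understands only the tokens in the example pattern; the sample paths are hypothetical. It also shows how a predicate on the virtual `date` column can select files from their paths alone, which is the idea behind partition elimination.

```python
import re
from datetime import date
from typing import Dict, List, Optional, Tuple

PATTERN = "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}"

# Regex fragments for the tokens used in the example pattern (nothing else is supported).
TOKENS = {
    "{date:yyyy}": r"(?P<yyyy>\d{4})",
    "{date:MM}": r"(?P<MM>\d{2})",
    "{date:dd}": r"(?P<dd>\d{2})",
    "{*}": r"[^/]+?",                   # wildcard: any file name, no column bound
    "{suffix}": r"(?P<suffix>[^/.]+)",
}

def compile_pattern(pattern: str):
    """Escape the literal parts of the pattern, then swap each token for its regex fragment."""
    regex = re.escape(pattern)
    for token, fragment in TOKENS.items():
        regex = regex.replace(re.escape(token), fragment)
    return re.compile(regex)

def bind(path: str, matcher) -> Optional[Dict[str, object]]:
    """Return the virtual columns for a path in the file set, or None if the path does not match."""
    m = matcher.fullmatch(path)
    if m is None:
        return None
    return {"date": date(int(m["yyyy"]), int(m["MM"]), int(m["dd"])), "suffix": m["suffix"]}

paths = [
    "/input/2017/04/15/events.csv",
    "/input/2017/04/16/events.csv",
    "/input/2017/04/16/clicks.tsv",
    "/input/readme.txt",  # not part of the file set
]
matcher = compile_pattern(PATTERN)

file_set: List[Tuple[str, Dict[str, object]]] = []
for p in paths:
    columns = bind(p, matcher)
    if columns is not None:
        file_set.append((p, columns))

# A predicate on the virtual date column selects files from their paths alone,
# so non-matching files never have to be opened.
wanted = [p for p, cols in file_set if cols["date"] == date(2017, 4, 16)]
print(wanted)  # ['/input/2017/04/16/events.csv', '/input/2017/04/16/clicks.tsv']
```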

Page 39: Microsoft Azure Big Data Analytics

U-SQL Catalog
• Naming
• Discovery
• Sharing
• Securing

Naming
• Default database and schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder

Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands

Sharing
• Within an Azure Data Lake Analytics account

Securing
• Secured with AAD principals at catalog and database level

Page 40: Microsoft Azure Big Data Analytics

VIEWs and TVFs
• Views for simple cases
• TVFs for parameterization and most cases

Views
CREATE VIEW V AS EXTRACT …
CREATE VIEW V AS SELECT …
• Cannot contain user-defined objects (e.g. UDFs or UDOs)!
• Will be inlined

Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default") RETURNS @res [TABLE ( … )] AS BEGIN … @res = … END;
• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema

Page 41: Microsoft Azure Big Data Analytics

Procedures
CREATE PROCEDURE P (@arg string = "default") AS BEGIN …; OUTPUT @res TO …; INSERT INTO T …; END;
• Provides parameterization
• No result, but writes into a file or table
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Can contain DDL (but no CREATE, DROP FUNCTION/PROCEDURE)

Allows encapsulation of U-SQL scripts

Page 42: Microsoft Azure Big Data Analytics

Tables
• CREATE TABLE
• CREATE TABLE AS SELECT

CREATE TABLE T (col1 int, col2 string, col3 SQL.MAP<string,string>,
    INDEX idx CLUSTERED (col2 ASC)
    PARTITIONED BY (col1)
    DISTRIBUTED BY HASH (col2));
• Structured data, built-in data types only (no UDTs)
• Clustered index (needs to be specified): row-oriented
• Fine-grained distribution (needs to be specified):
  • HASH, DIRECT HASH, RANGE, ROUND ROBIN
• Addressable partitions (optional)

CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infer the schema from the query
• Still requires index and distribution (does not support partitioning)
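A rough mental model of the table layout above: rows are first split into addressable partitions by `col1`, each partition is spread over buckets by a hash of the distribution column (assumed here to be `col2`), and rows inside a bucket are kept sorted by the clustered index key. The bucket count, the CRC32 hash and the sample rows are assumptions for illustration; this is not how the U-SQL engine actually stores data.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (col1, col2); col3 omitted for brevity

def layout(rows: List[Row], buckets: int) -> Dict[int, Dict[int, List[Row]]]:
    """partition (col1) -> hash bucket (col2) -> rows sorted by the clustered key col2 ASC."""
    table: Dict[int, Dict[int, List[Row]]] = defaultdict(lambda: defaultdict(list))
    for col1, col2 in rows:
        bucket = zlib.crc32(col2.encode()) % buckets
        table[col1][bucket].append((col1, col2))
    for partition in table.values():
        for bucket_rows in partition.values():
            bucket_rows.sort(key=lambda r: r[1])
    return table

rows = [(2016, "c"), (2017, "b"), (2017, "a"), (2016, "a"), (2017, "b")]
table = layout(rows, buckets=4)

# Partition elimination: a predicate on col1 touches only one partition.
rows_2017 = sorted(r for bucket in table[2017].values() for r in bucket)
print(rows_2017)  # [(2017, 'a'), (2017, 'b'), (2017, 'b')]
```

Equal `col2` values always hash to the same bucket, which is why the slides list filters, joins and grouping on the distribution column as the workloads that benefit.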

Page 43: Microsoft Azure Big Data Analytics

When to use Tables

Benefits of table clustering and distribution
• Faster lookup of data provided by distribution and clustering, when the right distribution/cluster is chosen
• Data distribution provides better localized scale-out
• Used for filters, joins and grouping

Benefits of table partitioning
• Provides data life cycle management (“expire” old partitions)
• Partial re-computation of data at partition level
• Query predicates can provide partition elimination

Do not use when…
• No filters, joins and grouping
• No reuse of the data for future queries

Page 44: Microsoft Azure Big Data Analytics

Evolving Tables
• ALTER TABLE ADD/DROP COLUMN

ALTER TABLE T ADD COLUMN eventName string;
ALTER TABLE T DROP COLUMN col3;
ALTER TABLE T ADD COLUMN result string, clientId string, payload int?;
ALTER TABLE T DROP COLUMN clientId, result;

• Metadata-only operation
• Existing rows will get:
  • Non-nullable types: the C# data type's default value (e.g., int will be 0)
  • Nullable types: null

Page 45: Microsoft Azure Big Data Analytics

U-SQL Analytics

Windowing Expression
Window_Function_Call 'OVER' '(' [ Over_Partition_By_Clause ] [ Order_By_Clause ] [ Row_Clause ] ')'.

Window_Function_Call := Aggregate_Function_Call | Analytic_Function_Call | Ranking_Function_Call.

Windowing Aggregate Functions: ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP

Analytics Functions: CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, LEAD, LAG

Ranking Functions: DENSE_RANK, NTILE, RANK, ROW_NUMBER
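The ranking functions differ only in how they treat ties. This Python sketch emulates `ROW_NUMBER`, `RANK`, `DENSE_RANK` and `LAG(value)` `OVER (PARTITION BY key ORDER BY value DESC)` following the usual SQL definitions; the sample rows are invented, and ties keep their input order (as Python's stable sort does), whereas SQL leaves the order of tied rows unspecified.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, int]  # (partition key, id, value)

def window(rows: List[Row]) -> List[Dict[str, object]]:
    """Emulate ROW_NUMBER/RANK/DENSE_RANK/LAG(value) OVER (PARTITION BY key ORDER BY value DESC)."""
    out: List[Dict[str, object]] = []
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    for key, group in groupby(ordered, key=lambda r: r[0]):
        rank = dense = 0
        prev: Optional[int] = None          # previous value in this partition (LAG)
        for row_number, (_, ident, value) in enumerate(group, start=1):
            if value != prev:
                rank = row_number           # RANK skips numbers after ties
                dense += 1                  # DENSE_RANK does not
            out.append({"key": key, "id": ident, "value": value, "row_number": row_number,
                        "rank": rank, "dense_rank": dense, "lag": prev})
            prev = value
    return out

rows = [("east", "a", 50), ("east", "b", 70), ("east", "c", 50), ("east", "d", 30), ("west", "e", 10)]
result = window(rows)
print([(r["id"], r["row_number"], r["rank"], r["dense_rank"]) for r in result])
# [('b', 1, 1, 1), ('a', 2, 2, 2), ('c', 3, 2, 2), ('d', 4, 4, 3), ('e', 1, 1, 1)]
```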


Page 47: Microsoft Azure Big Data Analytics

Visual Studio integration

Page 48: Microsoft Azure Big Data Analytics

What can you do with Visual Studio?
• Visualize and replay progress of job
• Fine-tune query performance
• Visualize physical plan of U-SQL query
• Browse metadata catalog
• Author U-SQL scripts (with C# code)
• Create metadata objects
• Submit and cancel U-SQL jobs
• Debug U-SQL and C# code

Page 49: Microsoft Azure Big Data Analytics

Plug-in

Page 50: Microsoft Azure Big Data Analytics

Authoring U-SQL queries
Visual Studio fully supports authoring U-SQL scripts. While editing, it provides:
• IntelliSense
• Syntax color coding
• Syntax checking
• …
• Contextual menu

Page 51: Microsoft Azure Big Data Analytics

Authoring with a code-behind file
C# code to extend U-SQL can be authored and used directly in U-SQL Studio, without first having to create and register an external assembly.

(Screenshot: CustomProcessor)

Page 52: Microsoft Azure Big Data Analytics

Submitting a U-SQL job
Jobs can be submitted directly from Visual Studio in two ways. You must be logged in to Azure and must specify the target Azure Data Lake account.

Page 53: Microsoft Azure Big Data Analytics

Concepts: jobs, stages and vertices
• Each job is broken into ‘n’ vertices
• Each vertex is some work that needs to be done
• Vertices are organized into stages
  – Vertices in each stage do the same work on the same data
  – A vertex in one stage may depend on a vertex in an earlier stage
• Stages themselves are organized into an acyclic graph

(Diagram: a job graph from Input to two Outputs, with 6 stages and 8 vertices)
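Because stages form an acyclic graph, a scheduler can start any stage whose upstream stages have finished, and independent stages can run at the same time. This Python sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group stages into waves. The stage names, their dependencies and the per-stage vertex counts are hypothetical, chosen only so the totals match the 6 stages and 8 vertices in the diagram.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical stage graph: each stage maps to the stages it depends on.
stages: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "partial_agg": {"extract_orders"},
    "join": {"partial_agg", "extract_customers"},
    "final_agg": {"join"},
    "output": {"final_agg"},
}
# Hypothetical vertex counts: vertices in a stage run the same work in parallel.
vertices = {"extract_orders": 2, "extract_customers": 2, "partial_agg": 1,
            "join": 1, "final_agg": 1, "output": 1}

def schedule_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group stages into waves; every stage in a wave has all of its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                      # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = schedule_waves(stages)
for number, wave in enumerate(waves, start=1):
    print(number, wave, "vertices:", sum(vertices[s] for s in wave))
print(len(stages), "stages,", sum(vertices.values()), "vertices")  # 6 stages, 8 vertices
```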

Page 54: Microsoft Azure Big Data Analytics

Job execution graph
After a job is submitted, the progress of its execution through the different stages is shown and updated continuously. Important stats about the job are also displayed and updated continuously.

Page 55: Microsoft Azure Big Data Analytics

Job diagnostics
Diagnostics information is shown to help with debugging and performance issues.

Page 56: Microsoft Azure Big Data Analytics

Metadata objects
• ADL Analytics creates and stores a set of metadata objects in a catalog maintained by a metadata service
• Tables and TVFs are created by DDL statements (CREATE TABLE …)
• Metadata objects can be created directly through the Server Explorer

Azure Data Lake Analytics account
• Databases
  – Tables
  – Table-valued functions
  – Jobs
  – Schemas
• Linked storage

Page 57: Microsoft Azure Big Data Analytics

Metadata catalog
The metadata catalog can be browsed with the Visual Studio Server Explorer.

Server Explorer lets you:
1. Create new tables, schemas and databases
2. Register assemblies

Page 58: Microsoft Azure Big Data Analytics

HDInsight: Cloud Managed Hadoop

What it is: Microsoft’s implementation of Apache Hadoop (as a service) that uses Blobs for persistent storage

When to use it:
• When you need to process large-scale data (PB+)
• When you want to use Hadoop or Spark as a service
• When you want to compute data and retire the servers, but retain the results
• When your team is familiar with the Hadoop Zoo

Page 59: Microsoft Azure Big Data Analytics

Hadoop and HDInsight

Using the Hadoop Ecosystem to process and query data

Page 60: Microsoft Azure Big Data Analytics

Microsoft Azure Big Data Analytics

Cortana Intelligence Suite: HDInsight Tools for Visual Studio

Page 65: Microsoft Azure Big Data Analytics

Deploying HDInsight Clusters
• Cluster type: Hadoop, Spark, HBase and Storm
  • Hadoop clusters: for query and analysis workloads
  • HBase clusters: for NoSQL workloads
  • Spark clusters: for in-memory processing, interactive queries, stream, and machine learning workloads
• Operating system: Windows or Linux
• Can be deployed from the Azure portal, the Azure Command Line Interface (CLI), Azure PowerShell, or Visual Studio
• A UI dashboard is provided to the cluster through Ambari
• Remote access through SSH, REST API, ODBC, JDBC
• Remote Desktop (RDP) access for Windows clusters

Page 66: Microsoft Azure Big Data Analytics

Azure ML

What it is: A multi-platform environment and engine to create and deploy Machine Learning models and APIs

When to use it:
• When you need to create predictive analytics
• When you need to share Data Science experiments across teams
• When you need to create callable APIs for ML functions
• When you also have R and Python experience on your Data Science team

Page 67: Microsoft Azure Big Data Analytics

The Azure ML Environment

Development Environment
• Creating Experiments
• Sharing a Workspace

Deployment Environment
• Publishing the Model
• Using the API
• Consuming in various tools

Page 68: Microsoft Azure Big Data Analytics

Creating an Experiment

Create Workspace → Build and Model (Get/Prepare Data → Build/Edit Experiment → Create/Update Model → Evaluate Model Results) → Deploy Model → Consume Model

Page 69: Microsoft Azure Big Data Analytics

Basic Azure ML Elements
• Import Data
• Preprocess
• Split Data
• Algorithm
• Train Model
• Score Model
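The module sequence above (import data, preprocess, split, train with an algorithm, score, evaluate) is the same flow you would write by hand. A dependency-free Python sketch: the data is synthetic, the "algorithm" is ordinary least-squares linear regression on one feature, and none of these function names are Azure ML modules; they only mirror the module names.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (feature x, label y)

def import_data(n: int, seed: int = 0) -> List[Point]:
    """Synthetic data with y close to 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3 * x + 2 + rng.gauss(0, 0.5)))
    return rows

def preprocess(rows: List[Point]) -> List[Point]:
    """Drop incomplete rows; this is where cleaning and feature work would go."""
    return [r for r in rows if None not in r]

def split_data(rows: List[Point], fraction: float, seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Shuffle, then split into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

def train_model(rows: List[Point]) -> Tuple[float, float]:
    """Algorithm + Train Model: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def score_model(model: Tuple[float, float], rows: List[Point]) -> List[Tuple[float, float]]:
    """Return (actual, predicted) pairs for each row."""
    slope, intercept = model
    return [(y, slope * x + intercept) for x, y in rows]

data = preprocess(import_data(200))
train_rows, test_rows = split_data(data, fraction=0.7)
model = train_model(train_rows)
scored = score_model(model, test_rows)
mae = sum(abs(actual - predicted) for actual, predicted in scored) / len(scored)  # evaluate
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} MAE={mae:.2f}")
```

Scoring on rows held out by the split, rather than on the training rows, is what makes the evaluation step meaningful.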

Page 72: Microsoft Azure Big Data Analytics

Power BI

What it is: Interactive report and visualization creation for computing and mobile platforms

When to use it:
• When you need to create and view interactive reports that combine multiple datasets
• When you need to embed reporting into an application
• When you need customizable visualizations
• When you need to create shared datasets, reports, and dashboards that you publish to your team

Page 73: Microsoft Azure Big Data Analytics

Microsoft Azure Big Data Analytics

Cortana Intelligence Suite: Common architectural patterns

Page 74: Microsoft Azure Big Data Analytics

Big Data Analytics – Data Flow (diagram: DATA → INTELLIGENCE → ACTION)
• Data: Business apps, Custom apps, Sensors and devices
• Ingestion: Bulk Ingestion, Event Ingestion
• Azure Data Lake Store
• Preparation, Analytics and Machine Learning: HDInsight, Data Lake Analytics
• Discovery: Azure Data Catalog
• Visualization: Power BI
• Action: People

Page 75: Microsoft Azure Big Data Analytics

Event Ingestion Patterns (diagram)
• Sources: Business apps, Custom apps, Sensors and devices, sending events
• Event Collection: Azure Event Hubs, Kafka
• Stream Processing: Azure Stream Analytics, Spark Streaming
• Raw events and transformed data are stored in Azure Data Lake Store
• Real Time Dashboards: Power BI
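A typical stream-processing step in this pattern is a windowed aggregate, for example counting events per device in fixed, non-overlapping (tumbling) windows before sending results to a dashboard or the lake. A Python sketch of that computation over an in-memory list; the event fields and the 10-second window are assumptions, and a real Stream Analytics or Spark Streaming job would express this declaratively and also handle late and out-of-order events.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, device id)

def tumbling_counts(events: Iterable[Event], window_seconds: float) -> Dict[Tuple[float, str], int]:
    """Count events per (window start, device); each event falls into exactly one window."""
    counts: Counter = Counter()
    for timestamp, device in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)

events = [(0.5, "sensor-1"), (3.2, "sensor-2"), (9.9, "sensor-1"), (10.0, "sensor-1"), (14.7, "sensor-2")]
result = tumbling_counts(events, window_seconds=10)
print(result)
# {(0.0, 'sensor-1'): 2, (0.0, 'sensor-2'): 1, (10.0, 'sensor-1'): 1, (10.0, 'sensor-2'): 1}
```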

Page 76: Microsoft Azure Big Data Analytics

Bulk Ingestion and Preparation (diagram)
• Sources: Business apps, Custom apps, Sensors and devices
• Bulk Load: Azure Data Factory loads raw data into Azure Data Lake Store
• Data Preparation: produces prepared data (structured and unstructured)
• Batch Analytics and Interactive Analytics: Spark on HDInsight, Azure SQL DW, Power BI, Notebooks
• Azure Data Catalog

Page 77: Microsoft Azure Big Data Analytics

Big Data Lambda Architecture (diagram)
• Data Collection: event & data producers (applications, web and social, devices, sensors), field gateways, cloud gateways (web APIs)
• Queuing System: Event hubs, Kafka/RabbitMQ/ActiveMQ
• Data Transformation: Storm / Stream Analytics, Hive / U-SQL, Pig, Data Factory, Azure ML
• Data Storage: DocumentDB, MongoDB, SQL Azure, ADW, HBase, Blob Storage
• Presentation and action: Azure Search, data analytics (Excel, Power BI, Looker, Tableau), web/thick client dashboards, live dashboards, devices to take action
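The core idea of a lambda architecture is that queries combine a batch view (complete, but recomputed only periodically, e.g. by Hive or U-SQL) with a real-time view (covering only events since the last batch run, e.g. from Stream Analytics or Storm). A Python sketch of that merge for per-key counts; the events and the cut-off timestamp that separates the two layers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp, key)

def batch_view(events: List[Event], cutoff: int) -> Counter:
    """Batch layer: full recomputation over all events up to the last batch run."""
    return Counter(key for ts, key in events if ts < cutoff)

def speed_view(events: List[Event], cutoff: int) -> Counter:
    """Speed layer: incremental counts for events that arrived after the batch run."""
    return Counter(key for ts, key in events if ts >= cutoff)

def query(batch: Counter, realtime: Counter) -> Dict[str, int]:
    """Serving layer: merge both views so results are complete and up to date."""
    return dict(batch + realtime)

events = [(1, "page-a"), (2, "page-b"), (3, "page-a"), (7, "page-a"), (8, "page-c")]
merged = query(batch_view(events, cutoff=5), speed_view(events, cutoff=5))
print(merged)  # {'page-a': 3, 'page-b': 1, 'page-c': 1}
```

When the next batch run moves the cut-off forward, the speed layer's older counts are discarded and the batch view takes over, so any approximation in the real-time path is eventually corrected.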

Page 78: Microsoft Azure Big Data Analytics

Get started today!

Cortana Intelligence Solutions: http://aka.ms/cisolutions

Page 79: Microsoft Azure Big Data Analytics

Cortana Intelligence Solutions: Discover

http://aka.ms/cisolutions

Page 80: Microsoft Azure Big Data Analytics

Cortana Intelligence Solutions: Try

Page 81: Microsoft Azure Big Data Analytics

Cortana Intelligence Solutions: Deploy

Page 82: Microsoft Azure Big Data Analytics

Instructions and Next Steps: Customize