Azure Data Factory presentation with links


Uploaded by: chris-testa-oneill

Posted on 21-Apr-2017


TRANSCRIPT

Page 1: Azure Data Factory presentation with links

Cortana Intelligence Suite Workshop Class Notebook

Classified as Microsoft General

Key Concepts

This session is brought to you by Microsoft’s Analytics and Data Science Team.


Agenda

At the end of this module, you will:

1. Understand how Azure Data Factory (ADF) fits into the Cortana Intelligence Suite

2. Understand the ADF logical flow

3. Create an ADF instance

4. Walk through an example of the ADF process

5. Understand and create the ADF components


This section of the course will cover:

• Cortana Intelligence in a sentence

• The team data science process

• The Cortana Intelligence platform

• Summarizing Cortana Intelligence


Cortana Intelligence is a Platform and a Process to perform advanced analytics from start to finish

1. What you can do with CIS: https://www.microsoft.com/en-us/server-cloud/cortana-intelligence-suite/why-cortana-intelligence.aspx

2. More about the process: https://channel9.msdn.com/Blogs/Seth-Juarez/Understanding-Data-Science-for-building-Predictive-Analytics-Solutions-by-Francesca-Lazzeri


The technologies available in Cortana Intelligence can be grouped into the following areas:

• Information management

• Big data stores

• Machine learning and analytics

• Intelligence

• Dashboards and visualization

Azure SQL Data Warehouse is categorized as a big data store. It differs from Azure Data Lake in that it provides a relational big data store for structured data, although it can also work with unstructured data.


Azure Data Factory

Creates, orchestrates, and automates the movement, transformation, and/or analysis of data in the cloud

1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

2. Developer Reference: https://msdn.microsoft.com/en-us/library/azure/dn834987.aspx


Azure Data Factory Logical Flow

1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

2. Quick Example: http://azure.microsoft.com/blog/2015/04/24/azure-data-factory-update-simplified-sample-deployment/


Create the Data Factory

Azure Portal

PowerShell

Visual Studio

ARM Templates

1. Setting Up: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/


Using the Portal

• Use in non-MS clients

• Use for exploration

• Use for demos and proofs of concept (POC)

1. Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/

2. Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/


Using PowerShell

• Use in MS Clients

• Use for Automation

• Use for quick set up and tear down

1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

2. Full Tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/


Using Visual Studio

• Use in mature dev environments

• Use when ADF is integrated into a larger development process

1. Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/

2. Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/


Azure Resource Manager Templates

• Use across multiple environments

  • Dev, Test, UAT, and Production

• Works well where environments follow similar patterns

• ARM templates can be parameterized

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-how-to-use-resource-manager-templates
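Because ARM templates can be parameterized, a common pattern is to keep one template and generate one parameters file per environment (Dev, Test, UAT, Production). The Python sketch below assumes a template that declares two parameters, `dataFactoryName` and `location`. Those names, the `ctoadf-<env>` naming convention, and the region values are hypothetical and must be changed to match your own template. Only the `$schema` and `contentVersion` fields follow the standard ARM parameters-file layout.

```python
import json

# Standard schema URL for ARM deployment-parameters files.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/"
    "2015-01-01/deploymentParameters.json#"
)

# Hypothetical per-environment settings; keys must match the template's parameters.
ENVIRONMENTS = {
    "dev": {"location": "westeurope"},
    "test": {"location": "westeurope"},
    "uat": {"location": "northeurope"},
    "prod": {"location": "northeurope"},
}


def build_parameters(env: str, base_name: str = "ctoadf") -> dict:
    """Return an ARM parameters document for one environment."""
    settings = ENVIRONMENTS[env]
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Hypothetical naming convention: <base_name>-<env>
            "dataFactoryName": {"value": f"{base_name}-{env}"},
            "location": {"value": settings["location"]},
        },
    }


if __name__ == "__main__":
    # Print one parameters document per environment, e.g. to save as adf.parameters.dev.json.
    for env in ENVIRONMENTS:
        print(json.dumps(build_parameters(env), indent=2))
```

Each generated file is passed to the deployment for its own environment, so the template itself stays identical from Dev through Production.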


Create an ADF Instance

1. Open the ADF Student Workbook file from your \Resources folder

2. Follow the steps for Lab 1 to set up the lab environment

3. Then follow the steps for Lab 2 to set up Azure Data Factory

4. Note: there's a useful JSON prettifier at http://www.jsoneditoronline.org/


ADF Process

1. Define Architecture: Set up objectives and flow

2. Create the Data Factory: Portal, PowerShell, VS

3. Create Linked Services: Connections to data and services

4. Create Datasets: Input and output

5. Create Pipeline: Define activities

6. Monitor and Manage: Portal or PowerShell, alerts and metrics

1. Full Tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
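Steps 3 to 5 depend on one another: datasets name a linked service in `linkedServiceName`, and pipeline activities name datasets in `inputs` and `outputs`. This minimal Python sketch assumes the version 1 JSON shapes shown later in this module. It sorts definitions so that referenced entities come first and reports references to entities that are not defined. The classification in `entity_kind` is a heuristic (pipelines have `activities`, datasets have `availability`), not an official rule.

```python
# Deployment rank: linked services first, then datasets, then pipelines.
RANK = {"linkedService": 0, "dataset": 1, "pipeline": 2}


def entity_kind(definition: dict) -> str:
    """Heuristically classify an ADF (version 1) JSON definition."""
    props = definition["properties"]
    if "activities" in props:
        return "pipeline"
    if "availability" in props:
        return "dataset"
    return "linkedService"


def deployment_order(definitions: list) -> list:
    """Sort definitions so referenced entities are deployed first (stable sort)."""
    return sorted(definitions, key=lambda d: RANK[entity_kind(d)])


def missing_references(definitions: list) -> list:
    """Return (entity, referenced name) pairs that point at undefined entities.

    Only linkedServiceName and activity inputs/outputs are checked.
    """
    names = {d["name"] for d in definitions}
    missing = []
    for d in definitions:
        props = d["properties"]
        refs = [props["linkedServiceName"]] if "linkedServiceName" in props else []
        for activity in props.get("activities", []):
            refs += [io["name"] for io in activity.get("inputs", []) + activity.get("outputs", [])]
            if "linkedServiceName" in activity:
                refs.append(activity["linkedServiceName"])
        missing += [(d["name"], ref) for ref in refs if ref not in names]
    return missing
```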


Example - Churn

Figure: Churn example. Data sources: Call Log Files, Customer Table. Azure Data Factory stages: Ingest, Transform & Analyze, Publish. Data shown in the flow: Customer Call Details, Customers Likely to Churn, Customer Churn Table.

1. Video of this process: https://azure.microsoft.com/en-us/documentation/videos/azure-data-factory-102-analyzing-complex-churn-models-with-azure-data-factory/


Azure Data Factory Components

1. ADF Components: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-introduction#relationship-between-data-factory-entities


Linked Services

Compute resources

Data transformation activity | Compute environment
Hive | HDInsight [Hadoop]
Pig | HDInsight [Hadoop]
MapReduce | HDInsight [Hadoop]
Hadoop Streaming | HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource | Azure VM
Stored Procedure | Azure SQL, Azure SQL DW, or SQL Server
Data Lake Analytics U-SQL | Azure Data Lake Analytics
DotNet | HDInsight [Hadoop] or Azure Batch

Data Sources

Category | Data store | Supported as a source | Supported as a sink
Azure | Azure Blob storage | ✓ | ✓
Azure | Azure Data Lake Store | ✓ | ✓
Azure | Azure DocumentDB | ✓ | ✓
Azure | Azure SQL Database | ✓ | ✓
Azure | Azure SQL Data Warehouse | ✓ | ✓
Azure | Azure Search Index | |
Azure | Azure Table storage | ✓ | ✓
Databases | Amazon Redshift | ✓ |
Databases | DB2 | ✓ |
Databases | MySQL | ✓ |
Databases | Oracle | ✓ | ✓
Databases | PostgreSQL | ✓ |
Databases | SAP Business Warehouse | |
Databases | SAP HANA | ✓ |
Databases | SQL Server | ✓ | ✓
Databases | Sybase | ✓ |
Databases | Teradata | ✓ |

Other data sources are supported; see the link in the notes for full details.

AZURE SQL DATABASE EXAMPLE

{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:ctosqldb.database.windows.net,1433;Database=EquityDB;User ID=ctesta-oneill;Password=P@ssw0rd;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
        }
    }
}

AZURE BLOB STORE EXAMPLE

{
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=ctostorageaccount;AccountKey=087ubp097guh8*JON*&B*(97g9879"
        }
    }
}

1. Linked Services: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction#linked-services
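The examples above embed the SQL password and the storage account key directly in the JSON. The following minimal Python sketch builds the same two linked service definitions from parameters, so the secrets can come from outside source control. The helper names and the `ADF_SQL_PASSWORD` / `ADF_STORAGE_KEY` environment variables are hypothetical.

```python
import json
import os


def azure_sql_linked_service(name: str, server: str, database: str,
                             user: str, password: str) -> dict:
    """Build an AzureSqlDatabase linked service definition (ADF version 1 shape)."""
    connection_string = (
        f"Server=tcp:{server}.database.windows.net,1433;Database={database};"
        f"User ID={user};Password={password};"
        "Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {"connectionString": connection_string},
        },
    }


def azure_storage_linked_service(name: str, account: str, key: str) -> dict:
    """Build an AzureStorage linked service definition (ADF version 1 shape)."""
    connection_string = (
        f"DefaultEndpointsProtocol=https;AccountName={account};AccountKey={key}"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


if __name__ == "__main__":
    # Hypothetical environment variables; fall back to placeholders if unset.
    sql = azure_sql_linked_service(
        "AzureSqlLinkedService", "ctosqldb", "EquityDB", "ctesta-oneill",
        os.environ.get("ADF_SQL_PASSWORD", "<password>"),
    )
    blob = azure_storage_linked_service(
        "StorageLinkedService", "ctostorageaccount",
        os.environ.get("ADF_STORAGE_KEY", "<account key>"),
    )
    print(json.dumps([sql, blob], indent=4))
```

Because the secrets stay out of the JSON files, the generated definitions can be kept in source control alongside any ARM templates.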


Datasets

{
    "name": "<name of dataset>",
    "properties": {
        "type": "<type of dataset: AzureBlob, AzureSql, etc.>",
        "external": <boolean flag to indicate external data; only for input datasets>,
        "linkedServiceName": "<Name of the linked service that refers to a data store.>",
        "structure": [
            {
                "name": "<Name of the column>",
                "type": "<Name of the type>"
            }
        ],
        "typeProperties": {
            "<type specific property>": "<value>",
            "<type specific property 2>": "<value 2>"
        },
        "availability": {
            "frequency": "<Specifies the time unit for data slice production.>",
            "interval": "<Specifies the interval within the defined frequency.>"
        },
        "policy": {}
    }
}

The slide labels each part: name (the dataset name); properties, containing type, external, and linkedServiceName; structure (each column's name and type); typeProperties; availability; and policy. The linkedServiceName refers to a linked service defined earlier, such as AzureSqlLinkedService or StorageLinkedService.

1. Datasets: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
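A quick lint pass can catch incomplete dataset definitions before deployment. The Python sketch below checks a definition against the template above. It expects the sections shown there, an `availability` frequency of Minute, Hour, Day, Week, or Month with a positive integer `interval`, and `external` set only on input datasets. These rules are our reading of the template, not an official schema validator.

```python
# Frequencies accepted by ADF (version 1) dataset availability.
FREQUENCIES = {"Minute", "Hour", "Day", "Week", "Month"}

# Sections taken from the dataset template (hypothetical lint rules, not an official schema).
EXPECTED_PROPERTIES = ("type", "linkedServiceName", "typeProperties", "availability")


def dataset_problems(dataset: dict, is_input: bool) -> list:
    """Return a list of human-readable problems found in a dataset definition."""
    problems = []
    if not dataset.get("name"):
        problems.append("missing name")
    props = dataset.get("properties", {})
    for key in EXPECTED_PROPERTIES:
        if key not in props:
            problems.append(f"missing properties.{key}")
    availability = props.get("availability", {})
    if availability:
        if availability.get("frequency") not in FREQUENCIES:
            problems.append(f"unknown frequency {availability.get('frequency')!r}")
        interval = availability.get("interval")
        if not isinstance(interval, int) or interval < 1:
            problems.append(f"interval must be a positive integer, got {interval!r}")
    # The template marks "external" as meaningful only for input datasets.
    if props.get("external") and not is_input:
        problems.append("external is set on a dataset that is not an input")
    return problems
```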


Time Slicing Data

Availability

"availability": {
    "frequency": "<Specifies the time unit for data slice production.>",
    "interval": "<Specifies the interval within the defined frequency.>"
}

Offset

"availability": {
    "frequency": "Day",
    "interval": 1,
    "offset": "06:00:00"
}

anchorDateTime

"availability": {
    "frequency": "Hour",
    "interval": 23,
    "anchorDateTime": "2007-04-19T08:00:00"
}

Style

"availability": {
    "frequency": "Day",
    "interval": 1,
    "offset": "06:00:00",
    "style": "EndOfInterval"
}

{
    "name": "AzureBlobInput",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "fileName": "input.log",
            "folderPath": "datacontainer/inputdata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

{
    "name": "AzureBlobOutput",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "folderPath": "datacontainer/partitioneddata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}

1. Time Slicing: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
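To show how `frequency`, `interval`, `offset`, and `anchorDateTime` divide time into slices, the Python sketch below lists the slice windows that fit inside an active period. It is a simplification that rests on several assumptions. Only fixed-length units (Minute, Hour, Day, Week) are handled, and `style` is ignored. Without an `anchorDateTime`, boundaries align to midnight of the start date, and any `offset` is added to the anchor.

```python
from datetime import datetime, timedelta

# Fixed-length units only; Month is left out because months differ in length.
UNITS = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
    "Week": timedelta(weeks=1),
}


def parse_offset(text: str) -> timedelta:
    """Parse an 'hh:mm:ss' offset such as '06:00:00'."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def slice_windows(availability: dict, start: datetime, end: datetime):
    """Yield (slice_start, slice_end) pairs with start <= slice_start and slice_end <= end."""
    step = UNITS[availability["frequency"]] * availability["interval"]
    if "anchorDateTime" in availability:
        anchor = datetime.fromisoformat(availability["anchorDateTime"])
    else:
        # Assumption: without an anchor, boundaries align to midnight of the start date.
        anchor = datetime(start.year, start.month, start.day)
    # Assumption: any offset shifts every slice boundary.
    anchor += parse_offset(availability.get("offset", "00:00:00"))
    # Smallest k with anchor + k * step >= start (ceiling division on timedeltas).
    k = -((anchor - start) // step)
    slice_start = anchor + k * step
    while slice_start + step <= end:
        yield slice_start, slice_start + step
        slice_start += step
```

With the Offset example above, daily slices run from 06:00 to 06:00. With the anchorDateTime example, 23-hour slices stay aligned to 2007-04-19T08:00:00 no matter how far the active period is from that date.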


Linked Services and Datasets

1. Open the ADF Student Workbook file from your \Resources folder

2. Follow the steps for Lab 1 to set up the lab environment

3. Then follow the steps for Lab 2 to set up Azure Data Factory

4. Note: there's a useful JSON prettifier at http://www.jsoneditoronline.org/


Activities

Data transformation activities

Data transformation activity | Compute environment
Hive | HDInsight [Hadoop]
Pig | HDInsight [Hadoop]
MapReduce | HDInsight [Hadoop]
Hadoop Streaming | HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource | Azure VM
Stored Procedure | Azure SQL, Azure SQL DW, or SQL Server
Data Lake Analytics U-SQL | Azure Data Lake Analytics
DotNet | HDInsight [Hadoop] or Azure Batch

Data movement activities

{
    "name": "MyFirstPipeline",
    "properties": {
        "description": "My first Azure Data Factory pipeline",
        "activities": [
            {
                "type": "HDInsightHive",
                "typeProperties": {
                    "scriptPath": "adfgetstarted/script/partitionweblogs.hql",
                    "scriptLinkedService": "StorageLinkedService",
                    "defines": {
                        "inputtable": "wasb://<container>@<storageaccount>.blob.core.windows.net/inputdata",
                        "partitionedtable": "wasb://<container>@<storageaccount>.blob.core.windows.net/partitioneddata"
                    }
                },
                "inputs": [
                    { "name": "AzureBlobInput" }
                ],
                "outputs": [
                    { "name": "AzureBlobOutput" }
                ],
                "policy": {
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Month",
                    "interval": 1
                },
                "name": "RunSampleHiveActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService"
            }
        ],
        "start": "2016-04-01T00:00:00Z",
        "end": "2016-04-02T00:00:00Z",
        "isPaused": false,
        "hubName": "ctogetstarteddf_hub",
        "pipelineMode": "Scheduled"
    }
}

1. What is an activity: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-create-pipelines#what-is-an-activity
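In ADF version 1, an activity's `scheduler` is optional. When it is present, it must match the `availability` of the activity's output dataset, because the output dataset drives the schedule. The Python sketch below flags activities whose scheduler disagrees with an output dataset. Comparing exactly these five cadence keys is our assumption about what "match" covers.

```python
# Keys that together define a slice cadence in ADF (version 1).
CADENCE_KEYS = ("frequency", "interval", "offset", "style", "anchorDateTime")


def scheduler_mismatches(pipeline: dict, datasets: dict) -> list:
    """Return (activity name, output dataset name) pairs whose cadences differ.

    `datasets` maps dataset names to their JSON definitions.
    """
    mismatches = []
    for activity in pipeline["properties"]["activities"]:
        scheduler = activity.get("scheduler")
        if scheduler is None:
            continue  # optional: the output dataset's availability drives the schedule
        for output in activity.get("outputs", []):
            availability = datasets[output["name"]]["properties"]["availability"]
            if any(scheduler.get(k) != availability.get(k) for k in CADENCE_KEYS):
                mismatches.append((activity["name"], output["name"]))
    return mismatches
```

With the pipeline and the AzureBlobOutput dataset above, both are monthly, so no mismatch is reported. Changing either one to a daily cadence would be flagged before deployment.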


Pipelines

A pipeline is a logical grouping of related activities.

A pipeline can be scheduled so that the activities within it are executed.

A pipeline can be managed and monitored.

1. Pipelines: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-create-pipelines


Activities and Pipelines


Summary

In this session, you have learned:

• ADF orchestrates other technologies to move, transform, or analyze data

• A broad range of options is available for creating an ADF instance

• Linked services can point to data sources or compute resources

• Datasets can be structured or unstructured

• Activities can transform and analyze datasets

• Pipelines are used to schedule, manage, and monitor ADF activities

• Scale-out distributed query engine

• De-coupled storage from compute

• Fully managed

• Completely elastic

• Platform as a Service (PaaS)

• Petabyte scale

• Leveraging cloud ecosystem

• Broad range of connectivity options


Click on the graphics to explore more learning options from your Advanced Analytics and Data Science team, including:

• Online training

• Videos

• Instructor Led training

• Blogs

• Cortana Intelligence Gallery


Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

For more information, see Microsoft Copyright Permissions at http://www.microsoft.com/permission

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. The Microsoft company name and Microsoft products mentioned herein may be either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

This document reflects current views and assumptions as of the date of development and is subject to change. Actual and future results and trends may differ materially from any forward-looking statements. Microsoft assumes no responsibility for errors or omissions in the materials.

THIS DOCUMENT IS FOR INFORMATIONAL AND TRAINING PURPOSES ONLY AND IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.
