microsoft azure virtual training day: delivering the ... delivering t… · modern data warehousing...

102
Microsoft Azure Virtual Training Day: Delivering the modern data warehouse

Upload: others

Post on 08-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Microsoft Azure

Virtual Training Day:

Delivering the modern data

warehouse

Page 2: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Delivering a modern data warehouse

Nicholas Moore

Cloud Solutions Architect

Page 3: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Agenda

Page 4: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Why modern data warehousing?

Page 5: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

91%of business leaders see Digital Transformation as a

way of sparking innovation and finding efficiencies

68%say Digital Transformation is increasing profits

85%say they must offer digital services or

become irrelevant

64%

say they have less than 4 years to

complete a Digital Transformation

or they may go out of business

transformationDigital

Page 6: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Common challenges with on-premises solutions

Data Silos Performance Constraints Solution Complexity Escalating Costs

Page 7: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Derive real value from your data in the cloud

One hub for all data

Unlimited data scale

Common Platform Lower TCO

Page 8: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Common customer use cases

Modern data warehouse

“Integrate all our data—including Big

Data—with our data warehouse for

analytics and reporting”

Real-time analytics

“Derive insights from our devices

and data streams in real-time”

Advanced analytics

“Predict next best offer and

customer churn”

Page 9: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Building the modern data warehouse

Page 10: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Modern data warehouse patterns

Modern data warehouse

“Integrate all our data—including Big

Data—with our data warehouse for

analytics and reporting”

Real-time analytics

“Derive insights from our devices

and data streams in real-time”

Advanced analytics

“Predict next best offer and

customer churn”

Page 11: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Modern data warehousing patterns

Model & Serve VisualizeIngest & Prep

Store

Logs (unstructured)

Media (unstructured)

Files (unstructured)

Business/custom apps (structured)

Code-free data transformation and

ingestion from 90+ data integration

connectors

Leader in the Magic Quadrant for Business

Intelligence and Analytics Platforms*

Up to 14x faster and costs 94% less

than other cloud providers

High performance data lake

available in all 54 Azure regions

Power BIAzure Synapse Analytics(Data Warehouse)

Azure Data Factory

Azure Data Lake Storage

Up to 10x faster than vanilla Spark

Azure Databricks(Data prep)

Page 12: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Ingest and Prep

Model & Serve VisualizeIngest & Prep

Store

Logs (unstructured)

Media (unstructured)

Files (unstructured)

Business/custom apps (structured)

Code-free data transformation and

ingestion from 90+ data integration

connectors

Leader in the Magic Quadrant for Business

Intelligence and Analytics Platforms*

Up to 14x faster and costs 94% less

than other cloud providers

High performance data lake

available in all 54 Azure regions

Power BIAzure Synapse Analytics(Data Warehouse)

Azure Data Factory

Azure Data Lake Storage

Up to 10x faster than vanilla Spark

Azure Databricks(Data prep)

Page 13: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Model and Serve

Model & Serve VisualizeIngest & Prep

Store

Logs (unstructured)

Media (unstructured)

Files (unstructured)

Business/custom apps (structured)

Code-free data transformation and

ingestion from 90+ data integration

connectors

Leader in the Magic Quadrant for Business

Intelligence and Analytics Platforms*

Up to 14x faster and costs 94% less

than other cloud providers

High performance data lake

available in all 54 Azure regions

Power BIAzure Synapse Analytics(Data Warehouse)

Azure Data Factory

Azure Data Lake Storage

Up to 10x faster than vanilla Spark

Azure Databricks(Data prep)

Page 14: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Visualize

Model & Serve VisualizeIngest & Prep

Store

Logs (unstructured)

Media (unstructured)

Files (unstructured)

Business/custom apps (structured)

Code-free data transformation and

ingestion from 90+ data integration

connectors

Leader in the Magic Quadrant for Business

Intelligence and Analytics Platforms*

Up to 14x faster and costs 94% less

than other cloud providers

High performance data lake

available in all 54 Azure regions

Power BIAzure Synapse Analytics(Data Warehouse)

Azure Data Factory

Azure Data Lake Storage

Up to 10x faster than vanilla Spark

Azure Databricks(Data prep)

Page 15: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Advanced Analytics patterns

Page 16: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

The evolving world of Analytics

Descriptive Diagnostic Predictive

Prescriptive Cognitive

Page 17: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Advanced Analytics patterns

Modern data warehouse

“Integrate all our data—including Big

Data—with our data warehouse for

analytics and reporting”

Real-time analytics

“Derive insights from our devices

and data streams in real-time”

Advanced analytics

“Predict next best offer and

customer churn”

Page 18: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Advanced Analytics patterns

Model & Serve VisualizeIngest & Prep

Store

Logs (unstructured)

Media (unstructured)

Files (unstructured)

Business/custom apps (structured)

Up to 10x faster than vanilla Spark

Code-free data transformation and

ingestion from 90+ data integration

connectors

Leader in the Magic Quadrant for Business

Intelligence and Analytics Platforms*

Up to 14x faster and costs 94% less than other

cloud providers

High performance data lake

available in all 54 Azure regions

Power BI

Azure Synapse Analytics(Data Warehouse)

Azure Databricks(Data prep)

Azure Data Factory

Azure Data Lake Storage

Train

Up to 10x faster than vanilla Spark

Azure Databricks(Machine Learning)

Page 19: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Real-time analytics patterns

Modern data warehouse

“Integrate all our data—including Big

Data—with our data warehouse for

analytics and reporting”

Real-time analytics

“Derive insights from our devices

and data streams in real-time”

Advanced analytics

“Predict next best offer and

customer churn”

Page 20: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Streaming

Streaming

Real-time analytics patterns

Model & Serve VisualizeIngest & Prep

Store

Logs (unstructured)

Media (unstructured)

Files (unstructured)

Business/custom apps (structured)

Up to 10x faster than vanilla Spark

Code-free data transformation and

ingestion from 90+ data integration

connectors

Leader in the Magic Quadrant for Business

Intelligence and Analytics Platforms*

Up to 14x faster and costs 94% less than other

cloud providers

High performance data lake

available in all 54 Azure regions

Power BI

Azure Synapse Analytics(Data Warehouse)

Azure Databricks(Data prep)

Azure Data Factory

Azure Data Lake Storage

Train

Up to 10x faster than vanilla Spark

Azure Databricks(Machine Learning)

Sensors and IoT

Azure Event Hub

Page 21: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Demo:

Modern Data Warehousing

and Cloud Scale Analytics

Page 22: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

The evolution of Cloud Scale Analytics

Page 23: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

QUERY

SQL

On Demand Spark

Analytics Runtimes

Synapse Studio

Unified experience

Integration Management Monitoring SecurityCloud data

SaaS data

On-premises data

Devices data

Power BI

Synapse

PREVIEW

AnalyticsAzureLimitless data warehouse with unmatched time to insights

STORE

Azure Data Lake Storage

Azure Machine Learning

SQL Provisioned

Page 24: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Azure Synapse Analytics

Powerful

insights

Unmatched

security

Unified

experience

Limitless

scale

Page 25: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Ingesting data for analytics workloads

Nicholas Moore

Cloud Solutions Architect

Page 26: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Agenda

Page 27: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

What is Azure Data Factory?

Page 28: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

A cloud-based data integration service that allows

you to orchestrate and automate

data movement and data transformation.

Azure Data Factory

Page 29: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

MonitorPublishTransform

& EnrichConnect & Collect

Azure Data Factory process

Page 30: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

CFControl

Flow

IRIntegration

Runtime

@ Parameters

Pipeline

Activities

Triggers

Dataset

Azure Databricks

Data Lake Store

Linked Service

Azure Data Factory Components

Page 31: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Ingesting data

Page 32: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Data ingestion

Load flat files

into data lake

on a schedule

Data storage

Transactional storage

Applications

manage their

transactional

data directly

Data preparation

Read data from files

using DBFS

Extract and

transform

relational data

Load into SQL

DW tables

Data prep.

Serving

Load processed data

into tables optimized

for analytics

Azure Storage/

Data Lake Store

Azure Synapse Analytics

Azure Databricks

Azure Data

Factory

SQL DBAzure Data

Factory

Logs, files, and media

(unstructured)

Business and custom

apps (structured)

Power BI

Dashboards

Applications

Visualize

Connect & Collect

ADF Copy Activity

Data transformation in Azure

Page 33: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Reads data from a source data store.

Performs serialization/deserialization, compression/decompression, column

mapping, and so on. It performs these operations based on the configuration

of the input dataset, output dataset, and Copy activity.

Writes data to the sink/destination data store

Copy Activity process

Page 34: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

SQL Server

Self-hosted

Integration Runtime

IR

Azure Integration

Runtime

IR

Integration Runtime

Page 35: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Supported file formats:

Text

JSON

Avro

ORC

Parquet

Copy activity can compress

and decompress files with

The following codecs:

Gzip

Deflate

Bzip2

ZipDeflate

Copy files with the Copy Activity

Page 36: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Transforming and enriching data

Nicholas Moore

Cloud Solutions Architect

Page 37: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

What is Azure Data Factory?

Page 38: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

A cloud-based data integration service that

allows you to orchestrate and automate

data movement and data transformation.

Azure Data Factory

Page 39: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

MonitorPublishTransform

& EnrichConnect & Collect

Azure Data Factory process

Page 40: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

CFControl

Flow

IRIntegration

Runtime

@ Parameters

Pipeline

Activities

Triggers

Dataset

Azure Databricks

Data Lake Store

Linked Service

Azure Data Factory Components

Page 41: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Component dependencies

Page 42: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Transforming data with

the ADF Mapping Data Flow

Page 43: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Transform

& Enrich

Data ingestion

Load flat files

into data lake

on a schedule

Data storage

Transactional storage

Applications

manage their

transactional

data directly

Data preparation

Read data from files

using DBFS

Extract and

transform

relational data

Load into SQL

DW tables

Data prep.

Serving

Load processed data

into tables optimized

for analytics

Azure Storage/

Data Lake Store

Azure Synapse Analytics

Azure Databricks

Azure Data

Factory

SQL DBAzure Data

Factory

Logs, files, and media

(unstructured)

Business and custom

apps (structured)

Power BI

Dashboards

Applications

Visualize

Data transformation in Azure

Page 44: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Mapping Data Flow

SSIS PackagesCompute resources

Methods for transforming in Azure Data Factory

Page 45: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Mapping Data Flow

Compute resources

SSIS Packages

Code free data transformation at scale

Methods for transforming data in Azure Data Factory

Page 46: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Mapping Data Flow

Perform data cleansing, transformation,

aggregations, etc.

Enables you to build resilient data flows in a code

free environment

Enable you to focus on building business logic and

data transformation

Underlying infrastructure is provisioned

automatically with cloud scale via Spark execution

Code free data transformation at scale

Benefits of Mapping Data Flow

Page 47: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Code free data transformation at scale

Using the Mapping Data Flow

Page 48: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Code free data transformation at scale

Starting the Mapping Data Flow

Page 49: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Transformation options in the Mapping Data Flow

Page 50: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Triggering and monitoring

Page 51: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Code free data transformation at scale

Triggering the Mapping Data Flow

Page 52: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Demo:

Transforming your data

in Azure Data Factory

Page 53: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

© Microsoft Corporation

In Summary:

Transforming

Data with

Azure Data

Factory

Page 54: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Demo:

Ingesting data with

Azure Data Factory

Page 55: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Monitoring data ingestion

Page 56: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

© Microsoft Corporation

Pipeline runsActivity runs

Monitoring

Page 57: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

© Microsoft Corporation

In Summary:

Ingesting

Data with

Azure Data

Factory

Page 58: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Data Loading Best Practices

Luis Silva

Cloud Solution Architect, Data Platform

Page 59: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Agenda

Page 60: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

What is Azure Synapse Analytics?

Page 61: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Azure Synapse Analytics

A limitless analytics service with

unmatched time to insight, that

delivers insights from all your data,

across data warehouses and big data

analytics systems, with blazing speed

Page 62: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Workload

Management

Separate

Storage/Compute

Pause/Resume

Big DataElastic ScalePaaS

Azure Synapse Analytics

Page 63: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Data Warehouse Architecture

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110Control Node

Page 64: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Control

SQL Pool Scaling DW2500c (5 compute nodes)

Compute

Dist_DB_1

Dist_DB_2

Dist_DB_12

Storage

Queries

Engine

DMS

SQL DB

DMS

SQL DB

Compute

Dist_DB_13

Dist_DB_14

Dist_DB_24

DMS

SQL DB

Compute

Dist_DB_25

Dist_DB_26

Dist_DB_36

DMS

SQL DB

Compute

Dist_DB_37

Dist_DB_38

Dist_DB_48

DMS

SQL DB

Compute

Dist_DB_49

Dist_DB_50

Dist_DB_60

DMS

SQL DB

Page 65: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

SQL Pool Scaling DW5000c (10 compute nodes)

Compute

Dist_DB_1

Dist_DB_2

Dist_DB_6

Queries Control

Engine

DMS

SQL DB

DMS

SQL DB

Compute

Dist_DB_7

Dist_DB_8

Dist_DB_12

DMS

SQL DB

Compute

Dist_DB_13

Dist_DB_14

Dist_DB_18

DMS

SQL DB

Compute

Dist_DB_19

Dist_DB_20

Dist_DB_24

DMS

SQL DB

Compute

Dist_DB_25

Dist_DB_26

Dist_DB_30

DMS

SQL DB

Compute

Dist_DB_31

Dist_DB32

Dist_DB_26

DMS

SQL DB

Compute

Dist_DB_37

Dist_DB_38

Dist_DB_42

DMS

SQL DB

Compute

Dist_DB_43

Dist_DB_44

Dist_DB_48

DMS

SQL DB

…Compute

Dist_DB_49

Dist_DB_50

Dist_DB_54

DMS

SQL DB

Compute

Dist_DB_55

Dist_DB_56

Dist_DB_60

DMS

SQL DB

Storage

Page 66: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

QueryLoadProvision

Data Warehouse Processes

Page 67: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Loading design goals

Page 68: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Load data efficiently

Load Data non-obtrusively, respecting

concurrent queries and loads

Reduce table fragmentation as much as

possible

Provide system recovery capabilities in the

event of data load failure with minimal

impact on concurrent queries

Multiple methods of loading

Loading design goals

Page 69: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Data ingestion

Load flat files

into data lake

on a schedule

Data storage

Transactional storage

Applications

manage their

transactional

data directly

Data preparation

Read data from files

using DBFS

Extract and

transform

relational data

Load into SQL

DW tables

Data prep.

Serving

Load processed data

into tables optimized

for analytics

Azure Storage/

Data Lake Store

Azure Synapse Analytics

Azure Databricks

Azure Data

Factory

SQL DBAzure Data

Factory

Power BI

Dashboards

Logs, files, and media

(unstructured)

Applications

Business and custom

apps (structured)

Visualize

Loading data into a data warehouse in Azure Synapse Analytics

Data warehousing loading in Azure

Page 70: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

File based

PolyBase

Heterogenous

SSIS

File based

BCP

Loading Methods

Page 71: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Variety of file formats

PolyBase supports a

variety of file formats

including RC, ORC

and Gzip files.

Azure Data Factory support

Azure Data Factory also

supports PolyBase loads

and can achieve similar

performance to running

PolyBase manually

Leverages MPP architecture

PolyBase is designed to

leverage the MPP

(Massively Parallel

Processing) architecture of

Azure Synapse Analytics

and will therefore load and

export data magnitudes

faster than any other tool.

Best practices for loading large amount of data

PolyBase benefits

Page 72: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

External Tables

External File Format

External Data Source

Components of PolyBase

Page 73: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Loading best practices

Page 74: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Manage your files

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110Control Node

Page 75: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110Control Node

Reduce concurrent access

Page 76: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110Control Node

Create a dedicated load user account

Page 77: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Manage singleton updates

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110Control Node

Page 78: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Production Tables

Staging Tables

Azure Synapse Analytics

Load into

SQL DW

tablesAzure Storage/

Data Lake Store

Staging data, a 2 step process

Optimize your loads

Page 79: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Improve the query performance for users

Create statistics after loading

Azure Synapse Anlytics

Production Tables

Page 80: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Demo:

Loading data into

Azure Synapse Analytics Data Warehouse

Page 81: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Optimizing data warehousing query performance

Luis Silva

Cloud Solution Architect, Data Platform

Page 82: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Agenda

Page 83: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

What is Azure Synapse Analytics?

Page 84: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Azure Synapse Analytics

A limitless analytics service with

unmatched time to insight, that

delivers insights from all your data,

across data warehouses and big data

analytics systems, with blazing speed

Page 85: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Data Warehouse Architecture

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110

Compute Node

Compute Node

01101010101010101011

01010111010101010110

01101010101010101011

01010111010101010110Control Node

Page 86: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

QueryLoadProvision

Data Warehouse Processes

Page 87: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Data ingestion

Load flat files

into data lake

on a schedule

Data storage

Transactional storage

Applications

manage their

transactional

data directly

Data preparation

Read data from files

using DBFS

Extract and

transform

relational data

Load into SQL

DW tables

Data prep.

Serving

Load processed data

into tables optimized

for analytics

Azure Storage/

Data Lake Store

Azure Synapse Analytics

Azure Databricks

Azure Data

Factory

SQL DBAzure Data

Factory

Power BI

Dashboards

Logs, files, and media

(unstructured)

Applications

Business and custom

apps (structured)

Visualize

Data warehouse performance in Azure Synapse Analytics

Page 88: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Maximizing Performance

Page 89: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Maximizing Query Performance

Replicated Tables

Round Robin Tables

Hash Distributed Tables

Page 90: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Maximizing Query Performance

Round-robin Tables

Is the default option for newly created tables

Evenly distributes the data across the available

compute nodes in a random manner, giving an

even distribution of data across all nodes

Loading into Round-robin tables is fast

Queries on Round-robin tables may require

more data movement as data is “reshuffled” to

organize the data for the query

Great to use for loading staging tables

Page 91: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Maximizing Query Performance

Hash Distributed Tables

Distributes rows based on the value in the

distribution column, using a deterministic hash

function to assign each row to one distribution.

Is designed to achieve high performance for

queries that run against large fact tables in a star

schema.

Choosing a good distribution column is

important to ensure the hash distribution

performs well

As a starting point, use on tables that are greater

than 2GB in size and has frequent inserts, updates

and deleted

But don’t choose a volatile column for the hash

distributed column

Page 92: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Maximizing Query Performance

Replicated Tables

A full copy of a table is placed on every single

compute node to minimize data movement

Works well for dimension tables in a star

schema that are less than 2GB in size and are

used regularly in queries with simple

predicates

Should not be used on dimension tables that

are updated on a regular basis

You can convert existing round-robin tables to

replicated tables to take advantage of the

feature using a CTAS statement

Page 93: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Create statistics after loading

Azure Synapse Analytics

Production Tables

Page 94: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Demo:

Creating distributed tables

Page 95: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Query Performance Tuning

Page 96: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

© Microsoft Corporation

Query Data Store

Overcomes the 10,000-row limit of DMV’s output

Pinpoint and fix queries with plan regression

• View queries which produce multiple plans

• 7-day retention period

• Full query text

A/B Testing with your Azure Synapse Analytics (SQL

DW)

Identify, improve and tune ad hoc queries

• Top hitting queries for performance tuning

Page 97: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Query Data Store

VIEW DATABASE STATE

permission

DMVs are in UTC time

zone

Page 98: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Query execution with Query Data Store

Flush to disc every 15 minutes seconds

10GB is the max storage size

Retention period is 7 days

Maximum plans per query is 200

Queries

Compute

Dist_DB_1

Dist_DB_2

Dist_DB_15

DMS

SQL DB

Compute

Dist_DB_15

Dist_DB_16

Dist_DB_30

DMS

SQL DB

Compute

Dist_DB_31

Dist_DB_32

Dist_DB_45

DMS

SQL DB

Compute

Dist_DB_46

Dist_DB_47

Dist_DB_60

DMS

SQL DB

Engine

DMSShell DB

CONTROL

QDS

Page 99: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Azure Synapse Analytics recommendations

Data skew +

Replicate tables

Stats

Tempdb

Azure Advisor Recommendation Blade

Recommendation generation (every 24 hours)

RecommendationAPITelemetry

Adaptive Cache

Azure Synapse Analytics

Page 100: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

© Microsoft Corporation

In Summary:

Query Performance

Select the proper table distribution

Detect data skew

• Use Query Data store

• Consider changing key columns

• Only as fast as your slowest distribution

Provision additional adaptive cache capacity

Reduce tempdb contention

Create and update statistics

Page 101: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Demo:

Query Performance Tuning

Page 102: Microsoft Azure Virtual Training Day: Delivering the ... Delivering t… · Modern data warehousing patterns Ingest & Prep Model & Serve Visualize Store Logs (unstructured) Media

Thank you!