Deep-Dive into SQL Server Polybase Gerhard Brueckl


Page 1: Deep-Dive into SQL Server Polybasesqlkonferenz.de/files/1_2_1115_Deep Dive into Polybase.pdf · 2019-10-08 · Polybase •Introduction •Setup ... SSIS / Polybase SSRS / Power BI

Deep-Dive into SQL Server Polybase

Gerhard Brueckl


About Me

Gerhard Brückl

From Austria

Working with Microsoft Data Platform since 2006

Mainly focused on Analytics and Reporting

Big Data / IoT

Microsoft Azure

@gbrueckl

blog.gbrueckl.at

www.pmone.com


Deep-Dive into SQL Server Polybase

Big Data

• What is it?

• Why is it relevant for you?

Polybase

• Introduction

• Setup

• Using it

Scenarios

Staging / Archive

Import / Export

Performance

Query Processing

Predicate Pushdown

Azure SQL DW


Big Data


Big Data

The way we generate data is changing

ERP

CRM

Sales

Customer Product Date Amount

Cust1 ProdA Oct-06 100€

Cust2 ProdB Oct-01 50€

CustN ProdZ Oct-01 … €

Social Media

Sensors

Logs

Digital Media

Machine Generated

New kinds of Data

Big Data

The way we store data is changing

HDD/SSD

RAID

HDD/SSD Array

SAN / NAS

Multi-Node Cluster

Big Data

The way we process data is changing

Multi-Node Cluster

Single Core

Multi Core

NUMA

CPU1 CPUx

Big Data and the classic DWH

SAP ERP CRM

External Systems

Data Warehouse

Data Integration / ETL

Transactional Data

SQL Server RDBMS

SQL Server Integration Services

SQL Server Reporting Services

Reporting Analysis

Combining the “old” data with the “new” data

Big Data and the classic DWH

Reporting Analysis

Data Integration / ETL

Sensors

Web Logs

Digital Media

Social Media

Machine generated

Semi-structured Data *

Transactional Data

Data Warehouse

Advanced Analytics

SQL Server 2016

SSIS / Polybase

SSRS / Power BI

* Requires further processing

SAP ERP CRM

External Systems

Combining the “old” data with the “new” data

Polybase


What is Polybase

“Polybase is a Technology which allows us to access data which resides in a distributed file system (like HDFS) in a traditional way using regular SQL commands.”

Analytics Platform System (APS)

Separate Storage and Compute

Scalability

SQL Server 2016

PolyBase Engine

PolyBase DMS

SQL Server Scale-Out Group

Four SQL Server instances, one of them the Head Node, each with:

• PolyBase Engine

• PolyBase DMS
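
Joining an instance to such a scale-out group could look roughly like the sketch below. It assumes a head node machine called HEADNODE01 (a hypothetical name), a default instance (MSSQLSERVER) on the head node and the default DMS control channel port 16450; the PolyBase services on the joining node need a restart afterwards.

```sql
-- Run on each instance that should become a compute node.
-- HEADNODE01 is a hypothetical machine name; 16450 is the default
-- DMS control channel port; MSSQLSERVER is the default instance name.
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';

-- After restarting the PolyBase Engine and PolyBase DMS services on
-- that node, run this on the head node to list the group members.
SELECT *
FROM sys.dm_exec_compute_nodes;
```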


Supported Sources

File Systems

• Hadoop Distributed File System (HDFS)

• Hortonworks Data Platform (HDP)

• Cloudera (CDH)

• Azure Blob Store

• Azure Data Lake Store !NEW!

File Formats

Delimited Text (CSV, TSV, …) UTF-8 / UTF-16

ORC

RC

Parquet

MORE TO COME!


Setup


Setting Up Polybase


PolyBase Internal Databases
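
A quick way to verify the setup from T-SQL is sketched below. It only reads server metadata; the three database names are the internal databases that the PolyBase setup creates on the instance.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- List the internal databases created by the PolyBase setup.
SELECT name
FROM sys.databases
WHERE name IN (N'DWConfiguration', N'DWDiagnostics', N'DWQueue');
```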


Configure Hadoop Connectivity

Value         Description
0             Disable Hadoop connectivity
1             HDP 1.3 on Windows; Azure Blob Store
2             HDP 1.3 on Linux
3             CDH 4.3 on Linux
4             HDP 2.0 on Windows; Azure Blob Store
5             HDP 2.0 on Linux
6             CDH 5.1+ on Linux
7 (default)   HDP 2.1+ on Linux; HDP 2.1+ on Windows; Azure Blob Store

-- Configure Hadoop connectivity
sp_configure
    @configname = 'hadoop connectivity',
    @configvalue = { 0 - 7 }
[;]
RECONFIGURE
[;]

Different Sources require different settings

Defines binaries/JARs which are loaded
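
As a concrete instance of the syntax above, the following sketch sets the option and reads it back. The value 7 is only an example; pick the value that matches your source from the table. Because the setting decides which binaries/JARs are loaded, SQL Server and the PolyBase services have to be restarted before the new value is actually used.

```sql
-- 7 is an example value; choose the one matching your Hadoop
-- distribution or Azure Blob Store from the table above.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Show the configured value next to the value currently in use.
EXEC sp_configure @configname = 'hadoop connectivity';
```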


External Objects

External Table

External File Format

External Data Source

Table View

SQL Query

Credential


MasterKey and Credential

MasterKey

• Encrypt sensitive Information

Credential

• Access to DataSource

MSDN

-- Create a Master Key using my own password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Pass@word1!';

-- Create database Credential
CREATE DATABASE SCOPED CREDENTIAL CRED_AzureStorage
WITH
    IDENTITY = 'gbdomaindata', --> Storage Account
    SECRET = '<AccessKey>'; --> AccessKey

External Data Sources

Reference to external Storage Service

• Type

  • Azure Blob Store

  • Azure Data Lake Store

  • Hadoop / HDFS

  • Hortonworks

  • Cloudera

• Location

MSDN

-- Create the External Data Source using Credential
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<name>.blob.core.windows.net',
    CREDENTIAL = CRED_AzureStorage
);

External File Format

Definition of File Format

Format Type

• Delimited Text

• ORC

• RCFile

• Parquet

Data Compression

Format Options

SerDe

MSDN

-- Create External File Format
CREATE EXTERNAL FILE FORMAT CSV_Quoted
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = FALSE
    )
);
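
The other format types and the data compression setting listed on this slide follow the same pattern. The two definitions below are a sketch; the names CSV_Quoted_Gzip and ORC_Snappy are made up for illustration, while the codec class names are the ones PolyBase expects for Gzip and Snappy.

```sql
-- Delimited text as above, but the files are gzip-compressed.
CREATE EXTERNAL FILE FORMAT CSV_Quoted_Gzip
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = FALSE
    ),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- ORC files compressed with Snappy; no FORMAT_OPTIONS are needed
-- because the schema and delimiters are part of the ORC file itself.
CREATE EXTERNAL FILE FORMAT ORC_Snappy
WITH (
    FORMAT_TYPE = ORC,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```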


External Tables

Definition of external data

• Columns

• Data Type

• Reject Settings

External Data Source

• Location

External File Format

MSDN

-- Create the External Table
CREATE EXTERNAL TABLE [azure].[DimCurrency] (
    CurrencyKey int NOT NULL,
    CurrencyAlternateKey nchar(3) NOT NULL,
    CurrencyName nvarchar(50) NOT NULL
)
WITH (
    LOCATION = '/AdventureWorksDW2012/DimCurrency',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = CSV_Quoted,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
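
The reject settings can also be expressed as a percentage instead of an absolute row count. The sketch below is a variation of the table above under a hypothetical name ([azure].[DimCurrency_Tolerant]); it tolerates up to 5 % bad rows, evaluated after every 1000 rows read. It also creates statistics on the external table, which the optimizer can use when planning queries against it.

```sql
-- Same columns and location as above, but tolerate up to 5 % rejected rows.
-- REJECT_SAMPLE_VALUE is required with PERCENTAGE and sets how many rows
-- are read before the percentage is recalculated.
CREATE EXTERNAL TABLE [azure].[DimCurrency_Tolerant] (
    CurrencyKey int NOT NULL,
    CurrencyAlternateKey nchar(3) NOT NULL,
    CurrencyName nvarchar(50) NOT NULL
)
WITH (
    LOCATION = '/AdventureWorksDW2012/DimCurrency',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = CSV_Quoted,
    REJECT_TYPE = PERCENTAGE,
    REJECT_VALUE = 5,
    REJECT_SAMPLE_VALUE = 1000
);

-- Statistics on an external table are created like on a regular table.
CREATE STATISTICS stat_DimCurrency_CurrencyKey
ON [azure].[DimCurrency_Tolerant] (CurrencyKey)
WITH FULLSCAN;
```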


Working with External Tables

SELECT

• Export: INSERT / CETAS

Full SQL Syntax

• Joins

• Group By

• Aggregation

SELECT
    dim.[ProductGroup],
    SUM(ext.[Revenue]) AS TotalRevenue
FROM [dwh].[DimProduct] dim
INNER JOIN [ext].[Sales] ext
    ON dim.[ProductKey] = ext.[ProductKey]
WHERE ext.[Quantity] > 10
GROUP BY dim.[ProductGroup]
HAVING SUM(ext.[Revenue]) > 100
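
The export direction mentioned on this slide could be sketched as follows. The tables [ext].[SalesArchive] and [dwh].[FactSales] and the [OrderDate] column are hypothetical and only serve as an illustration; the data source and file format are the ones defined earlier. On SQL Server 2016 the INSERT path has to be enabled once per instance, whereas CETAS (CREATE EXTERNAL TABLE AS SELECT) is a feature of APS and Azure SQL DW.

```sql
-- SQL Server 2016: allow INSERT into external tables (one-time setting).
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- Export rows into an existing external table.
-- [ext].[SalesArchive] and [dwh].[FactSales] are hypothetical tables.
INSERT INTO [ext].[SalesArchive]
SELECT [ProductKey], [Quantity], [Revenue]
FROM [dwh].[FactSales]
WHERE [OrderDate] < '2015-01-01';

-- APS / Azure SQL DW: CETAS creates the external table and writes
-- the files in a single statement.
CREATE EXTERNAL TABLE [ext].[Sales2014]
WITH (
    LOCATION = '/archive/sales/2014/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = CSV_Quoted
)
AS
SELECT [ProductKey], [Quantity], [Revenue]
FROM [dwh].[FactSales]
WHERE [OrderDate] >= '2014-01-01' AND [OrderDate] < '2015-01-01';
```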


Scenarios


Scenarios

Staging

Archive

Import / Export

SQL Interface for HDFS


Polybase for Staging

SQL Server 2016

Stage Table

DWH

External Table
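
In the staging scenario the external table simply takes the place of a flat-file source. A minimal sketch, assuming a hypothetical local staging table [stage].[Sales] with the same columns as the external table [ext].[Sales] used earlier:

```sql
-- [stage].[Sales] is a hypothetical local staging table.
TRUNCATE TABLE [stage].[Sales];

-- Pull the external data into the stage table; from here on the
-- regular DWH load continues against local tables only.
INSERT INTO [stage].[Sales] WITH (TABLOCK) ([ProductKey], [Quantity], [Revenue])
SELECT [ProductKey], [Quantity], [Revenue]
FROM [ext].[Sales];
```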


Polybase as Staging Archive

ETL

SQL Server 2016

Stage Table

DWH

External Table

Polybase as Archive

ETL

SQL Server 2016

Stage Table

DWH

External Table
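
For the archive scenario, one possible design is a view that combines the current data in the DWH with the archived data behind an external table, so that reports keep querying a single object. The object names below are hypothetical.

```sql
-- Current rows live in a local table, archived rows in external storage.
-- All object names are hypothetical and only illustrate the pattern.
CREATE VIEW [dwh].[vFactSalesAll]
AS
SELECT [ProductKey], [Quantity], [Revenue]
FROM [dwh].[FactSales]
UNION ALL
SELECT [ProductKey], [Quantity], [Revenue]
FROM [ext].[SalesArchive];
```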


Polybase for Import / Export

SQL Server 2016

DWH

Data Processing

Polybase as SQL Interface

SQL Server 2016

External Table

External Table

SQL

Performance


Processing of a Query

CREATE Temp-Table (All Nodes)

• Only required columns

Add Extended Properties (All Nodes)

UPDATE STATISTICS (All Nodes)

Processing of a Query

Run regular SQL Query on Temp-Table (All Nodes)

Return intermediate result to HeadNode (All Nodes)

DROP Temp-Table (All Nodes)

Processing of a Query

Combine results of all Nodes (Head Node)

Single Node Processing (Head Node)

Return results to Client (Head Node)
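
These steps can be observed on a running system through the PolyBase DMVs. The sketch below first lists the distributed requests and then the individual steps of one of them; 'QID1234' is a placeholder that has to be replaced with an execution_id returned by the first query.

```sql
-- Longest-running PolyBase requests together with their query text.
SELECT dr.execution_id, st.text, dr.total_elapsed_time
FROM sys.dm_exec_distributed_requests AS dr
CROSS APPLY sys.dm_exec_sql_text(dr.sql_handle) AS st
ORDER BY dr.total_elapsed_time DESC;

-- Individual steps of one request (temp table creation, statistics,
-- data movement, final query, cleanup) in execution order.
-- 'QID1234' is a placeholder for an execution_id from the query above.
SELECT execution_id, step_index, operation_type, distribution_type,
       location_type, status, total_elapsed_time, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID1234'
ORDER BY step_index;
```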


Predicate Push-Down

HDFS only!

Requires the Resource Manager link to be set up

Starts a MapReduce Job, depending on:

• Data Volume

• Statistics on Table

• Forced in Query

OPTION(FORCE EXTERNALPUSHDOWN);

OPTION(DISABLE EXTERNALPUSHDOWN);
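
Put together, a pushdown-enabled setup might look like the sketch below. The host placeholders and the ports 8020 (name node) and 8050 (resource manager) are assumptions that depend on the Hadoop distribution, and the external table [ext].[Sales] is assumed to be defined on this data source.

```sql
-- Without RESOURCE_MANAGER_LOCATION no MapReduce job can be started,
-- so no predicate pushdown takes place for this data source.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://<namenode>:8020',
    RESOURCE_MANAGER_LOCATION = '<resourcemanager>:8050'
);

-- Force the filter to be evaluated on the Hadoop side.
-- Replace FORCE with DISABLE to stream all data to SQL Server instead.
SELECT [ProductKey], SUM([Revenue]) AS TotalRevenue
FROM [ext].[Sales]
WHERE [Quantity] > 10
GROUP BY [ProductKey]
OPTION (FORCE EXTERNALPUSHDOWN);
```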


Predicates – HDFS only!

Push-able

• Select subset of columns

• Arithmetic Operators

• Comparison Operators

• Logical Operators

• Unary Operators

Non-Push-able

• Joins

• Complex Calculations

• Group By *

• Aggregations *

* Partially Push-able

SQL Server 2016

Head Node: PolyBase Engine, PolyBase DMS

Three further nodes: PolyBase DMS each

Hadoop Cluster

Namenode (HDFS)

FS, FS, FS, FS

8 [External] Workers per Node

Query Distribution!

File Layout

One Big File vs. Many Small Files

Compressed vs. Uncompressed Files

File Format

Align with Number of Readers !!!

No Partitioning yet !!!


File Format – CSV

Proper Date-Formats

• [Date] = yyyy-MM-dd

• [Time] = hh:mm:ss

• [DateTime] = {Date} {Time}

• Only one format for all Date*-datatypes!

UTF-8 and UTF-16 only!

No support for Header Rows

No Linebreaks in Text
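
The date format is declared once on the external file format and then applies to every date-typed column read through it, which is why only one format per file is possible. A sketch with a hypothetical format name (CSV_DateTime):

```sql
-- One DATE_FORMAT per file format: it is used for all date/time
-- columns of every external table that references this format.
CREATE EXTERNAL FILE FORMAT CSV_DateTime
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        DATE_FORMAT = 'yyyy-MM-dd HH:mm:ss',
        USE_TYPE_DEFAULT = FALSE
    )
);
```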


Azure SQL Data Warehouse

APS/PDW in the cloud

• Pause/Stop Compute

• Persisted Storage

• Always 60 Distributions for Storage

• Flexible number of ComputeNodes

Number of:
DWU               100  200  300  400  500  600  1,000  1,200  1,500  2,000
Compute Nodes       1    2    3    4    5    6     10     12     15     20
Readers             8   16   24   32   40   48     80     96    120    160
Writers            60   60   60   60   60   60     80     96    120    160
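
The numbers in this table follow a simple pattern: one compute node per 100 DWU, 8 readers per compute node, and at least 60 writers (one per distribution). The query below reproduces the table under that assumption. It is an inference from the listed values, not an official formula, and is mainly useful for deciding how many files to split an export into so that every reader has work.

```sql
-- Reproduce the table from the pattern visible in its values:
-- nodes = DWU / 100, readers = 8 per node, writers = at least 60.
SELECT
    v.dwu,
    v.dwu / 100 AS compute_nodes,
    (v.dwu / 100) * 8 AS readers,
    CASE
        WHEN (v.dwu / 100) * 8 > 60 THEN (v.dwu / 100) * 8
        ELSE 60
    END AS writers
FROM (VALUES (100), (200), (300), (400), (500), (600),
             (1000), (1200), (1500), (2000)) AS v(dwu)
ORDER BY v.dwu;
```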
