Get Started with Microsoft SQL PolyBase

SQL Server 2016 PolyBase Henk van der Valk Oct.15, 2016 Level: Beginner www.Henkvandervalk.com



http://www.sqlsaturday.com/551/Sessions/Schedule.aspx

SQL PolyBase has been a high-end feature of SQL APS and is now also available in SQL2016, SQL DB and SQL DW! It allows you to use regular T-SQL statements for ad-hoc access to data stored in Hadoop and/or Azure Blob Storage from within SQL Server. This session will show you how it works and how to get started!


Starting SQL2016 on a server with 24 TB RAM

Just 4 fun!

Microsoft Worldwide Partner Conference 2016. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Thanks to our platinum sponsors: PASS SQL Saturday Holland 2016


Thanks to our gold and silver sponsors: PASS SQL Saturday Holland 2016

APS Onsite!


Speaker Introduction

10+ years active in the SQLPass community
10 years at the Unisys-EMEA Performance Center
2002: largest SQL DWH in the world (SQL 2000)
Project REAL (SQL 2005)
ETL World Record: loading 1 TB within 30 minutes (SQL 2008)
Contributor to SQL performance whitepapers
Perf tips & tricks: www.henkvandervalk.com
Schuberg Philis: 100% uptime for mission-critical apps
Since April 1st, 2011: Microsoft Data Platform!

All info represents my own personal opinion (based upon my own experience) and not that of Microsoft. @HenkvanderValk

Agenda

Intro: what is PolyBase and why?
Getting started: SQL Server product versions supported; installation and setup
Creating external tables, running hybrid queries
Monitoring; tips to improve Hadoop performance
Scale-out groups


SQL Server 2016 as fraud detection scoring engine

https://blogs.technet.microsoft.com/machinelearning/2016/09/22/predictions-at-the-speed-of-data/

HTAP (Hybrid Transactional Analytical Processing)

8 sockets, 192 cores, 16 TB RAM

The Big Data lake Challenge

How to orchestrate?

Different types of data: webpages, logs, and clicks; hardware and software sensors; semi-structured/unstructured data
Large scale: hundreds of servers
Advanced data analysis: integration between structured and unstructured data, the power of both

PolyBase builds the bridge: RDBMS, Hadoop, Azure Blob Storage. Access any data.

Just-in-time data integration across relational and non-relational data
Fast, simple data loading
Best of both worlds: T-SQL compatible, uses computational power at the source
Opportunity for new types of analysis


PolyBase View in SQL Server 2016

Execute T-SQL queries against relational data in SQL Server and semi-structured data in HDFS and/or Azure
Leverage existing T-SQL skills and BI tools to gain insights from different data stores
Expand the reach of SQL Server to Hadoop (HDFS & WASB)

SQL Server, Hadoop, Azure Blob Storage: query, results. Access any data.


Remove the complexity of big data: T-SQL over Hadoop, JSON support

PolyBase T-SQL query

SQL Server

Hadoop


Simple T-SQL to query Hadoop data (HDFS). Manage structured and unstructured data.

Name          DOB        State
Denny Usher   11/13/58   WA
Gina Burch    04/29/76   WA


PolyBase use cases

Load data

Use Hadoop as an ETL tool to cleanse data before loading it into the data warehouse with PolyBase

Interactively query

Analyze relational data with semi-structured data using split-based query processing

Age-out data

Age-out data to HDFS and use it as cold but queryable storage

Access any data
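To make the interactive-query use case concrete, here is a hedged sketch of a hybrid query. The table and column names are hypothetical: assume Customer_Local is a regular SQL Server table and Orders_External is an external table over HDFS.

```sql
-- Hypothetical hybrid query: join a local SQL Server table with an
-- external table whose data lives in HDFS. PolyBase decides per query
-- whether to import the HDFS rows or push computation down to Hadoop.
SELECT c.c_name, SUM(o.o_totalprice) AS total_spent
FROM dbo.Customer_Local AS c          -- regular SQL Server table
JOIN dbo.Orders_External AS o         -- external table over HDFS
    ON o.o_custkey = c.c_custkey
WHERE o.o_orderdate >= '2016-01-01'
GROUP BY c.c_name;
```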


PolyBase: turning raw tweet data into information

Query and store Hadoop data: bi-directional, seamless and fast

Azure Blob Storage

Setup & Query: SQL Server 2016 & SQL DW PolyBase!

#Demo

BCP out vs RTC 16

Prerequisites

An instance of SQL Server (64-bit), Enterprise or Developer Edition
Microsoft .NET Framework 4.5
Oracle Java SE Runtime Environment (JRE) version 7.51 or higher (64-bit); either JRE or Server JRE will work. Go to Java SE downloads.

Note: the installer will fail if the JRE is not present.

Minimum memory: 4 GB
Minimum hard disk space: 2 GB
TCP/IP connectivity must be enabled.

Step 2: Install SQL Server

Install one or more SQL Server instances with PolyBase

PolyBase DLLs (Engine and DMS) are installed and registered as Windows Services

Prerequisite: User must download and install JRE (Oracle)

Access any data
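Once setup completes, a quick way to confirm the feature is present is the documented server property, which returns 1 when PolyBase is installed:

```sql
-- Returns 1 if PolyBase is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```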


Components introduced in SQL Server 2016:

PolyBase Engine Service
PolyBase Data Movement Service (with HDFS Bridge)
External table constructs
MR pushdown computation support

Access any data

How to use PolyBase in SQL Server 2016

Set up a Hadoop cluster or Azure Storage blob

Install SQL Server

Configure a PolyBase group

Choose Hadoop flavor
Attach Hadoop cluster or Azure Storage

PolyBase T-SQL queries are submitted to the head node and can only refer to tables and/or external tables there.


Step 1: Set up a Hadoop Cluster

Hortonworks or Cloudera distributions
Hadoop 2.0 or above
Linux or Windows
On-premises or in Azure


Step 1 (alternative): Set up an Azure Storage blob

Azure Storage blob (ASB) exposes an HDFS layer
PolyBase reads and writes from ASB using Hadoop RecordReader/RecordWriter
No compute pushdown support for ASB


Step 2: Configure a PolyBase group


PolyBase scale-out group

Head node is the SQL Server instance to which queries are submitted

Compute nodes are used for scale-out query processing for data in HDFS or Azure


Step 3: Choose Hadoop flavor

Supported Hadoop distributions:
Cloudera CDH 5.x on Linux
Hortonworks HDP 2.x on Linux and Windows Server

What happens under the covers? Loading the right client JARs to connect to the Hadoop distribution.

Access any data
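Selecting the flavor is done with the documented 'hadoop connectivity' sp_configure option. The value 7 below is only illustrative; look up the value matching your distribution and restart the PolyBase services afterwards:

```sql
-- Set the Hadoop connectivity level. The value 7 is an example only;
-- consult the 'hadoop connectivity' documentation for the mapping to
-- your Hadoop distribution, then restart the PolyBase services.
sp_configure 'hadoop connectivity', 7;
RECONFIGURE;
```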

Step 4: Attach Hadoop Cluster or Azure Storage


After Setup

Compute nodes are used for scale-out query processing on external tables in HDFS

Tables on compute nodes cannot be referenced by queries submitted to head node

Number of compute nodes can be dynamically adjusted by DBA

Hadoop clusters can be shared between multiple SQL16 PolyBase groups

PolyBase T-SQL queries are submitted to the head node and can only refer to tables and/or external tables there.


- Improved PolyBase query performance with scale-out computation on external data (PolyBase scale-out groups)
- Improved PolyBase query performance with faster data movement from HDFS to SQL Server and between the PolyBase Engine and SQL Server
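Enrolling a compute node in a scale-out group uses the documented sp_polybase_join_group procedure. A hedged sketch, where the machine name, port, and instance name are placeholders:

```sql
-- Run on the SQL Server instance joining the group as a compute node.
-- Arguments: head node machine name (placeholder), the DMS control
-- channel port (16450 is the default), and the head node instance name.
EXEC sp_polybase_join_group 'HeadNodeMachine', 16450, 'MSSQLSERVER';
-- Restart the PolyBase Engine and Data Movement services afterwards.
```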


PolyBase configuration

--1: Create a master key on the database.
--   Required to encrypt the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'SQLSat#551';
-- select * from sys.symmetric_keys

-- Create a database scoped credential for Azure blob storage.
-- IDENTITY: any string (this is not used for authentication to Azure storage).
-- SECRET: your Azure storage account key.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'wasbuser',
     SECRET = '1abcdEFGb3Mcn0F9UdJS/10taXmr5L17xrEO17rlMRL8SNYg==';

Create external data source

--2: Create an external data source.
-- LOCATION: Azure storage account name and blob container name.
-- CREDENTIAL: the database scoped credential created above.
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://[email protected]',
    CREDENTIAL = AzureStorageCredential
);

-- View the list of external data sources:
SELECT * FROM sys.external_data_sources;

Create external file format

--3: Create an external file format.
-- FORMAT_TYPE: type of format in Hadoop
--   (DELIMITEDTEXT, RCFILE, ORC, PARQUET).
-- select * from sys.external_file_formats

-- With GZIP compression:
CREATE EXTERNAL FILE FORMAT TextDelimited_GZIP
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

Create external table

--4: Create an external table.
-- The external table points to data stored in Azure storage.
-- LOCATION: path to a file or directory that contains the data
--   (relative to the blob container).
-- To point to all files under the blob container, use LOCATION = '/'
CREATE EXTERNAL TABLE [dbo].[lineitem4] (
    [ROWID1] [bigint] NULL,
    [L_SHIPDATE] [smalldatetime] NOT NULL,
    [L_ORDERKEY] [bigint] NOT NULL,
    [L_DISCOUNT] [smallmoney] NOT NULL,
    ..
    [L_COMMENT] [varchar](44) NOT NULL
)
WITH (
    LOCATION = '/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

Import

-- IMPORT data from WASB into a NEW table:
SELECT *
INTO [dbo].[LINEITEM_MO_final_temp]
FROM (SELECT * FROM [dbo].[lineitem1]) AS Import;

Export data (Gzipped)

-- Enable export / INSERT into an external table:
sp_configure 'allow polybase export', 1;
RECONFIGURE;

CREATE EXTERNAL TABLE [dbo].[lineitem_export] (
    [ROWID1] [bigint] NULL,
    ..
    [L_SHIPINSTRUCT] [varchar](25) NOT NULL,
    [L_COMMENT] [varchar](44) NOT NULL
)
WITH (
    LOCATION = '/gzipped',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextDelimited_GZIP,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);

Manage external resources in SSMS / VSTS

New:
- External Tables
- External Resources: Ext. Data Sources, Ext. File Formats

PolyBase query example #1

-- SELECT on an external table (data in HDFS)
SELECT * FROM Customer
WHERE c_nationkey = 3 AND c_acctbal < 0;

A possible execution plan:
1. CREATE temp table T (executed on the compute nodes)
2. IMPORT FROM HDFS: the HDFS Customer file is read into T
3. EXECUTE QUERY: SELECT * FROM T WHERE T.c_nationkey = 3 AND T.c_acctbal < 0

Access any data

Additionally, there is:
- Support for exporting data to an external data source via INSERT INTO EXTERNAL TABLE SELECT FROM TABLE
- Support for push-down computation to Hadoop for string operations (compare, LIKE)
- Support for the ALTER EXTERNAL DATA SOURCE statement
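As a sketch of the last point, ALTER EXTERNAL DATA SOURCE lets you repoint an existing source without dropping the external tables that reference it. The data source name and address below are placeholders:

```sql
-- Hypothetical: repoint an existing Hadoop data source at a new
-- namenode address; dependent external tables keep working.
ALTER EXTERNAL DATA SOURCE MyHadoopCluster
    SET LOCATION = 'hdfs://10.10.10.10:8020';
```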


PolyBase query example #2

-- SELECT and aggregate on an external table (data in HDFS)
SELECT AVG(c_acctbal) FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey;

Execution plan:
1. Run MR job on Hadoop: apply the filter and compute the aggregate on Customer.

What happens here?
Step 1: The query optimizer compiles the predicate into Java and generates a MapReduce (MR) job.
Step 2: The engine submits the MR job to the Hadoop cluster. The output is left in hdfsTemp.

Access any data


PolyBase query example #2 (continued)

-- SELECT and aggregate on an external table (data in HDFS)
SELECT AVG(c_acctbal) FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey;

Execution plan: the predicate and aggregate are pushed into the Hadoop cluster as a MapReduce job. The query optimizer makes a cost-based decision on which operators to push.

1. Run MR job on Hadoop: apply the filter and compute the aggregate on Customer. Output left in hdfsTemp.
2. CREATE temp table T on the DW compute nodes
3. IMPORT hdfsTemp: read hdfsTemp into T
4. RETURN OPERATION: SELECT * FROM T

Access any data
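The cost-based pushdown decision can also be overridden per query with the documented query hints (note that pushdown requires RESOURCE_MANAGER_LOCATION to be set on the external data source):

```sql
-- Force the predicate/aggregate to run as a MapReduce job on Hadoop:
SELECT AVG(c_acctbal) FROM Customer
WHERE c_acctbal < 0
GROUP BY c_nationkey
OPTION (FORCE EXTERNALPUSHDOWN);

-- Or keep all processing in SQL Server and just stream the rows in:
-- OPTION (DISABLE EXTERNALPUSHDOWN)
```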


Query relational and non-relational data, on-premises and in Azure

Apps issue a single T-SQL query spanning SQL Server and Hadoop.

Capability: T-SQL for querying relational and non-relational data across SQL Server and Hadoop.

Benefits:
- New business insights across your data lake
- Leverage existing skill sets and BI tools
- Faster time to insights and a simplified ETL process

Summary: PolyBase

Query relational and non-relational data with T-SQL. Access any data.

When it comes to key BI investments, we are making it much easier to manage relational and non-relational data. PolyBase technology allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see with Hadoop is that there are not enough people knowledgeable in Hadoop and MapReduce, and this technology simplifies the skill set needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.

Monitoring PolyBase queries

Lots of new DMVs:

-- Monitoring PolyBase / all DMVs:
SELECT * FROM sys.external_tables;
SELECT * FROM sys.external_data_sources;
SELECT * FROM sys.external_file_formats;

SELECT * FROM sys.dm_exec_compute_node_errors;
SELECT * FROM sys.dm_exec_compute_node_status;
SELECT * FROM sys.dm_exec_compute_nodes;
SELECT * FROM sys.dm_exec_distributed_request_steps;
SELECT * FROM sys.dm_exec_dms_services;

SELECT * FROM sys.dm_exec_distributed_requests;
SELECT * FROM sys.dm_exec_distributed_sql_requests;
SELECT * FROM sys.dm_exec_dms_workers;
SELECT * FROM sys.dm_exec_external_operations;
SELECT * FROM sys.dm_exec_external_work;

Find the longest running query

SELECT execution_id, st.text, dr.total_elapsed_time
FROM sys.dm_exec_distributed_requests dr
CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
ORDER BY total_elapsed_time DESC;

Find the longest running step of the distributed query plan

SELECT execution_id, step_index, operation_type, distribution_type,
       location_type, status, total_elapsed_time, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID1120'
ORDER BY total_elapsed_time DESC;

Details on a step_index

SELECT execution_id, step_index, dms_step_index, compute_node_id, type,
       input_name, length, total_elapsed_time, status
FROM sys.dm_exec_external_work
WHERE execution_id = 'QID1120' AND step_index = 7
ORDER BY total_elapsed_time DESC;

Optimizations

PolyBase: data compression to minimize data movement

http://henkvandervalk.com/aps-polybase-for-hadoop-and-windows-azure-blob-storage-wasb-integration

Enable pushdown configuration (Hadoop) to improve query performance

Find the file yarn-site.xml in the installation path of SQL Server:

C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016RTM\MSSQL\Binn\Polybase\Hadoop\conf\yarn-site.xml

On the Hadoop machine, find yarn-site.xml in the Hadoop configuration directory. Copy the value of the configuration key yarn.application.classpath.

On the SQL Server machine, in the yarn-site.xml file, find the yarn.application.classpath property. Paste the value from the Hadoop machine into the value element.
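The resulting entry on the SQL Server side looks roughly like this; the classpath value shown is only a placeholder and must be replaced by the exact string copied from the Hadoop cluster:

```xml
<property>
  <name>yarn.application.classpath</name>
  <!-- Placeholder value: paste the string copied from the Hadoop
       cluster's own yarn-site.xml here. -->
  <value>$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*</value>
</property>
```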

Time to Insights: APS cybercrime video & demo!

Various sources, single query

Further Reading

Get started with PolyBase: https://msdn.microsoft.com/en-us/library/mt163689.aspx

Data compression tests: http://henkvandervalk.com/aps-polybase-for-hadoop-and-windows-azure-blob-storage-wasb-integration

Q&A [email protected]

www.henkvandervalk.com

Please fill in the evaluation forms
