Trivadis TechEvent 2016
BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Polybase challenges Hive: relational access to non-relational HDFS
Olaf Nimz
Agenda
Proposed marriage between SQL Server and Hadoop
Building Bridges to HDFS
Distributed query processing
Sensible Hybrid Scenarios
Take Home Message
1. Access to non-relational world is easier with Polybase
T-SQL only
Unstructured data is still complex, e.g. nested JSON structures
2. Hybrid solutions
Fact Extractor - IoT
Staging Area for DWH – keep entire history
Dirty data source files
Near real-time
3. Scenarios
Swiss Air - Flight Logs
Swisscom - Call Data Records
Archiving (c)old DWH Facts
Polybase
Requirements
– Java (64-bit JRE, version 7 Update 51 or later)
– Azure storage account or Hadoop (not HDInsight)
> Hortonworks Data Platform (HDP 1.3, 2.0 – 2.3)
> Cloudera’s CDH (4.3, 5.1 – 5.5)
Installation Check
– SELECT SERVERPROPERTY('IsPolyBaseInstalled'); should return 1
Configure Hadoop connectivity for the external data source
– EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7; RECONFIGURE;
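The installation check and the connectivity setting above can be run as one short script. This is a minimal sketch: the value 7 is taken from the slide, and which distribution it maps to should be checked against the PolyBase connectivity documentation for your version. After RECONFIGURE, the SQL Server service and the two PolyBase services typically have to be restarted before the setting takes effect.

```sql
-- 1 means the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Choose the Hadoop / Azure blob storage connectivity level (7 as on the slide).
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Show the configured value and the value currently in use.
EXEC sp_configure @configname = 'hadoop connectivity';
```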
Data Movement Services
| Feature | SQL Server 2016 | Azure SQL Data Warehouse | APS Appliance - PDW |
| --- | --- | --- | --- |
| Query Hadoop data with Transact-SQL | yes | no | yes |
| Query Azure blob storage with Transact-SQL | yes | yes | yes |
| Import data from Hadoop | yes | no | yes |
| Import data from Azure blob storage | yes | yes | yes |
| Export data to Hadoop | yes | no | yes |
| Export data to Azure blob storage | yes | yes | yes |
| Run PolyBase queries from Microsoft's BI tools | yes | yes | yes |
| Push down query computations to Hadoop | yes | no | yes |
Objects for Polybase
2015 © Trivadis
Define external objects
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo';
CREATE DATABASE SCOPED CREDENTIAL HadoopUser
WITH IDENTITY = '<hadoop_user_name>', SECRET = '<hadoop_password>';
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH ( TYPE = HADOOP,
       LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
       RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx',
       CREDENTIAL = HadoopUser );
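Since the requirements also allow an Azure storage account instead of a Hadoop cluster, the same pattern can point at blob storage. The sketch below is an assumption-laden variant, not part of the original slides: the names AzureStorageCredential and AzureStorage are hypothetical, and the container, account, and key placeholders must be filled in. It assumes the database master key has already been created.

```sql
-- Hypothetical credential holding the storage account key.
-- The IDENTITY string is not used for authentication against blob storage.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<azure_storage_account_key>';

-- Hypothetical data source; TYPE = HADOOP is also used for Azure blob storage.
-- No RESOURCE_MANAGER_LOCATION is given, so no computation is pushed down.
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH ( TYPE = HADOOP,
       LOCATION = 'wasbs://<blob_container_name>@<azure_storage_account_name>.blob.core.windows.net',
       CREDENTIAL = AzureStorageCredential );
```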
Define external objects
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE) );
CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
    [SensorKey] int NOT NULL, [CustomerKey] int NOT NULL,
    [GeographyKey] int NULL, [Speed] float NOT NULL,
    [YearMeasured] int NOT NULL )
WITH ( LOCATION = '/Demo/',
       DATA_SOURCE = HadoopCluster,
       FILE_FORMAT = TextFileFormat );
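For the "dirty data source files" scenario, an external table can be told how many malformed rows to tolerate before a query fails. The following is a sketch under assumptions: the table name CarSensor_Data_Tolerant and the threshold of 10 rows are made up for illustration, while the data source and file format are the ones defined above.

```sql
-- Hypothetical variant of the external table that tolerates a few bad rows.
CREATE EXTERNAL TABLE [dbo].[CarSensor_Data_Tolerant] (
    [SensorKey] int NOT NULL, [CustomerKey] int NOT NULL,
    [GeographyKey] int NULL, [Speed] float NOT NULL,
    [YearMeasured] int NOT NULL )
WITH ( LOCATION = '/Demo/',
       DATA_SOURCE = HadoopCluster,
       FILE_FORMAT = TextFileFormat,
       REJECT_TYPE = VALUE,   -- count rejected rows rather than a percentage
       REJECT_VALUE = 10 );   -- a query fails once more than 10 rows are rejected
```

Rows that do not match the declared column types or column count are the ones counted as rejected; the files themselves are only validated when the table is queried, not when it is created.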
Query external data
SELECT DISTINCT Insured_Customers.FirstName
     , Insured_Customers.LastName
     , Insured_Customers.YearlyIncome
     , CarSensor_Data.Speed
FROM Insured_Customers
JOIN CarSensor_Data
  ON Insured_Customers.CustomerKey = CarSensor_Data.CustomerKey
WHERE CarSensor_Data.Speed > 35
ORDER BY CarSensor_Data.Speed DESC
OPTION (FORCE EXTERNALPUSHDOWN);
-- or OPTION (DISABLE EXTERNALPUSHDOWN) to keep the computation in SQL Server
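Without a hint, the optimizer makes a cost-based decision on whether to push computation down to Hadoop, and statistics on the external table feed into that decision. A minimal sketch follows; the statistics name StatsForSensors is an arbitrary choice, and pushdown only applies when the data source defines RESOURCE_MANAGER_LOCATION.

```sql
-- Statistics on the join and filter columns of the external table.
-- Creating them requires reading the external data, so this can take a while.
CREATE STATISTICS StatsForSensors
ON [dbo].[CarSensor_Data] (CustomerKey, Speed);
```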
Export Data to Hadoop
CREATE EXTERNAL TABLE [dbo].[FastCustomers2009] ( … );
Move cold data to Hadoop/Blob while keeping it query-able via an external table:
INSERT INTO dbo.FastCustomers2009
SELECT *
FROM Insured_Customers T1
JOIN CarSensor_Data T2
ON (T1.CustomerKey = T2.CustomerKey)
WHERE T2.YearMeasured = 2009
AND T2.Speed > 40;
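An INSERT into an external table only works once export has been enabled on the instance, which is off by default. A short sketch of that prerequisite:

```sql
-- Allow INSERT INTO external tables (export to Hadoop or Azure blob storage).
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
```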
Polybase
Objects in SSMS
Dynamic Management Views
Monitor and troubleshoot PolyBase queries using the DMVs:
– Longest running queries
– Longest running step of the distributed query
– Execution progress of the longest running step
> of a SQL step
> of a DMS step
– Information about external DMS operations
View the PolyBase query plan
– XML remote query plan (node properties)
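The monitoring steps listed above map onto a handful of PolyBase DMVs. The queries below are a sketch of that drill-down; the execution ID 'QID4547' and the step indexes are placeholders that would be replaced with values returned by the preceding query.

```sql
-- 1. Longest running queries
SELECT dr.execution_id, st.text, dr.total_elapsed_time
FROM sys.dm_exec_distributed_requests AS dr
CROSS APPLY sys.dm_exec_sql_text(dr.sql_handle) AS st
ORDER BY dr.total_elapsed_time DESC;

-- 2. Longest running step of one distributed query (placeholder execution ID)
SELECT execution_id, step_index, operation_type, distribution_type,
       location_type, status, total_elapsed_time, command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID4547'
ORDER BY total_elapsed_time DESC;

-- 3a. Execution progress of a SQL step (placeholder step index)
SELECT execution_id, step_index, distribution_id, status,
       total_elapsed_time, row_count, command
FROM sys.dm_exec_distributed_sql_requests
WHERE execution_id = 'QID4547' AND step_index = 1;

-- 3b. Execution progress of a DMS step
SELECT execution_id, step_index, dms_step_index, status,
       type, bytes_processed, total_elapsed_time
FROM sys.dm_exec_dms_workers
WHERE execution_id = 'QID4547'
ORDER BY total_elapsed_time DESC;

-- 4. Information about external DMS operations (placeholder step index)
SELECT execution_id, step_index, dms_step_index, compute_node_id,
       type, input_name, length, total_elapsed_time, status
FROM sys.dm_exec_external_work
WHERE execution_id = 'QID4547' AND step_index = 7
ORDER BY total_elapsed_time DESC;
```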
JSON Format
Parse JSON text and read or modify values.
Transform arrays of JSON objects into table format.
Use any Transact SQL query on the converted JSON objects.
Format the results of Transact-SQL queries in JSON format.
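The four capabilities above correspond to the JSON functions introduced in SQL Server 2016. The sketch below uses a made-up document and made-up column names purely for illustration.

```sql
-- Hypothetical sample document.
DECLARE @doc nvarchar(max) = N'{ "name": "John", "skills": ["SQL", "C#"] }';

-- Parse JSON text and read or modify values.
SELECT ISJSON(@doc)                        AS IsValidJson, -- 1 for well-formed JSON
       JSON_VALUE(@doc, '$.name')          AS Name,        -- scalar value
       JSON_QUERY(@doc, '$.skills')        AS Skills,      -- array or object fragment
       JSON_MODIFY(@doc, '$.name', 'Jane') AS Modified;    -- modified copy of the text

-- Format the result of a query as JSON text.
SELECT FirstName, LastName
FROM (VALUES ('John', 'Smith'), ('Jane', 'Doe')) AS p(FirstName, LastName)
FOR JSON PATH;
```

JSON_MODIFY returns the changed document and leaves @doc untouched, so persisting a change needs an explicit assignment or UPDATE.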
JSON
Parse «unstructured» JSON cell content
stored in the jsonCol column:
[ { "name": "John", "skills": [ "SQL", "C#", "Azure" ] }, { "name": "Jane", "surname": "Doe" } ]
SELECT Name, Surname,
JSON_VALUE(jsonCol, '$.info.address.PostCode') as PostCode,
JSON_VALUE(jsonCol, '$.info.address."Address Line 1"') +' '+
JSON_VALUE(jsonCol, '$.info.address."Address Line 2"') as Address,
JSON_QUERY(jsonCol, '$.info.skills') as Skills
FROM PeopleCollection
WHERE ISJSON(jsonCol) > 0
AND JSON_VALUE(jsonCol, '$.info.address.town') = 'Belgrade'
AND Status = 'Active'
ORDER BY JSON_VALUE(jsonCol, '$.info.address.PostCode');
Convert «unstructured» JSON to table
DECLARE @json nvarchar(max);
SET @json = '[
{ "id" : 2, "info": { "name": "John", "surname": "Smith" }, "age": 25 },
{ "id" : 5, "info": { "name": "Jane", "surname": "Smith" }, "dob": "2005-11-04T12:00:00" }
]'
SELECT *
FROM OPENJSON(@json)
WITH (id int 'strict $.id',
firstName nvarchar(50) '$.info.name', lastName nvarchar(50) '$.info.surname',
age int, dateOfBirth datetime2 '$.dob')
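Nested structures, which the take-home message calls out as still complex, need a second OPENJSON call per nesting level. The following sketch reuses the shape of the earlier sample with the skills array; the aliases p and s are arbitrary, and OPENJSON requires database compatibility level 130 or higher.

```sql
DECLARE @people nvarchar(max) = N'[
  { "name": "John", "skills": [ "SQL", "C#", "Azure" ] },
  { "name": "Jane", "surname": "Doe" }
]';

-- AS JSON keeps the nested array as JSON text so it can be expanded again.
SELECT p.name, p.surname, s.value AS skill
FROM OPENJSON(@people)
     WITH ( name    nvarchar(50)  '$.name',
            surname nvarchar(50)  '$.surname',
            skills  nvarchar(max) '$.skills' AS JSON ) AS p
OUTER APPLY OPENJSON(p.skills) AS s; -- OUTER APPLY keeps people without skills
```

Using CROSS APPLY instead would drop rows whose skills property is missing.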
Performance Scaling
Take Home Message
1. Access to non-relational world is easier with Polybase
T-SQL only
Unstructured data is still complex, e.g. nested JSON structures
2. Hybrid solutions
Fact Extractor - IoT
Staging Area for DWH – keep entire history
Dirty data source files
Near real-time
3. Scenarios
Swiss Air - Flight Logs
Swisscom - Call Data Records
Archiving (c)old DWH Facts
Outlook
Table definition remains challenging
Push down computation
Scale-out the SQL Server side
– e.g. using an idle failover instance
See the blog post with code examples
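Scaling out the SQL Server side is done by joining further instances to a PolyBase scale-out group as compute nodes. The sketch below assumes a default instance name and the default DMS control channel port 16450; the head node machine name is a placeholder, and whether an idle failover instance is a suitable compute node is the slide's suggestion, not something this example verifies.

```sql
-- Run on each instance that should become a compute node.
EXEC sp_polybase_join_group N'<head_node_machine_name>', 16450, N'MSSQLSERVER';

-- After restarting the PolyBase engine and data movement services on that
-- node, list the members of the group from the head node.
SELECT * FROM sys.dm_exec_compute_nodes;
```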
BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
THANK YOU. Trivadis AG
Olaf Nimz
Sägereistrasse 29
8152 Glattbrugg
Tel. +41-44-808 70 20
Fax +41-44-808 70 21
www.trivadis.com