
Post on 19-Jan-2016


TRANSCRIPT

Page 1: Strategic sponsors. Silver sponsors. PolyBase – data beyond tables. Hubert Kobierzewski

Strategic sponsors

Silver sponsors

Page 2:

PolyBase – data beyond tables

Hubert Kobierzewski

Page 3:

Hubert K. Kobierzewski

• BI Consultant at Codec-dss (aka Codec Systems), over 8 years
• Specialized in Data Warehousing, ETL processes and Business Intelligence
• Ex-Developer
• MS SQL Server certified (MCDBA, MCTS, MCITP, MCSE – BI, ex-MCT)
• Member of Data Platform Advisors (internal MS group)
• Co-leader of the Warsaw PLSSUG Chapter

Page 4:

What is Big Data and why is it valuable to the business?

Evolution in the nature and use of data in the enterprise

[Chart: value to the business plotted against data complexity (variety and velocity), with data volume ranging from megabytes to petabytes. Stages shown: historical analysis, insight analysis, predictive analytics, predictive forecasting.]

Page 5:

Hadoop (some elements, relevant in this presentation)

HDFS: Distributed, scalable, fault-tolerant file system

MapReduce: A framework for writing fault-tolerant, scalable distributed applications

Hive: A relational DBMS that stores its tables in HDFS and uses MapReduce as its target execution language

Sqoop: A library and framework for moving data between HDFS and a relational DBMS

Page 6:

Hadoop alone is not the answer to all Big Data challenges

Steep learning curve, slow and inefficient

• Move HDFS into the warehouse before analysis
• Learn new skills
• Build, integrate, manage, maintain, support

[Diagram: "new" data sources (devices, web, sensor, social), HDFS (Hadoop), ETL, warehouse. Further labels: T-SQL, Hadoop ecosystem.]

Page 7:

PolyBase in the Modern Data Warehouse

Background

Research done by the Gray Systems Lab, led by Technical Fellow David DeWitt

High-level goals for PolyBase

• Seamless integration with Hadoop via regular T-SQL
• Enhancing the MPP query engine to process data coming from the Hadoop Distributed File System (HDFS)
• Fully parallelized query processing for high-performance data import and export from HDFS
• Integration with various Hadoop implementations: Hadoop on Windows Server, Hortonworks, and Cloudera

Page 8:

Prerequisites for installing PolyBase

• 64-bit SQL Server Evaluation edition
• Microsoft .NET Framework 4.0
• Oracle Java SE Runtime Environment (JRE) version 7.51 or higher
  o NOTE: Java JRE version 8 does not work.
• Minimum memory: 4 GB
• Minimum hard disk space: 2 GB

Page 9:

Using the installation wizard for PolyBase

• Run SQL Server Installation Center. (Insert SQL Server installation media and double-click Setup.exe)

• Click Installation, then click New Standalone SQL Server installation or add features

• On the feature selection page, select PolyBase Query Service for External Data.

• On the Server Configuration Page, configure the PolyBase Engine Service and PolyBase Data Movement Service to run under the same account.
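
After setup finishes, the installation can be checked and the Hadoop connectivity level set from T-SQL. This is a minimal sketch, assuming a SQL Server 2016 instance. The connectivity value 7 is only an illustrative assumption; the right value depends on the Hadoop distribution and should be taken from the product documentation.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Tell PolyBase which Hadoop distribution it will connect to.
-- The value 7 is an assumption for illustration; look up the documented mapping.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;
```

The connectivity setting typically takes effect only after SQL Server and both PolyBase services have been restarted.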

Page 10:

External tables

CREATE EXTERNAL TABLE table_name ({<column_definition>} [,...n])
{WITH (DATA_SOURCE = <data_source>,
       FILE_FORMAT = <file_format>,
       LOCATION = '<file_path>',
       [REJECT_VALUE = <value>], …)} [;]

1. DATA_SOURCE: referencing external data source
2. FILE_FORMAT: referencing external file format
3. LOCATION: path of the Hadoop file/folder
4. REJECT_VALUE, …: (optional) reject parameters

• Internal representation of data residing outside of the appliance
• Supports a wide array of data types
  o Excluding text, ntext and similar, but including binary and varbinary
• SQL permissions
  o CREATE TABLE and ALTER ANY SCHEMA
  o ALTER ANY EXTERNAL DATA SOURCE
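
A concrete statement following this syntax might look as below. It is a sketch only: the table name dbo.TwitterExternal, the columns, the HDFS path, and the objects HadoopCluster and PipeDelimitedText are hypothetical, and the data source and file format are assumed to exist already.

```sql
-- Hypothetical names throughout; the external data source (HadoopCluster)
-- and the external file format (PipeDelimitedText) must be created first.
CREATE EXTERNAL TABLE dbo.TwitterExternal
(
    UserName  NVARCHAR(50) NOT NULL,
    Location  CHAR(2)      NULL,
    Product   VARCHAR(20)  NULL,
    Sentiment SMALLINT     NULL,
    Retweets  INT          NULL
)
WITH
(
    LOCATION     = '/social_media/twitter/',  -- file or folder in HDFS
    DATA_SOURCE  = HadoopCluster,
    FILE_FORMAT  = PipeDelimitedText,
    REJECT_TYPE  = VALUE,                     -- threshold is a row count, not a percentage
    REJECT_VALUE = 10                         -- rejected-row threshold before the query fails
);

-- Once defined, the external table is queried like any other table.
SELECT TOP (10) UserName, Product, Sentiment
FROM dbo.TwitterExternal;
```

Only the metadata is stored in SQL Server; the rows stay in Hadoop and are read when the table is queried.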

Page 11:

External data sources

CREATE EXTERNAL DATA SOURCE datasource_name
{WITH (TYPE = <data_source>,
       LOCATION = '<location>',
       [JOB_TRACKER_LOCATION = '<jb_location>'])} [;]

1. TYPE: type of external data source
2. LOCATION: location of external data source
3. JOB_TRACKER_LOCATION: enabling or disabling of MapReduce job generation

• Internal representation of an external data source
  o Support of Hadoop as a data source and Windows Azure Blob Storage (WASB, formerly known as ASV)
• Enabling and disabling of split-based query processing
  o Generation of MapReduce jobs on-the-fly [fully transparent for end user]
• ALTER ANY EXTERNAL DATA SOURCE permission required
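
A filled-in example, as a sketch under stated assumptions: the name HadoopCluster and the host names and ports are hypothetical. As far as the released SQL Server 2016 syntax is concerned, the option shown on the slide as JOB_TRACKER_LOCATION is written RESOURCE_MANAGER_LOCATION, which is the spelling used here.

```sql
-- Hypothetical name and endpoints; replace them with the name node and
-- resource manager addresses of the actual cluster.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH
(
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020',
    -- Optional. Supplying it allows PolyBase to generate MapReduce jobs
    -- (push-down); omitting it disables push-down for this data source.
    RESOURCE_MANAGER_LOCATION = 'namenode.example.com:8032'
);
```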

Page 12:

External file format

CREATE EXTERNAL FILE FORMAT fileformat_name
{WITH (FORMAT_TYPE = <type>,
       [SERDE_METHOD = '<serde_method>']
       [DATA_COMPRESSION = '<compr_method>']
       [FORMAT_OPTIONS (<format_options>)])} [;]

1. FORMAT_TYPE: type of external file format
2. SERDE_METHOD: (de)serialization method [Hive RCFile]
3. DATA_COMPRESSION: compression method
4. FORMAT_OPTIONS: (optional) format options [text files]

• Internal representation of an external file format
  o Support of delimited text files, Hive RCFiles and Hive ORC
• Enabling and disabling of split-based query processing
  o Generation of MapReduce jobs on-the-fly
• ALTER ANY EXTERNAL FILE FORMAT permission required
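
Two file formats written out in full, as a sketch: the format names RcFileDefaultCodec and OrcSnappy are hypothetical, while the SerDe and codec class names are the standard Hive and Hadoop ones.

```sql
-- Hive RCFile: a (de)serialization method is required.
CREATE EXTERNAL FILE FORMAT RcFileDefaultCodec
WITH
(
    FORMAT_TYPE = RCFILE,
    SERDE_METHOD = 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe',
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.DefaultCodec'
);

-- Hive ORC: no SerDe method, compression is optional.
CREATE EXTERNAL FILE FORMAT OrcSnappy
WITH
(
    FORMAT_TYPE = ORC,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
```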

Page 13:

Format options for delimited text files

<format_options> ::= [FIELD_TERMINATOR = 'value'] [, STRING_DELIMITER = 'value'] [, DATE_FORMAT = 'value'] [, USE_TYPE_DEFAULT = 'value']

FIELD_TERMINATOR: To indicate a column delimiter

STRING_DELIMITER: To specify the delimiter for string data type fields

DATE_FORMAT: To specify a particular date format

USE_TYPE_DEFAULT: To specify how missing entries in text files are treated
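
The four options combined in one delimited-text format, as a sketch: the name PipeDelimitedText and the chosen delimiter and date pattern are assumptions for illustration.

```sql
-- Hypothetical format for pipe-separated files with double-quoted strings.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = '|',           -- column delimiter
        STRING_DELIMITER = '"',           -- delimiter around string fields
        DATE_FORMAT      = 'yyyy-MM-dd',  -- how date values are written in the files
        USE_TYPE_DEFAULT = TRUE           -- missing entries become the type's default value
    )
);
```

With USE_TYPE_DEFAULT = FALSE, missing entries are read as NULL instead of being replaced by a default such as 0 or an empty string.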

Page 14:

PolyBase – Predicate pushdown

HDFS File / Directory (Hadoop):
//hdfs/social_media/twitter
//hdfs/social_media/twitter/Daily.log

User    Location  Product  Sentiment  Rtwt  Hour  Date
Sean    CA        xbox     -1         5     2     1-8-14
Suz     WA        xbox     0          0     2     1-8-14
Audie   CO        excel    1          0     2     1-8-14
Tom     IL        sqls     1          8     2     1-8-14
Sanjay  MN        wp8      1          0     1     1-8-14
Roger   TX        ssas     1          0     23    1-8-14
Steve   AL        ssrs     1          0     23    1-7-14

Callouts: dynamic binding, column filtering, row filtering

SELECT User, Product, Sentiment
FROM Twitter_Table
WHERE Hour = Current - 1
  AND Date = Today
  AND Sentiment <= 0
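
The query on the slide is pseudocode (Current and Today are not T-SQL). A runnable variant could look like the sketch below, which assumes that Twitter_Table is an external table, that its Hour column is an integer from 0 to 23, and that its Date column has the date type.

```sql
-- User, Hour and Date are bracketed because they collide with T-SQL
-- keywords or function names. Column types are assumptions (see above).
SELECT [User], Product, Sentiment              -- column filtering
FROM Twitter_Table
WHERE [Hour] = DATEPART(HOUR, GETDATE()) - 1   -- row filtering: previous hour
  AND [Date] = CAST(GETDATE() AS date)         -- row filtering: today
  AND Sentiment <= 0;                          -- row filtering: neutral or negative
```

Like the slide's version, this returns nothing during the first hour of the day, because the previous hour then belongs to yesterday's date.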

Page 15:

Query Capabilities: Push-Down Computation

Pushing Compute

Either at the data source level or on a per-query basis using new query hints

SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus
FROM Insurance_Customer_SQL AS C -- table in SQL Server
…
OPTION (FORCE EXTERNALPUSHDOWN) -- push-down computation

CREATE EXTERNAL DATA SOURCE [ds-hdp] WITH
(
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.193.27.52:8020',
    RESOURCE_MANAGER_LOCATION = '10.193.27.52:8032'
);
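
The per-query hint also exists in the opposite direction. The sketch below keeps the computation in SQL Server for a single query; the external table dbo.SensorDataExternal and its Speed column are hypothetical.

```sql
-- dbo.SensorDataExternal is a hypothetical external table over Hadoop data.
-- The hint prevents MapReduce job generation for this query only, so the
-- rows are streamed to SQL Server and filtered there.
SELECT COUNT(*) AS FastReadings
FROM dbo.SensorDataExternal
WHERE Speed > 65
OPTION (DISABLE EXTERNALPUSHDOWN);
```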

Page 18:

PolyBase Demo

Page 19:

Use cases where PolyBase simplifies using Hadoop data

Bringing islands of Hadoop data together

Running queries against Hadoop data

Archiving data warehouse data to Hadoop (move)

Exporting relational data to Hadoop (copy)

Importing Hadoop data into a data warehouse (copy)
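
The import and export cases can be sketched in T-SQL as follows, assuming SQL Server 2016 and three hypothetical objects: the external tables dbo.TwitterExternal and dbo.TwitterArchiveExternal, and the local table dbo.TwitterLocal with a TweetDate column.

```sql
-- Import (copy): materialize Hadoop data in a regular SQL Server table.
SELECT UserName, Product, Sentiment
INTO dbo.TwitterImported
FROM dbo.TwitterExternal
WHERE Sentiment <= 0;

-- Export (copy): writing through an external table must be enabled first.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- The external table's column list is assumed to match the SELECT list.
INSERT INTO dbo.TwitterArchiveExternal
SELECT UserName, Product, Sentiment
FROM dbo.TwitterLocal
WHERE TweetDate < '2015-01-01';
```

Archiving (move) is the same export followed by deleting the exported rows from the local table.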

Page 20:

HDInsight: the MPP engine's integration method, without PolyBase

[Diagram: the MPP DWH engine (one control node, two compute nodes) and the Hadoop cluster (one name node, eight data nodes), linked by a SQOOP-based connector.]

Page 21:

HDInsight: the MPP engine's integration method, with PolyBase

[Diagram: the MPP DWH engine (one control node, two compute nodes, two DMS labels) and the Hadoop cluster (one name node, eight data nodes).]

Page 22:

Major Competitors

• Oracle since version 9i (ca. 2003)
• IBM PureData System
• Pivotal Greenplum
• Oracle BDA (Big Data Appliance)

Page 24:

Questions

Page 25:

Strategic sponsors

Silver sponsors