ai big dataconference_eugene_polonichko_azure data lake

Azure Data Lake: What is it? Why is it? Where is it? EUGENE POLONICHKO DATA PLATFORM MVP BI\DWH ARCHITECT

Upload: olga-zinkevych

Post on 21-Jan-2018


Category: Engineering



TRANSCRIPT

Page 1: Ai big dataconference_eugene_polonichko_azure data lake

Azure Data Lake: What is it? Why is it? Where is it?

EUGENE POLONICHKO

DATA PLATFORM MVP

BI\DWH ARCHITECT

Page 2: Ai big dataconference_eugene_polonichko_azure data lake

About me

Eugene Polonichko has over 7 years of experience with SQL Server. He has focused mainly on BI projects (SSAS, SSIS, Power BI, Cognos, Informatica PowerCenter, Pentaho, Tableau). Eugene is a passionate speaker and SQL community volunteer who presents regularly at PASS SQL Saturday events and local user groups around Ukraine and Europe. He is a PASS Chapter Leader and holds Data Platform MVP status.

https://www.linkedin.com/in/eugenepolonichko/

https://twitter.com/EvgenPolonichko

Page 3: Ai big dataconference_eugene_polonichko_azure data lake

Agenda

What is a Data Lake?

Architecture of Azure Data Lake

Azure Data Lake Store

Overview of Azure Data Lake Store

Compare

For big data processing

Azure Data Lake Analytics

U-SQL

Concepts

U-SQL Script Structure

Extractors

U-SQL Jobs

U-SQL catalog

Monitoring and performance of U-SQL jobs

Data Lake Analytics pricing

Page 4: Ai big dataconference_eugene_polonichko_azure data lake

Data Lake

Page 5: Ai big dataconference_eugene_polonichko_azure data lake

Data Lake

Page 6: Ai big dataconference_eugene_polonichko_azure data lake

Architecture of Azure Data Lake

Page 7: Ai big dataconference_eugene_polonichko_azure data lake

Azure Data Lake Store

Azure Data Lake Store is a hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.

Azure Data Lake Store is an Apache Hadoop file system compatible with the Hadoop Distributed File System (HDFS).

It can be accessed from Hadoop (available with an HDInsight cluster) using WebHDFS-compatible REST APIs.
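To make the WebHDFS-compatible access concrete, here is a minimal Python sketch that only builds a request URL and sends nothing. The account name `myadls`, the folder path, and the helper `webhdfs_url` are hypothetical, and the host pattern is an assumption that should be checked against the endpoint shown for your own account. A real call would also need an authorization token, which is left out here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a Data Lake Store account (no request is sent)."""
    # Assumed host pattern; verify it against your account's actual endpoint.
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1"
    # WebHDFS selects the operation through the "op" query parameter.
    query = urlencode({"op": op, **params})
    return f"{base}/{quote(path.strip('/'))}?{query}"


# Hypothetical account and folder: list the contents of /logs/2018/01.
url = webhdfs_url("myadls", "/logs/2018/01", "LISTSTATUS")
print(url)
```

LISTSTATUS and OPEN are standard WebHDFS operations, so the same helper covers listing a folder and reading a file.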

Page 8: Ai big dataconference_eugene_polonichko_azure data lake

Azure Data Lake Store

Use Cases

Store social media posts, log files, sensor data

Store corporate data such as relational databases (as flat files)

Page 9: Ai big dataconference_eugene_polonichko_azure data lake

Data Lake Storage vs Azure Storage

Data Lake Store: Optimized storage for big data analytics workloads.

Azure Storage: General-purpose object store for a wide variety of storage scenarios.

Data Lake Store: Batch, interactive, and streaming analytics; log files, etc.

Azure Storage: Any type of text or binary data, such as application back end

Data Lake Store: The account contains folders, which in turn contain data stored as files.

Azure Storage: The storage account has containers.

Data Lake Store: Optimized performance for parallel analytics workloads. High throughput and IOPS.

Azure Storage: Not optimized for analytics workloads.

Page 10: Ai big dataconference_eugene_polonichko_azure data lake

Big Data requirements

Page 11: Ai big dataconference_eugene_polonichko_azure data lake

Pricing

Transaction prices

Storage prices

Page 12: Ai big dataconference_eugene_polonichko_azure data lake

DEMO

Page 13: Ai big dataconference_eugene_polonichko_azure data lake

Azure Data Lake Analytics

Azure Data Lake Analytics is an on-demand analytics job service to simplify big data analytics. You can focus on writing, running, and managing jobs rather than on operating distributed infrastructure.

Dynamic scaling

Develop faster, debug, and optimize smarter using familiar tools

Affordable and cost effective

Works with all your Azure data

U-SQL: simple and familiar, powerful, and extensible

Page 14: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL

T-SQL + C# = U-SQL

Page 15: Ai big dataconference_eugene_polonichko_azure data lake

Concepts

Retrieve data from stored locations in rowset format

Transform the rowset(s)

Store the transformed rowset(s)
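As a rough illustration of working with rowsets, the Python sketch below reads data into a rowset, transforms it, and writes the result out. It is not U-SQL. The sample data, the column names, and the final write step are assumptions made for the example.

```python
import csv
import io

# Hypothetical input standing in for a file kept in the store.
RAW = "region,duration\nen-us,73\nen-gb,614\nen-us,30\n"

# 1. Retrieve: read the stored data into a rowset (a list of typed rows).
rowset = [
    {"region": r["region"], "duration": int(r["duration"])}
    for r in csv.DictReader(io.StringIO(RAW))
]

# 2. Transform: aggregate the duration per region.
totals = {}
for row in rowset:
    totals[row["region"]] = totals.get(row["region"], 0) + row["duration"]

# 3. Write the transformed rowset back out as CSV text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["region", "total_duration"])
for region in sorted(totals):
    writer.writerow([region, totals[region]])
result = out.getvalue()
print(result)
```

Each step consumes and produces a whole rowset, never a single record. That set-based style is what the rowset pattern encourages.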

Page 16: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL Script Structure

Script := Statement_List.

Statement_List := { [Statement] ';' }.

Statement :=
    Use_Statement
  | If_Else_Statement
  | Declare_Variable_Statement
  | Reference_Assembly_Statement
  | Deploy_Resource_Statement
  | DDL_Statement
  | Query_Statement
  | Procedure_Call
  | Import_Package_Statement
  | DML_Statement
  | Output_Statement.

Page 17: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL Script Structure

Page 18: Ai big dataconference_eugene_polonichko_azure data lake

Extractors

U-SQL built-in extractors:

Extractors.Text()

Extractors.Csv()

Extractors.Tsv()
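The sketch below is a Python stand-in for what the three built-in extractors do, not U-SQL itself. It assumes that the Csv and Tsv extractors behave like the general text extractor with the delimiter fixed to a comma or a tab. The function names and sample rows are hypothetical.

```python
import csv
import io
from functools import partial
from typing import List


def extract_text(data: str, delimiter: str = ",") -> List[List[str]]:
    """Split delimited text into rows of string fields."""
    return [row for row in csv.reader(io.StringIO(data), delimiter=delimiter)]


# Assumption: CSV and TSV extraction are the text extractor with a fixed delimiter.
extract_csv = partial(extract_text, delimiter=",")
extract_tsv = partial(extract_text, delimiter="\t")

print(extract_csv("1,Kyiv\n2,Lviv\n"))
print(extract_tsv("1\tKyiv\n2\tLviv\n"))
print(extract_text("1|Kyiv\n", delimiter="|"))
```

The last call shows why a general text extractor is useful: files with other delimiters need no separate extractor.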

Page 19: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL Jobs

[Figure: job diagram with the labels UNIT, V, and ADLAUs]

Page 20: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL Jobs

ADLAU = Azure Data Lake Analytics Unit

Parallelism N = N ADLAUs

1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
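The arithmetic behind these figures can be sketched in a few lines of Python. The 2 cores and 6 GB per ADLAU come from the slide. The `JobResources` class and the `au_hours` helper are hypothetical, and `au_hours` assumes the job holds all of its ADLAUs for the whole runtime. No price is included, since rates are not given here.

```python
from dataclasses import dataclass

# Figures from the slide: 1 ADLAU is roughly a VM with 2 cores and 6 GB of memory.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6


@dataclass(frozen=True)
class JobResources:
    """Approximate resources behind a job run with a given parallelism."""

    adlaus: int  # parallelism N = N ADLAUs

    @property
    def cores(self) -> int:
        return self.adlaus * CORES_PER_ADLAU

    @property
    def memory_gb(self) -> int:
        return self.adlaus * MEMORY_GB_PER_ADLAU

    def au_hours(self, runtime_minutes: float) -> float:
        # Assumption: the job holds all of its ADLAUs for the whole runtime.
        return self.adlaus * runtime_minutes / 60


job = JobResources(adlaus=10)
print(job.cores, job.memory_gb, job.au_hours(runtime_minutes=30))
```

Raising the parallelism only helps while the job has enough independent work to keep every ADLAU busy. Otherwise the extra units sit idle while still counting toward AU-hours.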

Page 21: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL Jobs

Page 22: Ai big dataconference_eugene_polonichko_azure data lake

U-SQL Catalog

Database

Table

Views

Procedures

Page 23: Ai big dataconference_eugene_polonichko_azure data lake

DEMO

Page 24: Ai big dataconference_eugene_polonichko_azure data lake

Monitoring

Azure Portal

Page 25: Ai big dataconference_eugene_polonichko_azure data lake

Monitoring

Visual Studio

Page 26: Ai big dataconference_eugene_polonichko_azure data lake

DEMO

Page 27: Ai big dataconference_eugene_polonichko_azure data lake

Pricing

Page 28: Ai big dataconference_eugene_polonichko_azure data lake

Links

http://www.sqlservercentral.com/stairway/142480/

https://azure.microsoft.com/en-us/solutions/data-lake/

Page 29: Ai big dataconference_eugene_polonichko_azure data lake

Questions?

Page 30: Ai big dataconference_eugene_polonichko_azure data lake

Thank you