Campus Days: Azure HDInsight Automation

#CampusDays @DoktorKermit & @regbac

Upload: kenneth-nielsen

Post on 14-Jul-2015

Agenda

Elements in a BIG DATA Project on AZURE

• Walkthrough of the elements needed

HDInsight

• Deploy through Azure Portal

• Deploy with PowerShell and Windows Azure SQL Database

• Multiple Storage Accounts and Configuration Values

• Deploy as part of your normal ETL

Elements in a BIG DATA Project on AZURE

• AZURE Account

• Storage Account

• SQL Server

• SQL Databases

• Firewall rules

• HDInsight Cluster

• Data

• Hive Scripts

• Machine Learning

Deployment via AZURE portal

Requirements

• AZURE Account

• Either a free trial

• MSDN Subscription

• Or paid subscription

• Create one here - http://azure.microsoft.com/da-dk/pricing/free-trial/

• Storage account

• The name must be lowercase
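The lowercase requirement is part of the storage account naming rule: 3 to 24 characters, lowercase letters and digits only. A minimal sketch of a check to run before creating the account; the function name Test-StorageAccountName is hypothetical, not part of any Azure module.

```powershell
# Hypothetical helper: checks a candidate name against the storage account
# naming rule (3-24 characters, lowercase letters and digits only).
function Test-StorageAccountName {
    param([Parameter(Mandatory = $true)][string]$Name)
    # -cmatch is the case-sensitive match operator; plain -match would
    # ignore case and let uppercase letters through.
    return $Name -cmatch '^[a-z0-9]{3,24}$'
}

Test-StorageAccountName -Name 'campusdays2014'   # True
Test-StorageAccountName -Name 'CampusDays2014'   # False
```

Running the check first avoids a failed deployment halfway through a longer script.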

• SQL Server

• Create it either together with a database

• Or on its own, without a database

• SQL Databases

• Easy to create: only a name, server, and subscription are needed

• Firewall Rules

• Without them, the cluster cannot reach the metastore and cluster creation fails

• HDInsight Cluster

• Needs a storage account

• Firewall rules must be set to allow all AZURE Services

• Upload files to Azure

• Use Azure Explorer

• Upload files yourself

• Import job via portal

• Ship a hard drive to Microsoft

• Demo

• Many steps

• Easy to make mistakes

• This will be done over and over again

• Is there another way to make this easier?

• YES!

• Let’s have a look at it

Let’s automate it – using PowerShell

• Using PowerShell

• Multiple scripts

• Configuration

• Why automate it?

• Reliability

• Repeatability

• Save time

• Eliminate tiresome work

• Eliminate manual work

• Manual work is bound to fail at some point

• Configuration

• Flexible

• Create and recreate

• Upload data to Cluster

• Easy to make changes to project

• Easy to test
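One way to get this flexibility is to keep every name and switch in a single configuration block that all scripts read. A minimal sketch, assuming a hashtable kept in its own file and dot-sourced by the other scripts; every key and value below is a placeholder, not taken from the demo.

```powershell
# Config.ps1 (hypothetical file): one place for the values all scripts share.
$Config = @{
    SubscriptionName = 'My Subscription'   # placeholder
    Location         = 'North Europe'      # placeholder region
    StorageAccount   = 'campusdays2014'
    Container        = 'data'
    SqlServer        = 'Campusdays2014'
    MetastoreDb      = 'HiveMetastore'
    ClusterName      = 'campusdays2014'
    ClusterNodes     = 2
}

# Fail early when a required value is missing or empty.
foreach ($key in 'StorageAccount', 'SqlServer', 'ClusterName') {
    if ([string]::IsNullOrEmpty($Config[$key])) {
        throw "Missing configuration value: $key"
    }
}
```

Changing the project then means editing one file and rerunning the scripts, which also makes a test environment cheap to create and recreate.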

Demo

• Load Data to Cluster

• Configuration

• Should files be downloaded?

• Should files be uploaded?

• Directories

• Automate download

• Unzip files

• Upload csv

• Cleanup
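The load steps above can be sketched as one function driven by two switches. This is an illustration under assumptions, not the demo script: the URL, directory, container name, and the function names Expand-ZipFiles and Invoke-DataLoad are hypothetical, and Set-AzureStorageBlobContent is assumed to come from the classic Azure PowerShell module.

```powershell
# Hypothetical settings: the URL, paths, and container name are placeholders.
$Load = @{
    DownloadFiles = $true
    UploadFiles   = $true
    SourceUrl     = 'https://example.com/data.zip'
    WorkDir       = Join-Path ([System.IO.Path]::GetTempPath()) 'hdi-load'
    Container     = 'data'
}

# Extracts every .zip file found in a directory into that same directory.
function Expand-ZipFiles {
    param([string]$Directory)
    Add-Type -AssemblyName System.IO.Compression.FileSystem
    Get-ChildItem -Path $Directory -Filter *.zip | ForEach-Object {
        [System.IO.Compression.ZipFile]::ExtractToDirectory($_.FullName, $Directory)
    }
}

# Download, unzip, upload the CSV files, then clean up the working directory.
function Invoke-DataLoad {
    param([hashtable]$Settings, $StorageContext)
    New-Item -ItemType Directory -Path $Settings.WorkDir -Force | Out-Null
    if ($Settings.DownloadFiles) {
        $zip = Join-Path $Settings.WorkDir 'data.zip'
        Invoke-WebRequest -Uri $Settings.SourceUrl -OutFile $zip
        Expand-ZipFiles -Directory $Settings.WorkDir
    }
    if ($Settings.UploadFiles) {
        Get-ChildItem -Path $Settings.WorkDir -Filter *.csv | ForEach-Object {
            Set-AzureStorageBlobContent -File $_.FullName -Container $Settings.Container `
                -Blob $_.Name -Context $StorageContext -Force | Out-Null
        }
    }
    Remove-Item -Path $Settings.WorkDir -Recurse -Force
}

# Usage (assumes a signed-in session and an existing storage account):
# $key = (Get-AzureStorageKey -StorageAccountName 'campusdays2014').Primary
# $ctx = New-AzureStorageContext -StorageAccountName 'campusdays2014' -StorageAccountKey $key
# Invoke-DataLoad -Settings $Load -StorageContext $ctx
```

Setting DownloadFiles to $false reuses files already in the working directory, which is handy when only the upload step needs to be repeated.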

Demo

• After use, clean up to save money

• Script to clean up the cluster

• Storage

• SQL Server

• SQL databases

This saves money, and we can easily recreate the objects needed
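A cleanup script along these lines could look as follows. It is a sketch only: the object names are placeholders, the function name Remove-DemoEnvironment is hypothetical, and the Remove-Azure* cmdlets are assumed to come from the classic Azure PowerShell module used elsewhere in this talk.

```powershell
# Hypothetical names; replace with the objects your deployment created.
$ClusterName    = 'campusdays2014'
$SqlServerName  = 'Campusdays2014'
$StorageAccount = 'campusdays2014'

function Remove-DemoEnvironment {
    # Cluster first: it is the part that costs money while idle.
    Remove-AzureHDInsightCluster -Name $ClusterName
    # Removing the SQL server also removes the databases on it.
    Remove-AzureSqlDatabaseServer -ServerName $SqlServerName -Force
    # Removing the storage account deletes the uploaded data as well.
    Remove-AzureStorageAccount -StorageAccountName $StorageAccount
}

# Call it once the work is done:
# Remove-DemoEnvironment
```

If the data should survive between runs, leave out the storage step and recreate only the cluster and the SQL objects next time.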

Demo

• Firewall Rule is required

• Without it, the cluster cannot reach the metastore and cluster creation fails

• Allow All Azure Services

• On the SQL Server created earlier

New-AzureSqlDatabaseServerFirewallRule `
    -ServerName Campusdays2014 `
    -AllowAllAzureServices `
    -Verbose

• Remember to run Add-AzureAccount in your PowerShell session.

• Otherwise you’ll get an error.
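A script can guard against this itself. The sketch below assumes the classic Azure PowerShell module, where Get-AzureAccount is assumed to return nothing when no account is signed in; the function name Connect-DemoAzure and the subscription name are placeholders.

```powershell
# Hypothetical wrapper; 'My Subscription' is a placeholder name.
function Connect-DemoAzure {
    param([string]$SubscriptionName = 'My Subscription')
    # Sign in only when the session has no account yet.
    if (-not (Get-AzureAccount)) {
        Add-AzureAccount | Out-Null   # opens the interactive sign-in dialog
    }
    # Make sure the following cmdlets target the intended subscription.
    Select-AzureSubscription -SubscriptionName $SubscriptionName
}

# Call it at the top of every deployment script:
# Connect-DemoAzure -SubscriptionName 'My Subscription'
```

Selecting the subscription explicitly matters when the account has more than one, for example an MSDN subscription next to a paid one.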

HDInsight the SSIS way

HDInsight as a part of your ETL

• Normal ETL on-prem

• Benefits of the Cloud

• Staying on-prem

Keep the cost down and the flexibility high

• Supports hybrid scenarios

• Run on-prem

• Create HDInsight cluster

• Do some cool stuff

• Destroy the cluster

• No need for PowerShell knowledge

HDInsight SSIS Components

• Community driven

• More than 10 SSIS components (Incl. connections)

• First step for moving to the cloud

Hadoop Versions

Demo

Questions?
