Campus Days Azure HDInsight Automation
TRANSCRIPT
#CampusDays
Agenda
Elements in a BIG DATA Project on AZURE
• Walkthrough of the elements needed
HDInsight
• Deploy through Azure Portal
• Deploy with PowerShell and Windows Azure SQL Database
• Multiple Storage Accounts and Configuration Values
• Deploy as part of your normal ETL
Elements in a BIG DATA Project on AZURE
• AZURE Account
• Storage Account
• SQL Server
• SQL Databases
• Firewall rules
• HDInsight Cluster
• Data
• Hive Scripts
• Machine Learning
Deployment via AZURE portal
Requirements
• AZURE Account
• Either a free trial
• MSDN Subscription
• Or paid subscription
• Create one here - http://azure.microsoft.com/da-dk/pricing/free-trial/
Deployment via AZURE portal
• SQL Server
• Create it either together with a database
• Or on its own, without a database
Deployment via AZURE portal
• SQL Databases
• Easy to create: only a name, server, and subscription are needed
Deployment via AZURE portal
• Firewall Rules
• Without them, the cluster cannot see the metastore and cluster creation fails
Deployment via AZURE portal
• HDInsight Cluster
• Needs a storage account
• Firewall rules must be set to allow all AZURE Services
Deployment via AZURE portal
• Upload files to Azure
• Use Azure Explorer
• Upload files yourself
• Import job via portal
• Ship hard drive to Microsoft
• Demo
Deployment via AZURE portal
• Many steps
• Easy to make mistakes
• This will be done over and over again
• Is there another way to make this easier?
• YES!
• Let’s have a look at it
Let’s automate it – using PowerShell
• Using PowerShell
• Multiple scripts
• Configuration
Let’s automate it – using PowerShell
• Why automate it?
• Reliability
• Repeatability
• Save time
• Eliminate tiresome work
• Eliminate manual work
• Manual work is bound to fail at some point
Let’s automate it – using PowerShell
• Configuration
• Flexible
• Create and recreate
• Upload data to Cluster
• Easy to make changes to project
• Easy to test
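A minimal sketch of what such a configuration could look like, assuming one PowerShell hashtable that all scripts share. Every key, every value, and the helper name Test-ProjectConfig are hypothetical; only the server name Campusdays2014 is taken from the slides.

```powershell
# Hypothetical project configuration: all keys and values are placeholders.
$Config = @{
    Location       = 'North Europe'
    StorageAccount = 'campusdaysstorage'
    Container      = 'data'
    SqlServerName  = 'Campusdays2014'   # server name used on the firewall-rule slide
    MetastoreDb    = 'HiveMetastore'
    ClusterName    = 'campusdays-hdi'
    ClusterNodes   = 2
}

# Hypothetical helper: checks that every key the scripts rely on is present.
function Test-ProjectConfig {
    param([hashtable]$Config)

    $required = 'Location', 'StorageAccount', 'Container', 'SqlServerName',
                'MetastoreDb', 'ClusterName', 'ClusterNodes'
    $missing = @($required | Where-Object { -not $Config.ContainsKey($_) })

    if ($missing.Count -gt 0) {
        Write-Warning ('Missing configuration keys: ' + ($missing -join ', '))
        return $false
    }
    return $true
}
```

Keeping the values in one place is what makes create and recreate cheap: changing the cluster size or the location is then a one-line edit rather than a change in every script.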
Let’s automate it – using PowerShell
• Load Data to Cluster
• Configuration
• Shall we download files?
• Shall we upload files?
• Directories
• Automate download
• Unzip files
• Upload csv
• Cleanup
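The load steps above could be sketched roughly as follows. This is an illustration under assumptions, not the script from the talk: the function name Invoke-DataLoad, the settings keys, and the file names are hypothetical. Set-AzureStorageBlobContent comes from the classic Azure PowerShell module, and Expand-Archive needs PowerShell 5 or later.

```powershell
# Hypothetical sketch: download (optional), unzip, upload CSV files (optional), clean up.
function Invoke-DataLoad {
    param(
        [hashtable]$Settings,  # hypothetical keys: Download, Upload, SourceUrl, WorkDir, Container
        $StorageContext        # from New-AzureStorageContext; only used when Upload is $true
    )

    $zipPath    = Join-Path $Settings.WorkDir 'data.zip'
    $extractDir = Join-Path $Settings.WorkDir 'extracted'
    New-Item -ItemType Directory -Path $Settings.WorkDir -Force | Out-Null

    # "Shall we download files?"
    if ($Settings.Download) {
        Invoke-WebRequest -Uri $Settings.SourceUrl -OutFile $zipPath
    }

    # Unzip files
    Expand-Archive -Path $zipPath -DestinationPath $extractDir -Force
    $csvFiles = @(Get-ChildItem -Path $extractDir -Filter '*.csv' -Recurse -File)

    # "Shall we upload files?"
    if ($Settings.Upload) {
        foreach ($file in $csvFiles) {
            Set-AzureStorageBlobContent -File $file.FullName -Container $Settings.Container `
                -Blob $file.Name -Context $StorageContext -Force | Out-Null
        }
    }

    # Cleanup: remove the extracted copies; remove the zip only if this run downloaded it.
    $names = $csvFiles | ForEach-Object { $_.Name }
    Remove-Item -Path $extractDir -Recurse -Force
    if ($Settings.Download) { Remove-Item -Path $zipPath -Force }

    return $names
}
```

With both switches set to $false the function only unzips an existing data.zip and reports the CSV files it found, which makes it easy to test the local part without touching Azure.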
Let’s automate it – using PowerShell
• After usage – clean up -> save money
• Script to clean up cluster
• Storage
• SQL server
• SQL databases
This saves money, and we can easily recreate the objects needed.
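A cleanup script along these lines might look like the sketch below. It assumes the classic Azure PowerShell module of the time and a session where Add-AzureAccount has already been run; the function name Remove-ProjectResources and all parameter values are hypothetical.

```powershell
# Hypothetical sketch: tear down everything the project created, most expensive item first.
function Remove-ProjectResources {
    param(
        [string]$ClusterName,
        [string]$SqlServerName,
        [string[]]$DatabaseNames,
        [string]$StorageAccountName
    )

    # The running cluster is what costs money by the hour, so it goes first.
    Remove-AzureHDInsightCluster -Name $ClusterName

    # SQL databases, then the server that hosted them.
    foreach ($db in $DatabaseNames) {
        Remove-AzureSqlDatabase -ServerName $SqlServerName -DatabaseName $db -Force
    }
    Remove-AzureSqlDatabaseServer -ServerName $SqlServerName -Force

    # Deleting the storage account also deletes the uploaded data.
    Remove-AzureStorageAccount -StorageAccountName $StorageAccountName
}
```

Whether the storage account should really be removed depends on how costly it is to upload the data again; keeping it and dropping only the cluster is a common compromise.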
Let’s automate it – using PowerShell
• Firewall Rule is required
• Without it, the cluster cannot see the metastore and cluster creation fails
• Allow All Azure Services
• On the SQL Server created earlier
# Let Azure services, including the HDInsight cluster, reach the SQL server
New-AzureSqlDatabaseServerFirewallRule `
    -ServerName Campusdays2014 `
    -AllowAllAzureServices `
    -Verbose
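Once the firewall rule above is in place, creating the cluster with a Hive metastore on that SQL server could look roughly like this. It is a sketch under assumptions: the cmdlets are the classic Azure PowerShell HDInsight cmdlets as best recalled for that module version, and the function name New-ProjectCluster and all parameter values are hypothetical.

```powershell
# Hypothetical sketch: create an HDInsight cluster whose Hive metastore lives in an
# Azure SQL database. Assumes the "allow all Azure services" firewall rule already exists.
function New-ProjectCluster {
    param(
        [string]$ClusterName,
        [string]$Location,
        [int]$Nodes,
        [string]$StorageAccountName,
        [string]$ContainerName,
        [string]$SqlServerName,
        [string]$MetastoreDatabase,
        [pscredential]$SqlCredential,      # login for the SQL server
        [pscredential]$ClusterCredential   # admin login for the new cluster
    )

    # The cluster needs the key of its default storage account.
    $key = (Get-AzureStorageKey -StorageAccountName $StorageAccountName).Primary

    # Build the configuration step by step, then create the cluster from it.
    New-AzureHDInsightClusterConfig -ClusterSizeInNodes $Nodes |
        Set-AzureHDInsightDefaultStorage `
            -StorageAccountName "${StorageAccountName}.blob.core.windows.net" `
            -StorageAccountKey $key -StorageContainerName $ContainerName |
        Add-AzureHDInsightMetastore `
            -SqlAzureServerName "${SqlServerName}.database.windows.net" `
            -DatabaseName $MetastoreDatabase -Credential $SqlCredential `
            -MetastoreType HiveMetastore |
        New-AzureHDInsightCluster -Name $ClusterName -Location $Location `
            -Credential $ClusterCredential
}
```

The metastore step is the reason the firewall rule matters: during provisioning the cluster connects to the SQL database, and without the rule that connection is refused.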
Let’s automate it – using PowerShell
• Remember to run Add-AzureAccount in your PowerShell session first.
• Otherwise the Azure cmdlets will fail with an error.
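One way to avoid forgetting this step is to put a small check at the top of every script. The sketch below assumes the classic Azure PowerShell module; the function name Confirm-AzureLogin and the subscription name are hypothetical.

```powershell
# Hypothetical helper: sign in only when no Azure account is registered yet,
# then pick the subscription the scripts should work against.
function Confirm-AzureLogin {
    param([string]$SubscriptionName)

    # Get-AzureAccount is expected to return nothing when no account has been added.
    if (-not (Get-AzureAccount)) {
        Add-AzureAccount | Out-Null   # opens an interactive sign-in window
    }
    Select-AzureSubscription -SubscriptionName $SubscriptionName
}

# Example call (placeholder subscription name):
# Confirm-AzureLogin -SubscriptionName 'My Subscription'
```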
HDInsight as a part of your ETL
• Normal ETL on-prem
• Benefits of the Cloud
• Staying on-prem
Keep the cost down and the flexibility high
• Supports hybrid scenarios
• Run on-prem
• Create HDInsight cluster
• Do some cool stuff
• Destroy the cluster
• No need for PowerShell knowledge
HDInsight SSIS Components
• Community driven
• More than 10 SSIS components (Incl. connections)
• First step for moving to the cloud