hadoop on azure
DESCRIPTION
deck for SoCalCodeCamp, June 2012
TRANSCRIPT
![Page 1: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/1.jpg)
Hadoop on Azure
Lynn Langit
Practitioner, Author, Instructor
June 2012, for SoCalCodeCamp
![Page 2: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/2.jpg)
Hadoop = BigData?
• HUGE Hype factor in 2011 / 2012
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
![Page 3: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/3.jpg)
Oracle Loader for Hadoop
SQL Server Connector for Hadoop
![Page 4: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/4.jpg)
Flavors of NoSQL
![Page 5: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/5.jpg)
Column Database
Wide, sparse column sets
![Page 6: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/6.jpg)
RDBMS vs. Hadoop

| | Traditional RDBMS | Hadoop |
|---|---|---|
| Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes) |
| Access | Interactive and Batch | Batch – NOT Interactive |
| Updates | Read / Write many times | Write once, Read many times |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| Query Response Time | Can be near immediate | Has latency (due to batch processing) |
![Page 7: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/7.jpg)
What about the cloud?
![Page 8: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/8.jpg)
The reality…two pivots
Storage Methods
• SQL (RDBMS)
• Hadoop
Storage Locations
• On premises
• Cloud-hosted
![Page 9: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/9.jpg)
Demo - Setting up Your Cluster
![Page 10: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/10.jpg)
Cluster Allocation Process
![Page 11: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/11.jpg)
Working with Hadoop on Azure
Tools / Languages
• MapReduce
  – Map (query/format)
  – Reduce (aggregate)
  – plug-in for Eclipse (Java)
  – JavaScript
  – C# Streaming
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
  – Mahout (analyze)
  – R (analyze)
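The Map (query/format) and Reduce (aggregate) roles listed above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and `shuffle` stands in for the grouping step that the framework would normally perform between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: format one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (a stand-in for what the framework does)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read files from the cluster.
    print(word_count(["big data big hype", "data on azure"]))
```

On a real cluster the map and reduce functions run on many nodes in parallel; the sketch only shows how the two roles divide the work.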
![Page 12: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/12.jpg)
Tasks – DBA vs. Hadoop on Azure

| RDBMS | Hadoop on Azure |
|---|---|
| Import Data | Upload Data using FTP or import via Sqoop |
| Set up Security | Set up Security |
| Scale Compute (up or out) | Add child nodes to the cluster |
| Perform a Backup | Monitor and replace failed nodes |
| Restore a Database | n/a |
| Clean up data via ETL | Execute a PIG job |
| Create an Index – query tune | Write a HIVE query (HQL) |
| Join Tables Together | Run MapReduce |
| n/a | Monitor and manage running MapReduce jobs |
| Schedule a Job | Schedule a (Cron) Job |
| Run Database Maintenance | Monitor space and resources used |
| Send an Email from SQL Server | Set up resource threshold alerts |
| Manage License costs | Manage usage time charges |
![Page 13: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/13.jpg)
Demo - Basic Administration
Open Ports
![Page 14: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/14.jpg)
Demo - Basic Administration
Connect via RDP
![Page 15: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/15.jpg)
NameNode Utility – Top Level
![Page 16: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/16.jpg)
NameNode Utility – Drill Down
![Page 17: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/17.jpg)
Demo - Basic Administration
Configure connections to remote storage
![Page 18: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/18.jpg)
Configuring Upload from AWS S3
![Page 19: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/19.jpg)
Configuring Upload from Azure
![Page 20: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/20.jpg)
Using the Azure Storage Viewer
![Page 21: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/21.jpg)
Configuring Upload from DataMarket
![Page 22: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/22.jpg)
Asking Questions = MapReduce
![Page 23: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/23.jpg)
Samples
![Page 24: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/24.jpg)
Demo - MapReduce using Java
• WordCount example using AWS S3 data
![Page 25: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/25.jpg)
Demo - MapReduce using C# Streaming
• WordCount example
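The demo itself uses C#, but the streaming contract is language-neutral: the mapper reads lines of text and writes tab-separated key/value lines, the framework sorts them by key, and the reducer reads the sorted lines and writes totals. A minimal sketch of that contract in Python follows. The names `mapper` and `reducer` are hypothetical, the functions take any iterable of lines (in an actual streaming job they would read standard input and write standard output), and `sorted()` stands in for the framework's sort step.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Illustrative input; sorted() plays the role of the framework's sort.
    sample = ["Big data big hype", "data on Azure"]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

Because the reducer relies on sorted input, it only needs to hold one key's running total at a time, which is what lets this pattern scale to large inputs.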
![Page 26: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/26.jpg)
Demo - MapReduce using JavaScript
• WordCount example
![Page 27: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/27.jpg)
Demo - Using HIVE
• WordCount example
![Page 28: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/28.jpg)
Demo - Using HIVE
![Page 29: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/29.jpg)
Monitoring Job Results
• In the portal
  – Main Console
    • Job icon (button) status summary
    • Job History
  – Interactive Console
    • JS quick feedback
    • JS detailed feedback (log)
• Using RDP
  – Map/Reduce tool
![Page 30: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/30.jpg)
Demo – Monitoring Job Status
![Page 31: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/31.jpg)
Download – ODBC for HIVE
• Includes add-in for Excel
![Page 32: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/32.jpg)
Demo - Hadoop Connector to Excel
![Page 33: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/33.jpg)
Connecting to PowerPivot
• Create an ODBC connection to HIVE
• Connect to ‘other data source’ in PowerPivot
![Page 34: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/34.jpg)
Real-World – Hadoop and…
Facebook runs on Hadoop & MySQL
Twitter runs on Hadoop (ran on FlockDb/graph)
Yahoo runs on Hadoop
LinkedIn runs on Hadoop & Voldemort
Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes
![Page 35: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/35.jpg)
Hadoop To-Do List
BigData = Hadoop
• Use Hadoop when business needs dictate
Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• Dev, test, and training environments
Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
![Page 36: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/36.jpg)
The Changing Data Landscape
Hadoop
RDBMS
Other Services
![Page 37: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/37.jpg)
TeachingKidsProgramming.org
Do a Recipe
Teach a Kid (Ages 10 ++)
SmallBasic or Java
Free Courseware (recipes)
![Page 38: Hadoop on Azure](https://reader034.vdocument.in/reader034/viewer/2022051608/54556ca4af7959755d8b4861/html5/thumbnails/38.jpg)
Toward Data Craftsmanship…
Follow me @LynnLangit
RSS my blog www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions