Software Installation Deck: Big Data Workshop, Saturday March 10th, 2012


Upload: zoe-regan

Post on 27-Mar-2015


Software Installation Deck

Big Data Workshop
Saturday March 10th, 2012


Outline

• Local Installation
– Python
– Word Count Code and Files
– R and R-Studio
– Hadoop Local Installation

• Cloud Access
– Amazon Web Services Account
– Cloud-Based Software Demos
– R and R-Studio in the Cloud
– Cloudera Virtual Manager
– Virtualization Software
– R and Hadoop: ‘rmr’


Local


Python Installation

• Mac/Linux: Python comes preinstalled, so you should be able to run it right away.

• Windows: download and install Python from the following website:

– http://www.python.org/getit/windows
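One quick way to confirm that Python runs on your machine is a short script like the one below. This is only a sketch, not part of the workshop materials; the function name `describe_python` is made up for illustration.

```python
import platform
import sys


def describe_python():
    """Return a one-line summary of the interpreter running this script."""
    version = platform.python_version()  # version string of this interpreter
    return "Python %s at %s" % (version, sys.executable)


if __name__ == "__main__":
    # If this prints a version and a path, the installation works.
    print(describe_python())
```

Save it under any file name and run it with `python` followed by that name; an error instead of a version line suggests Python is not on your PATH.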


Python Wikipedia Word Count Files

What | URL
Python Word Count Script | https://s3.amazonaws.com/com.hadoopinboston.scripts/seq.py
Very Small File: 10 lines, 251 words | https://s3.amazonaws.com/com.hadoopinboston.inputdata/input-lines
Small: 64188 lines, 1.65M words (10 MB) | https://s3.amazonaws.com/com.hadoopinboston.inputdata/input.txt
Large: 500000 lines, 12M words (76 MB) | https://s3.amazonaws.com/com.hadoopinboston.inputdata/input2.txt
Very Large: 85 million lines (8 GB) | https://s3.amazonaws.com/com.hadoopinboston.inputdata/all.txt
Mapper.py – mapper in Python | https://s3.amazonaws.com/com.hadoopinboston.scripts/mapper.py
Reducer.py – reducer in Python | https://s3.amazonaws.com/com.hadoopinboston.scripts/reducer-all.py
Mapper in R | https://s3.amazonaws.com/com.hadoopinboston.scripts/mapper.R
Reducer in R | https://s3.amazonaws.com/com.hadoopinboston.scripts/reducer.R

Vipin created the four input files of different sizes so that you can compare how long each one takes to run locally.
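The contents of seq.py are not reproduced in this deck, so the following is only a sketch of what a sequential, timed word count could look like. It assumes Python 3 and plain whitespace tokenization; the function names are illustrative and may not match the workshop script.

```python
import time
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def timed_word_count(path):
    """Count words in the file at `path`; return (counts, elapsed_seconds)."""
    start = time.time()
    # errors="replace" keeps the run going if the file has odd bytes.
    with open(path, "r", errors="replace") as handle:
        counts = count_words(handle)
    return counts, time.time() - start


if __name__ == "__main__":
    # Tiny in-memory demo; no download needed.
    sample = ["the quick brown fox", "jumps over the lazy dog"]
    print(count_words(sample).most_common(1))
```

After downloading one of the input files, calling `timed_word_count("input.txt")` returns the counts together with the elapsed seconds, which makes it easy to compare the four file sizes.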


R and R-Studio Local Installation

LOCAL INSTALLATION:

R: http://lib.stat.cmu.edu/R/CRAN/

R-Studio: http://rstudio.org/


Hadoop Installation Mac/Linux

• MacBook: install MacPorts (www.macports.org) to get Hadoop, then run: sudo port install hadoop (DONE!)

• Linux: use the yum or apt-get package manager to get Hadoop, for example: sudo yum install hadoop (your mirror should have the Hadoop binaries)

Please note that the local installation is for testing and debugging, and that ‘production’ jobs will be run on the cloud.
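For local testing and debugging, the map, sort, and reduce steps can also be imitated in a single Python process before anything is submitted to Hadoop. The sketch below follows the usual streaming word-count pattern (the mapper emits a pair per word, pairs are sorted by key, the reducer sums runs of equal keys). It is an illustration only and is not the workshop's mapper.py or reducer-all.py; all names are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit one (word, 1) pair per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(sorted_pairs):
    """Sum counts of consecutive pairs sharing a key (input must be sorted by key)."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce in one process for debugging."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for word, total in run_local(["to be or not to be"]):
        print("%s\t%d" % (word, total))
```

The `sorted` call stands in for Hadoop's shuffle phase: the reducer only works because equal keys arrive next to each other.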


Hadoop Installation Windows

• Microsoft is working with Hortonworks on contributing to the Apache Hadoop project for Windows. Microsoft is working on a Community Technology Preview for Hadoop on Windows Azure (http://hadooponazure.com) and the release for on-premises installation is forthcoming. Those interested in running Hadoop on their own Windows hardware can follow http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx to sign up for the preview when it’s available.

• TODAY, it is possible to install Hadoop on Windows, but those distributions require Cygwin, whereas the upcoming release will not. There are some instructions for Windows (see for instance http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html) that people can try.

Please note that the local installation is for testing and debugging, and that ‘production’ jobs will be run on the cloud.


Cloud


Cloud Account

• Amazon Web Services: http://aws.amazon.com/

• The first example will run through Amazon's Elastic MapReduce. It is similar in nature to:

– http://www.youtube.com/watch?v=kNsS9aDf6uE


R and R-Studio Cloud Access (No VM)

R-Studio in the Cloud:

• http://www.r-bloggers.com/rstudio-in-the-cloud-for-dummies/

R or R-Studio in the Cloud:

• http://toreopsahl.com/2011/10/17/securely-using-r-and-rstudio-on-amazons-ec2/


Virtual Manager with Hadoop

• Cloudera Hadoop Package: https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM

• There are 3 options, each for different virtualization software, one of which also needs to be installed (next slide).

• SSH Software (Windows): http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

Please note that these are 64-bit versions, and that the virtualization software requires a laptop that supports virtualization. If you are unsure, one way to check is to look in your BIOS and see whether virtualization is enabled. Most chips support virtualization; however, a handful of manufacturer-installed BIOSes do not enable it.
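On Linux, a rough pre-check is possible without rebooting into the BIOS: the CPU flags in /proc/cpuinfo include `vmx` (Intel VT-x) or `svm` (AMD-V) when the processor supports hardware virtualization. The sketch below assumes Python 3 and a Linux machine; the function names are illustrative. A flag only shows that the CPU supports virtualization, so the BIOS setting may still need to be checked.

```python
import platform


def cpu_virtualization_flags(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            # vmx = Intel VT-x, svm = AMD-V
            found.update(flag for flag in flags if flag in ("vmx", "svm"))
    return found


def is_64_bit_x86():
    """True when the machine type reported by the OS is 64-bit x86."""
    return platform.machine().lower() in ("x86_64", "amd64")


if __name__ == "__main__":
    print("64-bit x86 machine:", is_64_bit_x86())
    try:
        with open("/proc/cpuinfo") as handle:
            flags = cpu_virtualization_flags(handle.read())
        print("Virtualization flags:", sorted(flags) or "none found")
    except IOError:
        # Not Linux (or /proc is unavailable): fall back to the BIOS check.
        print("No /proc/cpuinfo here; check the BIOS instead.")
```

On Mac or Windows the script only reports the machine type and points back to the BIOS check described above.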


Virtual Manager with Hadoop

• VMware Player: Jeffrey uses this one in his session.
– http://downloads.vmware.com/d/info/desktop_end_user_computing/vmware_player/4_0

• KVM: http://www.linux-kvm.org/page/Main_Page

• VirtualBox: Jim uses this one.
– https://www.virtualbox.org/

Jeffrey will be walking through this process.


Session 6: R and Hadoop: rmr

• https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr

Jeffrey will be walking through this process.

We realize the VM and R and Hadoop parts are very detailed, and that there may be questions about other parts of the workshop.

Following the last session, we will try to hold a post-workshop help session.