Cluster Computing for $0.27/hr Using Amazon EC2 and IPython Notebook
DESCRIPTION
Cluster computing on Amazon EC2 using IPython Notebook is a cheap way to add parallel processing abilities to your Big Data workflow.
TRANSCRIPT
Cluster Computing Using IPython Notebook with Amazon EC2
http://randyzwitch.com
1. Launching EC2 Cluster Instance
1a. Launch Spot Instance – From Spot Instance menu
1b. Launch Spot Instance – Ubuntu Server 12.04 LTS for HVM Instances
1c. Launch Spot Instance – Choose cc2.8xlarge instance
1d. Launch Spot Instance – Set bid price
For the max bid, set a price that you’re comfortable paying to keep the instance running. The spot price has been pretty stable at $0.27/hr for a while.
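As a rough back-of-the-envelope check on the bid, the arithmetic can be sketched in Python. The function name is hypothetical, the default rate is the $0.27/hr figure quoted on this slide, and the estimate assumes a constant spot price while ignoring storage, data transfer, and any partial-hour billing rules.

```python
def estimate_spot_cost(hours, price_per_hour=0.27):
    """Estimate the cost in dollars of running one spot instance.

    Assumes the spot price stays constant for the whole session,
    which is a simplification: real spot prices fluctuate.
    """
    return round(hours * price_per_hour, 2)


# Hypothetical session lengths, in hours
for hours in (1, 4, 8):
    print(hours, "hours:", estimate_spot_cost(hours), "USD")
```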
1e. Launch Spot Instance – Set security
I generally open all ports but allow only my own IP address as a simple security measure, since this is a spot instance that I use for just a few hours.
1f. Launch Spot Instance – Launch
1g. Launch Spot Instance – Pick .pem keys
If you don’t specify a key pair, you can’t log in to the instance!
1h. Launch Spot Instance – Wait for fulfillment
If your spot request is fulfilled, the instance will take about 5-10 minutes to launch.
Full instructions: http://ipython.org/ipython-doc/dev/interactive/public_server.html#notebook-public-server
2. Installing & Configuring Python/IPython Using Anaconda
2a. Installing IPython – SSH into EC2 Instance
SSH into the EC2 instance, create a /temp directory, then download Anaconda (64-bit, Linux): http://continuum.io/downloads
After downloading, run the installer script to install Anaconda: bash Anaconda-1.8.0-Linux-x86_64.sh
2b. Installing IPython – Generate Password
In the IPython REPL, use the passwd() function from IPython.lib to create a hashed password. Copy the hashed password to a text editor for later use.
(No, this is not a real password to use on my EC2 instance!)
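To make the "SHA1 version" of the password less mysterious, here is a minimal standard-library sketch of what a salted SHA1 string of the form sha1:salt:digest looks like and how it can be checked. The function names are hypothetical and this is an illustration of the idea only, not IPython's own code; on the instance itself, keep using passwd() as described on the slide.

```python
import hashlib
import secrets


def hash_passphrase(passphrase, salt=None):
    """Return an illustrative 'sha1:<salt>:<hexdigest>' string."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return ":".join(("sha1", salt, digest))


def check_passphrase(stored, passphrase):
    """Check a plain-text passphrase against a stored salted hash."""
    algorithm, salt, digest = stored.split(":")
    candidate = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return secrets.compare_digest(candidate, digest)


stored = hash_passphrase("not-a-real-password")
print(stored)
print(check_passphrase(stored, "not-a-real-password"))
```

The point of the sketch is that the string kept in the configuration file is a hash, while the password typed into the browser later is the original plain-text one.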
2c. Installing IPython – Create nbserver profile
Create an IPython profile called ‘nbserver’, which we will use to run the public Notebook server
2d. Installing IPython – Generate SSL certificate
Create a self-signed SSL certificate so that we can use HTTPS on the IPython Notebook
2e. Installing IPython – Modify nbserver profile
Navigate to the profile_nbserver directory, then modify the ipython_notebook_config.py file with your certificate location and password.
Place these commands at the top of the file; you don’t need to uncomment any of the lines generated when the nbserver profile was created.
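The slide image with the exact commands is not reproduced in this transcript, so the fragment below is only a sketch of what the top of ipython_notebook_config.py might look like, based on the NotebookApp options documented for public Notebook servers of that era. The certificate path and the hash are placeholders (assumptions), and the port is chosen to match the example URL used later in this deck.

```python
# Top of ipython_notebook_config.py in the profile_nbserver directory.
# get_config() is provided by IPython when it loads this file.
c = get_config()

# Placeholder path: point this at the self-signed certificate you created
c.NotebookApp.certfile = u'/home/ubuntu/certificates/mycert.pem'

# Listen on all interfaces so the server is reachable from outside the instance
c.NotebookApp.ip = '*'

# No browser exists on the server, so don't try to open one
c.NotebookApp.open_browser = False

# Placeholder: paste the full hashed string produced by passwd() here
c.NotebookApp.password = u'sha1:<salt>:<hash>'

# Port to serve on; 8888 matches the example URL in this deck
c.NotebookApp.port = 8888
```

This fragment is only meaningful inside an IPython profile and will not run as a standalone script.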
2f. Installing IPython – Launch IPython Notebook
Launch IPython Notebook with the nbserver profile. We can now access IPython Notebook from our local browser!
3. Using IPython Notebook from Local Browser
3a. Accessing IPython Notebook – SSL Warning
Use any modern browser to open the public DNS name of your EC2 instance. A warning is expected, since we’re using a self-signed SSL certificate.
Ex: https://ec2-54-205-25-4.compute-1.amazonaws.com:8888
3b. Accessing IPython Notebook – Enter Password
Sign in using the password that you set in the earlier step (the actual password, not the SHA1 hash)
3c. Accessing IPython Notebook – Success!
At this point, you’ve got a fully functional Python cluster environment running on EC2, which you are accessing from your local browser
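A quick sanity check from a Notebook cell is to ask Python how many logical cores it can see. This is a minimal sketch using only the standard library; on the instance type used in this deck the number would be expected to match the 32 cores used in the next step, but treat that as an assumption to verify rather than a guarantee.

```python
import multiprocessing

# Number of logical cores visible to this Python process
n_cores = multiprocessing.cpu_count()
print("Logical cores available:", n_cores)
```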
3d. Accessing IPython Notebook – Use 32 cores for ML
Running a toy example from Scikit-Learn, we can specify use of 32 cores for the ExtraTreesClassifier
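The notebook shown on the slide is not reproduced in this transcript, so the following is a stand-in sketch rather than the original example. It assumes scikit-learn is installed (it ships with Anaconda) and uses a small synthetic dataset; the dataset sizes and number of trees are arbitrary illustrative choices. The relevant part is the n_jobs parameter, which asks the classifier to build its trees using 32 parallel workers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Small synthetic dataset, purely for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# n_jobs=32 requests 32 parallel workers, one per core on this instance;
# n_jobs=-1 would instead use however many cores the machine has
clf = ExtraTreesClassifier(n_estimators=128, n_jobs=32, random_state=0)
clf.fit(X, y)

# Accuracy on the training data, only to confirm the model was fitted
train_accuracy = clf.score(X, y)
print("Training accuracy:", train_accuracy)
```

On a machine with fewer than 32 cores the code still runs, but the extra workers simply share the available cores, so no additional speedup should be expected.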