cluster computing for $0.27/hr using amazon ec2 and ipython notebook

22
Cluster Computing Using IPython Notebook with Amazon EC2 http://randyzwitch.com

Upload: randy-zwitch

Post on 10-May-2015

17.652 views

Category:

Technology


0 download

DESCRIPTION

Cluster Computing on Amazon EC2 using IPython Notebook is a cheap way to add parallel processing abilities to your Big Data workflow.

TRANSCRIPT

Page 1: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

Cluster Computing Using IPython Notebook with Amazon EC2

http://randyzwitch.com

Page 2: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1. Launching EC2 Cluster Instance

Page 3: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1. Launch Spot Instance from Spot Instance Menu

http://randyzwitch.com

Page 4: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1b. Launch Spot Instance – Ubuntu Server 12.04LTS for HVM Instances

http://randyzwitch.com

Page 5: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1c. Launch Spot Instance – Choose cc2.8xl instance

http://randyzwitch.com

Page 6: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1d. Launch Spot Instance – Set bid price

http://randyzwitch.com

For max bid, set price that you’re comfortable paying to keep instance running. Cost has been pretty stable at $0.27/hr for a while

Page 7: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1e. Launch Spot Instance – Set security

http://randyzwitch.com

I generally open all ports and only allow my IP address as a simplistic security protocol, since this is a spot instance that I use for a few hours

Page 8: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1f. Launch Spot Instance – Launch

http://randyzwitch.com

Page 9: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1g. Launch Spot Instance – Pick .pem keys

http://randyzwitch.com

If you don’t specify a key pair, you can’t login to the instance!

Page 10: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

1g. Launch Spot Instance – Wait for fulfillment

http://randyzwitch.com

If your spot request is fulfilled, it will take about 5-10 minutes to launch

Page 11: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

FULL INSTRUCTIONS:HTTP://IPYTHON.ORG/IPYTHON-DOC/DEV/INTERACTIVE/PUBLIC_SERVER.HTML#NOTEBOOK-PUBLIC-SERVER

2. Installing & Configuring Python/IPython Using Anaconda

Page 12: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

2a. Installing IPython – SSH into EC2 Instance

http://randyzwitch.com

SSH into EC2 instance, create /temp directory, then download Anaconda (64-bit, Linux).http://continuum.io/downloads

Run script after downloading to install Anaconda: bash Anaconda-1.8.0-Linux-x86_64.sh

Page 13: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

2b. Installing IPython – Generate Password

http://randyzwitch.com

In IPython REPL, use the IPython.lib passwd() feature to create a password. Copy password to a text editor for later use.

(No, this is not a real password to use on my EC2 instance!)

Page 14: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

2c. Installing IPython – Create nbserver profile

http://randyzwitch.com

Create an IPython profile called ‘nbserver’, which we will use as our profile to create the public Notebook server

Page 15: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

2d. Installing IPython – Generate SSL certificate

http://randyzwitch.com

Create a self-signed SSL certificate so that we can use HTTPS on the IPython Notebook

Page 16: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

2e. Installing IPython – Modify nbserver profile

http://randyzwitch.com

Navigate to the profile_nbserver directory, then modify the ipython_notebook_config.py file with your certificate location and password.

Place these commands at the top of the file; you don’t need to uncomment any of the lines generated when nbserver profile was created.

Page 17: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

2f. Installing IPython – Launch IPython Notebook

http://randyzwitch.com

Launch IPython Notebook with the nbserver profile. At this point, we can now access IPython Notebook from our local browser!

Page 18: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

3. Using IPython Notebook from Local Browser

Page 19: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

3a. Accessing IPython Notebook – SSL Warning

http://randyzwitch.com

Use any modern browser to access the public DNS of your EC2 image. It is expected to see a warning, as we’re using a self-signed SSL certificate

Ex: https://ec2-54-205-25-4.compute-1.amazonaws.com:8888

Page 20: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

3b. Accessing IPython Notebook – Enter Password

http://randyzwitch.com

Sign in using password that you set during the prior step (the actual password, not the SHA1 version)

Page 21: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

3c. Accessing IPython Notebook – Success!

http://randyzwitch.com

At this point, you’ve got a fully functional Python clusterenvironment running on EC2, which you are accessing from your local browser

Page 22: Cluster Computing for $0.27/hr using Amazon EC2 and IPython Notebook

3c. Accessing IPython Notebook – Use 32 cores for ML

http://randyzwitch.com

Running a toy example from Scikit-Learn, we can specify use of 32 cores for the ExtraTreesClassifier