  • Setting Up a Virtual Cluster

    For ADM 201 – Hadoop Operations: Cluster Administration

  • May 2015

    PROPRIETARY AND CONFIDENTIAL INFORMATION 2 2015 MapR Technologies, Inc. All Rights Reserved.

    Setting Up a Virtual Cluster

    You will get the most out of the MapR online Hadoop training if you have access to a cluster to use for the course labs. Completing the hands-on labs will ensure that your learning time investment produces a valuable outcome. If you do not have access to a physical cluster, there are a few ways you can access clusters in a virtual environment. Two of these, Google Cloud Platform and Amazon Web Services, are presented below. Follow these instructions to set up a virtual cluster using the platform of your choice.

    Contents

    Setting Up a Virtual Cluster on GCP (Google Cloud Platform)

      Sign In to Your Google Account
      Redeem the $500 credit and apply it to your account
      Set Up Your Project
      Install the Google Cloud SDK (Linux or Mac OS)
      Install the Google Cloud SDK (Windows)
      Download and Run MapR Scripts
      SSH into the cluster
      Managing your GCP Nodes

    Setting Up a Virtual Cluster on AWS (Amazon Web Services)

      Create an AWS Account
      Configure Virtual Private Cloud (VPC) Networking
      Create AWS Virtual Machine Instances for Hadoop Installation
      Create an AWS VM Instance for NFS Access
      Log in to AWS Nodes
      Managing your AWS Nodes
      Terminating Your Instances and EBS Storage

    Setting Up a Virtual Cluster on GCP (Google Cloud Platform)

    To make getting access to a virtual lab environment as easy as possible, MapR and Google have partnered to offer each training course user a $500 credit to be used for access to the Google Cloud Platform virtual environment.

    NOTE: Support for the Google Cloud Platform is provided by Google. If you encounter difficulties during the setup or use of the Google Cloud Platform, visit https://support.google.com/cloud.

    Sign In to Your Google Account

    To set up the GCP cluster, you must be signed in to a Google account. If you already have a Google account, sign in before continuing.

    If you do not have a Google account, sign up at https://console.developers.google.com, and follow the instructions to create a new Google account.

    Redeem the $500 credit and apply it to your account

    1. Make sure you are signed in to your Google account, and point your browser to:

    https://cloud.google.com/partners/partnercredit/index?promocode=MAPR&pcn=1

    2. Click the Redeem Here button.

    3. Complete the Google Cloud Platform Starter Credit form, populating all required fields. You must enter a valid email address to which you have ready access, as you will be sent a redemption code. When the fields are complete, enter the CAPTCHA phrase and click Submit.

    4. Check your email account for an email from [email protected], and click on the Redeem now button in the email.

    5. Read and agree to the terms of service.

    6. Click the create a billing account link, which will open a new browser tab (keep the Redeem promotion code tab open while you set up the billing account).

    NOTE: You will need to provide a credit card as part of the billing account; however, you will not be billed. When your $500 credit is depleted, your instances will be paused and you'll have the option to upgrade to a paid account.

    7. Provide the required information, then click Accept and start free trial at the bottom of the page. This will automatically create a project called My First Project. Wait until the page refreshes and you see the Project Dashboard.

    8. Go back to the Redeem Promotion Code browser tab and click Continue. The Promotion Code field should populate automatically with the code that was sent to your email address.

    Click Continue, then Redeem, and then Done. This will bring you back to the Google Developers Console, in the billing view. You can view account billing any time by clicking the gear icon in the upper right and selecting Billing accounts.

    NOTE: It can take up to 48 hours for the credit to appear on your account, though it is generally much faster.

    Set Up Your Project

    Now that your account is set up, you can create a project for the MapR training labs.

    1. From the Select a project pull-down menu at the top of the screen, select Manage all projects. This will bring you to the project management view of the Developers Console.

    2. Click on the pen to the right of the My First Project line. Change the name to mapr-training, and click Rename.

    3. Click the project name to open the project dashboard. In the left-side pane, click APIs & auth and then APIs. This will take you to a list of APIs that can be enabled.

    4. In the Google Cloud APIs section, click Compute Engine API, then Enable API. Wait while the Google Compute Engine is enabled.

    5. In the left-side pane, click Overview and wait for the screen to refresh. Look for the Project ID at the top of the main pane (for example, Project ID: peak-orbit-96011). Remember the Project ID, as you will need it at a later step.

    Install the Google Cloud SDK (Linux or Mac OS)

    1. Point your browser to https://cloud.google.com/sdk/, and click the Linux/Mac OS X link.

    2. Follow the Installation and Quick Start instructions provided.

    Install the Google Cloud SDK (Windows)

    1. Point your browser to https://cloud.google.com/sdk/, and click the Alternative Methods link.

    2. Follow the instructions for Installation on Windows using Cygwin.

    3. Leave the default settings for the Cygwin installer (install from internet, root directory, install for all users, local package directory, direct connection), and pick a mirror site. This will launch a window that allows you to select Cygwin packages.

    NOTE: For these instructions to work properly, you must install the optional packages listed in the next step.

    4. For the following packages, toggle the installation setting from skip to the version number (for example, 2.1.4-1):

    o Devel > git: distributed version control system

    o Interpreters > python: python language interpreter

    o Net > curl: multi-protocol file transfer tool

    See the screenshot below for an example.

    5. Proceed with the installation and let it run until it finishes. This may take some time.

    6. Launch Cygwin from Start > All Programs > Cygwin > Cygwin Terminal.

    7. At the Cygwin command prompt enter:

    curl https://sdk.cloud.google.com | bash

    This downloads the Google Cloud SDK and installs it on your system.

    8. After the installation completes, restart the Cygwin terminal by typing exit at the prompt and re-launching Cygwin. Enter the following command at the terminal prompt to authenticate:

    gcloud auth login --no-launch-browser

    The command will generate a link, and then wait for you to enter a verification code. Follow these steps to authenticate:

    a. Copy the link that is generated by the command.

    b. Open a new web browser window and paste the link into the address bar. Grant the requested permissions. The page that displays will show your verification code.

    c. Copy the code and paste this back into the Cygwin terminal at the Enter verification code: prompt.

    Download and Run MapR Scripts

    The instructions below are the same for Linux, Mac OS X or Windows. Use Cygwin as the terminal program for Windows.

    1. Open a terminal window and download the MapR code repository that contains the scripts that spin up the preconfigured MapR clusters for you to complete your training exercises:

    $ git clone https://github.com/mapr/gce

    If the terminal prompts with a request to create a subdirectory called gce under the home directory, authorize the request.

    2. At the terminal prompt change to the gce directory, which contains the MapR scripts:

    $ cd gce

    3. At the terminal prompt, enter:

    $ unset CLOUDSDK_PYTHON

    4. At the terminal prompt, enter this command to make mapr-training the active project:

    $ gcloud config set project mapr-training

    5. If you have any anti-virus or firewall software running, disable that now.

    6. At the terminal prompt, enter the command below to launch a standard MapR cluster that can be used for training. Note the following:

    The <project-id> argument is the Project ID of your Google Cloud project (for example, peak-orbit-96011). You can obtain the Project ID from the Overview section of the Google Developers Console.

    For <cluster-name>, you can specify any name that you want.

    The command may generate some warnings of the format WARNING: Lists should be separated by commas. If these warnings appear, they can be ignored.

    Use this command to launch the cluster:

    ./launch-admin-training-cluster.sh --project <project-id> --cluster <cluster-name> --mapr-version 4.0.1 --config-file 4node_yarn.lst --image centos-6 --machine-type n1-standard-2 --persistent-disks 1x256
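    If you expect to launch the cluster more than once, it can help to keep the two values you supply in one place. The following is a minimal sketch and not part of the MapR scripts: build_launch_cmd is a hypothetical helper that only prints the launch command from this step, assuming the first flag is spelled --project, so that you can review it before running it. The Project ID is the example used in this guide, and mycluster is an arbitrary cluster name.

```shell
# Hypothetical helper (not part of the mapr/gce repository): print the
# launch command for a given Project ID and cluster name.
# All other flag values are copied from the command in this guide.
build_launch_cmd() {
  project_id="$1"
  cluster_name="$2"
  if [ -z "$project_id" ] || [ -z "$cluster_name" ]; then
    echo "usage: build_launch_cmd <project-id> <cluster-name>" >&2
    return 1
  fi
  echo "./launch-admin-training-cluster.sh" \
    "--project $project_id" \
    "--cluster $cluster_name" \
    "--mapr-version 4.0.1" \
    "--config-file 4node_yarn.lst" \
    "--image centos-6" \
    "--machine-type n1-standard-2" \
    "--persistent-disks 1x256"
}

# Example: review the printed command, then paste it at the prompt
# from inside the gce directory.
build_launch_cmd peak-orbit-96011 mycluster
```

    Printing the command instead of running it keeps the sketch safe to try, since a real launch creates billable instances.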

    SSH into the cluster

    1. Once the cluster launch has completed, return to the Google cloud management console at

    https://console.developers.google.com/project.

    2. Click on the MapR training project to open it.

    3. From the left-side menu select Compute > Compute Engine > VM Instances.

    On the screen that appears, you will see a status view showing the nodes that you just set up. To the far right, there will be a button labeled SSH. To launch a new terminal window that will perform passwordless SSH login and authentication into the MapR cluster, simply click on the appropriate SSH button.

    Managing your GCP Nodes

    Google charges by the minute for instances as long as they are running. You don't need to keep your nodes running while you are not performing tasks in the lab. You can safely stop your instances while you are not using them and then restart them when needed. This will ensure that you are only charged for time when you are using your nodes to perform lab exercises. The Public IPs of your VMs may change when stopped, but the internal IPs will remain consistent. You should check the Public IP address and note any changes, but you will not need to re-check the VM hostnames.

    1. Log into your Google account.

    2. Point your browser to https://console.developers.google.com.

    3. From the Select a project pull-down menu at the top of the page, select your project.

    4. From the left-side navigation pane, select Compute > Compute Engine > VM Instances.

    5. Check the box next to any instance(s) you wish to start or stop. At the top of the page, click Stop to stop the selected instances, or Start to start them.
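    The same stop and start actions are also available from a terminal through the Cloud SDK installed earlier. The sketch below is an illustration only: gcp_nodes is a hypothetical wrapper, and the zone us-central1-a and the instance names node1 and node2 are assumed placeholders, so substitute the values shown on your VM Instances page. By default it only prints the gcloud compute instances command; set DRY_RUN=0 to actually run it.

```shell
# Hypothetical wrapper around "gcloud compute instances start|stop".
# Usage: gcp_nodes <start|stop> <zone> <instance> [<instance> ...]
# DRY_RUN defaults to 1, so the command is printed rather than executed.
gcp_nodes() {
  if [ "$#" -lt 3 ]; then
    echo "usage: gcp_nodes <start|stop> <zone> <instance>..." >&2
    return 1
  fi
  action="$1"
  zone="$2"
  shift 2
  case "$action" in
    start|stop) ;;
    *) echo "action must be start or stop" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo gcloud compute instances "$action" "$@" --zone "$zone"
  else
    gcloud compute instances "$action" "$@" --zone "$zone"
  fi
}

# Example with an assumed zone and assumed instance names
# (prints the command only).
gcp_nodes stop us-central1-a node1 node2
```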

    When you are finished with a project, you can delete it. This will stop all billing and terminate any Compute Engine instances. To delete a project:

    1. Log into your Google account.

    2. Point your browser to https://console.developers.google.com. The dashboard should display your projects. If it does not, open the Select a project pull-down menu at the top of the page and select Manage all projects.

    3. Locate your project in the list, and click the trashcan icon on the far right.

    4. A pop-up window will appear explaining what happens when you delete a project. If you want to continue deletion, enter your Project ID in the field at the bottom, and click Delete Project.

    When a project is deleted, project owners will be notified by email. Any project owner can undo the deletion within 7 days. After the 7-day waiting period the project will be permanently deleted, including all data, billing and permissions information. After this point, the project cannot be restored.

    Setting Up a Virtual Cluster on AWS (Amazon Web Services)

    This procedure will show you how to create your lab environment in AWS for the MapR Hadoop online training. For a classroom or virtual instructor-led training session, these AWS environments will already be set up for you and your instructor will give you details on how to access your lab environment.

    Create an AWS Account

    If you already have an AWS account, you can skip to the next section. Note that to set up an AWS account, you will need to provide your email address, billing information (credit card), and a phone number where you can be contacted.

    1. Point your web browser to http://aws.amazon.com.

    2. Click the Sign Up button at the top right-hand side of the web page.

    3. Enter your email address, select the I am a new user radio button, and click Sign in using our secure server.

    4. Fill out the Login Credentials web form, and then click Continue.

    5. Fill out the Contact Information web form and click Create Account and Continue.

    6. Fill out the Payment Information web form and click Continue.

    7. Fill out the Identity Verification web form and click Call Me Now. Once you reply to the phone call using your 4-digit code from this web form, click Continue to select your Support Plan.

    8. Fill out the Support Plan web form (note you will not need support services from Amazon in order to run the labs in this class). Then click Continue.

    Your AWS account is now provisioned and you can begin setting up the virtual machines for your course.

    Configure Virtual Private Cloud (VPC) Networking

    AWS provides two types of network configurations: VPC and classic. The lab guides for MapR courses have been written using the recommended VPC network. The configuration steps are below.

    1. Point your web browser to http://aws.amazon.com.

    2. From the My Account drop-down list, select AWS Management Console.

    3. Enter your email address and password, and click Sign in using our secure server.

    4. In the Networking section of your AWS management console, click the VPC link.

    5. In the Virtual Private Cloud section of your navigation pane, click Your VPCs.

    6. Click Create VPC and fill out the web form as follows:

    a. Name tag: mapr-odt-vpc

    b. CIDR block: 10.0.0.0/16

    c. Tenancy: Default

    Then click Yes, Create.

    7. In the Virtual Private Cloud section of your navigation pane, click Subnets.

    8. Click Create Subnet and fill out the web form as follows:

    a. Name tag: mapr-odt-subnet

    b. VPC: mapr-odt-vpc

    c. Availability Zone: No Preference

    d. CIDR block: 10.0.0.0/24

    Then click Yes, Create.

    9. Select the mapr-odt-subnet checkbox and click Modify Auto-Assign Public IP.

    a. Select the Enable auto-assign Public IP checkbox.

    b. Click Save.

    10. In the Virtual Private Cloud section of your navigation pane, click Route Tables.

    11. Click Create Route Table and fill out the web form as follows:

    a. Name tag: mapr-odt-routes

    b. VPC: mapr-odt-vpc

    Then click Yes, Create.

    12. In the Virtual Private Cloud section of your navigation pane, click Internet Gateways.

    13. Click Create Internet Gateway and fill out the web form as follows:

    a. Name tag: mapr-odt-gw

    b. Click Yes, Create

    c. Select the checkbox next to the mapr-odt-gw object and click Attach to VPC

    d. From the VPC drop-down list, select mapr-odt-vpc and click Yes, Attach

    14. In the Virtual Private Cloud section of your navigation pane, click Route Tables.

    15. Select the mapr-odt-routes object and select the Routes tab. Click Edit and fill out the web form as follows:

    a. Destination: 0.0.0.0/0

    b. Target: mapr-odt-gw

    Then click Save.

    16. In the Virtual Private Cloud section of your navigation pane, click Subnets.

    17. Select the mapr-odt-subnet object and select the Route Table tab. Click Edit and fill out the web form as follows:

    a. From the Change To drop-down list, select mapr-odt-routes.

    b. Click Save.
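    The two CIDR blocks entered in this procedure determine how many addresses the lab network has. As a worked example, a /16 block spans 2^(32-16) = 65,536 addresses and a /24 block spans 2^(32-24) = 256. AWS reserves five addresses in every subnet, which leaves 251 addresses in mapr-odt-subnet for instances, far more than the lab needs. The helper name cidr_size below is hypothetical and is used only to show the arithmetic.

```shell
# Hypothetical helper: number of addresses in an IPv4 block with the
# given prefix length, i.e. 2^(32 - prefix).
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

vpc_size=$(cidr_size 16)       # 10.0.0.0/16 (mapr-odt-vpc)
subnet_size=$(cidr_size 24)    # 10.0.0.0/24 (mapr-odt-subnet)
usable=$(( subnet_size - 5 ))  # AWS reserves 5 addresses per subnet

echo "VPC addresses:    $vpc_size"
echo "Subnet addresses: $subnet_size"
echo "Usable in subnet: $usable"
```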

    Create AWS Virtual Machine Instances for Hadoop Installation

    You need to provision at least 3 virtual machines in AWS in order to complete the labs, and you can provision more if you'd prefer. More VMs will allow you to experiment with different cluster service layout plans and will give you better performance when running jobs. The VMs needed for the lab environment are not included in the Free Tier, however, and will accrue a nominal charge during the expected time to complete the lab exercises. More VMs will also result in a higher charge for their use. Read the Managing your AWS Nodes section of this guide to learn about minimizing the EC2 use charges.

    1. Point your web browser to http://aws.amazon.com.

    2. From the My Account drop-down list, select AWS Management Console.

    3. Enter your email address and password, and click Sign in using our secure server.

    4. In the Compute section of your AWS management console, click the EC2 link.

    5. From the drop-down list in the upper right-hand corner of the EC2 web page (just to the right of your name), select the region nearest to where you are physically located. Note that a region will already be selected based on the contact information you provided when you provisioned your AWS account.

    6. In the INSTANCES section of the navigation pane, click the Instances link.

    7. Click Launch Instance.

    8. In the Step 1: Choose an Amazon Machine Image web page, scroll down to the bottom of the page and select the 64-bit version of an image of Red Hat version 6.4 or 6.5. Note: Red Hat 7.0 is NOT currently supported.

    9. In the Step 2: Choose an Instance Type web page, select the checkbox for m3.large, then click Next: Configure Instance Details.

    10. In the Step 3: Configure Instance Details web page, fill out the form as follows:

    a. Number of instances: 3

    b. Purchasing option: leave Request Spot Instances unchecked

    c. Network: mapr-odt-vpc

    d. Subnet: mapr-odt-subnet

    e. Auto-assign Public IP: enabled

    f. IAM role: None

    g. Shutdown behavior: Stop

    h. Enable termination protection: Check the protect against accidental termination checkbox

    i. Monitoring: leave Enable CloudWatch detailed monitoring unchecked

    j. Tenancy: shared tenancy (multi-tenant hardware)

    Then click Next: Add Storage.

    11. In the Step 4: Add Storage web page:

    a. Click Add New Volume.

    b. Check the Delete on termination checkbox, and leave all the other defaults as they are.

    c. Repeat the above steps 2 more times to add a total of 3 EBS volumes to your instances.

    Then click Next: Tag Instance.

    12. In the Step 5: Tag Instance web page, enter mapr-install-node in the Value field. Click Next: Configure Security Group.

    13. In the Step 6: Configure Security Group web page, select the Create new security group radio button. Enter mapr-sg in the Security group name: field, and perform the following steps:

    a. Click Add Rule.

    b. From the Type drop-down list, select All TCP. From the Source drop-down list, select Anywhere.

    c. Click Add Rule.

    d. From the Type drop-down list, select All UDP. From the Source drop-down list, select Anywhere.

    e. Click Add Rule.

    f. From the Type drop-down list, select All ICMP. From the Source drop-down list, select Anywhere.

    g. Click Review and Launch.

    14. In the Step 7: Review Instance Launch web page, review your instance launch details and then click Launch.

    15. In the Select an existing key pair or create a new key pair pop-up window, perform one of the following steps:

    Select Create a new key pair and enter mapr-odt-keypair in the Key pair name: text field. Then, click the Download Key Pair button.

    OR

    Select Select an existing key pair and select the key pair from the key pair name drop-down list.

    IMPORTANT: make sure you save a copy of the new or existing key pair file in a location where you can reference it throughout your training. If you lose this file, you will lose access to your AWS instances and will have to create new ones.

    16. Click Launch Instances.

    17. In the Launch Status web page, click View Instances.

    18. Wait for the instances to appear in the running state, and for the status checks to complete.

    19. Log the IP address of the VMs for later use.

    Create an AWS VM Instance for NFS Access

    You will need to launch an instance that will serve as your NFS client. This is the simplest instance, and will qualify for Free Tier use. Use the following information to launch this instance in AWS.

    1. Point your web browser to http://aws.amazon.com.

    2. From the My Account drop-down list, select AWS Management Console.

    3. Enter your email address and password, and click Sign in using our secure server.

    4. In the Compute section of your AWS management console, click the EC2 link.

    5. In the INSTANCES section of the navigation pane, click Instances.

    6. Click Launch an Instance.

    7. In the Step 1: Choose an Amazon Machine Image web page, scroll down to the bottom of the page and select the 64-bit version of an image of Red Hat version 6.4 or 6.5. Note: Red Hat 7.0 is NOT currently supported.

    8. In the Step 2: Choose an Instance Type web page, select the checkbox for t1.micro, then click Next: Configure Instance Details.

    9. In the Step 3: Configure Instance Details web page, fill out the form as follows:

    a. Number of instances: 1

    b. Purchasing option: leave Request Spot Instances unchecked

    c. Network: mapr-odt-vpc

    d. Subnet: mapr-odt-subnet

    e. Auto-assign Public IP: enabled

    f. IAM role: None

    g. Shutdown behavior: Stop

    h. Enable termination protection: Check the protect against accidental termination checkbox

    i. Monitoring: leave Enable CloudWatch detailed monitoring unchecked

    j. Tenancy: shared tenancy (multi-tenant hardware)

    Then click Next: Add Storage.

    10. In the Step 4: Add Storage web page, click Next: Tag Instance.

    11. In the Step 5: Tag Instance web page, enter MapR-NFS-node in the Value field. Click Next: Configure Security Group.

    12. In the Step 6: Configure Security Group web page:

    a. Select the Select an existing security group radio button.

    b. Select the mapr-sg checkbox.

    c. Click Review and Launch.

    13. In the Step 7: Review Instance Launch web page, review your instance launch details and then click Launch.

    14. In the Select an existing key pair or create a new key pair pop-up window:

    a. Select the Select an existing key pair option.

    b. From the key pair name drop-down list, select the mapr-odt-keypair key pair.

    c. Click the I acknowledge that I have access to the selected private key file (name), and that without this file, I won't be able to log into my instance checkbox.

    REMINDER: you must have a copy of this key file in a location where you can reference it throughout the training. If you lose this file, you will lose access to your AWS instances and will have to create new ones.

    15. Click Launch Instances. In the Launch Status web page, click View Instances.

    16. Wait for the instances to appear in the running state, and for the status checks to complete.

    17. Log the IP address of the VMs for later use.

    Log in to AWS Nodes

    In order to log in to your AWS nodes, you will need to use the SSH key pair that you downloaded when launching your instances. There is only one account on your RHEL 6.x instance, called ec2-user, which requires using the SSH key pair to log in.

    1. Open a terminal window on your computer.

    2. Navigate to the location where the SSH key pair file is saved.

    3. Change the permission of the SSH key pair file:

    $ chmod 600 mapr-odt-keypair.pem

    4. Log in as ec2-user:

    $ ssh -i mapr-odt-keypair.pem ec2-user@<public-IP-address>

    5. Switch to user root:

    $ sudo -s

    6. Determine and log the internal IP address of the VM instance, and save the result:

    # hostname

    7. Create a mapr user on this VM and set its password:

    # useradd mapr

    # passwd mapr

    8. Set the root user password:

    # passwd root

    9. Allow password authentication to the VM:

    # vi /etc/ssh/sshd_config

    Change PasswordAuthentication no to PasswordAuthentication yes. Save the changes and exit vi.

    10. Repeat steps 6 through 8 for all VM instances, and log the hostname of each instance.

    Now you have root access on your RHEL virtual machine instance, and you can proceed with the MapR Hadoop Operations labs.
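    The sshd_config change in step 9 can also be made without opening an editor. This is a minimal sketch under stated assumptions: enable_password_auth is a hypothetical helper, it only rewrites an uncommented PasswordAuthentication no line, and it keeps a backup copy with a .bak suffix. Note that sshd has to re-read its configuration before the change takes effect, for example with service sshd restart on RHEL 6; the steps in this guide do not list that command, so verify it on your own instance.

```shell
# Hypothetical helper: switch "PasswordAuthentication no" to "yes" in the
# given sshd configuration file, keeping a backup as <file>.bak.
enable_password_auth() {
  config_file="$1"
  sed -i.bak \
    's/^PasswordAuthentication[[:space:]][[:space:]]*no[[:space:]]*$/PasswordAuthentication yes/' \
    "$config_file"
}

# As root on each VM (left commented out so nothing changes by accident):
# enable_password_auth /etc/ssh/sshd_config
# service sshd restart
```

    Taking the file path as an argument makes it possible to try the helper on a copy of the configuration before touching /etc/ssh/sshd_config.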

    Managing your AWS Nodes

    AWS charges by the hour for instances as long as they are running. You don't need to keep your nodes running while you are not performing tasks in the lab. You can safely stop your instances while you are not using them and then restart them when needed. This will ensure that you are only charged for time when you are using your nodes to perform lab exercises. The Public IPs of your VMs may change when stopped, but the internal IPs will remain consistent. You should check the Public IP address and note any changes, but you will not need to re-check the VM hostnames.

    1. Point your browser to http://aws.amazon.com.

    2. Enter your email address and password, and click Sign in using our secure server.

    3. In the Compute section of your AWS management console, click the EC2 link.

    4. In the INSTANCES section of the left-side navigation pane, click the Instances link.

    5. Select the instances that you want to stop, click on Actions, and select Stop.

    6. Click Yes, Stop.

    To restart the instances, repeat these steps but select Start in step 5. Remember, you should check the Public IP settings of your VMs and note any changes to your IP address. The internal IP addresses will remain consistent, so passwordless SSH and the Hadoop software will still function normally.
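    If you prefer a terminal to the web console, the AWS command line interface offers equivalent stop and start commands. The sketch below assumes the AWS CLI is installed and configured with your credentials, which this guide does not cover. aws_nodes is a hypothetical wrapper and the instance IDs are made-up placeholders; use the IDs shown on your Instances page. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
# Hypothetical wrapper around "aws ec2 start-instances|stop-instances".
# Usage: aws_nodes <start|stop> <instance-id> [<instance-id> ...]
# DRY_RUN defaults to 1, so the command is printed rather than executed.
aws_nodes() {
  if [ "$#" -lt 2 ]; then
    echo "usage: aws_nodes <start|stop> <instance-id>..." >&2
    return 1
  fi
  action="$1"
  shift
  case "$action" in
    start|stop) ;;
    *) echo "action must be start or stop" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo aws ec2 "${action}-instances" --instance-ids "$@"
  else
    aws ec2 "${action}-instances" --instance-ids "$@"
  fi
}

# Example with made-up instance IDs (prints the command only).
aws_nodes stop i-1234abcd i-5678efab
```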

    Terminating Your Instances and EBS Storage

    When you are finished using your AWS nodes for the class exercises, you should terminate your nodes. If you did not select the Delete on Termination box when creating the storage for your nodes, then you will also need to terminate your EBS storage volumes.

    1. Point your browser to http://aws.amazon.com.

    2. Enter your email address and password and click Sign in using our secure server.

    3. In the Compute section of your AWS management console, click the EC2 link.

    4. In the INSTANCES section of the left-side navigation pane, click the Instances link.

    5. Disable termination protection on each instance, individually. Perform these steps on each instance, one at a time:

    a. Select the instance.

    b. Click Actions and select Change Termination Protection.

    c. Select Yes, Disable.

    Repeat these steps for each instance you want to terminate.

    6. Select the instances you want to terminate. Click Actions, and select Terminate.

    7. Click Yes, Terminate.
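    The console steps above can also be sketched with the AWS command line interface, again assuming the AWS CLI is installed and configured, which this guide does not cover. Termination protection is an attribute of a single instance, so it is cleared one instance at a time before a single terminate call. terminate_nodes is a hypothetical helper and the instance IDs are made-up placeholders. Because termination cannot be undone, the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: disable termination protection on each instance
# (one call per instance), then terminate all of them in one call.
# DRY_RUN defaults to 1, so the commands are printed rather than executed.
terminate_nodes() {
  if [ "$#" -lt 1 ]; then
    echo "usage: terminate_nodes <instance-id>..." >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    run=echo
  else
    run=
  fi
  for id in "$@"; do
    $run aws ec2 modify-instance-attribute --instance-id "$id" \
      --no-disable-api-termination
  done
  $run aws ec2 terminate-instances --instance-ids "$@"
}

# Example with made-up instance IDs (prints the commands only).
terminate_nodes i-1234abcd i-5678efab
```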

    If you did not select Delete on Termination when the storage was added, follow the steps below to delete the EBS storage volumes:

    1. In the ELASTIC BLOCK STORE section of the left-side navigation pane, click the Volumes link.

    2. Select the checkbox next to the volumes you want to remove.

    3. Click Actions, and select Delete Volumes.