Practical Guide of Building a HPC Cluster



PRACTICAL GUIDE OF BUILDING A HPC CLUSTER

An Introduction

    Hafizal Yazid, Megat Harun M.A, Azraf Azman,

    Mohd Rizal M., Anwar A.R., Rafhayudi J.

    First Edition


Contents

Preface

1. Introduction
1.1 Strategy
1.2 Hardware Requirements

2. Linux Environment and Commands
2.1 File System Structure and Paths
2.2 Basic Commands

3. Preparation Steps
3.1 Operating System Installation
3.2 Ethernet Connection
3.3 Setting the Network Connection
3.4 Network Connection for the Master
3.5 Network Connection for the Nodes
3.6 Connection Testing

4. Setting up the Master and Nodes
4.1 Master
4.1.1 Edit ntsysv
4.1.2 Edit the rc.local file
4.1.3 Edit the hosts file
4.1.4 Edit the sync.sh file
4.1.5 Edit the bashrc file
4.1.6 Edit the ssh_config file
4.1.7 Edit the selinux file
4.1.8 Edit the yum.conf file
4.1.9 Download hpc.tar.gz
4.2 Nodes
4.2.1 Edit ntsysv
4.2.2 Edit the rc.local file
4.2.3 Edit the hosts file
4.2.4 Edit the ssh_config file
4.2.5 Edit the selinux file

5. Assembly of Master and Nodes
5.1 Master
5.2 Nodes
5.3 Add User

Glossary



Preface

Often a document is written by a guru who tends to detail every single aspect of the subject in order to develop a solid understanding in the reader. There are always pros and cons: a lengthy description might ruin the reader's interest, while one that is too short might deliver only superficial knowledge. A balance between the two is the aim of this reference, and the best person to strike it is a newbie. A handful of terms, a strategy, instructions and a list of important commands that have proven useful are the strengths of this document. Perhaps it can serve as a foundation that attracts readers to dwell on the subject matter.

The intended readers are those who have a problem to be solved, as I have in the simulation of radiation transport. For a researcher involved in simulation work, using a stand-alone PC can take a lengthy period of time. The advancement of the PC today enables us to group machines together to work as one unit, or cluster. There are plenty of references, which sometimes create ambiguity instead of crystal-clear information on building up a cluster. It is my hope that this document serves as a quick reference providing sufficiently clear information. Following the given steps should lead readers to their own cluster and, finally, to solving their very own problems.

CentOS 6.3, a free, open-source operating system, is used here, and dedicated simulation software is used to test the parallel computational approach.

Hafizal Yazid, PhD.

January 1, 2014.


2.2 Basic Commands

Some useful commands are given in Table 2.2 as a quick reference.

Table 2.2 Commands and descriptions

Command                  Description
ls -l                    Display a long listing of file information: permissions, links, owner, group, file size, modification date and file name.
pwd                      Display the working directory.
cd directoryname         Change from the present directory to the specified directory.
cd ..                    Move up to the parent directory.
cd /                     Move to the root directory.
mkdir directoryname      Make a new directory.
rm -rf filename          Recursively remove files and directories.
scp -r filename          Secure copy, recursively.
mv filename              Move a file. The file at the original location is not kept.
nano filename            Edit a file using the nano text editor.
pico filename            Edit a file using the pico text editor.
Ctrl-O                   Save a file.
Ctrl-X                   Exit the text editor.
q                        Quit the running process.
top -a                   Display the running processes and obtain process IDs.
kill -9 processid        Kill a process and its child processes by process ID.
cat /proc/cpuinfo        Show CPU information.
cat /proc/meminfo        Show memory information.
su                       Switch user.
ssh nodename             Log in to another node (master/slaves).
chmod 770 filename       Give the user and group read, write and execute permission; others get no access.
chown newowner filename  Change the owner of a file or directory.
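As a quick illustration, the following sequence uses top and kill together to stop a runaway process (the PID 4321 is a hypothetical example):

Command: top -a
(note the PID of the offending process, e.g. 4321, then press q to quit)
Command: kill -9 4321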

3. Preparation Steps

3.1 Operating System Installation

A Linux-based operating system, CentOS 6.3, is used in this work. It is a free, open-source operating system, currently downloadable at http://www.centos.org. Step-by-step instructions are given below:

The BIOS is set to boot from the DVD drive.

The CentOS DVD is inserted into the drive. Restart the PC.


Select Install or upgrade CentOS. Skip the media test. Select Next.

Select English for the installation process. Select U.S. English for the keyboard.

Select Basic Storage Devices. Select Yes, discard any data.

Name the computer, e.g. master.island1.mint.gov.my. Select Next.

Select the city for the time zone, e.g. Asia/Kuala Lumpur. Select Next.

Key in the root password, then confirm it. Select Next.

Select Use All Space for the installation. Select Next.

Select Write changes to disk (this formats the hard disk).

Select Software Development Workstation. Tick CentOS at the bottom. Select Next.

The dependency check runs, then CentOS 6 is installed (1477 packages).

Reboot. The Welcome page is displayed.

Select Forward.

Agree to the licence agreement. Select Forward.

The first user is created: fill in Username, Full Name, Password and Confirm Password. Select Forward.

Select Enable Kdump. Select Yes to reboot. Select OK.

By following these steps, CentOS 6.3 is installed as the operating system. The same steps are followed for the master and the nodes; the only difference is the computer name. Before proceeding with the installation, it is advisable to fix the PC names. For example:

Master: master.island1.mint.gov.my


    Nodes: node1.island1.mint.gov.my

    node2.island1.mint.gov.my

    node3.island1.mint.gov.my

Besides the PC name, communication within the cluster goes through internal addresses, one assigned to each PC. For example:

    Master: 192.168.0.1

    Node1: 192.168.0.2

    Node2: 192.168.0.3

    Node3: 192.168.0.4

3.2 Ethernet Connection

On the master, two Ethernet ports are used, namely eth0 and eth1. eth1 (the built-in port) is connected to the wall socket (outside network), and eth0 (a network card) is connected to the Fast Ethernet switch. On the nodes, the built-in Ethernet port, eth0, is connected to the Fast Ethernet switch. With this wiring, only the master is visible to the outside network.

3.3 Setting the Network Connection

If your PC has a static IP address given by your administrator, you have to use that IP in the settings. If not, your PC has a dynamic IP address (the typical case, for safety reasons). Either way, you can find your IP with the ifconfig command, as shown in Figure 3.1. In this example, the IP address is 10.10.2.137.
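On CentOS 6 the address can also be read straight off the command line; the filter below prints only the line carrying the IPv4 address (the interface name eth1 is assumed from the wiring in section 3.2):

Command: ifconfig eth1 | grep "inet addr"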

In this work we assume the PC has a dynamic IP address, as the address above might be changed by the administrator. The next step is to set up the network connection for each node and for the master. Right-clicking the network icon gives access to the connection settings.


3.4 Network Connection for the Master

Select Apply, key in the root password, Authenticate and Close.

Step-by-step instructions for System_eth1 are given below.

Select System_eth1.

Ensure the Connect automatically box is checked.

Select IPv4 Settings.

Select Automatic (DHCP) addresses only.

Key in your DNS servers (example): 10.10.150.39, 8.8.8.8

Ensure the Require IPv4 box is checked.

Ensure Available to all users is unchecked.

Select Apply, key in the root password, Authenticate and Close.

3.5 Network Connection for the Nodes

On the nodes, only the internal IP (eth0) is considered. The instructions are given below.

Select System_eth0.

Ensure the Connect automatically box is checked.

Select IPv4 Settings.

Select Manual and key in (example for node1):

Address: 192.168.0.2   Netmask: 255.255.255.0   Gateway: 192.168.0.1

Key in your DNS servers (example): 10.10.150.39, 8.8.8.8
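The same static settings can also be written directly into the interface file, which survives without the desktop tools (a sketch for node1; on CentOS 6 the file is /etc/sysconfig/network-scripts/ifcfg-eth0):

DEVICE=eth0
ONBOOT=yes            # bring the interface up at boot
BOOTPROTO=none        # static addressing, no DHCP
IPADDR=192.168.0.2
NETMASK=255.255.255.0
GATEWAY=192.168.0.1
DNS1=10.10.150.39
DNS2=8.8.8.8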


4.1.2 Edit the rc.local file

The file is opened using a text editor; in this work, the nano text editor is used.

Command: nano /etc/rc.local

You should see the line below.

touch /var/lock/subsys/local

Add the lines below.

setenforce 0                      # disable SELinux enforcement
/etc/rc.d/init.d/iptables stop    # stop the firewall

Save and exit.

Command: Ctrl-O, then Ctrl-X

4.1.3 Edit the hosts file

Command: nano /etc/hosts

You should see the lines below.

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Add the lines below.

192.168.0.1 master master.island1.mint.gov.my
192.168.0.2 node1 node1.island1.mint.gov.my
192.168.0.3 node2 node2.island1.mint.gov.my
192.168.0.4 node3 node3.island1.mint.gov.my

Save and exit.

4.1.4 Edit the sync.sh file

Command: cd /etc/cron.daily

In the cron.daily directory, create the sync.sh file.

Command: nano sync.sh

Save and exit. Then check the status of the sync.sh file.

Command: ls -l

-rw-r--r--. 1 root root sync.sh

Change the mode so that the script is executable.

Command: chmod 755 /etc/cron.daily/sync.sh

Check again.

Command: ls -l

-rwxr-xr-x. 1 root root sync.sh (shown in green)

Now edit the file.

Command: nano /etc/cron.daily/sync.sh

Add the lines below.

#!/bin/sh
# Push the account and host files from the master to every node.
for node in node1 node2 node3
do
    scp /etc/passwd $node:/etc/passwd
    scp /etc/shadow $node:/etc/shadow
    scp /etc/group $node:/etc/group
    scp /etc/hosts $node:/etc/hosts
done

Save and exit.
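Because the script sits in /etc/cron.daily it runs automatically once a day; after adding a user it can also be run by hand so the nodes pick up the change immediately (see section 5.3):

Command: sh /etc/cron.daily/sync.sh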

4.1.5 Edit the bashrc file

Command: nano /etc/bashrc

Add the lines below.

alias pico=nano
PATH=$PATH:/usr/local/maui/sbin:/usr/local/maui/bin

Save and exit.

4.1.6 Edit the ssh_config file

Command: nano /etc/ssh/ssh_config

Make the following changes:

StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

Save and exit.
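These two settings stop ssh from prompting about unknown host keys, which would otherwise stall unattended scripts such as sync.sh. Once passwordless keys are in place (see section 5.3), a quick non-interactive check is:

Command: ssh node1 hostname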


4.1.7 Edit the selinux file

Command: nano /etc/sysconfig/selinux

Set the lines below.

SELINUXTYPE=targeted
SELINUX=disabled

Save and exit.

4.1.8 Edit the yum.conf file

Command: nano /etc/yum.conf

Add the line below.

proxy=http://xxx.xx.xx:port

For example: proxy=http://10.10.150.102:8080

Save and exit.
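To confirm that yum can reach its repositories through the proxy, a simple check is (assuming the master has outside connectivity via eth1):

Command: yum clean all
Command: yum check-update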

4.1.9 Download hpc.tar.gz

Download the file into root's home directory [root@master ~].

Command: wget http://10.10.7.200/hpc.tar.gz

Alternatively, copy it from another machine:

Command: scp address:/folder/filename address:/folder/

Untar the file in the current directory.

Command: tar xf hpc.tar.gz

The hpc directory will then be available. It contains the files required for the cluster set-up. Then switch to the nodes.


4.2 Nodes

4.2.1 Edit ntsysv

Command: ntsysv

Disable abrt, cups (printing) and iptables (firewall). Enable nfs.

4.2.2 Edit the rc.local file

The master's /home is mounted on each node, for example mount 10.10.7.xxx:/home /home or mount master:/home /home.

Command: nano /etc/rc.local

You should see the line below.

touch /var/lock/subsys/local

Add the lines below.

setenforce 0                      # disable SELinux enforcement
/etc/rc.d/init.d/iptables stop    # stop the firewall
mount putehsolo:/home /home       # mount /home from the master (here named putehsolo)

Save and exit.

Command: Ctrl-O, then Ctrl-X

4.2.3 Edit the hosts file

Command: nano /etc/hosts

You should see the lines below.

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Add the lines below.

192.168.0.1 master master.island1.mint.gov.my
192.168.0.2 node1 node1.island1.mint.gov.my
192.168.0.3 node2 node2.island1.mint.gov.my
192.168.0.4 node3 node3.island1.mint.gov.my

Save and exit.

(Please ensure this file is identical to the one on the master.)

4.2.4 Edit the ssh_config file

Command: nano /etc/ssh/ssh_config

Make the following changes:

StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

Save and exit.

4.2.5 Edit the selinux file

Command: nano /etc/sysconfig/selinux

Set the lines below.

SELINUXTYPE=targeted
SELINUX=disabled

Save and exit.

5. Assembly of Master and Nodes

At this stage the machines can be assembled into a cluster through the Fast Ethernet switch. You can ping the nodes from the master, and vice versa, to check connectivity. With the ssh command you can log in to a remote computer (from the master to a node, or vice versa) and work on it remotely. For example:

Command: ssh node1

Switch back to the master using the same command:

Command: ssh master
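A quick connectivity check from the master (three echo requests to each node):

Command: ping -c 3 node1
Command: ping -c 3 node2
Command: ping -c 3 node3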

The set-up then proceeds on the master, where the hpc directory was placed earlier.

5.1 Master

Switch to root, as the installation is done as the root user.

Command: su

As the hpc directory is located in root's home directory, switch there and list the files.

Command: cd /root

Command: ls

[root@master ~]# ls


Change into the hpc directory; there are many files inside. Untar the torque-4.1.2 file.

Command: tar xf torque-4.1.2

The file is extracted.

[root@master hpc]# ls

Change to the torque-4.1.2 directory; there are many files inside it too. List them.

Command: cd torque-4.1.2

Command: ls

[root@master torque-4.1.2]# ls


Now it is time to set up torque. In this directory run configure, make and make install.

Command: ./configure

Command: make

Command: make install

Then list the contents of the directory again.

Command: ls

The torque packages are now listed.
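If the torque-package-* self-extracting scripts are not present after the build, they can be generated explicitly; make packages is a standard target of the TORQUE build system (this step is an addition, not shown in the original screenshots):

Command: make packages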


Now switch to the /home/apps directory.

Command: cd /home/apps

In this directory, make a directory named torque.

Command: mkdir torque

Copy the torque packages into this directory.

Command: scp -r /root/hpc/torque-4.1.2/torque-package* /home/apps/torque

Then copy the mpich2-1.3.2p1 file into the apps directory.

Command: scp -r /root/hpc/mpich2-1.3.2p1 /home/apps

Proceed with the mpich2-1.3.2p1 installation. Go to the apps directory and enter the mpich2-1.3.2p1 folder. In this folder, do the installation.


Command: ./configure

Command: make

Command: make install

Then move back to the torque-4.1.2 directory. Three init scripts are copied from the torque-4.1.2 directory into /etc/init.d.

Command: cp /root/hpc/torque-4.1.2/contrib/init.d/pbs_mom /etc/init.d/

Command: cp /root/hpc/torque-4.1.2/contrib/init.d/pbs_server /etc/init.d/

Command: cp /root/hpc/torque-4.1.2/contrib/init.d/trqauthd /etc/init.d/

Next, register the services.

Command: chkconfig --add pbs_server

Command: chkconfig --add pbs_mom

Command: chkconfig --add trqauthd

Then declare the nodes in the server. If the nodes file does not exist, create it.

Command: nano /var/spool/torque/server_priv/nodes

Key in the following.

master.island1.mint.gov.my np=2
node1.island1.mint.gov.my np=2
node2.island1.mint.gov.my np=2
node3.island1.mint.gov.my np=2

Save and exit.


Then check the server name.

Command: nano /var/spool/torque/server_name

Please ensure that the server name is master.island1.mint.gov.my. If it is not, key it in.

Next, set up maui.

Command: nano /usr/local/maui/maui.cfg

Key in the following.

SERVERHOST master

Save and exit.

Command: nano /etc/maui.d

Key in the following.

MAUI_PREFIX=/usr/local/maui

Save and exit.

Copy the maui.d init script from maui-3.3.1.

Command: cp /root/hpc/maui-3.3.1/etc/maui.d /etc/rc.d/init.d/

5.2 Nodes

From the master, move to node1.

Command: ssh node1

Copy hpc.tar.gz into node1's root home directory.

Command: scp master:/root/hpc.tar.gz /root/


Untar the file.

Command: tar xf hpc.tar.gz

Change the mode so that you can edit the files if necessary.

Command: chmod 755 /root/hpc

Check again.

Command: ls -l

-rwxr-xr-x. 1 root root

Now you can work with the files. Move to the hpc/torque-4.1.2 folder.

Command: cd /root/hpc/torque-4.1.2

List the files in that folder.

Command: ls

[root@node1 torque-4.1.2]# ls

Install torque-4.1.2 as previously done on the master. The torque packages will then be listed.


Now switch to the /home/apps directory.

Command: cd /home/apps

In this directory, make a directory named torque.

Command: mkdir torque

Copy the torque packages into this directory.

Command: scp -r /root/hpc/torque-4.1.2/torque-package* /home/apps/torque

Then copy the mpich2-1.3.2p1 file into the apps directory.

Command: scp -r /root/hpc/mpich2-1.3.2p1 /home/apps

Proceed with the mpich2-1.3.2p1 installation. Go to the apps directory and enter the mpich2-1.3.2p1 folder. In this folder, do the installation.


Command: ./configure

Command: make

Command: make install

At this stage torque-4.1.2 and mpich2-1.3.2p1 have been properly configured on node1. Then declare the server on the node using the config file.

Command: nano /var/spool/torque/mom_priv/config

Key in the following.

$pbsserver master.island1.mint.gov.my
$usecp *://
$logevent 255

Save and exit.

Then check the server name.

Command: nano /var/spool/torque/server_name

Please ensure that the server name is master.island1.mint.gov.my. If it is not, key it in.

Move back to the master and reboot the master and nodes for the changes to take effect.

Then start the HPC services. On the master do the following:

1. /etc/rc.d/init.d/trqauthd start

2. /etc/rc.d/init.d/pbs_server start

3. /etc/rc.d/init.d/maui.d start

4. /etc/rc.d/init.d/pbs_mom start

Then move to each node to start pbs_mom:

Command: /etc/rc.d/init.d/pbs_mom start (or run pbs_mom directly)

Or just activate it from the master with ssh nodeX pbs_mom, as in the loop sketch below.
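A minimal loop for starting pbs_mom on every node from the master in one go (node names as defined in /etc/hosts):

for node in node1 node2 node3
do
    ssh $node /etc/rc.d/init.d/pbs_mom start
done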

On the master do the following:

[root@master ~]# pbs_server -t create

[root@master ~]# qterm -t quick

Then test the server configuration.

[root@master ~]# qmgr -c "p s"

[root@master ~]# qstat -q

[root@master ~]# pbsnodes -a
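pbs_server -t create starts the server with an empty configuration, so a queue must be defined before jobs can be submitted. A minimal sketch (the queue name batch and these settings are assumptions, not part of the original text):

[root@master ~]# qmgr -c "create queue batch queue_type=execution"
[root@master ~]# qmgr -c "set queue batch started=true"
[root@master ~]# qmgr -c "set queue batch enabled=true"
[root@master ~]# qmgr -c "set server default_queue=batch"
[root@master ~]# qmgr -c "set server scheduling=true"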

5.3 Add User

A new user is added on the master using the addUser.sh script. Then copy the file named keyless.sh from the master into the user's directory and run it there, one time only, with sh keyless.sh. Example:

Command: ./keyless.sh master mtec 22

Every time a new user is added, please ensure sync.sh is run to synchronize the master and nodes.
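keyless.sh itself is not reproduced in this guide; a typical passwordless-ssh set-up of the kind it presumably performs looks like this (a hypothetical sketch, not the authors' script):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # generate a key pair with no passphrase
for node in master node1 node2 node3
do
    ssh-copy-id $node                       # install the public key on each machine
done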

At this stage the cluster is ready to be installed with any application that needs to run in parallel. Good luck.
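Before installing real applications, a final smoke test can push a trivial job through the queue (a sketch; the resource request matches the np=2 settings above, and a default queue is assumed to exist, as in the earlier qmgr sketch):

[root@master ~]# echo "sleep 30" | qsub -l nodes=4:ppn=2
[root@master ~]# qstat
[root@master ~]# pbsnodes -a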

Glossary

ssh
ssh (the SSH client) is a program for logging into a remote machine and for executing commands on a remote machine. It is intended to replace rlogin and rsh, and to provide secure encrypted communication between two untrusted hosts over an insecure network. X11 connections and arbitrary TCP/IP ports can also be forwarded over the secure channel. ssh connects and logs into the specified hostname. The user must prove his or her identity to the remote machine using one of several methods, depending on the protocol version used.

selinux
Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides the mechanism for supporting access control security policies, including United States Department of Defense-style mandatory access controls (MAC). SELinux is a set of kernel modifications and user-space tools that can be added to various Linux distributions. Its architecture strives to separate the enforcement of security decisions from the security policy itself, and streamlines the volume of software charged with security policy enforcement. The key concepts underlying SELinux can be traced to several earlier projects by the United States National Security Agency. It has been integrated into the mainline Linux kernel since version 2.6, on 8 August 2003.

yum
Yum is a package manager for RPM-based systems. If you are already using yum, you can set up the OpenVZ yum repository and install or update OpenVZ software using yum.
