Using the Weizmann Cluster
Nov 16 2015
Overview
• Weizmann Cluster
• Connection Basics
• Getting a Desktop View
• Working on cluster machines
• GPU
• For many more details, please visit: http://math96-lx/
Cluster Structure
• The Weizmann cluster
• Two types of machines:
– Workstations (math-lx): for lightweight interactive sessions (math01-lx, math02-lx, …, math13-lx)
– Cluster machines (MCluster01, MCluster03): clusters of strong machines designed for the “heavy lifting” (multiple cores, lots of RAM)
– MCluster01 includes some GPU machines (more on this later)
General Flow
1. Connect to a workstation
2. Get a GUI using VNC
3. Connect to a cluster machine via the workstation
4. Do some work
Connecting to a Workstation: Putty
• SSH : a protocol enabling a secure connection to another host
• Putty is a Windows SSH client.
• To connect* to one of the math-lx machines:
– Launch Putty
– In the Session -> Host Name field, enter e.g. math05-lx
• In the terminal window, enter user/password as required
*Assuming you’re inside the Lab
Getting a Desktop GUI: > VNCserver
• VNC allows remote connection to a desktop
• Install VNC on your Windows machine. Available clients: UltraVNC, TightVNC
• On the remote machine, start a VNC server as follows:
> vncserver :<display> -geometry <Width>x<Height>
• Example:
> vncserver :500 -geometry 1280x1024
• Without the <display> argument, a display number will be chosen automatically and VNC will output the chosen number
• Note the spaces between the arguments!
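A minimal sketch of how the vncserver command line above is put together. The helper name vnc_start_cmd and its default geometry are hypothetical (not part of VNC or of the course scripts); it only prints the command, it does not start a server.

```shell
# Hypothetical helper: print (do not run) a vncserver command line.
# $1 = display number, $2 = width (assumed default 1280), $3 = height (assumed default 1024)
vnc_start_cmd() {
    display=$1
    width=${2:-1280}
    height=${3:-1024}
    # Arguments are separated by single spaces; geometry is WIDTHxHEIGHT.
    printf 'vncserver :%s -geometry %sx%s\n' "$display" "$width" "$height"
}

vnc_start_cmd 500 1280 1024   # prints: vncserver :500 -geometry 1280x1024
```

Copy the printed line into the terminal on the workstation to actually start the server.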
VNC continued
• The <display> number allows concurrent VNC servers (e.g. for different users).
• Once the VNC server is running, run UltraVNC on your PC.
• In the “VNC Server” field, enter <your_host>:N, where N = 5900 + <display>
• Connect
• Congratulations! You just started an interactive session
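The viewer address follows from simple arithmetic on the display number. A small sketch, assuming the 5900 + display rule stated above; the function names vnc_port and vnc_address are hypothetical.

```shell
# Hypothetical helpers: compute what to type into the viewer's "VNC Server" field.
vnc_port() {
    # $1 = display number passed to vncserver (e.g. 500)
    echo $((5900 + $1))
}

vnc_address() {
    # $1 = workstation host name, $2 = display number
    printf '%s:%s\n' "$1" "$(vnc_port "$2")"
}

vnc_port 500                # prints: 6400
vnc_address math05-lx 500   # prints: math05-lx:6400
```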
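Ending a session: > vncserver -kill
• Closing a VNC viewer will not end your session!
• Instead, enter:
> vncserver -kill :N
where N is the display number you chose when starting the server (the <display> value, not the 5900 + <display> number typed into the viewer).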
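It is easy to mix up the display number with the number typed into the viewer. A hedged sketch of a helper that accepts either and prints the matching kill command; vnc_kill_cmd is a hypothetical name, and it assumes display numbers are always below 5900.

```shell
# Hypothetical helper: print (do not run) the command that ends a VNC session.
# Assumption: any value >= 5900 is a viewer port, so 5900 is subtracted.
vnc_kill_cmd() {
    n=$1
    if [ "$n" -ge 5900 ]; then
        n=$((n - 5900))
    fi
    printf 'vncserver -kill :%s\n' "$n"
}

vnc_kill_cmd 500    # prints: vncserver -kill :500
vnc_kill_cmd 6400   # prints: vncserver -kill :500
```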
Working on the Cluster
• The machines on the cluster are split into different queues, according to the memory, number of cores, etc. of the machines
• Usually you do not connect explicitly to a specific machine, but request a machine from one of the queues, depending on your needs
• To see the queue load and available queues, go to: http://math96-lx/?page_id=390
• To see load of all machines on a cluster
Connecting to a Cluster Machine
• To request a cluster machine, use:
> ssh -X mcluster01 qsh -q amd64g.q -display HOST$DISPLAY
• This does the following:
– Submits an interactive session request to be fulfilled by a machine from the specified queue, in this case amd64g.q
– Forwards the graphics to your host and display
• For a GPU machine the queue is gpu.q:
> ssh -X mcluster01 qsh -q gpu.q -display HOST$DISPLAY
• If you need a specific node, add -l hostname=n82
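A sketch of how the request line is assembled from its parts. Assumptions: HOST stands for the name of the workstation you are logged into (e.g. math05-lx), the display looks like :500, and the optional -l hostname=... goes at the end of the command. The helper name qsh_cmd is hypothetical; it only prints the command.

```shell
# Hypothetical helper: print (do not run) an interactive-session request.
# $1 = queue (e.g. amd64g.q or gpu.q)
# $2 = workstation host name (assumed meaning of HOST)
# $3 = display, including the colon (e.g. :500)
# $4 = optional node name (e.g. n82)
qsh_cmd() {
    queue=$1
    host=$2
    display=$3
    node=${4:-}
    cmd="ssh -X mcluster01 qsh -q $queue -display $host$display"
    if [ -n "$node" ]; then
        # Assumed placement: the node request is appended at the end.
        cmd="$cmd -l hostname=$node"
    fi
    printf '%s\n' "$cmd"
}

qsh_cmd amd64g.q math05-lx :500
# prints: ssh -X mcluster01 qsh -q amd64g.q -display math05-lx:500
qsh_cmd gpu.q math05-lx :500 n82
# prints: ssh -X mcluster01 qsh -q gpu.q -display math05-lx:500 -l hostname=n82
```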
GPU machines
• Currently 6 GPU machines:
– node 82 - 2 x 12GB GPU
– node 81 - 2 x 12GB GPU
– node 80 - 2 x 12GB GPU
– node 79 - 2 x 12GB GPU
– node 78 - 1 x 6GB GPU
– node 73 - 1 x 12GB GPU + 1 x 6GB GPU
512GB RAM
256GB RAM
Monitoring GPU memory
• GPU memory is shared
• In principle, as long as there’s free memory, it supports more concurrent users
• To check current usage, run nvidia-smi
• To see it continuously, use watch:
> watch -n .5 nvidia-smi
(this will update the status every 0.5 seconds)
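To decide whether a GPU still has room for another job, the used and total memory can be turned into a free-memory figure. A minimal sketch: gpu_free_mib is a hypothetical helper that expects lines of the form "index, used, total" in MiB, which is the shape produced by nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits. The sample numbers below are made up for illustration.

```shell
# Hypothetical helper: read "index, used, total" lines (MiB) and report free memory.
gpu_free_mib() {
    # Fields are separated by a comma followed by optional spaces.
    awk -F', *' '{ printf "GPU %s: %d MiB free\n", $1, $3 - $2 }'
}

# Illustration with made-up numbers (no GPU needed):
printf '0, 1024, 12288\n1, 6000, 6144\n' | gpu_free_mib
# prints:
# GPU 0: 11264 MiB free
# GPU 1: 144 MiB free

# On a GPU node, the real input would come from:
# nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | gpu_free_mib
```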
Summary
• Putty + VNC: connect to a math-lx workstation
• ssh + qsh: connect to a cluster machine
• ssh + qsh + gpu.q: connect to a GPU machine
• No need to memorize! Helpful scripts on the course website:
– http://www.wisdom.weizmann.ac.il/~vision/courses/2016_1/DNN/utils.html
• http://math96-lx is your friend!