introduction to hpc resources for bcb 660
Post on 06-Feb-2016
Introduction to HPC resources for BCB 660
Nirav Merchant
nirav@email.arizona.edu
www.iplantcollaborative.org
What is parallel computing?
General overview of HPC systems
Overview of batch systems (and why we need them)
Getting started with Ranger
Understanding the default user environment
Introduction to modules (and why we need them)
Submitting your first job (and monitoring it)
Moving your data in and out of HPC systems
Q/A
Topic Coverage
von Neumann Architecture
Named after the Hungarian mathematician John von Neumann, who first set out the general requirements for an electronic computer in his 1945 papers.
Since then, virtually all computers have followed this basic design:
  Memory (RAM)
  Control Unit (part of the CPU)
  Arithmetic Logic Unit (ALU, also part of the CPU)
  Input/Output (e.g. keyboard)
What is computing?
What does it look like (your computer)?
Image courtesy Univ. of Washington
Parallel computing: use of multiple processors or computers working together on a common task.
  Each processor works on part of the problem
  Processors can exchange information
What is Parallel Computing?
A good introduction to concepts for parallel programming is at: https://computing.llnl.gov/tutorials/parallel_comp/
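The idea that each processor works on part of the problem and the partial results are then combined can be sketched in a few lines of Python. This is only an illustration, not something from the original slides: the function names are made up, and threads stand in for processors. In CPython, threads do not speed up CPU-bound work, so real HPC codes would use separate processes or MPI instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks nearly equal contiguous pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, n_workers=4):
    """Each worker sums its own chunk; the partial sums are then combined."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Combining the partial results is the "exchange information" step.
    return sum(partial_sums)
```

Calling `parallel_sum(list(range(1, 101)))` gives the same answer as a serial `sum`, which is the property any parallel decomposition has to preserve.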
Traditional software is written to execute serially, i.e. one task at a time running on one CPU
As the size of data (tasks) increases, we need to utilize multiple CPUs
Size of data also has implications for how much RAM and disk space is required for the task (we need more RAM or disk than fits on one computer)
Why we need it
HPC systems: Not very different
Image courtesy TACC at Univ of Texas
HPC: High Performance Computing = Super Computing
Node: One self-contained computer (many of which are connected together to form a “cluster”)
CPU = Socket = Processor (terms often used interchangeably); each one contains several cores
Interconnect: networking between nodes (can be fiber optic, or regular Ethernet like your computers), e.g. InfiniBand or GigE
Some Terminology (Jargon) of HPC
Scalability: Ability to use additional resources to execute tasks faster
Embarrassingly Parallel: Data Parallel tasks where each task is independent and not much communication or coordination is required among tasks
Observed Speedup: “wall time” taken for serial task divided by wall time for parallel task
More Terminology (Jargon) of HPC
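The observed speedup defined above is a single division. A minimal sketch, assuming both wall times are measured in the same unit (the function name is illustrative, not from the slides):

```python
def observed_speedup(serial_wall_time, parallel_wall_time):
    """Wall time of the serial run divided by wall time of the parallel run."""
    if serial_wall_time < 0 or parallel_wall_time <= 0:
        raise ValueError("wall times must be positive")
    return serial_wall_time / parallel_wall_time
```

For example, a task that takes 3600 seconds serially and 300 seconds in parallel has an observed speedup of `observed_speedup(3600, 300)`, i.e. 12.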
Shared memory
  All CPUs (processors) have access to shared RAM
Distributed memory
  Each CPU (processor) has its own local memory, but can be connected to other nodes via a fast interconnect
Types of HPC
Limits of single CPU computing
  Performance
  Available memory (Disk and RAM)
Parallel computing allows one to:
  Execute tasks that don’t fit on a single CPU
  Complete tasks in a reasonable time
Again, please check https://computing.llnl.gov/tutorials/parallel_comp/ for a basic intro to parallel computing concepts
Again, why do we need it?
Compute power: 504 Teraflops
  3,936 four-socket nodes
  62,976 cores, 2.0 GHz AMD Opteron
Memory: 125 Terabytes
  2 GB/core, 32 GB/node
Disk subsystem: 1.7 PB storage (Lustre Parallel File System)
  1 PB in /work filesystem
Interconnect: 8 Gb/s InfiniBand
Lonestar and other machines have similar (much larger) specs
RANGER
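The Ranger figures fit together, as a quick back-of-the-envelope check suggests. The node count, memory per node, and clock speed come from the slide; 16 cores per node is taken from the Ranger limit on N given later in these slides; the 4 floating-point operations per core per clock cycle is an assumption about this generation of Opteron, not something the slides state.

```python
NODES = 3936
SOCKETS_PER_NODE = 4
CORES_PER_NODE = 16        # from the Ranger limit 1 <= N <= 16
GB_PER_NODE = 32
CLOCK_HZ = 2.0e9           # 2.0 GHz
FLOPS_PER_CYCLE = 4        # assumption, not stated in the slides

total_cores = NODES * CORES_PER_NODE
cores_per_socket = CORES_PER_NODE // SOCKETS_PER_NODE
gb_per_core = GB_PER_NODE / CORES_PER_NODE
total_memory_gb = NODES * GB_PER_NODE
peak_teraflops = total_cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12

print(total_cores)          # cores in the whole machine
print(gb_per_core)          # memory per core in GB
print(total_memory_gb)      # total memory in GB (about 125 TB)
print(round(peak_teraflops))  # theoretical peak in teraflops
```

Under these assumptions the totals come out at 62,976 cores, 2 GB per core, roughly 125 TB of memory, and about 504 teraflops, matching the numbers quoted above.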
HOME
  Store your source code and build your executables here
  Use $HOME to reference your home directory in scripts
WORK
  Store large files here
  This file system is NOT backed up, use $ARCHIVE for important files!
  Use $WORK to reference this directory in scripts
SCRATCH
  Store large input or output files here – TEMPORARILY
  This file system is NOT backed up, use $ARCHIVE for important files!
  Use $SCRATCH to reference this directory in scripts
ARCHIVE
  Massive, long-term storage and archive system
  Check with staff before using this on your account
Filesystem Access
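If your job logic lives in a Python script rather than a shell script, the same environment variables can be read there instead of hard-coding paths. The sketch below is an assumption-laden illustration: the helper name `tacc_path` and the example directory are hypothetical, and only the variable names ($HOME, $WORK, $SCRATCH, $ARCHIVE) come from the slides.

```python
import os
import posixpath


def tacc_path(variable, *parts, env=None):
    """Build a path under one of the filesystem variables, e.g. WORK or SCRATCH."""
    env = os.environ if env is None else env
    if variable not in env:
        # Outside a TACC login these variables are usually not defined.
        raise KeyError(f"${variable} is not set in this environment")
    return posixpath.join(env[variable], *parts)


# Hypothetical value, used only so the example runs anywhere.
example_env = {"WORK": "/work/00000/trainXXX"}
print(tacc_path("WORK", "run1", "reads.fastq", env=example_env))
```

On a real login you would omit `env` so that the values set by the system are used.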
Limits on your filesystem
How is it connected
Please visit the TACC new user guide for RANGER
You will pick up many hints that will make your life MUCH easier for running tasks on TACC resources
http://www.tacc.utexas.edu/user-services/user-guides/ranger-user-guide
http://goo.gl/0xyN5 (same as above)
MUST READ THIS
With multiple users we need a way to organize tasks
We need a way to assign suitable resources to the tasks (track, prioritize)
With multiple software packages we need a way to deal with version and dependency conflicts between tasks
The batch scheduler used on all TACC systems is SGE (Sun Grid Engine), now owned by Oracle.
Batch, Module system
Batch submission
RANGER: Queue Options
Common SGE commands
Let's get working
ssh trainXXX@ranger.tacc.utexas.edu
Module Commands
Compbio stack/modules
Modules are for global use; it is hard to get cutting-edge code as modules (limited staff time)
You can always compile and use your own versions without waiting for a module to be built
When possible, build your applications from source rather than running pre-compiled binaries
If you choose to use “make install”, you will need to pass an option to the “configure” script to change where the software is installed:
  ./configure --prefix=$HOME/bin
For best performance, use the Intel compilers
For best compatibility, use the gcc compilers
More in the “bleeding edge s/w” slide
But my favorite app is …
Number of cores and nodes to use is set with:
  #$ -pe Nway 16*M
N represents the number of cores to utilize per node
  Ranger: 1≤N≤16
  Lonestar: 1≤N≤12
M is the number of nodes to utilize
The TOTAL number of cores used is thus: N*M
Preparing for tasks
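To make the N and M arithmetic concrete, here is a small sketch that builds the `-pe` line and reports the total cores used. The function name is illustrative. The default of 16 matches the `16*M` form shown above for Ranger; using 12 for Lonestar is an assumption inferred from its limit on N, so check the user guide before relying on it.

```python
def pe_directive(n_per_node, nodes, cores_per_node=16):
    """Return the '#$ -pe' line and the total cores used (N*M)."""
    if not 1 <= n_per_node <= cores_per_node:
        raise ValueError("N must be between 1 and the cores available per node")
    if nodes < 1:
        raise ValueError("M must be at least 1")
    # The second field is always (cores per node) * M, regardless of N.
    directive = f"#$ -pe {n_per_node}way {cores_per_node * nodes}"
    return directive, n_per_node * nodes


line, total = pe_directive(4, 2)
print(line)   # directive for 4 cores per node on 2 nodes
print(total)  # total cores actually used
```

With N=4 and M=2 the directive requests two full nodes (32 cores' worth) but runs only 8 tasks, which is useful when each task needs more memory than 2 GB.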
Preparing a job submission
Some more SGE options
http://genomics.tacc.utexas.edu/projects/ls4compbio/wiki
http://goo.gl/QYnIo (same url as above just short)
Let's look at the tutorial section towards the end of the page
Working with bleeding edge s/w
More from that page
SCP will work well for most smaller files
Specialized options (bbcp and gridftp need special end point installation)
As you get larger files (10 GB+) it gets time consuming to move them around
Easier to move your data into the iPlant data store from your desktop/server (parallel transfers)
Pull that data where you need it (and push more into it)
Command line and GUI options (including dropbox for science)
Getting data in and out
Details at:
http://goo.gl/4xzhA
Connecting from RANGER:
  module load irods
  iinit
Answer the prompts using info from the above link
You are now connected (without future need of passwords to the iPlant data store)
iPlant data store
From RANGER
After loading the irods module, i.e. module load irods
You have many tasks that you want to run and they are naturally parallel (“embarrassingly parallel”)
Parametric Job Launcher: a simple utility for submitting multiple serial applications simultaneously.
  % module load launcher
2 key components:
  paramlist: the execution commands
  launcher.sge: the job submission script
Parametric Launcher
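A paramlist is tedious to write by hand when there are many inputs, so it can be generated. The sketch below assumes the paramlist holds one complete command per line, which should be checked against the launcher documentation; `my_aligner` and the file names are hypothetical placeholders, not tools or data from the slides.

```python
def build_paramlist(command_template, inputs):
    """Return one command line per input, filling {input} in the template."""
    return [command_template.format(input=name) for name in inputs]


def paramlist_text(lines):
    """Join the commands into file content, one command per line."""
    return "\n".join(lines) + "\n"


commands = build_paramlist(
    "my_aligner {input} > {input}.out",   # hypothetical command
    ["sample1.fq", "sample2.fq", "sample3.fq"],
)
print(paramlist_text(commands), end="")
```

Writing the returned text to a file named `paramlist` then gives the launcher one independent serial task per line.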
Check http://genomics.tacc.utexas.edu/projects/ls4compbio/wiki/TACC_NGS_Course_Practical_1
http://goo.gl/YBHKx
Look at the shrimp_launcher.sge for ideas
Parametric Launcher
TACC Staff for slides
  Matt Vaughn
  Michael Gonzalez
  And many more
URL: http://www.tacc.utexas.edu/user-services/user-guides/
Gratitude