An Introduction to Gauss

Paul D. Baines, University of California, Davis

November 20th, 2012

What is Gauss?

* http://wiki.cse.ucdavis.edu/support:systems:gauss
* 12 node compute cluster (2 x 16 cores per node)
* 1 TB storage per node
* ~11 TB storage on head node
* 64 GB RAM per node
* Total 416 cores (inc. head node)

What is Gauss good for?

* Running large numbers of independent jobs
* Running long-running jobs
* Running jobs involving parallel computing
* Running large-memory jobs

What Gauss is not designed for...

* Running simple, fast jobs (just use your laptop)
* Running interactive R sessions
* Running GPU-based calculations

Gauss Overview

* Create your public/private key (see Wiki for details)
* Provide CSE with your public key and campus username (via email to [email protected])
* Log in to Gauss via ssh (e.g., ssh -X [email protected])
* When you ssh into Gauss, you log in to the head node
* If you just directly type R at the command line, you will be running R on the head node
* (Please do not do this!)
* To use the compute nodes you submit jobs via SLURM
* SLURM manages which jobs run on which nodes

Gauss Structure

[Diagram: Head Node and SLURM, with Compute Node 1, Compute Node 2, Compute Node 3, ..., Compute Node 12]

SLURM Basics

Important commands to know:

* sbatch (submit a job to Gauss)
* sarray (submit an array job to Gauss)
* squeue (check the status of running jobs)
* scancel (cancel a job)

Examples (more detailed examples later):

    squeue                # view all running jobs
    squeue -u pdbaines    # check all of pdbaines' jobs
    scancel -u pdbaines   # cancel all of pdbaines' jobs
    scancel 19213         # cancel job 19213

Resource Allocation on Gauss

* The compute resources (CPUs, memory) are shared across all Gauss users.
* When users submit jobs, SLURM allocates resources.
* You must be sure to request sufficient resources (e.g., cores, memory) for your jobs to run
* Resource requests are made when submitting your job (via your sbatch or sarray scripts)
* Resources are allocated as per user requests, but strict limits are not enforced
* If you use more memory than you requested, it can ~massively~ slow down your (and others') jobs!
* To check the memory usage of your jobs you can use the 'myjobs' command (see examples later)

Gauss Etiquette

* Gauss is a shared resource: your bad code can (potentially) ruin someone else's simulation!
* Test your code thoroughly before running large jobs
* Make sure you request the correct amount of resources for your jobs
* Regularly check memory usage for long-running jobs
* Be considerate of others!

Aside: Linux Basics

* To use Gauss you need to know some basic Linux commands (these work in a Mac terminal too)
* You should already be, or quickly get, familiar with the following commands: ls, cd, cp, mv, rm, pwd, cat, tar, grep
* It helps if you learn how to use a command-line editor such as vim or nano. (hint: use vim!)

Ways to use Gauss: Example 1

Bob has been given a large dataset by a collaborator and told to analyze it in R. The job will take about 3 days to complete, so he doesn't want to use his laptop!

Bob can submit the job on Gauss and keep working on other things in the meantime.

Example 1 cont...

Code files:

    bob_example_1.R
    bob_example_1.sh

To submit:

    sbatch bob_example_1.sh

Example 1 Code: SLURM script
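A minimal sketch of what a single-job submission script such as bob_example_1.sh might contain. The job name, memory figure, time limit, and log file names are illustrative assumptions rather than Gauss defaults, and the Rscript call assumes R is on the compute nodes' PATH. The snippet writes the script to disk and submits it only where sbatch is available.

```shell
#!/bin/bash
# Sketch only: write a hypothetical bob_example_1.sh, then submit it if sbatch exists.
cat > bob_example_1.sh <<'EOF'
#!/bin/bash
# Name shown in squeue (assumed).
#SBATCH --job-name=bob_ex1
# One task on one core, which is all a serial R job needs.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory per core in MB (assumed; size it from small trial runs).
#SBATCH --mem-per-cpu=2048
# Wall-time limit in days-hours:minutes:seconds, a little above the ~3 days expected.
#SBATCH --time=4-00:00:00
# Log files; %j expands to the job ID.
#SBATCH --output=bob_ex1_%j.out
#SBATCH --error=bob_ex1_%j.err

# Run the analysis non-interactively on the allocated compute node.
Rscript bob_example_1.R
EOF

# Submit only where SLURM is installed (i.e., on the head node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch bob_example_1.sh
else
    echo "sbatch not found; wrote bob_example_1.sh without submitting"
fi
```

Keeping the resource requests in #SBATCH lines at the top of the script means the same file documents what was asked for and can be resubmitted unchanged.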

Allocating Resources

* How do you know how much memory to request? Run small trial jobs!
* Use the 'myjobs' command, e.g.,

    pdbaines@gauss:~/Examples/Example_3$ myjobs
    Tue Nov 20 10:27:45 PST 2012 - pdbaines has jobs running on: c0-11
    jobs for pdbaines on c0-11
    USER       PID %CPU %MEM    VSZ    RSS TTY STAT START TIME COMMAND
    pdbaines 13932 99.0  0.3 408424 216492 ?   R    10:25 3:12 R
    pdbaines 13949 99.1  0.3 434308 242336 ?   R    10:25 3:12 R
    pdbaines 13975 99.1  0.2 367720 175780 ?   R    10:25 3:12 R
    pdbaines 13995 99.1  0.3 425100 233172 ?   R    10:25 3:12 R

VSZ and RSS give a rough indication of how much memory your job is using (in KB); e.g., going by VSZ, the R jobs above are using ~350-450 MB each.
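To turn figures like these into a memory request, one option is to convert the peak value from KB to MB and add some headroom. The sketch below assumes the trial run is representative of the full job; the helper names and the 50% safety margin are illustrative choices, not Gauss policy. It uses the largest VSZ from the myjobs output above as a conservative starting point.

```shell
#!/bin/bash
# Convert a memory figure in KB (as in the VSZ/RSS columns) to whole MB, rounding up.
kb_to_mb() {
    local kb=$1
    echo $(( (kb + 1023) / 1024 ))
}

# Suggest a request in MB: peak usage plus a percentage safety margin (assumed 50%).
suggest_request_mb() {
    local peak_kb=$1
    local margin_pct=${2:-50}
    local peak_mb
    peak_mb=$(kb_to_mb "$peak_kb")
    echo $(( peak_mb * (100 + margin_pct) / 100 ))
}

# Largest VSZ among the four R processes in the myjobs output above.
peak_vsz_kb=434308
echo "peak: $(kb_to_mb "$peak_vsz_kb") MB"
echo "suggested --mem-per-cpu: $(suggest_request_mb "$peak_vsz_kb") MB"
```

For the trial run shown, this gives a peak of 425 MB and a suggested request of 637 MB. A job whose memory grows with the data size will need a larger margin than a fixed 50%.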

Ways to use Gauss: Example 2

Bob has been given 3 more datasets to analyze by his collaborator (or three new analyses to perform on the same dataset).

He just needs to set up the same thing as Example 1 multiple times.

Example 2 cont...

Code files:

    bob_example_2A.R,  bob_example_2B.R,  bob_example_2C.R
    bob_example_2A.sh, bob_example_2B.sh, bob_example_2C.sh

To submit:

    sbatch bob_example_2A.sh
    sbatch bob_example_2B.sh
    sbatch bob_example_2C.sh
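With more than a handful of scripts, the submissions can be looped rather than typed out one by one. The sketch below assumes the three script names listed above; the submit_all helper and its DRY_RUN switch are illustrative and not part of SLURM. With DRY_RUN=1 it only prints the commands, so the loop can be checked before anything is queued.

```shell
#!/bin/bash
# Submit each script given as an argument. With DRY_RUN=1, print the sbatch
# command instead of running it (hypothetical convenience switch).
submit_all() {
    local script
    for script in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sbatch $script"
        else
            sbatch "$script"
        fi
    done
}

# Preview the three submissions; drop DRY_RUN=1 on the head node to submit for real.
DRY_RUN=1 submit_all bob_example_2A.sh bob_example_2B.sh bob_example_2C.sh
```

Each script still carries its own resource requests, so the three jobs are scheduled independently and may run on different nodes.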

Example 2 Code: SLURM script

Ways to use Gauss: Example 3

Bob has developed a new methodology for analyzing super-complicated data. He wants to run a simulation to prove to the world how awesome his method is compared to his competitors' methods. He decides to simulate 100 datasets and analyze each of them with his method and his competitors' methods. This is done using an array job.

Example 3 cont...

* Bob writes an R script to randomly generate and analyze one dataset at a time
* He would like to run the script 100 times on Gauss
* To do this, he writes a shell script to submit to SLURM
* Each run must use a different random seed; otherwise he will analyze the same dataset 100 times!
* He will also need to write an R script to combine the results from all 100 jobs
* He will also need a shell script to submit the post-processing portion of the analysis
* (Note: I have described this process in detail on the Gauss page of the CSE Wiki: http://wiki.cse.ucdavis.edu/support:systems:gauss)

Example 3 cont...

Code files:

    bob_example_3.R
    bob_post_process.R

To submit:

    sarray bob_example_3.sh
    sbatch bob_post_process.sh

Example 3: SLURM script
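A sketch of the one-seed-per-run idea, assuming a standard SLURM job array, in which each task sees its index in the SLURM_ARRAY_TASK_ID environment variable. Gauss's sarray wrapper may expose the index under a different name, so check the wiki before relying on this. The base seed, memory figure, and seed_for_task helper are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=bob_ex3
#SBATCH --array=1-100
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1024

# Map a task index to a distinct seed: base seed plus index (base is an assumed value).
seed_for_task() {
    local task_id=$1
    local base_seed=${2:-20121120}
    echo $(( base_seed + task_id ))
}

# Standard SLURM sets SLURM_ARRAY_TASK_ID inside array jobs; default to 1 for local testing.
task_id=${SLURM_ARRAY_TASK_ID:-1}
seed=$(seed_for_task "$task_id")
echo "task $task_id uses seed $seed"

# Pass the seed to R, which can read it with commandArgs(trailingOnly = TRUE)
# and call set.seed() before simulating. Skipped when R or the script is missing.
if command -v Rscript >/dev/null 2>&1 && [ -f bob_example_3.R ]; then
    Rscript bob_example_3.R "$seed"
fi
```

Deriving the seed from the task index keeps every run reproducible: rerunning task 42 regenerates exactly the same dataset. Writing each task's results to a file named after its index makes the post-processing step a simple loop over 100 files.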

Example 3: Modified R Code

Retrieving your results

To copy results back from Gauss to your laptop:

* Archive them, e.g.,

      tar -cvzf all_results.tar.gz my_results/

* Copy them either with a file transfer (sftp) program or from the command line (Linux/Mac users), e.g.,

      scp [email protected]:~/all_results.tar.gz ./
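A sketch of the archive step with a quick check of the contents before copying, assuming the results live in a directory named my_results/ as in the example above. The sample CSV file is created only so the snippet runs anywhere. The scp step is left as a comment with placeholders because the username and hostname are specific to your account.

```shell
#!/bin/bash
# Create a stand-in results directory so the example is self-contained (illustrative only).
mkdir -p my_results
echo "estimate,se" > my_results/run_001.csv

# On Gauss: bundle and compress the results directory.
tar -czf all_results.tar.gz my_results/

# List the archive contents to confirm nothing is missing before transferring.
tar -tzf all_results.tar.gz

# On your laptop (fill in your own username and the Gauss hostname):
#   scp <username>@<gauss-host>:~/all_results.tar.gz ./
#   tar -xzf all_results.tar.gz
```

Copying one compressed archive is usually much faster than copying hundreds of small result files individually.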

More Advanced Usage

* Gauss can be set up to run parallel computing jobs using MPI, OpenMP, etc.
* SLURM submit files need to be modified to specify the number of tasks, CPUs, memory per CPU, etc.
* New (free) software can be installed on Gauss at your request by emailing help@cse
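As a rough illustration of the modified submit file, the sketch below requests several cores for a single multithreaded (OpenMP-style) task and sets the thread count to match the allocation. The core count, memory figure, and program name are assumptions. SLURM_CPUS_PER_TASK is the variable standard SLURM sets when --cpus-per-task is given.

```shell
#!/bin/bash
#SBATCH --job-name=omp_sketch
# One task that uses 8 cores on a single node (assumed core count).
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# Memory per core in MB, so the job gets 8 x 1024 MB in total (assumed value).
#SBATCH --mem-per-cpu=1024

# Use the core count SLURM allocated; fall back to 1 thread outside a SLURM job.
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

OMP_NUM_THREADS=$(threads_for_job)
export OMP_NUM_THREADS
echo "running with $OMP_NUM_THREADS thread(s)"

# Launch the threaded program here (hypothetical binary name):
#   ./my_openmp_program
```

Matching the thread count to the allocation matters on a shared cluster: a program that starts more threads than the cores it requested competes with other users' jobs on the same node.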

References

Prerequisite Linux skills:

* http://code.google.com/edu/tools101/linux/basics.html

Gauss/SLURM links:

* http://wiki.cse.ucdavis.edu/support:general:security:ssh#moving_and_copying_keys
* http://wiki.cse.ucdavis.edu/support:faq:getting_started
* http://wiki.cse.ucdavis.edu/support:systems:gauss
* http://wiki.cse.ucdavis.edu
* http://www.sph.umich.edu/biostat/computing/cluster/slurm.html
* https://computing.llnl.gov/linux/slurm/faq.html
* https://computing.llnl.gov/linux/slurm/documentation.html