Roadrunner Supercluster

University of New Mexico -- National Computational Science Alliance

Paul Alsing

23 September 1999

Cactus Workshop

Alliance/UNM Roadrunner SuperCluster

• Strategic collaborations with
    Alta Technologies
    Intel Corp.

• Node configuration
    Dual 450 MHz Intel Pentium II processors
    512 KB cache, 512 MB ECC SDRAM
    6.4 GB IDE hard drive
    Fast Ethernet and Myrinet NICs

Alliance / UNM Roadrunner

• Interconnection networks
    Control: 72-port Fast Ethernet Foundry switch with 2 Gigabit Ethernet uplinks
    Data: Four Myrinet Octal 8-port switches
    Diagnostic: Chained serial ports

A Peek Inside Roadrunner

Roadrunner System Software

• Red Hat Linux 5.2 (6.0)
• SMP Linux kernel 2.2.12
• MPI (Argonne’s MPICH 1.1.2)
• Portland Group Compiler Suite
• Myricom GM drivers (1.086) and MPICH-GM (1.1.2.7)
• Portable Batch System (PBS)

• HPF: parallel Fortran for clusters
• F90: parallel SMP Fortran 90
• F77: parallel SMP Fortran 77
• CC: parallel SMP C/C++
• DBG: symbolic debugger
• PROF: performance profiler

Roadrunner System Libraries

• BLAS
• LAPACK
• ScaLAPACK
• PETSc
• FFTW
• Cactus
• Globus Grid Infrastructure

Parallel Job Scheduling

• Node-based resource allocation
• Job monitoring and auditing
• Resource reservations

Computational Grid

• National Technology Grid
• Globus Infrastructure
    Authentication
    Security
    Heterogeneous environments
    Distributed applications
    Resource monitoring

For more information:

• Contact Information
    http://www.alliance.unm.edu/
    [email protected]

• To Apply for an Account
    http://www.alliance.unm.edu/accounts
    [email protected]

Easy to Use

rr% ssh -l username rr.alliance.unm.edu
rr% mpicc -o prog helloWorld.c
rr% qsub -I -l nodes=64
r021% mpirun prog

Job Monitoring with PBS

Roadrunner Performance

Roadrunner Ping-Pong Time

Roadrunner Bandwidth

Applications on RR

• MILC QCD (Bob Sugar, Steve Gottlieb)
    A body of high-performance research software for doing SU(3) and SU(2) lattice gauge theory on several different (MIMD) parallel computers in current use

• ARPI3D (Dan Weber)
    3-D numerical weather prediction model to simulate the rise of a moist warm bubble in a standard atmosphere

• AS-PCG (Danesh Tafti)
    2-D Navier-Stokes solver

• BEAVIS (Marc Ingber, Andrea Mammoli)
    1994 Gordon Bell Prize-winning dynamic simulation code for particle-laden, viscous suspensions

Applications: CACTUS

• 3D Numerical Relativity Toolkit for Computational Astrophysics (courtesy of Gabrielle Allen and Ed Seidel)

• Roadrunner performance under the Cactus application benchmark shows near-perfect scalability.

CACTUS Performance

(Graphs - courtesy of O. Wehrens)

CACTUS Scaling

(Graphs - courtesy of O. Wehrens)

CACTUS: The evolution of a pure gravitational wave

A subcritical Brill wave (Amplitude=4.5), showing the Newman-Penrose Quantity as volume rendered 'glowing clouds'. The lapse function is shown as a height field in the bottom part of the picture.

(Courtesy of Werner Benger)

• TeraScale computing
• “A SuperCluster in every lab”
• Efficient use of SMP nodes
• Scalable interconnection networks
• High-performance I/O
• Advanced programming models for hybrid (SMP and Grid-based) clusters

Exercises

• Log in to Roadrunner
    % ssh roadrunner.alliance.unm.edu -l cactusXX

• Request an interactive session on n nodes
    % qsub -I -l nodes=n

• Create the Myrinet node-configuration file
    % gmpiconf $PBS_NODEFILE     (to use 1 CPU per node)
    % gmpiconf2 $PBS_NODEFILE    (to use 2 CPUs per node)

• Run the job
    % mpirun cactus_wave wavetoyf90.par            (on 1 CPU per node)
    % mpirun -np 2*n cactus_wave wavetoyf90.par    (on 2 CPUs per node; replace 2*n with twice the number of nodes)
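
The `2*n` in the two-CPU run command stands for twice the number of allocated nodes. The following is a minimal sketch of deriving that number from the PBS node file. It assumes the file named by `$PBS_NODEFILE` lists one hostname per line, and it is written in POSIX sh rather than the csh used on the slides. The function names `count_nodes` and `procs_for` are hypothetical helpers, not part of PBS or MPICH.

```shell
#!/bin/sh
# Hypothetical helpers: derive an MPI process count from a PBS node file.
# Assumption: the node file holds one hostname per line.

# count_nodes NODEFILE
# Prints the number of distinct, non-blank hostnames in NODEFILE.
count_nodes() {
    grep -v '^[[:space:]]*$' "$1" | sort -u | wc -l | tr -d ' '
}

# procs_for NODEFILE CPUS_PER_NODE
# Prints (number of nodes) * (CPUs per node).
procs_for() {
    nodes=$(count_nodes "$1")
    echo $(( nodes * $2 ))
}
```

From a Bourne-style shell inside an interactive session, the result could be passed to the run command as `mpirun -np $(procs_for "$PBS_NODEFILE" 2) cactus_wave wavetoyf90.par`.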

Compiling Cactus: WaveToy

• Log in to Roadrunner
    % ssh roadrunner.alliance.unm.edu -l cactusXX

• .cshrc
    #MPI (season to taste)
    #setenv MPIHOME /usr/parallel/mpich-eth.pgi   #ethernet/Portland Grp
    #setenv MPIHOME /usr/parallel/mpich-eth.gnu   #ethernet/GNU
    setenv MPIHOME /usr/parallel/mpich-gm.pgi     #myrinet/Portland Grp
    #setenv MPIHOME /usr/parallel/mpich-gm.gnu    #myrinet/GNU

• If you modify .cshrc, make sure to
    source .cshrc; rehash
    echo $MPIHOME     #should read /usr/parallel/mpich-gm.pgi
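
The four `MPIHOME` settings follow one naming pattern: `mpich-<network>.<compiler>` under `/usr/parallel`. As an illustration only, the sketch below maps a network and compiler choice to the matching path. The function name `mpihome_for` is hypothetical, the code is POSIX sh rather than csh, and it only knows the four builds listed in the .cshrc fragment.

```shell
#!/bin/sh
# Hypothetical helper: pick the MPICH build directory listed in the .cshrc.
# mpihome_for NETWORK COMPILER
#   NETWORK:  eth (Fast Ethernet) or gm (Myrinet)
#   COMPILER: pgi (Portland Group) or gnu
mpihome_for() {
    case "$1.$2" in
        eth.pgi|eth.gnu|gm.pgi|gm.gnu)
            echo "/usr/parallel/mpich-$1.$2"
            ;;
        *)
            # Any other combination is not one of the four listed builds.
            echo "unknown MPICH build: $1.$2" >&2
            return 1
            ;;
    esac
}
```

For example, `mpihome_for gm pgi` prints the Myrinet/Portland Group path that the slide leaves uncommented.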

• Create WaveToy configuration
    % gmake wave F90=pgf90 MPI=MPICH MPICH_DIR=$MPIHOME

• Compile WaveToy
    % gmake wave
    % cd ~/Cactus/exe
    Copy all .par files into this directory (not necessary)
    % foreach file (`find ~/Cactus -name "*.par" -print`)
    foreach> cp $file .
    foreach> end
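
The optional copy step can also be done with a single `find`. This is a minimal sketch in POSIX sh, not the csh loop from the slide. The function name `collect_par_files` is hypothetical. It assumes the destination is given with the same path prefix as the source, so that `-path` can skip the destination directory and avoid copying files onto themselves.

```shell
#!/bin/sh
# Hypothetical helper: copy every *.par file under SRC into DEST.
# collect_par_files SRC DEST
collect_par_files() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Prune DEST so files already copied there are not visited again,
    # then copy each regular file whose name ends in .par.
    find "$src" -path "$dest" -prune -o \
        -type f -name '*.par' -exec cp {} "$dest" \;
}
```

A call such as `collect_par_files "$HOME/Cactus" "$HOME/Cactus/exe"` would mirror the loop on the slide.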

Running WaveToy on RoadRunner

• Run wave interactively on RoadRunner
    PBS job scheduler: request interactive nodes
    % qsub -I -l nodes=4     (note: -I = interactive)

• Note: the prompt changes from a front-end node name such as [cactus01@rr exe] to a compute-node name such as [cactus01@r034 exe]

• Note: you should compile on the front end and run on the compute nodes (open 2 windows)

    PBS job scheduler: set up a node-configuration file
    % gmpiconf $PBS_NODEFILE

• Note: cat ~/.gmpi/conf-xxxx.rr will show the specific node names

    Run the job from ~/Cactus/exe
    % mpirun cactus_wave wavetoyf90.par
    % mpirun -np 2 cactus_wave wavetoyf90.par

• Run wave in batch on RoadRunner
    PBS script (call it, e.g., wave.pbs):
    #PBS -l nodes=4
    # pbs script for wavetoy: 1 processor per node
    gmpiconf $PBS_NODEFILE
    mpirun ~/Cactus/exe/cactus_wave wavetoyf90.par   #(use full path)

• Submit the batch PBS job
    % qsub wave.pbs
    234.44                      (PBS responds with your job_id #)
    % qstat -a                  (check status of your job)
    % qstat -n                  (check status, and see the nodes you are on)
    % qdel 234.44               (remove job from queue)
    % dsh killall cactus_wave   (if things hang, mess up, etc.)
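
The batch script differs only in the node count and in whether one or two CPUs per node are used. As a sketch under those assumptions, the POSIX sh function below prints such a script. The name `write_wave_pbs` is hypothetical. The two-CPU variant combines `gmpiconf2` and `-np 2*n` from the exercises and is not a script taken from the slides.

```shell
#!/bin/sh
# Hypothetical generator for a wave.pbs-style batch script.
# write_wave_pbs NODES [CPUS_PER_NODE]   (CPUS_PER_NODE is 1 or 2, default 1)
write_wave_pbs() {
    nodes=$1
    cpus=${2:-1}
    case "$nodes" in
        ''|*[!0-9]*) echo "node count must be a number" >&2; return 1 ;;
    esac
    case "$cpus" in
        1) conf=gmpiconf;  np="" ;;
        2) conf=gmpiconf2; np="-np $(( nodes * 2 )) " ;;
        *) echo "CPUs per node must be 1 or 2" >&2; return 1 ;;
    esac
    # \$PBS_NODEFILE stays literal so PBS expands it when the job runs.
    cat <<EOF
#PBS -l nodes=$nodes
# pbs script for wavetoy: $cpus processor(s) per node
$conf \$PBS_NODEFILE
mpirun ${np}~/Cactus/exe/cactus_wave wavetoyf90.par
EOF
}
```

Running `write_wave_pbs 4 > wave.pbs` would produce a script equivalent to the one-CPU example, ready for `qsub wave.pbs`.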