– DELIVERABLE D5.5 –
Report on ICARUS visualization cluster installation

John BIDDISCOMBE (CSCS)
Jerome SOUMAGNE (CSCS)

– 02 May 2011 –

[Text-box annotation by David Le Touzé: – DELIVERABLE D6.5 (ex D5.5) –]

NextMuSE – Next generation Multi-mechanics Simulation Environment
a Future and Emerging Technologies FP7-ICT European project

Cluster configuration

The original EIGER visualization and analysis cluster (installed in April 2010) comprises 19 nodes based on the six-core, dual-socket AMD Istanbul Opteron 2427 processor running at 2.2 GHz. Four nodes are reserved for specific tasks: one for login, one for administration and two for file system IO routing, leaving 15 compute nodes. To these we have now (March 2011) added an extension of four nodes based on the 12-core, dual-socket AMD Magny-Cours Opteron 6174, also running at 2.2 GHz. Standard nodes offer 24 GB of main system memory, whereas fat (memory) nodes and extension nodes offer 48 GB per node. We therefore get a total of 276 cores and 664 GB of memory. In addition to the CPUs, every node hosts one or two GPU cards, GeForce or Tesla. The most recently installed nodes come with Fermi cards providing 448 CUDA cores each and either 3 GB or 6 GB of on-board memory. More details are given in Table 1.

For the high-speed interconnect, the cluster relies on a dedicated InfiniBand QDR fabric able to support both parallel MPI traffic and parallel file system traffic to the IO nodes. In addition, a commodity 10 GbE LAN provides interactive login access and home, project and application file sharing among the cluster nodes. A standard 1 GbE administration network is reserved for cluster management purposes.

Altair PBS Professional 10.2 is the main batch queuing system installed and supported on the cluster. A CSCS user project has been created that allows external partners to access the cluster.

The accounting system has a partition reserved for NextMuSE so that CPU hours consumed by (external) NextMuSE users can be automatically recorded.
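
As an illustration only, the Python sketch below shows how a job might be submitted to PBS Professional so that its CPU hours are charged against the NextMuSE accounting partition. The account name "nextmuse", the resource selection and the executable are assumptions made for the example, not the actual CSCS settings.

#!/usr/bin/env python
"""Minimal sketch: submit a PBS Professional batch job on eiger.

Assumptions (not taken from this report): the accounting project is
named 'nextmuse' and 'qsub' is available on the login node PATH.
"""
import subprocess
import tempfile

# Hypothetical job script: one Istanbul node, all 12 cores, one hour.
JOB_SCRIPT = """#!/bin/bash
#PBS -N nextmuse_test
#PBS -A nextmuse
#PBS -l select=1:ncpus=12
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
mpirun -np 12 ./my_solver
"""

def submit(script_text):
    """Write the job script to a temporary file and hand it to qsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script_text)
        path = f.name
    # qsub prints the identifier of the newly created job on stdout.
    out = subprocess.run(["qsub", path], check=True,
                         capture_output=True, text=True)
    return out.stdout.strip()

if __name__ == "__main__":
    print("Submitted job:", submit(JOB_SCRIPT))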

Node configuration (extension)

The extension nodes are dual-socket nodes with 48 GB of memory. As shown in Figure 1, one socket has 32 GB of memory whereas the other has 16 GB; NUMA effects must therefore be considered whenever a process uses more memory than a single socket provides, i.e. 32 or 16 GB. In addition, each core has a 64 KB L1 cache and a 512 KB L2 cache, and shares an L3 cache of 10 MB per socket (2 × 6 MB, of which only 10 MB is visible).

Figure 1: Magny-Cours Node Topology
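
To illustrate how these NUMA effects can be handled in practice, the following Python sketch restricts a process to the cores of a single NUMA domain so that, with the usual first-touch policy, its memory is allocated locally. The mapping of domains to core ids is an assumption chosen for a 24-core Magny-Cours node and should be checked against the real topology (for example with numactl --hardware) before use.

"""Minimal sketch: bind the current process to one NUMA domain.

The mapping of domains to core ids below is an assumption for
illustration only; verify the actual layout of the node first.
"""
import os

# Hypothetical layout: four 6-core dies on a 24-core dual-socket node.
NUMA_DOMAINS = {
    0: set(range(0, 6)),
    1: set(range(6, 12)),
    2: set(range(12, 18)),
    3: set(range(18, 24)),
}

def pin_to_domain(domain):
    """Restrict the calling process to the cores of one NUMA domain, so
    that memory touched afterwards is served from the local memory bank."""
    os.sched_setaffinity(0, NUMA_DOMAINS[domain])

if __name__ == "__main__":
    pin_to_domain(0)
    print("Process now restricted to cores:", sorted(os.sched_getaffinity(0)))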

 #  | Node name | Node type | CPU type        | Cores/node | Sockets/node | Memory/node | CPU frequency | GPU type | GPUs/node
    | eiger160  | login     | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | Matrox   | 1
    | eiger170  | admin     | AMD Istanbul    | 6          | 1            | 8 GB        | 2.2 GHz       | Matrox   | 1
    | eiger180  | gpfs      | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | Matrox   | 1
    | eiger181  | gpfs      | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | Matrox   | 1
  1 | eiger200  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  2 | eiger201  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  3 | eiger202  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  4 | eiger203  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  5 | eiger204  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  6 | eiger205  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  7 | eiger206  | vis       | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | GTX 285  | 1
  8 | eiger207  | visfat    | AMD Magny-Cours | 24         | 2            | 48 GB       | 2.2 GHz       | M2050    | 2
  9 | eiger208  | visfat    | AMD Magny-Cours | 24         | 2            | 48 GB       | 2.2 GHz       | M2050    | 2
 10 | eiger209  | visfat    | AMD Magny-Cours | 24         | 2            | 48 GB       | 2.2 GHz       | C2070    | 2
 11 | eiger210  | visfat    | AMD Magny-Cours | 24         | 2            | 48 GB       | 2.2 GHz       | C2070    | 2
 12 | eiger220  | visfat    | AMD Istanbul    | 12         | 2            | 48 GB       | 2.2 GHz       | GTX 285  | 1
 13 | eiger221  | visfat    | AMD Istanbul    | 12         | 2            | 48 GB       | 2.2 GHz       | GTX 285  | 1
 14 | eiger222  | visfat    | AMD Istanbul    | 12         | 2            | 48 GB       | 2.2 GHz       | GTX 285  | 1
 15 | eiger223  | visfat    | AMD Istanbul    | 12         | 2            | 48 GB       | 2.2 GHz       | GTX 285  | 1
 16 | eiger240  | a.d.n.    | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | S1070    | 2
 17 | eiger241  | a.d.n.    | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | S1070    | 2
 18 | eiger242  | a.d.n.    | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | C2070    | 2
 19 | eiger243  | a.d.n.    | AMD Istanbul    | 12         | 2            | 24 GB       | 2.2 GHz       | C2070    | 2

Table 1: Eiger system configuration with the newly installed NextMuSE extension

MPI configuration

The default MPI distribution installed on the system is MVAPICH2, which provides a good and reliable implementation of MPI over InfiniBand – more details are available on http://mvapich.cse.ohio-state.edu.

Benchmarks of the measured bandwidth and latency between eiger nodes, for both inter-node and intra-node communication, are presented below. Note that an additional kernel module is used for single-copy intra-node message passing, optimizing performance for this configuration. Between two nodes connected by an InfiniBand QDR 4X link the theoretical bandwidth is 4 GB/s, whereas the achieved bandwidth here is only about 3 GB/s. Note also that the system has been updated continually since these measurements were made, so the achievable bandwidth should now be slightly higher, although on the newly installed nodes it is slightly lower due to their internal hardware configuration.

[Benchmark plot: Inter-node Two Sided Operations (OFA-IB-Nemesis)]

[Benchmark plot: Intra-node Two Sided Operations (KNEM)]
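
For reference, a basic ping-pong measurement of the kind shown in these plots can be reproduced in a few lines of mpi4py. This is a generic sketch written for illustration, not the benchmark that produced the plots, and it assumes mpi4py and NumPy are installed on top of MVAPICH2.

"""Minimal ping-pong sketch; run with: mpirun -np 2 python pingpong.py

Generic illustration only, not the benchmark used for the plots above.
Bandwidth is measured with a 1 MiB message; for latency a much smaller
message (e.g. a few bytes) would normally be used instead.
"""
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

NBYTES = 1 << 20                      # 1 MiB message
REPS = 100                            # round trips to average over
buf = np.zeros(NBYTES, dtype=np.uint8)

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(REPS):
    if rank == 0:
        comm.Send([buf, MPI.BYTE], dest=1, tag=0)
        comm.Recv([buf, MPI.BYTE], source=1, tag=1)
    elif rank == 1:
        comm.Recv([buf, MPI.BYTE], source=0, tag=0)
        comm.Send([buf, MPI.BYTE], dest=0, tag=1)
elapsed = (MPI.Wtime() - t0) / REPS   # seconds per round trip

if rank == 0:
    one_way = elapsed / 2.0
    print("time per one-way transfer: %.1f us" % (one_way * 1e6))
    print("one-way bandwidth        : %.2f GB/s" % (NBYTES / one_way / 1e9))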

Remote access configuration

Remote access to the cluster is provided either by ssh through the main CSCS front-end machine ELA, or by remote desktop solutions such as TurboVNC or TigerVNC, which allow OpenGL applications (e.g. ParaView) to be used at a reliable frame rate. An example of the connection procedure using the TurboVNC software is available at the following address: http://user.cscs.ch/systems/dalco_sm_system_eiger/eiger_as_visualization_facility/remote_visualization_access_procedure/index.html. Below is a screenshot of the remote desktop session that any NextMuSE partner should be able to obtain.

[Screenshot: remote visualization desktop session on eiger]
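
As a complement to the procedure linked above, the Python sketch below opens an SSH tunnel of the kind typically used to reach a VNC server on a compute node from outside CSCS. The user name, node name and display number are placeholders chosen for illustration; the authoritative steps are those on the CSCS page.

"""Minimal sketch: forward a VNC display on an eiger node to the local
machine through the CSCS front end, so a TurboVNC viewer can connect
to localhost:1. User, node and display values are placeholders.
"""
import subprocess

USER = "nextmuse_user"        # placeholder account name
FRONTEND = "ela.cscs.ch"      # CSCS front-end machine (ELA)
VIZ_NODE = "eiger200"         # visualization node running a VNC server
DISPLAY = 1                   # VNC display :1 listens on TCP port 5901

port = 5900 + DISPLAY

# ssh -N -L keeps the tunnel open without running a remote command.
subprocess.run([
    "ssh", "-N",
    "-L", "%d:%s:%d" % (port, VIZ_NODE, port),
    "%s@%s" % (USER, FRONTEND),
], check=True)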

Launching parallel ParaView server jobs

Additional information on how to configure ParaView to launch reverse-connection jobs for HPC visualization is available via the pv-meshless wiki: https://hpcforge.org/plugins/mediawiki/wiki/pv-meshless/index.php/Launching_ParaView_on_HPC_Machines. pv-meshless is a ParaView plugin developed by CSCS which forms the main host for the SPH analysis modules developed within the NextMuSE project.
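
To give an idea of what such a launch looks like, the sketch below starts a parallel ParaView server that connects back to a waiting client (a reverse connection). The client host, port and process count are placeholders; the launch configurations actually used on eiger are those documented on the pv-meshless wiki.

"""Minimal sketch: launch a parallel pvserver with a reverse connection.

The ParaView client must already be listening for a server connection
before this runs, typically from inside a PBS batch job on eiger.
Client host, port and rank count are placeholders for illustration.
"""
import subprocess

CLIENT_HOST = "my.workstation.example.org"   # machine running the ParaView GUI
PORT = 11111                                 # default ParaView server port
NPROCS = 12                                  # e.g. one rank per core of an Istanbul node

cmd = [
    "mpirun", "-np", str(NPROCS),
    "pvserver",
    "--reverse-connection",                  # server dials out to the client
    "--client-host=%s" % CLIENT_HOST,
    "--server-port=%d" % PORT,
]
subprocess.run(cmd, check=True)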