mpwide: a light-weight communication library for wide area message passing and code coupling

14
MPWide: A communication library for wide area message passing Derek Groen Centre for Computational Science

Upload: university-college-london

Post on 06-Jul-2015

133 views

Category:

Technology


3 download

DESCRIPTION

MPWide is a light-weight communication library for wide area message passing and code coupling. We originally developed MPWide to efficiently exchange data between supercomputers, allowing us to run a cosmological simulation in parallel across multiple machines in different countries. However, we also used MPWide for other purposes, such as coupling the 3D HemeLB bloodflow solver. Overall MPWide allows users to obtain excellent performance, incurring as little as 7% overhead for the cosmological simulations and 1% overhead in the multiscale bloodflow simulations. We manage to obtain this performance despite the presence of background traffic and, in some cases, non-optimized network configurations. I gave this presentation during the MAPPER Seasonal School in Barcelona in 2013. For more information, please feel free to contact me on Twitter or refer to my webpage.

TRANSCRIPT

Page 1: MPWide: A light-weight communication library for wide area message passing and code coupling

MPWide: A communication library for wide area message passing

Derek GroenCentre for Computational Science

Page 2: MPWide: A light-weight communication library for wide area message passing and code coupling

Overview

The networking landscape Using wide area networks MPWide Example applications Uses for multiscale modelling Questions

Page 3: MPWide: A light-weight communication library for wide area message passing and code coupling

The networking landscape

The networks connecting grid sites and supercomputers are highly heterogeneous.

− Configurations differ at end points.− Shared paths vs. Dedicated paths− Optical interconnects vs. Regular interconnects.

AB

C

Page 4: MPWide: A light-weight communication library for wide area message passing and code coupling

The networking landscape

Fundamental issue: Networks configurations tend to be node-specific, not path-specific.

− What do we do when a node has multiple paths? (most nodes nowadays do)

AB

C

Page 5: MPWide: A light-weight communication library for wide area message passing and code coupling

Using wide area networks (WANs)

Solution 1: Apply a homogeneous configuration for all paths.

− Could work for nodes with similar path lengths. Not common for WAN communication nodes.

− Inefficient for the TCP protocol, where the optimal config is dependent on the path length.

− Requires admin privileges on all end-points.

AA

A

Page 6: MPWide: A light-weight communication library for wide area message passing and code coupling

Using WANs

Solution 2: Adopt a different protocol.− May accomodate heterogeneous configs.− New protocol, new list of potential issues.− Interplay between protocols on shared networks.− Time-consuming and politically heavyweight

process.

խՐ

ե

Page 7: MPWide: A light-weight communication library for wide area message passing and code coupling

Using WANs

Solution 3: User-space tuning through software.− Limited space for tuning.

Some adjustments require admin rights.

− Use TCP protocol and existing configurations.− No special privileges required.

AB

CX XY ZY Z

Page 8: MPWide: A light-weight communication library for wide area message passing and code coupling

MPWide

MPWide is a communication library which allows for user-space tuning of individual paths.

For each path it can:− Use 1 or multiple tcp streams.

Good performance obtained with up to 128 streams/path.

− Configure different buffer and packet sizes.− Apply software-based packet pacing to reduce load.

Also improves performance on long networks (Yoshino et al. 2008).

Page 9: MPWide: A light-weight communication library for wide area message passing and code coupling

Example: cosmological N-body

One simulation, parallelized across supercomputers.

Uses the SUSHI code, whichis a cross-site adaptation of GreeM.

Models dark matter structure formation over 13.4 billion years.

Algorithm: Tree + Particle-mesh. Adaptive load-balancing

between sites. →

Page 10: MPWide: A light-weight communication library for wide area message passing and code coupling

Example: cosmological N-body

Using 2 to 4 supercomputers simultaneously.− Up to 2048 cores total.

MPI within each site. Custom MPWide

connections between sites. MPWide Forwarder procs

bypass connectivity restrictions.

2048 cores, 3 sites, 7% comm. overhead.

10 Gbps lightpath

Page 11: MPWide: A light-weight communication library for wide area message passing and code coupling

Example: multiscale bloodflow

Page 12: MPWide: A light-weight communication library for wide area message passing and code coupling

Example: multiscale bloodflow

pyNS (1D) coupled to HemeLB (3D). 400.000 time steps, 4000 velocity exchanges

− with 1.2% comm. overhead (512+1 cores, 2298 s),− and 5% comm. overhead (2048+1 cores, 907 s.).

Page 13: MPWide: A light-weight communication library for wide area message passing and code coupling

Uses for multiscale modelling

Can be used for performance critical cyclic coupling over wide area networks.

− High-performance, simple low-level interface. Contains an mpw-cp file transfer client to

accelerate file-based couplings. Supports C, C++, Python. Trivial to install and intended for users without

administrative privileges. Is being integrated into MUSCLE 2 to improve

its coupling performance.

Page 14: MPWide: A light-weight communication library for wide area message passing and code coupling

Thank you!

MPWide website:− http://castle.strw.leidenuniv.nl/software/mpwide.html

More on the multiscale bloodflow application:− Groen et al., Interface Focus 3(2), 2013.

More on the cosmological N-body application:− Groen et al., INFOCOMP 2011, ArXiv:1109.5559.

Thanks go out to Steven Rieder, Simon Portegies Zwart, Tomoaki Ishiyama, Keigo Nitadori, Joris Borgdorff, Rupert Nash and the MAPPER consortium as a whole.