
Page 1:

www.bsc.es

xSim – The Extreme-Scale Simulator

Janko Strassburg

Severo Ochoa Seminar @ BSC, 28 Feb 2014

Page 2:

Motivation

Future exascale systems are predicted to have hundreds of thousands of nodes, with thousands of processors per node. Application scalability is limited by sequential parts, synchronizing communication and other bottlenecks.

Investigating the performance of parallel applications at scale is an important component of HPC hardware/software co-design:
– Behaviour on future architectures
– Performance impact of architecture choices

Page 3:

Overview

Several simulators already exist, including JCAS, BigSim, Dimemas and MuPi
– Limitations: run time, number of concurrently executed threads, …

xSim aims for a highly scalable solution, trading accuracy for scalability
– Nodes are oversubscribed for simulation
– Highly accurate simulations are extremely slow and less scalable

“The Extreme-scale Simulator permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing.”

Page 4:

Overview

Parallel discrete event simulation (PDES) emulates the behaviour of various architecture models. Real applications, algorithms or their models execute atop a simulated HPC environment for:
– Performance evaluation, including identification of resource contention and underutilization issues
– Investigation at extreme scale, beyond the capabilities of existing simulation efforts

© S. Boehm and C. Engelmann. xSim: The Extreme-Scale Simulator. HPCS 2011, Istanbul, Turkey, July 4-8, 2011.

Page 5:

Overview

xSim combines highly oversubscribed execution, a virtual MPI and a time-accurate PDES (parallel discrete event simulation)
– The PDES uses the native MPI and simulates virtual processors
– The virtual processors expose a virtual MPI to applications
– A multithreaded MPI implementation is needed (e.g. Open MPI built with "--enable-mpi-thread-multiple")

© 2010 IEEE Cluster Co-Design Workshop

Page 6:

Overview

The simulator is a library that utilizes PMPI to intercept MPI calls and to hide the PDES (a generic sketch of this mechanism follows below)

Easy to use:
– Replace the MPI header
– Compile and link with the simulator library
– Run the MPI program

Supports C and Fortran MPI applications

© 2010 IEEE Cluster Co-Design Workshop
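To make the PMPI mechanism concrete: every MPI function also exists under a PMPI_ name, so an interposition library can redefine the MPI_ symbol, do its bookkeeping, and forward to the real implementation. A minimal generic sketch in C (illustration only, not xSim's actual source):

    /* Generic PMPI interposition sketch (not xSim source code).
     * The linker resolves the application's MPI_Send to this wrapper,
     * which forwards to the real implementation via PMPI_Send. */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        /* A simulator would account for virtual time and message cost here. */
        printf("intercepted MPI_Send to rank %d\n", dest);
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }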

Page 7:

Overview

xSim is designed like a traditional performance tool
– Interposition library sitting between the MPI application and the MPI library
– Uses simulated wall-clock time for measurement
– Performance data extracted based on the processor and network models
– MPI performance tool interface (PMPI)

Supports
– Simulated MPI point-to-point communication (essential calls)
– Simulated MPI data types, groups, communicators and collective communication (full)
– 81 simulated MPI calls each for C and Fortran
– ULFM MPI extensions

Page 8:

Comparison to Dimemas

xSim is an online simulator; a change in the model requires a rerun
– Batch simulations through scripts, configuration files and command-line options
– Application model available

Oversubscribes nodes
– Simulations larger than the underlying system are possible

Supports multithreaded MPI implementations
– Fault tolerance support through Open MPI ULFM
– A version with locks is available, albeit significantly slower
  • No two calls in an MPI communicator at the same time
  • Non-blocking calls become blocking calls

Page 9:

Simulation Models

Processor model
– Based on the actual execution time on the underlying hardware
– Scaled for the simulated processor speed
– Heterogeneous cores with differing speeds

Support for various network architecture models
– Analyze existing hardware conditions / experiment with differing architectures
– Latency and bandwidth restrictions
– Hierarchical combinations
  • Network on chip
  • Network on node
– Sender/receiver contention simulation
– Full contention not supported, for scalability reasons
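As a back-of-the-envelope reading of the scaling model (our interpretation, not a formula from the slides): with a relative speed factor s (1.0x = native), a computation phase measured to take t seconds natively is charged roughly t / s of virtual time, so a 0.5x processor makes the same phase appear twice as slow.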

Page 10:

Simulation Models

Application model
– Similar to MPI trace replays
– Same timing and communication behaviour
– Certain resources need not scale with the simulation
  • Memory usage
– Advances virtual time for the application between MPI calls
– Executes MPI calls without actually sending data (no need for buffers)

Operating system noise simulation

File system model
– Currently in development
– Read/write delay, access time, congestion, …

Page 11:

Network Models

[Figures: unidirectional ring, star, tree and mesh topologies]

Page 12:

Network Models

[Figures: torus and twisted torus topologies]

Page 13:

Network Models

[Figures: twisted torus with toroidal jump; twisted torus with toroidal degree]

Page 14:

General Usage of xSim

Add header files
– #include "xsim-c.h" (C)
– #include "xsim-f.h" (Fortran)

Recompile and link with the library
– Library flag "-lxsim"
– Programming-language interface flag "-lxsim-c" or "-lxsim-f"

Run the application in the simulator
– mpirun -np <real process count> <application> <application args> -xsim-np <virtual process count> <xsim args>
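For illustration, a concrete instance of these steps could look like the following; the file name, process counts and compiler wrapper are hypothetical, only the flags come from the slide above:

    # compile and link a C application against the simulator
    mpicc hello.c -o hello -lxsim -lxsim-c
    # 16 real MPI processes host 1024 simulated (virtual) processes
    mpirun -np 16 ./hello -xsim-np 1024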

Page 15:

Examples – Hello World

Scaling hello world from 1,000 to 100,000,000 cores
– Native system: 39 nodes, 2 processors/node, 12 cores/processor, Gigabit Ethernet
– Simulated system: 100,000,000-processor Gigabit Ethernet system

xSim runs on up to 936 AMD Opteron cores and 2.5 TB RAM
– 468 or 936 cores needed for 100,000,000 simulated processes
– 100,000,000 × 8 kB = 800 GB of virtual MPI process stacks

[Figure: simulated time and xSim run time on 936 cores]

Page 16:

Examples – Basic Network Model

The model allows defining the network architecture, latency and bandwidth
– Basic star network
– The model can be set to 0µs and ∞Gbps as a baseline
– 50µs and 1Gbps roughly represented the native test environment
  • 4 Intel dual-core 2.13GHz nodes with 2GB of memory each
  • Ubuntu 8.04 64-bit Linux
  • Open MPI 1.4.2 with multi-threading support

© 2010 IEEE Cluster Co-Design Workshop

Page 17:

Example – Processor Model

The model allows scaling the relative speed to a different processor
– Basic scaling model
– The model can be set to 1.0x for baseline numbers

MPI hello world scales to 1M+ VPs on 4 nodes with 4GB of total stack (4kB/VP)

Simulation (application)
– Constant execution time
– <1024 VPs: noisy clock

Simulator
– >256 VPs: output buffer issues

© 2010 IEEE Cluster Co-Design Workshop

[Figure: simulated time and xSim run time at 0µs/∞Gbps/1.0x]

Page 18:

Example – Basic PI Monte Carlo Solver

Network model
– Star, 50µs and 1Gbps

Processor model
– 1.0x (32kB stack/VP)
– 0.5x (32kB stack/VP)

Simulation
– Perfect scaling

Simulator
– ≤8 VPs: 0% overhead on the 8 processor cores
– ≥4096 VPs: communication load dominates

© 2010 IEEE Cluster Co-Design Workshop

[Figure: simulated time and xSim run time at 50µs/1Gbps for 1.0x and 0.5x processor scaling]
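For reference, the benchmark's structure explains the "perfect scaling" seen above: each rank samples independently and a single reduction collects the hits. A minimal MPI version (an illustrative sketch, not the exact code used in these experiments):

    /* Minimal MPI Monte Carlo pi estimator (illustrative sketch only). */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        long local_hits = 0, total_hits = 0;
        const long samples = 1000000;      /* samples per rank */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        srand((unsigned)rank + 1u);        /* different seed per rank */
        for (long i = 0; i < samples; i++) {
            double x = rand() / (double)RAND_MAX;
            double y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                local_hits++;
        }

        /* One reduction; communication cost stays small as ranks scale. */
        MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM,
                   0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi ~= %f\n", 4.0 * total_hits / ((double)samples * size));

        MPI_Finalize();
        return 0;
    }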

Page 19:

Examples – NAS Parallel Benchmark

Scaling CG and EP class B problems
– 1 to 128 simulated cores

Native system: 16 nodes, 2 processors/node, 4 cores/processor

Page 20:

Examples – NAS Parallel Benchmark

Scaling CG and EP class A problems
– CG: 1 to 4,096 simulated cores
– EP: 1 to 16,384 simulated cores

Native system: 16 nodes, 2 processors/node, 4 cores/processor

[Figure: CG.B and EP.B simulated time vs. total run time]

Page 21:

Examples – MCMI Core Scaling

960-core system; only 240 cores used for the simulation due to memory bandwidth restrictions

Page 22:

Examples – MCMI Problem Scaling

Linear behaviour up to a 2000×2000 matrix size
Slight degradation for larger problem sizes

Page 23:

Examples – MCMI MPI Message Count Scaling

The simulator also gathers MPI statistics
Linear increase in exchanged messages

Page 24:

Page 25:

Fault Tolerance Properties

Fault tolerance is a property of a program, not of an API specification or an implementation. Within certain constraints, MPI can provide a useful context for writing application programs that exhibit significant degrees of fault tolerance.

Page 26:

Advanced Features – Resilience and Fault Tolerance

xSim fully supports error handling within simulated MPI
– Default MPI error handlers
– User-defined MPI error handlers
– MPI_Abort()

A simulated abort terminates the simulation and provides
– Performance results
– Source of the abort
– Time of the abort
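For context, installing a user-defined MPI error handler looks like this in standard MPI (a generic sketch, not xSim-specific code):

    /* Registering a user-defined MPI error handler (standard MPI). */
    #include <mpi.h>
    #include <stdio.h>

    static void my_handler(MPI_Comm *comm, int *errcode, ...)
    {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        (void)comm; /* unused in this sketch */
        MPI_Error_string(*errcode, msg, &len);
        fprintf(stderr, "MPI error caught: %s\n", msg);
    }

    int main(int argc, char **argv)
    {
        MPI_Errhandler handler;
        MPI_Init(&argc, &argv);
        MPI_Comm_create_errhandler(my_handler, &handler);
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, handler);
        /* ... application code; communicator errors now reach my_handler ... */
        MPI_Finalize();
        return 0;
    }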

Page 27:

Advanced Features – Resilience and Fault Tolerance

Developing and debugging of FT applications
– Permits injection of MPI process failures
– Propagation, detection and notification of failures within the simulation
– Handles application-level checkpoint/restart

Observing application behaviour and performance under failure is possible

Support for the User-Level Failure Mitigation (ULFM) extension
– Investigate and develop Algorithm-Based Fault Tolerance (ABFT) applications
– xSim is the first performance tool to support ULFM and ABFT

Page 28:

User-Level Failure Mitigation (ULFM)

Fault-tolerant MPI extension
– Proposed by the MPI 3.0 Fault Tolerance Working Group
– To be presented and voted upon in the MPI Forum in the coming months, for integration into the upcoming MPI 3.1 standard
– "Minimal set of changes necessary for applications and libraries to include fault tolerance techniques and to construct more forms of fault tolerance (transactions, strongly consistent collectives, etc.)"

Page 29:

User-Level Failure Mitigation (ULFM)

Three main concepts
– Simplicity
  • API easy to use and understand
– Flexibility
  • API allows varied fault-tolerance models to be built as external libraries
– Absence of deadlock
  • No MPI call (point-to-point or collective) can block indefinitely after a failure
  • Calls must either succeed or raise an MPI error

The default error handler needs to be changed to use ULFM
– On at least MPI_COMM_WORLD, from MPI_ERRORS_ARE_FATAL to MPI_ERRORS_RETURN or a custom MPI error handler (see the one-line sketch below)
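In code, that change is a single call right after MPI_Init (standard MPI):

    /* Return error codes instead of aborting, so ULFM error classes
       can be observed and handled by the application. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);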

Page 30:

User-Level Failure Mitigation (ULFM)

Error classes raised
– MPI_ERR_PROC_FAILED
– MPI_ERR_PROC_FAILED_PENDING
– MPI_ERR_REVOKED

Acknowledgement
– MPI_Comm_failure_ack
– MPI_Comm_failure_get_acked

Handling
– MPI_Comm_shrink
– MPI_Comm_revoke
– MPI_Comm_agree
– MPI_Comm_iagree
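A typical recovery pattern built from these calls might look as follows. Note this is a sketch of the proposed API using the names listed above; shipping prototypes (e.g. in Open MPI's ULFM branch) typically carry an MPIX_ prefix and are declared in mpi-ext.h.

    #include <mpi.h>

    /* Replace a communicator after a detected process failure
     * (illustrative sketch of the proposed ULFM API). */
    static MPI_Comm repair(MPI_Comm comm, int errcode)
    {
        MPI_Comm survivors = comm;
        if (errcode == MPI_ERR_PROC_FAILED || errcode == MPI_ERR_REVOKED) {
            MPI_Comm_revoke(comm);             /* make the failure globally known */
            MPI_Comm_shrink(comm, &survivors); /* continue with live ranks only */
            /* redistribute the failed ranks' work here */
        }
        return survivors;
    }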

Page 31:

ULFM in xSim

MPI_Comm_revoke()
– Linear broadcast of the failure notification through the simulated runtime
– No matching receive
– Releases any waited-on send or receive request with MPI_ERR_PROC_FAILED or MPI_ERR_PROC_FAILED_PENDING

MPI_Comm_shrink()
– Two-phase commit protocol to establish the list of failed MPI ranks
– Fault-tolerant linear reduction and broadcast operations

MPI_Comm_agree()
– Agreement on a single value (logical AND operation by live members)
– Fault-tolerant linear MPI_Allreduce() implementation

MPI_Comm_failure_ack() and MPI_Comm_failure_get_acked()
– Failure registry per rank and per communicator with low memory overhead (bit arrays)
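The bit-array registry in the last bullet can be pictured as follows (a generic sketch of the data structure, not xSim's code): one bit per rank, so tracking n ranks costs about n/8 bytes per communicator.

    /* Generic bit-array failure registry sketch (one bit per rank). */
    #include <stdlib.h>

    typedef struct { unsigned char *bits; int nranks; } failure_registry;

    static failure_registry registry_create(int nranks)
    {
        failure_registry r = { calloc(((size_t)nranks + 7) / 8, 1), nranks };
        return r;
    }

    static void registry_mark_failed(failure_registry *r, int rank)
    {
        r->bits[rank / 8] |= (unsigned char)(1u << (rank % 8));
    }

    static int registry_is_failed(const failure_registry *r, int rank)
    {
        return (r->bits[rank / 8] >> (rank % 8)) & 1;
    }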

Page 32:

Page 33:

Page 34:

Page 35:

Conclusions

The Extreme-scale Simulator (xSim) is a performance investigation toolkit
– Uses oversubscription to model systems larger than the underlying hardware
– Supports processor, network, application and noise models
  • File system model under development
– First performance toolkit to support MPI process failure injection, checkpoint/restart and ULFM
– Forecasting behaviour on varying systems is possible
– Saves time and resources via simulation