Argonne National Laboratory School of Computing and SCI Institute, University of Utah
Practical Model-Checking Method for Verifying Correctness of MPI Programs
Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Robert Palmer
School of Computing
University of Utah
Rajeev Thakur, William Gropp
Mathematics and Computer Science Division
Argonne National Laboratory
• Concurrent algorithms are notoriously hard to design and verify.
• Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.
• Principal advantages of the model checking approach:
– Provides a formal framework for reasoning
– Allows coverage: examination of all possible process interleavings
• Principal challenges of the model checking approach:
– Requires a modeling step
– Can lead to “state explosion”
Thesis of the Talk
Thesis: In-situ model checking with dynamic partial-order reduction provides the advantages of the model checking approach while ameliorating the challenges.

2/28
Why MPI is Complex: Collision of Features
– Send
– Receive
– Send / Receive
– Send / Receive / Replace
– Broadcast
– Barrier
– Reduce
– Rendezvous mode
– Blocking mode
– Non-blocking mode
– Reliance on system buffering
– User-attached buffering
– Restarts/Cancels of MPI Operations
– Non-wildcard receives
– Wildcard receives
– Tag matching
– Communication spaces
An MPI program is an interesting (and legal) combination of elements from these spaces.
3/28
Conventional Debugging of MPI
• Inspection
– Difficult to carry out on MPI programs (low-level notation)
• Simulation based
– Run given program with manually selected inputs
– Can give poor coverage in practice
• Simulation with runtime heuristics to find bugs
– Marmot: timeout-based deadlock detection, random executions
– Intel Trace Collector: similar checks with data checking
– TotalView: better trace viewing, but still no “model checking”(?)
– We don’t know if any formal coverage metrics are offered
4/28
What is Model Checking?
Navier-Stokes Equations are a mathematical model of fluid flow physics
“V&V” (Validation and Verification): “Validate Models, Verify Codes”

“Formal models”, which translate and abstract algorithms and implementations, can be generated either automatically or by a modeler.
5/28
Related work on FV for MPI programs
• Main related work is that by Siegel and Avrunin
• Provide synchronous channel theorems for blocking and non-blocking MPI constructs
– Deadlocks are caught iff caught using synchronous channels
• Provide a state-machine model for MPI calls
– Have built a tool called MPI_Spin that uses C extensions to Promela to encode the MPI state machine
• Provide a symbolic execution approach to check computational results of MPI programs
• Define a static POR algorithm which ameliorates the second challenge (state explosion)
– Schedules processes in a canonical order
– Schedules sends when receives are posted (synchronous-channel effect)
– Wildcard receives handled through over-approximation
6/28
Traditional Execution Checking Versus Model Checking
“Execution Checking”
“Model Checking”
In current practice, concrete executions on a few diverse platforms are often used to verify algorithms/codes.

Consequence: many feasible executions might not be manifested.
Model checking forces all executions of a judiciously down-scaled model to be examined.
Current focus of our research: minimize modeling effort and error.
7/28
Solution – Runtime (i.e. “In Situ”) Model Checking
• Pioneered by Patrice Godefroid (at Bell Labs)
• Developed in the context of his VeriSoft project; he called it runtime model checking
• Flanagan and Godefroid introduced the dynamic partial-order reduction (DPOR) algorithm in 2005
“In Situ” Model Checking
Fundamental challenges of model checking:
• Model creation (and validation)
• Managing state explosion

Ameliorate the first challenge by running instrumented versions of the code.

Ameliorate the second challenge by pruning the state space based upon the independence of operations.
8/28
[Architecture diagram: Processes 0–3 each connect to a central Scheduler via socket communication]
Our Contribution: In Situ Model Checker For MPI
Consider wildcard receives and their interleavings.
9/28
Code to handle MPI_Win_unlock (in general, this is how every MPI_SomeFunc wrapper is structured):

MPI_Win_unlock(arg1, arg2, ..., argN) {
    /* report the call to the scheduler */
    sendToSocket(pID, Win_unlock, arg1, ..., argN);
    /* block until the scheduler grants a go-ahead, poking the
       MPI progress engine while waiting */
    while (recvFromSocket(pID) != go-ahead)
        MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, ...);
    /* issue the real call through the PMPI profiling interface */
    return PMPI_Win_unlock(arg1, arg2, ..., argN);
}

An innocuous progress-engine “poker” (the MPI_Iprobe call) is introduced for handling one-sided MPI.
10/28
Current MPI Constructs Examined
• MPI constructs examined:
– MPI_Init
– MPI_Send
– MPI_Ssend
– MPI_Recv
– MPI_Barrier
– MPI_Finalize
– MPI_Win_lock
– MPI_Win_unlock
– MPI_Put
– MPI_Get
– MPI_Accumulate
11/28
Required creating code that communicated with the scheduler.

Required understanding how the progress engine works in MPICH (with adjustments to the scheduler to employ this information judiciously).
MPI One-Sided Example

Process P0 and Process P1 each execute:
0: MPI_Init
1: MPI_Win_lock
2: MPI_Accumulate
3: MPI_Win_unlock
4: MPI_Barrier
5: MPI_Finalize

The scheduler advances the two processes one granted call at a time (positions shown as P0 / P1):

Current Position: NULL / NULL    Scheduler Options: P0:0, P1:0    Choice: P1:0
Current Position: NULL / P1:0    Scheduler Options: P0:0, P1:1    Choice: P1:1
Current Position: NULL / P1:1    Scheduler Options: P0:0, P1:2    Choice: P1:2
Current Position: NULL / P1:2    Scheduler Options: P0:0, P1:3    Choice: P1:3
Current Position: NULL / P1:3    Scheduler Options: P0:0, P1:4    Choice: P1:4
Current Position: NULL / P1:4    Scheduler Options: P0:0          Choice: P0:0
Current Position: P0:0 / P1:4    Scheduler Options: P0:1          Choice: P0:1 - P0:4
Current Position: P0:4 / P1:4    Scheduler Options: P0:5, P1:5    Choice: ?

Does it matter which choice it makes? Are these independent?

23/28
Partial-Order Reduction
• With 3 processes, the size of the interleaved state space is 3^3 = 27
• Partial-order reduction explores representative sequences from each equivalence class
• Delays the execution of independent transitions
• In this example, it is possible to “get away” with 7 states (one interleaving)
24/28
[Diagram: each state along the explored trace carries three sets, Full = { … }, Enabled = { … }, and Backtrack = { … }, linked by Transitions 1–3]
Run the “instrumented” program to populate the full set of transitions and the enabled set of transitions at each state.
Dynamic Partial-Order Reduction
Given enabled sets E, we want to find backtrack sets B such that B is a proper subset of E and such that B captures representatives of all equivalent executions (under the notion of independence).
25/28
Defining Dependence

MPI Function      Dependent With
MPI_Init          None
MPI_Send          MPI_Send, MPI_Ssend, MPI_Recv
MPI_Ssend         MPI_Send, MPI_Ssend, MPI_Recv
MPI_Recv          MPI_Send, MPI_Ssend
MPI_Barrier       None
MPI_Win_lock      None
MPI_Win_unlock    MPI_Win_unlock
MPI_Win_free      None
MPI_Finalize      None
26/28
Example Benefits: One-Sided Byte-Range Protocol

Program                      Procs    Interleavings without DPOR    Interleavings with DPOR
Byte-range (reduced depth)   2        2289                          119
Byte-range (full depth)      2        -                             1522
27/28
• Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.
• Principal challenges of the model checking approach:
– Requires a modeling step
– Can lead to “state explosion”
Both of which can be ameliorated by In-Situ Model Checking
Future Work:
• Expand the number of MPI primitives (and the corresponding dependence table)
• Exploit code slicing to remove ancillary operations
Funding Acknowledgements:
• NSF (CSR–SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis)
• Microsoft (Formal Analysis and Code Generation Support for MPI)
• Office of Science, Department of Energy
Summary
28/28