CASC
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. UCRL-PRES-XXXXXX.
Introducing Cooperative Parallelism
John May, David Jefferson, Nathan Barton, Rich Becker, Jarek Knap,
Gary Kumfert, James Leek, John Tannahill
Lawrence Livermore National Laboratory
Presented to the CCA Forum, 25 Jan 2007
Outline
Challenges for massively parallel programming
Cooperative parallel programming model
Applications for cooperative parallelism
Cooperative parallelism and Babel
Ongoing work
Massive parallelism strains SPMD
New techniques needed to fill the gap
Increasingly difficult to make all processors work in lock-step
- Lack of inherent parallelism
- Load balance
New techniques need a richer programming model than pure SPMD with MPI
- Adaptive sampling
- Multi-model simulation (e.g., components)
Fault tolerance requires better process management
- Need a smaller unit of granularity for failure recovery and checkpoint/restart
[Figure: parallel symponents using MPI internally, with ad hoc symponent creation and communication over the runtime system]
Introducing Cooperative Parallelism
A computational job consists of multiple interacting "symponents"
- Large parallel (MPI) jobs or single processes
- Created and destroyed dynamically
- Appear as objects to each other (see the sketch below)
- Communicate through remote method invocation (RMI)
Apps can add symponents incrementally
Designed to complement MPI, not replace it!
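
As a rough illustration of the "objects communicating through RMI" view (a minimal C++ sketch, not the actual Co-op or Babel API), the fragment below hides a symponent behind an abstract interface. FineScaleModel, LocalStub, and connect are hypothetical names, and the remote side is stubbed with a local class.

    #include <iostream>
    #include <memory>
    #include <vector>

    // Hypothetical RMI surface of a symponent. In the real system this
    // interface would be declared in an interface description file and
    // Babel would generate the bindings; here it is an abstract C++ class.
    struct FineScaleModel {
        virtual ~FineScaleModel() = default;
        virtual double evaluate(const std::vector<double>& state) = 0;
    };

    // Stand-in for a client-side proxy: a real proxy would forward the
    // call over the runtime system to another process or parallel job.
    struct LocalStub : FineScaleModel {
        double evaluate(const std::vector<double>& state) override {
            double sum = 0.0;
            for (double x : state) sum += x * x;  // pretend fine-scale physics
            return sum;
        }
    };

    // Hypothetical connection step; the real runtime would launch or
    // look up a symponent and hand back a proxy object.
    std::unique_ptr<FineScaleModel> connect() {
        return std::make_unique<LocalStub>();
    }

    int main() {
        auto model = connect();
        // To the caller the symponent is just an object; in Co-op this
        // call could cross process or machine boundaries via RMI.
        std::cout << model->evaluate({1.0, 2.0, 3.0}) << "\n";  // prints 14
    }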
Cooperative parallelism features
Three synchronization styles for RMI (see the local analogy sketched below)
- Blocking (caller waits for return)
- Nonblocking (caller checks later for result)
- One-way (caller dispatches request and has no further interaction)
Target of RMI can be a single process or a parallel job, with parameters distributed to all tasks
Closely integrated with the Babel framework
- Symponents written in C, C++, Fortran, F90, Java, and Python interact seamlessly
- Developer writes interface description files to specify RMI interfaces
- Exceptions propagated from remote methods
- Object-oriented structure lets symponents inherit capabilities and interfaces from other symponents
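
To make the three calling styles concrete, here is a local C++ analogy built on standard futures and threads. It mimics only the synchronization semantics, not Babel's RMI machinery; remoteCall is a hypothetical stand-in for a method on another symponent.

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <thread>

    // Hypothetical stand-in for a remote method on another symponent.
    int remoteCall(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        return x * 2;
    }

    int main() {
        // Blocking: the caller waits for the return value.
        int blocking = remoteCall(21);
        std::cout << "blocking: " << blocking << "\n";

        // Nonblocking: the caller gets a handle and checks later for the result.
        std::future<int> pending = std::async(std::launch::async, remoteCall, 21);
        // ... caller overlaps other work here ...
        std::cout << "nonblocking: " << pending.get() << "\n";

        // One-way: the caller dispatches the request and has no further
        // interaction with it (no return value, no handle).
        std::thread(remoteCall, 21).detach();

        // Give the detached one-way call time to finish before exiting.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }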
Benefits of cooperative parallelism
Easy subletting of work improves load balance
Simple model for expressing task-based parallelism (rather than data parallelism)
Nodes can be suballocated dynamically
Dynamic management of symponent jobs supports fault tolerance
- Caller notified of failing symponents; can re-launch them
Existing stand-alone applications can be modified and combined as discrete modules
But what about MPI?
Cooperative parallelism
- Dynamic management of symponents
- Components are opaque to each other
- Communication is connectionless, ad hoc, interrupting, and point-to-point
MPI and MPI-2
- Mostly static process management (MPI-2 can spawn processes but not explicitly terminate them)
- Tasks are typically highly coordinated
- Communication is connection-oriented and either point-to-point or collective; MPI-2 supports remote memory access
[Figure: a well-balanced MPI job alongside a server proxy that dispatches unbalanced work to a pool of servers]
Applications: Load balancing
Divide work into well-balanced and unbalanced parts
Run balanced work as a regular MPI job
Set up a pool of servers to handle unbalanced work
- Server proxy assigns work to available servers
Tasks with extra work can sublet it in parallel so they can catch up to less-busy tasks (see the sketch below)
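
A minimal sketch of the subletting pattern, assuming a hypothetical submitToServer entry point; std::async stands in for RMI requests routed through the server proxy.

    #include <future>
    #include <iostream>
    #include <vector>

    // Hypothetical stand-in for one unit of unbalanced work; in the real
    // setting the server proxy would run this on an available server.
    double submitToServer(double item) {
        return item * item;  // pretend expensive computation
    }

    int main() {
        // This task finds it has extra work beyond its balanced share.
        std::vector<double> extraWork = {1.0, 2.0, 3.0, 4.0};

        // Sublet each extra item in parallel (here: local async tasks;
        // in Co-op: RMI requests dispatched through the server proxy).
        std::vector<std::future<double>> pending;
        for (double item : extraWork)
            pending.push_back(std::async(std::launch::async, submitToServer, item));

        // The task keeps working on its balanced share here, then
        // collects the sublet results once it catches up.
        double total = 0.0;
        for (auto& f : pending) total += f.get();
        std::cout << "sublet results sum: " << total << "\n";  // 30
    }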
[Figure: a coarse-scale model sends fine-scale requests through a server proxy to fine-scale servers]
[Figure: sampling an unknown function; interpolated values are derived from previously computed values, and a newly computed value is added where interpolation is not acceptable]
Applications: Adaptive sampling
Multiscale model, similar to AMR
- BUT: can use different models at different scales
Fine-scale computations requested from remote servers to improve load balance
Initial results cached in a database
Later computations check cached results and interpolate if accuracy is acceptable (see the sketch below)
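
The cache-then-interpolate query logic can be sketched in one dimension. Everything here is a simplification: fineScaleModel stands in for an expensive remote computation, and the spacing test is a crude proxy for a real accuracy estimate.

    #include <cmath>
    #include <iostream>
    #include <iterator>
    #include <map>

    // Hypothetical stand-in for a fine-scale computation that would
    // really be an RMI to a fine-scale server.
    double fineScaleModel(double x) { return std::sin(x); }

    std::map<double, double> cache;   // the results database, simplified
    const double kMaxSpacing = 0.1;   // interpolate only between nearby points

    double query(double x) {
        // Look for cached results bracketing x.
        auto hi = cache.lower_bound(x);
        if (hi != cache.end() && hi != cache.begin()) {
            auto lo = std::prev(hi);
            // Accept interpolation only if the bracketing points are close
            // enough; otherwise the accuracy is deemed unacceptable.
            if (hi->first - lo->first < kMaxSpacing) {
                double t = (x - lo->first) / (hi->first - lo->first);
                return lo->second + t * (hi->second - lo->second);
            }
        }
        // Pay for a new fine-scale evaluation and cache it.
        double v = fineScaleModel(x);
        cache[x] = v;
        return v;
    }

    int main() {
        // A coarse sweep populates the database with computed values.
        for (double x = 0.0; x <= 1.0; x += 0.05) query(x);
        // A later, denser sweep is served almost entirely by interpolation.
        for (double x = 0.025; x <= 1.0; x += 0.05) query(x);
        std::cout << "fine-scale evaluations: " << cache.size() << "\n";
    }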
[Figure: a master process overseeing several simulations, some completed and some active]
Applications: Parameter studies
Master process launches multiple parallel components to complete a simulation, each with different parameters
Existing simulation codes can be wrapped to form components
Master can direct the study based on accumulated results and launch new components as others complete (see the sketch below)
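
A minimal sketch of such a master loop, with runSimulation as a hypothetical stand-in for launching one parallel component; std::async plays the role of the Co-op runtime, and the "study" simply tracks the best objective value seen so far.

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <iostream>
    #include <vector>

    // Hypothetical stand-in for launching one parallel simulation
    // component; in Co-op this would start a separate (possibly MPI) job.
    double runSimulation(double parameter) {
        return parameter * parameter - 3.0 * parameter;  // pretend objective
    }

    int main() {
        std::vector<double> parameters = {0.5, 1.0, 1.5, 2.0, 2.5, 3.0};
        const std::size_t maxActive = 2;  // components running at once

        std::vector<std::future<double>> active;
        std::size_t next = 0;
        double best = 1e300;

        while (next < parameters.size() || !active.empty()) {
            // Launch new components while slots are free.
            while (active.size() < maxActive && next < parameters.size())
                active.push_back(std::async(std::launch::async,
                                            runSimulation, parameters[next++]));
            // Collect the oldest result; a real master could use it to
            // choose which parameters to try next.
            best = std::min(best, active.front().get());
            active.erase(active.begin());
        }
        std::cout << "best objective: " << best << "\n";  // -2.25 at p = 1.5
    }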
Applications: Federated simulations
Components modeling separate physical entities interact
Potential example: ocean, atmosphere, sea ice
- Each modeled in a separate job
- Interactions communicated through RMI (N-by-M parallel RMI is future work)
Cooperative parallelism and Babel
Babel gives Co-op
- Language interoperability, SIDL, object-oriented model
- RMI, including exception handling
Co-op adds
- Symponent launch, monitoring, and termination
- Motivation and resources for extending RMI
- Patterns for developing task-parallel applications
Babel and Cooperative Parallelism teams are colocated and share some staff
Status: Runtime software
Prototype runtime software working on Xeon, Opteron, and Itanium systems
Completed a 1360-CPU demonstration in September
Beginning to do similar-scale runs on new platforms
Planning to port to IBM systems this year
Ongoing work to enhance software robustness and documentation
Status: Applications
Ongoing experiments with a material modeling application
- Demonstrated 10-100X speedups on >1000 processors using adaptive sampling
Investigating use in a parameter study application
In discussions with several other apps groups at LLNL
Also looking for possible collaborations with groups outside LLNL
Contacts: John May ([email protected]), David Jefferson ([email protected])
http://www.llnl.gov/casc/coopParallelism/