
Page 1: Fault Tolerant MPImeseec.ce.rit.edu/756-projects/spring2006/d3/2/fault... · 2006. 5. 23. · Fault Tolerant MPI for Volatile Nodes, LRI, Université de Paris Sud, Orsay, France (2002)

Fault Tolerant MPI
Protocols and Implementations

Brian J. Argauer
Stephen R. Byers

May 23, 2006

Multiple Processor Systems EECC 756
Dr. Muhammad Shaaban


Outline

- Motivation for Fault Tolerance
- Techniques
  - Checkpoint
  - Message Logging
- Implementations
  - MPICH-V1
  - MPICH-V2
  - MPICH-VCL
- CoCheck Framework
- Conclusion


Fault Tolerant Motivation

- Current trend toward larger clusters, distributed and GRID computing
- Sources of Failure
  - Nodes
  - Network
  - Human Factors
- Thousands of nodes reduce MTBF to hours or minutes
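
A back-of-the-envelope illustration of the MTBF point. It assumes node failures are independent with a constant failure rate, so the system MTBF is roughly the per-node MTBF divided by the node count. Both that model and the 5-year per-node figure are assumptions for illustration, not values from the slides.

```python
def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Approximate MTBF of a system that fails when any one node fails.

    Assumes independent nodes with constant failure rates, so the rates
    add up: lambda_system = num_nodes * lambda_node.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return node_mtbf_hours / num_nodes


if __name__ == "__main__":
    node_mtbf = 5 * 365 * 24  # hypothetical: each node fails once every 5 years
    for n in (1, 100, 10_000, 100_000):
        print(f"{n:>7} nodes -> system MTBF {system_mtbf_hours(node_mtbf, n):10.2f} h")
```

Under these assumptions, 10,000 nodes give a system MTBF of a few hours, and 100,000 nodes bring it under half an hour.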


Techniques: Checkpoint

- Capture entire state of task
  - Application, Stack, Allocated memory, etc.
- Program Failure
  - Kill survivors
  - Restart from last consistent and complete set of checkpoints
- Coordinated
  - Expensive
  - All tasks stop message passing
  - Write to disks simultaneously
  - Continue Message Passing
- Uncoordinated
  - Nodes Checkpoint at different times
  - In-flight messages retained via explicit logging
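
The coordinated variant can be sketched as a toy, single-process simulation. The `Task` class and the function names are hypothetical, and real checkpointing captures a process image rather than a Python dictionary. The sketch only shows the control flow: snapshot all tasks together, then roll every task back after a failure.

```python
import copy


class Task:
    """Stand-in for one MPI task; `state` models its memory image."""

    def __init__(self, rank: int):
        self.rank = rank
        self.state = {"step": 0}


def coordinated_checkpoint(tasks):
    """Snapshot every task at the same logical point.

    The caller is assumed to have stopped message passing first, so no
    message is in flight and the set of images is consistent.
    """
    return {t.rank: copy.deepcopy(t.state) for t in tasks}


def restart_from(tasks, checkpoint_set):
    """After a failure, roll all tasks back, survivors included."""
    for t in tasks:
        t.state = copy.deepcopy(checkpoint_set[t.rank])


def demo():
    tasks = [Task(r) for r in range(4)]
    for t in tasks:
        t.state["step"] = 10
    saved = coordinated_checkpoint(tasks)  # complete, consistent set
    for t in tasks:
        t.state["step"] = 17               # work done after the checkpoint
    restart_from(tasks, saved)             # one task failed: roll everyone back
    return [t.state["step"] for t in tasks]


if __name__ == "__main__":
    print(demo())
```

All work done after the last complete checkpoint set is lost for every task, which is why this approach suits low fault frequencies.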


Techniques: Message Logging

- Pessimistic Log
  - Transaction logging
  - No incoherent states can be reached
  - Can handle an unbounded number of faults
- Optimistic Log
  - Log messages
  - Assume part of log lost when faults occur
  - Either roll back entire application if too many faults, or assume only 1 fault at a time can occur in system
- Causal Log
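
A minimal sketch of the pessimistic idea, assuming a receiver that appends every message to stable storage before the application sees it. The class name and the in-memory list standing in for stable storage are hypothetical. Because nothing is delivered before it is logged, replaying the log after a crash rebuilds exactly the pre-crash state.

```python
class PessimisticReceiver:
    """Logs each message before applying it to the application state."""

    def __init__(self, stable_log=None):
        # A plain list stands in for stable storage in this sketch.
        self.stable_log = stable_log if stable_log is not None else []
        self.total = 0  # toy application state

    def _apply(self, msg: int) -> None:
        # Deliberately order-sensitive, so replay order matters.
        self.total = self.total * 2 + msg

    def receive(self, msg: int) -> None:
        self.stable_log.append(msg)  # 1. log synchronously
        self._apply(msg)             # 2. only then deliver

    @classmethod
    def recover(cls, stable_log):
        """Rebuild a crashed receiver by replaying the log in order."""
        fresh = cls(stable_log=list(stable_log))
        for msg in stable_log:
            fresh._apply(msg)
        return fresh


if __name__ == "__main__":
    p = PessimisticReceiver()
    for m in (3, 1, 4):
        p.receive(m)
    q = PessimisticReceiver.recover(p.stable_log)
    print(p.total, q.total)
```

The cost is a synchronous log write on every reception, even when no fault ever occurs.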


Technique Decision

- Checkpointing is efficient at low fault frequency!
- Message logging is efficient at higher fault frequency!


MPICH-V Introduction

- Traditional MPI
  - Static Resources
  - Limited Error Handling
  - Node failure stalls or slows down other nodes
- MPICH-V
  - Research effort to provide MPI implementation based on MPICH
  - Automatic fault tolerant MPI library
  - Implementations: MPICH-V1, MPICH-V2, MPICH-VCausal, MPICH-VCL


Fault Tolerant Overview

[Figure: fault tolerant MPI implementations arranged by Automatic vs. Non-Automatic. Labels: MPICH-VCL, MPICH-V1/V2, MPICH-V/Causal, FT-MPI, Co-Check]


MPICH-V1 Goals

- Volatility Tolerance
  - Redundancy
  - Task Migration
- Highly Distributed
  - Scalable Asynchronous Checkpointing
  - No Global Synchronization
- Inter-administration domain communications
  - Security Tools for GRID Deployment
  - Use non-protected relay between client and server nodes if client and server are both firewalled


MPICH-V1 Overview

- Designed from standard MPI Implementation
- Run existing MPI applications without modification
- Suitable for very large scale computing using heterogeneous networks
- Uncoordinated Checkpoint
- Remote Pessimistic Message Logging


MPICH-V1 Architecture

- Checkpoint Server (CS)
  - Store and provide task images
  - Images sent to CS as generated by nodes
  - Image is a clone of the running process on a given node


MPICH-V1 Architecture

- Channel Memory
  - Storage of in-transit messages
  - Repository services
- Dispatcher
  - Resource Scheduling
  - Task Management
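
One way to picture the Channel Memory role is as a remote mailbox that keeps every message in order, so a restarted node can fetch again what it had consumed after its last checkpoint. The `put`/`get` interface, the class, and the helper below are hypothetical names for illustration, not the MPICH-V1 API.

```python
from collections import defaultdict


class ChannelMemory:
    """Hypothetical stand-in for a Channel Memory server."""

    def __init__(self):
        self._inbox = defaultdict(list)  # destination rank -> ordered messages

    def put(self, src: int, dest: int, payload) -> None:
        """Store a message addressed to `dest`; it is retained after delivery."""
        self._inbox[dest].append((src, payload))

    def get(self, dest: int, index: int):
        """Return the index-th message ever sent to `dest`, or None."""
        messages = self._inbox[dest]
        return messages[index] if index < len(messages) else None


def run_receiver(cm: ChannelMemory, rank: int, start_index: int, start_sum: int):
    """Consume all stored messages for `rank` from `start_index` onward."""
    total, index = start_sum, start_index
    while True:
        entry = cm.get(rank, index)
        if entry is None:
            return total, index
        total += entry[1]
        index += 1


if __name__ == "__main__":
    cm = ChannelMemory()
    for src, value in ((0, 5), (2, 7), (0, 11)):
        cm.put(src, 1, value)
    print(run_receiver(cm, 1, 0, 0))  # failure-free run
    # Restart from a checkpoint taken after the first message (sum=5, index=1):
    print(run_receiver(cm, 1, 1, 5))
```

Both runs end in the same state, because the restarted receiver pulls the same messages in the same order from the Channel Memory.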


MPICH-V1 Performance

[Figure: MPICH-V1 performance results]



MPICH-V2

- Pessimistic Logging for large clusters
- Uncoordinated Checkpoint
- Nodes store messages they send locally
- Event Loggers store sequence of received messages for each node
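
The split between sender-side payload storage and a separate event logger can be sketched as follows. `Node`, `EventLogger`, and `recover` are hypothetical names, and the sketch restarts the failed node from scratch rather than from a checkpoint to stay short. Payloads remain with the sender; only the reception order goes to the logger.

```python
class EventLogger:
    """Records, per receiver, the order in which messages were received."""

    def __init__(self):
        self._events = {}

    def record(self, receiver: int, sender: int, seq: int) -> None:
        self._events.setdefault(receiver, []).append((sender, seq))

    def history(self, receiver: int):
        return list(self._events.get(receiver, []))


class Node:
    def __init__(self, rank: int):
        self.rank = rank
        self.sent_log = {}    # (dest rank, seq) -> payload, kept by the sender
        self._next_seq = {}   # dest rank -> next sequence number
        self.received = []    # toy application state

    def send(self, dest: "Node", payload, logger: EventLogger) -> None:
        seq = self._next_seq.get(dest.rank, 0)
        self._next_seq[dest.rank] = seq + 1
        self.sent_log[(dest.rank, seq)] = payload  # sender stores payload locally
        dest._deliver(self.rank, seq, payload, logger)

    def _deliver(self, sender: int, seq: int, payload, logger: EventLogger) -> None:
        logger.record(self.rank, sender, seq)      # log the reception event first
        self.received.append(payload)


def recover(rank: int, nodes: dict, logger: EventLogger) -> Node:
    """Rebuild a failed node by asking senders to resend in the logged order."""
    fresh = Node(rank)
    for sender, seq in logger.history(rank):
        fresh.received.append(nodes[sender].sent_log[(rank, seq)])
    return fresh


def demo():
    logger = EventLogger()
    nodes = {r: Node(r) for r in range(3)}
    nodes[0].send(nodes[2], "a", logger)
    nodes[1].send(nodes[2], "b", logger)
    nodes[0].send(nodes[2], "c", logger)
    before = list(nodes[2].received)
    after = recover(2, nodes, logger).received
    return before, after


if __name__ == "__main__":
    print(demo())
```

Only small (sender, sequence) records travel to the event logger, while the message payloads never leave the sending node.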



MPICH-VCL

- Newest MPICH-V
- Designed for applications that depend on very low latency
- Coordinated Checkpoint


MPICH-VCL Performance

- NAS Benchmark BT Class B, 25 Nodes, Fast Ethernet
- Performance crossover point between checkpoint and message logging: 1 fault every 3 minutes



CoCheck Framework

- Abstraction Framework
- Above message passing layer
- Easily adaptable and portable to different MPI implementations through the use of wrapper functions
- Provides consistency
- Considers checkpointing & process migration
- tuMPI


State Consistency Problem

- Processes A, B, C
- Circles = Events, Arrows = Message Sending
- S, S’, S’’ = Checkpoint Snapshots
- Notice S’’ is inconsistent
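
The slide's diagram is not recoverable from this transcript, so the sketch below uses a hypothetical event layout to show what "inconsistent" means here. A snapshot is inconsistent when it records a message as received although the matching send lies after the sender's snapshot point. The function name and the tuple layout are assumptions.

```python
def is_consistent(cut, messages):
    """Check a snapshot (cut) against a list of messages.

    cut:      dict mapping process name -> index of its last event in the snapshot
    messages: list of (sender, send_event, receiver, receive_event) tuples,
              where events are numbered 1, 2, 3, ... per process
    """
    for sender, send_event, receiver, receive_event in messages:
        received_in_cut = receive_event <= cut[receiver]
        sent_in_cut = send_event <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False  # a message received "from the future"
    return True


if __name__ == "__main__":
    # Hypothetical layout: A's 2nd event sends to B (B's 2nd event receives),
    # and B's 3rd event sends to C (C's 2nd event receives).
    messages = [("A", 2, "B", 2), ("B", 3, "C", 2)]
    s = {"A": 1, "B": 1, "C": 1}
    s_bad = {"A": 1, "B": 2, "C": 1}  # B has received, A has not yet sent
    print(is_consistent(s, messages), is_consistent(s_bad, messages))
```

A snapshot in which a message is sent but not yet received passes this check. Such in-flight messages are what logging or channel clearing must preserve.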


Clearing Communication Lines

- Uses Coordinator Process
- Sends “Ready-Message” (RM) when checkpoint or migration is needed.
- If process receives RM, assumes no more communication
- Once all RMs are received... can checkpoint or migrate
- On restart… check for messages in buffer
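
The steps above can be sketched as a small simulation. It assumes FIFO channels between every pair of processes and that each process sends an RM to every peer once the coordinator asks for a checkpoint. Both assumptions, and all names in the code, are illustrative rather than taken from the slides. Under FIFO delivery, an RM from a peer means nothing further is in flight from that peer.

```python
from collections import deque

RM = object()  # sentinel standing in for the Ready-Message


def clear_channels(channels, ranks):
    """Drain every channel up to its RM; return buffered messages per process.

    channels[(src, dst)] is a FIFO deque of in-flight application messages.
    """
    # Step 1: every process sends RM as the last message on each outgoing channel.
    for src in ranks:
        for dst in ranks:
            if src != dst:
                channels[(src, dst)].append(RM)

    # Step 2: every process reads each incoming channel until it sees the RM,
    # buffering the application messages that arrive before it.
    buffers = {r: [] for r in ranks}
    for dst in ranks:
        for src in ranks:
            if src == dst:
                continue
            channel = channels[(src, dst)]
            while True:
                msg = channel.popleft()
                if msg is RM:
                    break
                buffers[dst].append((src, msg))
    return buffers  # saved with the checkpoint; channels are now empty


def receive_after_restart(buffers, rank):
    """On restart, check the buffer first; None means use the network."""
    return buffers[rank].pop(0) if buffers[rank] else None


if __name__ == "__main__":
    ranks = [0, 1, 2]
    channels = {(s, d): deque() for s in ranks for d in ranks if s != d}
    channels[(0, 1)].extend(["x", "y"])
    channels[(2, 1)].append("z")
    saved = clear_channels(channels, ranks)
    print(saved[1], receive_after_restart(saved, 1))
```

Once every channel has yielded its RM, no message is in flight, so the checkpoint or migration can proceed with the buffers as part of the saved state.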


Performance & Future Research

- Single processor migration results
  - vs. num processors & size of checkpoint image
- 8 Machines
  - Mix of Sun SparcStation 2 and Sparc 10
- Dominating Factor
  - Image Size
- Future Considerations
  - Automatic load performance and balancing


Conclusion

- Current trend
  - Increasing cluster size
  - Lower MTBF
  - Fault tolerance increasingly important
- Fault tolerant implementations in MPI offer assortment of solutions
- New research yielding new improvements & ideas to enhance efficiency and robustness of fault tolerant systems.


Questions?

References:

- G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, A. Selikhov, MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, LRI, Université de Paris Sud, Orsay, France (2002), IEEE.
- G. Stellner, CoCheck: Checkpointing and Process Migration for MPI, Institut für Informatik der Technischen Universität München, München, Germany.
- G. Fagg, Fault Tolerant MPI, Linux Magazine (November 2004). [Online]. Available: http://www.linux-mag.com/index2.php?option=com_content&task=view&id=1781&Itemid=2070&pop=1&page=0
- A. Bouteiller, P. Lemarinier, G. Krawezik, F. Cappello, Coordinated Checkpoint versus Message Log for Fault Tolerant MPI, LRI, Université de Paris Sud, Orsay, France.
- W. Gropp, E. Lusk, Fault Tolerance in MPI Programs, Argonne National Laboratory, Argonne, IL.
- A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, F. Cappello, MPICH-V Project: A Multiprotocol Automatic Fault Tolerant MPI, INRIA/LRI, Université de Paris Sud, Orsay, France.
- MPICH-V Introduction. [Online]. Available: http://mpich-v.lri.fr/index.php?section=intro&subsection=intro