A Performance Comparison of DSM, PVM, and MPI. Paul Werstein, Mark Pethick, Zhiyi Huang


Page 1: A Performance Comparison of DSM, PVM, and MPI Paul Werstein Mark Pethick Zhiyi Huang

A Performance Comparison of DSM, PVM, and MPI

Paul Werstein
Mark Pethick
Zhiyi Huang

Page 2:

Introduction

Relatively little is known about the performance of Distributed Shared Memory (DSM) systems compared to message passing systems.

We compare the performance of the TreadMarks DSM system with two popular message passing systems, MPICH (MPI) and PVM.

Page 3:

Introduction

Three applications are compared: Mergesort, Mandelbrot set generation, and a backpropagation neural network.

Each application represents a different class of problem.

Page 4:

TreadMarks DSM

Provides locks and barriers as primitives.
Uses lazy release consistency.
Granularity of sharing is a page.
Creates page differentials to avoid the false sharing effect.
Version 1.0.3.3.
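Since TreadMarks exposes only locks and barriers, a DSM program is structured like a shared-memory threaded program. As a rough analogy (ordinary Python threads in one address space, not a real DSM or the TreadMarks API), the two primitives look like this:

```python
import threading

NUM_WORKERS = 4
counter = 0                               # shared data (a page in a real DSM)
lock = threading.Lock()                   # analogue of a TreadMarks lock
barrier = threading.Barrier(NUM_WORKERS)  # analogue of a TreadMarks barrier

def worker(wid):
    global counter
    with lock:            # lock protects the shared update
        counter += wid
    barrier.wait()        # all workers synchronise before continuing

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 0 + 1 + 2 + 3 = 6
```

Under lazy release consistency, the modifications protected by the lock would only be propagated to another node when that node next acquires the same lock.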

Page 5:

Parallel Virtual Machine (PVM)

Provides the concept of a virtual parallel machine.
Exists as a daemon on each node.
Inter-process communication is mediated by the daemons.
Designed for flexibility.
Version 3.4.3.
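The daemon-mediated design means tasks never exchange messages directly: a send goes to the local pvmd, which routes it to the destination task's daemon. A toy in-process model of that routing (an analogy only, not the PVM API; the `Daemon` class and task ids are invented for illustration):

```python
from queue import Queue

class Daemon:
    """Stand-in for the pvmd daemons that route messages by task id."""
    def __init__(self):
        self.mailboxes = {}              # task id -> inbox queue

    def register(self, tid):
        self.mailboxes[tid] = Queue()
        return self.mailboxes[tid]

    def route(self, dest_tid, msg):      # send is mediated by the daemon
        self.mailboxes[dest_tid].put(msg)

daemon = Daemon()
inbox_a = daemon.register("task_a")
inbox_b = daemon.register("task_b")

daemon.route("task_b", [1, 2, 3])        # task_a sends work to task_b
msg = inbox_b.get()
daemon.route("task_a", sum(msg))         # task_b replies with the result
reply = inbox_a.get()
print(reply)  # 6
```

The extra hop through the daemons buys flexibility (heterogeneous nodes, dynamic task spawning) at some cost in latency compared to MPICH's direct communication.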

Page 6:

MPICH (MPI)

A standard interface for developing message passing applications.
Primary design goal is performance.
Primarily defines communication primitives.
MPICH is a reference implementation of the MPI standard.
Version 1.2.4.
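Among those primitives are collectives such as gather, which the conclusion returns to. A plain-Python sketch of gather semantics (not the MPI API; the `gather` function here is invented to illustrate the behaviour): every rank contributes one value, and the root receives all of them in rank order.

```python
def gather(contribution_by_rank, root=0):
    """MPI_Gather-like semantics: root gets all values in rank order,
    every other rank gets nothing back."""
    received = {rank: None for rank in contribution_by_rank}
    received[root] = [contribution_by_rank[r]
                      for r in sorted(contribution_by_rank)]
    return received

out = gather({0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0})
print(out[0])  # root sees [10.0, 20.0, 30.0, 40.0]
print(out[2])  # a non-root rank sees None
```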

Page 7:

System

32-node Linux cluster.
800 MHz Pentium with 256 MB memory.
Red Hat 7.2.
100 Mbit Ethernet.

Results determined for 1, 2, 4, 8, 16, 24, and 32 processes.

Page 8:

Mergesort

Parallelisation strategy used is divide and conquer.
Synchronisation between pairs of nodes.
Loosely synchronous class problem:
• Coarse-grained synchronisation.
• Irregular synchronisation points.
• Alternate phases of computation and communication.
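The pairwise structure can be sketched as follows (a sequential sketch of the parallel scheme, with invented helper names, not the paper's code): each node sorts its own chunk, then in log2(P) rounds node i merges with node i + step, halving the number of active nodes each round.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def parallel_mergesort(data, workers=4):
    # Phase 1: each "node" sorts its own chunk locally (pure computation).
    size = (len(data) + workers - 1) // workers
    chunks = [sorted(data[k * size:(k + 1) * size]) for k in range(workers)]
    # Phase 2: merge rounds; in round r, node i receives node i+step's
    # chunk and merges (communication, then computation, alternating).
    step = 1
    while step < workers:
        for i in range(0, workers, 2 * step):
            chunks[i] = merge(chunks[i], chunks[i + step])
        step *= 2
    return chunks[0]

print(parallel_mergesort([5, 3, 8, 1, 9, 2, 7, 4]))
# [1, 2, 3, 4, 5, 7, 8, 9]
```

The synchronisation points are irregular because a node that has handed off its chunk goes idle while its partner merges, which is why the problem is classed as loosely synchronous.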

Page 9:

Mergesort Results (1)

Page 10:

Mergesort Results (2)

Page 11:

Mandelbrot Set

Strategy used is data partitioning.
A work pool is used as the computation time of sections differs.
Work pool size >= 2 * number of processes.
Embarrassingly parallel class problem:
• May involve complex computation, but there is very little communication.
• Gives an indication of performance under ideal conditions.
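A sketch of the idea (sequential stand-in for the worker loop, with invented function names, not the paper's code): points near the set boundary need many iterations while points outside escape quickly, so row strips are drawn from a pool rather than assigned statically.

```python
from collections import deque

def mandelbrot_iters(c, max_iter=50):
    """Iterations before |z| exceeds 2 under z -> z*z + c
    (max_iter if it never escapes, i.e. c is likely in the set)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter

def run_work_pool(rows):
    # The pool holds many more strips than workers, so a worker that
    # drew a cheap strip grabs another while others finish costly ones.
    pool = deque(rows)
    results = {}
    while pool:                      # each worker would loop like this
        y = pool.popleft()
        results[y] = [mandelbrot_iters(complex(x / 10 - 2, y / 10 - 1))
                      for x in range(40)]
    return results

strips = run_work_pool(list(range(20)))
print(len(strips))  # 20 rows computed
```

With pool size at least twice the process count, no worker sits idle waiting while another grinds through an expensive region.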

Page 12:

Mandelbrot Set Results

Page 13:

Neural Network (1)

Strategy is data partitioning.
Each processor trains the network on a subsection of the data set.
Changes are summed and applied at the end of each epoch.
Requires large data sets to be effective.
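The epoch-end update can be sketched like this (an illustration with a toy "gradient", not the paper's network): workers compute weight changes independently on their partitions, and the changes are summed and applied in one synchronised step.

```python
def weight_changes(weights, data_slice, lr=0.1):
    """Toy stand-in for backprop: nudge each weight toward the slice mean."""
    mean = sum(data_slice) / len(data_slice)
    return [lr * (mean - w) for w in weights]

def train_epoch(weights, dataset, workers=4):
    size = len(dataset) // workers
    slices = [dataset[k * size:(k + 1) * size] for k in range(workers)]
    # Each worker computes changes on its own partition independently...
    all_changes = [weight_changes(weights, s) for s in slices]
    # ...then the changes are summed and applied once, at epoch end.
    summed = [sum(c[i] for c in all_changes) for i in range(len(weights))]
    return [w + d for w, d in zip(weights, summed)]

w = train_epoch([0.0, 0.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(w)  # both weights moved by the summed changes: [1.8, 1.8]
```

Because the only communication is one reduction per epoch, the scheme pays off only when each partition is large enough to keep the workers busy between reductions, hence the need for large data sets.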

Page 14:

Neural Network (2)

Synchronous class problem:
• Characterised by an algorithm that carries out the same operation on all points in the data set.
• Synchronisation occurs at regular points.
• Often applies to problems that use data partitioning.
• A large number of problems appear to belong to the synchronous class.

Page 15:

Neural Network Results (1)

Page 16:

Neural Network Results (2)

Page 17:

Neural Network Results (3)

Page 18:

Conclusion

In general, the performance of DSM is poorer than that of MPICH or PVM.

Main reasons identified are:
• The increased use of memory associated with the creation of page differentials.
• The false sharing effect due to the granularity of sharing.
• Differential accumulation in the gather operation.