tutorial: the zoltan toolkit

90
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. Tutorial: The Zoltan Toolkit Karen Devine and Cedric Chevalier Sandia National Laboratories, NM Umit Çatalyürek Ohio State University CSCAPES Workshop, June 2008

Upload: colin-neal

Post on 01-Jan-2016

39 views

Category:

Documents


3 download

DESCRIPTION

Tutorial: The Zoltan Toolkit. Karen Devine and Cedric Chevalier Sandia National Laboratories, NM Umit Çatalyürek Ohio State University CSCAPES Workshop, June 2008. Outline. High-level view of Zoltan Requirements, data models, and interface Dynamic Load Balancing and Partitioning - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Tutorial: The Zoltan Toolkit

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energy’s National Nuclear Security Administration

under contract DE-AC04-94AL85000.

Tutorial: The Zoltan Toolkit

Karen Devine and Cedric Chevalier

Sandia National Laboratories, NM

Umit Çatalyürek

Ohio State University

CSCAPES Workshop, June 2008

Page 2: Tutorial: The Zoltan Toolkit

Slide 2

Outline

• High-level view of Zoltan• Requirements, data models, and interface• Dynamic Load Balancing and Partitioning• Matrix Ordering• Graph Coloring• Utilities• Alternate Interfaces• Future Directions

Page 3: Tutorial: The Zoltan Toolkit

Slide 3

The Zoltan Toolkit

Unstructured Communication

Data Migration Matrix Ordering

Dynamic Load Balancing

Distributed Data Directories

A B C

0 1 0

D E F

2 1 0

G H I

1 2 1

• Library of data management services for unstructured, dynamic

and/or adaptive computations.

Graph Coloring

Page 4: Tutorial: The Zoltan Toolkit

Slide 4

Zoltan System Assumptions

• Assume distributed memory model.• Data decomposition + “Owner computes”:

– The data is distributed among the processors.– The owner performs all computation on its data.– Data distribution defines work assignment.– Data dependencies among data items owned by different

processors incur communication.

• Requirements: – MPI– C compiler– GNU Make (gmake)

Page 5: Tutorial: The Zoltan Toolkit

Slide 5Zoltan SupportsMany Applications

• Different applications, requirements, data structures.

Multiphysics simulations

x bA

=

Linear solvers & preconditioners

Adaptive mesh refinement

Crash simulations

Particle methods

Parallel electronics networks

12

VsSOURCE_VOLTAGE

12

RsR

12 Cm012

C

12

Rg02R

12

Rg01R

12 C01

C

12 C02

C12

L2

INDUCTOR

12L1

INDUCTOR

12R1

R

12R2

R

12

RlR

12

Rg1R

12

Rg2R

12 C2

C

12 C1

C

12 Cm12

C

Page 6: Tutorial: The Zoltan Toolkit

Slide 6

Zoltan Interface Design

• Common interface to each class of tools.• Tool/method specified with user parameters.

• Data-structure neutral design.– Supports wide range of applications and data structures.– Imposes no restrictions on application’s data structures.– Application does not have to build Zoltan’s data

structures.

Page 7: Tutorial: The Zoltan Toolkit

Slide 7

Zoltan Interface

• Simple, easy-to-use interface.– Small number of callable Zoltan functions.

– Callable from C, C++, Fortran.

• Requirement: Unique global IDs for objects to

be partitioned/ordered/colored. For example:– Global element number.

– Global matrix row number.

– (Processor number, local element number)

– (Processor number, local particle number)

Page 8: Tutorial: The Zoltan Toolkit

Slide 8

Zoltan Application Interface• Application interface:

– Zoltan queries the application for needed info.• IDs of objects, coordinates, relationships to other objects.

– Application provides simple functions to answer queries.– Small extra costs in memory and function-call overhead.

• Query mechanism supports…– Geometric algorithms

• Queries for dimensions, coordinates, etc.

– Hypergraph- and graph-based algorithms • Queries for edge lists, edge weights, etc.

– Tree-based algorithms • Queries for parent/child relationships, etc.

• Once query functions are implemented, application can access all Zoltan functionality.

– Can switch between algorithms by setting parameters.

Page 9: Tutorial: The Zoltan Toolkit

Slide 9

Zoltan Application Interface

Initialize Zoltan(Zoltan_Initialize,

Zoltan_Create)

Select Method and Parameters

(Zoltan_Set_Params)

Register query functions(Zoltan_Set_Fn)

Re-partition(Zoltan_LB_Partition)

COMPUTE

Move data(Zoltan_Migrate)

Clean up (Zoltan_Destroy)

APPLICATION

Zoltan_LB_Partition:• Call query functions.• Build data structures.• Compute new

decomposition.• Return import/export

lists.

Zoltan_Migrate:• Call packing query

functions for exports.• Send exports.• Receive imports.• Call unpacking query

functions for imports.

ZOLTAN

Page 10: Tutorial: The Zoltan Toolkit

Slide 10

Zoltan Query Functions

General Query Functions

ZOLTAN_NUM_OBJ_FN Number of items on processor

ZOLTAN_OBJ_LIST_FN List of item IDs and weights.

Geometric Query Functions

ZOLTAN_NUM_GEOM_FN Dimensionality of domain.

ZOLTAN_GEOM_FN Coordinates of items.

Hypergraph Query Functions

ZOLTAN_HG_SIZE_CS_FN Number of hyperedge pins.

ZOLTAN_HG_CS_FN List of hyperedge pins.

ZOLTAN_HG_SIZE_EDGE_WTS_FN Number of hyperedge weights.

ZOLTAN_HG_EDGE_WTS_FN List of hyperedge weights.

Graph Query Functions

ZOLTAN_NUM_EDGE_FN Number of graph edges.

ZOLTAN_EDGE_LIST_FN List of graph edges and weights.

Page 11: Tutorial: The Zoltan Toolkit

Slide 11Example zoltanSimple.c: ZOLTAN_OBJ_LIST_FN

void exGetObjectList(void *userDefinedData, int numGlobalIds, int numLocalIds, ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids, int wgt_dim, float *obj_wgts, int *err){ /* ZOLTAN_OBJ_LIST_FN callback function.** Returns list of objects owned by this processor.** lids[i] = local index of object in array.*/ int i; for (i=0; i<NumPoints; i++) { gids[i] = GlobalIds[i]; lids[i] = i; } *err = 0; return;

}

Page 12: Tutorial: The Zoltan Toolkit

Slide 12Example zoltanSimple.c: ZOLTAN_GEOM_MULTI_FN

void exGetObjectCoords(void *userDefinedData, int numGlobalIds, int numLocalIds, int numObjs, ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids, int numDim, double *pts, int *err){ /* ZOLTAN_GEOM_MULTI_FN callback.** Returns coordinates of objects listed in gids and lids.*/ int i, id, id3, next = 0; if (numDim != 3) { *err = 1; return; } for (i=0; i<numObjs; i++){ id = lids[i]; if ((id < 0) || (id >= NumPoints)) { *err = 1; return; } id3 = lids[i] * 3; pts[next++] = (double)(Points[id3]); pts[next++] = (double)(Points[id3 + 1]); pts[next++] = (double)(Points[id3 + 2]); }}

Page 13: Tutorial: The Zoltan Toolkit

Slide 13

More Details on Query Functions• void* data pointer allows user data structures to be used in all

query functions.– To use, cast the pointer to the application data type.

• Local IDs provided by application are returned by Zoltan to

simplify access of application data. – E.g. Indices into local arrays of coordinates.

•ZOLTAN_ID_PTR is pointer to array of unsigned integers,

allowing IDs to be more than one integer long.– E.g., (processor number, local element number) pair.– numGlobalIds and numLocalIds are lengths of each ID.

• All memory for query-function arguments is allocated in Zoltan.

void ZOLTAN_GET_GEOM_MULTI_FN(void *userDefinedData, int numGlobalIds, int numLocalIds, int numObjs, ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids, int numDim, double *pts, int *err)

Page 14: Tutorial: The Zoltan Toolkit

Slide 14

Using Zoltan in Your Application

1. Decide what your objects are. Elements? Grid points? Matrix rows? Particles?

2. Decide which tools (partitioning/ordering/coloring/utilities) and class of method (geometric/graph/hypergraph) to use.

3. Download Zoltan. http://www.cs.sandia.gov/Zoltan

4. Write required query functions for your application. Required functions are listed with each method in Zoltan

User’s Guide.

5. Call Zoltan from your application.6. #include “zoltan.h” in files calling Zoltan.7. Edit Zoltan configuration file and build Zoltan.8. Compile application; link with libzoltan.a.

mpicc application.c -lzoltan

Page 15: Tutorial: The Zoltan Toolkit

Slide 15

Partitioning and Load Balancing• Assignment of application data to processors for parallel

computation.• Applied to grid points, elements, matrix rows, particles, ….

Page 16: Tutorial: The Zoltan Toolkit

Slide 16

Partitioning Interface

Zoltan computes the difference (Δ) from current distributionChoose between:a) Import lists (data to import from other procs)b) Export lists (data to export to other procs)c) Both (the default)

err = Zoltan_LB_Partition(zz, &changes, /* Flag indicating whether partition changed */&numGidEntries, &numLidEntries,&numImport, /* objects to be imported to new part */&importGlobalGids, &importLocalGids, &importProcs, &importToPart, &numExport, /* # objects to be exported from old part */&exportGlobalGids, &exportLocalGids, &exportProcs, &exportToPart);

Page 17: Tutorial: The Zoltan Toolkit

Slide 17

Static Partitioning

• Static partitioning in an application:– Data partition is computed.– Data are distributed according to partition map.– Application computes.

• Ideal partition:– Processor idle time is minimized.– Inter-processor communication costs are kept low.

• Zoltan_Set_Param(zz, “LB_APPROACH”, “PARTITION”);

InitializeApplication

PartitionData

DistributeData

ComputeSolutions

Output& End

Page 18: Tutorial: The Zoltan Toolkit

Slide 18Dynamic Repartitioning (a.k.a. Dynamic Load Balancing)

InitializeApplication

PartitionData

RedistributeData

ComputeSolutions& Adapt

Output& End

• Dynamic repartitioning (load balancing) in an application:– Data partition is computed.– Data are distributed according to partition map.– Application computes and, perhaps, adapts.– Process repeats until the application is done.

• Ideal partition:– Processor idle time is minimized.– Inter-processor communication costs are kept low.– Cost to redistribute data is also kept low.

• Zoltan_Set_Param(zz, “LB_APPROACH”, “REPARTITION”);

Page 19: Tutorial: The Zoltan Toolkit

Slide 19Zoltan Toolkit:Suite of Partitioners

• No single partitioner works best for all applications.– Trade-offs:

• Quality vs. speed.• Geometric locality vs. data dependencies.• High-data movement costs vs. tolerance for remapping.

• Application developers may not know which partitioner is best for application.

• Zoltan contains suite of partitioning methods.– Application changes only one parameter to switch

methods.• Zoltan_Set_Param(zz, “LB_METHOD”, “new_method_name”);

– Allows experimentation/comparisons to find most effective partitioner for application.

Page 20: Tutorial: The Zoltan Toolkit

Slide 20Partitioning Algorithms in the Zoltan Toolkit

Recursive Coordinate Bisection (Berger, Bokhari)Recursive Inertial Bisection (Taylor, Nour-Omid)

Zoltan Graph PartitioningParMETIS (U. Minnesota)

Jostle (U. Greenwich)

Hypergraph PartitioningHypergraph Repartitioning PaToH (Catalyurek & Aykanat)

Geometric (coordinate-based) methods

Combinatorial (topology-based) methods

Space Filling Curve Partitioning (Warren&Salmon, et al.)

Refinement-tree Partitioning (Mitchell)

Page 21: Tutorial: The Zoltan Toolkit

Slide 21

1st cut

2nd

2nd

3rd

3rd3rd

3rd

Recursive Coordinate Bisection

• Zoltan_Set_Param(zz, “LB_METHOD”, “RCB”);• Berger & Bokhari (1987).• Idea:

– Divide work into two equal parts using a cutting plane orthogonal to a coordinate axis.

– Recursively cut the resulting subdomains.

Page 22: Tutorial: The Zoltan Toolkit

Slide 22

Geometric Repartitioning

• Implicitly achieves low data redistribution

costs.• For small changes in data, cuts move only

slightly, resulting in little data redistribution.

Page 23: Tutorial: The Zoltan Toolkit

Slide 23

Applications of Geometric Methods

Parallel Volume Rendering

Crash Simulationsand Contact Detection

Adaptive Mesh RefinementParticle Simulations

Page 24: Tutorial: The Zoltan Toolkit

Slide 24

RCB Advantages and Disadvantages• Advantages:

– Conceptually simple; fast and inexpensive.– All processors can inexpensively know entire partition (e.g.,

for global search in contact detection).– No connectivity info needed (e.g., particle methods).– Good on specialized geometries.

• Disadvantages:– No explicit control of communication costs.– Mediocre partition quality. – Can generate disconnected subdomains for complex

geometries.– Need coordinate information.

SLAC’S 55-cell Linear Accelerator with couplers:One-dimensional RCB partition reduced runtime up to 68% on 512 processor IBM SP3. (Wolf, Ko)

Page 25: Tutorial: The Zoltan Toolkit

Slide 25Variations on RCB : Recursive Inertial Bisection

• Zoltan_Set_Param(zz, “LB_METHOD”, “RIB”);• Simon, Taylor, et al., 1991• Cutting planes orthogonal to principle axes of

geometry.• Not incremental.

Page 26: Tutorial: The Zoltan Toolkit

Slide 26Space-Filling Curve Partitioning (SFC)

• Zoltan_Set_Param(zz, “LB_METHOD”, “HSFC”);• Space-Filling Curve (Peano, 1890):

– Mapping between R3 to R1 that completely fills a domain.– Applied recursively to obtain desired granularity.

• Used for partitioning by …– Warren and Salmon, 1993, gravitational simulations.– Pilkington and Baden, 1994, smoothed particle

hydrodynamics.– Patra and Oden, 1995, adaptive mesh refinement.

Page 27: Tutorial: The Zoltan Toolkit

Slide 27

9

20

19

18

17

16

15

14

1312

1110

8

7

65

4

321

9

20

19

18

17

16

15

14

1312

1110

8

7

65

4

321

9

20

19

18

17

16

15

14

1312

1110

8

7

65

4

321

SFC Algorithm

• Run space-filling curve through domain.• Order objects according to position on curve.• Perform 1-D partition of curve.

Page 28: Tutorial: The Zoltan Toolkit

Slide 28SFC Advantagesand Disadvantages

• Advantages:– Simple, fast, inexpensive.– Maintains geometric locality of objects in processors.– All processors can inexpensively know entire partition (e.g., for

global search in contact detection).– Implicitly incremental for repartitioning.

• Disadvantages:– No explicit control of communication costs.– Can generate disconnected subdomains.– Often lower quality partitions than RCB.– Geometric coordinates needed.

Page 29: Tutorial: The Zoltan Toolkit

Slide 29

hp-refinement mesh; 8 processors.Patra, et al. (SUNY-Buffalo)

Applications using SFC

• Adaptive hp-refinement finite element methods.– Assigns physically close elements to same processor.– Inexpensive; incremental; fast.– Linear ordering can be used

to order elements for efficient memory access.

Page 30: Tutorial: The Zoltan Toolkit

Slide 30For geometric partitioning (RCB, RIB, HSFC), use …

General Query Functions

ZOLTAN_NUM_OBJ_FN Number of items on processor

ZOLTAN_OBJ_LIST_FN List of item IDs and weights.

Geometric Query Functions

ZOLTAN_NUM_GEOM_FN Dimensionality of domain.

ZOLTAN_GEOM_FN Coordinates of items.

Hypergraph Query Functions

ZOLTAN_HG_SIZE_CS_FN Number of hyperedge pins.

ZOLTAN_HG_CS_FN List of hyperedge pins.

ZOLTAN_HG_SIZE_EDGE_WTS_FN Number of hyperedge weights.

ZOLTAN_HG_EDGE_WTS_FN List of hyperedge weights.

Graph Query Functions

ZOLTAN_NUM_EDGE_FN Number of graph edges.

ZOLTAN_EDGE_LIST_FN List of graph edges and weights.

Page 31: Tutorial: The Zoltan Toolkit

Slide 31Auxiliary Capabilities for Geometric Methods

• Zoltan can store cuts from RCB, RIB, and HSFC

inexpensively in each processor.– Zoltan_Set_Param(zz, “KEEP_CUTS”, “1”);

• Enables parallel geometric search without communication.– Useful for contact detection, particle methods, rendering.

1st cut

2nd

2nd

3rd

3rd3rd

3rd

*Determine the part/processor

owning region with a given point.Zoltan_LB_Point_PP_Assign

1st cut

2nd

2nd

3rd

3rd3rd

3rd

Determine all parts/processors overlapping a given region.Zoltan_LB_Box_PP_Assign

Page 32: Tutorial: The Zoltan Toolkit

Slide 32

Graph Partitioning

• Represent problem as a weighted graph.– Vertices = objects to be partitioned.– Edges = dependencies between two

objects.– Weights = work load or amount of

dependency.

• Partition graph so that …– Parts have equal vertex weight.– Weight of edges cut by part boundaries is

small.

• Zoltan_Set_Param(zz, “LB_METHOD”, “GRAPH”);• Zoltan_Set_Param(zz, “GRAPH_PACKAGE”, “ZOLTAN”); or Zoltan_Set_Param(zz, “GRAPH_PACKAGE”, “PARMETIS”);

• Kernighan, Lin, Schweikert, Fiduccia, Mattheyes, Simon, Hendrickson, Leland, Kumar, Karypis, et al.

Page 33: Tutorial: The Zoltan Toolkit

Slide 33

Graph Repartitioning• Diffusive strategies (Cybenko, Hu,

Blake, Walshaw, Schloegel, et al.)– Shift work from highly loaded

processors to less loaded neighbors.– Local communication keeps data

redistribution costs low.

• Multilevel partitioners that account for data redistribution costs in refining partitions (Schloegel, Karypis)– Parameter weights application communication vs.

redistribution communication.

101010

10

2030

30

10

10

20

2020

20

Partition

coarse graph

Refine partitionaccounting for

current part assignment

Coarsen graph

Page 34: Tutorial: The Zoltan Toolkit

Slide 34Applications using Graph Partitioning

x bA

=

Linear solvers & preconditioners(square, structurally symmetric systems)

Finite Element Analysis

Multiphysics andmultiphase simulations

Page 35: Tutorial: The Zoltan Toolkit

Slide 35Graph Partitioning:Advantages and Disadvantages

• Advantages:– Highly successful model for mesh-based PDE problems.– Explicit control of communication volume gives higher

partition quality than geometric methods.– Excellent software available.

• Serial: Chaco (SNL)Jostle (U. Greenwich)METIS (U. Minn.)Party (U. Paderborn)Scotch (U. Bordeaux)

• Parallel: Zoltan (SNL)ParMETIS (U. Minn.)PJostle (U. Greenwich)

• Disadvantages:– More expensive than geometric methods.– Edge-cut model only approximates communication volume.

Page 36: Tutorial: The Zoltan Toolkit

Slide 36For graph partitioning, coloring & ordering, use …

General Query Functions

ZOLTAN_NUM_OBJ_FN Number of items on processor

ZOLTAN_OBJ_LIST_FN List of item IDs and weights.

Geometric Query Functions

ZOLTAN_NUM_GEOM_FN Dimensionality of domain.

ZOLTAN_GEOM_FN Coordinates of items.

Hypergraph Query Functions

ZOLTAN_HG_SIZE_CS_FN Number of hyperedge pins.

ZOLTAN_HG_CS_FN List of hyperedge pins.

ZOLTAN_HG_SIZE_EDGE_WTS_FN Number of hyperedge weights.

ZOLTAN_HG_EDGE_WTS_FN List of hyperedge weights.

Graph Query Functions

ZOLTAN_NUM_EDGE_FN Number of graph edges.

ZOLTAN_EDGE_LIST_FN List of graph edges and weights.

Page 37: Tutorial: The Zoltan Toolkit

Slide 37

A

Graph Partitioning Model

A

Hypergraph Partitioning Model

Hypergraph Partitioning• Zoltan_Set_Param(zz, “LB_METHOD”, “HYPERGRAPH”);• Zoltan_Set_Param(zz, “HYPERGRAPH_PACKAGE”, “ZOLTAN”); or

Zoltan_Set_Param(zz, “HYPERGRAPH_PACKAGE”, “PATOH”);

• Alpert, Kahng, Hauck, Borriello, Çatalyürek, Aykanat, Karypis, et al.• Hypergraph model:

– Vertices = objects to be partitioned.– Hyperedges = dependencies between two or more objects.

• Partitioning goal: Assign equal vertex weight while minimizing hyperedge cut weight.

Page 38: Tutorial: The Zoltan Toolkit

Slide 38

Best Algorithms Paper Award at IPDPS07“Hypergraph-based Dynamic Load Balancing for Adaptive Scientific

Computations”Çatalyürek, Boman, Devine, Bozdag, Heaphy, & Riesen

Hypergraph Repartitioning• Augment hypergraph with data redistribution costs.

– Account for data’s current processor assignments.

– Weight dependencies by their size and frequency of use.

• Partitioning then tries to minimize total communication volume:

Data redistribution volume

+ Application communication volume

Total communication volume

• Data redistribution volume: callback returns data sizes.– Zoltan_Set_Fn(zz, ZOLTAN_OBJ_SIZE_MULTI_FN_TYPE,

myObjSizeFn, 0);

• Application communication volume = Hyperedge cuts * Number

of times the communication is done between repartitionings.– Zoltan_Set_Param(zz, “PHG_REPART_MULTIPLIER”, “100”);

Page 39: Tutorial: The Zoltan Toolkit

Slide 39

Hypergraph Applications

Circuit Simulations

1

2

VsSOURCE_VOLTAGE

12

RsR

12 Cm012

C

12

Rg02R

1

2

Rg01R

12 C01

C

12 C02

C12

L2

INDUCTOR

12L1

INDUCTOR

12R1

R

12R2

R

1

2

RlR

1

2

Rg1R

12

Rg2R

12 C2

C

12 C1

C

12 Cm12

C

Linear programming for sensor placement

x bA

=

Linear solvers & preconditioners(no restrictions on matrix structure)

Finite Element Analysis

Multiphysics andmultiphase simulations

Data Mining

Page 40: Tutorial: The Zoltan Toolkit

Slide 40Hypergraph Partitioning:Advantages and Disadvantages

• Advantages:– Communication volume reduced 30-38% on average

over graph partitioning (Catalyurek & Aykanat).• 5-15% reduction for mesh-based applications.

– More accurate communication model than graph partitioning.

• Better representation of highly connected and/or non-homogeneous systems.

– Greater applicability than graph model.• Can represent rectangular systems and non-symmetric

dependencies.

• Disadvantages:– Usually more expensive than graph partitioning.

Page 41: Tutorial: The Zoltan Toolkit

Slide 41For hypergraph partitioning and repartitioning, use …

General Query Functions

ZOLTAN_NUM_OBJ_FN Number of items on processor

ZOLTAN_OBJ_LIST_FN List of item IDs and weights.

Geometric Query Functions

ZOLTAN_NUM_GEOM_FN Dimensionality of domain.

ZOLTAN_GEOM_FN Coordinates of items.

Hypergraph Query Functions

ZOLTAN_HG_SIZE_CS_FN Number of hyperedge pins.

ZOLTAN_HG_CS_FN List of hyperedge pins.

ZOLTAN_HG_SIZE_EDGE_WTS_FN Number of hyperedge weights.

ZOLTAN_HG_EDGE_WTS_FN List of hyperedge weights.

Graph Query Functions

ZOLTAN_NUM_EDGE_FN Number of graph edges.

ZOLTAN_EDGE_LIST_FN List of graph edges and weights.

Page 42: Tutorial: The Zoltan Toolkit

Slide 42Or can use graph queries to build hypergraph.

General Query Functions

ZOLTAN_NUM_OBJ_FN Number of items on processor

ZOLTAN_OBJ_LIST_FN List of item IDs and weights.

Geometric Query Functions

ZOLTAN_NUM_GEOM_FN Dimensionality of domain.

ZOLTAN_GEOM_FN Coordinates of items.

Hypergraph Query Functions

ZOLTAN_HG_SIZE_CS_FN Number of hyperedge pins.

ZOLTAN_HG_CS_FN List of hyperedge pins.

ZOLTAN_HG_SIZE_EDGE_WTS_FN Number of hyperedge weights.

ZOLTAN_HG_EDGE_WTS_FN List of hyperedge weights.

Graph Query Functions

ZOLTAN_NUM_EDGE_FN Number of graph edges.

ZOLTAN_EDGE_LIST_FN List of graph edges and weights.

Page 43: Tutorial: The Zoltan Toolkit

Slide 43

ComputationMemory

Multi-criteria Load-balancing

• Multiple constraints or objectives– Compute a single partition that is good

with respect to multiple factors.• Balance both computation and memory.

• Balance meshes in loosely coupled physics.

• Balance multi-phase simulations.

– Extend algorithms to multiple weights• Difficult. No guarantee good solution exists.

• Zoltan_Set_Param(zz, “OBJ_WEIGHT_DIM”, “2”);– Available in RCB, RIB and

ParMETIS graph partitioning.– In progress in Hypergraph

partitioning.

Page 44: Tutorial: The Zoltan Toolkit

Slide 44

Heterogeneous Architectures• Clusters may have different types of processors.• Assign “capacity” weights to processors.

– E.g., Compute power (speed).– Zoltan_LB_Set_Part_Sizes(…);

• Note: Can use this function to specify part sizes for any purpose.

• Balance with respect to processor capacity.

• Hierarchical partitioning: Allows different partitioners at different architecture levels.

– Zoltan_Set_Param(zz, “LB_METHOD”, “HIER”);– Requires three additional callbacks

to describe architecture hierarchy.• ZOLTAN_HIER_NUM_LEVELS_FN• ZOLTAN_HIER_PARTITION_FN• ZOLTAN_HIER_METHOD_FN

Entire System

...Processor Processor

Core Core...Core Core...

Page 45: Tutorial: The Zoltan Toolkit

Slide 45

Sparse Matrix Ordering problem

• When solving sparse linear systems with

direct methods, non-zero terms are created

during the factorization process (A→LLt ,

A→LDLt or A→LU).• Fill-in depends on the order of the unknowns.

– Need to provide fill-reducing orderings.

Page 46: Tutorial: The Zoltan Toolkit

Slide 46

Fill Reducing ordering

• Combinatorial problem, depending on only the

structure of the matrix A:– We can work on the graph associated with A.

• NP-Complete, thus we deal only with

heuristics.• Most popular heuristics:

– Minimum Degree algorithms (AMD, MMD, AMF …)

– Nested Dissection

Page 47: Tutorial: The Zoltan Toolkit

Slide 47

A

S

BA S B

Nested dissection (1)

• Principle [George 1973]– Find a vertex separator S in graph.

– Order vertices of S with highest available indices.

– Recursively apply the algorithm to the two separated subgraphs A and B.

Page 48: Tutorial: The Zoltan Toolkit

Slide 48

Nested dissection (2)

•Advantages:– Induces high quality block decompositions.

• Suitable for block BLAS 3 computations.

– Increases the concurrency of computations.

• Compared to minimum degree algorithms.• Very suitable for parallel factorization.

– It’s the scope here: parallel ordering is for parallel factorization.

Page 49: Tutorial: The Zoltan Toolkit

Slide 49

Matrix ordering within Zoltan

• Computed by third party libraries:– ParMETIS

– Scotch (more specifically PT-Scotch, the parallel part)

– Easy to add another one.

• The calls to the external ordering library are

transparent for the user, and thus Zoltan’s call

can be a standard way to compute ordering.

Page 50: Tutorial: The Zoltan Toolkit

Slide 50

Zoltan ordering architecture

Page 51: Tutorial: The Zoltan Toolkit

Slide 51

Ordering interface in Zoltan

•Compute ordering with one function:

Zoltan_Order•Output provided:

–New order of the unknowns (direct permutation), available in two forms:

• one is the new number in the interval [0,N-1]; • the other is the new order of Global IDs.

–Access to elimination tree, “block” view of the ordering.

Page 52: Tutorial: The Zoltan Toolkit

Slide 52

Experimental results (1)

•Metric is OPC, the operation count of Cholesky

factorization.•Largest matrix ordered by PT-Scotch: 83 millions of

unknowns on 256 processors (CEA/CESTA).•Some of our largest test graphs.

GraphSize (x1000) Average

OSS Description|V| |E| degree

audikw1 944 38354 81.28 5.48E+123D mechanics mesh,

Parasol

cage15 5154 47022 18.24 4.06E+16 DNA electrophoresis, UF

quimonda07 8613 29143 6.76 8.92E+10Circuit simulation,

Quimonda

23millions 23114 175686 7.6 1.29E+14 CEA/CESTA

Page 53: Tutorial: The Zoltan Toolkit

Slide 53

Experimental results (2)Test Number of processes

case 2 4 8 16 32 64

audikw1

OPTS 5.73E+12 5.65E+12 5.54E+12 5.45E+12 5.45E+12 5.45E+12

OPM 5.82E+12 6.37E+12 7.78E+12 8.88E+12 8.91E+12 1.07E+13

tPTS 73.11 53.19 45.19 33.83 24.74 18.16

tPM 32.69 23.09 17.15 9.80 5.65 3.82

Page 54: Tutorial: The Zoltan Toolkit

Slide 54

Experimental results (3)Test Number of processes

case 2 4 8 16 32 64

cage15

OPTS 4.58E+16 5.01E+16 4.64E+16 4.94E+16 4.58E+16 4.50E+16

OPM 4.47E+16 6.64E+16 † 7.36E+16 7.03E+16 6.64E+16

tPTS 540.46 427.38 371.70 340.78 351.38 380.69

tPM 195.93 117.77 † 40.30 22.56 17.83

Page 55: Tutorial: The Zoltan Toolkit

Slide 55

Experimental results (4)

•ParMETIS crashes for all other graphs.

Test Number of processes

case 2 4 8 16 32 64

quimonda07

OPTS - - 5.80E+10 6.38E+10 6.94E+10 7.70E+10

tPTS - - 34.68 22.23 17.30 16.62

23millions

OPTS 1.45E+14 2.91E+14 3.99E+14 2.71E+14 1.94E+14 2.45E+14

tPTS 671.60 416.45 295.38 211.68 147.35 103.73

Page 56: Tutorial: The Zoltan Toolkit

Slide 56

Summary of Matrix Ordering

• Zoltan provides access to efficient parallel

ordering for sparse matrices (especially with

Scotch).• Zoltan provides a standard way to call parallel

ordering.• Zoltan will provide also its own ordering tool

in the future, dealing with ordering for non-

symmetric problems.

Page 57: Tutorial: The Zoltan Toolkit

Slide 57

Zoltan Graph Coloring

• Parallel distance-1 and distance-2 graph coloring.• Graph built using same application interface and code

as graph partitioners.• Generic coloring interface; easy to add new coloring

algorithms.• Algorithms

– Distance-1 coloring: Bozdag, Gebremedhin, Manne, Boman, Catalyurek, EuroPar’05, JPDC’08.

– Distance-2 coloring: Bozdag, Catalyurek, Gebremedhin, Manne, Boman, Ozguner, HPCC’05, SISC’08 (in submission).

Page 58: Tutorial: The Zoltan Toolkit

Slide 58

Distance-1 Graph Coloring

• Problem (NP-hard) Color the vertices of a graph with as few colors as possible such that no two adjacent vertices receive the same color.

• Applications– Iterative solution of sparse linear systems– Preconditioners– Sparse tiling– Eigenvalue computation– Parallel graph partitioning

Page 59: Tutorial: The Zoltan Toolkit

Slide 59

Distance-2 Graph Coloring• Problem (NP-hard)

Color the vertices of a graph with as few colors as possible such that a pair of vertices connected by a path on two or less edges receives different colors.

• Applications– Derivative matrix computation in numerical optimization– Channel assignment– Facility location

• Related problems– Partial distance-2 coloring– Star coloring

Page 60: Tutorial: The Zoltan Toolkit

Slide 60

A Parallel Coloring Framework

• Color vertices iteratively in rounds using a

first fit strategy• Each round is broken into supersteps

– Color a certain number of vertices

– Exchange recent color information

• Detect conflicts at the end of each round• Repeat until all vertices receive consistent

colors

Page 61: Tutorial: The Zoltan Toolkit

Slide 61

Coloring Interface in Zoltan

• Both distance-1 and distance-2 coloring

routines can be invoked by Zoltan_Color

function.

• The colors assigned to the objects are

returned in an array.

Page 62: Tutorial: The Zoltan Toolkit

Slide 62

Experimental Results

Page 63: Tutorial: The Zoltan Toolkit

Slide 63

Other Zoltan Functionality

• Tools needed when doing dynamic load balancing:– Data Migration– Unstructured Communication Primitives– Distributed Data Directories

• All functionality described in Zoltan User’s Guide.– http://www.cs.sandia.gov/Zoltan/ug_html/ug.html

Page 64: Tutorial: The Zoltan Toolkit

Slide 64

Zoltan Data Migration Tools• After partition is computed, data must be moved to new

decomposition.– Depends strongly on application data structures.– Complicated communication patterns.

• Zoltan can help!– Application supplies query functions to pack/unpack data.– Zoltan does all communication to new processors.

Page 65: Tutorial: The Zoltan Toolkit

Slide 65Using Zoltan’sData Migration Tools

• Required migration query functions:– ZOLTAN_OBJ_SIZE_MULTI_FN:

• Returns size of data (in bytes) for each object to be exported to a new processor.

– ZOLTAN_PACK_MULTI_FN: • Remove data from application data structure on old processor;• Copy data to Zoltan communication buffer.

– ZOLTAN_UNPACK_MULTI_FN: • Copy data from Zoltan communication buffer into data structure on new

processor.

• int Zoltan_Migrate(struct Zoltan_Struct *zz, int num_import, ZOLTAN_ID_PTR import_global_ids, ZOLTAN_ID_PTR import_local_ids, int *import_procs, int *import_to_part, int num_export, ZOLTAN_ID_PTR export_global_ids, ZOLTAN_ID_PTR export_local_ids, int *export_procs, int *export_to_part);

Page 66: Tutorial: The Zoltan Toolkit

Slide 66

Graph-baseddecomposition

RCBdecomposition

Zoltan_Comm_Do

Zoltan_Comm_Do_Reverse

Zoltan Unstructured Communication Package

• Simple primitives for efficient irregular communication.– Zoltan_Comm_Create: Generates communication plan.

• Processors and amount of data to send and receive.

– Zoltan_Comm_Do: Send data using plan.• Can reuse plan. (Same plan, different data.)

– Zoltan_Comm_Do_Reverse: Inverse communication.

• Used for most communication in Zoltan.

Page 67: Tutorial: The Zoltan Toolkit

Slide 67Example Application: Crash Simulations

RCB

Graph-based

RCB

RCB mapped to time 0

1.6 ms

RCB

RCB mapped to time 0

3.2 ms

•Multiphase simulation: – Graph-based decomposition of elements for finite element calculation.

– Dynamic geometric decomposition of surfaces for contact detection.– Migration tools and Unstructured Communication package map

between decompositions.

Page 68: Tutorial: The Zoltan Toolkit

Slide 68

• Helps applications locate off-processor data.• Rendezvous algorithm (Pinar, 2001).

– Directory distributed in known way (hashing) across processors.

– Requests for object location sent to processor storing the object’s directory entry.

A B C

0 1 0

D E F

2 1 0

G H I

1 2 1

Processor 0 Processor 1 Processor 2

Directory Index Location

Zoltan Distributed Data Directory

A F

C

B

E

I

G H

D

Processor 0

Processor 1

Processor 2

Page 69: Tutorial: The Zoltan Toolkit

Slide 69

Alternate Interfaces to Zoltan• C++ and F90 interfaces in Zoltan.

• Isorropia package in Trilinos solver toolkit.– Epetra Matrix interface to Zoltan partitioning.

• B = Isorropia::Epetra::create_balanced_copy(A, params);

– Trilinos v9 will also include ordering and coloring interfaces in Isorropia.

– SciDAC TOPS-2 CET.

• ITAPS iMesh interface to Zoltan.– New iMeshP parallel mesh interface to be incorporated in

FY09.– SciDAC ITAPS CET.

Page 70: Tutorial: The Zoltan Toolkit

Slide 70

Current Work

• Two-dimensional matrix partitioning.• Performance improvements for hypergraph

partitioning.• Multi-criteria hypergraph partitioning.• Non-symmetric matrix ordering.

Page 71: Tutorial: The Zoltan Toolkit

Slide 71

For More Information...

• Zoltan Home Page– http://www.cs.sandia.gov/Zoltan

– User’s and Developer’s Guides

– Download Zoltan software under GNU LGPL.

• Email:– {kddevin,ccheval,egboman}@sandia.gov

[email protected]

Page 72: Tutorial: The Zoltan Toolkit

Slide 72

The End

Page 73: Tutorial: The Zoltan Toolkit

Slide 73

Configuring and Building Zoltan• Create and enter the Zoltan directory:

– gunzip zoltan_distrib_v3.0.tar.gz – tar xf zoltan_distrib_v3.0.tar– cd Zoltan

• Configure and make Zoltan library– Not autotooled; uses manual configuration file.– “make zoltan” attempts a generic build;

library libzoltan.a is in directory Obj_generic.– To customize your build:

• cd Utilities/Config; cp Config.linux Config.your_system• Edit Config.your_system• cd ../..• setenv ZOLTAN_ARCH your_system• make zoltan• Library libzoltan.a is in Obj_your_system

Page 74: Tutorial: The Zoltan Toolkit

Slide 74

Config fileDEFS =RANLIB = ranlibAR = ar r

CC = mpicc -WallCPPC = mpic++ INCLUDE_PATH =DBG_FLAGS = -gOPT_FLAGS = -OCFLAGS = $(DBG_FLAGS)

F90 = mpif90LOCAL_F90 = f90F90CFLAGS = -DFMANGLE=UNDERSCORE -DNO_MPI2FFLAGS =SPPR_HEAD = spprinc.mostF90_MODULE_PREFIX = -IFARG = farg_typical

MPI_LIB =MPI_LIBPATH =

PARMETIS_LIBPATH = -L/Users/kddevin/code/ParMETIS3_1PARMETIS_INCPATH = -I/Users/kddevin/code/ParMETIS3_1#PATOH_LIBPATH = -L/Users/kddevin/code/PaToH#PATOH_INCPATH = -I/Users/kddevin/code/PaToH

Page 75: Tutorial: The Zoltan Toolkit

Slide 75

Example Graph Callbacksvoid ZOLTAN_NUM_EDGES_MULTI_FN(void *data, int num_gid_entries, int num_lid_entries, int num_obj, ZOLTAN_ID_PTR global_id, ZOLTAN_ID_PTR local_id, int *num_edges, int *ierr);

Proc 0 Input from Zoltan: num_obj = 3 global_id = {A,C,B} local_id = {0,1,2}

Output from Application on Proc 0: num_edges = {2,4,3} (i.e., degrees of vertices A, C, B) ierr = ZOLTAN_OK

A

B C

D E

Proc 0

Proc 1

Page 76: Tutorial: The Zoltan Toolkit

Slide 76

Example Graph Callbacksvoid ZOLTAN_EDGE_LIST_MULTI_FN(void *data, int num_gid_entries, int num_lid_entries, int num_obj, ZOLTAN_ID_PTR global_ids, ZOLTAN_ID_PTR local_ids, int *num_edges, ZOLTAN_ID_PTR nbor_global_id, int *nbor_procs, int wdim, float *nbor_ewgts, int *ierr);

Proc 0 Input from Zoltan: num_obj = 3 global_ids = {A, C, B} local_ids = {0, 1, 2} num_edges = {2, 4, 3} wdim = 0 or EDGE_WEIGHT_DIM parameter value

Output from Application on Proc 0: nbor_global_id = {B, C, A, B, E, D, A, C, D} nbor_procs = {0, 0, 0, 0, 1, 1, 0, 0, 1} nbor_ewgts = if wdim then {7, 8, 8, 9, 1, 3, 7, 9, 5} ierr = ZOLTAN_OK

A

B C

D E

Proc 0

Proc 1

87

9

5 31

2

Page 77: Tutorial: The Zoltan Toolkit

Slide 77Example Hypergraph Callbacks

void ZOLTAN_HG_SIZE_CS_FN(void *data, int *num_lists, int *num_pins, int *format, int *ierr);

Output from Application on Proc 0: num_lists = 2 num_pins = 6 format = ZOLTAN_COMPRESSED_VERTEX (owned non-zeros per vertex) ierr = ZOLTAN_OK

OR

Output from Application on Proc 0: num_lists = 5 num_pins = 6 format = ZOLTAN_COMPRESSED_EDGE (owned non-zeros per edge) ierr = ZOLTAN_OK

Vertices

Proc 0 Proc 1

A B C D

a X X

b X X

c X X

d X X

e X X X

f X X X XH

yper

edg

es

Page 78: Tutorial: The Zoltan Toolkit

Slide 78Example Hypergraph Callbacks

void ZOLTAN_HG_CS_FN(void *data, int num_gid_entries, int nvtxedge, int npins, int format, ZOLTAN_ID_PTR vtxedge_GID, int *vtxedge_ptr, ZOLTAN_ID_PTR pin_GID, int *ierr);

Proc 0 Input from Zoltan: nvtxedge = 2 or 5 npins = 6 format = ZOLTAN_COMPRESSED_VERTEX or ZOLTAN_COMPRESSED_EDGE

Output from Application on Proc 0: if (format = ZOLTAN_COMPRESSED_VERTEX) vtxedge_GID = {A, B} vtxedge_ptr = {0, 3} pin_GID = {a, e, f, b, d, f} if (format = ZOLTAN_COMPRESSED_EDGE) vtxedge_GID = {a, b, d, e, f} vtxedge_ptr = {0, 1, 2, 3, 4} pin_GID = {A, B, B, A, A, B} ierr = ZOLTAN_OK

Vertices

Proc 0 Proc 1

A B C D

a X X

b X X

c X X

d X X

e X X X

f X X X XH

yper

edg

es

Page 79: Tutorial: The Zoltan Toolkit

Slide 79

Simple Example

• Zoltan/examples/C/zoltanSimple.c• Application data structure:

– int MyNumPts; • Number of points on processor.

– int *Gids; • array of Global ID numbers of points on processor.

– float *Pts; • Array of 3D coordinates of points on processor (in same

order as Gids array).

Page 80: Tutorial: The Zoltan Toolkit

Slide 80Example zoltanSimple.c: Initialization

/* Initialize MPI */ MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &me); MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

/* ** Initialize application data. In this example, ** create a small test mesh and divide it across processors */

exSetDivisions(32); /* rectilinear mesh is div X div X div */

MyNumPts = exInitializePoints(&Pts, &Gids, me, nprocs);

/* Initialize Zoltan */ rc = Zoltan_Initialize(argc, argv, &ver);

if (rc != ZOLTAN_OK){ printf("sorry...\n"); free(Pts); free(Gids); exit(0); }

Page 81: Tutorial: The Zoltan Toolkit

Slide 81Example zoltanSimple.c: Prepare for Partitioning

/* Allocate and initialize memory for Zoltan structure */ zz = Zoltan_Create(MPI_COMM_WORLD);

/* Set general parameters */ Zoltan_Set_Param(zz, "DEBUG_LEVEL", "0"); Zoltan_Set_Param(zz, "LB_METHOD", "RCB"); Zoltan_Set_Param(zz, "NUM_GID_ENTRIES", "1"); Zoltan_Set_Param(zz, "NUM_LID_ENTRIES", "1"); Zoltan_Set_Param(zz, "RETURN_LISTS", "ALL");

/* Set RCB parameters */ Zoltan_Set_Param(zz, "KEEP_CUTS", "1"); Zoltan_Set_Param(zz, "RCB_OUTPUT_LEVEL", "0"); Zoltan_Set_Param(zz, "RCB_RECTILINEAR_BLOCKS", "1");

/* Register call-back query functions. */ Zoltan_Set_Num_Obj_Fn(zz, exGetNumberOfAssignedObjects, NULL); Zoltan_Set_Obj_List_Fn(zz, exGetObjectList, NULL); Zoltan_Set_Num_Geom_Fn(zz, exGetObjectSize, NULL); Zoltan_Set_Geom_Multi_Fn(zz, exGetObject, NULL);

Page 82: Tutorial: The Zoltan Toolkit

Slide 82Example zoltanSimple.c: Partitioning

Zoltan computes the difference (Δ) from current distributionChoose between:a) Import lists (data to import from other procs)b) Export lists (data to export to other procs)c) Both (the default)

/* Perform partitioning */ rc = Zoltan_LB_Partition(zz,

&changes, /* Flag indicating whether partition changed */ &numGidEntries, &numLidEntries,

&numImport, /* objects to be imported to new part */ &importGlobalGids, &importLocalGids, &importProcs, &importToPart,

&numExport, /* # objects to be exported from old part */ &exportGlobalGids, &exportLocalGids, &exportProcs, &exportToPart);

Page 83: Tutorial: The Zoltan Toolkit

Slide 83Example zoltanSimple.c: Use the Partition

/* Process partitioning results; ** in this case, print information; ** in a "real" application, migrate data here. */ if (!rc){ exPrintGlobalResult("Recursive Coordinate Bisection", nprocs, me, MyNumPts, numImport, numExport, changes); } else{ free(Pts); free(Gids); Zoltan_Destroy(&zz); MPI_Finalize(); exit(0); }

Page 84: Tutorial: The Zoltan Toolkit

Slide 84Example zoltanSimple.c: Cleanup

/* Free Zoltan memory allocated by Zoltan_LB_Partition. */ Zoltan_LB_Free_Part(&importGlobalGids, &importLocalGids, &importProcs, &importToPart); Zoltan_LB_Free_Part(&exportGlobalGids, &exportLocalGids, &exportProcs, &exportToPart);

/* Free Zoltan memory allocated by Zoltan_Create. */ Zoltan_Destroy(&zz);

/* Free Application memory */ free(Pts); free(Gids);

/********************** ** all done *********** **********************/

MPI_Finalize();

Page 85: Tutorial: The Zoltan Toolkit

Slide 85

Performance Results

• Experiments on Sandia’s Thunderbird cluster.– Dual 3.6 GHz Intel EM64T processors with 6 GB RAM.

– Infiniband network.

• Compare RCB, HSFC, graph and hypergraph

methods.• Measure …

– Amount of communication induced by the partition.

– Partitioning time.

Page 86: Tutorial: The Zoltan Toolkit

Slide 86

Test Data

SLAC *LCLS Radio Frequency Gun

6.0M x 6.0M23.4M nonzeros

Xyce 680K ASIC StrippedCircuit Simulation

680K x 680K2.3M nonzeros

Cage15 DNAElectrophoresis

5.1M x 5.1M99M nonzeros

SLAC Linear Accelerator2.9M x 2.9M

11.4M nonzeros

Page 87: Tutorial: The Zoltan Toolkit

Slide 87

0.0E+00

2.0E+05

4.0E+05

6.0E+05

8.0E+05

1.0E+06

1.2E+06

1.4E+06

1.6E+06

2 4 8 16 32 64 128 256 512 1024

Number of Processors

Communication Volume

0.0E+00

2.0E+05

4.0E+05

6.0E+05

8.0E+05

1.0E+06

1.2E+06

1.4E+06

2 4 8 16 32 64 128 256 512 1024

Number of Processors

Communication Volume

0.0E+00

2.0E+06

4.0E+06

6.0E+06

8.0E+06

1.0E+07

1.2E+07

1.4E+07

2 4 8 16 32 64 128 256 512 1024

Number of Processors

Communication Volume

0.0E+00

2.0E+04

4.0E+04

6.0E+04

8.0E+04

1.0E+05

1.2E+05

2 4 8 16 32 64 128 256 512 1024

Number of Processors

Communication Volume

Communication Volume: Lower is Better

Cage15 5.1M electrophoresis

Xyce 680K circuitSLAC 6.0M LCLS

SLAC 2.9M Linear Accelerator

Number of parts = number of processors.

RCB

Graph

Hypergraph

HSFC

Page 88: Tutorial: The Zoltan Toolkit

Slide 88

0.1

1

10

100

1000

1 2 4 8 16 32 64 128 256 512 1024

Number of Processors

Partitioning time (secs)

0.01

0.1

1

10

100

1000

1 2 4 8 16 32 64 128 256 512 1024

Number of Processors

Partitioning time (secs)

NOTE: PARMETIS DID NOT SATISFY BALANCE CONSTRAINT

0.0E+00

2.0E+04

4.0E+04

6.0E+04

8.0E+04

1.0E+05

1.2E+05

1 2 4 8 16 32 64 128 256 512 1024

Communication Volume

1

10

100

1 2 4 8 16 32 64 128 256 512 1024

Number of Processors

Partitioning time (secs)

Partitioning Time: Lower is better

Cage15 5.1M electrophoresis

Xyce 680K circuitSLAC 6.0M LCLS

SLAC 2.9M Linear Accelerator

1024 parts.Varying numberof processors.

1

10

100

1000

10000

1 2 4 8 16 32 64 128 256 512 1024

Number of Processors

Partitioning time (secs)

RCB

Graph

Hypergraph

HSFC

Page 89: Tutorial: The Zoltan Toolkit

Slide 89

Repartitioning Experiments

• Experiments with 64 parts on 64 processors.• Dynamically adjust weights in data to simulate,

say, adaptive mesh refinement.• Repartition.• Measure repartitioning time and

total communication volume:

Data redistribution volume

+ Application communication volume

Total communication volume

Page 90: Tutorial: The Zoltan Toolkit

Slide 90

Repartitioning Results:Lower is Better

Xyce 680K circuitSLAC 6.0M LCLS

0.0E+00

5.0E+07

1.0E+08

1.5E+08

2.0E+08

2.5E+08

3.0E+08

HypergraphRepart

Graph Repart StaticHypergraph

Static Graph

Repartitioning Method

Total Communication Volume

0.0E+00

1.0E+00

2.0E+00

3.0E+00

4.0E+00

5.0E+00

6.0E+00

7.0E+00

8.0E+00

9.0E+00

HypergraphRepart

Graph Repart StaticHypergraph

Static Graph

Repartitioning Method

Repartitioning Time (secs)

RepartitioningTime (secs)

Data Redistribution

Volume

ApplicationCommunicationVolume

0.0E+00

5.0E+08

1.0E+09

1.5E+09

2.0E+09

2.5E+09

RCB HSFC HypergraphRepart

Graph Repart StaticHypergraph

Static Graph

Repartitioning Method

Total Communication Volume

1.0E-01

1.0E+00

1.0E+01

1.0E+02

RCB HSFC HypergraphRepart

Graph Repart StaticHypergraph

Static Graph

Repartitioning Method

Repartitioning Time (secs)