MA/CS 471 Lecture 15, Fall 2002
Introduction to Graph Partitioning


Page 2: Graph (or Mesh) Partitioning

We have so far implemented a finite element Poisson solver.

The implementation is serial and not immediately suited to parallel computing.

We have started to make the algorithm more suitable by switching from the LU factorization approach for solving the linear system to a conjugate gradient (iterative) algorithm, which does not have the same bottlenecks to parallel computation.

Page 3: Next Step To Parallelism

Now that we have made sure there are no intrinsically serial computation steps in the system solve, we are free to divide up the work between processes.

We will proceed by deciding which finite-element triangle goes to which processor.

Page 4: Mesh Partitioning

So far, I have supplied files which include information on which triangle goes to which processor.

These files were generated using pmetis (http://www-users.cs.umn.edu/~karypis).

pmetis is a serial routine; however, Karypis has written a parallel version which can be used as a library. The library is called parmetis.

Pages 5 to 9: (no transcribed content)

Page 10: Team Project Continued

Now we are ready to progress towards making the serial Poisson solver work in parallel.

This task divides into a number of steps:

Conversion of umDriver, umMESH, umStartUp, umMatrix and umSolve

Adding a routine to read in a partition file (or call parMetis to obtain a partition vector)

Page 11: umDriver Modification

This code should now initialize MPI

This code should call the umPartition routine

This should be modified to find the number of processors and the local processor ID (stored in your struct/class)

This code should finalize MPI

Page 12: umPartition

This code should read in a partition from file

The input should be the name of the partition file, the current process ID (rank) and the number of processes (size)

The output should be a list of elements belonging to this process
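A minimal sketch of that logic, assuming a simple partition-file format (one processor number per line, in element order); the format and the function name are illustrative assumptions, not the course's actual code:

```python
def um_partition(filename, rank, size):
    """Read a pmetis-style partition vector (one processor number per
    line, in element order) and return the indices of the elements
    assigned to this process.  Format and names are illustrative."""
    my_elements = []
    with open(filename) as f:
        for element, line in enumerate(f):
            part = int(line)
            assert 0 <= part < size      # sanity-check the file
            if part == rank:
                my_elements.append(element)
    return my_elements
```

In the real code, rank and size would come from MPI_Comm_rank and MPI_Comm_size rather than being passed in by hand.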

Page 13: umMESH Modifications

This routine should now be fed a partition file determining which elements it should read in from the .neu input mesh file

You should replace the elmttoelmt part with a piece of code which goes through the .neu file, reads in which element/face lies on the boundary, and uses this to mark whether a node is known or unknown

Each process should send a list of its “known” vertices’ global numbers to each other process so all nodes can be correctly identified as lying on the boundary or not
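In MPI terms, that exchange is an allgather of each process's boundary-vertex list. A serial mock-up of the logic (the data layout is an assumption for illustration; in the real code the per-process lists would arrive via MPI_Allgatherv):

```python
def mark_known_nodes(known_lists, my_global_nodes):
    """Combine every process's list of 'known' (boundary) vertex global
    numbers, then flag each of this process's nodes as known or unknown.
    known_lists stands in for the result of the MPI exchange."""
    all_known = set()
    for known in known_lists:            # one list per process
        all_known.update(known)
    return {g: (g in all_known) for g in my_global_nodes}
```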

Page 14: umStartUp Modification

Remains largely unchanged (depending on how you read in umVertX, umVertY, elmttonode).

Page 15: umMatrix Modification

This routine should be modified so that, instead of creating the mat matrix, it is fed a vector vec and returns mat*vec

IT SHOULD NOT STORE THE GLOBAL MATRIX AT ALL!!

I strongly suggest creating a new routine (umMatrixOP) and, as a debugging check, comparing its output with the result of using umMatrix to build the matrix and then multiply the same vector
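A minimal sketch of the matrix-free idea, assuming each element's small stiffness matrix and its global node numbers are available (the names here are illustrative, not the course's actual data structures):

```python
import numpy as np

def um_matrix_op(elem_nodes, elem_mats, vec):
    """Return mat*vec without ever assembling the global matrix:
    gather the local entries of vec, apply the small element matrix,
    and scatter-add the result back into the output vector."""
    out = np.zeros_like(vec)
    for nodes, Ak in zip(elem_nodes, elem_mats):
        out[nodes] += Ak @ vec[nodes]   # local multiply, global scatter
    return out
```

As the slide suggests, a good check is to assemble the global matrix once with the old umMatrix and verify that both paths give the same product for a few test vectors.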

Page 16: umSolve Modification

The major change here is the replacement of umAinvB with a call to your own conjugate gradient solver

Note: the rhs vector is filled up here with a global gather of the elemental contributions, so this will have to be modified to account for elements residing on other processes.

Page 17: umCG Modification

umCG is the routine which should take a rhs and return an approximate solution using CG.

Each step of the CG algorithm needs to be analyzed to determine the process data dependency

For the matrix*vector steps a certain amount of data swap is required

For the dot products an allreduce is required.

I strongly suggest creating the exchange sequence before the iterations start.
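The structure of such a solver, as a serial sketch with comments marking where the parallel version needs communication (the function and variable names are assumptions):

```python
import numpy as np

def um_cg(mat_op, b, tol=1e-10, max_iter=500):
    """Conjugate gradient: mat_op(v) is the matrix-free mat*vec.
    In the parallel version each mat_op call needs the neighbour
    data swap, and each dot product below becomes a local dot
    followed by an MPI allreduce."""
    x = np.zeros_like(b)
    r = b - mat_op(x)                 # initial residual
    p = r.copy()
    rr = np.dot(r, r)                 # allreduce in parallel
    for _ in range(max_iter):
        Ap = mat_op(p)                # data swap in parallel
        alpha = rr / np.dot(p, Ap)    # allreduce in parallel
        x += alpha * p
        r -= alpha * Ap
        rr_new = np.dot(r, r)         # allreduce in parallel
        if rr_new < tol * tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x
```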

Page 18: Work Partition

Here’s the deal: there are approximately six unequal chunks of work to be done. I suggest the following split of the code:

umDriver, umCG
umPartition, umSolve
umMESH, umStartUp
umMatrixOP

However, you are free to choose.

Try to minimize the amount of data stored on multiple processes (but do not make the task too difficult by sharing nothing at all)

Page 19: Discussion and Project Write-Up

This is a little tricky, so now is the time to form a plan and ask any questions.

This will be due on Tuesday 22nd October

As usual I need a complete write up.

This should include parallel timings and speed-up tests (i.e. for a fixed grid, find the wall-clock time of umCG for Nprocs = 2, 4, 6, 8, 10, 12, 14, 16 and compare in a graph)

Test the code to make sure it is giving the same results (up to convergence tolerance) as the serial code

Profile your code using upshot

Include pictures showing partition (use a different colour per partition) and parallel solution.