
Introduction to PETSc

VIGRE Seminar, Wednesday, November 8, 2006


Parallel Computing

How (basically) does it work?

• Assign each processor a number
• The same program goes to all
• Each uses separate memory
• They pass information back and forth as necessary
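For illustration, here is a minimal sketch (not from the slides) using standard MPI calls in C: the same program starts everywhere, and each copy learns its own number.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int rank, size;

  MPI_Init(&argc, &argv);               /* the same program starts on every processor */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* each processor is assigned a number */
  MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processors there are in total */

  printf("Processor %d of %d, with its own separate memory\n", rank, size);

  MPI_Finalize();
  return 0;
}

Information is passed back and forth with calls such as MPI_Send and MPI_Recv.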

Parallel Computing

Example 1: Matrix-Vector Product

The matrix ((a, b, c), (d, e, f), (g, h, i)) and the vector (j, k, l) are inputs into the program. The processors start out empty:

0:

1:

2:

Parallel Computing

Example 1: Matrix-Vector Product

The control node (0) reads in the matrix and distributes the rows amongst the processors.

0: (a, b, c)

1: (d, e, f)

2: (g, h, i)

Parallel Computing

Example 1: Matrix-Vector Product

The control node also sends the vector to each processor’s memory.

0: (a, b, c) ; (j, k, l)

1: (d, e, f) ; (j, k, l)

2: (g, h, i) ; (j, k, l)

Parallel Computing

Example 1: Matrix-Vector Product

Each processor computes its own dot product.

0: (a, b, c) • (j, k, l) = aj+bk+cl

1: (d, e, f) • (j, k, l) = dj+ek+fl

2: (g, h, i) • (j, k, l) = gj+hk+il

Parallel Computing

Example 1: Matrix-Vector Product

The processors send their results to the control node, which outputs the vector (aj+bk+cl, dj+ek+fl, gj+hk+il).

0: (a, b, c) • (j, k, l) = aj+bk+cl

1: (d, e, f) • (j, k, l) = dj+ek+fl

2: (g, h, i) • (j, k, l) = gj+hk+il
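A sketch of Example 1 in C with MPI follows. An assumption of this sketch (not in the slides): exactly one row per processor, so it must be run with -np 3.

#include <stdio.h>
#include <mpi.h>

#define N 3

int main(int argc, char **argv)
{
  double A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; /* matrix, known to node 0 */
  double x[N]    = {1, 1, 1};                         /* vector, known to node 0 */
  double row[N], yi, y[N];
  int    rank, j;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* the control node (0) distributes the rows amongst the processors */
  MPI_Scatter(A, N, MPI_DOUBLE, row, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  /* the control node also sends the vector to each processor's memory */
  MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  /* each processor computes its own dot product */
  yi = 0.0;
  for (j = 0; j < N; j++) yi += row[j] * x[j];

  /* the processors send their results to the control node, which outputs y */
  MPI_Gather(&yi, 1, MPI_DOUBLE, y, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("y = (%g, %g, %g)\n", y[0], y[1], y[2]);

  MPI_Finalize();
  return 0;
}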

Parallel Computing

Example 2: Matrix-Vector Product

Suppose for memory reasons each processor only has part of the vector.

0: (a, b, c) ; j

1: (d, e, f) ; k

2: (g, h, i) ; l

Parallel Computing

Example 2: Matrix-Vector Product

Before the multiply, each processor sends the necessary information elsewhere.

0: (a, b, c) ; j ; (k from 1) ; (l from 2)

1: (d, e, f) ; (j from 0) ; k ; (l from 2)

2: (g, h, i) ; (j from 0) ; (k from 1) ; l

Parallel Computing

Example 2: Matrix-Vector Product

After the multiply, the space is freed again for other uses.

0: (a, b, c) ; j

1: (d, e, f) ; k

2: (g, h, i) ; l

Parallel Computing

Example 3: Matrix-Matrix Product

The previous case illustrates how to multiply matrices stored across multiple processors.

0: (a, b, c) ; (j, k, l)

1: (d, e, f) ; (m, n, o)

2: (g, h, i) ; (p, q, r)

Parallel Computing

Example 3: Matrix-Matrix Product

Each column is distributed for processing in turn.

0: 1) (a, b, c)•(j, m, p) = α   2) (a, b, c)•(k, n, q) = β   3) (a, b, c)•(l, o, r) = γ

1: 1) (d, e, f)•(j, m, p) = δ   2) (d, e, f)•(k, n, q) = ε   3) (d, e, f)•(l, o, r) = ζ

2: 1) (g, h, i)•(j, m, p) = η   2) (g, h, i)•(k, n, q) = θ   3) (g, h, i)•(l, o, r) = ι

Parallel Computing

Example 3: Matrix-Matrix Product

The result is a matrix with the same parallel row structure as the left matrix and the same column structure as the right.

0: (α, β, γ)

1: (δ, ε, ζ)

2: (η, θ, ι)

Parallel Computing

Example 3: Matrix-Matrix Product

The original entries could also have been sub-matrices, as long as their dimensions were compatible.

0: (α, β, γ)

1: (δ, ε, ζ)

2: (η, θ, ι)

Parallel Computing

Example 4: Block Diagonal Product

Suppose the second matrix is block diagonal.

0: (A, B, C) ; (J, 0, 0)

1: (D, E, F) ; (0, K, 0)

2: (G, H, I) ; (0, 0, L)

Parallel Computing

Example 4: Block Diagonal Product

Much less information needs to be passed between the processors.

0: 1) AJ = α   2) BK = β   3) CL = γ

1: 1) DJ = δ   2) EK = ε   3) FL = ζ

2: 1) GJ = η   2) HK = θ   3) IL = ι

Parallel Computing

When is it worth it to parallelize?

• There is a time cost associated with passing messages
• The amount of message passing is dependent on the problem and the program (algorithm)
• Therefore, the benefits depend more on the structure of the problem and the program than on the size/speed of the parallel network (diminishing returns).

Parallel Networks

How do I use multiple processors?

• This depends on the network, but…
• Most networks use some variation of PBS, a job scheduler, and mpirun or mpiexec.
• A parallel program needs to be submitted as a batch job.

Parallel Networks

• Suppose I have a program myprog, which gets data from data.dat, and which I call in the following fashion when only using one processor:

./myprog -f data.dat

I would write a file myprog.pbs that looks like the following:

Parallel Networks

#PBS -q compute (name of the processing queue [not necessary on all networks])
#PBS -N myprog (the name of the job)
#PBS -l nodes=2:ppn=1,walltime=00:10:00 (number of nodes, number of processes per node, and the maximum time to allow the program to run)
#PBS -o /home/me/mydir/myprog.out (where the output of the program should be written)
#PBS -e /home/me/mydir/myprog.err (where the error stream should be written)

These are the headers that tell the job scheduler how to handle your job.

Parallel Networks

Although what follows depends on the MPI software that the network runs, it should look something like this:

cd $PBS_O_WORKDIR (makes the processors run the program in the directory where myprog.pbs is saved)

mpirun -machinefile $PBS_NODEFILE -np 2 myprog -f data.dat (tells the MPI software which machines to use and how many processes to start; notice that command line arguments follow as usual)
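Putting the pieces together, the whole myprog.pbs reads as follows (the queue name and paths are the illustrative values from above; the #!/bin/bash line is an addition that some schedulers want and others ignore):

#!/bin/bash
#PBS -q compute
#PBS -N myprog
#PBS -l nodes=2:ppn=1,walltime=00:10:00
#PBS -o /home/me/mydir/myprog.out
#PBS -e /home/me/mydir/myprog.err

cd $PBS_O_WORKDIR
mpirun -machinefile $PBS_NODEFILE -np 2 myprog -f data.dat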

Parallel Networks

• Once the .pbs file is written, it can be submitted to the job scheduler with qsub:

qsub myprog.pbs

• You can check to see if your job is running with the command qstat.

Parallel Networks

• Some systems (but not all) will allow you to simulate running your program in parallel on one processor, which is useful for debugging:

mpirun -np 3 myprog -f data.dat

Parallel Networks

What parallel systems are available?

• RTC: Rice Terascale Cluster: 244 processors.
• ADA: Cray XD1: 632 processors.
• caamster: CAAM department exclusive: 8(?) processors.

PETSc

What do I use PETSc for?

• File I/O with “minimal” understanding of MPI
• Vector- and matrix-based data management (in particular: sparse)
• Linear algebra routines familiar from the famous serial packages

PETSc

• At the moment, ada and caamster (and harvey) have PETSc installed.
• You can download and install PETSc on your own machine (requires Cygwin for Windows), for educational and debugging purposes.

PETSc

• PETSc builds on the existing software BLAS and LAPACK: which implementations to use can be specified at configuration.
• It has a (slower) debugging configuration and a (faster, tacit) optimized configuration.

PETSc

• Installation comes with documentation, examples, and manual pages.
• The biggest part of learning how to use PETSc is learning how to use the manual pages.

PETSc

• It is extremely useful to have an environment variable PETSC_DIR in your shell of choice, which gives the path to the installation of PETSc, e.g.

PETSC_DIR=/usr/local/src/petsc-2.3.1-p13/
export PETSC_DIR
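If your shell of choice is csh or tcsh, the equivalent would be:

setenv PETSC_DIR /usr/local/src/petsc-2.3.1-p13/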

PETSc

Makefile

• You can pretty much copy/paste/modify the makefiles in the examples, but here is the basic setup:

(…) (other definitions for CFLAGS, etc.)
LOCDIR = ~/mydir
include ${PETSC_DIR}/bmake/common/base
(This is why it is useful to have this variable saved.)

myprog: myprog.o chkopts
	-${CLINKER} -o myprog myprog.o ${PETSC_LIB}
	${RM} myprog.o

PETSc

Headers

• #include “petsc.h” in all files, unless the routines that you use need more specific headers.
• How do you know? Consult the manual pages!

PETSc

Data Types

• PETSc has a slew of its own data types: PetscInt, PetscReal, PetscScalar, etc.
• Usually aliases of normal data types: PetscInt ~ int, PetscReal ~ double
• Safer to use for compatibility

PETSc

Usage in C/C++

• The main program should begin:

static char help[] = "Your message here.";

int main(int argc, char **argv)
{
(… declarations)

PetscInitialize(&argc, &argv, PETSC_NULL, help);

PETSc

Usage in C/C++

• The main program should end:

(…)

PetscFinalize();

return(0);

}

PETSc

Usage in C/C++

• When first programming, include the following variable:

PetscErrorCode ierr;

• Where you’d call a PETSc routine, Routine(arg); write instead:

ierr = Routine(arg); CHKERRQ(ierr);

PETSc

Usage in C/C++

• When you try to run your program, you will be informed of any problems with incompatible data types/dimensions/etc.
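Putting these pieces together, a minimal sketch of a complete program (using the PETSc 2.3-era calling sequences shown in these slides) looks like this:

static char help[] = "Your message here.";

#include "petsc.h"

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help); CHKERRQ(ierr);

  /* (… create, use, and destroy PETSc objects here,
     each call wrapped in ierr = …; CHKERRQ(ierr);) */
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Hello from PETSc\n"); CHKERRQ(ierr);

  ierr = PetscFinalize(); CHKERRQ(ierr);
  return 0;
}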

PETSc

Data

• Any data type larger than a scalar has a Create and a Destroy routine.
• If you run ./myprog -log_summary, you get the number created and the number destroyed for each data type, to find memory leaks.

PETSc

Example: Vec

• Two types: global and local
• Which to use depends on function: do other processors need to see this data?
• Basic usage:

Vec X;
VecCreate([PETSC_COMM_WORLD/PETSC_COMM_SELF], &X);
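For example, a hedged fragment (to go inside main) that creates a global vector of length N and lets PETSc decide the local sizes; VecSetSizes and VecSetFromOptions are the calls used in the PETSc examples:

Vec            X;
PetscInt       N = 100; /* illustrative global length */
PetscErrorCode ierr;

ierr = VecCreate(PETSC_COMM_WORLD, &X); CHKERRQ(ierr); /* global: all processors see it */
ierr = VecSetSizes(X, PETSC_DECIDE, N); CHKERRQ(ierr); /* local sizes decided by PETSc */
ierr = VecSetFromOptions(X); CHKERRQ(ierr);            /* honors command-line options */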

PETSc

Example: Vec

• Advanced usage:

VecCreateSeq(PETSC_COMM_SELF, n, &X);
VecCreateSeqWithArray(PETSC_COMM_SELF, n, vals, &X);
VecLoad(instream, VECSEQ, &X);
VecCreateMPI(PETSC_COMM_WORLD, n, PETSC_DETERMINE, &X);
VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &X);
VecCreateMPIWithArray(PETSC_COMM_WORLD, n, N, vals, &X);
VecLoad(instream, VECMPI, &X);
VecDuplicate(Y, &X);
MatGetVecs(M, &X, PETSC_NULL);
MatGetVecs(M, PETSC_NULL, &X);

PETSc

Example: Vec

• If not created with an array or loaded from a file, values are still needed:
• To copy the values of another Vec with the same parallel structure, use VecCopy(Y, X).
• To set all values to a single scalar value, use VecSet(X, alpha).

PETSc

Example: Vec

• There are other routines for more complicated ways to set values
• PETSc guards the block of data where the actual values are stored very closely
• An assembly routine must be called after these other routines

PETSc

Example: Vec

• Other routines (a sketch of the set/assemble cycle follows the list):

VecSetValue
VecSetValueLocal (different indexing used)
VecSetValues
VecSetValuesLocal
VecSetValuesBlocked
VecSetValuesBlockedLocal
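For instance, a hedged fragment of the set/assemble cycle (the indices and values are illustrative):

PetscInt    idx[2]  = {0, 3};
PetscScalar vals[2] = {1.0, 2.0};

ierr = VecSetValues(X, 2, idx, vals, INSERT_VALUES); CHKERRQ(ierr);

/* the assembly routines move each value to the processor that owns it */
ierr = VecAssemblyBegin(X); CHKERRQ(ierr);
ierr = VecAssemblyEnd(X); CHKERRQ(ierr);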

PETSc

Example: Vec

• Once a vector is assembled, there are routines for (almost) every function we could want from a vector: AXPY, dot product, absolute value, pointwise multiplication, etc.
• Call VecDestroy(X) to free its array when it isn’t needed anymore.
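For example (a fragment; these are the PETSc 2.3 calling sequences):

PetscScalar alpha = 2.0, dot;

ierr = VecAXPY(Y, alpha, X); CHKERRQ(ierr); /* Y = Y + alpha*X */
ierr = VecDot(X, Y, &dot); CHKERRQ(ierr);   /* dot product, reduced across all processors */
ierr = VecDestroy(X); CHKERRQ(ierr);        /* free X when it isn't needed anymore */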

PETSc

Example: Mat

• Like Vec, a Mat can be global or local (MPI/Seq)
• A Mat can take on a large number of data structures to optimize * and \, even though the same routine is used on all structures.

PETSc

Example: Mat

• Row compressed

• Block row compressed

• Symmetric block row compressed

• Block diagonal

• And even dense
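As a hedged sketch of the row-compressed case (a fragment with illustrative values; Mat routines need the more specific header petscmat.h):

Mat            A;
PetscInt       i, cols[3] = {0, 1, 2};
PetscScalar    vals[3] = {1.0, 2.0, 3.0};
PetscErrorCode ierr;

/* sequential AIJ ("row compressed") storage: 3x3, about 3 nonzeros per row */
ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, PETSC_NULL, &A); CHKERRQ(ierr);
for (i = 0; i < 3; i++) {
  ierr = MatSetValues(A, 1, &i, 3, cols, vals, INSERT_VALUES); CHKERRQ(ierr);
}

/* like Vec, a Mat must be assembled before it is used */
ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

ierr = MatMult(A, X, Y); CHKERRQ(ierr); /* Y = A*X: the same routine for every storage format */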


PETSc

File I/O

• PETSc has equivalent routines to printf, but you must decide if you want every node to print or just the control node.
• To ensure clarity when multiple nodes print, use PetscSynchronizedPrintf followed by PetscSynchronizedFlush.
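A hedged fragment (the one-argument PetscSynchronizedFlush is the 2.3-era signature; rank comes from MPI_Comm_rank(PETSC_COMM_WORLD, &rank)):

/* only the control node (rank 0) prints */
ierr = PetscPrintf(PETSC_COMM_WORLD, "Starting the solve\n"); CHKERRQ(ierr);

/* every node prints, and output appears in rank order after the flush */
ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "Node %d is ready\n", rank); CHKERRQ(ierr);
ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD); CHKERRQ(ierr);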

PETSc

File I/O

• The equivalent to a stream is a “viewer”, but a viewer organizes data across multiple processors.
• A viewer combines an output location (file/stdout/stderr) with a format.
• Most data types have a View routine, such as MatView(M, viewer).
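For example, a hedged fragment that writes a Vec to a binary file (the file name is illustrative; PetscViewerDestroy takes the viewer itself in the 2.3-era API):

PetscViewer viewer;

ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "X.bin", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
ierr = VecView(X, viewer); CHKERRQ(ierr); /* each processor contributes its own piece */
ierr = PetscViewerDestroy(viewer); CHKERRQ(ierr);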

PETSc

File I/O

• On a batch server, ASCII I/O can be horrendously slow.
• PETSc only reads data into a parallel format when that data is stored in binary form.
• Lots of output data is likely: binary is more compressed than ASCII.

PETSc

I have ASCII input data: solution?

• Write a wrapper program (a sketch follows this list)
• It runs on one processor
• It creates the data to be used in parallel, and “views” it to a binary input file
• When run in parallel, the data will be automatically distributed
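A minimal sketch of such a wrapper, under assumptions not in the slides (the ASCII file data.dat holds a count followed by that many values; the binary output name vec.bin is illustrative):

static char help[] = "Converts ASCII vector data to PETSc binary.";

#include <stdio.h>
#include "petsc.h"

int main(int argc, char **argv)
{
  FILE          *in;
  PetscViewer    viewer;
  Vec            X;
  PetscScalar    val;
  PetscInt       i;
  int            n;
  double         v;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help); CHKERRQ(ierr);

  /* runs on one processor: read the ASCII data into a sequential Vec */
  in = fopen("data.dat", "r");
  fscanf(in, "%d", &n);
  ierr = VecCreateSeq(PETSC_COMM_SELF, n, &X); CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    fscanf(in, "%lf", &v);
    val = v;
    ierr = VecSetValues(X, 1, &i, &val, INSERT_VALUES); CHKERRQ(ierr);
  }
  fclose(in);
  ierr = VecAssemblyBegin(X); CHKERRQ(ierr);
  ierr = VecAssemblyEnd(X); CHKERRQ(ierr);

  /* "view" the data to a binary input file for the parallel run */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, "vec.bin", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
  ierr = VecView(X, viewer); CHKERRQ(ierr);
  ierr = PetscViewerDestroy(viewer); CHKERRQ(ierr);
  ierr = VecDestroy(X); CHKERRQ(ierr);

  ierr = PetscFinalize(); CHKERRQ(ierr);
  return 0;
}

When the parallel program later calls VecLoad on this file, the data is distributed automatically.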

PETSc

Next Time, Issues for Large Dynamical Systems:

• Time Stepping

• Updating algebraically

• Managing lots of similar equations (Scattering/Gathering)
