
Exercises 7: Sparse Linear Algebra via PETSc

November 14, 2019

Adapted from PRACE PETSc tutorials: http://www.training.prace-ri.eu/tutorials/index.html

Reminders

• Homework 2 due today

• Homework 3 posted; due November 28, 2019 17:20

• No lecture or exercises next week :)

2

Today

• PETSc tutorial

• Run code examples

• Adapted from tutorial by Václav Hapla, IT4Innovations & Department of Applied Mathematics, VSB - Technical University of Ostrava

3

Libraries implementing solvers for Sparse LA

• Trilinos (Sandia National Lab), https://trilinos.github.io/

• PETSc, the Portable, Extensible Toolkit for Scientific Computation (Argonne National Lab), https://www.mcs.anl.gov/petsc/

• many numerical algorithms (LU, CG, SPMV, …) implemented, tested and ready to use!

4

PETSc

• building blocks (data structures and routines) for the scalable parallel solution of scientific applications

• allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays)

• coded primarily in C, with good Fortran support; can also be called from C++, Python, and Java code

• highly portable

• source code and mailing lists open to anybody

5

Role of PETSc

"Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet."

Barry Smith (PETSc founder)

6

PETSc components

7

Parallelism in PETSc

• PETSc is parallelized mostly using MPI

• MPI provides low-level routines to exchange data primitives between processes

• PETSc provides mid-level routines such as

• insert a matrix element at an arbitrary location

• parallel matrix-vector product

• you can call MPI directly if needed

• same code for sequential and parallel runs

• support for hybrid MPI + {pthreads, OpenMP, CUDA} parallelism

8

PETSc Interfaces

• Dense lin. algebra: BLAS, LAPACK, Elemental

• Sparse direct lin. sys. solvers: MUMPS, SuperLU, SuperLU_Dist, PaStiX, UMFPACK, LUSOL

• Iterative solvers / multigrid / preconditioners: HYPRE, Trilinos ML, SPAI

• Graph partitioning: ParMetis, Scotch, Party, Chaco

• FFT: FFTW

• ODE: Sundials

• Data exchange: HDF5

• Mathematics packages: MATLAB, Mathematica

9

PETSc Interfaced By

• TAO - Toolkit for Advanced Optimization

• SLEPc - Scalable Library for Eigenvalue Problems

• fluidity - a finite element/volume fluids code

• OpenFVM - finite volume based CFD solver

• OOFEM - object oriented finite element library

• libMesh - adaptive finite element library

• MOOSE - Multiphysics Object-Oriented Simulation Environment

• DEAL.II - sophisticated C++ based finite element simulation package

• PHAML - The Parallel Hierarchical Adaptive MultiLevel Project

• Chaste - Cancer, Heart and Soft Tissue Environment

• PetIGA - A framework for high performance Isogeometric Analysis

10

Datatypes

• PETSc provides its own primitive data types

PetscInt n = 20;

PetscScalar v = -3.5, w = 3.1e9;

PetscReal x = 2.55, y = 1e-9;

• It is better to use them instead of built-in C types

• better portability

• easy switching between real and complex scalars

• easy switching between 32-bit and 64-bit numbers
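The real/complex and 32-/64-bit choices are fixed when PETSc itself is configured (for example with the configure options --with-scalar-type=complex or --with-64-bit-indices); code written in terms of PetscScalar and PetscInt then recompiles unchanged against either build.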

11

PETSc Communicators

• communicator = an opaque object of type MPI_Comm that defines a process group and communication context

• PETSc built-in communicators:

• PETSC_COMM_SELF – just this process – for serial objects

• PETSC_COMM_WORLD – all processes – for parallel objects

• you can define your own communicators

12

Error Handling

• PETSc is written in C

• C has no support for C++ exceptions

• instead of throwing exceptions, every routine returns an integer error code (of type PetscErrorCode)

• the error code is caught by the CHKERRQ macro:

PetscErrorCode ierr;

ierr = SomePetscRoutine();CHKERRQ(ierr);
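For concreteness, here is a minimal, self-contained sketch combining initialization, the built-in communicators, and CHKERRQ-style error checking (an illustrative sketch only, not the course's petsc_1_ex.c):

/* minimal sketch: init, communicators, error checking */
#include <petscsys.h>
int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank, size;
  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
  /* PETSC_COMM_SELF: every rank prints its own line */
  ierr = PetscPrintf(PETSC_COMM_SELF, "Hello from rank %d of %d\n", rank, size);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}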

13

Command-line options

• on the command-line

./program -myint 10 -myreal 1e3

• in the program:

PetscInt myint;

PetscReal myreal;

PetscOptionsGetInt(PETSC_NULL,PETSC_NULL,"-myint",&myint,PETSC_NULL);

PetscOptionsGetReal(PETSC_NULL,PETSC_NULL,"-myreal",&myreal,PETSC_NULL);

// now myint=10, myreal=1e3
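The last argument can instead receive a PetscBool recording whether the option was actually given, which is handy for providing defaults. A small sketch, meant to sit inside an initialized program (as in the hello-world sketch above, with ierr declared); the option name -myint follows the slide:

PetscInt  myint = 42;        /* default, used if -myint is absent */
PetscBool set   = PETSC_FALSE;
ierr = PetscOptionsGetInt(NULL, NULL, "-myint", &myint, &set);CHKERRQ(ierr);  /* NULL == PETSC_NULL */
if (!set) { ierr = PetscPrintf(PETSC_COMM_WORLD, "-myint not given, using %d\n", (int)myint);CHKERRQ(ierr); }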

14

Examples for Today

• Get ex7.tar from Moodle; move it to the cluster

• Unpack tar file

• On the cluster, you need to load the modules required to use PETSc. Type:

module load fenics/2019.1.0-ompi-petsc3.11-gcc8.3

15

HelloWorld Example

• Open petsc_1_ex.c

• To make, type make (or make ex1)

• To run: (e.g.)

srun -n 2 ./ex1 -myint 5

16

PETSc Inheritance

• PETSc uses 3-level inheritance

• every object in PETSc is an instance of a class: Vec, Mat, PC, KSP, SNES, …

• all classes inherit from PetscObject

• functions called on objects (methods) are prefixed with a class name:

MatMult(Mat,…)

• a new object is created with a class-specific Create function (constructor):

Mat A;

MatCreate(comm, &A);

• every class is further refined into types, specified with SetType:

MatSetType(A, MATSEQAIJ);

17

PETSc Inheritance

18

Vectors in PETSc

Vec v; PetscInt m=2, M=8;

VecType type=VECMPI;

MPI_Comm comm=PETSC_COMM_WORLD;

• create: VecCreate(comm, &v);

• layout: VecSetSizes(v,m,M);

• type: VecSetType(v,type);

• all in one: VecCreateMPI(comm,m,M,&v);

• options: VecSetFromOptions(v);

• dealloc: VecDestroy(&v);

Sequential variant: use PETSC_COMM_SELF and type VECSEQ (VECSTANDARD picks VECSEQ or VECMPI from the communicator), set the sizes with VecSetSizes(v,M,M), or create in one call with VecCreateSeq(PETSC_COMM_SELF,M,&v).

19–21

Parallel layout

• consider the vector v with local size m, global size M, distributed across 3 processes

• call VecSetSizes(v,m,M) to set the sizes

22

Parallel layout

• set either m or M to PETSC_DECIDE, i.e. let PETSc use the standard layout

• get this standard layout across comm: PetscSplitOwnership(comm,&m,&M)

23

Query Layout

• local and global sizes: VecGetLocalSize(v,&m) and VecGetSize(v,&M)

• global index of the first local element and one past the last local element: VecGetOwnershipRange(v,&lo,&hi) returns the half-open range [lo,hi)
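A small sketch putting the layout queries together (assumes an initialized program as in the earlier sketches; the global size M=8 is arbitrary):

Vec         v;
PetscInt    m, M = 8, lo, hi;
PetscMPIInt rank;
ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, M, &v);CHKERRQ(ierr);
ierr = VecGetLocalSize(v, &m);CHKERRQ(ierr);
ierr = VecGetOwnershipRange(v, &lo, &hi);CHKERRQ(ierr);
/* synchronized print: one line per rank, in rank order */
ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d owns %d entries, global range [%d,%d)\n", (int)rank, (int)m, (int)lo, (int)hi);CHKERRQ(ierr);
ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);
ierr = VecDestroy(&v);CHKERRQ(ierr);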

24

Vector assembly

Vec x;

• Set all entries to constant value: VecSet(x, 1.0);

• Set all entries to 0: VecZeroEntries(x);

• Set an individual element (global indexing!):

PetscInt i = 10;

PetscReal v = 3.14;

VecSetValue(x, i, v, INSERT_VALUES);

// or

VecSetValues(x, 1, &i, &v, INSERT_VALUES);

25

Vector assembly

• Set multiple entries at once:

PetscInt ii[]={1, 2}; PetscScalar vv[]={2.7, 3.1};

VecSetValues(x, 2, ii, vv, INSERT_VALUES);

• The last argument can be

• INSERT_VALUES replace original value (=)

• ADD_VALUES add the new values to the original values (+=)

26

Vector assembly

• VecSetValues is purely local with no inter-process communication

• values are just locally cached

• before using the vector, call the assembly function pair to exchange values between processes:

VecAssemblyBegin(x);

//optional calculations not involving x

VecAssemblyEnd(x);

• computations can be done while MPI messages are in transit

• this allows overlapping communication and computation

27–28
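Putting the pieces together, a sketch that fills the locally owned part of a vector and overlaps independent work with the assembly (assumes an initialized program; the size and values are arbitrary):

Vec      x;
PetscInt i, lo, hi, M = 8;
ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, M, &x);CHKERRQ(ierr);
ierr = VecGetOwnershipRange(x, &lo, &hi);CHKERRQ(ierr);
for (i = lo; i < hi; i++) {   /* set only locally owned entries */
  ierr = VecSetValue(x, i, (PetscScalar)i, INSERT_VALUES);CHKERRQ(ierr);
}
ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
/* ... computations not involving x could go here ... */
ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
ierr = VecView(x, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);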

Getting values

Vec x;

• get a copy of 2 locally owned entries of x, with global indices ix, into the array v:

PetscInt ix[]={10, 20};

PetscScalar v[2];

VecGetValues(x, 2, ix, v);

• get the pointer to the whole local internal array:

PetscScalar *a; VecGetArray(x, &a);

// read and/or modify the array a

VecRestoreArray(x, &a);
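For example, the local part of a vector can be modified in place through the array pointer (a sketch; assumes the vector x created and assembled in the earlier sketch):

PetscScalar *a;
PetscInt    i, m;
ierr = VecGetLocalSize(x, &m);CHKERRQ(ierr);
ierr = VecGetArray(x, &a);CHKERRQ(ierr);
for (i = 0; i < m; i++) a[i] *= 2.0;   /* local indexing: a[0] .. a[m-1] */
ierr = VecRestoreArray(x, &a);CHKERRQ(ierr);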

29

Duplicate and copy

• Create another Vec with the same type & layout:

Vec v, w;

VecDuplicate(v, &w);

• Copy the entries from v to w:

VecCopy(v, w);

30

Example 2

• Open petsc_2_ex.c

31

Matrices in PETSc

Mat A; PetscInt m=2, n=3, M=8, N=12;

MatType type=MATMPIAIJ;

MPI_Comm comm=PETSC_COMM_WORLD;

• create: MatCreate(comm, &A);

• layout: MatSetSizes(A,m,n,M,N);

• type: MatSetType(A,type);

• prealloc: MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);

• all in one: MatCreateAIJ(comm,m,n,M,N,5,PETSC_NULL,5,PETSC_NULL,&A); (named MatCreateMPIAIJ in older PETSc releases)

• options: MatSetFromOptions(A);

• dealloc: MatDestroy(&A);

Sequential variant: use PETSC_COMM_SELF and type MATSEQAIJ (MATAIJ picks MATSEQAIJ or MATMPIAIJ from the communicator), with MatSetSizes(A,M,N,M,N) and MatSeqAIJSetPreallocation(A,5,PETSC_NULL).

32–33

Basic Matrix Types

• MATAIJ, MATSEQAIJ, MATMPIAIJ

• the basic sparse format, known as compressed sparse row (CSR/CRS) or Yale format

• MATAIJ means MATSEQAIJ on a single-process communicator, and MATMPIAIJ otherwise.

• MATBAIJ, MATSEQBAIJ, MATMPIBAIJ

• extensions of the AIJ formats described above

• store matrix elements by fixed-sized dense blocks

• intended especially for use with multi-component PDEs

• multiple DOFs per mesh node

• MATDENSE, MATSEQDENSE, MATMPIDENSE

• dense matrices

34

Parallel Layout Compatibility

35

Matrix Assembly

Mat A;

• Set all (allocated) entries to 0:

MatZeroEntries(A);

• Set an individual element (global indexing!):

PetscInt i = 1, j = 2; PetscReal v = 3.14;

MatSetValue(A, i, j, v, INSERT_VALUES);

// or

MatSetValues(A, 1, &i, 1, &j, &v, INSERT_VALUES);

36

Matrix Assembly

• Set multiple entries at once:

PetscInt ii[2]={1, 2}, jj[2]={11, 12};

PetscReal vv[4]={1.3, 2.7, 3.1, 4.5};

MatSetValues(A, 2, ii, 2, jj, vv, INSERT_VALUES);

• The last argument can be

• INSERT_VALUES replace original value (=)

• ADD_VALUES add the new values to the original values (+=)

37

Matrix Assembly

• MatSetValues is purely local with no inter-process communication

• values are just locally cached

• before using the matrix, call the assembly function pair to exchange values between processes:

MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);

//optional calculations not involving A

MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);

• computations can be done while MPI messages are in transit

• this allows overlapping communication and computation

38–39

Matrix Assembly

• cannot mix inserting and adding values

• need to do an assembly step in between:

MatSetValues(A, ..., INSERT_VALUES);

MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY);

MatAssemblyEnd( A,MAT_FLUSH_ASSEMBLY);

MatSetValues(A, ..., ADD_VALUES);

• MAT_FINAL_ASSEMBLY – final assembly to make A ready to use

• MAT_FLUSH_ASSEMBLY – cheaper, suffices for INSERT_VALUES/ADD_VALUES interleaving
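As a fuller sketch, the locally owned rows of a tridiagonal (1D Laplacian-like) matrix can be assembled as follows (assumes an initialized program; the size N and the stencil values are illustrative, and NULL plays the role of the PETSC_NULL used on the slides):

Mat         A;
PetscInt    i, rstart, rend, N = 12, cols[3];
PetscScalar vals[3];
ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);
ierr = MatSeqAIJSetPreallocation(A, 3, NULL);CHKERRQ(ierr);           /* takes effect only for MATSEQAIJ */
ierr = MatMPIAIJSetPreallocation(A, 3, NULL, 1, NULL);CHKERRQ(ierr);  /* takes effect only for MATMPIAIJ */
ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
for (i = rstart; i < rend; i++) {   /* each rank sets only its own rows */
  PetscInt ncols = 0;
  if (i > 0)     { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
  cols[ncols] = i; vals[ncols] = 2.0; ncols++;
  if (i < N - 1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
  ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
}
ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);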

40

Getting values

Mat A;

• get a copy of a 3x2 block of locally owned entries of A, with global row indices ii and global column indices jj, into the array v:

PetscInt ii[]={11, 22, 33}; PetscInt jj[]={12, 24};

PetscScalar v[6];

MatGetValues(A,3,ii,2,jj,v);

• get row i of the matrix A:

PetscInt nnz; const PetscInt *nzi; const PetscScalar *vals;

MatGetRow( A, i, &nnz, &nzi, &vals);

// read the arrays nzi and vals (don't alter them!)

MatRestoreRow(A, i, &nnz, &nzi, &vals);

41

Some other matrix operations

• matrix

MatNorm(A, NORM_INFINITY, &norm); // or NORM_1, NORM_FROBENIUS

MatScale(A, 2.2);

• matrix-vector

MatMult(A,x,y);

MatMultAdd(A,x,y,z); // z = y + A*x

MatMultTranspose(A,x,y);

• matrix-matrix

MatMatMult( A,B,MAT_INITIAL_MATRIX,fill,&C);

MatMatTransposeMult(A,B,MAT_INITIAL_MATRIX,fill,&C);

MatTransposeMatMult(A,B,MAT_INITIAL_MATRIX,fill,&C);

42

Example 3

• Open petsc_3_ex.c

43

Linear System Solvers

KSP ksp;

KSPType type = KSPCG;

MPI_Comm comm = PETSC_COMM_WORLD;

// initialize A,b,x

• create: KSPCreate(comm, &ksp);

• type: KSPSetType(ksp, type); //default = KSPGMRES

• options: KSPSetFromOptions(ksp);

• dealloc: KSPDestroy(&ksp);

44

Solving linear systems

Mat A; Vec b,x;

// initialize A,b,x

KSPSetOperators(ksp, A, A);

KSPSolve(ksp,b,x);

• The first matrix defines the linear system

• The second matrix is used in constructing the preconditioner
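A compact sketch of the whole solve phase, reusing the tridiagonal matrix A assembled in the earlier sketch (the right-hand side of all ones is purely illustrative):

Vec b, x;
KSP ksp;
ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);   /* vectors with a layout compatible with A */
ierr = VecSet(b, 1.0);CHKERRQ(ierr);
ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);     /* can still be overridden with -ksp_type ... */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
{
  PetscInt its;
  ierr = KSPGetIterationNumber(ksp, &its);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "solver finished after %d iterations\n", (int)its);CHKERRQ(ierr);
}
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
ierr = VecDestroy(&b);CHKERRQ(ierr);
ierr = VecDestroy(&x);CHKERRQ(ierr);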

45

Solve tolerances

KSPSetTolerances(ksp, rtol, atol, dtol, maxit);

• rtol = the relative convergence tolerance

stop if: residual < rtol * norm(RHS)

• atol = absolute conv. tol.

stop if: residual < atol

• dtol = divergence tolerance

stop if: residual > dtol * norm(RHS)

• maxit = maximum number of iterations

• all can be set to PETSC_DEFAULT

• cmd-line: -ksp_rtol 1e-6 -ksp_atol 1e-15 -ksp_divtol 10 -ksp_max_it 2000

46

Preconditioners

• many built-in and interfaced preconditioners

• ILU, block Jacobi, sparse approximate inverse ...

• can be composed

PC pc;

KSPGetPC(ksp, &pc);

PCSetType(pc, PCILU);

47

Direct solvers

• Direct solvers = special case of iterative solvers

• just one iteration with application of "perfect preconditioner", i.e. forward & backward substitution of a complete factor

KSPSetType(ksp, KSPPREONLY);

PCSetType(pc, PCLU); // or PCCHOLESKY
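The same choice can also be made at run time without recompiling, e.g. with -ksp_type preonly -pc_type lu (or -pc_type cholesky), provided KSPSetFromOptions(ksp) is called; an external factorization package such as MUMPS or SuperLU_DIST can then be selected with -pc_factor_mat_solver_type if PETSc was built with it.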

48

Example 4

• Open petsc_4_ex.c

49

No deliverable today. :)

50
