
Exercises 7: Sparse Linear Algebra via PETSc

November 14, 2019

Adapted from PRACE PETSc tutorials: http://www.training.prace-ri.eu/tutorials/index.html

Reminders

• Homework 2 due today

• Homework 3 posted; due November 28, 2019 17:20

• No lecture or exercises next week :)

2

Today

• PETSc tutorial

• Run code examples

• Adapted from tutorial by Václav Hapla, IT4Innovations & Department of Applied Mathematics, VSB - Technical University of Ostrava

3

Libraries implementing solvers for Sparse LA

• Trilinos (Sandia National Lab), https://trilinos.github.io/

• PETSc, the Portable, Extensible Toolkit for Scientific Computation (Argonne National Lab), https://www.mcs.anl.gov/petsc/

• many numerical algorithms (LU, CG, SPMV, …) implemented, tested and ready to use!

4

PETSc

• building blocks (data structures and routines) for the scalable parallel solution of scientific applications

• allows thinking in terms of high-level objects (matrices) instead of low-level objects (raw arrays)

• coded primarily in C, with good Fortran support; can also be called from C++, Python, and Java code

• highly portable

• source code and mailing lists open to anybody

5

Role of PETSc

"Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet."

Barry Smith (PETSc founder)

6

PETSc components

7

Parallelism in PETSc

• PETSc is parallelized mostly using MPI

• MPI provides low-level routines to exchange data primitives between processes

• PETSc provides mid-level routines such as

• insert a matrix element at an arbitrary location

• parallel matrix-vector product

• you can call MPI directly if needed

• same code for sequential and parallel runs

• support for hybrid MPI + {pthreads, OpenMP, CUDA} parallelism

8

PETSc Interfaces

• Dense lin. algebra: BLAS, LAPACK, Elemental

• Sparse direct lin. sys. solvers: MUMPS, SuperLU, SuperLU_Dist, PaStiX, UMFPACK, LUSOL

• Iterative solvers / multigrid / preconditioners: HYPRE, Trilinos ML, SPAI

• Graph partitioning: ParMetis, Scotch, Party, Chaco

• FFT: FFTW

• ODE: Sundials

• Data exchange: HDF5

• Mathematics packages: MATLAB, Mathematica

9

PETSc Interfaced By

• TAO - Toolkit for Advanced Optimization

• SLEPc - Scalable Library for Eigenvalue Problems

• fluidity - a finite element/volume fluids code

• OpenFVM - finite volume based CFD solver

• OOFEM - object oriented finite element library

• libMesh - adaptive finite element library

• MOOSE - Multiphysics Object-Oriented Simulation Environment

• DEAL.II - sophisticated C++ based finite element simulation package

• PHAML - The Parallel Hierarchical Adaptive MultiLevel Project

• Chaste - Cancer, Heart and Soft Tissue Environment

• PetIGA - A framework for high performance Isogeometric Analysis

10

Datatypes

• PETSc provides its own primitive data types

PetscInt n = 20;

PetscScalar v = -3.5, w = 3.1e9;

PetscReal x = 2.55, y = 1e-9;

• It is better to use them instead of built-in C types

• better portability

• easy switching between real and complex scalars

• easy switching between 32-bit and 64-bit numbers
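The real/complex and 32-/64-bit choices are fixed when PETSc itself is configured (for example with the configure options --with-scalar-type=complex or --with-64-bit-indices); code written in terms of PetscScalar and PetscInt then recompiles unchanged against either build.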

11

PETSc Communicators

• communicator = an opaque object of type MPI_Comm that defines a process group and communication context

• PETSc built-in communicators:

• PETSC_COMM_SELF – just this process – for serial objects

• PETSC_COMM_WORLD – all processes – for parallel objects

• you can define your own communicators

12

Error Handling

• PETSc is written in C

• C has no support for C++ exceptions

• instead of throwing exceptions, every routine returns an integer error code (of type PetscErrorCode)

• the error code is caught by the CHKERRQ macro:

PetscErrorCode ierr;

ierr = SomePetscRoutine();CHKERRQ(ierr);
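For concreteness, here is a minimal, self-contained sketch combining initialization, the built-in communicators, and CHKERRQ-style error checking (an illustrative sketch only, not the course's petsc_1_ex.c):

/* minimal sketch: init, communicators, error checking */
#include <petscsys.h>
int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank, size;
  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
  /* PETSC_COMM_SELF: every rank prints its own line */
  ierr = PetscPrintf(PETSC_COMM_SELF, "Hello from rank %d of %d\n", rank, size);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}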

13

Command-line options

• on the command-line

./program -myint 10 -myreal 1e3

• in the program:

PetscInt myint;

PetscReal myreal;

PetscOptionsGetInt(PETSC_NULL,PETSC_NULL,"-myint",&myint,PETSC_NULL);

PetscOptionsGetReal(PETSC_NULL,PETSC_NULL,"-myreal",&myreal,PETSC_NULL);

// now myint=10, myreal=1e3
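The last argument can instead receive a PetscBool recording whether the option was actually given, which is handy for providing defaults. A small sketch, meant to sit inside an initialized program (as in the hello-world sketch above, with ierr declared); the option name -myint follows the slide:

PetscInt  myint = 42;        /* default, used if -myint is absent */
PetscBool set   = PETSC_FALSE;
ierr = PetscOptionsGetInt(NULL, NULL, "-myint", &myint, &set);CHKERRQ(ierr);  /* NULL == PETSC_NULL */
if (!set) { ierr = PetscPrintf(PETSC_COMM_WORLD, "-myint not given, using %d\n", (int)myint);CHKERRQ(ierr); }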

14

Examples for Today

• Get ex7.tar from Moodle; move it to the cluster

• Unpack tar file

• On the cluster, you need to load the modules required to use PETSc. Type:

module load fenics/2019.1.0-ompi-petsc3.11-gcc8.3

15

HelloWorld Example

• Open petsc_1_ex.c

• To make, type make (or make ex1)

• To run: (e.g.)

srun -n 2 ./ex1 -myint 5

16

PETSc Inheritance

• PETSc uses 3-level inheritance

• every object in PETSc is an instance of a class: Vec, Mat, PC, KSP, SNES, …

• all classes inherit from PetscObject

• functions called on objects (methods) are prefixed with a class name:

MatMult(Mat,…)

• a new object is created with a class-specific Create function (constructor):

Mat A;

MatCreate(comm, &A);

• every class is further refined into types, specified with SetType:

MatSetType(A, MATSEQAIJ);

17

PETSc Inheritance

18

Vectors in PETSc

Vec v; PetscInt m=2, M=8;

VecType type=VECMPI;

MPI_Comm comm=PETSC_COMM_WORLD;

• create: VecCreate(comm, &v);

• layout: VecSetSizes(v,m,M);

• type: VecSetType(v,type);

• all in one: VecCreateMPI(comm,m,M,&v);

• options: VecSetFromOptions(v);

• dealloc: VecDestroy(&v);

Sequential variant: use PETSC_COMM_SELF and type VECSEQ (VECSTANDARD picks VECSEQ or VECMPI from the communicator), set the sizes with VecSetSizes(v,M,M), or create in one call with VecCreateSeq(PETSC_COMM_SELF,M,&v).

19–21

Parallel layout

• consider the vector v with local size m, global size M, distributed across 3 processes

• call VecSetSizes(v,m,M) to set the sizes

22

Parallel layout

• set either m or M to PETSC_DECIDE, i.e. let PETSc use the standard layout

• get this standard layout across comm: PetscSplitOwnership(comm,&m,&M)

23

Query Layout

• local and global sizes: VecGetLocalSize(v,&m) and VecGetSize(v,&M)

• global index of the first local element and one past the last local element: VecGetOwnershipRange(v,&lo,&hi) returns the half-open range [lo,hi)
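A small sketch putting the layout queries together (assumes an initialized program as in the earlier sketches; the global size M=8 is arbitrary):

Vec         v;
PetscInt    m, M = 8, lo, hi;
PetscMPIInt rank;
ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, M, &v);CHKERRQ(ierr);
ierr = VecGetLocalSize(v, &m);CHKERRQ(ierr);
ierr = VecGetOwnershipRange(v, &lo, &hi);CHKERRQ(ierr);
/* synchronized print: one line per rank, in rank order */
ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d owns %d entries, global range [%d,%d)\n", (int)rank, (int)m, (int)lo, (int)hi);CHKERRQ(ierr);
ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);
ierr = VecDestroy(&v);CHKERRQ(ierr);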

24

Vector assembly

Vec x;

• Set all entries to constant value: VecSet(x, 1.0);

• Set all entries to 0: VecZeroEntries(x);

• Set an individual element (global indexing!):

PetscInt i = 10;

PetscReal v = 3.14;

VecSetValue(x, i, v, INSERT_VALUES);

// or

VecSetValues(x, 1, &i, &v, INSERT_VALUES);

25

Vector assembly

• Set multiple entries at once:

PetscInt ii[]={1, 2}; PetscScalar vv[]={2.7, 3.1};

VecSetValues(x, 2, ii, vv, INSERT_VALUES);

• The last argument can be

• INSERT_VALUES replace original value (=)

• ADD_VALUES add the new values to the original values (+=)

26

Vector assembly

• VecSetValues is purely local with no inter-process communication

• values are just locally cached

• before using the vector, call the assembly function pair to exchange values between processes:

VecAssemblyBegin(x);

//optional calculations not involving x

VecAssemblyEnd(x);

• computations can be done while MPI messages are in transit

• this allows overlapping communication and computation

27–28
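Putting the pieces together, a sketch that fills the locally owned part of a vector and overlaps independent work with the assembly (assumes an initialized program; the size and values are arbitrary):

Vec      x;
PetscInt i, lo, hi, M = 8;
ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, M, &x);CHKERRQ(ierr);
ierr = VecGetOwnershipRange(x, &lo, &hi);CHKERRQ(ierr);
for (i = lo; i < hi; i++) {   /* set only locally owned entries */
  ierr = VecSetValue(x, i, (PetscScalar)i, INSERT_VALUES);CHKERRQ(ierr);
}
ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
/* ... computations not involving x could go here ... */
ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
ierr = VecView(x, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);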

Getting values

Vec x;

• get a copy of 2 locally owned entries of x, with global indices ix, into the array v:

PetscInt ix[]={10, 20};

PetscScalar v[2];

VecGetValues(x, 2, ix, v);

• get the pointer to the whole local internal array:

PetscScalar *a; VecGetArray(x, &a);

// read and/or modify the array a

VecRestoreArray(x, &a);
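For example, the local part of a vector can be modified in place through the array pointer (a sketch; assumes the vector x created and assembled in the earlier sketch):

PetscScalar *a;
PetscInt    i, m;
ierr = VecGetLocalSize(x, &m);CHKERRQ(ierr);
ierr = VecGetArray(x, &a);CHKERRQ(ierr);
for (i = 0; i < m; i++) a[i] *= 2.0;   /* local indexing: a[0] .. a[m-1] */
ierr = VecRestoreArray(x, &a);CHKERRQ(ierr);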

29

Duplicate and copy

• Create another Vec with the same type & layout:

Vec v, w;

VecDuplicate(v, &w);

• Copy the entries from v to w:

VecCopy(v, w);

30

Example 2

• Open petsc_2_ex.c

31

Matrices in PETSc

Mat A; PetscInt m=2, n=3, M=8, N=12;

MatType type=MATMPIAIJ;

MPI_Comm comm=PETSC_COMM_WORLD;

• create: MatCreate(comm, &A);

• layout: MatSetSizes(A,m,n,M,N);

• type: MatSetType(A,type);

• prealloc: MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);

• all in one: MatCreateAIJ(comm,m,n,M,N,5,PETSC_NULL,5,PETSC_NULL,&A); (named MatCreateMPIAIJ in older PETSc releases)

• options: MatSetFromOptions(A);

• dealloc: MatDestroy(&A);

Sequential variant: use PETSC_COMM_SELF and type MATSEQAIJ (MATAIJ picks MATSEQAIJ or MATMPIAIJ from the communicator), with MatSetSizes(A,M,N,M,N) and MatSeqAIJSetPreallocation(A,5,PETSC_NULL).

32–33

Basic Matrix Types

• MATAIJ, MATSEQAIJ, MATMPIAIJ

• the basic sparse format, known as compressed sparse row (CSR/CRS) or Yale format

• MATAIJ means MATSEQAIJ on a single-process communicator, and MATMPIAIJ otherwise.

• MATBAIJ, MATSEQBAIJ, MATMPIBAIJ

• extensions of the AIJ formats described above

• store matrix elements by fixed-sized dense blocks

• intended especially for use with multi-component PDEs

• multiple DOFs per mesh node

• MATDENSE, MATSEQDENSE, MATMPIDENSE

• dense matrices

34

Parallel Layout Compatibility

35

Matrix Assembly

Mat A;

• Set all (allocated) entries to 0:

MatZeroEntries(A);

• Set an individual element (global indexing!):

PetscInt i = 1, j = 2; PetscReal v = 3.14;

MatSetValue(A, i, j, v, INSERT_VALUES);

// or

MatSetValues(A, 1, &i, 1, &j, &v, INSERT_VALUES);

36

Matrix Assembly

• Set multiple entries at once:

PetscInt ii[2]={1, 2}, jj[2]={11, 12};

PetscReal vv[4]={1.3, 2.7, 3.1, 4.5};

MatSetValues(A, 2, ii, 2, jj, vv, INSERT_VALUES);

• The last argument can be

• INSERT_VALUES replace original value (=)

• ADD_VALUES add the new values to the original values (+=)

37

Matrix Assembly

• MatSetValues is purely local with no inter-process communication

• values are just locally cached

• before using the matrix, call the assembly function pair to exchange values between processes:

MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);

//optional calculations not involving A

MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);

• computations can be done while MPI messages are in transit

• this allows overlapping communication and computation

38–39

Matrix Assembly

• cannot mix inserting and adding values

• need to do an assembly step in between:

MatSetValues(A, ..., INSERT_VALUES);

MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY);

MatAssemblyEnd( A,MAT_FLUSH_ASSEMBLY);

MatSetValues(A, ..., ADD_VALUES);

• MAT_FINAL_ASSEMBLY – final assembly to make A ready to use

• MAT_FLUSH_ASSEMBLY – cheaper, suffices for INSERT_VALUES/ADD_VALUES interleaving
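As a fuller sketch, the locally owned rows of a tridiagonal (1D Laplacian-like) matrix can be assembled as follows (assumes an initialized program; the size N and the stencil values are illustrative, and NULL plays the role of the PETSC_NULL used on the slides):

Mat         A;
PetscInt    i, rstart, rend, N = 12, cols[3];
PetscScalar vals[3];
ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);
ierr = MatSeqAIJSetPreallocation(A, 3, NULL);CHKERRQ(ierr);           /* takes effect only for MATSEQAIJ */
ierr = MatMPIAIJSetPreallocation(A, 3, NULL, 1, NULL);CHKERRQ(ierr);  /* takes effect only for MATMPIAIJ */
ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
for (i = rstart; i < rend; i++) {   /* each rank sets only its own rows */
  PetscInt ncols = 0;
  if (i > 0)     { cols[ncols] = i - 1; vals[ncols] = -1.0; ncols++; }
  cols[ncols] = i; vals[ncols] = 2.0; ncols++;
  if (i < N - 1) { cols[ncols] = i + 1; vals[ncols] = -1.0; ncols++; }
  ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
}
ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);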

40

Getting values

Mat A;

• get a copy of a 3x2 block of locally owned entries of A, with global row indices ii and global column indices jj, into the array v:

PetscInt ii[]={11, 22, 33}; PetscInt jj[]={12, 24};

PetscScalar v[6];

MatGetValues(A,3,ii,2,jj,v);

• get row i of the matrix A:

PetscInt nnz; const PetscInt *nzi; const PetscScalar *vals;

MatGetRow( A, i, &nnz, &nzi, &vals);

// read the arrays nzi and vals (don't alter them!)

MatRestoreRow(A, i, &nnz, &nzi, &vals);

41

Some other matrix operations

• matrix

MatNorm(A, NORM_INFINITY, &norm); // or NORM_1, NORM_FROBENIUS

MatScale(A, 2.2);

• matrix-vector

MatMult(A,x,y);

MatMultAdd(A,x,y,z); // z = y + A*x

MatMultTranspose(A,x,y);

• matrix-matrix

MatMatMult( A,B,MAT_INITIAL_MATRIX,fill,&C);

MatMatTransposeMult(A,B,MAT_INITIAL_MATRIX,fill,&C);

MatTransposeMatMult(A,B,MAT_INITIAL_MATRIX,fill,&C);

42

Example 3

• Open petsc_3_ex.c

43

Linear System Solvers

KSP ksp;

KSPType type = KSPCG;

MPI_Comm comm = PETSC_COMM_WORLD;

// initialize A,b,x

• create: KSPCreate(comm, &ksp);

• type: KSPSetType(ksp, type); //default = KSPGMRES

• options: KSPSetFromOptions(ksp);

• dealloc: KSPDestroy(&ksp);

44

Solving linear systems

Mat A; Vec b,x;

// initialize A,b,x

KSPSetOperators(ksp, A, A);

KSPSolve(ksp,b,x);

• The first matrix defines the linear system

• The second matrix is used in constructing the preconditioner
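A compact sketch of the whole solve phase, reusing the tridiagonal matrix A assembled in the earlier sketch (the right-hand side of all ones is purely illustrative):

Vec b, x;
KSP ksp;
ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);   /* vectors with a layout compatible with A */
ierr = VecSet(b, 1.0);CHKERRQ(ierr);
ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);     /* can still be overridden with -ksp_type ... */
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
{
  PetscInt its;
  ierr = KSPGetIterationNumber(ksp, &its);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "solver finished after %d iterations\n", (int)its);CHKERRQ(ierr);
}
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
ierr = VecDestroy(&b);CHKERRQ(ierr);
ierr = VecDestroy(&x);CHKERRQ(ierr);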

45

Solve tolerances

KSPSetTolerances(ksp, rtol, atol, dtol, maxit);

• rtol = the relative convergence tolerance

stop if: residual < rtol * norm(RHS)

• atol = absolute conv. tol.

stop if: residual < atol

• dtol = divergence tolerance

stop if: residual > dtol * norm(RHS)

• maxit = maximum number of iterations

• all can be set to PETSC_DEFAULT

• cmd-line: -ksp_rtol 1e-6 -ksp_atol 1e-15 -ksp_divtol 10 -ksp_max_it 2000

46

Preconditioners

• many built-in and interfaced preconditioners

• ILU, block Jacobi, sparse approximate inverse ...

• can be composed

PC pc;

KSPGetPC(ksp, &pc);

PCSetType(pc, PCILU);

47

Direct solvers

• Direct solvers = special case of iterative solvers

• just one iteration with application of "perfect preconditioner", i.e. forward & backward substitution of a complete factor

KSPSetType(ksp, KSPPREONLY);

PCSetType(pc, PCLU); // or PCCHOLESKY
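The same choice can also be made at run time without recompiling, e.g. with -ksp_type preonly -pc_type lu (or -pc_type cholesky), provided KSPSetFromOptions(ksp) is called; an external factorization package such as MUMPS or SuperLU_DIST can then be selected with -pc_factor_mat_solver_type if PETSc was built with it.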

48

Example 4

• Open petsc_4_ex.c

49

No deliverable today. :)

50
