review/preface to second day of dd-15 tutorial · dd15 tutorial, berlin, 17-18 july 2003 petsc...

Post on 01-Jan-2021

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Review/Preface to Second Day of DD-15 Tutorial

William D. Gropp

Lab for Advanced Numerical SoftwareArgonne National Laboratory

David E. Keyes

Department of Applied Physics & Applied MathematicsColumbia University

DD15 Tutorial, Berlin, 17-18 July 2003

PETSc codeUser code

ApplicationInitialization

FunctionEvaluation

JacobianEvaluation

Post-Processing

PC KSPPETSc

Main Routine

Linear Solvers (SLES)

Nonlinear Solvers (SNES)

Timestepping Solvers (TS)

PETSc’s SLES & SNES: objects and algorithms

DD15 Tutorial, Berlin, 17-18 July 2003

PETSc objectsl Vector and matrix objects, Vec and Mat

n Accessed and used as mathematical objects n Code usage independent of runtime choice of data structuresn Implemented to allow for high performance

u Split-phase operationsu Overlap of communication and computationu Aggregation, locality, low memory footprint

l Cartesian grid object DAl Solver objects SLES and SNES

n Large collection of progressive algorithmic options assembled under abstract interfaces

n Recursively callable, with separate contexts for sub-objectsn Command-line tunable

l Auxilliary objects (e.g., Viewer)

DD15 Tutorial, Berlin, 17-18 July 2003

Basic and advanced algorithmsl Linear solvers

n Schwarz: single-level, two-leveln Schur: reduced, full (interface-interior, primal-dual)n Schwarz-Schur hybrids

l Nonlinear solversn Newton-Krylov-Schwarz (NKS)n Jacobian-free Newton-Krylov (JFNK)

l Implicit time integratorsn Time-accurate time stepping (PETSc’s TS)n Pseudo-transient continuation (? tc)

DD15 Tutorial, Berlin, 17-18 July 2003

Scalability properties of DDl Parallel scaling (time per iteration)

n Good “surface-to-volume” ratio: communication subdominant to computation, Amdahl’s law defeated

n Reasonable communication requirements: mostly nearest neighbor, of small size when global

n Can increase processors without bound with increased system size, at power-law rate that depends upon richness of communication network

l Convergence scaling (iterations needed for solution)n Schwarz and Schur can have bounded condition number

n Newton can have quadratic convergencen Nonlinear robustification techniques

DD15 Tutorial, Berlin, 17-18 July 2003

So far…l Good stories mentioned …

n 1999 Gordon Bell Prize for aerodynamic computation using NKS (PETSc, Argonne)

n 2002 Gordon Bell Prize for structural dynamics computation using FETI-DP (Salinas, Sandia)

n Combustion model for ? tc

l But only simple examples presented in detail …n Poisson problem

n Bratu problem

n Driven cavity

DD15 Tutorial, Berlin, 17-18 July 2003

This morning …l Brief review

l More advanced algorithms

l Programmatic context of current PETSc development environmentn Mathematicians and computer scientists, come join

the fun☺

l Broader sense of PDE-type scientific applications being addressed through PETScn Applications scientists, come join the fun☺

DD15 Tutorial, Berlin, 17-18 July 2003

Two distinct definitions of scalabilityl “Strong scaling”

n execution time decreases in inverse proportion to the number of processors

n fixed size problem overall

l “Weak scaling”n execution time remains constant,

as problem size and processor number are increased in proportion

n fixed size problem per processor

n also known as “Gustafson scaling”

T

p

good

poor

poor

N ∝ p

log T

log pgood

N constant

Slope= -1

Slope= 0

DD15 Tutorial, Berlin, 17-18 July 2003

Three important usages of “efficiency”l For any iterative method,

l “Convergence efficiency”

l “Implementation efficiency”

l Overall efficiency

l “Ideal” is unity for all three measures

)()(# iterpertimeitersoftime ×=

)(#)(#

large

smallconv Pforitersof

Pforitersof=η

large

small

large

smallimpl )(

)(PP

PforiterpertimePforiterpertime

×=η

large

small

large

smallimplconv )(

)(PP

PfortimePfortime

×=×= ηηη

DD15 Tutorial, Berlin, 17-18 July 2003

PDE varietiesl Evolution (time hyperbolic, time parabolic)

l Equilibrium (elliptic, spatially hyperbolic or parabolic)

l Mixed, varying by region

l Mixed, of multiple type (e.g., parabolic with elliptic constraint)

KK =∇•⋅∇−•∂∂

=•⋅∇+•∂∂

)()(,))(()( αt

ft

K=∇•−•⋅∇ )( αU

DD15 Tutorial, Berlin, 17-18 July 2003

Common featuresl Despite enormous variety of mathematical analysis

required, most computational algorithms for PDEs can be described with surprisingly few common bulk-synchronous (SPMD) kernelsn Domain-decomposed loops over mesh points manipulating purely

local data

n Domain-decomposed loops over mesh points (or edges) manipulating own and near neighbors’ data

n Global reductions and recurrences

l The large variety of operations, yet simplicity of operation class, invites abstraction and performance optimization: in short, an object oriented library

DD15 Tutorial, Berlin, 17-18 July 2003

Questions?

l Now is a great time for them!

l Next, we go on to nonlinear Schwarz and advanced concepts in nonlinear solver software, to finish Lecture #2 from Thursday afternoon

l Then we start Lecture #3 on applications this morning

l This afternoon’s Lecture #4 is on two research topics: physics-based preconditioning and PDE-constrained optimization

l Whatever we don’t finish will be posted on line by this evening

DD15 Tutorial, Berlin, 17-18 July 2003

Nonlinear Schwarz preconditioningl Nonlinear Schwarz has Newton both inside and

outside and is fundamentally Jacobian-freel It replaces with a new nonlinear system

possessing the same root, l Define a correction to the partition (e.g.,

subdomain) of the solution vector by solving the following local nonlinear system:

where is nonzero only in the components of the partition

l Then sum the corrections: to get an implicit function of u

0)( =uF0)( =Φ u

thi

thi

)(uiδ

0))(( =+ uuFR ii δn

i u ℜ∈)(δ

)()( uu ii δ∑=Φ

DD15 Tutorial, Berlin, 17-18 July 2003

Nonlinear Schwarz – picture

11

11

0 0

u

F(u)

Ri

RiuRiF

DD15 Tutorial, Berlin, 17-18 July 2003

Nonlinear Schwarz – picture

11

11

0 0

11

11

0 0

u

F(u)

Ri

Rj

Riu

RjF

RiF

Rju

DD15 Tutorial, Berlin, 17-18 July 2003

Nonlinear Schwarz – picture

11

11

0 0

11

11

0 0

u

F(u)

Fi’(ui)

Ri

Rj

Riu

RjF

RiF

Rju

diu+dju

DD15 Tutorial, Berlin, 17-18 July 2003

Nonlinear Schwarz, cont.l It is simple to prove that if the Jacobian of F(u) is

nonsingular in a neighborhood of the desired root then and have the same unique root

l To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any :n The residual n The Jacobian-vector product

l Remarkably, (Cai-Keyes, 2000) it can be shown that

where and l All required actions are available in terms of !

0)( =Φ u

nvu ℜ∈,)()( uu ii δ∑=Φ

0)( =uF

vu ')(Φ

JvRJRvu iiTii )()( 1' −∑≈Φ

)(' uFJ = Tiii JRRJ =

)(uF

DD15 Tutorial, Berlin, 17-18 July 2003

Nonlinear Schwarz, cont.

Discussion:l After the linear version of additive Schwarz was

invented, it was realized that it is in the same class of methods as additive multigrid, in terms of operator algebra: projection off the error in a set of subspaces (though linear MG is usually done multiplicatively)

l After nonlinear Schwarz was invented, it was realized that it is in the same class of methods as nonlinear multigrid (full approximation scheme MG), in terms of operator algebra (though nonlinear multigrid is usually done multiplicatively)

DD15 Tutorial, Berlin, 17-18 July 2003

Driven cavity in velocity-vorticity coords

02 =∂∂

−∇−y

02 =∂∂

+∇−x

0Gr2 =∂∂

−∂∂

+∂∂

+∇−xT

yv

xu

ωωω

0)(Pr2 =∂∂

+∂∂

+∇−yT

vxT

uT

x-velocity

y-velocity

vorticity

internal energy

hotcold

DD15 Tutorial, Berlin, 17-18 July 2003

Experimental example of nonlinear Schwarz

Newton’s method Additive Schwarz Preconditioned Inexact Newton(ASPIN)

Difficulty at critical Re

Stagnation beyond

critical Re

Convergence for all Re

DD15 Tutorial, Berlin, 17-18 July 2003

Common software infrastructure for nonlinear PDE solvers – coming in PETSc

l The user codes to the problem they are solving (by providing F(u)), not the algorithm used to solve the problem

l Implementation of various algorithms reuse common concepts and code when possible, withoutlosing efficiency

Optimizer

Linear solver

Eigensolver

Time integrator

Nonlinear solver

Indicates dependence

Sens. AnalyzerVision:

DD15 Tutorial, Berlin, 17-18 July 2003

Encompassing …l Newton’s method

n Jacobian-free methods n w/Jacobian-based preconditioned linear solvers

(Newton-PCG)n w/multigrid linear solvers (Newton-MG)

l Nonlinear multigridn a.k.a. Full approximation scheme (FAS) n a.k.a. MG-Newton

DD15 Tutorial, Berlin, 17-18 July 2003

Software engineering ingredients

l Standard solver interfaces (SLES, SNES)

l Solver libraries

l Automatic differentiation (AD) tools

l Code generation tools

DD15 Tutorial, Berlin, 17-18 July 2003

Algorithm review

F(u) = 0, Jacobian A(u)

Newton

Newton–SOR (1 inner linear sweep over i)

SOR–Newton (1 inner nonlinear sweep over i)

1( ) ( )u u A u F u−← −

1( ){ ( ) ( )[ ]}i ji ii i ij jj i

u u A u F u A u u u−

<

← − − −∑

1( ) ( )ii ii iu u A u F u−← −

DD15 Tutorial, Berlin, 17-18 July 2003

Cute observation

( ) ( )

( ) ( ) ( )[ ]

ii ii

ji i ij j

A u A u

F u F u A u u u

≈ + −∑1( ){ ( ) ( )[ ]}i ji ii i ij j

j i

u u A u F u A u u u−

<

← − − −∑

1( ) ( )ii ii iu u A u F u−← −SOR-Newton

… with approximations

… gives Newton-SOR

⇒ (Gauss-Seidel) matrix-free linear relaxation

is very similar to nonlinear relaxation

DD15 Tutorial, Berlin, 17-18 July 2003

Function and Jacobian evaluation

l FAS requires them pointwise

l Newton requires them globally

l Newton-MG requires bothn Newton (outside) globally

n MG (inside) locally

DD15 Tutorial, Berlin, 17-18 July 2003

Enter Automatic Differentiation

l Given code for AD can compute

n A(u) and

n A(u)*w efficientlyl Given code for AD can compute

n and

n efficiently

( )F u

( )iF u( )iiA u

( )ij jj

A u w∑

DD15 Tutorial, Berlin, 17-18 July 2003

Automated Code generation (“in-lining”)

• Newton method requires a user-provided function and an AD Jacobian

• Big performance hit if handled directly with components§ A outer loop over an inner subroutine that calls a function at

each point is inefficient globally

§ An outer subroutine that loops over all inner points is wasteful locally (information thrown away)

• Need to have it both ways, depending upon algorithmic context; user should not see this level of performance-induced coding complexity

DD15 Tutorial, Berlin, 17-18 July 2003

Coarse grid correction can also be handled

Coarse grid objects generated together with fine

(e.g., PETSc’s DMMG)

l Newton-MG

l MG-Newton

( ) ( )H H

TH

A Ru c RF u

u u R c

=

← −

)

( ) ( ) ( ) 0H H HF Ru c F Ru RF u+ − + =) )

DD15 Tutorial, Berlin, 17-18 July 2003

Conclusion

The algorithmic/mathematical building blocks for Newton-MG and MG-Newton are essentially the same.

Thus the software building blocks should be also (and they will be in the next release of PETSc, version 3.0).

top related