review/preface to second day of dd-15 tutorial · dd15 tutorial, berlin, 17-18 july 2003 petsc...

Review/Preface to Second Day of DD-15 Tutorial

William D. Gropp

Lab for Advanced Numerical SoftwareArgonne National Laboratory

David E. Keyes

Department of Applied Physics & Applied MathematicsColumbia University

DD15 Tutorial, Berlin, 17-18 July 2003

PETSc codeUser code

ApplicationInitialization

FunctionEvaluation

JacobianEvaluation

Post-Processing

PC KSPPETSc

Main Routine

Linear Solvers (SLES)

Nonlinear Solvers (SNES)

Timestepping Solvers (TS)

PETSc’s SLES & SNES: objects and algorithms

PETSc objectsl Vector and matrix objects, Vec and Mat

n Accessed and used as mathematical objects n Code usage independent of runtime choice of data structuresn Implemented to allow for high performance

u Split-phase operationsu Overlap of communication and computationu Aggregation, locality, low memory footprint

l Cartesian grid object DAl Solver objects SLES and SNES

n Large collection of progressive algorithmic options assembled under abstract interfaces

n Recursively callable, with separate contexts for sub-objectsn Command-line tunable

l Auxilliary objects (e.g., Viewer)

Basic and advanced algorithmsl Linear solvers

n Schwarz: single-level, two-leveln Schur: reduced, full (interface-interior, primal-dual)n Schwarz-Schur hybrids

l Nonlinear solversn Newton-Krylov-Schwarz (NKS)n Jacobian-free Newton-Krylov (JFNK)

l Implicit time integratorsn Time-accurate time stepping (PETSc’s TS)n Pseudo-transient continuation (? tc)

Scalability properties of DDl Parallel scaling (time per iteration)

n Good “surface-to-volume” ratio: communication subdominant to computation, Amdahl’s law defeated

n Reasonable communication requirements: mostly nearest neighbor, of small size when global

n Can increase processors without bound with increased system size, at power-law rate that depends upon richness of communication network

l Convergence scaling (iterations needed for solution)n Schwarz and Schur can have bounded condition number

n Newton can have quadratic convergencen Nonlinear robustification techniques

So far…l Good stories mentioned …

n 1999 Gordon Bell Prize for aerodynamic computation using NKS (PETSc, Argonne)

n 2002 Gordon Bell Prize for structural dynamics computation using FETI-DP (Salinas, Sandia)

n Combustion model for ? tc

l But only simple examples presented in detail …n Poisson problem

n Bratu problem

n Driven cavity

This morning …l Brief review

l More advanced algorithms

l Programmatic context of current PETSc development environmentn Mathematicians and computer scientists, come join

the fun☺

l Broader sense of PDE-type scientific applications being addressed through PETScn Applications scientists, come join the fun☺

Two distinct definitions of scalabilityl “Strong scaling”

n execution time decreases in inverse proportion to the number of processors

n fixed size problem overall

l “Weak scaling”n execution time remains constant,

as problem size and processor number are increased in proportion

n fixed size problem per processor

n also known as “Gustafson scaling”

N ∝ p

log pgood

N constant

Slope= -1

Slope= 0

Three important usages of “efficiency”l For any iterative method,

l “Convergence efficiency”

l “Implementation efficiency”

l Overall efficiency

l “Ideal” is unity for all three measures

)()(# iterpertimeitersoftime ×=

)(#)(#

smallconv Pforitersof

Pforitersof=η

smallimpl )(

PforiterpertimePforiterpertime

smallimplconv )(

PfortimePfortime

×=×= ηηη

PDE varietiesl Evolution (time hyperbolic, time parabolic)

l Equilibrium (elliptic, spatially hyperbolic or parabolic)

l Mixed, varying by region

l Mixed, of multiple type (e.g., parabolic with elliptic constraint)

KK =∇•⋅∇−•∂∂

=•⋅∇+•∂∂

)()(,))(()( αt

K=∇•−•⋅∇ )( αU

Common featuresl Despite enormous variety of mathematical analysis

required, most computational algorithms for PDEs can be described with surprisingly few common bulk-synchronous (SPMD) kernelsn Domain-decomposed loops over mesh points manipulating purely

local data

n Domain-decomposed loops over mesh points (or edges) manipulating own and near neighbors’ data

n Global reductions and recurrences

l The large variety of operations, yet simplicity of operation class, invites abstraction and performance optimization: in short, an object oriented library

Questions?

l Now is a great time for them!

l Next, we go on to nonlinear Schwarz and advanced concepts in nonlinear solver software, to finish Lecture #2 from Thursday afternoon

l Then we start Lecture #3 on applications this morning

l This afternoon’s Lecture #4 is on two research topics: physics-based preconditioning and PDE-constrained optimization

l Whatever we don’t finish will be posted on line by this evening

Nonlinear Schwarz preconditioningl Nonlinear Schwarz has Newton both inside and

outside and is fundamentally Jacobian-freel It replaces with a new nonlinear system

possessing the same root, l Define a correction to the partition (e.g.,

subdomain) of the solution vector by solving the following local nonlinear system:

where is nonzero only in the components of the partition

l Then sum the corrections: to get an implicit function of u

0)( =uF0)( =Φ u

)(uiδ

0))(( =+ uuFR ii δn

i u ℜ∈)(δ

)()( uu ii δ∑=Φ

Nonlinear Schwarz – picture

RiuRiF

Fi’(ui)

diu+dju

Nonlinear Schwarz, cont.l It is simple to prove that if the Jacobian of F(u) is

nonsingular in a neighborhood of the desired root then and have the same unique root

l To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any :n The residual n The Jacobian-vector product

l Remarkably, (Cai-Keyes, 2000) it can be shown that

where and l All required actions are available in terms of !

0)( =Φ u

nvu ℜ∈,)()( uu ii δ∑=Φ

0)( =uF

vu ')(Φ

JvRJRvu iiTii )()( 1' −∑≈Φ

)(' uFJ = Tiii JRRJ =

Nonlinear Schwarz, cont.

Discussion:l After the linear version of additive Schwarz was

invented, it was realized that it is in the same class of methods as additive multigrid, in terms of operator algebra: projection off the error in a set of subspaces (though linear MG is usually done multiplicatively)

l After nonlinear Schwarz was invented, it was realized that it is in the same class of methods as nonlinear multigrid (full approximation scheme MG), in terms of operator algebra (though nonlinear multigrid is usually done multiplicatively)

Driven cavity in velocity-vorticity coords

02 =∂∂

−∇−y

02 =∂∂

+∇−x

0Gr2 =∂∂

−∂∂

+∂∂

+∇−xT

ωωω

0)(Pr2 =∂∂

+∂∂

+∇−yT

x-velocity

y-velocity

vorticity

internal energy

hotcold

Experimental example of nonlinear Schwarz

Newton’s method Additive Schwarz Preconditioned Inexact Newton(ASPIN)

Difficulty at critical Re

Stagnation beyond

critical Re

Convergence for all Re

Common software infrastructure for nonlinear PDE solvers – coming in PETSc

l The user codes to the problem they are solving (by providing F(u)), not the algorithm used to solve the problem

l Implementation of various algorithms reuse common concepts and code when possible, withoutlosing efficiency

Optimizer

Linear solver

Eigensolver

Time integrator

Nonlinear solver

Indicates dependence

Sens. AnalyzerVision:

Encompassing …l Newton’s method

n Jacobian-free methods n w/Jacobian-based preconditioned linear solvers

(Newton-PCG)n w/multigrid linear solvers (Newton-MG)

l Nonlinear multigridn a.k.a. Full approximation scheme (FAS) n a.k.a. MG-Newton

Software engineering ingredients

l Standard solver interfaces (SLES, SNES)

l Solver libraries

l Automatic differentiation (AD) tools

l Code generation tools

Algorithm review

F(u) = 0, Jacobian A(u)

Newton

Newton–SOR (1 inner linear sweep over i)

SOR–Newton (1 inner nonlinear sweep over i)

1( ) ( )u u A u F u−← −

1( ){ ( ) ( )[ ]}i ji ii i ij jj i

u u A u F u A u u u−

← − − −∑

1( ) ( )ii ii iu u A u F u−← −

Cute observation

( ) ( )

( ) ( ) ( )[ ]

ji i ij j

A u A u

F u F u A u u u

≈ + −∑1( ){ ( ) ( )[ ]}i ji ii i ij j

u u A u F u A u u u−

← − − −∑

1( ) ( )ii ii iu u A u F u−← −SOR-Newton

… with approximations

… gives Newton-SOR

⇒ (Gauss-Seidel) matrix-free linear relaxation

is very similar to nonlinear relaxation

Function and Jacobian evaluation

l FAS requires them pointwise

l Newton requires them globally

l Newton-MG requires bothn Newton (outside) globally

n MG (inside) locally

Enter Automatic Differentiation

l Given code for AD can compute

n A(u) and

n A(u)*w efficientlyl Given code for AD can compute

n efficiently

( )F u

( )iF u( )iiA u

( )ij jj

A u w∑

Automated Code generation (“in-lining”)

• Newton method requires a user-provided function and an AD Jacobian

• Big performance hit if handled directly with components§ A outer loop over an inner subroutine that calls a function at

each point is inefficient globally

§ An outer subroutine that loops over all inner points is wasteful locally (information thrown away)

• Need to have it both ways, depending upon algorithmic context; user should not see this level of performance-induced coding complexity

Coarse grid correction can also be handled

Coarse grid objects generated together with fine

(e.g., PETSc’s DMMG)

l Newton-MG

l MG-Newton

( ) ( )H H

A Ru c RF u

u u R c

← −

( ) ( ) ( ) 0H H HF Ru c F Ru RF u+ − + =) )

Conclusion

The algorithmic/mathematical building blocks for Newton-MG and MG-Newton are essentially the same.

Thus the software building blocks should be also (and they will be in the next release of PETSc, version 3.0).

review/preface to second day of dd-15 tutorial · dd15 tutorial, berlin, 17-18 july 2003 petsc...

Documents

detroit dd15 engine detroit™ dd15® engine developed...

detroit dd15 engine - velocity truck centers

abap objects - basics guided tutorial

installation procedure – detroit diesel dd15/dd13 ·...

isometrie-tutorial adobe illustrator - meissner · drawing...

e528 product guide - inncom by honeywell · pdf filevrv...

business objects step by step tutorial

the detroit dd15 engine with bluetec...

dd13-dd15 a&i manual

abap objects basics guided tutorial

sap sql anywhere high availability tutorial for business...

xsl formatting objects tutorial - renderx

detroit™ dd15® engine

dd15 spec sheet - detroit diesel

service manual dd15

detroit dd15 engine - western starthe detroit dd15® engine...

tutorial 13 working with objects and styles. xp objectives...

motor dd15

visp tutorial geometric objects

powerpoint tutorial 2: adding and modifying text and...