Enabling fast convergence of the iterated penalty Picard iteration with O(1) penalty parameter for incompressible Navier-Stokes via Anderson acceleration

Leo G. Rebholz^{a,1,∗}, Duygu Vargun^{a,1}, Mengying Xiao^{b}

^a Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
^b Department of Mathematics and Statistics, University of West Florida, Pensacola, FL 32514, USA

Abstract

This paper considers an enhancement of the classical iterated penalty Picard (IPP) method for the incompressible Navier-Stokes equations, where we restrict our attention to O(1) penalty parameter, and Anderson acceleration (AA) is used to significantly improve its convergence properties. After showing the fixed point operator associated with the IPP iteration is Lipschitz continuous and Lipschitz continuously (Fréchet) differentiable, we apply a recently developed general theory for AA to conclude that IPP enhanced with AA improves its linear convergence rate by the gain factor associated with the underlying AA optimization problem. Results for several challenging numerical tests are given and show that IPP with penalty parameter 1 and enhanced with AA is a very effective solver.

1. Introduction

We consider solvers for the incompressible Navier-Stokes equations (NSE), which are given by

∗ Corresponding author.
Email addresses: [email protected] (Leo G. Rebholz), [email protected] (Duygu Vargun), [email protected] (Mengying Xiao)
1 This author was partially supported by NSF Grant DMS 2011490.

Preprint submitted to arXiv May 21, 2021

arXiv:2105.09339v1 [math.NA] 19 May 2021


u_t + u · ∇u + ∇p − ν∆u = f, (1)

∇ · u = 0, (2)

where u and p are the unknown velocity and pressure, ν is the kinematic viscosity (which is inversely proportional to the Reynolds number Re), and f is a known function representing external forcing. For simplicity we assume no-slip boundary conditions and a steady flow (u_t = 0), as well as small data so as to be consistent with steady flow, but our analysis and results can be extended to other common boundary conditions and to temporally discretized transient flows with only minor modifications. Due to the wide applicability of (1)-(2) across science and engineering, many nonlinear solvers already exist for it [14], with the most popular being the Picard and Newton iterations [9]. Newton's iteration converges quadratically once near a root, but requires a good initial guess, especially for higher Re [9]. The Picard iteration for the NSE is linearly convergent, but it is also globally convergent and much more robust for higher Re [9, 25].

Herein, we consider Anderson acceleration (AA) of the iterated penalty Picard (IPP) iteration. The IPP iteration is generally more efficient than Picard for a single iteration since the linear solve is easier/cheaper, but compared to Picard it can be less robust and require more iterations if the penalty parameter is not chosen correctly. The IPP scheme for the NSE is given in [5] as: Given u_k, p_k, solve for u_{k+1}, p_{k+1} from

u_k · ∇u_{k+1} + ∇p_{k+1} − ν∆u_{k+1} = f, (3)

εp_{k+1} + ∇ · u_{k+1} = εp_k, (4)

where ε > 0 is a penalty parameter, generally taken small. The system (3)-(4) is equivalent to the velocity-only system

u_k · ∇u_{k+1} − ε^{−1}∇(∇ · u_{k+1}) − ν∆u_{k+1} = f + ε^{−1} ∑_{j=0}^{k} ∇(∇ · u_j), (5)

which is used in [10, 31, 20, 28], and the pressure can be expressed in terms of the velocities, i.e. p_{k+1} = −ε^{−1} ∑_{j=0}^{k+1} ∇ · u_j. There are several advantages to using the IPP: including the pressure in the continuity equation allows for


circumventing the inf-sup condition on the velocity and pressure spaces [5], and Scott-Vogelius elements can be used without any mesh restriction and will produce a pointwise divergence free solution, along with the many advantages this brings. Codina showed in [5] that a discretization of (3)-(4) converges linearly under a small data condition and sufficiently small ε, and has a better convergence rate if the penalty parameter ε is chosen sufficiently small.
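To make the structure of (5) concrete, a minimal sketch of the resulting iteration is given below in Python. The assembly callbacks (assemble_lhs, assemble_rhs_base, div_accum) are hypothetical stand-ins for a finite element library, with the penalty parameter ε baked into them; the point is that the accumulated divergence terms enter only the right-hand side, so each IPP step costs one linear solve with a matrix that changes only through the advecting velocity u_k.

```python
import numpy as np

def ipp_iteration(assemble_lhs, assemble_rhs_base, div_accum, u0,
                  tol=1e-8, max_iter=200):
    """Sketch of the iterated penalty Picard loop (5); callbacks are hypothetical.

    assemble_lhs(u)   -- matrix for u.grad(.) - eps^{-1} grad(div .) - nu*Laplacian
    assemble_rhs_base -- load vector for the forcing f
    div_accum(hist)   -- vector for eps^{-1} * sum_j grad(div u_j) over past iterates
    """
    u, history = u0.copy(), [u0.copy()]
    for k in range(max_iter):
        A = assemble_lhs(u)                          # relinearized at u_k
        b = assemble_rhs_base + div_accum(history)   # running penalty sum on the RHS
        u_new = np.linalg.solve(A, b)                # one linear solve per iteration
        history.append(u_new.copy())
        if np.linalg.norm(u_new - u) < tol:          # fixed-point residual small enough
            break
        u = u_new
    return u_new, history
```

After convergence, the pressure would be recovered from the accumulated divergences via p = −ε^{−1} ∑_j ∇ · u_j, as in the formula above.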

Unfortunately, with small ε the advantages of using (3)-(4) diminish, since the same nonsymmetric saddle point system of the usual Picard iteration is recovered as ε → 0, and if IPP is computed via (5), then ε < 1 can lead to linear systems that most common preconditioned iterative linear solvers will have difficulty resolving [29, 23]. Hence even though the IPP is theoretically effective when ε is small, its use has largely died out over the past few decades, since small ε leads to the need for direct linear solvers, and direct linear solvers are not effective on most large scale problems of modern interest. Hence, in an effort to show that (properly enhanced with AA) IPP can still be a very competitive solver on any size problem, we completely avoid the notion of small ε, and in our numerical tests use only ε = 1, for which preconditioned iterative methods have found success on linear systems resembling (5) [12, 23, 6, 3].

This paper presents an analytical and numerical study of AA applied to IPP, without assuming small ε. AA has recently been used to improve convergence and robustness of solvers for a wide range of problems including various types of flow problems [19, 26, 27], geometry optimization [24], radiation diffusion and nuclear physics [1, 34], machine learning [11], molecular interaction [32], computing nearest correlation matrices [13], and many others, e.g. [35, 16, 18, 19, 8, 36]. In particular, AA was used in [26] to make the Picard iteration for (1)-(2) more robust with respect to Re and to converge significantly faster. Hence it is a natural and important next step to consider AA applied to IPP, which is a classical NSE solver but is not always effective when ε < 1 due to linear solver difficulties. Herein we formulate IPP equipped with a finite element discretization as a fixed point iteration u_{k+1} = G(u_k), where G is the solution operator of a discrete linear system. We then prove that G is continuously (Fréchet) differentiable, allowing us to invoke the AA theory from [25], which implies AA will improve the linear convergence rate of the iteration by a factor (less than 1) representing the gain of the underlying AA optimization problem. Results of several numerical tests are also presented, which show IPP using ε = 1 and enhanced with AA can be a very effective solver for the NSE.

This paper is arranged as follows: In Section 2, we provide notation and


mathematical preliminaries on the finite element discretizations and AA. In Section 3, we present the IPP method and prove properties of the associated fixed point solution operator. In Section 4, we give the Anderson accelerated IPP scheme and present a convergence result. In Section 5, we report on the results of several numerical tests, which demonstrate a significant (and sometimes dramatic) positive impact on the convergence.

2. Notation and preliminaries

We consider a domain Ω ⊂ R^d (d = 2, 3) that is open, connected, and with Lipschitz boundary ∂Ω. The L²(Ω) norm and inner product will be denoted by ‖·‖ and (·, ·). Throughout this paper, it is understood by context whether a particular space is scalar or vector valued, and so we do not distinguish notation.

The natural function spaces for velocity and pressure in this setting are given by

X := H¹₀(Ω) = {v ∈ L²(Ω) | ∇v ∈ L²(Ω), v|_{∂Ω} = 0},
Q := L²₀(Ω) = {q ∈ L²(Ω) | ∫_Ω q dx = 0}.

In the space X, the Poincaré inequality holds [17]: there exists a constant C_P > 0 depending only on Ω such that for any φ ∈ X,

‖φ‖ ≤ C_P ‖∇φ‖.

The dual space of X will be denoted by X′, with norm ‖ · ‖_{−1}. We define the skew-symmetric trilinear operator b∗ : X × X × X → R by

b∗(u, v, w) := (1/2)(u · ∇v, w) − (1/2)(u · ∇w, v),

which satisfies

b∗(u, v, w) ≤ M‖∇u‖ ‖∇v‖ ‖∇w‖, (6)

for any u, v, w ∈ X, where M is a constant depending only on |Ω|; see [17]. In our analysis, the following natural norm on (X, Q) arises:

‖(v, q)‖_X := (ν‖∇v‖² + ε‖q‖²)^{1/2}. (7)


The FEM formulation of the steady NSE is given as follows: Find (u, p) ∈ (X_h, Q_h) such that

ν(∇u,∇v) + b∗(u, u, v) − (p,∇ · v) = (f, v),
(q,∇ · u) = 0, (8)

for all (v, q) ∈ (X_h, Q_h). It is known that system (8) has solutions for any data, and those solutions are unique if the small data condition κ := ν^{−2}M‖f‖_{−1} < 1 is satisfied. Moreover, all solutions to (8) are bounded by ‖∇u‖ ≤ ν^{−1}‖f‖_{−1}.

Assumption 2.1. We will assume in our analysis that κ < 1, so that (8) is well-posed.

2.1. Discretization preliminaries

We denote by τ_h a conforming, shape-regular, simplicial triangulation of Ω, with h denoting the maximum element diameter of τ_h. We denote by P_k(τ_h) the space of degree k globally continuous piecewise polynomials on τ_h, and by P_k^disc(τ_h) the space of degree k piecewise polynomials that can be discontinuous across elements.

We choose the discrete velocity space X_h = X ∩ P_k(τ_h) and the pressure space Q_h = ∇ · X_h ⊆ Q. With this choice of spaces, the discrete versions of (3)-(4) and (5) are equivalent, although in our computations we use only (5), and so the pressure space is never explicitly used. As discussed in [20], pressure recovery via the L² projection of −ε^{−1} ∑_{j=0}^{∞} ∇ · u_j into Q ∩ P_{k−1}(τ_h) will yield a continuous and optimally accurate pressure. Under certain mesh structures, the (X_h, Q_h) pair will satisfy the discrete inf-sup condition [39, 38, 21, 2, 11]. While the inf-sup condition is important for small ε in the IPP, our focus is on ε = 1, and so this compatibility condition is not necessary for our analysis to hold.

2.2. Anderson acceleration

Anderson acceleration is an extrapolation method used to improve convergence of fixed-point iterations. Following [33, 35, 26], it may be stated as follows, where Y is a normed vector space and g : Y → Y.

Algorithm 2.2 (Anderson iteration). Anderson acceleration with depth m and damping factors β_k.


Step 0: Choose x_0 ∈ Y.

Step 1: Find w_1 ∈ Y such that w_1 = g(x_0) − x_0. Set x_1 = x_0 + w_1.

Step k: For k = 2, 3, . . ., set m_k = min{k − 1, m}.

[a.] Find w_k = g(x_{k−1}) − x_{k−1}.

[b.] Solve the minimization problem for the Anderson coefficients {α_j^k}_{j=1}^{m_k}:

min ‖(1 − ∑_{j=1}^{m_k} α_j^k) w_k + ∑_{j=1}^{m_k} α_j^k w_{k−j}‖_Y. (9)

[c.] For damping factor 0 < β_k ≤ 1, set

x_k = (1 − ∑_{j=1}^{m_k} α_j^k) x_{k−1} + ∑_{j=1}^{m_k} α_j^k x_{k−j−1} + β_k ((1 − ∑_{j=1}^{m_k} α_j^k) w_k + ∑_{j=1}^{m_k} α_j^k w_{k−j}), (10)

where w_j = g(x_{j−1}) − x_{j−1} is the nonlinear residual (and also sometimes referred to as the update step).
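For concreteness, a minimal dense-algebra sketch of Algorithm 2.2 is given below (Python/numpy, with the Euclidean norm standing in for the Y-norm, so the minimization (9) becomes an ordinary least squares problem in the residual differences; g is any fixed-point map):

```python
import numpy as np

def anderson(g, x0, m=5, beta=1.0, tol=1e-10, max_iter=100):
    """Minimal Anderson acceleration (Algorithm 2.2) in the Euclidean norm."""
    x = x0.copy()
    xs, ws = [x.copy()], []   # iterates x_0,... and residuals w_k = g(x_{k-1}) - x_{k-1}
    for k in range(1, max_iter + 1):
        w = g(x) - x
        ws.append(w.copy())
        if np.linalg.norm(w) < tol:
            return x
        mk = min(k - 1, m)
        if mk == 0:
            x = x + beta * w  # plain (damped) fixed-point step
        else:
            # (9): min over alpha of ||(1 - sum a_j) w_k + sum a_j w_{k-j}||,
            # rewritten as unconstrained least squares in residual differences
            F = np.column_stack([ws[-1] - ws[-1 - j] for j in range(1, mk + 1)])
            a, *_ = np.linalg.lstsq(F, w, rcond=None)
            x_avg = x - sum(a[j - 1] * (xs[-1] - xs[-1 - j]) for j in range(1, mk + 1))
            w_avg = w - F @ a          # the minimized averaged residual
            x = x_avg + beta * w_avg   # update (10)
        xs.append(x.copy())
    return x
```

The gain factor (11) at step k is then just ‖w_avg‖/‖w_k‖, and setting m = 0 recovers the underlying fixed-point iteration.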

Note that depth m = 0 returns the original fixed-point iteration. We define the optimization gain factor θ_k by

θ_k = ‖(1 − ∑_{j=1}^{m_k} α_j^k) w_k + ∑_{j=1}^{m_k} α_j^k w_{k−j}‖_Y / ‖w_k‖_Y, (11)

representing the gain of the minimization problem (9) using depth m_k compared to the m = 0 (usual fixed point iteration) case. The gain factor θ_k plays a critical role in the general AA convergence theory [7, 25] that reveals how AA improves convergence: specifically, the acceleration reduces the first-order residual term by a factor of θ_k, but introduces higher-order terms into the residual expansion.

The next two assumptions give sufficient conditions on the fixed point operator g for the theory of [25] to be applied.

Assumption 2.3. Assume g ∈ C¹(Y) has a fixed point x∗ in Y, and there are positive constants C_0 and C_1 with

1. ‖g′(x)‖_Y ≤ C_0 for all x ∈ Y, and


2. ‖g′(x) − g′(y)‖_Y ≤ C_1 ‖x − y‖_Y for all x, y ∈ Y.

Assumption 2.4. Assume there is a constant σ > 0 for which the differences between consecutive residuals and iterates satisfy

‖w_{k+1} − w_k‖_Y ≥ σ‖x_k − x_{k−1}‖_Y, k ≥ 1.

Assumption 2.4 is satisfied, for example, if g is contractive (i.e., if C_0 < 1 in Assumption 2.3). Other ways that the assumption can be satisfied are discussed in [25]. Under Assumptions 2.3 and 2.4, the following result from [25] produces a bound on the residual ‖w_{k+1}‖ in terms of the previous residuals.

Theorem 2.5 (Pollock et al., 2021). Let Assumptions 2.3 and 2.4 hold, and suppose the direction sines between each column j of the matrix

F_j = ((w_j − w_{j−1}) (w_{j−1} − w_{j−2}) · · · (w_{j−m_j+1} − w_{j−m_j})) = (f_{j,i}) (12)

and the subspace spanned by the preceding columns satisfy |sin(f_{j,i}, span{f_{j,1}, . . . , f_{j,i−1}})| ≥ c_s > 0, for j = 1, . . . , m_k. Then the residual w_{k+1} = g(x_k) − x_k from Algorithm 2.2 (depth m) satisfies the bound

‖w_{k+1}‖_Y ≤ ‖w_k‖_Y ( θ_k((1 − β_k) + C_0 β_k) + (C C_1 √(1 − θ_k²) / 2)( ‖w_k‖_Y h(θ_k) + 2 ∑_{n=k−m_k+1}^{k−1} (k − n)‖w_n‖_Y h(θ_n) + m_k ‖w_{k−m_k}‖_Y h(θ_{k−m_k}) ) ), (13)

where each h(θ_j) ≤ C√(1 − θ_j²) + β_j θ_j, and C depends on c_s and the implied upper bound on the direction cosines.

The estimate (13) shows how the relative contributions from the lower and higher order terms are determined by the gain factor θ_k: the lower order terms are scaled by θ_k and the higher order terms by √(1 − θ_k²). The estimate reveals that while larger choices of m generally provide smaller θ_k, which reduces the lower order contributions to the residual, they also incur the cost of both increased accumulation and weight of the higher order terms. If recent residuals are small then greater algorithmic depths m may be advantageous,


but if not, large m may slow or prevent convergence. As discussed in [25], this suggests that depth selection strategies that use small m early in the iteration and large m later may be advantageous in some settings.

This result supposes sufficient linear independence of the columns of each matrix F_j given by (12). As discussed in [25], this assumption can be both verified and ensured, so long as the optimization problem is solved in a norm induced by an inner product. One can safeguard by sufficiently reducing m or by removing columns of F_j where the desired inequality fails to hold, as demonstrated in [25].

3. The iterated penalty Picard method and associated solution operator properties

This section presents some properties of the IPP iteration and its associated fixed point function.

3.1. Iterated penalty Picard method

This subsection studies some properties of the IPP method. We begin by defining its associated fixed point operator.

Definition 3.1. We define a mapping G : (X_h, Q_h) → (X_h, Q_h), G(u, p) = (G_1(u, p), G_2(u, p)), such that for any (v, q) ∈ (X_h, Q_h),

ν(∇G_1(u, p),∇v) + b∗(u, G_1(u, p), v) − (G_2(u, p),∇ · v) = (f, v),
ε(G_2(u, p), q) + (∇ · G_1(u, p), q) = ε(p, q). (14)

Thus the IPP method for solving the steady NSE can now be rewritten as follows.

Algorithm 3.2. The iterated penalty method for solving the steady NSE is:

Step 0: Guess (u_0, p_0) ∈ (X_h, Q_h).

Step k: Find (u_{k+1}, p_{k+1}) = G(u_k, p_k).

We now show that G is well-defined, and will then prove smoothness properties for it.


Lemma 3.3. The operator G is well defined. Moreover,

‖∇G_1(u, p)‖ ≤ ν^{−1}‖f‖_{−1} + √(ε/ν) ‖p‖, (15)

for any (u, p) ∈ (X_h, Q_h).

Proof. Given f, u, p, assume (u_1, p_1), (u_2, p_2) ∈ (X_h, Q_h) are solutions to (14). Subtracting these two systems and letting e_u = u_1 − u_2 and e_p = p_1 − p_2 produces

ν(∇e_u,∇v) + b∗(u, e_u, v) − (e_p,∇ · v) = 0,
ε(e_p, q) + (∇ · e_u, q) = 0.

Setting v = e_u and q = e_p, and adding these equations, gives

ν‖∇e_u‖² + ε‖e_p‖² = 0,

which implies e_u = e_p = 0, and thus the solution of (14) is unique. Because (14) is linear and finite dimensional, solutions must exist uniquely. Choosing v = G_1(u, p) and q = G_2(u, p) in (14) produces

‖G(u, p)‖²_X = ν‖∇G_1(u, p)‖² + ε‖G_2(u, p)‖² ≤ ε‖p‖² + ν^{−1}‖f‖²_{−1},

thanks to the Cauchy-Schwarz and Young inequalities. This shows the solution G(u, p) is bounded continuously by the data, proving (14) is well-posed and thus G is well-defined. Additionally, dropping the term ε‖G_2(u, p)‖² and taking square roots yields (15).

Lemma 3.4. Under Assumption 2.1, let (u, p) be the solution of (8) and (u_k, p_k) the kth iterate from Algorithm 3.2. Then we have

‖(u_{k+1}, p_{k+1}) − (u, p)‖_X < ‖(u_k, p_k) − (u, p)‖_X. (16)

Proof. Subtracting equations (8) from (14) with (u_{k+1}, p_{k+1}) gives

ν(∇(u_{k+1} − u),∇v) + b∗(u_k, u_{k+1} − u, v) + b∗(u_k − u, u, v) − (p_{k+1} − p,∇ · v) = 0, (17)
ε(p_{k+1} − p, q) + (∇ · (u_{k+1} − u), q) = ε(p_k − p, q). (18)


Adding these equations together and setting v = u_{k+1} − u, q = p_{k+1} − p produces

ν‖∇(u_{k+1} − u)‖² + ε‖p_{k+1} − p‖² ≤ M‖∇(u_k − u)‖ ‖∇u‖ ‖∇(u_{k+1} − u)‖ + ε‖p_k − p‖ ‖p_{k+1} − p‖,

thanks to (6) and the Cauchy-Schwarz inequality. Then, using ‖∇u‖ ≤ ν^{−1}‖f‖_{−1} and Young's inequality gives

ν‖∇(u_{k+1} − u)‖² + ε‖p_{k+1} − p‖² ≤ νκ²‖∇(u_k − u)‖² + ε‖p_k − p‖²,

where κ := Mν^{−2}‖f‖_{−1}. Thanks to Assumption 2.1, taking the square root on both sides gives (16).

Lemma 3.4 shows us that Algorithm 3.2 converges when the small data condition κ < 1 is satisfied. However, it tells us nothing when κ ≥ 1. With Anderson acceleration, we can discuss the convergence behavior when κ ≥ 1; see Theorem 4.1. Next, we show the solution operator G is Lipschitz continuous and Fréchet differentiable.

Lemma 3.5. For any (u, p), (w, z) ∈ (X_h, Q_h), we have

‖G(u, p) − G(w, z)‖_X ≤ C_L ‖(u, p) − (w, z)‖_X, (19)

where C_L = max{1, κ + √(ε/ν³) M‖p‖}.

Remark 3.6. Equation (19) tells us that Algorithm 3.2 converges linearly with rate C_L, which may be larger than the usual Picard method's rate of κ [9]. This is not surprising, since (until now) IPP would only ever be used with small ε. In [5], for example, a smaller C_L is found for IPP, but small ε is assumed, as is an inf-sup compatibility condition on the discrete spaces. We will show in Section 4 that when IPP is enhanced with AA, the effective linear convergence rate will be much smaller than C_L even when ε = 1 and without an assumption of an inf-sup condition, and the resulting solver is demonstrated to be very effective in Section 5.

Proof. From (14) with (u, p) and (w, z), we obtain

ν(∇(G_1(u, p) − G_1(w, z)),∇v) + b∗(u, G_1(u, p) − G_1(w, z), v) + b∗(u − w, G_1(w, z), v) − (G_2(u, p) − G_2(w, z),∇ · v) = 0,
ε(G_2(u, p) − G_2(w, z), q) + (q,∇ · (G_1(u, p) − G_1(w, z))) = ε(p − z, q).


Adding these equations and choosing v = G_1(u, p) − G_1(w, z) and q = G_2(u, p) − G_2(w, z) yields

ν‖∇(G_1(u, p) − G_1(w, z))‖² + ε‖G_2(u, p) − G_2(w, z)‖²
= ε(p − z, G_2(u, p) − G_2(w, z)) − b∗(u − w, G_1(w, z), G_1(u, p) − G_1(w, z))
≤ ε‖p − z‖ ‖G_2(u, p) − G_2(w, z)‖ + M‖∇(u − w)‖ ‖∇G_1(w, z)‖ ‖∇(G_1(u, p) − G_1(w, z))‖,

thanks to Cauchy-Schwarz and (6). Applying Young's inequality provides

ν‖∇(G_1(u, p) − G_1(w, z))‖² + ε‖G_2(u, p) − G_2(w, z)‖² ≤ ε‖p − z‖² + ν^{−1}M²‖∇G_1(w, z)‖² ‖∇(u − w)‖²,

which reduces to (19) due to (7) and (15).

Next, we define an operator G′ and then show that G′ is the Fréchet derivative of the operator G.

Definition 3.7. Given (u, p) ∈ (X_h, Q_h), define an operator G′(u, p; ·, ·) : (X_h, Q_h) → (X_h, Q_h) by

G′(u, p; h, s) := (G′_1(u, p; h, s), G′_2(u, p; h, s)),

satisfying for all (v, q) ∈ (X_h, Q_h):

ν(∇G′_1(u, p; h, s),∇v) + b∗(h, G_1(u, p), v) + b∗(u, G′_1(u, p; h, s), v) − (G′_2(u, p; h, s),∇ · v) = 0,
ε(G′_2(u, p; h, s), q) + (q,∇ · G′_1(u, p; h, s)) = ε(s, q). (20)

Lemma 3.8. G′ is well-defined and is the Fréchet derivative of the operator G, satisfying

‖G′(u, p; h, s)‖_X ≤ C_L ‖(h, s)‖_X (21)

and

‖G′(u + h, p + s; w, z) − G′(u, p; w, z)‖_X ≤ Ĉ_L ‖(w, z)‖_X ‖(h, s)‖_X, (22)

where C_L is defined in Lemma 3.5 and Ĉ_L = √10 ν^{−3/2} M C_L.


Proof. The proof consists of three parts. First, we show that G′ is well-defined and that (21) holds. Adding the equations in (20) and setting v = G′_1(u, p; h, s) and q = G′_2(u, p; h, s) produces

ν‖∇G′_1(u, p; h, s)‖² + ε‖G′_2(u, p; h, s)‖² = ε(s, G′_2(u, p; h, s)) − b∗(h, G_1(u, p), G′_1(u, p; h, s)).

Applying the Cauchy-Schwarz, (6), and Young inequalities gives

ν‖∇G′_1(u, p; h, s)‖² + ε‖G′_2(u, p; h, s)‖²
≤ ε‖s‖² + ν^{−1}M²‖∇h‖² ‖∇G_1(u, p)‖²
≤ ε‖s‖² + ν^{−1}M²(ν^{−1}‖f‖_{−1} + √(ε/ν) ‖p‖)² ‖∇h‖²
≤ C_L²(ε‖s‖² + ν‖∇h‖²),

thanks to (15), which leads to (21). Since the system (20) is linear and finite dimensional, (21) is sufficient to conclude that (20) is well-posed.

The second part shows that G′ is the Fréchet derivative of G. Denote η_1 = G_1(u + h, p + s) − G_1(u, p) − G′_1(u, p; h, s) and η_2 = G_2(u + h, p + s) − G_2(u, p) − G′_2(u, p; h, s). Subtracting the sum of (20) and (14) from the equation (14) with (u + h, p + s) yields

ν(∇η_1,∇v) + b∗(u, η_1, v) + b∗(h, G_1(u + h, p + s) − G_1(u, p), v) − (η_2,∇ · v) = 0,
ε(η_2, q) + (∇ · η_1, q) = 0.

Setting v = η_1, q = η_2 produces

‖G(u + h, p + s) − G(u, p) − G′(u, p; h, s)‖²_X ≤ ν^{−1}M²‖∇h‖² ‖∇(G_1(u + h, p + s) − G_1(u, p))‖² ≤ ν^{−3}M²C_L² ‖(h, s)‖⁴_X,

thanks to Young's inequality and (19). Thus we have verified that G′ is the Fréchet derivative of G.

Lastly, we show G′ is Lipschitz continuous over (X_h, Q_h). For (u, p), (h, s), (w, z) ∈ (X_h, Q_h), let e_1 := G′_1(u + h, p + s; w, z) − G′_1(u, p; w, z) and e_2 := G′_2(u + h, p + s; w, z) − G′_2(u, p; w, z); then subtracting (20) with G′(u, p; w, z) from (20) with G′(u + h, p + s; w, z) yields

ν(∇e_1,∇v) + b∗(w, G_1(u + h, p + s) − G_1(u, p), v) + b∗(h, G′_1(u, p; w, z), v) + b∗(u + h, e_1, v) − (e_2,∇ · v) = 0,
ε(e_2, q) + (∇ · e_1, q) = 0.

Adding these equations and setting v = e_1, q = e_2 eliminates the fourth term and gives

ν‖∇e_1‖² + ε‖e_2‖²
= −b∗(w, G_1(u + h, p + s) − G_1(u, p), e_1) − b∗(h, G′_1(u, p; w, z), e_1)
≤ M‖∇w‖ ‖∇(G_1(u + h, p + s) − G_1(u, p))‖ ‖∇e_1‖ + M‖∇h‖ ‖∇G′_1(u, p; w, z)‖ ‖∇e_1‖,

thanks to (6). Now, applying Young's inequality, (19) and (21), we get

‖(e_1, e_2)‖²_X ≤ 10 ν^{−3}M²C_L² ‖(w, z)‖²_X ‖(h, s)‖²_X,

which implies (22) after taking square roots on both sides. This finishes the proof.

Lastly, we show G satisfies Assumption 2.4.

Lemma 3.9. The following inequality holds:

‖(G(u_k, p_k) − (u_k, p_k)) − (G(u_{k−1}, p_{k−1}) − (u_{k−1}, p_{k−1}))‖_X ≥ (1 − C_L)‖(u_k, p_k) − (u_{k−1}, p_{k−1})‖_X. (23)

Proof. If C_L ≥ 1, the result holds trivially. Otherwise 0 ≤ C_L < 1, and from Lemma 3.5 and the reverse triangle inequality, we have

‖(G(u_k, p_k) − (u_k, p_k)) − (G(u_{k−1}, p_{k−1}) − (u_{k−1}, p_{k−1}))‖_X
= ‖(G(u_k, p_k) − G(u_{k−1}, p_{k−1})) − ((u_k, p_k) − (u_{k−1}, p_{k−1}))‖_X
≥ (1 − C_L)‖(u_k, p_k) − (u_{k−1}, p_{k−1})‖_X.


4. The Anderson accelerated iterated penalty Picard scheme and its convergence

Now we present the Anderson accelerated iterated penalty Picard (AAIPP) algorithm and its convergence properties. Here, we continue the notation from Section 3 that G is the IPP solution operator for a given set of problem data.

Algorithm 4.1 (AAIPP). The AAIPP method with depth m for solving the steady NSE is given by:

Step 0: Guess (u_0, p_0) ∈ (X_h, Q_h).

Step 1: Compute (ũ_1, p̃_1) = G(u_0, p_0). Set the residual (w_1, z_1) = (ũ_1 − u_0, p̃_1 − p_0) and set (u_1, p_1) = (ũ_1, p̃_1).

Step k: For k = 2, 3, . . ., set m_k = min{k, m}, then:

a) Find (ũ_k, p̃_k) = G(u_{k−1}, p_{k−1}) and set (w_k, z_k) = (ũ_k − u_{k−1}, p̃_k − p_{k−1}).

b) Find {α_j^k}_{j=0}^{m_k} minimizing

min_{∑_{j=0}^{m_k} α_j^k = 1} ‖∑_{j=0}^{m_k} α_j^k (w_{k−j}, z_{k−j})‖_X. (24)

c) Update

(u_k, p_k) = (1 − β_k) ∑_{j=0}^{m_k} α_j^k (u_{k−j−1}, p_{k−j−1}) + β_k ∑_{j=0}^{m_k} α_j^k (ũ_{k−j}, p̃_{k−j}),

where 0 < β_k ≤ 1 is the damping factor.

For any step k with α_m^k = 0, one should decrease m and repeat Step k, to avoid potential cyclic behavior. Under the assumption that α_m^k ≠ 0, and together with Lemmas 3.5, 3.8 and 3.9, we can invoke Theorem 2.5 to establish the following convergence theory for AAIPP.

Theorem 4.1. For any step k > m with α_m^k ≠ 0, the following bound holds for the AAIPP residual:

‖(w_{k+1}, z_{k+1})‖_X ≤ θ_k(1 − β_k + β_k C_L)‖(w_k, z_k)‖_X + C√(1 − θ_k²) ‖(w_k, z_k)‖_X ∑_{j=1}^{m} ‖(w_{k−j+1}, z_{k−j+1})‖_X,


for the residual (w_k, z_k) from Algorithm 4.1, where θ_k is the gain from the optimization problem, C_L is the Lipschitz constant of G defined in Lemma 3.5, and C depends on θ_k, β_k, and C_L.

This theorem tells us that Algorithm 4.1 converges linearly with rate θ_k(1 − β_k + β_k C_L) < 1, which improves on Algorithm 3.2 due to the scaling by θ_k and the damping factor β_k.

5. Numerical tests

We now test AAIPP on the benchmark problems of the 2D driven cavity, the 3D driven cavity, and (time dependent) Kelvin-Helmholtz instability. For all tests, the penalty parameter is chosen as ε = 1, and we use the velocity-only formulation (5) for the IPP/AAIPP iteration. For the driven cavity problems, AA provides a clear positive impact, reducing total iterations and enabling convergence at much higher Re than IPP without AA. For Kelvin-Helmholtz, AA significantly reduces the number of iterations needed at each time step. Overall, our results show that AAIPP with ε = 1 is an effective solver. In all of our tests, we use a direct linear solver (i.e., MATLAB's backslash) for the linear system solves of IPP/AAIPP, as the problem sizes are such that direct solvers are more efficient; for the problems we consider (up to 1.3 million degrees of freedom in 3D) this remains a very robust and efficient linear solver. Since the velocity-only formulation is used, convergence is measured in the L² norm instead of the X-norm, which requires the pressure. In all of our tests, the cost of applying AA was negligible compared with the linear solve needed at each iteration, generally at least two orders of magnitude less.

5.1. 2D driven cavity

We first test AAIPP for the steady NSE on the lid-driven cavity problem. The domain of the problem is the unit square Ω = (0, 1)², and we impose Dirichlet boundary conditions u|_{y=1} = (1, 0)^T and u = 0 on the other three sides. The discretization uses P_2 elements on a barycenter refinement of a uniform triangular mesh. We perform AAIPP with m = 0 (no acceleration), 1, 2, 5 and 10 with varying Re, all with no relaxation (β = 1). In the following tests, convergence was declared when the velocity residual fell below 10^{−8}.

Convergence results for AAIPP using Re = 1000, 5000 and 10000 with varying m when h = 1/128 are shown in Figure 1. It is observed that as


m increases, AA improves convergence. In particular, as Re increases, AA is observed to provide a very significant improvement; for Re = 10000, the iteration with m = 0 (i.e., the IPP iteration) fails, but with AA convergence is achieved quickly.

In Figure 2, convergence results for the Anderson accelerated Picard (AAPicard) iteration are shown (i.e., no iterated penalty, and the nonsymmetric saddle point linear system is solved at each iteration). The same mesh is used, here with (P_2, P_1^disc) Scott-Vogelius elements. Convergence behavior for AAPicard is observed to be overall similar to that of AAIPP with ε = 1, with the exception that for lower m = 1, 2, AAPicard performs slightly better than AAIPP in terms of total number of iterations. Since each iteration of AAIPP is significantly cheaper than each iteration of AAPicard (generally speaking, since AAPicard must solve a more difficult linear system), these results show AAIPP with ε = 1 performs very well.

Figure 1: Shown above is convergence of AAIPP with varying Re and m, using mesh width h = 1/128.

Figure 2: Shown above is convergence of AAPicard with varying Re and m, using mesh width h = 1/128.


5.2. 3D driven cavity

We next test AAIPP on the 3D lid-driven cavity. In this problem, the domain is the unit cube, there is no forcing (f = 0), and homogeneous Dirichlet boundary conditions are enforced on all walls, with u = ⟨1, 0, 0⟩ on the moving lid. We compute with P_3 elements on Alfeld split tetrahedral meshes with 796,722 and 1,312,470 total degrees of freedom (dof) that are weighted towards the boundary by using a Chebychev grid before tetrahedralizing. We test AAIPP with varying Re, m, and relaxation parameter β.

Figure 4 shows a visualization of the solutions computed by AAIPP with m = 10 for Re = 100, 400 and 1000 on the mesh with 796,722 total dof, which are in good qualitative agreement with the reference results of Wong and Baker [37]. In Figure 3, we compare centerline x-velocities for Re = 100, 400 and 1000 on the same mesh with the reference data of Wong and Baker [37] and obtain excellent agreement.

Table 1 shows the number of AAIPP iterations required to reduce the velocity residual below 10^{−8} (within a maximum of 300 iterations), for varying dof, Re, relaxation parameter β, and depth m. In each case, we observe that as m increases, the number of iterations decreases, and the maximum m = k − 1 is observed to be the best choice in all cases. Also, the relaxation parameter β is observed to give improved results for larger Re. Thus, AAIPP with properly chosen depth and relaxation can significantly improve the ability of the iteration to converge.

We also tested AAIPP with Re = 1500, 2000, 2500, which would not converge within 1000 iterations without AA. Table 2 shows that with sufficiently large m and properly chosen relaxation parameter, AAIPP converges for even higher Re. This is especially interesting since the bifurcation point where this problem becomes time dependent is around Re ≈ 2000 [4], and so here AAIPP is finding steady solutions in the time dependent regime.


Iterations for convergence

DoF       | Re   | β   | m=0   | m=1   | m=2   | m=5   | m=10 | m=k-1
796,722   | 100  | 1   | > 300 | > 300 | 138   | 98    | 92   | 75
796,722   | 400  | 1   | > 300 | > 300 | 260   | 120   | 91   | 62
796,722   | 1000 | 1   | > 300 | > 300 | > 300 | > 300 | 126  | 77
796,722   | 1000 | 0.5 | > 300 | > 300 | > 300 | 198   | 114  | 68
796,722   | 1000 | 0.2 | > 300 | > 300 | > 300 | > 300 | 177  | 87
1,312,470 | 1000 | 1   | > 300 | > 300 | > 300 | > 300 | 140  | 83
1,312,470 | 1000 | 0.5 | > 300 | > 300 | > 300 | 203   | 109  | 71
1,312,470 | 1000 | 0.2 | > 300 | > 300 | > 300 | > 300 | 174  | 88

Table 1: Shown above is the number of AAIPP iterations required for convergence in the 3D driven-cavity tests, with varying dof, Re, damping factor β and depth.

Figure 3: Shown above are the centerline x-velocity plots for the 3D driven cavity simulations at Re = 100, 400, 1000, using AAIPP with m = 10, β = 1 and 796,722 dof.

Re   | dof       | m   | β   | iterations
1500 | 1,312,470 | k-1 | 0.5 | 102
1500 | 1,312,470 | k-1 | 0.3 | 111
2000 | 1,312,470 | k-1 | 0.5 | 154
2000 | 1,312,470 | k-1 | 0.3 | 162
2500 | 1,312,470 | 100 | 0.5 | 381
2500 | 1,312,470 | 100 | 0.3 | > 1000
2500 | 1,312,470 | k-1 | 0.5 | 371

Table 2: Shown above is the number of AAIPP iterations required for convergence in the 3D driven-cavity tests, with varying dof, Re, damping factor β and depth of AA iterations.


Figure 4: Shown above are the midslice plane plots for the 3D driven cavity simulations at Re = 100, 400 and 1000 by AAIPP with m = 10, β = 1 and 796,722 dof.

5.3. Kelvin-Helmholtz instability

For our last test we consider a benchmark problem from [30] for the 2D Kelvin-Helmholtz instability. This test is time dependent, and we apply the IPP/AAIPP method at each time step to solve the nonlinear problem. The domain is the unit square, with periodic boundary conditions at x = 0, 1. At y = 0, 1, the no penetration boundary condition u · n = 0 is strongly enforced, along with a natural weak enforcement of the free-slip condition (−ν∇u · n) × n = 0. The initial condition is given by

u_0(x, y) = ( tanh(28(2y − 1)), 0 )^T + 10^{−3} ( ∂_y ψ(x, y), −∂_x ψ(x, y) )^T,

where 1/28 is the initial vorticity thickness, 10^{−3} is a noise/scaling factor, and

ψ(x, y) = exp(−28²(y − 0.5)²)(cos(8πx) + cos(20πx)).


Figure 5: Shown above are the Re = 100 absolute vorticity contours for the IPP solution, at times t = 0, 1, 2, 3, 4, 5 and 6 (left to right, top to bottom).


Figure 6: Shown above are the Re = 100 energy, enstrophy and divergence error versus time for the IPP/AAIPP, SKEW and reference solutions from [30].

The Reynolds number is defined by Re = 1/(28ν), and ν is defined by selecting Re. Solutions are computed for both Re = 100 and Re = 1000, up to end time T = 5.

Define X = {v ∈ H¹(Ω) | v(0, y) = v(1, y), v · n = 0 at y = 0, 1}, and take V = {v ∈ X | ‖∇ · v‖ = 0} and X_h = P_2(τ_h) ∩ X. The problem now



Figure 7: Shown above are the IPP/AAIPP total iteration counts at each time step, for Re = 100 and varying m.

becomes, at each time step: Find u^{n+1} ∈ X_h ∩ V satisfying

(1/(2∆t))(3u^{n+1}, v) + (u^{n+1} · ∇u^{n+1}, v) + ν(∇u^{n+1},∇v) = (f, v) + (1/(2∆t))(4u^n − u^{n−1}, v) ∀v ∈ X_h ∩ V.

The IPP iteration used to find each u^{n+1} is thus analogous to what is used for solving the steady NSE above, including pressure recovery, but now with the time derivative terms included and using the previous time step solution as the initial guess.
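Schematically, the time stepping then wraps the (AA)IPP solver; the sketch below (Python, with aaipp_solve a hypothetical handle to the per-step nonlinear solver, not a specific library routine) shows the BDF2 history terms and the warm start from the previous step:

```python
def bdf2_with_aaipp(aaipp_solve, u0, dt, num_steps):
    """Sketch: BDF2 stepping with (AA)IPP as the nonlinear solver at each step.

    aaipp_solve(rhs_history, guess) solves the steady-type system above with the
    time-derivative terms included; both arguments are discrete velocity vectors.
    """
    u_prev, u_curr = u0.copy(), u0.copy()   # u^{n-1}, u^n (in practice a one-step
                                            # startup scheme would initialize u^1)
    for n in range(num_steps):
        rhs_history = (4.0 * u_curr - u_prev) / (2.0 * dt)   # BDF2 terms, cf. above
        u_next = aaipp_solve(rhs_history, guess=u_curr)      # warm start from u^n
        u_prev, u_curr = u_curr, u_next
    return u_curr
```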

For accuracy comparison, we also give results using the standard (nonlinear) BDF2 mixed formulation with skew-symmetry, which we will refer to as the SKEW formulation: Find u^{n+1} ∈ X_h and p^{n+1} ∈ Q_h = P_1(τ_h) ∩ L²₀(Ω) satisfying

(1/(2∆t))(3u^{n+1}, v) + b∗(u^{n+1}, u^{n+1}, v) + (p^{n+1},∇ · v) + ν(∇u^{n+1},∇v) = (f, v) + (1/(2∆t))(4u^n − u^{n−1}, v),
(∇ · u^{n+1}, q) = 0,

for all (v, q) ∈ (X_h, Q_h). The nonlinear problem for SKEW is resolved using Newton's method, and since Taylor-Hood elements are being used, a large divergence error is expected.

For Re = 100, an h = 1/128 uniform triangular mesh was used, together with a time step of ∆t = 0.005. The tolerance for the nonlinear solver was


to reduce the H¹ relative residual below 10^{−6}. Simulations were performed with IPP, AAIPP with m = 1, 3, k − 1, and SKEW. The evolution of the flow can be seen in Figure 5 as absolute vorticity contours from the AAIPP solutions (all IPP and AAIPP solutions were visually indistinguishable), and these match those of the high resolution solution from [30] and the solutions from [22]. In addition to their plots of vorticity contours being the same, the IPP and AAIPP solutions yielded the same energy and enstrophy to five significant digits (i.e., they all give the same solution, as expected). Figure 6 shows the energy, enstrophy and divergence of the IPP/AAIPP/SKEW solutions versus time, along with the energy and enstrophy of the high resolution solutions from [30]. We observe that the energy and enstrophy of IPP, AAIPP, and SKEW all match the high resolution reference solutions very well. As expected, the IPP/AAIPP solutions have divergence error around 10^{−5}, which is consistent with a relative residual L² stopping criterion of 10^{−8}. SKEW, however, has a large divergence error that is O(10^{−2}) despite a rather fine mesh and essentially resolving the flow; since Taylor-Hood elements are used, this large divergence error is not surprising [15].

Figure 7 shows the number of iterations needed for IPP/AAIPP to converge at each time step. We observe that AAIPP with m = 1 offers no real improvement over IPP (m = 0) in converging the iteration; however, both m = 3 and m = k − 1 offer significant improvement. While at early times the larger m choices give modest improvement, by t = 4 the larger m choices cut the iteration count from 27 to 16 at each time step.

For Re = 1000, the IPP/AAIPP tests again used h = 1/128, with the same setup as for the Re = 100 case, and again solutions are compared with a reference solution from [30] and the SKEW solution (but here the SKEW solution uses a finer mesh with h = 1/196; as discussed in [22], even the 1/196 mesh is not fully resolved for this Reynolds number). The time step size was chosen to be ∆t = 0.001 for the IPP/AAIPP and SKEW simulations. IPP/AAIPP vorticity contours are plotted in Figure 8, and match those from the reference solution qualitatively well (as discussed in [30], the evolution of this flow in time is very sensitive and it is not clear what the correct behavior in time is, even though it is clear how the flow develops spatially and how the eddies combine). Figure 9 shows the energy, enstrophy and divergence of the computed and reference solutions (the reference solution has divergence on the order of roundoff error, and it is not shown), and we observe that the 1/196 SKEW solution gives the worst predictions of energy, enstrophy and (not surprisingly) divergence, even though IPP/AAIPP uses


Figure 8: Shown above are the Re = 1000 absolute vorticity contours for the iterated penalty solution, at times t = 0, 1, 2, 3, 4, 5 and 6 (left to right, top to bottom).


Figure 9: Shown above are the Re = 1000 energy, enstrophy and divergence error versus time for the IPP/AAIPP, SKEW and reference solutions from [30].

a significantly coarser mesh. Finally, the impact of AA on the IPP iteration is shown in Figure 10, where we observe a significant reduction in iterations at each time step, with larger m cutting the total number of iterations by a factor of four. Hence overall, the AAIPP iteration is effective and efficient, and produces accurate divergence-free solutions.

6. Conclusions

In this paper, we studied IPP with penalty parameter ε = 1, and showed that while alone it is not an effective solver for the NSE, when used with AA



Figure 10: Shown above are the IPP/AAIPP total iteration counts at each time step, for Re = 1000 and varying m.

and large m it becomes very effective. We proved the IPP fixed point function satisfies regularity properties which allow the AA theory of [25] to be applied, which shows that AA applied to IPP will scale the linear convergence rate by the gain of the underlying AA optimization problem. We also showed results of three test problems which revealed AAIPP with ε = 1 is a very effective solver, without any continuation method or pseudo time-stepping. While the classical IPP method is not commonly used for large scale NSE problems due to difficulties with linear solvers when ε is small, our results herein suggest it may deserve a second look, since using AA allows for the penalty parameter ε = 1 to be used, which in turn allows for effective preconditioned iterative linear solvers to be used, such as those in [12, 3, 23].

References

[1] H. An, X. Jia, and H.F. Walker. Anderson acceleration and application to the three-temperature energy equations. Journal of Computational Physics, 347:1–19, 2017.

[2] D. Arnold and J. Qin. Quadratic velocity/linear pressure Stokes elements. In R. Vichnevetsky, D. Knight, and G. Richter, editors, Advances in Computer Methods for Partial Differential Equations VII, pages 28–34. IMACS, 1992.

[3] S. Börm and S. Le Borne. H-LU factorization in preconditioners for augmented Lagrangian and grad-div stabilized saddle point systems. International Journal for Numerical Methods in Fluids, 68:83–98, 2012.


[4] S.-H. Chiu, T.-W. Pan, J. He, A. Guo, and R. Glowinski. A numerical study of the transition to oscillatory flow in 3D lid-driven cubic cavity flows. 2016.

[5] R. Codina. An iterative penalty method for the finite element solution of the stationary Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 110:237–262, 1993.

[6] B. Cousins, S. Le Borne, A. Linke, L. Rebholz, and Z. Wang. Efficient linear solvers for incompressible flow simulations using Scott-Vogelius finite elements. Numerical Methods for Partial Differential Equations, 29:1217–1237, 2013.

[7] C. Evans, S. Pollock, L. Rebholz, and M. Xiao. A proof that Anderson acceleration improves the convergence rate in linearly converging fixed-point methods (but not in those converging quadratically). SIAM Journal on Numerical Analysis, 58:788–810, 2020.

[8] A. Fu, J. Zhang, and S. Boyd. Anderson accelerated Douglas-Rachford splitting. SIAM Journal on Scientific Computing, 42(6):A3560–A3583, 2020.

[9] V. Girault and P.-A. Raviart. Finite Element Methods for Navier-Stokes Equations: Theory and Algorithms. Springer-Verlag, 1986.

[10] M. Gunzburger. Iterative penalty methods for the Stokes and Navier-Stokes equations. Proceedings from Finite Element Analysis in Fluids conference, University of Alabama, Huntsville, pages 1040–1045, 1989.

[11] J. Guzman and L.R. Scott. The Scott-Vogelius finite elements revisited. Math. Comp., 88(316):515–529, 2019.

[12] T. Heister and G. Rapin. Efficient augmented Lagrangian-type preconditioning for the Oseen problem using grad-div stabilization. Int. J. Numer. Meth. Fluids, 71:118–134, 2013.

[13] N. Higham and N. Strabic. Anderson acceleration of the alternating projections method for computing the nearest correlation matrix. Numerical Algorithms, 72:1021–1042, 2016.


[14] V. John, P. Knobloch, and J. Novo. Finite elements for scalar convection-dominated equations and incompressible flow problems: a never ending story? Computing and Visualization in Science, 2018.

[15] V. John, A. Linke, C. Merdon, M. Neilan, and L. G. Rebholz. On the divergence constraint in mixed finite element methods for incompressible flows. SIAM Review, 59(3):492–544, 2017.

[16] C.T. Kelley. Numerical methods for nonlinear equations. Acta Numerica, 27:207–287, 2018.

[17] W. Layton. An Introduction to the Numerical Analysis of Viscous Incompressible Flows. SIAM, Philadelphia, 2008.

[18] J. Loffeld and C. Woodward. Considerations on the implementation and use of Anderson acceleration on distributed memory and GPU-based parallel computers. Advances in the Mathematical Sciences, pages 417–436, 2016.

[19] P. A. Lott, H. F. Walker, C. S. Woodward, and U. M. Yang. An accelerated Picard method for nonlinear systems related to variably saturated flow. Adv. Water Resour., 38:92–101, 2012.

[20] H. Morgan and L.R. Scott. Towards a unified finite element method for the Stokes equations. SIAM Journal on Scientific Computing, 40(1):A130–A141, 2018.

[21] M. Neilan and D. Sap. Stokes elements on cubic meshes yielding divergence-free approximations. Calcolo, 53(3):263–283, September 2016.

[22] M. Olshanskii and L. Rebholz. Longer time accuracy for incompressible Navier-Stokes simulations with the EMAC formulation. Computer Methods in Applied Mechanics and Engineering, 372(113369):1–17, 2020.

[23] M. A. Olshanskii and E. E. Tyrtyshnikov. Iterative Methods for Linear Systems: Theory and Applications. SIAM, Philadelphia, 2014.

[24] Y. Peng, B. Deng, J. Zhang, F. Geng, W. Qin, and L. Liu. Anderson acceleration for geometry optimization and physics simulation. ACM Transactions on Graphics, 42:1–14, 2018.


[25] S. Pollock and L. Rebholz. Anderson acceleration for contractive and noncontractive operators. IMA Journal of Numerical Analysis, in press, 2021.

[26] S. Pollock, L. Rebholz, and M. Xiao. Anderson-accelerated convergence of Picard iterations for incompressible Navier-Stokes equations. SIAM Journal on Numerical Analysis, 57:615–637, 2019.

[27] S. Pollock, L. Rebholz, and M. Xiao. Acceleration of nonlinear solvers for natural convection problems. Journal of Numerical Mathematics, to appear, 2021.

[28] L. Rebholz and M. Xiao. On reducing the splitting error in Yosida methods for the Navier-Stokes equations with grad-div stabilization. Computer Methods in Applied Mechanics and Engineering, 294:259–277, 2015.

[29] J. Schöberl. Robust multigrid methods for a parameter dependent problem in primal variables. Numerische Mathematik, 84:97–119, 1999.

[30] P. Schroeder, V. John, P. Lederer, C. Lehrenfeld, G. Lube, and J. Schöberl. On reference solutions and the sensitivity of the 2D Kelvin-Helmholtz instability problem. Computers and Mathematics with Applications, 77(4):1010–1028, 2019.

[31] L.R. Scott. Kinetic energy flow instability with application to Couette flow. Submitted, 2021.

[32] P. Stasiak and M.W. Matsen. Efficiency of pseudo-spectral algorithms with Anderson mixing for the SCFT of periodic block-copolymer phases. Eur. Phys. J. E, 34:110:1–9, 2011.

[33] A. Toth and C. T. Kelley. Convergence analysis for Anderson acceleration. SIAM J. Numer. Anal., 53(2):805–819, 2015.

[34] A. Toth, C.T. Kelley, S. Slattery, S. Hamilton, K. Clarno, and R. Pawlowski. Analysis of Anderson acceleration on a simplified neutronics/thermal hydraulics system. Proceedings of the ANS MC2015 Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method, ANS MC2015 CD:1–12, 2015.


[35] H. F. Walker and P. Ni. Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal., 49(4):1715–1735, 2011.

[36] D. Wicht, M. Schneider, and T. Böhlke. Anderson-accelerated polarization schemes for fast Fourier transform-based computational homogenization. International Journal for Numerical Methods in Engineering, to appear, 2021.

[37] K.L. Wong and A.J. Baker. A 3D incompressible Navier-Stokes velocity-vorticity weak form finite element algorithm. International Journal for Numerical Methods in Fluids, 38(2):99–123, 2002.

[38] S. Zhang. Divergence-free finite elements on tetrahedral grids for k ≥ 6. Math. Comp., 80(274):669–695, 2011.

[39] S. Zhang. Quadratic divergence-free finite elements on Powell-Sabin tetrahedral grids. Calcolo, 48(3):211–244, 2011.
