scalable, adaptive methods for forward and inverse...

1
CCGO CENTER FOR COMPUTATIONAL GEOSCIENCES & OPTIMIZATION Scalable, Adaptive Methods for Forward and Inverse Modeling of Continental-Scale Ice Sheet Flow Tobin Isaac h[email protected]i Center for Computational Geosciences & Optimization, The University of Texas at Austin SC14, New Orleans (A version of this poster is available at http://users.ices.utexas.edu/~tisaac/.) The Bayesian Inverse Problem Climate predictions depend on ice sheet dynamics From the IPCC Fourth Assessment Report: "Dynamical processes related to ice flow not included in current models but suggested by recent observations could increase the vulnerability of the ice sheets to warming, increasing future sea level rise. Understanding of these processes is limited and there is no consensus on their magnitude." [Summary for Policymakers: Projections of Future Changes in Climate] "[T]he uncertainty in the projections of the land ice contributions [to sea level rise] is dominated by the various uncertainties in the land ice models themselves ... rather than in the temperature projections." [MSC + 07] Ice sheets exhibit complex, localized flows [RMS11] Landsat mosaic, Pine Island Glacier, Antarctica (NASA Earth Observatory) Ice stream formation depends on basal conditions: topography, sediment, till, and subglacial hydrology. Bayesian inversion for basal parameters State equations: S (u , p ; m)= 0 Equations of motion (Stokes) -· [2μ(u ) D(u ) - I p ]= ρg , [D(u )= 1 2 (u + u T )] · u = 0, Nonlinear constitutive relation μ(u )= B D(u ) 1-n 2n II , [D(u ) II = 1 2 tr(D(u ) 2 )] Boundary conditions σ n| Γ t = 0, u · n| Γ b = 0, T k (σ n + β u ) | Γ b = 0. [β = exp{m}] The state variables (velocity and pressure) are implicitly defined by a PDE. I Solution-dependent effective viscosity I Shear-thinning flow (typically n = 3) I The parameter fields determines the coefficient of a Robin-type boundary condition in the tangential directions at the base of the ice sheet I An implicit system of PDEs that is 3D, nonlinear, indefinite, with heterogeneous coefficients, a high-aspect-ratio domain, complex geometry and boundary conditions. Observations: bu d obs = {u obs (x 0 ),..., u obs (x m-1 )}, bu = {u (x 0 ),... u (x m-1 )}. The observation operator restricts the velocity to the points where the observations are made. Prior distributions and Bayes’ rule π obs (d obs |u )= N (bu , C obs ) π prior (m)= N (m prior , C prior ) π post (m|d obs ) π obs (d obs |u (m)) π prior (m) We assume Gaussian prior distributions of the parameter and the observations. The posterior distribution of parameters given observations is determined by Bayes’ rule. The MAP optimization problem min m J (m) := 1 2 kbu (m) - d obs k 2 C -1 obs + 1 2 km - m prior k 2 C -1 prior , where u (m) solves S (u , p ; m)= 0. The maximum a posteriori estimate m MAP solves a PDE-constrained optimization problem. I Parameter space O (10 5 - 10 6 ) I State space O (10 7 - 10 9 ) I Evaluating J (m) requires non-linear PDE solve I Evaluating J 0 (m) requires additional adjoint linear PDE solve I Applying J 00 (m) ˆ m requires two sequential linear PDE solves A Scalable Stokes Solver Discretization/solver co-design [ISG14] High-order, inf-sup stable discretization Velocity: I continuous piecewise tensor polynomials [Q k ] 3 , k > 1; I Gauss-Lobatto basis, Gauss-Legendre quadrature; I fast matrix-free calculations, high flop/memop ratio [Bro10]. Pressure: I discontinuous, piecewise Q k -2 , Gauss-Legendre basis; I inf-sup constant is o (k -1 ); I local, element-wise conservation of mass; I robust at high aspect-ratios [TS03]; Nonlinear Solver We solve S (u , p ; m)= 0 using an inexact Newton-Krylov method [EW96] with backtracking. I Newton update system is S {u , p } (u , p ; m)[ ˜ u , ˜ p ]= -S (u , p ; m), which is a Stokes system with a fourth-order tensor for effective viscosity, μ 0 (u ) = I - 1 - n 2n D(u ) D(u ) D(u ) II . I Inexact Newton stresses preconditioner construction/reconstruction because early Krylov solves are inaccurate. Linear Solver The Newton step Jacobian is a symmetric saddle point operator S (u )= F (u ; m) B T B 0 . I Solve using right-preconditioned (F)GMRES with block preconditioner ˜ S = ˜ F 0 ˜ S ! , where ˜ S is a diagonal approximation to the Schur complement S = -BF -1 B T . I ˜ F is a low-order sparse approximation of denser high-order matrix F , which is never stored. I Sparse approximation at non-conforming element boundaries introduces additional complexity. I ˜ F (second-order elliptic system with rigid-body motions in near-nullspace) is preconditioned with smoothed-aggregation AMG v-cycle. I Pointwise smoothers (Jacobi, Gauss-Seidel) are less effective on high-aspect-ratio finite elements: we use Chebyshev(1)/ASM(i )/IC(j ) or CG(2)/ASM(i )/IC(j ) smoothers. [CG = conjugate gradient, ASM = additive Schwarz method, IC = incomplete Cholesky factorization]. Performance and Scalability Antarctic ice sheet mesh, multiple values of k 0 20 40 60 80 100 120 140 160 180 200 10 -12 10 -10 10 -8 10 -6 10 -4 10 -2 10 0 Krylov iteration j kr j k/kr 0 k: Antarctica, CG(2)/ASM(1)/IC(0) k = 3, Krylov k = 3, Newton [18] k = 4, Krylov k = 4, Newton [24] k = 5, Krylov k = 5, Newton [24] Antarctic ice sheet problem: convergence of the incomplete Newton-Krylov solver for regularized geometry. The mesh is distributed over 512 processes. The bracketed numbers are the total number of nonlinear iterations. Problem sizes are 22M, 52M, and 101M dofs for k = 3, 4, 5. Weak scalability, linear solver P N solve eff. #K matvec eff. apply eff. setup eff. 128 11.8M 14.6 1.00 47 0.0223 1.00 0.246 1.00 3.78 1.00 1024 82.6M 15.9 0.92 51 0.0217 1.03 0.245 1.00 4.32 0.88 8192 615M 28.7 0.51 53 0.0234 0.95 0.258 0.95 11.7 0.32 Antarctic ice sheet problem: weak scaling of the linear solver for k = 3 for regularized geometry. The smoother for the (1,1)-block AMG v-cycle uses ASM(0): all other solver parameters are the same as above. We report the convergence of the linear Stokes problem with constant μ, which we solve to a relative tolerance of 10 -15 . Large Scale Bayesian Inversion and UQ Deterministic model problem [PZS + 12] Setup x z 0 H L Ω Γ l Γ r Γ b Γ t α I Periodic domain I Inverted for linear and nonlinear rheology I Variable domain length L, signal-to-noise ratio I Regularization parameter α chosen by Morozov’s criterion: misfit approximately matches noise in magnitude Example: L = 40, SNR = 100 0 0.2 0.4 0.6 0.8 1 10 20 30 40 50 60 Normalized x Velocity (m a -1 ) Linear Nonlinear 0 0.2 0.4 0.6 0.8 1 10 20 30 40 50 60 Normalized x Velocity (m a -1 ) Linear Nonlinear 0 0.2 0.4 0.6 0.8 1 10 20 30 40 50 60 Normalized x Velocity (m a -1 ) Linear Nonlinear Slice of the horizontal surface velocity from linear and nonlinear rheology (left), noisy data (middle), and reconstructed velocity (right). Reconstruction table linear rheology nonlinear rheology L Signal-to-noise ratio Signal-to-noise ratio 500 100 20 10 500 100 20 10 5 0.031 0.112 * * * * * * 10 0.022 0.049 0.157 0.310 0.032 0.165 * * 20 0.017 0.041 0.109 0.177 0.026 0.045 0.118 0.285 40 0.013 0.039 0.125 0.191 0.015 0.036 0.097 0.149 I Sharper boundary layer in nonlinear rheology weakens response I Recovery of high frequency (small L) variations in β much worse for nonlinear rheology I Noise affects recovery more for nonlinear rheology Gauss-Newton versus Nonlinear CG mesh #dof #β NCG (FR/PR) * Gauss-Newton ** Stokes param #iter #PDE #iter #PDE 10×10×2 6978 121 54/57 622/676 10 (41) 104 20×20×2 26538 441 57/58 620/572 10 (43) 108 40×40×2 103458 1681 99/87 674/648 10 (48) 118 80×80×2 408498 6561 125/102 664/623 10 (46) 114 * Nonlinear Conjugate Gradient (FR (Fletcher-Reeves) / PR (Polak-Ribière)) ** Inexact Gauss-Newton-CG *** Iteration is terminated when the norm of the gradient drops below a relative tolerance of 10 -5 . I Number of Newton inversion iterations and required PDE solves are insensitive to the number of inversion parameters. Continental-scale UQ [IPSG14] Priors C obs (u , v )= Z Γ top u · v |d obs | 2 + ds C prior (m, n) Z Γ base (-γ Δm + δ m)(-γ Δn + δ n) ds I Misfit weighed by observed velocities. I Higher-order derivatives required so that Gaussian measure converges in the limit as h 0. Observations and MAP reconstruction Prior and posterior samples Parallel Adaptive Mesh Refinement p4est: parallel quadtree/octree AMR p4est.org The p4est library provides scalable parallel adaptive mesh refinement using the forest-of-quadtrees/octrees paradigm: 0 10 20 30 40 50 60 70 80 90 100 12 60 432 3444 27540 220320 Percentage of runtime Number of CPU cores Partition Balance Ghost Nodes I hierarchically refined quadrants/octants; I flat storage (leaves only); I fully-distributed by space-filling-curve partitioning; I refinement, coarsening, and repartitioning have straight-forward, low-communication, easily scalable algorithms. The bottlenecks are mesh-wide local-topology operations: mesh improvement (enforcing a 2:1 balance condition) and mesh description (converting to adjacency-graph mesh formats). 2:1 Balance [IBG12] 2:1 conditions are local requirements: limiting the sizes of neighbors. Scalable enforcement in parallel requires kernels for action at a distance: o r a? ¯ δ x ¯ δ y Function λ( ¯ δ ) such that size(a)= blog 2 λc k 1D 2D 3D 1 ¯ δ ¯ δ x + ¯ δ y Carry3( ¯ δ y + ¯ δ z , ¯ δ z + ¯ δ x , ¯ δ x + ¯ δ y ) 2 max{ ¯ δ x , ¯ δ y } Carry3( ¯ δ x , ¯ δ y , ¯ δ z ) 3 max{ ¯ δ x , ¯ δ y , ¯ δ z } Carry3(α,β,γ ) max{α,β,γ,α + β + γ - (α|β |γ )} Carry3(α,β,γ ) max{α,β,γ,α + β + γ - (α|β |γ )}. 0 1 2 3 4 5 6 12 96 768 6144 49152 112128 Seconds per (million elements / core) Number of CPU cores Old New Deriving and implementing these kernels (and other improvements) led to a 4-5x speedup: Conversion to adjacency-graph [IBWG14] Exploiting the hierarchical structure of quadtrees and octrees via recursive algorithms leads to 2-3x speedup on deep-cache architectures. Old search/hash-based algorithm (top) vs. New recursion-based algorithm (bottom) metrics, conversion to FE mesh (Ivy Bridge) N runtime (ms) instructions branch misses cache misses 4.6 × 10 3 9.5 × 10 0 4.3 × 10 7 2.1 × 10 5 2.2 × 10 4 1.0 × 10 1 3.7 × 10 7 5.3 × 10 4 1.1 × 10 4 3.9 × 10 4 8.6 × 10 1 4.2 × 10 8 1.7 × 10 6 2.2 × 10 5 4.0 × 10 1 3.1 × 10 8 3.6 × 10 5 5.1 × 10 4 3.2 × 10 5 8.4 × 10 2 3.7 × 10 9 1.3 × 10 7 4.8 × 10 6 3.5 × 10 2 2.5 × 10 9 2.7 × 10 6 4.5 × 10 5 2.6 × 10 6 8.0 × 10 3 3.3 × 10 10 1.0 × 10 8 6.1 × 10 7 2.8 × 10 3 2.0 × 10 10 2.2 × 10 7 5.4 × 10 6 p4est in ice sheet modeling I (left) Refinement is often highly localized (e.g. grounding lines and ice streams). I (right) specialized hybrid quadtree/octree scheme decouples horizontal and vertical refinement (unlike isotropic octree refinement). (A presentation on the hybrid scheme is found at http://users.ices.utexas.edu/~tisaac/slides/). [Bro10] Jed Brown. Efficient nonlinear solvers for nodal high-order finite elements in 3D. Journal of Scientific Computing, 45(1-3):48–63, 2010. [EW96] S. C. Eisenstat and H. F. Walker. Choosing the forcing terms in an inexact Newton method. SIAM Journal on Scientific Computing, 17:16–32, 1996. [IBG12] Tobin Isaac, Carsten Burstedde, and Omar Ghattas. Low-cost parallel algorithms for 2:1 octree balance. In Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium. IEEE, 2012. [IBWG14] Tobin Isaac, Carsten Burstedde, Lucas C. Wilcox, and Omar Ghattas. Recursive algorithms for distributed forests of octrees. In preparation, 2014. [IPSG14] Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas. From data to prediction with quantified uncertainties: Scalable parallel algorithms and applications to the dynamics of the Antarctic ice sheet. Journal of Computational Physics (submitted), 2014. [ISG14] Tobin Isaac, Georg Stadler, and Omar Ghattas. Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics. SIAM Journal on Scientific Computing (submitted), 2014. [MSC + 07] G. A. Meehl, T. F. Stocker, W. D. Collins, A. T. Friedlingstein, A. T. Gaye, J. M. Gregory, A. Kitoh, R. Knutti, J. M. Murphy, A. Noda, S. C. B. Raper, I. G. Watterson, A. J. Weaver, and Z.-C. Zhao. Global climate projections. In Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, pages 747–845. Cambridge University Press, 2007. [PZS + 12] Noemi Petra, Hongyu Zhu, Georg Stadler, Thomas J. R. Hughes, and Omar Ghattas. An inexact Gauss-Newton method for inversion of basal sliding and rheology parameters in a nonlinear Stokes ice sheet model. Journal of Glaciology, 58(211):889–903, 2012. [RMS11] E Rignot, J Mouginot, and B Scheuchl. Ice flow of the Antarctic ice sheet. Science, 333(6048):1427–1430, 2011. [TS03] A. Toselli and C. Schwab. Mixed hp-finite element approximations on geometric edge and boundary layer meshes in three dimensions. Numerische Mathematik, 94(4):771–801, 2003.

Upload: others

Post on 09-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable, Adaptive Methods for Forward and Inverse ...sc14.supercomputing.org/.../drs136s2-file7.pdfSpacing Around Wordmark.5 H.5 H H.5 H.5 H CCGO CENTER FOR COMPUTATIONAL GEOSCIENCES&OPTIMIZATION

Spacing Around Wordmark

.5 H

.5 H

H

.5 H

.5 H

CCGOCENTER FOR COMPUTATIONALGEOSCIENCES & OPTIMIZATION

Scalable, Adaptive Methods for Forward and Inverse Modeling of Continental-Scale Ice Sheet FlowTobin Isaac 〈[email protected]〉Center for Computational Geosciences & Optimization, The University of Texas at AustinSC14, New Orleans (A version of this poster is available at http://users.ices.utexas.edu/~tisaac/.)

The Bayesian Inverse ProblemClimate predictions depend on ice sheet dynamics

From the IPCC Fourth Assessment Report:

"Dynamical processes related to ice flow not included in current models but suggested by recentobservations could increase the vulnerability of the ice sheets to warming, increasing future sea level rise.Understanding of these processes is limited and there is no consensus on their magnitude." [Summary forPolicymakers: Projections of Future Changes in Climate]

"[T]he uncertainty in the projections of the land ice contributions [to sea level rise] is dominated by thevarious uncertainties in the land ice models themselves ... rather than in the temperature projections."[MSC+07]

Ice sheets exhibit complex, localized flows

on

Augu

st 2

2, 2

011

ww

w.s

cien

cem

ag.o

rgD

ownl

oade

d fro

m

[RMS11]

Landsat mosaic, Pine Island Glacier, Antarctica(NASA Earth Observatory)

Ice stream formation depends on basal conditions: topography, sediment, till, and subglacial hydrology.

Bayesian inversion for basal parameters

State equations: S(u,p; m) = 0

Equations of motion (Stokes)−∇ · [2µ(u) D(u)− Ip] = ρg, [D(u) = 1

2(∇u + ∇uT )]

∇ · u = 0,Nonlinear constitutive relation

µ(u) = B D(u)1−n2n

II , [D(u)II = 12tr(D(u)2)]

Boundary conditionsσn|Γt = 0, u · n|Γb

= 0,T‖ (σn + βu) |Γb

= 0. [β = expm]

The state variables (velocity andpressure) are implicitly defined by a PDE.I Solution-dependent effective viscosityI Shear-thinning flow (typically n = 3)I The parameter fields determines the

coefficient of a Robin-type boundarycondition in the tangential directions atthe base of the ice sheet

I An implicit system of PDEs that is 3D,nonlinear, indefinite, withheterogeneous coefficients, ahigh-aspect-ratio domain, complexgeometry and boundary conditions.

Observations: bu

dobs = uobs(x0), . . . ,uobs(xm−1),bu = u(x0), . . .u(xm−1).

The observation operator restricts thevelocity to the points where theobservations are made.

Prior distributions and Bayes’ rule

πobs(dobs|u) = N (bu, Cobs)

πprior(m) = N (mprior, Cprior)

πpost(m|dobs) ∝ πobs(dobs|u(m)) πprior(m)

We assume Gaussian prior distributionsof the parameter and the observations.The posterior distribution of parametersgiven observations is determined byBayes’ rule.

The MAP optimization problem

minmJ (m) := 1

2‖bu(m)− dobs‖2C−1obs

+ 12‖m −mprior‖2C−1

prior,

where u(m) solvesS(u,p; m) = 0.

The maximum a posteriori estimatemMAP solves a PDE-constrainedoptimization problem.I Parameter space O(105 − 106)

I State space O(107 − 109)

I Evaluating J (m) requires non-linearPDE solve

I Evaluating J ′(m) requires additionaladjoint linear PDE solve

I Applying J ′′(m)m requires twosequential linear PDE solves

A Scalable Stokes Solver

Discretization/solver co-design [ISG14]

High-order, inf-sup stable discretization

Velocity:I continuous piecewise tensor polynomials [Qk ]3,

k > 1;I Gauss-Lobatto basis, Gauss-Legendre quadrature;I fast matrix-free calculations, high flop/memop ratio

[Bro10].

Pressure:I discontinuous, piecewise Qk−2, Gauss-Legendre

basis;I inf-sup constant is o(k−1);I local, element-wise conservation of mass;I robust at high aspect-ratios [TS03];

Nonlinear Solver

We solve S(u,p; m) = 0 using an inexact Newton-Krylov method [EW96] with backtracking.I Newton update system is

∂S∂u,p

(u,p; m)[u, p] = −S(u,p; m),

which is a Stokes system with a fourth-order tensor for effective viscosity,

µ′(u) =

I − 1− n

2nD(u)⊗ D(u)

D(u)II

.

I Inexact Newton stresses preconditioner construction/reconstruction because early Krylov solves are inaccurate.

Linear Solver

The Newton step Jacobian is a symmetric saddle point operator

S(u, β) =

(F (u; m) BT

B 0

).

I Solve using right-preconditioned (F)GMRES with block preconditioner

S =

(F ∇0 S

),

where S is a diagonal approximation to the Schur complement S = −BF−1BT .I F is a low-order sparse approximation of denser high-order matrix F , which is never stored.

I Sparse approximation at non-conforming element boundaries introduces additional complexity.I F (second-order elliptic system with rigid-body motions in near-nullspace) is preconditioned with

smoothed-aggregation AMG v-cycle.I Pointwise smoothers (Jacobi, Gauss-Seidel) are less effective on high-aspect-ratio finite elements: we use

Chebyshev(1)/ASM(i)/IC(j) or CG(2)/ASM(i)/IC(j) smoothers. [CG = conjugate gradient, ASM = additiveSchwarz method, IC = incomplete Cholesky factorization].

Performance and Scalability

Antarctic ice sheet mesh, multiple values of k

0 20 40 60 80 100 120 140 160 180 20010−1210−1010−810−610−410−2

100

Krylov iteration j

‖rj‖/‖r0‖: Antarctica, CG(2)/ASM(1)/IC(0)

k = 3, Krylovk = 3, Newton [18]k = 4, Krylovk = 4, Newton [24]k = 5, Krylovk = 5, Newton [24]

Antarctic ice sheet problem:convergence of the incompleteNewton-Krylov solver for regularizedgeometry. The mesh is distributed over512 processes. The bracketednumbers are the total number ofnonlinear iterations. Problem sizes are22M, 52M, and 101M dofs fork = 3,4,5.

Weak scalability, linear solver

P N solve eff. #K matvec eff. apply eff. setup eff.128 11.8M 14.6 1.00 47 0.0223 1.00 0.246 1.00 3.78 1.001024 82.6M 15.9 0.92 51 0.0217 1.03 0.245 1.00 4.32 0.888192 615M 28.7 0.51 53 0.0234 0.95 0.258 0.95 11.7 0.32

Antarctic ice sheet problem: weakscaling of the linear solver for k = 3 forregularized geometry. The smootherfor the (1,1)-block AMG v-cycle usesASM(0): all other solver parametersare the same as above. We report theconvergence of the linear Stokesproblem with constant µ, which wesolve to a relative tolerance of 10−15.

Large Scale Bayesian Inversion and UQ

Deterministic model problem [PZS+12]

Setup

x

z

0

H

L

Ω

Γl

Γr

Γb

Γt

α

I Periodic domainI Inverted for linear and nonlinear rheologyI Variable domain length L, signal-to-noise

ratioI Regularization parameter α chosen by

Morozov’s criterion: misfit approximatelymatches noise in magnitude

Example: L = 40, SNR = 100

0 0.2 0.4 0.6 0.8 110

20

30

40

50

60

Normalized x

Velo

city (

m a

−1)

Linear

Nonlinear

0 0.2 0.4 0.6 0.8 110

20

30

40

50

60

Normalized x

Velo

city (

m a

−1)

Linear

Nonlinear

0 0.2 0.4 0.6 0.8 110

20

30

40

50

60

Normalized x

Velo

city (

m a

−1)

Linear

Nonlinear

Slice of the horizontal surface velocity fromlinear and nonlinear rheology (left), noisydata (middle), and reconstructed velocity(right).

Reconstruction table

linear rheology nonlinear rheology

L Signal-to-noise ratio Signal-to-noise ratio

500 100 20 10 500 100 20 105 0.031 0.112 ∗ ∗ ∗ ∗ ∗ ∗

10 0.022 0.049 0.157 0.310 0.032 0.165 ∗ ∗20 0.017 0.041 0.109 0.177 0.026 0.045 0.118 0.28540 0.013 0.039 0.125 0.191 0.015 0.036 0.097 0.149

I Sharper boundary layer in nonlinearrheology weakens response

I Recovery of high frequency (small L)variations in β much worse for nonlinearrheology

I Noise affects recovery more for nonlinearrheology

Gauss-Newton versus Nonlinear CG

mesh #dof #β NCG (FR/PR)∗ Gauss-Newton∗∗

Stokes param #iter #PDE #iter #PDE10×10×2 6978 121 54/57 622/676 10 (41) 10420×20×2 26538 441 57/58 620/572 10 (43) 10840×40×2 103458 1681 99/87 674/648 10 (48) 11880×80×2 408498 6561 125/102 664/623 10 (46) 114

∗ Nonlinear Conjugate Gradient (FR(Fletcher-Reeves) / PR (Polak-Ribière))

∗∗ Inexact Gauss-Newton-CG∗∗∗ Iteration is terminated when the norm of

the gradient drops below a relativetolerance of 10−5.

I Number of Newton inversion iterations andrequired PDE solves are insensitive to thenumber of inversion parameters.

Continental-scale UQ [IPSG14]

Priors

Cobs(u,v) =

∫Γtop

u · v|dobs|2 + ε

ds

Cprior(m,n) ≈∫

Γbase

(−γ∆m + δm)(−γ∆n + δn) ds

I Misfit weighed by observed velocities.I Higher-order derivatives required so that

Gaussian measure converges in the limitas h→ 0.

Observations and MAP reconstruction

Prior and posterior samples

Parallel Adaptive Mesh Refinementp4est: parallel quadtree/octree AMR p4est.org

The p4est library provides scalable parallel adaptive mesh refinement using theforest-of-quadtrees/octrees paradigm:

0

10

20

30

40

50

60

70

80

90

100

12 60 432 3444 27540 220320

Per

centa

ge

ofru

ntim

e

Number of CPU cores

Partition Balance Ghost Nodes

I hierarchically refined quadrants/octants;I flat storage (leaves only);I fully-distributed by space-filling-curve

partitioning;I refinement, coarsening, and repartitioning have

straight-forward, low-communication, easilyscalable algorithms.

The bottlenecks are mesh-wide local-topology operations: mesh improvement(enforcing a 2:1 balance condition) and mesh description (converting toadjacency-graph mesh formats).

2:1 Balance [IBG12]

2:1 conditions are local requirements: limiting the sizes of neighbors. Scalableenforcement in parallel requires kernels for action at a distance:

o

r

a?

δx

δy

Function λ(δ) such that size(a) = blog2 λck 1D 2D 3D1 δ δx + δy Carry3(δy + δz, δz + δx , δx + δy)

2 maxδx , δy Carry3(δx , δy , δz)

3 maxδx , δy , δz

Carry3(α, β, γ) ≡ maxα, β, γ, α + β + γ − (α|β|γ)

Carry3(α, β, γ) ≡ maxα, β, γ, α + β + γ − (α|β|γ).

0

1

2

3

4

5

6

12 96 768 6144 49152 112128

Secondsper

(millionelem

ents

/core)

Number of CPU cores

Old New

Deriving and implementingthese kernels (and otherimprovements) led to a 4-5xspeedup:

Conversion to adjacency-graph [IBWG14]

Exploiting the hierarchical structure of quadtrees and octrees via recursivealgorithms leads to 2-3x speedup on deep-cache architectures.

Old search/hash-based algorithm (top) vs. New recursion-based algorithm(bottom) metrics, conversion to FE mesh (Ivy Bridge)N runtime (ms) instructions branch misses cache misses4.6× 103 9.5× 100 4.3× 107 2.1× 105 2.2× 104

1.0× 101 3.7× 107 5.3× 104 1.1× 104

3.9× 104 8.6× 101 4.2× 108 1.7× 106 2.2× 105

4.0× 101 3.1× 108 3.6× 105 5.1× 104

3.2× 105 8.4× 102 3.7× 109 1.3× 107 4.8× 106

3.5× 102 2.5× 109 2.7× 106 4.5× 105

2.6× 106 8.0× 103 3.3× 1010 1.0× 108 6.1× 107

2.8× 103 2.0× 1010 2.2× 107 5.4× 106

p4est in ice sheet modeling

I (left) Refinement is often highly localized (e.g. grounding lines and ice streams).I (right) specialized hybrid quadtree/octree scheme decouples horizontal and vertical

refinement (unlike isotropic octree refinement). (A presentation on the hybridscheme is found at http://users.ices.utexas.edu/~tisaac/slides/).

[Bro10] Jed Brown. Efficient nonlinear solvers for nodal high-order finite elements in 3D. Journal of Scientific Computing, 45(1-3):48–63, 2010.[EW96] S. C. Eisenstat and H. F. Walker. Choosing the forcing terms in an inexact Newton method. SIAM Journal on Scientific Computing, 17:16–32, 1996.[IBG12] Tobin Isaac, Carsten Burstedde, and Omar Ghattas. Low-cost parallel algorithms for 2:1 octree balance. In Proceedings of the 26th IEEE International Parallel &

Distributed Processing Symposium. IEEE, 2012.

[IBWG14] Tobin Isaac, Carsten Burstedde, Lucas C. Wilcox, and Omar Ghattas. Recursive algorithms for distributed forests of octrees. In preparation, 2014.

[IPSG14] Tobin Isaac, Noemi Petra, Georg Stadler, and Omar Ghattas. From data to prediction with quantified uncertainties: Scalable parallel algorithms and applications to thedynamics of the Antarctic ice sheet. Journal of Computational Physics (submitted), 2014.

[ISG14] Tobin Isaac, Georg Stadler, and Omar Ghattas. Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropicmeshes, with application to ice sheet dynamics. SIAM Journal on Scientific Computing (submitted), 2014.

[MSC+07] G. A. Meehl, T. F. Stocker, W. D. Collins, A. T. Friedlingstein, A. T. Gaye, J. M. Gregory, A. Kitoh, R. Knutti, J. M. Murphy, A. Noda, S. C. B. Raper, I. G. Watterson, A. J.

Weaver, and Z.-C. Zhao. Global climate projections. In Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth AssessmentReport of the Intergovernmental Panel on Climate Change, pages 747–845. Cambridge University Press, 2007.

[PZS+12] Noemi Petra, Hongyu Zhu, Georg Stadler, Thomas J. R. Hughes, and Omar Ghattas. An inexact Gauss-Newton method for inversion of basal sliding and rheologyparameters in a nonlinear Stokes ice sheet model. Journal of Glaciology, 58(211):889–903, 2012.

[RMS11] E Rignot, J Mouginot, and B Scheuchl. Ice flow of the Antarctic ice sheet. Science, 333(6048):1427–1430, 2011.

[TS03] A. Toselli and C. Schwab. Mixed hp-finite element approximations on geometric edge and boundary layer meshes in three dimensions. Numerische Mathematik,94(4):771–801, 2003.