
Page 1:

ALEGRA is a large, highly capable, option-rich production application solving coupled multi-physics PDEs, modeling magnetohydrodynamics, electromechanics, stochastic damage, and detailed interface mechanics in high strain rate regimes on unstructured meshes in an ALE framework. Nearly all the algorithms must accept dynamic, mixed-material elements, which are modified by remeshing, interface reconstruction, and advection components. Recent trends in computing hardware have forced application developers to think about how to improve performance on traditional CPUs while looking forward to next-generation platforms. Core to the ALEGRA performance strategy is rewriting loop bodies to conform to the requirements of high-performance kernels: accessing data in array form, avoiding pointer dereferencing, avoiding function calls, and ensuring thread safety. Achieving this, however, requires changes to the underlying infrastructure. We report on recent progress in the infrastructure to support array-based data access and iteration over mesh objects. The effects on performance on traditional platforms are shown. We also discuss the practical realities and cost estimates of moving an existing full-featured production application like ALEGRA toward running effectively on future platforms while keeping it maintainable.

The ALEGRA Production Application: Strategy, Challenges and Progress Toward Next Generation Platforms

Richard R. Drake
Dept 1443 - Computational Multiphysics, Sandia National Laboratories

Algorithms & Abstractions for Assembly in PDE Codes, May 12-14, 2014

Page 2:

ALEGRA: Shock Hydro & MHD

• 20 years of development & evolution
• Operator split, multi-physics
• Includes explicit and implicit PDE solvers
• 2 and 3 spatial dimensions
• Core hydro is multi-material Lagrangian plus remap
• An XFEM capability is maturing
• 650k LOC (not including libraries, such as Trilinos)
• Mix of research, development, and production capabilities
• Extensive material model choices

[Figures: shock hydro; 2D magnetics; 3D resistive MHD]

Page 3:

Some ALEGRA Core Algorithms

• Mixed material cell treatment
• Remap
  • Remesh
  • Material interface reconstruction
  • Material & field advection
• Dynamic topology
  • Extended Finite Element Method (XFEM)
  • Spatial refinement/unrefinement
• Flexible set of material models comprising each material
• Central difference and midpoint time integration options (a rough central-difference sketch follows this list)
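The slides do not spell out the update formulas; as a rough, hypothetical illustration of the central-difference option, one staggered (leapfrog-style) step for a single degree of freedom could look like the following. All names (accel_of, v_half) are made up for the sketch, not ALEGRA's.

#include <cstdio>

// Hypothetical sketch: central-difference (leapfrog) time integration for
// one degree of freedom. Velocity lives at half steps:
//   v^{n+1/2} = v^{n-1/2} + dt * a(x^n)
//   x^{n+1}   = x^n       + dt * v^{n+1/2}
static double accel_of(double x) { return -x; }  // toy model: x'' = -x

int main() {
    double dt = 0.01;
    double x = 1.0;        // x^0
    double v_half = 0.0;   // v^{-1/2} (initialization simplified for the sketch)
    for (int n = 0; n < 100; ++n) {
        v_half += dt * accel_of(x);  // advance velocity to the next half step
        x      += dt * v_half;       // advance position a full step
    }
    std::printf("x(t=1) = %f (exact: cos(1) = 0.5403)\n", x);
    return 0;
}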

[Figures: XFEM requires topological enrichment; material interface reconstruction; swept volume & intersection remap]

Page 4:

NEVADA Infrastructure (A Framework)

Everything depends on the “Mesh”

[Diagram: components built on the mesh (Field I/O, Load Balancing, Contact, Spatial Adaptivity, XFEM Adaptivity, Halo Comm, In-Situ Processing, In-Situ Viz, Remesh, Interface Reconstruction, Advection, Input Parsing, Physics Algorithms) sitting atop the Unstructured Mesh, Structured Mesh, and Materials layers]

Page 5:

Performance

We need to run faster!
• Customer needs
• NW needs
• Optics (marketing)

It has become clear that:
• There is no performance silver bullet
• Application software must change
• This will require a resource shift

Can’t rely on faster CPUs anymore!

[Plot: Muzia, 2D]

Page 6:

The ALEGRA Performance Strategy

Work in the present but aim for the future.

Incrementally reimplement algorithms
• Remesh, interface reconstruction, advection
• Lagrangian step pieces
• Matrix assembly coding
• Time step size computation

Focus on foundational concepts
• Accessing bulk data in array form
• Limit pointer dereferencing
• Limit (non-inlined) function calls
• Minimize data reads/writes
• Thread safety

Refactor support infrastructure
• Enable array-based access
• Enable flat index-based iteration
• Enable thread safety (colorings? sketched below)

Consider new algorithms
• Alternate formulations
• New/different algorithms

[Komatitsch]
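The "(colorings?)" item above is left open in the slides; the standard idea is sketched here under assumed data structures: group elements into colors so that no two elements in a color share a node, then each color's elements can update nodal data concurrently without locks. All names (color_to_elems, elem_nodes, node_sum) are hypothetical.

#include <cstddef>
#include <vector>

// Hypothetical sketch of coloring-based thread safety for a nodal
// scatter/assembly loop. Elements within one color share no nodes, so the
// inner loop is race-free and can run in parallel; colors run serially.
void assemble_by_color(const std::vector<std::vector<int> >& color_to_elems,
                       const std::vector<std::vector<int> >& elem_nodes,
                       std::vector<double>& node_sum)
{
    for (std::size_t c = 0; c < color_to_elems.size(); ++c) {
        const std::vector<int>& elems = color_to_elems[c];
        #pragma omp parallel for
        for (int k = 0; k < (int)elems.size(); ++k) {
            const std::vector<int>& nodes = elem_nodes[elems[k]];
            for (std::size_t j = 0; j < nodes.size(); ++j)
                node_sum[nodes[j]] += 1.0;  // stand-in for a real element contribution
        }
    }
}

int main() {
    // Two colors over four elements of a 1D mesh (element e spans nodes e, e+1):
    // {0,2} and {1,3} share no nodes within a color.
    std::vector<std::vector<int> > colors(2);
    colors[0].push_back(0); colors[0].push_back(2);
    colors[1].push_back(1); colors[1].push_back(3);
    std::vector<std::vector<int> > elem_nodes(4);
    for (int e = 0; e < 4; ++e) {
        elem_nodes[e].push_back(e);
        elem_nodes[e].push_back(e + 1);
    }
    std::vector<double> node_sum(5, 0.0);
    assemble_by_color(colors, elem_nodes, node_sum);
    return 0;
}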

Page 7: ALEGRA is a large, highly capable, option rich, production application solving coupled multi-physics PDEs modeling magnetohydrodynamics, electromechanics,

Progress in Data Layout

[Diagram: "transpose" the storage. Object-based layout: each mesh object holds its own values v1 v2 v3 v4 . . . Array-based layout: each field is one "double**" array indexed by "obj_idx" (0 1 2 . . .)]

Common, existing access pattern:

    nd->Vector_Var( CURCOOR )

becomes, in object layout:

    nd->data[ CURCOOR ]

and in array layout:

    nd->data[ CURCOOR ][ nd->obj_idx ]

• Object-based layout has more direct access to memory.
• Array-based layout has better cache & TLB behavior.
• Depending on the algorithm and problem size, the better memory behavior may or may not offset the extra dereferencing.
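A minimal sketch of the two layouts, using hypothetical types (the actual NEVADA field classes are not shown in the slides):

#include <vector>

// Object-based layout: each mesh object owns its per-field values.
struct NodeObject {
    std::vector<double> data;  // indexed by field id, e.g. data[CURCOOR]
};

// Array-based ("transposed") layout: the mesh owns one contiguous array per
// field; a mesh object reaches its value through its integer index.
struct MeshStorage {
    std::vector<std::vector<double> > data;  // data[field_id][obj_idx]
};

int main() {
    const int CURCOOR = 0;
    const int num_nodes = 4;

    std::vector<NodeObject> nodes(num_nodes);  // object layout
    nodes[2].data.resize(1);
    double a = nodes[2].data[CURCOOR];         // value lives in the node

    MeshStorage mesh;                          // array layout
    mesh.data.resize(1, std::vector<double>(num_nodes, 0.0));
    double b = mesh.data[CURCOOR][2];          // the node supplies obj_idx

    return (int)(a + b);
}

The "extra dereferencing" in the bullets above is the second subscript the array layout adds to each access; the bullets note that its cost may or may not be repaid by the improved cache and TLB behavior.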

Page 8:

Speedups: Object- versus Array-Based

• Comparisons of unmodified versus array-based code
• Intel chips: RedSky = Nehalem, TLCC2 = Sandy Bridge
• The memory behavior wins over the extra offset in many cases.

Page 9:

Algorithms Should Use the Arrays Directly

Object-based access:

Element * el = 0;
TOTAL_ELEMENT_LOOP(el) {
  const Vector vara = el->Vector_Var( VARA_IDX );
  Vector & varb = el->Vector_Var( VARB_IDX );
  el->Vector_Var( VARA_IDX ) += varb;
  el->Scalar_Var( VARC_IDX ) = vara * varb;
}

Array-based access (oversimplified, hypothetical loop):

ArrayView<Vector> vara = mesh->getField( VARA_IDX );
ArrayView<Vector> varb = mesh->getField( VARB_IDX );
ArrayView<double> varc = mesh->getField( VARC_IDX );
Element * el = 0;
TOTAL_ELEMENT_LOOP(el) {
  const int ei = el->Idx();
  const Vector va = vara[ei];
  vara[ei] += varb[ei];
  varc[ei] = va * varb[ei];
}

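The slides do not define ArrayView; a plausible minimal form, assumed here, is a non-owning pointer-plus-length view into the mesh's contiguous field storage:

#include <cstddef>

// Hypothetical minimal ArrayView: a non-owning view over mesh-owned field
// storage. The real ALEGRA/NEVADA class may differ.
template <typename T>
class ArrayView {
public:
    ArrayView(T* ptr, std::size_t n) : ptr_(ptr), n_(n) {}
    T&       operator[](std::size_t i)       { return ptr_[i]; }
    const T& operator[](std::size_t i) const { return ptr_[i]; }
    std::size_t size() const { return n_; }
private:
    T*          ptr_;  // not owned; the mesh manages the lifetime
    std::size_t n_;
};

int main() {
    double storage[4] = {1.0, 2.0, 3.0, 4.0};  // stand-in for mesh-owned field data
    ArrayView<double> view(storage, 4);
    view[2] += view[1];  // element-wise access through the view
    return (int)view.size();
}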

Page 10:

Object List & Iteration Improvements

• Index-based mesh object storage
  • Enables iteration without dereferencing objects
• Performance comparison shows no improvement
  • Algorithms would have to take advantage first

[Diagram: doubly linked lists of mesh objects converted to index sets, i.e. integer offsets into the data arrays]

Can now do this:

for ( int i = 0; i < N; ++i ) {
  int ni = index_list[i];
  vel[ni] = old_vel[ni] + dt * accl[ni];
  ...
}
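A self-contained version of the loop above, with illustrative setup (the array sizes and index values are made up):

#include <vector>

int main() {
    // Hypothetical index set selecting a subset of nodes.
    std::vector<int> index_list;
    index_list.push_back(0);
    index_list.push_back(2);
    index_list.push_back(5);
    const int N = (int)index_list.size();

    std::vector<double> vel(8, 0.0), old_vel(8, 1.0), accl(8, 2.0);
    const double dt = 0.1;

    // Flat, index-based iteration: no list-node dereferencing, and the
    // compiler sees a simple countable loop.
    for (int i = 0; i < N; ++i) {
        int ni = index_list[i];
        vel[ni] = old_vel[ni] + dt * accl[ni];
    }
    return 0;
}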


Page 11:

Object Ordering Exploration

• Improve cache locality by mesh object ordering
  • Hmm? No speedups over default ordering

Order elements by space-filling curve

[wikipedia]

Order nodes by first-touch element loop
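The slides do not say which space-filling curve was tried; a common, concrete choice is a Morton (Z-order) key formed by interleaving the bits of quantized element-centroid coordinates, sketched here:

#include <cstdint>
#include <cstdio>

// Spread the low 16 bits of v so they occupy the even bit positions.
static std::uint32_t spread_bits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Interleave x and y into a 32-bit Morton (Z-order) key. Sorting elements by
// the key of their quantized centroid coordinates groups spatial neighbors
// together in memory. (Illustrative only; the slide does not specify the
// curve ALEGRA used.)
std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}

int main() {
    std::printf("morton2d(3,5) = %u\n", (unsigned)morton2d(3u, 5u));  // prints 39
    return 0;
}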

Page 12:

Summary

• ALEGRA has adopted a low-risk performance strategy
  • Main concept: incrementally rewrite algorithms toward NGP standards
• Progress made on support infrastructure
  • Array-based field data
  • Integer index set object looping
• 1.4X speedup realized on realistic simulations
• Work continues on infrastructure & algorithms
  • Data: topology storage, integer field data, material data
  • Algorithms: remap, Lagrangian step