
PARALLEL ADAPTIVE FINITE ELEMENT SIMULATION USING DISTRIBUTED QUAD-TREE/OCTREE FOREST

ARAVINTH RAMACHANDRAN

Vietnamese-German University

Thesis submitted in partial fulfilment

of the requirements for the Degree of

Master of Science in Computational Engineering

Supervisors

Prof. Dr. techn. Günther Meschke

MSc. Hoang-Giang Bui

Institute for Structural Mechanics

Ruhr University Bochum

Bochum, Germany

November, 2016


ABSTRACT

Adaptive Mesh Refinement (AMR) produces discretizations that improve the accuracy of the solution per degree of freedom and thereby reduce the computational cost. In this thesis, a parallel adaptive finite element implementation in C++ is described. The implementation is based on p4est, a parallel mesh refinement library. The hanging nodes that result from adaptive refinement are resolved using constraint conditions. The scalable algorithms of the p4est library are based on octree adaptivity, which allows dynamic adaptivity to be incorporated in parallel. The parallel AMR finite element package is developed as a modular platform on which AMR applications can be developed and extended easily. The capabilities of these applications are demonstrated using various numerical examples.

ACKNOWLEDGEMENT

Firstly, I would like to express my sincere gratitude to my supervisors Prof. Dr. techn. Günther Meschke and MSc. Hoang-Giang Bui for their continuous support of my master thesis research. I would also like to thank Hoang-Giang Bui for his patience, motivation and guidance, which helped me throughout the research and writing of this thesis. My sincere thanks also go to DAAD for offering me the scholarship to carry out the research at Ruhr University Bochum. I thank Dr. Han Duc Tran, my academic coordinator, for his support throughout the thesis. Last but not least, I would like to thank my parents for their support throughout my life.

TABLE OF CONTENTS

Abstract
Acknowledgement
Table of Contents
List of Figures
1 Introduction
1.1 Motivation for Adaptive Mesh Refinement
1.2 Objectives
1.3 Thesis Organisation
2 Literature Review
3 Methodology
3.1 Treatment of Hanging Nodes
3.2 Application of Constraint Condition to FEM System
3.3 Refinement Indicators
3.3.1 Kelly Error Estimator
3.3.2 Carstensen Error Estimator
3.4 Data Transfer for Non-Linear AMR
4 Implementation
4.1 p4est and Parallel AMR Implementation
4.1.1 p4est Interface Schematics
4.1.2 2:1 Balancing
4.1.3 Parallel Repartitioning
4.2 Structure of Parallel Adaptive Code
5 Numerical Examples
5.1 Patch test
5.2 Infinite plate with a circular hole
5.3 Slope stability
5.4 Extension of notched beam
6 Conclusions
References

LIST OF FIGURES

Figure 1-1: Linear Elasticity Boundary Value Problem
Figure 2-1: Types of Adaptive Mesh Refinement
Figure 3-1: Adaptive Mesh Refinement with Hanging Node
Figure 3-2: Hanging Node along Edge 1-2
Figure 3-3: Sample domain with Hanging Node
Figure 3-4: Assembly of RHS based on constraint condition
Figure 3-5: Non-Linear AMR Data Transfer
Figure 4-1: Parallel AMR Tool components
Figure 4-2: Parallel AMR Tool features
Figure 4-3: p4est: Tree and Forest terminology [12,13]
Figure 4-4: Adaptive Mesh Refinement Simulation using p4est [2]
Figure 4-5: 2:1 Balancing
Figure 4-6: Parallel Partitioning [11,12,13]
Figure 4-7: Parallel AMR Tool Generic layout
Figure 4-8: Hanging Node Decoder
Figure 5-1: Patch test problem representation
Figure 5-2: Displacement of the domain along x and y direction
Figure 5-3: Stress distribution along x direction
Figure 5-4: Infinite plate with a circular hole
Figure 5-5: Stress distribution along x direction
Figure 5-6: Parallel Repartitioning colour coded by MPI rank
Figure 5-7: Comparison of strain energy convergence
Figure 5-8: Slope stability problem representation
Figure 5-9: Adaptive Refinement based on Plasticity Indicator
Figure 5-10: Extension of notched beam problem representation
Figure 5-11: Adaptive Refinement based on Damage Indicator

CHAPTER 1

1 INTRODUCTION

Many physical phenomena in engineering and science can be described by partial differential equations. In general, solving these equations with classical analytical methods on arbitrarily shaped domains is almost impossible. The finite element method (FEM) is a numerical technique for solving problems that are described by partial differential equations or can be formulated as the minimization of a functional. The domain of interest is represented as an assembly of finite elements, and the approximating functions in the finite elements are expressed in terms of the nodal values of the physical field being sought. In this way, a continuous physical problem is transformed into a discretized finite element problem with unknown nodal values [16]. The individual elements are connected by a topological map called a mesh. When modelling problems involving localized deformation, steep gradients or discontinuous surfaces, a very fine mesh is usually required for accurate results. To reduce the computational time, adaptive refinement techniques are usually preferred over uniform mesh refinement.

1.1 Motivation for Adaptive Mesh Refinement

Consider the linear elastic problem on an arbitrarily shaped domain $\Omega$, shown in Figure 1-1, subjected to a traction along the boundary $\Gamma_t$.

Figure 1-1: Linear Elasticity Boundary Value Problem

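The governing equations of linear elasticity are the equilibrium equation,

$$\operatorname{div}\boldsymbol{\sigma} + \rho\mathbf{b} = \mathbf{0} \quad \text{in } \Omega,$$

the kinematic equation,

$$\boldsymbol{\varepsilon} = \nabla^s\mathbf{u} = \tfrac{1}{2}\left(\nabla\mathbf{u} + \nabla^T\mathbf{u}\right),$$

and the constitutive equation,

$$\boldsymbol{\sigma} = \mathbb{C} : \boldsymbol{\varepsilon}.$$

The displacement and traction boundary conditions are

$$\mathbf{u} = \bar{\mathbf{u}} \quad \text{on } \Gamma_u, \qquad \mathbf{t} = \bar{\mathbf{t}} \quad \text{on } \Gamma_t.$$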

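The variational (weak) form of the governing equations is

$$\int_{\Omega} \delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}\, dV = \int_{\Omega} \delta\mathbf{u} \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u} \cdot \bar{\mathbf{t}}\, dA$$

for all $\delta\mathbf{u} \in H^1_D(\Omega)$, where $\mathbf{u}$ is sought such that $\mathbf{u} \in H^1_D(\Omega) = \{\varphi \in H^1(\Omega) : \varphi = \bar{\mathbf{u}} \text{ on } \Gamma_u\}$.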

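We triangulate the domain to obtain the triangulation (mesh) $\mathcal{T}$ and choose a discrete Sobolev space defined over $\mathcal{T}$ such that

$$S^1_D(\mathcal{T}) \subset H^1_D(\Omega).$$

The discrete solution $\mathbf{u}_h \in S^1_D(\mathcal{T})$ is sought such that, for all $\delta\mathbf{u}_h \in S^1_D(\mathcal{T})$,

$$\int_{\Omega} \delta\boldsymbol{\varepsilon}_h : \boldsymbol{\sigma}_h\, dV = \int_{\Omega} \delta\mathbf{u}_h \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u}_h \cdot \bar{\mathbf{t}}\, dA$$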

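The residual of the solution is given as

$$\int_{\Omega} \delta\boldsymbol{\varepsilon} : (\boldsymbol{\sigma} - \boldsymbol{\sigma}_h)\, dV = \int_{\Omega} \delta\mathbf{u} \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u} \cdot \bar{\mathbf{t}}\, dA - \int_{\Omega} \delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}_h\, dV = \mathfrak{R}_{\Omega}(\mathbf{u}, \mathbf{u}_h)$$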

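The Friedrichs and Cauchy-Schwarz inequalities show that

$$\sup_{\substack{\delta\mathbf{u} \in H^1_D(\Omega)\\ \|\delta\mathbf{u}\|_{H^1(\Omega)} = 1}} \left\{\mathfrak{R}_{\Omega}(\mathbf{u}, \mathbf{u}_h)\right\} \le \left\|\mathcal{H}(\mathbf{u}) : (\boldsymbol{\sigma} - \boldsymbol{\sigma}_h)\right\|_{H^1(\Omega)} \le \sqrt{1 + c_{\Omega}^2}\, \sup_{\substack{\delta\mathbf{u} \in H^1_D(\Omega)\\ \|\delta\mathbf{u}\|_{H^1(\Omega)} = 1}} \left\{\mathfrak{R}_{\Omega}(\mathbf{u}, \mathbf{u}_h)\right\}$$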

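This indicates that the error of the finite element solution is bounded above and below by the residual of the solution; hence, when the residual is driven down, the accuracy of the solution improves. The discretization should therefore be chosen such that

$$\sup_{\substack{\delta\mathbf{u} \in H^1_D(\Omega)\\ \|\delta\mathbf{u}\|_{H^1(\Omega)} = 1}} \left\{\int_{\Omega} \delta\mathbf{u} \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u} \cdot \bar{\mathbf{t}}\, dA - \int_{\Omega} \delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}_h\, dV\right\} \to \min$$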

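$$\|\mathbf{u} - \mathbf{u}_h\|_{H^1(\Omega)} \le c \left\{\sum_{K \in \mathcal{T}} h_K^2 \|R_K(\mathbf{u}_h)\|^2 + \sum_{E \in \mathcal{E}} h_E \|R_E(\mathbf{u}_h)\|^2\right\}^{1/2}$$

where $h_K$ is the size of cell $K$, $h_E$ the size of edge $E$, $\mathcal{E}$ the set of edges of $\mathcal{T}$, $R_K$ the interior (element) residual, and $R_E$ the jump residual across edge $E$.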

At first sight, this bound suggests that the error is dominated by the largest cell size and the (global) norm of the residual, so that the mesh would have to be refined globally to reduce the error.

A closer analysis, however, shows that the bound is a sum of local, cell-wise contributions, so it suffices to refine the mesh only where the local residual is large. In this way the accuracy of the solution is retained at a reduced computational cost.
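The idea of refining only where the local residual is large can be illustrated with a short C++17 sketch. It assumes that the residual norms $\|R_K\|$ and $\|R_E\|$ have already been computed elsewhere; the types `EdgeResidual` and `ElementResidual` and the functions below are hypothetical, and bulk (Dörfler) marking is one common strategy from the literature, not necessarily the one used in this thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical input, assumed computed elsewhere: edge size h_E, L2 norm of the
// jump residual R_E, and whether the edge is shared by two elements.
struct EdgeResidual {
    double h_E;
    double jump_norm;
    bool interior;
};

// Hypothetical per-element input: size h_K, L2 norm of the interior residual R_K.
struct ElementResidual {
    double h_K;
    double interior_norm;
    std::vector<EdgeResidual> edges;
};

// eta_K^2 = h_K^2 ||R_K||^2 + sum_E w_E h_E ||R_E||^2. An interior edge gives half of
// its term to each neighbour (w_E = 1/2), a boundary edge is counted once (w_E = 1),
// so that summing eta_K^2 over all elements reproduces the sum in the bound.
double local_indicator_sq(const ElementResidual& e) {
    double eta_sq = e.h_K * e.h_K * e.interior_norm * e.interior_norm;
    for (const EdgeResidual& edge : e.edges) {
        const double w = edge.interior ? 0.5 : 1.0;
        eta_sq += w * edge.h_E * edge.jump_norm * edge.jump_norm;
    }
    return eta_sq;
}

// Global estimate eta = sqrt(sum_K eta_K^2).
double global_estimate(const std::vector<double>& eta_sq) {
    return std::sqrt(std::accumulate(eta_sq.begin(), eta_sq.end(), 0.0));
}

// Bulk (Doerfler) marking: take elements in order of decreasing eta_K^2 until their
// sum reaches theta * eta^2; only these elements are refined.
std::vector<std::size_t> bulk_mark(const std::vector<double>& eta_sq, double theta) {
    assert(theta > 0.0 && theta <= 1.0);
    std::vector<std::size_t> order(eta_sq.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&eta_sq](std::size_t a, std::size_t b) { return eta_sq[a] > eta_sq[b]; });

    const double target = theta * std::accumulate(eta_sq.begin(), eta_sq.end(), 0.0);
    std::vector<std::size_t> marked;
    double sum = 0.0;
    for (std::size_t k : order) {
        if (sum >= target) break;
        marked.push_back(k);
        sum += eta_sq[k];
    }
    std::sort(marked.begin(), marked.end());  // report in element order
    return marked;
}
```

For indicators `{1.0, 9.0, 4.0, 0.25, 0.75}` and `theta = 0.8`, only elements 1 and 2 are marked: two of five elements already carry more than 80 % of the estimated error.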

1.2 Objectives

The main objective of this thesis is to develop a C++-based Parallel Adaptive Mesh Refinement FE tool that provides a platform on which parallel AMR applications can be developed easily. To achieve this, the tool is based on open-source libraries that provide parallelism, adaptive mesh refinement, and solver capabilities. The next objective is to verify and validate the various features of the tool. This is done by creating sample applications that cover different numerical problems; the output of these applications is validated to ensure that the tool is implemented correctly and works properly.

1.3 Thesis Organisation

The thesis is structured as follows. Following this introduction in Chapter 1, the different AMR techniques, with their advantages and disadvantages, are presented in Chapter 2. Chapter 3 discusses the methods used in the code development. Chapter 4 focuses on the implementation of these methods and on the structure of the code. The validation of the parallel AMR code and its outcomes are discussed in Chapter 5, and the conclusions of the thesis are highlighted in Chapter 6.

CHAPTER 2

2 LITERATURE REVIEW

The principal idea of adaptive mesh refinement (AMR) is to obtain a more accurate solution at lower cost by distributing the grid points optimally over the computational region. In a nutshell, AMR is a hierarchical inter-mesh communication scheme: it relies on locally refined meshes or mesh patches to increase the resolution of an underlying coarse mesh only where needed. It can ease some of the difficulties of generating high-quality grids and reduce the number of iterations between grid generation and solution that are otherwise required to tailor the grid to a problem. As a result, it can save orders of magnitude in computational and storage costs compared with an equivalent uniformly refined mesh.

AMR strategies can be classified into four broad categories, depending on the partitioning algorithm and/or the data structure used to encode connectivity: patch-based, cell-based, block-based, and hybrid block-based methods. Figures 2-1 (a) to 2-1 (c) show the refinement produced by the patch-based, cell-based, and block-based schemes, respectively; the elements marked for refinement are highlighted in the base Cartesian mesh.

Figure 2-1: Types of Adaptive Mesh Refinement

Berger et al. [17] introduced the concept of AMR on structured grids, an approach now generally referred to as patch-based AMR. The technique begins with a coarse base-level Cartesian grid, and as the calculation progresses, individual grid cells are marked for refinement. The patch-based strategy relies on algorithms that organize the marked cells into rectangular patches. The mesh within these newly formed patches can then be refined further, creating additional nested patches.

In cell-based AMR, as proposed and discussed in [18], cells may be refined individually, and the mesh is stored in a tree data structure (a quad-tree in two dimensions and an octree in three dimensions). This tree structure is flexible and readily allows local refinement of the mesh, because it keeps track of the cell connectivity as new cells are generated by refinement (4 new cells per refined cell in two dimensions and 8 in three dimensions).
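A minimal C++ sketch of a cell-based quad-tree is shown below: refining a leaf replaces it with four children of half the size, and the active mesh consists of the leaves. `QuadCell` and `count_leaves` are hypothetical names; this is not how p4est stores its trees (p4est keeps only the leaves, in a linear array ordered along a space-filling curve).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical quad-tree cell: either a leaf or a node with exactly four children.
struct QuadCell {
    double x0, y0, size;  // lower-left corner and edge length
    int level;
    std::array<std::unique_ptr<QuadCell>, 4> children;

    bool is_leaf() const { return children[0] == nullptr; }

    // Refining a leaf creates 4 cells of half the size (an octree would create 8).
    void refine() {
        assert(is_leaf());
        const double half = 0.5 * size;
        for (int c = 0; c < 4; ++c) {
            // Child c: bit 0 selects the upper x half, bit 1 the upper y half.
            const double cx = x0 + ((c & 1) ? half : 0.0);
            const double cy = y0 + ((c & 2) ? half : 0.0);
            children[c] = std::make_unique<QuadCell>(QuadCell{cx, cy, half, level + 1, {}});
        }
    }
};

// The leaves are the active cells of the adaptive mesh.
std::size_t count_leaves(const QuadCell& cell) {
    if (cell.is_leaf()) return 1;
    std::size_t n = 0;
    for (const auto& child : cell.children) n += count_leaves(*child);
    return n;
}
```

Refining the root and then one of its children yields 3 + 4 = 7 leaves, with the connectivity implied by the tree itself.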

In a block-based AMR strategy, mesh adaptation is accomplished by dividing and coarsening entire predefined grid blocks or groupings of cells. Although this is not required, each block generally contains the same number of cells. Tree data structures are again used to track block connectivity and mesh refinement; however, the block-based strategy results in a much lighter tree structure than cell-based methods, which typically create larger numbers of mesh cells during refinement.

A hybrid block-based AMR approach has been described by Holst and Keppens [19]. The hybrid approach incorporates methods from patch-based strategies into the tree data structure. It requires two means of traversing the grid hierarchy: in addition to the tree data structure, a linked list of grid pointers is needed.

To achieve large-scale adaptivity, the adaptivity algorithms must be as scalable as the numerical algorithms. As discussed in [11], the key features required for large-scale adaptivity are efficient mesh partitioning, local access to connectivity information, and the exchange of connectivity information across processes (i.e. parallel repartitioning).

In domain decomposition or mesh partitioning, each element must be assigned to a unique processor. No processor stores the whole mesh; its storage is distributed. The connectivity between elements must be known beforehand to facilitate the exchange of numerical information with neighbouring elements. The next requirement is that the algorithm efficiently finds the neighbouring elements, the processor that owns each neighbour, and the associated degrees of freedom. The final requirement is the global redistribution of elements, whose main objective is to minimize the run time. In addition, the connectivity information must be encoded with minimal storage.

Cell- or tree-based AMR algorithms have been found to satisfy all of the above requirements exceptionally well [12,13]. The octree algorithm is a hierarchical AMR scheme that uses a space-filling curve (SFC), which ensures fast parallel partitioning and fast elementary operations. The SFC is a direct representation of the octree structure and therefore provides an efficient encoding of connectivity. The SFC maps a one-dimensional index onto the 2D or 3D domain and thereby induces a linear ordering of all elements. Scaling to large problem sizes is possible because the tree is a recursive structure, and because the tree leaves are traversed easily in curve order, the scheme is cache-friendly.
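p4est orders its leaves along the Morton (Z-order) space-filling curve. The C++ sketch below is not p4est code; it only shows the basic idea under the assumption of integer cell coordinates: interleaving the bits of x and y gives a one-dimensional index, and sorting by that index lays the cells out along the curve. The names `morton_2d`, `Cell` and `sort_along_curve` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Spread the bits of a 32-bit value so that bit i moves to bit 2i (zeros in between).
std::uint64_t spread_bits(std::uint64_t v) {
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFULL;
    v = (v | (v << 8)) & 0x00FF00FF00FF00FFULL;
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FULL;
    v = (v | (v << 2)) & 0x3333333333333333ULL;
    v = (v | (v << 1)) & 0x5555555555555555ULL;
    return v;
}

// Morton (Z-order) index of a cell with integer corner coordinates (x, y):
// x contributes the even bits and y the odd bits.
std::uint64_t morton_2d(std::uint32_t x, std::uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}

struct Cell { std::uint32_t x, y; };  // hypothetical cell record

// Sorting by Morton index arranges the cells along the space-filling curve.
void sort_along_curve(std::vector<Cell>& cells) {
    std::sort(cells.begin(), cells.end(), [](const Cell& a, const Cell& b) {
        return morton_2d(a.x, a.y) < morton_2d(b.x, b.y);
    });
}
```

The four cells of a 2 x 2 block are visited in the order (0,0), (1,0), (0,1), (1,1), the "Z" shape that gives the curve its name; cells that are close on the curve are usually close in space, which is what makes the traversal cache-friendly.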

The octree-based structure makes the transfer of information both within a process and across processes very efficient [11]. Finding a neighbour element on the same processor is a tree search executed in O(log(N/P)), where N is the total number of elements and P the number of processes. Finding the owner processor of a neighbour is a binary search executed in O(log P). Finding a parent or child is a vertical tree step with runtime O(1).
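The O(log P) owner search can be sketched in C++ as follows. The sketch assumes, as in p4est, that every process owns a contiguous segment of the space-filling curve, and that the curve index at which each segment starts is known on all processes; the array name `first_key` and the function `find_owner` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// first_key[r] is the curve (e.g. Morton) index of the first element owned by rank r.
// Since each rank owns a contiguous, sorted segment of the curve, the owner of any
// element is found with a single binary search in O(log P).
int find_owner(const std::vector<std::uint64_t>& first_key, std::uint64_t key) {
    assert(!first_key.empty() && first_key.front() <= key);
    // upper_bound returns the first rank whose segment starts after `key`;
    // the owner is the rank just before it. Ranks that own nothing share their
    // start key with the next rank and are skipped automatically.
    auto it = std::upper_bound(first_key.begin(), first_key.end(), key);
    return static_cast<int>(it - first_key.begin()) - 1;
}
```

With segment starts `{0, 10, 25, 40}`, the key 39 belongs to rank 2 and any key from 40 onwards to rank 3.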

To summarize, the octree-based AMR algorithm offers:
- optimal cache usage due to space-filling-curve ordering;
- a straightforward expression of hanging nodes for FEM;
- extensibility to complex geometries;
- essentially unrestricted scalability.

All of the functionality discussed above is available in the lightweight parallel adaptive mesh library p4est [1]. p4est treats the mesh topology as a forest of octrees, so complex geometries can be handled. The connectivity is encoded with a minimal memory footprint: 24 bytes per element for local storage and 32 bytes per processor for global storage [11]. p4est can generate and adapt multi-octree meshes with up to 5.13 × 10^11 octants on as many as 220,320 CPU cores and execute the 2:1 balance algorithm in less than 10 seconds per million octants per process. p4est also provides interfaces to these algorithms, which makes it a suitable basis for developing the Parallel Adaptive Mesh Refinement tool.
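As a rough illustration of what these figures mean per core, assuming the octants are spread evenly over all cores and counting only the 24 bytes of local storage per octant, the quoted run corresponds to:

$$\frac{5.13 \times 10^{11}\ \text{octants}}{220{,}320\ \text{cores}} \approx 2.33 \times 10^{6}\ \text{octants per core}, \qquad 2.33 \times 10^{6} \times 24\ \text{B} \approx 56\ \text{MB per core}.$$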


CHAPTER 3

3 METHODOLOGY

This chapter discusses the strategies used to implement adaptive mesh refinement. To develop the adaptive mesh refinement code, the following operations must be handled effectively:
- Treatment of Hanging Nodes using Constraint Conditions
- Application of Constraint Condition to FEM System
- Refinement Indicators
- Data Transfer for Non-Linear AMR

Where these operations arise, and the specific strategies used to handle them, are discussed below.

3.1 Treatment of Hanging Nodes

When a coarse mesh is locally refined, “hanging nodes” appear, as shown in Figure 3-1. With hanging nodes, the mesh is no longer conforming, and the standard construction of the basis would create discontinuities. Continuity at hanging nodes is restored by constraining the basis [6,7].

Figure 3-1: Adaptive Mesh Refinement with Hanging Node

To constrain the basis, the shape functions are defined on each cell as usual. The functions of the finite element approximation are then defined as linear combinations of these shape functions, chosen such that the functions in the solution space are globally continuous. The idea is explained in detail in the following example for a piecewise-bilinear basis; it extends similarly to higher-order polynomials.

Consider the edge between vertices 1 and 2, which contains a hanging node 3, as shown in Figure 3-2. The size of the coarse element is $h$, and the coordinate system is located at vertex 2, so that vertex 1 lies at $(0, h)$ and the hanging node 3 at $(0, h/2)$. The elements along edge 1-2 are indexed E1, E2 and E3, as shown in Figure 3-2.

Figure 3-2: Hanging Node along Edge 1-2

First, the shape functions on each element are constructed as usual. The shape functions that are non-zero on edge 1-2 are given below; in $N_{je}$, the first subscript $j$ is the node number and the second subscript $e$ the element index.

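For Element 1 (E1),

$$N_{11} = \left(\frac{h+x}{h}\right)\left(\frac{y}{h}\right), \qquad N_{21} = \left(\frac{h+x}{h}\right)\left(\frac{h-y}{h}\right)$$

For Element 2 (E2),

$$N_{12} = \left(\frac{h/2-x}{h/2}\right)\left(\frac{y-h/2}{h/2}\right), \qquad N_{32} = \left(\frac{h/2-x}{h/2}\right)\left(\frac{h-y}{h/2}\right)$$

For Element 3 (E3),

$$N_{23} = \left(\frac{h/2-x}{h/2}\right)\left(\frac{h/2-y}{h/2}\right), \qquad N_{33} = \left(\frac{h/2-x}{h/2}\right)\left(\frac{y}{h/2}\right)$$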

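Second, the functions in the solution space are taken as linear combinations of these shape functions, with the coefficients constrained such that the functions are globally continuous. The constraint follows from requiring the displacement at node 3 to be the same on both sides of the edge.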

The displacement on Element 1 along edge 1-2 is given as

$$U(x, y) = u_1 N_{11}(x, y) + u_2 N_{21}(x, y) \qquad (3.1)$$

Evaluating this at node 3 from the side of E1 ($x < 0$) yields

$$U(x_3, y_3) = \frac{u_1 + u_2}{2} \qquad (3.2)$$

Similarly, the displacement on Elements 2 and 3 along edge 1-2 is given as

$$U(x, y) = \begin{cases} u_1 N_{12}(x, y) + u_3 N_{32}(x, y), & \text{if } y \ge h/2 \\ u_2 N_{23}(x, y) + u_3 N_{33}(x, y), & \text{if } y < h/2 \end{cases}$$

In either case, at node 3 approached from the side of E2 and E3 ($x > 0$),

$$U(x_3, y_3) = u_3 \qquad (3.3)$$

Equating Equations 3.2 and 3.3 for $U(x_3, y_3)$ yields the “constraint condition”

$$u_3 = \frac{u_1 + u_2}{2}$$
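The following C++ sketch evaluates the bilinear shape functions of this example on edge 1-2 ($x = 0$) and checks numerically that, once $u_3$ is constrained, the traces seen from the coarse side and from the fine side coincide along the whole edge, not only at node 3. It uses $N_{21} = ((h+x)/h)((h-y)/h)$, the function that equals 1 at vertex 2; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Bilinear shape functions of the example (origin at vertex 2, coarse size h,
// fine elements of size h/2).
double N11(double x, double y, double h) { return ((h + x) / h) * (y / h); }
double N21(double x, double y, double h) { return ((h + x) / h) * ((h - y) / h); }
double N12(double x, double y, double h) { const double a = h / 2; return ((a - x) / a) * ((y - a) / a); }
double N32(double x, double y, double h) { const double a = h / 2; return ((a - x) / a) * ((h - y) / a); }
double N23(double x, double y, double h) { const double a = h / 2; return ((a - x) / a) * ((a - y) / a); }
double N33(double x, double y, double h) { const double a = h / 2; return ((a - x) / a) * (y / a); }

// Displacement on edge 1-2 (x = 0) seen from the coarse element E1, Eq. (3.1).
double trace_coarse(double y, double h, double u1, double u2) {
    return u1 * N11(0.0, y, h) + u2 * N21(0.0, y, h);
}

// Displacement on edge 1-2 seen from the fine elements E2 (y >= h/2) and E3 (y < h/2).
double trace_fine(double y, double h, double u1, double u2, double u3) {
    if (y >= h / 2) return u1 * N12(0.0, y, h) + u3 * N32(0.0, y, h);
    return u2 * N23(0.0, y, h) + u3 * N33(0.0, y, h);
}

// Constraint condition u3 = (u1 + u2) / 2.
double constrained_u3(double u1, double u2) { return 0.5 * (u1 + u2); }
```

Both traces are linear along the edge between the points where they are evaluated; the constraint makes the fine trace collinear with the coarse one, so continuity holds everywhere on the edge. With an unconstrained $u_3$ the traces differ at node 3.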

3.2 Application of Constraint Condition to FEM System

This section discusses how the constraint conditions are applied to the finite element system. A key feature of FEM is that the same operation is repeated on every element, which makes the assembly of large systems efficient because the procedure is identical for all elements.

The element-level operations should therefore remain the same irrespective of the presence or absence of hanging nodes. This is achieved by the following procedure, as outlined in [7,10].

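Step 1: Build the local stiffness matrix ($\mathbf{K}_e$) and the residual force vector ($\mathbf{F}_e$) with all DOFs, as if there were no hanging nodes.

For instance, for the system shown in Figure 3-3, the residual force vectors of elements E1, E2 and E3 are (subscript: node number, superscript: element number)

$$\mathbf{F}^{E1} = \begin{bmatrix} F_1^1 \\ F_2^1 \\ F_3^1 \\ F_4^1 \end{bmatrix}, \qquad \mathbf{F}^{E2} = \begin{bmatrix} F_6^2 \\ F_7^2 \\ F_4^2 \\ F_{10}^2 \end{bmatrix}, \qquad \mathbf{F}^{E3} = \begin{bmatrix} F_2^3 \\ F_5^3 \\ F_6^3 \\ F_7^3 \end{bmatrix}$$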

Figure 3-3: Sample domain with Hanging Node

Step 2: Apply the constraints when copying the local contributions into the global matrix $\mathbf{K}$ and vector $\mathbf{F}$.

In this step, the constraint conditions are applied while the global stiffness matrix and residual vector are assembled from the local contributions. The contribution of each hanging node is distributed to its supporting nodes, weighted by the coefficients of the constraint condition, and the hanging node is eliminated from the system. Figure 3-4 illustrates this operation.

Step 3: Solve the system $\mathbf{K}\mathbf{U} = \mathbf{F}$.

Figure 3-4: Assembly of RHS based on constraint condition

Step 4: Recover all components of $\mathbf{U}$ and distribute them.

Once the system of equations is solved, the displacements of the independent nodes are extracted and distributed to the hanging nodes using the constraint conditions. In this example,

$$u_6 = \frac{u_2 + u_4}{2}$$
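Steps 2 and 4 can be sketched in C++ with dense matrices. The sketch assumes a hypothetical constraint table (a `std::map` from each hanging DOF to its supporting DOFs and weights) and numbers the independent DOFs 0 to n-1 while giving hanging DOFs ids of n or more; a PETSc-based code would apply the same weights when inserting into its distributed matrix and vector, but the thesis's actual data structures are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical constraint table: hanging DOF id -> (supporting DOF, weight) pairs,
// e.g. u6 = 0.5 u2 + 0.5 u4. Independent DOFs are numbered 0..n-1.
using Constraints = std::map<std::size_t, std::vector<std::pair<std::size_t, double>>>;

// An independent DOF maps to itself with weight 1, a hanging DOF to its supports.
std::vector<std::pair<std::size_t, double>> expand(std::size_t dof, const Constraints& c) {
    auto it = c.find(dof);
    if (it == c.end()) return {{dof, 1.0}};
    return it->second;
}

// Step 2: add an element's Ke and Fe (built as if there were no hanging nodes) to the
// global K (n x n) and F (size n). This equals K += C^T Ke C and F += C^T Fe, so the
// hanging DOFs never appear as unknowns of the global system.
void assemble_constrained(const Matrix& Ke, const std::vector<double>& Fe,
                          const std::vector<std::size_t>& dofs, const Constraints& c,
                          Matrix& K, std::vector<double>& F) {
    for (std::size_t a = 0; a < dofs.size(); ++a) {
        for (const auto& [I, wI] : expand(dofs[a], c)) {
            F[I] += wI * Fe[a];
            for (std::size_t b = 0; b < dofs.size(); ++b)
                for (const auto& [J, wJ] : expand(dofs[b], c))
                    K[I][J] += wI * wJ * Ke[a][b];
        }
    }
}

// Step 4: recover a hanging DOF from the solved independent DOFs.
double recover(std::size_t dof, const Constraints& c, const std::vector<double>& U) {
    double value = 0.0;
    for (const auto& [I, w] : expand(dof, c)) value += w * U[I];
    return value;
}
```

Because the element routine itself is untouched and only the scatter step consults the constraint table, the elementary operation stays the same whether or not an element has hanging nodes, which is the property the procedure above aims for.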

3.3 Refinement Indicators

The next key component of any adaptive refinement code is the refinement indicator, which marks the elements to be refined in the next iteration. The elements should be chosen such that the regions of the domain that govern the convergence of the solution are covered.

Refinement indicators are problem-specific and are chosen to reflect the nature of the problem. A refinement indicator can be either a physical quantity related to the problem or an a posteriori local error estimate.

An example of a physics-based indicator is the damage indicator for problems that model damage. For local error estimation, two types of error estimators are implemented; they are discussed in detail in the following sections.
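The two kinds of indicators lead to two simple marking rules, sketched below in C++. The threshold and the fraction `gamma` are hypothetical parameters, and the "fraction of the maximum" rule is a common choice from the literature; the thesis does not state which marking rule its code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Physics-based marking: refine every element whose indicator value (e.g. a damage
// variable in [0, 1] or an equivalent plastic strain) exceeds a fixed threshold.
std::vector<std::size_t> mark_above_threshold(const std::vector<double>& indicator,
                                              double threshold) {
    std::vector<std::size_t> marked;
    for (std::size_t k = 0; k < indicator.size(); ++k)
        if (indicator[k] > threshold) marked.push_back(k);
    return marked;
}

// Error-estimator-based marking ("maximum strategy"): refine every element whose
// local error indicator is at least the fraction gamma of the largest one.
std::vector<std::size_t> mark_fraction_of_max(const std::vector<double>& eta, double gamma) {
    assert(!eta.empty() && gamma > 0.0 && gamma <= 1.0);
    const double limit = gamma * *std::max_element(eta.begin(), eta.end());
    std::vector<std::size_t> marked;
    for (std::size_t k = 0; k < eta.size(); ++k)
        if (eta[k] >= limit) marked.push_back(k);
    return marked;
}
```

A fixed threshold suits physical quantities with a known scale, such as damage, while a relative rule suits error indicators, whose absolute size shrinks as the mesh is refined.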

3.3.1 Kelly Error Estimator

The Kelly error estimator belongs to the class of explicit error estimators, which compute the error in the energy norm directly from the interior element residuals and the jumps across element boundaries. As noted in [3], the contribution of the interior residual is negligible compared with the jump across the boundary, so the local error indicator of an element $K$ is

$$\eta_K^2 = \sum_{\gamma \subset \partial K} \frac{h}{24a} \int_{\gamma} J^2\, d\Gamma$$

where $J$ is the jump of the gradient across the element edge $\gamma$, $h$ is the element size, and $a$ is a constant related to the material properties.

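For instance, for a problem governed by Poisson's equation with the Dirichlet and Neumann boundary conditions

$$u = 0 \quad \text{on } \Gamma_D, \qquad \mathbf{n}\cdot\nabla u - g = 0 \quad \text{on } \Gamma_N,$$

the jump in the gradient is defined as

$$J = \begin{cases} \mathbf{n}\cdot\nabla u_h + \mathbf{n}'\cdot\nabla u'_h & \text{if } \gamma \not\subseteq \Gamma \\ g - \mathbf{n}\cdot\nabla u_h & \text{if } \gamma \subset \Gamma_N \\ 0 & \text{if } \gamma \subset \Gamma_D \end{cases}$$

where, on inter-element edges ($\gamma \not\subseteq \Gamma$), the edge $\gamma$ separates the element from its neighbour, and $\mathbf{n}'$ and $u'_h$ denote the outward normal and the discrete solution on the neighbouring element.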
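The following C++ sketch evaluates the jump $J$ by the case distinction above and accumulates the Kelly indicator of one element. It assumes that the normal derivatives $\mathbf{n}\cdot\nabla u_h$ at the edge Gauss points have already been computed from the discrete solution; `EdgeKind`, `EdgeSample` and the function names are hypothetical. The edge integral uses a 2-point Gauss rule, which is exact when $J$ varies linearly along the edge.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

enum class EdgeKind { Interior, Neumann, Dirichlet };

// Jump J at one point of an edge. n_grad_u = n . grad(u_h) on this element,
// n_grad_u_nb = n' . grad(u'_h) on the neighbour (interior edges only),
// g = prescribed Neumann flux.
double kelly_jump(EdgeKind kind, double n_grad_u, double n_grad_u_nb, double g) {
    switch (kind) {
        case EdgeKind::Interior:  return n_grad_u + n_grad_u_nb;
        case EdgeKind::Neumann:   return g - n_grad_u;
        case EdgeKind::Dirichlet: return 0.0;
    }
    return 0.0;
}

// Hypothetical edge record: edge length and the jump J at its two Gauss points.
struct EdgeSample {
    double length;
    std::array<double, 2> jump;
};

// eta_K^2 = sum over the edges of h / (24 a) * integral of J^2 over the edge, with
// the integral evaluated by 2-point Gauss quadrature (weights 1, Jacobian length/2).
double kelly_indicator_sq(const std::vector<EdgeSample>& edges, double h, double a) {
    assert(h > 0.0 && a > 0.0);
    double eta_sq = 0.0;
    for (const EdgeSample& e : edges) {
        const double integral =
            0.5 * e.length * (e.jump[0] * e.jump[0] + e.jump[1] * e.jump[1]);
        eta_sq += h / (24.0 * a) * integral;
    }
    return eta_sq;
}
```

For a unit square element ($h = a = 1$) with a unit jump on all four edges, the indicator is $4/24 = 1/6$.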

3.3.2 Carstensen Error Estimator

The Carstensen error estimator is also an explicit error estimator; it is based on the comparison between an averaged stress field and the discrete (discontinuous) stress field. It uses the local error indicator

$$\eta_T = \|\boldsymbol{\sigma}_h^* - \boldsymbol{\sigma}_h\|_{L^2(T)}$$

where $\boldsymbol{\sigma}_h^*$ is the averaged stress field and $\boldsymbol{\sigma}_h$ is the discrete stress field on element $T$.

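The error estimator is given as

$$\eta = \left(\sum_{T} \eta_T^2\right)^{1/2}$$

and is used in practical computations as a representation of the discretization error. Following [4], the relative quantity

$$\eta_{\mathrm{rel}} = \frac{\sqrt{\sum_T \eta_T^2}}{\sqrt{\sum_T \int_T \boldsymbol{\sigma}_h^* : \mathbb{C}^{-1} \boldsymbol{\sigma}_h^*\, dx}}, \qquad \eta_T^2 = \int_T (\boldsymbol{\sigma}_h - \boldsymbol{\sigma}_h^*) : \mathbb{C}^{-1} (\boldsymbol{\sigma}_h - \boldsymbol{\sigma}_h^*)\, dx,$$

is used as the refinement indicator.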
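To make the averaging idea concrete, the following C++ sketch computes the energy-weighted indicators $\eta_T^2$ and the relative estimator for a 1D bar of linear elements, where the discrete stress is constant per element and $\mathbb{C}^{-1}$ reduces to $1/E$. The nodal average is a plain arithmetic mean of the adjacent element stresses, a simplification of the weighted averaging used in general; the function and struct names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct AveragingResult {
    std::vector<double> eta_T_sq;  // eta_T^2 for every element
    double eta_rel;                // relative estimator
};

// length[e]: element lengths; sigma_h[e]: constant element stresses; E: Young's modulus.
AveragingResult averaging_estimator(const std::vector<double>& length,
                                    const std::vector<double>& sigma_h, double E) {
    const std::size_t ne = sigma_h.size();
    assert(ne > 0 && length.size() == ne && E > 0.0);

    // Averaged (continuous) stress sigma*: mean of the adjacent element stresses at
    // each node, interpolated linearly inside the elements.
    std::vector<double> sigma_star(ne + 1);
    sigma_star[0] = sigma_h[0];
    sigma_star[ne] = sigma_h[ne - 1];
    for (std::size_t n = 1; n < ne; ++n) sigma_star[n] = 0.5 * (sigma_h[n - 1] + sigma_h[n]);

    // Exact integral of f^2 over [0, L] for f linear with end values p and q.
    auto int_sq = [](double L, double p, double q) { return L / 3.0 * (p * p + p * q + q * q); };

    AveragingResult r{std::vector<double>(ne), 0.0};
    double num = 0.0, den = 0.0;
    for (std::size_t e = 0; e < ne; ++e) {
        const double p = sigma_h[e] - sigma_star[e];      // difference at left node
        const double q = sigma_h[e] - sigma_star[e + 1];  // difference at right node
        r.eta_T_sq[e] = int_sq(length[e], p, q) / E;
        num += r.eta_T_sq[e];
        den += int_sq(length[e], sigma_star[e], sigma_star[e + 1]) / E;
    }
    assert(den > 0.0);
    r.eta_rel = std::sqrt(num) / std::sqrt(den);
    return r;
}
```

Two unit elements with stresses 1 and 3 each give $\eta_T^2 = 1/3$ and a relative estimator of $\sqrt{1/13} \approx 0.28$; a uniform stress field gives zero, since averaging then changes nothing.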

Both the physics-based indicators and the indicators based on error estimators have been implemented and tested in the code. Chapter 5 (Numerical Examples) describes the use of the different indicators and their outputs for various cases. For the second example of Chapter 5, the infinite plate with a circular hole, the analytical solution is known beforehand, so a displacement error indicator was implemented; the displacement error is computed as the difference between the analytical and the discrete solution.

3.4 Data Transfer for Non-Linear AMR

This section briefly describes how data are transferred between iterations, which is a vital requirement for problems involving non-linear modelling. In adaptive mesh refinement for non-linear analysis, data from the old mesh, such as the stresses at the Gauss points and other history variables, must be transferred to the new mesh. The transfer is executed in the following steps, as depicted in Figure 3-5.

Step 1: The variables at the Gauss point locations of the old mesh are transferred to the nodes using a global L2 projection.

Step 2: Once the mesh has been locally refined, the variables are transferred from the existing nodes to the new nodes using the constraint condition, which also accounts for hanging nodes.

Figure 3-5: Non-Linear AMR Data Transfer

Step 3: Finally, the nodal variables are transferred back to the Gauss points using the standard shape-function interpolation procedure.
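The three steps can be sketched in C++ for the simplest setting: a 1D mesh of linear elements with two Gauss points each. This is a simplification under stated assumptions; in 1D there are no hanging nodes, so the constraint rule of Step 2 reduces to giving a new mid-edge node the mean of its two parent nodes. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference Gauss points of the 2-point rule on [-1, 1] (weights 1).
const double kGauss[2] = {-1.0 / std::sqrt(3.0), 1.0 / std::sqrt(3.0)};

// Step 1: global L2 projection of Gauss-point values g[e][q] to nodal values, i.e.
// solve M s = b with the consistent P1 mass matrix (tridiagonal, Thomas algorithm).
std::vector<double> l2_project(const std::vector<double>& x,
                               const std::vector<std::array<double, 2>>& g) {
    const std::size_t n = x.size();
    assert(n >= 2 && g.size() == n - 1);
    std::vector<double> lower(n, 0.0), diag(n, 0.0), upper(n, 0.0), b(n, 0.0);
    for (std::size_t e = 0; e + 1 < n; ++e) {
        const double L = x[e + 1] - x[e];
        diag[e] += L / 3.0; diag[e + 1] += L / 3.0;  // element mass (L/6)[[2,1],[1,2]]
        upper[e] += L / 6.0; lower[e + 1] += L / 6.0;
        for (int q = 0; q < 2; ++q) {
            const double w = 0.5 * L;  // Gauss weight 1 times Jacobian L/2
            b[e] += 0.5 * (1.0 - kGauss[q]) * g[e][q] * w;
            b[e + 1] += 0.5 * (1.0 + kGauss[q]) * g[e][q] * w;
        }
    }
    for (std::size_t i = 1; i < n; ++i) {  // forward elimination
        const double m = lower[i] / diag[i - 1];
        diag[i] -= m * upper[i - 1];
        b[i] -= m * b[i - 1];
    }
    std::vector<double> s(n);
    s[n - 1] = b[n - 1] / diag[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) s[i] = (b[i] - upper[i] * s[i + 1]) / diag[i];
    return s;
}

// Step 2: split the marked elements; each new mid-edge node gets the mean of its parents.
void refine_marked(std::vector<double>& x, std::vector<double>& s, const std::vector<bool>& marked) {
    assert(marked.size() + 1 == x.size() && s.size() == x.size());
    std::vector<double> xn, sn;
    for (std::size_t e = 0; e < marked.size(); ++e) {
        xn.push_back(x[e]); sn.push_back(s[e]);
        if (marked[e]) { xn.push_back(0.5 * (x[e] + x[e + 1])); sn.push_back(0.5 * (s[e] + s[e + 1])); }
    }
    xn.push_back(x.back()); sn.push_back(s.back());
    x.swap(xn); s.swap(sn);
}

// Step 3: interpolate nodal values back to the two Gauss points of element e.
std::array<double, 2> to_gauss(const std::vector<double>& s, std::size_t e) {
    std::array<double, 2> v{};
    for (int q = 0; q < 2; ++q)
        v[q] = 0.5 * (1.0 - kGauss[q]) * s[e] + 0.5 * (1.0 + kGauss[q]) * s[e + 1];
    return v;
}
```

A linear field such as $\sigma(x) = 2x + 1$ passes through all three steps unchanged, which is a convenient consistency check for a real implementation; for general fields, the projection smooths the Gauss-point data.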


CHAPTER 4

4 IMPLEMENTATION

This chapter details the implementation of the Parallel Adaptive Mesh Refinement code, based on the ideas discussed in the previous chapter. The software and its features are described in two parts: the first part describes p4est, the tree management library, and the functionality implemented with it; the second covers the software layout and its features.

The Parallel AMR tool was developed in C++. The two major open-source libraries that form the backbone of the tool are:
- PETSc
- p4est

Figure 4-1: Parallel AMR Tool components

The Portable, Extensible Toolkit for Scientific Computation (PETSc) [8] is a library of parallel linear and non-linear solvers for applications modelled by partial differential equations. The Parallel AMR tool uses PETSc's matrices, vectors and SNES solvers to solve linear and non-linear equations in parallel.

Figure 4-2 shows the salient features of the tool. Features such as domain decomposition, 2:1 balancing and parallel repartitioning are provided by the p4est library. The hanging-node constraints are computed and applied as described in Chapter 3, and the numerical solver capabilities are inherited from the PETSc library.

Figure 4-2: Parallel AMR Tool features

4.1 p4est and Parallel AMR Implementation

p4est [1] is a software library that enables the dynamic management of a collection of adaptive octrees, also termed a forest of octrees. The term “octree” denotes a recursive tree structure in which each node is either a leaf or has eight children. The analogous construction in two dimensions is termed a “quad-tree”, in which nodes have four children instead of eight. Octrees and quad-trees can be associated with 3D and 2D cubic domains, where the tree nodes are called octants and quadrants, respectively, and the root node corresponds to a cubic domain that is recursively subdivided according to the tree structure. The concept of the tree is shown in Figure 4-3 [12,13]. In a nutshell, p4est performs parallel AMR on forests of octrees.

The term “forest” describes a collection of such logical cubes that are connected conformingly through faces, edges, or corners, each cube corresponding to an independent tree. The term “parallel” refers to the distributed storage: each octant or quadrant belongs to exactly one process and is stored only there.

Figure 4-3: p4est: Tree and Forest terminology [12,13]

4.1.1 p4est Interface Schematics

The p4est library provides a set of application programming interfaces for parallel adaptive refinement. It is a pure AMR module, which is possible because p4est separates the AMR topology from any numerical information: the former is stored and modified internally by the p4est library, while an application is free to define how it uses this information and arranges numerical and other data [2].

Figure 4-4: Adaptive Mesh Refinement Simulation using p4est [2]

A general AMR simulation pipeline using p4est is shown in Figure 4-4 [2].

The general task flow of an AMR application using p4est can be divided into three parts:
- Create a coarse mesh connectivity.
- Modify the mesh by refinement and partitioning.
- Relate the mesh information from p4est to the application.

While the first operation is carried out redundantly on all processors, the second and third are strictly distributed-parallel.

The basic structure used in p4est is a connectivity of quad-trees (2D) or octrees (3D), which covers the domain of interest with a conforming macro-mesh. Thus, the mesh primitives used in p4est are quadrilaterals in 2D and hexahedra in 3D. The p4est connectivity is a data structure that contains the numbers and orientations of neighbouring coarse cells. It can be created by reading in the mesh data with the built-in functions of p4est.

Once the macro-structure connectivity is created, the distributed p4est structure can be built and modified in place. The mesh management capabilities available to the application are:
- Refine: adaptively subdivide octants based on a refinement marker or callback function, once or recursively.
- Coarsen: replace families of eight child octants by their common parent octant, once or recursively.
- Partition: redistribute the octants in parallel, according to a given target number of octants for each process or to weights prescribed for all octants.
- Balance: ensure at most 2:1 size relations between neighbouring octants by local refinement where necessary.
- Nodes: create a globally unique numbering of the mesh nodes, considering the classification into “independent” and “hanging” nodes.
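The idea behind weighted partitioning can be sketched serially in C++: because the octants are ordered along the space-filling curve, a partition is simply a set of cut points that splits the curve into contiguous chunks of roughly equal total weight. This is a sketch of the idea only; p4est's own `p4est_partition` works in parallel and is not reproduced here, and `partition_offsets` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Octant i (weight w_i, weight S_i of all octants before it, total W) goes to rank
// floor(P * S_i / W), clamped to P - 1. Ranks are non-decreasing along the curve, so
// every rank receives one contiguous chunk. Returns offsets: rank r owns octants
// [offset[r], offset[r + 1]).
std::vector<std::size_t> partition_offsets(const std::vector<std::uint64_t>& weight, int P) {
    assert(P > 0);
    std::uint64_t W = 0;
    for (std::uint64_t w : weight) W += w;
    assert(W > 0);

    const std::uint64_t last = static_cast<std::uint64_t>(P) - 1;
    std::vector<std::size_t> count(static_cast<std::size_t>(P), 0);
    std::uint64_t S = 0;  // total weight of the octants before the current one
    for (std::uint64_t w : weight) {
        const std::uint64_t r = std::min(static_cast<std::uint64_t>(P) * S / W, last);
        ++count[r];
        S += w;
    }
    std::vector<std::size_t> offset(count.size() + 1, 0);
    for (std::size_t r = 0; r < count.size(); ++r) offset[r + 1] = offset[r] + count[r];
    return offset;
}
```

With unit weights, six octants on three ranks are split 2/2/2; if the first octant has weight 3 and the others weight 1, two ranks receive 2 and 4 octants, i.e. equal total weight 4 each.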

Refinement and coarsening are controlled by callback functions that usually query flags determined by the application. They provide two modes of operation, non-recursive and recursive. Non-recursive Refine replaces an octant with its eight children but does not consider the newly created children for further refinement. Non-recursive Coarsen replaces eight sibling octants by their parent but does not consider the newly created parent as a sibling for further coarsening. Recursive mode, on the other hand, can change the forest radically within one call, which is sometimes advantageous for creating a static or initial mesh according to physical criteria.
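The difference between the two modes can be illustrated with a small, self-contained C++ sketch. It does not use p4est: the Quadrant struct, kMaxLevel and the refine function are hypothetical stand-ins for a single quadtree, with children generated in z-order. In non-recursive mode a newly created child is never passed to the callback again; in recursive mode it is.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Toy quadrant of a single 2D quadtree: refinement level plus the integer
// anchor (x, y) of its lower-left corner on a grid of 2^kMaxLevel cells.
struct Quadrant {
  int level;
  std::int32_t x, y;
};

constexpr int kMaxLevel = 8;

// Side length of a quadrant of the given level in integer coordinates.
std::int32_t quadrantLength(int level) {
  return std::int32_t{1} << (kMaxLevel - level);
}

using RefineFn = std::function<bool(const Quadrant&)>;

// Refines q if the callback asks for it. A quadrant created during this call
// (isNew) is only offered to the callback again in recursive mode.
void refineQuadrant(const Quadrant& q, bool recursive, bool isNew,
                    const RefineFn& refineFn, std::vector<Quadrant>& out) {
  assert(q.level >= 0 && q.level <= kMaxLevel);
  const bool mayTest = !isNew || recursive;
  if (mayTest && q.level < kMaxLevel && refineFn(q)) {
    const std::int32_t h = quadrantLength(q.level + 1);
    for (int c = 0; c < 4; ++c) {  // children in z-order
      const Quadrant child{q.level + 1, q.x + (c & 1) * h, q.y + ((c >> 1) & 1) * h};
      refineQuadrant(child, recursive, true, refineFn, out);
    }
  } else {
    out.push_back(q);
  }
}

// Applies one Refine call to every leaf, keeping the leaves in order.
std::vector<Quadrant> refine(const std::vector<Quadrant>& leaves, bool recursive,
                             const RefineFn& refineFn) {
  std::vector<Quadrant> out;
  for (const Quadrant& q : leaves) refineQuadrant(q, recursive, false, refineFn, out);
  return out;
}
```

With a callback that refines every quadrant touching the origin up to level 3, a single non-recursive call on the root produces 4 leaves, whereas a recursive call produces 10 (three leaves each on levels 1 and 2, four on level 3).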

The nodes are enumerated using p4est_nodes or p4est_lnodes. For a given p4est structure, the enumeration is carried out in two steps:

Create a ghost layer using the p4est_ghost_new function.

Enumerate the nodes using p4est_nodes or p4est_lnodes.

Only the independent nodes are numbered; the information about the hanging nodes of each octant is stored separately and can be decoded for further use. Each independent node is assigned to one owner process, and the nodes are numbered globally in the order of their owner processes.
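Numbering the nodes "in the order of their owner processes" amounts to an exclusive prefix sum over the per-process counts of owned nodes. The following C++ sketch shows this arithmetic serially; ownerOffsets and globalNodeNumber are hypothetical helpers. In a distributed run each rank knows only its own count, so the prefix sum would be obtained with an MPI collective such as MPI_Exscan or MPI_Allgather. As far as we know, p4est_lnodes keeps comparable information in its global_offset and global_owned_count fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ownedCount[p] is the number of independent nodes owned by process p.
// Returns offsets with offsets[p] = global number of the first node owned by
// p (an exclusive prefix sum); offsets.back() is the global node count.
std::vector<std::int64_t> ownerOffsets(const std::vector<std::int64_t>& ownedCount) {
  std::vector<std::int64_t> offsets(ownedCount.size() + 1, 0);
  for (std::size_t p = 0; p < ownedCount.size(); ++p) {
    assert(ownedCount[p] >= 0);
    offsets[p + 1] = offsets[p] + ownedCount[p];
  }
  return offsets;
}

// Global number of the localIndex-th node owned by process `owner`.
std::int64_t globalNodeNumber(const std::vector<std::int64_t>& offsets, int owner,
                              std::int64_t localIndex) {
  assert(owner >= 0 && static_cast<std::size_t>(owner) + 1 < offsets.size());
  assert(localIndex >= 0 && localIndex < offsets[owner + 1] - offsets[owner]);
  return offsets[owner] + localIndex;
}
```

For three processes owning 3, 5 and 2 nodes, the offsets are 0, 3, 8 and 10, so the fifth node owned by process 1 receives global number 7.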


4.1.2 2:1 Balancing

2:1 balancing ensures that neighbouring mesh elements are either of the same size or differ in size by at most a factor of two. It limits the number of hanging nodes on an element edge to one: whenever a second hanging node would be introduced on an edge, the neighbouring element has to be refined. The Refine and Coarsen operations generally destroy this invariant, so the 2:1 balance must be re-established afterwards. This is done using the p4est_balance function.

Figure 4-5: 2:1 Balancing

Figure 4-5 depicts 2:1 balancing during refinement. The mesh in Figure 4-5 (a) contains one hanging node along an edge; if the highlighted element were refined, additional hanging nodes would appear along that edge. To avoid this, the element adjacent to the highlighted element is also refined, as shown in Figure 4-5 (b), so that no edge carries more than one hanging node. The objective of 2:1 balancing is to avoid the complexities that arise when an edge carries many hanging nodes, and it also enables an efficient implementation of the hanging node treatment.
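The balancing rule can be shown in its simplest setting, a 1D binary tree, where "neighbouring elements differ by at most one level" is the exact analogue of the 2:1 condition. The C++ sketch below is a naive serial illustration, not p4est's parallel balance algorithm. Leaf and balance21 are hypothetical names. Whenever two adjacent leaves violate the condition, the coarser one is split, which mirrors the extra refinement of the neighbouring element in Figure 4-5 (b).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// A leaf of a 1D binary tree over [0, 1]: level L and index i cover
// [i / 2^L, (i + 1) / 2^L]. Leaves are stored from left to right.
struct Leaf {
  int level;
  long index;
};

// Enforces the 1D analogue of the 2:1 condition: adjacent leaves may differ
// by at most one level. Whenever a pair violates it, the coarser leaf is
// split into its two children and the scan restarts.
std::vector<Leaf> balance21(std::vector<Leaf> leaves) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i + 1 < leaves.size(); ++i) {
      const int diff = leaves[i].level - leaves[i + 1].level;
      if (std::abs(diff) <= 1) continue;
      const std::size_t coarse = diff < 0 ? i : i + 1;  // index of the coarser leaf
      const Leaf parent = leaves[coarse];
      leaves[coarse] = Leaf{parent.level + 1, 2 * parent.index};
      leaves.insert(leaves.begin() + static_cast<std::ptrdiff_t>(coarse + 1),
                    Leaf{parent.level + 1, 2 * parent.index + 1});
      changed = true;
      break;
    }
  }
  return leaves;
}
```

For the leaf levels 1, 3, 3, 2 the level-1 leaf is split once, giving the balanced levels 2, 2, 3, 3, 2. Restarting the scan after each split keeps the sketch simple but quadratic; it only serves to show the invariant.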


4.1.3 Parallel Repartitioning

Parallel partitioning generally refers to a redistribution of mesh primitives among processes without changing the global mesh topology (i.e., the global number of primitives or their neighbourhood relations). The objective is most often to achieve load balance (i.e., an equal distribution of the computational work among processes), which is necessary to ensure the parallel scalability of an application. This is done using the p4est_partition function.

Figure 4-6: Parallel Partitioning [11,12,13]

Each processor in a parallel program stores only a part of the entire forest, and the number of octants on each processor is not much larger than the total number of octants divided by the number of processors. For example, in Figure 4-6 [11,12,13] the quadtree contains sixteen quadrants that are distributed among three processors, about five quadrants per processor. Partitioning can also be based on a user-specified weight function that returns a non-negative integer weight for each octant; the resulting partition is then evenly distributed by weight rather than by octant count.
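A weighted partition of octants that are already ordered along the space-filling curve can be sketched as follows. Each process receives a contiguous chunk, and octant i goes to process floor(P · S_i / W), where S_i is the total weight of all octants before it and W is the total weight. This is an illustrative greedy rule under that assumption. p4est's own partitioning algorithm may differ in detail, and partitionByWeight and countPerProcess are hypothetical names. With unit weights the rule reduces to partitioning by octant count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assigns octants, given in global space-filling-curve order, to numProcs
// processes in contiguous chunks of roughly equal total weight:
// octant i -> floor(numProcs * S_i / W), S_i = weight of all octants before i.
std::vector<int> partitionByWeight(const std::vector<std::int64_t>& weights, int numProcs) {
  assert(numProcs > 0);
  std::int64_t total = 0;
  for (std::int64_t w : weights) {
    assert(w >= 0);  // the weight function must be non-negative
    total += w;
  }
  std::vector<int> owner(weights.size(), 0);
  if (total == 0) return owner;  // degenerate case: everything stays on process 0
  std::int64_t before = 0;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    const std::int64_t p = (static_cast<std::int64_t>(numProcs) * before) / total;
    owner[i] = static_cast<int>(p < numProcs ? p : numProcs - 1);
    before += weights[i];
  }
  return owner;
}

// Number of octants assigned to each process.
std::vector<int> countPerProcess(const std::vector<int>& owner, int numProcs) {
  std::vector<int> count(numProcs, 0);
  for (int p : owner) ++count[p];
  return count;
}
```

For the sixteen unit-weight quadrants of Figure 4-6 on three processes, this rule yields 6, 5 and 5 quadrants. With weights 2, 2, 2, 2, 4, 4 on two processes, the first four octants go to process 0 and the last two to process 1, so each process carries a weight of 8.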


4.2 Structure of Parallel Adaptive Code

The Parallel AMR tool was programmed following a modular approach: the various functionalities of the tool were implemented as separate modules, which were then linked together to complete the package. The great advantage of this approach is that the code can easily be extended or modified in the future. The reusability of the modules shortens the development time of an application and makes debugging much easier. A generic layout of the tool is shown in Figure 4-7, and the modules are briefly described below.

Figure 4-7: Parallel AMR Tool Generic layout


The basic design of a user-defined application starts with the inclusion of the core module P4_structural_application. This provides the common functions that are required, such as reading the inputs and defining the replace functions. It is followed by the Material Law module and the VTK handler module, which define the material system and export the output, respectively.

Core Module

The core module of the tool is P4_structural_application. It groups most of the common functionality of the finite element system. This includes the parallel assembly functions for the stiffness matrix and load vector, utilities for reading the input mesh, and interface functions for the other modules associated with the tool. These interfaces give access to the functions in the other modules for various operations. In this way, the core module acts as a central hub where the operations are defined in a generic fashion that can be augmented and enhanced as required. Some key modules and their functions are discussed below.

Quadrature Module

The quadrature module provides the Gauss points and weights required for Gauss quadrature. The quadrature functions take the dimension and degree as parameters and return the quadrature points and weights as output.
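A quadrature function with this interface (dimension and degree in, points and weights out) could look like the C++ sketch below. It builds a tensor-product Gauss-Legendre rule on the reference element [-1, 1]^dim. The names QuadraturePoint, gauss1d and gaussRule are hypothetical, and only the 1- to 3-point rules are tabulated. An n-point rule integrates polynomials up to degree 2n - 1 exactly, so the smallest sufficient n is chosen from the requested degree.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

struct QuadraturePoint {
  std::vector<double> xi;  // coordinates on the reference element [-1, 1]^dim
  double weight;
};

// 1D Gauss-Legendre rule with n points on [-1, 1]; exact for polynomials of
// degree up to 2n - 1. Only n = 1..3 are tabulated in this sketch.
void gauss1d(int n, std::vector<double>& x, std::vector<double>& w) {
  if (n == 1) {
    x = {0.0};
    w = {2.0};
  } else if (n == 2) {
    const double a = 1.0 / std::sqrt(3.0);
    x = {-a, a};
    w = {1.0, 1.0};
  } else if (n == 3) {
    const double a = std::sqrt(0.6);
    x = {-a, 0.0, a};
    w = {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0};
  } else {
    throw std::invalid_argument("only 1 to 3 Gauss points are tabulated");
  }
}

// Tensor-product Gauss rule on [-1, 1]^dim, exact for polynomials of the
// given degree in each coordinate direction.
std::vector<QuadraturePoint> gaussRule(int dim, int degree) {
  assert(dim >= 1 && degree >= 0);
  const int n = degree / 2 + 1;  // smallest n with 2n - 1 >= degree
  std::vector<double> x, w;
  gauss1d(n, x, w);
  int total = 1;
  for (int d = 0; d < dim; ++d) total *= n;
  std::vector<QuadraturePoint> rule;
  for (int k = 0; k < total; ++k) {
    QuadraturePoint qp{std::vector<double>(dim), 1.0};
    int rest = k;  // digits of k in base n select the 1D point per direction
    for (int d = 0; d < dim; ++d) {
      const int i = rest % n;
      rest /= n;
      qp.xi[d] = x[i];
      qp.weight *= w[i];
    }
    rule.push_back(qp);
  }
  return rule;
}
```

For example, gaussRule(2, 3) returns the familiar 2 x 2 rule with unit weights. It integrates ξ²η² over the reference square exactly, giving 4/9.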

Math Utilities Module

This module contains the specific mathematical operations required by the tool, which are used at various places in the tool.


The notable functions available in the module are:

Calculating the coordinates of the nodes. The function uses p4est_quadrant_corner_node from p4est to calculate the node coordinates.

Calculating the distance between two points.

Calculating the resultant of a vector from its components.

Extrapolating variables from the Gauss points to the corner nodes. This operation is used in stress recovery [9].

As the module is an independent unit of the tool, application-specific functions can easily be added to it.
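The Gauss-point-to-corner extrapolation can be sketched for a bilinear quadrilateral with 2 x 2 Gauss points, using a common approach of the kind described in [9]. In the scaled coordinates r = √3 ξ, the Gauss points lie at r = ±1 and the corners at r = ±√3. The bilinear interpolant through the four Gauss values is therefore evaluated at the corners. The function name and the z-ordering of points and corners are assumptions for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Sign pattern shared by the 2x2 Gauss points and the element corners,
// both in z-order: (-,-), (+,-), (-,+), (+,+).
constexpr std::array<std::array<int, 2>, 4> kSigns{{{-1, -1}, {1, -1}, {-1, 1}, {1, 1}}};

// Extrapolates values known at the Gauss points (xi, eta = +-1/sqrt(3)) of a
// bilinear quadrilateral to its corners (xi, eta = +-1). In coordinates
// r = sqrt(3) * xi, t = sqrt(3) * eta the Gauss points sit at (+-1, +-1),
// so the bilinear shape functions of the Gauss points are evaluated at the
// corners (r, t) = (+-sqrt(3), +-sqrt(3)).
std::array<double, 4> extrapolateToCorners(const std::array<double, 4>& gaussValues) {
  const double s = std::sqrt(3.0);
  std::array<double, 4> corner{};
  for (int c = 0; c < 4; ++c) {
    const double r = kSigns[c][0] * s;
    const double t = kSigns[c][1] * s;
    for (int g = 0; g < 4; ++g) {
      const double N = 0.25 * (1.0 + kSigns[g][0] * r) * (1.0 + kSigns[g][1] * t);
      corner[c] += N * gaussValues[g];
    }
  }
  assert(corner.size() == 4);
  return corner;
}
```

Because the interpolant is bilinear, any linear field sampled at the Gauss points is reproduced exactly at the corners. This provides a convenient check of the extrapolation routine.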

Replace Functions

This module consists of functions that handle the transfer of data from parent to child elements during refinement and balancing, and vice versa during coarsening. The replace function is passed as a parameter to the following p4est functions (in p4est, the variants that accept a replace callback are p4est_refine_ext, p4est_coarsen_ext and p4est_balance_ext):

p4est_refine: when a parent element is replaced by four child elements, the replace function dictates the transfer of data from the parent to the children.

p4est_coarsen: when four child elements are replaced by their parent element, the replace function dictates the transfer of data from the children to the parent.

p4est_balance: when an element is refined to restore the 2:1 balance, i.e., replaced by four child elements, the replace function dictates the transfer of data from the parent to the children.
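The data transfer performed by a replace function can be sketched independently of p4est. The hypothetical replaceState function below is loosely modelled on the outgoing/incoming quadrant arrays that p4est passes to its replace callback type (p4est_replace_t). ElementState is an assumed per-element payload. In this sketch, children inherit the parent's state on refinement or balancing, and on coarsening the parent receives the average of its children.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element state attached to each quadrant.
struct ElementState {
  double damage = 0.0;         // scalar damage variable
  double plasticStrain = 0.0;  // equivalent plastic strain
};

// Transfers state from the outgoing (removed) elements to the incoming (new)
// ones:
//  - refine or balance: 1 outgoing parent, 2^D incoming children;
//    every child inherits the parent's state.
//  - coarsen: 2^D outgoing children, 1 incoming parent;
//    the parent receives the arithmetic mean of the children's state.
void replaceState(const std::vector<const ElementState*>& outgoing,
                  const std::vector<ElementState*>& incoming) {
  assert(!outgoing.empty() && !incoming.empty());
  if (outgoing.size() == 1) {
    for (ElementState* child : incoming) *child = *outgoing.front();
    return;
  }
  assert(incoming.size() == 1);
  ElementState mean;
  for (const ElementState* child : outgoing) {
    mean.damage += child->damage;
    mean.plasticStrain += child->plasticStrain;
  }
  const double n = static_cast<double>(outgoing.size());
  mean.damage /= n;
  mean.plasticStrain /= n;
  *incoming.front() = mean;
}
```

Averaging is only one possible choice. For irreversible quantities such as damage, taking the maximum over the children avoids artificial "healing" when elements are coarsened.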


Material Law

This module consists of functions that define the material law and material properties for the application, i.e., the constitutive law of the model under consideration. For instance, p4elas is an application for linear elasticity based on Hooke’s law. The material law module of p4elas contains functions to compute the elasticity (material stiffness) matrix C, to compute the strain vector from the displacement vector and the B matrix, and to compute the stress at the Gauss points. Similarly, the applications p4plas and p4dama have modules to model plasticity and isotropic damage, respectively.
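The three material-law operations listed for p4elas can be sketched for plane stress in C++. The function names are hypothetical. Strains are in Voigt notation (ε_xx, ε_yy, γ_xy) with engineering shear strain, and B is stored row-major as a 3 x ndof matrix.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;                 // Voigt: (xx, yy, xy)
using Mat3 = std::array<std::array<double, 3>, 3>;

// Plane-stress elasticity matrix C of Hooke's law (engineering shear strain).
Mat3 planeStressC(double E, double nu) {
  assert(E > 0.0 && nu > -1.0 && nu < 0.5);
  const double f = E / (1.0 - nu * nu);
  return Mat3{{{f, f * nu, 0.0}, {f * nu, f, 0.0}, {0.0, 0.0, f * (1.0 - nu) / 2.0}}};
}

// Strain at a Gauss point, eps = B * u, for a 3 x ndof row-major B matrix and
// the element displacement vector u (ndof entries).
Vec3 strainFromB(const std::vector<double>& B, const std::vector<double>& u) {
  assert(B.size() == 3 * u.size());
  Vec3 e{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < u.size(); ++j) e[i] += B[i * u.size() + j] * u[j];
  return e;
}

// Stress at a Gauss point, sigma = C * eps.
Vec3 stress(const Mat3& C, const Vec3& strain) {
  Vec3 s{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) s[i] += C[i][j] * strain[j];
  return s;
}
```

As a usage check, consider the patch-test material of Section 5.1 (E = 2.0 MPa, ν = 0.3) with a uniaxial stretch δ. The strain (δ, -νδ, 0) gives σ_xx = Eδ and σ_yy = 0, which is the constant stress state the patch test expects.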

VTK handler

This module contains helper functions to export the data from p4est for post-processing and visualization. It makes use of the VTK interface functions available in the p4est library. The functions write the p4est data to VTK files, which can be viewed with the post-processing tool ParaView.
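The C++ sketch below shows what such an export has to produce. It writes a minimal legacy ASCII VTK file for a quadrilateral mesh with one scalar per cell. This is an independent illustration, not p4est's VTK interface; writeVtkQuads and Point are hypothetical. One detail matters when the element corners come from p4est. The corners are numbered in z-order (bottom-left, bottom-right, top-left, top-right), whereas VTK_QUAD (cell type 9) expects counter-clockwise order, so the last two corners are swapped.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Point {
  double x, y;
};

// Writes a legacy ASCII VTK unstructured grid of quadrilaterals with one
// scalar value per cell. Element corners are given in z-order and reordered
// to VTK's counter-clockwise VTK_QUAD order (0, 1, 3, 2).
std::string writeVtkQuads(const std::vector<Point>& nodes,
                          const std::vector<std::array<int, 4>>& elements,
                          const std::vector<double>& cellValues) {
  std::ostringstream out;
  out.precision(17);
  out << "# vtk DataFile Version 3.0\n"
      << "AMR output\nASCII\nDATASET UNSTRUCTURED_GRID\n";
  out << "POINTS " << nodes.size() << " double\n";
  for (const Point& p : nodes) out << p.x << ' ' << p.y << " 0\n";
  out << "CELLS " << elements.size() << ' ' << 5 * elements.size() << '\n';
  for (const auto& e : elements)
    out << "4 " << e[0] << ' ' << e[1] << ' ' << e[3] << ' ' << e[2] << '\n';
  out << "CELL_TYPES " << elements.size() << '\n';
  for (std::size_t i = 0; i < elements.size(); ++i) out << "9\n";  // VTK_QUAD
  out << "CELL_DATA " << elements.size() << '\n'
      << "SCALARS value double 1\nLOOKUP_TABLE default\n";
  for (double v : cellValues) out << v << '\n';
  return out.str();
}
```

The returned string can be written to a file with the .vtk extension and opened directly in ParaView.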

Hanging Node Decoder

This is one of the principal modules of the tool. It decodes the hanging node information provided by p4est. The input data for the decoder function is the face_code attribute of the p4est_lnodes data structure, which encodes the hanging node information of each element as a bitwise integer.

Figure 4-8: Hanging Node Decoder


The first D bits of the integer (where D is the dimension) represent the child id of the element with respect to its parent element, and the next D bits represent the status of the potentially hanging faces. The decode function takes this face code as input and returns an integer array containing the independent node numbers that correspond to the hanging nodes. Note that p4est_lnodes does not assign node numbers to hanging nodes.
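The bit layout just described can be decoded with shifts and masks, as in the C++ sketch below. FaceCode, splitFaceCode and hangingValue are hypothetical names, and the code is given as a plain int. In p4est the type is p4est_lnodes_code_t. Which physical face each status bit refers to, and hence which corner of the element hangs, follows the convention documented in p4est_lnodes.h and is not reproduced here. Any higher bits are ignored. The last function shows the resulting constraint for a first-order element: a hanging node on an edge takes the average of the two independent nodes at the ends of the coarse edge.

```cpp
#include <cassert>
#include <vector>

struct FaceCode {
  int childId;               // position of the element within its parent
  std::vector<bool> status;  // status bit of each potentially hanging face
};

// Splits a face code into its two fields: the lowest D bits hold the child
// id and the next D bits flag the potentially hanging faces.
FaceCode splitFaceCode(int faceCode, int dim) {
  assert(dim == 2 || dim == 3);
  FaceCode fc{faceCode & ((1 << dim) - 1), std::vector<bool>(dim)};
  for (int i = 0; i < dim; ++i) fc.status[i] = ((faceCode >> (dim + i)) & 1) != 0;
  return fc;
}

// Constraint for a hanging node on an edge of a first-order element: its
// value is the average of the two independent nodes at the ends of the
// coarse edge, u_h = (u_a + u_b) / 2.
double hangingValue(double uA, double uB) { return 0.5 * (uA + uB); }
```

For example, in 2D the code 6 (binary 110) decodes to child id 2 with the first face-status bit set. In the assembled system, the hanging degree of freedom is then eliminated in favour of the two independent nodes it depends on.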

The developed applications are used to validate the correctness of the implementation by simulating the numerical problems described in Chapter 5.


CHAPTER 5

5 NUMERICAL EXAMPLES

In this chapter, the sample problems that were used to test and verify the code are discussed. The main objective of these tests is to confirm that the techniques are implemented correctly, which is done by validating the output of the sample problems. The following sample problems were analysed, and their outcomes are discussed below:

Patch test

Infinite plate with a circular hole

Slope stability

Extension of notched beam

These tests verify a range of features, including the correct treatment of hanging nodes, the effect of different error indicators on adaptive mesh refinement, and the implementation of physics-based indicators. The results were also compared with the analytical solution wherever applicable.

5.1 Patch test

In this case, a unit square domain with hanging nodes is subjected to an enforced displacement along one edge. The nodes along the bottom edge were fixed in the y direction and the nodes along the left edge were fixed in the x direction. The right edge was subjected to the enforced displacement, and a plane stress condition was assumed. The material properties were a Young’s modulus of 2.0 MPa and a Poisson’s ratio of 0.3. The problem is represented in Figure 5-1.

The main objective of this patch test was to verify the treatment of hanging nodes. The domain was refined near the top left corner; ideally, the stress distribution across the domain should remain constant, and this was observed in the results of the test. The displacements of the domain in the x and y directions are shown in Figure 5-2, and the stress distribution in the x direction is shown in Figure 5-3. The test performed as expected for both first- and second-order discretizations.

Figure 5-1: Patch test problem representation

Figure 5-2: Displacement of the domain along x and y direction (panels: displacement in x direction, displacement in y direction)

Figure 5-3: Stress distribution along x direction


5.2 Infinite plate with a circular hole

The next example is an infinite plate with a circular hole of radius a subjected to a unidirectional tensile load P in the x direction, as shown in Figure 5-4. Owing to symmetry about the x and y axes, only one quarter of the domain is analysed. The analytical stress components for this problem are [5]

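$$\sigma_{xx}(x,y) = P\left[1 - \frac{a^2}{r^2}\left(\frac{3}{2}\cos 2\theta + \cos 4\theta\right) + \frac{3a^4}{2r^4}\cos 4\theta\right] \tag{5.1}$$

$$\sigma_{yy}(x,y) = -P\left[\frac{a^2}{r^2}\left(\frac{1}{2}\cos 2\theta - \cos 4\theta\right) + \frac{3a^4}{2r^4}\cos 4\theta\right] \tag{5.2}$$

$$\sigma_{xy}(x,y) = -P\left[\frac{a^2}{r^2}\left(\frac{1}{2}\sin 2\theta + \sin 4\theta\right) - \frac{3a^4}{2r^4}\sin 4\theta\right] \tag{5.3}$$

where r and θ are the polar coordinates of the point (x, y) measured from the centre of the hole.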

The boundary of the hole is traction-free, and tractions computed from the analytical solution in Equations (5.1) to (5.3) are imposed on the right edge. The left edge is constrained in the x direction and the bottom edge in the y direction. A plane stress condition is assumed, with the following parameters:

Figure 5-4: Infinite plate with a circular hole

Young’s modulus 3e07 MPa

Poisson’s ratio 0.3

Load (P) 10 N/m²

Hole radius (a) 1
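The tractions on the right edge can be evaluated from the classical closed-form (Kirsch) solution for this benchmark, which is the usual source of Equations (5.1) to (5.3). The C++ sketch below does this. kirschStress and rightEdgeTraction are hypothetical helper names, and (x, y) is measured from the centre of the hole. On the right edge the outward normal is +x, so the traction vector is (σ_xx, σ_xy).

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Stress2D {
  double xx, yy, xy;
};

// Kirsch solution: infinite plate, traction-free hole of radius a, remote
// uniaxial tension P in x; (x, y) measured from the hole centre.
Stress2D kirschStress(double x, double y, double P, double a) {
  const double r = std::hypot(x, y);
  assert(r > 0.0 && a > 0.0);
  const double th = std::atan2(y, x);
  const double a2r2 = (a * a) / (r * r);
  const double a4r4 = a2r2 * a2r2;
  const double c2 = std::cos(2.0 * th), c4 = std::cos(4.0 * th);
  const double s2 = std::sin(2.0 * th), s4 = std::sin(4.0 * th);
  Stress2D s;
  s.xx = P * (1.0 - a2r2 * (1.5 * c2 + c4) + 1.5 * a4r4 * c4);
  s.yy = -P * (a2r2 * (0.5 * c2 - c4) + 1.5 * a4r4 * c4);
  s.xy = -P * (a2r2 * (0.5 * s2 + s4) - 1.5 * a4r4 * s4);
  return s;
}

// Traction on an edge with outward normal +x: t = (sigma_xx, sigma_xy).
std::array<double, 2> rightEdgeTraction(double x, double y, double P, double a) {
  const Stress2D s = kirschStress(x, y, P, a);
  return {s.xx, s.xy};
}
```

With the parameters above (P = 10, a = 1), the solution gives the well-known stress concentration σ_xx = 3P = 30 at the top of the hole. It also gives σ_xx = 0 and σ_yy = -P = -10 at the side of the hole, and it recovers σ_xx ≈ P far from the hole. These values are useful checks for the imposed boundary data.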


Figure 5-5 illustrates the stress distribution in the x direction, and Figure 5-6 shows the parallel repartitioning of the domain during adaptive refinement.

The adaptive refinement was driven by the refinement indicators described in the section on refinement indicators.

Figure 5-5: Stress distribution along x direction


Figure 5-6: Parallel Repartitioning colour coded by MPI rank


Figure 5-7: Comparison of strain energy convergence

Figure 5-7 compares uniform refinement with adaptive refinement based on the different refinement indicators. The figure shows that, with adaptive refinement, the strain energy converges at a much lower number of degrees of freedom than with uniform refinement, so the adaptive scheme reaches a converged solution at a reduced computational cost.


5.3 Slope stability

In this example, the plasticity material model was tested. The objective of the test was to verify adaptive mesh refinement of the domain driven by the “plasticity indicator”, a physics-based indicator. The problem is shown in Figure 5-8. The bottom of the domain was constrained, and the model was subjected to gravitational load. The material properties were:

Young’s modulus 20e3 MPa

Poisson’s ratio 0.49

Constitutive law Mohr-Coulomb

Second-order discretization was used in the test, and the Mohr-Coulomb parameters are listed below:

Friction angle 20°

Dilatancy angle 20°

Cohesion 50 MPa

Hardening was not considered, i.e., the material was modelled as perfectly plastic.

Figure 5-8: Slope stability problem representation

The physics-based indicator worked as expected: the sensitive zones were marked and refined over successive time steps. The adaptive refinement of the domain is shown in Figure 5-9.
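The criterion behind a plasticity-based marker can be sketched as follows. A hypothetical indicator flags an element as soon as the Mohr-Coulomb yield function reaches zero at one of its Gauss points. For simplicity the sketch uses only the in-plane principal stresses (tension positive, σ1 ≥ σ3), writing the yield function as f = (σ1 - σ3) + (σ1 + σ3) sin φ - 2c cos φ. The out-of-plane stress is ignored. The thesis' actual indicator may use a different quantity, for example plastic strain.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// In-plane stress state at a Gauss point (tension positive).
struct Stress2D {
  double xx, yy, xy;
};

// Mohr-Coulomb yield function from the in-plane principal stresses
// s1 >= s3: f = (s1 - s3) + (s1 + s3) sin(phi) - 2 c cos(phi).
// f >= 0 means the stress state has reached the yield surface.
double mohrCoulomb(const Stress2D& s, double cohesion, double phiRad) {
  assert(cohesion >= 0.0);
  const double centre = 0.5 * (s.xx + s.yy);
  const double radius = std::hypot(0.5 * (s.xx - s.yy), s.xy);
  const double s1 = centre + radius;
  const double s3 = centre - radius;
  return (s1 - s3) + (s1 + s3) * std::sin(phiRad) - 2.0 * cohesion * std::cos(phiRad);
}

// Hypothetical plasticity indicator: mark an element for refinement if any
// of its Gauss points is at (or beyond) the yield surface.
bool markForRefinement(const std::vector<Stress2D>& gaussStresses, double cohesion,
                       double phiRad) {
  for (const Stress2D& s : gaussStresses)
    if (mohrCoulomb(s, cohesion, phiRad) >= 0.0) return true;
  return false;
}
```

With the parameters of this example (c = 50 MPa, φ = 20°), the yield function gives the following results. A pure shear stress of 60 MPa is flagged as yielding, while 10 MPa of shear or a hydrostatic compression of 100 MPa is not.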


Figure 5-9: Adaptive Refinement based on Plasticity Indicator


5.4 Extension of notched beam

Like the previous example, this example tests adaptive refinement of the domain, here driven by the “damage indicator”, another physics-based indicator. The domain consists of a rectangular beam with a notch in the middle, as shown in Figure 5-10. The left edge of the beam is constrained in the x direction and the bottom edge in the y direction. The right edge of the beam is subjected to an enforced displacement u. The material properties were:

Young’s modulus 2.0 MPa

Poisson’s ratio 0.3

Constitutive law Isotropic Damage

An exponential softening rule was used for damage; the other damage parameters are listed below:

Yield stress ($f_t$) 0.91 MPa

E0 $f_t/\sqrt{E}$

Ef 100 E0

Figure 5-10: Extension of notched beam problem representation


Figure 5-11: Adaptive Refinement based on Damage Indicator

The damage indicator worked as expected: the regions of damage propagation were marked and refined over successive time steps. The adaptive refinement of the domain is shown in Figure 5-11.
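One common form of exponential softening is sketched below in C++. It assumes that the parameters E0 and Ef listed above play the roles of the damage threshold κ0 and the softening parameter κf. The law is d(κ) = 1 - (κ0/κ) exp(-(κ - κ0)/(κf - κ0)) for κ > κ0 and d = 0 otherwise, where κ is the largest equivalent strain reached so far. The exact form used in p4dama may differ. The indicator function and its threshold are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Exponential softening (one common form, assumed here):
//   d = 0                                                       if kappa <= kappa0
//   d = 1 - (kappa0 / kappa) * exp(-(kappa - kappa0) / (kappaF - kappa0))  otherwise
// kappa: history variable (largest equivalent strain reached so far).
double exponentialDamage(double kappa, double kappa0, double kappaF) {
  assert(kappa0 > 0.0 && kappaF > kappa0);
  if (kappa <= kappa0) return 0.0;
  return 1.0 - (kappa0 / kappa) * std::exp(-(kappa - kappa0) / (kappaF - kappa0));
}

// Damaged stress in 1D: sigma = (1 - d) * E * eps.
double damagedStress(double E, double eps, double d) { return (1.0 - d) * E * eps; }

// Hypothetical damage indicator: mark an element once the largest damage at
// its Gauss points exceeds a chosen threshold.
bool damageIndicator(double maxGaussDamage, double threshold) {
  return maxGaussDamage > threshold;
}
```

In this form the damage grows monotonically from 0 at κ = κ0 towards 1, and the stress is continuous at the onset of damage. A larger ratio κf/κ0, such as the factor 100 in the table, gives a more gradual softening.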


CHAPTER 6

6 CONCLUSIONS

The main objective, developing a Parallel Adaptive Mesh Refinement code, was achieved and verified. The Carstensen and Kelly refinement indicators performed as expected and clearly marked the sensitive zones; notably, the predictions of the Kelly indicator were on par with the conclusions from the literature. Similarly, the physics-based indicators for plasticity and damage performed very well. These outcomes provide a good basis for extending the code to 3D problems and to further physics-based indicators. The code could also be used as a base for scalability studies of large-scale simulations in the future.


REFERENCES

1. Carsten Burstedde, Lucas C. Wilcox, and Omar Ghattas, p4est:

Scalable Algorithms for Parallel Adaptive Mesh Refinement on

Forests of Octrees. SIAM Journal on Scientific Computing 33 no. 3

(2011), pages 1103-1133.

2. Carsten Burstedde, Lucas C. Wilcox, and Tobin Isaac, The p4est software for parallel AMR: How-to document and step-by-step examples, 2014.

3. D. W. Kelly et al., A posteriori error analysis and adaptive processes in the finite element method: Part I. Error analysis, Int. J. Num. Meth. Engrg.

4. J. Alberty et al., Matlab Implementation of the Finite Element Method in Elasticity, DOI 10.1007/s00607-002-1459-8.

5. H. Nguyen-Xuan, T. Nguyen-Thoi, Kaiyang Zeng, S. C. Wu, Assessment of smoothed point interpolation methods for elastic mechanics, Int. J. Numer. Meth. Biomed. Engng. 2010; 26:1635-1655.

6. Joseph E. Flaherty, Adaptive Finite Element Techniques,

http://www.cs.rpi.edu/~flaherje/pdf/fea8.pdf

7. Wolfgang Bangerth, Finite Element Methods in Scientific Computing, http://www.math.tamu.edu/~bangerth/videos/676/slides.15.pdf

8. PETSc, http://www.mcs.anl.gov/petsc, 2016.


9. Chapter 28 Stress Recovery, Introduction to Finite Element Methods,

University of Colorado at Boulder

http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.

Ch28.d/IFEM.Ch28.pdf

10. Wolfgang Bangerth, Carsten Burstedde, Timo Heister, and

Martin Kronbichler, Algorithms and Data Structures for Massively

Parallel Generic Adaptive Finite Element Codes. Published in ACM

Transactions on Mathematical Software 38 No. 2 (2011), pages 14:1-

14:28.

11. Carsten Burstedde, Omar Ghattas, Georg Stadler, Lucas C.

Wilcox, Adaptive Mesh Refinement (AMR), CIG meeting on

Opportunities and Challenges in Computational Geophysics

California Institute of Technology, March 30, 2009.

12. Carsten Burstedde, Forest-of-octrees AMR: algorithms and

interfaces, Second [HPC]3 Workshop KAUST, Saudi Arabia, Feb 05,

2012.

13. Carsten Burstedde, Parallel adaptive mesh refinement using multiple octrees and the p4est software, August 29th, 2013.

14. Kevin M. Olson, Peter MacNeice, PARAMESH: A Parallel

Adaptive Mesh Refinement Community Toolkit, 1999.

15. B. S. Kirk, J. W. Peterson, R. H. Stogner, and G. F. Carey, libMesh: A C++ Library for Parallel Adaptive Mesh Refinement/Coarsening Simulations, Engineering with Computers, 22(3-4):237-254, 2006.

16. Jacob Fish, Ted Belytschko, A First Course in Finite Elements,

ISBN 978-0-470-03580-1


17. Berger, M. J. and Oliger, J. (1984). Adaptive Mesh Refinement

for Hyperbolic Partial Differential Equations. J. Comp. Phy., 53:484–

512.

18. D. L. De Zeeuw. A Quadtree-Based Adaptively-Refined

Cartesian-Grid Algorithm for Solution of the Euler Equations. PhD

thesis, University of Michigan, September 1993.

19. B. van der Holst and R. Keppens, Hybrid block-AMR in Cartesian and curvilinear coordinates: MHD applications, Journal of Computational Physics, 226:925-946, 2007.
