PARALLEL ADAPTIVE FINITE ELEMENT SIMULATION USING DISTRIBUTED QUAD-TREE/OCTREE FOREST
ARAVINTH RAMACHANDRAN
Vietnamese-German University
Thesis submitted in partial fulfilment
of the requirements for the Degree of
Master of Science in Computational Engineering
supervisors
Prof. Dr. techn Günther Meschke
MSc. Hoang-Giang Bui
Institute for Structural Mechanics
Ruhr University Bochum
Bochum, Germany
November, 2016
ABSTRACT
Adaptive mesh refinement (AMR) yields discretizations that improve the
accuracy of the solution per degree of freedom and thereby reduce the
computational cost. In this thesis, a parallel adaptive finite element
implementation in C++ is described. The implementation is based on p4est,
a parallel mesh refinement library. The hanging nodes that result from
adaptive refinement are resolved using constraint conditions. The scalable
algorithms in the p4est library are built on octree-based adaptivity, which
allows dynamic adaptivity to be incorporated in parallel. The parallel AMR
finite element package is developed as a modular platform on which AMR
applications can be developed and extended easily. The capabilities of
these applications are demonstrated using various numerical examples.
ACKNOWLEDGEMENT
Firstly, I would like to express my sincere gratitude to my supervisors
Prof. Dr. techn Günther Meschke and MSc. Hoang-Giang Bui for their
continuous support of my master thesis research. I would also like to extend
my thanks to Hoang-Giang Bui for his patience, motivation and guidance, which
helped me throughout the research and writing of this thesis. My sincere
thanks also go to DAAD for offering me the scholarship to carry out the
research at Ruhr University Bochum. I thank Dr. Han Duc Tran, my academic
coordinator, for his support throughout the thesis. Last but not least, I would
like to thank my parents for their support throughout my life.
TABLE OF CONTENTS
Abstract
Acknowledgement
Table of Contents
List of Figures
1 Introduction
1.1 Motivation for Adaptive Mesh Refinement
1.2 Objectives
1.3 Thesis Organisation
2 Literature Review
3 Methodology
3.1 Treatment of Hanging Nodes
3.2 Application of Constraint Condition to FEM System
3.3 Refinement Indicators
3.3.1 Kelly Error Estimator
3.3.2 Carstensen Error Estimator
3.4 Data Transfer for Non-Linear AMR
4 Implementation
4.1 p4est and Parallel AMR Implementation
4.1.1 p4est Interface Schematics
4.1.2 2:1 Balancing
4.1.3 Parallel Repartitioning
4.2 Structure of Parallel Adaptive Code
5 Numerical Examples
5.1 Patch test
5.2 Infinite plate with a circular hole
5.3 Slope stability
5.4 Extension of notched beam
6 Conclusions
References
LIST OF FIGURES
Figure 1-1: Linear Elasticity Boundary Value Problem
Figure 2-1: Types of Adaptive Mesh Refinement
Figure 3-1: Adaptive Mesh Refinement with Hanging Node
Figure 3-2: Hanging Node along Edge 1-2
Figure 3-3: Sample domain with Hanging Node
Figure 3-4: Assembly of RHS based on constraint condition
Figure 3-5: Non-Linear AMR Data Transfer
Figure 4-1: Parallel AMR Tool components
Figure 4-2: Parallel AMR Tool features
Figure 4-3: p4est: Tree and Forest terminology [12,13]
Figure 4-4: Adaptive Mesh Refinement Simulation using p4est [2]
Figure 4-5: 2:1 Balancing
Figure 4-6: Parallel Partitioning [11,12,13]
Figure 4-7: Parallel AMR Tool Generic layout
Figure 4-8: Hanging Node Decoder
Figure 5-1: Patch test problem representation
Figure 5-2: Displacement of the domain along x and y direction
Figure 5-3: Stress distribution along x direction
Figure 5-4: Infinite plate with a circular hole
Figure 5-5: Stress distribution along x direction
Figure 5-6: Parallel Repartitioning colour coded by MPI rank
Figure 5-7: Comparison of strain energy convergence
Figure 5-8: Slope stability problem representation
Figure 5-9: Adaptive Refinement based on Plasticity Indicator
Figure 5-10: Extension of notched beam problem representation
Figure 5-11: Adaptive Refinement based on Damage Indicator
CHAPTER 1
1 INTRODUCTION
Many physical phenomena in engineering and science can be
described in terms of partial differential equations. In general, solving these
equations by classical analytical methods for arbitrary shapes is almost
impossible. The finite element method (FEM) is a numerical technique for
solving problems which are described by partial differential equations or
can be formulated as functional minimization. A domain of interest is
represented as an assembly of finite elements. Approximating functions in
finite elements are determined in terms of nodal values of a physical field
which is sought. A continuous physical problem is transformed into a
discretized finite element problem with unknown nodal values [16]. The
individual elements are connected by a topological map called a mesh.
When modelling problems involving localized deformation, steep gradients
or discontinuous surfaces, a very fine mesh is usually required for accurate
results. To reduce the computational time, adaptive refinement techniques
are usually preferred over the uniform mesh refinement.
1.1 Motivation for Adaptive Mesh Refinement
Consider the linear elastic problem on an arbitrarily shaped domain Ω
shown in Figure 1-1, subjected to traction along the boundary $\Gamma_t$.
Figure 1-1: Linear Elasticity Boundary Value Problem
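The governing equations for linear elasticity are as follows.
Equilibrium equation:
$$\operatorname{div}\boldsymbol{\sigma} + \rho\mathbf{b} = \mathbf{0} \quad \text{on } \Omega$$
Kinematic equation:
$$\boldsymbol{\varepsilon} = \nabla^{s}\mathbf{u} = \tfrac{1}{2}\left(\nabla\mathbf{u} + \nabla^{T}\mathbf{u}\right)$$
Constitutive equation:
$$\boldsymbol{\sigma} = \mathbb{C} : \boldsymbol{\varepsilon}$$
The displacement and traction boundary conditions on the domain are given as
$$\mathbf{u} = \bar{\mathbf{u}} \quad \text{on } \Gamma_u$$
$$\mathbf{t} = \bar{\mathbf{t}} \quad \text{on } \Gamma_t$$
The variational form, or weak form, of the governing equations is
$$\int_{\Omega} \delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}\, dV = \int_{\Omega} \delta\mathbf{u} \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u} \cdot \bar{\mathbf{t}}\, dA$$
where $\delta\mathbf{u} \in H_D^1(\Omega)$ and $\mathbf{u}$ is sought such that
$$\mathbf{u} \in H_D^1(\Omega) = \{\varphi \in H^1(\Omega) : \varphi = \bar{\mathbf{u}} \ \text{on } \Gamma_u\}$$
We triangulate the domain to obtain the triangulation (mesh) $\mathcal{T}$ and
choose a discrete Sobolev space defined over the mesh $\mathcal{T}$ such that
$$S_D^1(\mathcal{T}) \subset H_D^1(\Omega)$$
The solution $\mathbf{u}_h \in S_D^1(\mathcal{T})$ is sought satisfying, for all $\delta\mathbf{u}_h \in S_D^1(\mathcal{T})$,
$$\int_{\Omega} \delta\boldsymbol{\varepsilon}_h : \boldsymbol{\sigma}_h\, dV = \int_{\Omega} \delta\mathbf{u}_h \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u}_h \cdot \bar{\mathbf{t}}\, dA$$
The residual of the solution is given as
$$\int_{\Omega} \delta\boldsymbol{\varepsilon} : (\boldsymbol{\sigma} - \boldsymbol{\sigma}_h)\, dV = \int_{\Omega} \delta\mathbf{u} \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u} \cdot \bar{\mathbf{t}}\, dA - \int_{\Omega} \delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}_h\, dV = \mathfrak{R}_{\Omega}(\mathbf{u}, \mathbf{u}_h)$$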
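The Friedrichs and Cauchy-Schwarz inequalities show that
$$\sup_{\substack{\delta\mathbf{u} \in H_D^1(\Omega) \\ \|\delta\mathbf{u}\|_{H^1(\Omega)} = 1}} \{\mathfrak{R}_{\Omega}(\mathbf{u}, \mathbf{u}_h)\} \le \|\mathcal{H}(\mathbf{u}) : (\boldsymbol{\sigma} - \boldsymbol{\sigma}_h)\|_{H^1(\Omega)} \le \sqrt{1 + c_{\Omega}^2}\, \sup_{\substack{\delta\mathbf{u} \in H_D^1(\Omega) \\ \|\delta\mathbf{u}\|_{H^1(\Omega)} = 1}} \{\mathfrak{R}_{\Omega}(\mathbf{u}, \mathbf{u}_h)\}$$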
This indicates that the finite element solution error is bounded above
and below by the residual of the solution and when the residual is driven
down the accuracy of the solution improves.
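$$\sup_{\substack{\delta\mathbf{u} \in H_D^1(\Omega) \\ \|\delta\mathbf{u}\|_{H^1(\Omega)} = 1}} \left\{ \int_{\Omega} \delta\mathbf{u} \cdot \rho\mathbf{b}\, dV + \int_{\Gamma_t} \delta\mathbf{u} \cdot \bar{\mathbf{t}}\, dA - \int_{\Omega} \delta\boldsymbol{\varepsilon} : \boldsymbol{\sigma}_h\, dV \right\} \to \min$$
$$\|\mathbf{u} - \mathbf{u}_h\|_{H^1(\Omega)} \le c \left\{ \sum_{K \in \mathcal{T}} h_K^2\, \|R_K(\mathbf{u}_h)\|^2 + \sum_{K \in \mathcal{T}} h_E\, \|R_E(\mathbf{u}_h)\|^2 \right\}$$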
This implies that the error is dominated by the largest cell size and the
(global) norm of the residual, which suggests that the mesh must be refined
globally to reduce the error.
However, a closer analysis shows that the mesh needs to be refined only
where the local residual is large. In this way the accuracy of the solution
is retained at a reduced computational cost.
1.2 Objectives
The main objective of this thesis is to develop a C++ based Parallel
Adaptive Mesh Refinement FE tool that provides a platform for developing
Parallel AMR applications easily. To achieve this, the tool is based on
open source libraries that provide parallelism, adaptive mesh refinement
and solver capabilities. The next objective is to verify and validate the
functioning of the various features of the tool. This is done by creating
sample applications that cover different numerical problems. The output
from these applications is validated to ensure the proper implementation
and working of the tool.
1.3 Thesis Organisation
The thesis is structured as follows. Following this introduction in
Chapter 1, the different AMR techniques, with their advantages and
disadvantages, are presented in Chapter 2. In Chapter 3, the different
methodologies used in the code development are discussed. Chapter 4
focuses on the implementation of the methods in the code and discusses
the structure of the code. The validation of the Parallel AMR code and
its outcomes are discussed in Chapter 5. The conclusions of the thesis
are highlighted in Chapter 6.
CHAPTER 2
2 LITERATURE REVIEW
The principal idea of adaptive mesh refinement (AMR) is to enable a
higher accuracy solution at lower costs, by optimal distribution of grid
points for the computational region. In a nutshell, AMR is a hierarchical
inter-mesh communication scheme. It relies on locally refined mesh or mesh
patches to increase the resolution of an underlying coarse mesh only where
needed. It can ease some of the difficulties in generating a high-quality
grid and reduce the number of iterations between grid generation and
solution that are required to tailor the grid to the specification of a
problem. Thus, it can offer savings of orders of magnitude in computational
and storage costs over an equivalent uniformly refined mesh.
The Adaptive Mesh Refinement strategies can be classified into four
broad categories depending on the partitioning algorithm and/or data
structure used to encode connectivity. The different approaches are, patch-
based AMR methods, cell-based AMR methods, block-based AMR
methods, and hybrid block-based AMR techniques. Figures 2-1 (a) to 2-1 (c)
show the refinement resulting from the patch-based, cell-based, and
block-based AMR schemes. The elements marked for refinement are highlighted
in the base Cartesian mesh.
Figure 2-1: Types of Adaptive Mesh Refinement
Berger et al. [17] introduced the concept of AMR on structured grids.
This approach is now more generally referred to as patch-based AMR. The
technique begins with a coarse base-level Cartesian grid and, as the
calculation progresses, individual grid cells are marked for refinement. The
patch-based AMR strategy relies on algorithms to organize collections of
individual computational cells into rectangular patches. The mesh within
these newly formed patches can then be further refined, creating additional
nested patches.
The cell-based AMR, as proposed and discussed in [18], refines cells
individually and stores them in a tree data structure (quad-tree in two
dimensions and octree in three dimensions). This cell-based tree structure
is flexible and readily allows for local refinement of the mesh by keeping
track of the computational cell connectivity as new grid points are
generated by the refinement process (4 new cells in two dimensions and 8 in
three dimensions).
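The cell-based refinement described above can be illustrated with a minimal sketch, assuming a pointer-based quad-tree over a square domain. The names `Cell`, `refine` and `count_leaves` are hypothetical and are not taken from [18] or from any library; a production code would more likely store the leaves in a linear array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// One cell of a quad-tree covering the square [x, x + size] x [y, y + size].
// Illustrative data structure only (an assumption of this sketch).
struct Cell {
    double x = 0.0, y = 0.0, size = 1.0;
    int level = 0;
    std::array<std::unique_ptr<Cell>, 4> children;  // all null for a leaf

    bool is_leaf() const { return children[0] == nullptr; }
};

// Split a leaf into 4 children of half the edge length (8 in an octree).
void refine(Cell& cell) {
    assert(cell.is_leaf());
    const double half = 0.5 * cell.size;
    for (int k = 0; k < 4; ++k) {
        auto child = std::make_unique<Cell>();
        child->x = cell.x + (k % 2) * half;  // k = 0..3 walks the children in z-order
        child->y = cell.y + (k / 2) * half;
        child->size = half;
        child->level = cell.level + 1;
        cell.children[k] = std::move(child);
    }
}

// The leaves of the tree are the computational cells of the mesh.
std::size_t count_leaves(const Cell& cell) {
    if (cell.is_leaf()) return 1;
    std::size_t n = 0;
    for (const auto& child : cell.children) n += count_leaves(*child);
    return n;
}
```

Refining the root once and then one of its children gives 4 and then 7 leaves, since each refinement replaces one leaf by four.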
In a block-based AMR strategy, mesh adaptation is accomplished by
the dividing and coarsening of entire predefined grid blocks or groupings of
cells. Although not required, each of the groupings or blocks generally has
an equal number of cells. Tree data structures are again used for tracking
block connectivity and mesh refinement; however, the block-based AMR
strategy results in a much lighter tree structure than that of cell-based
methods, which typically create larger numbers of mesh cells during the
refinement process.
A hybrid block-based AMR approach has been explained by Holst and
Keppens [19]. The hybrid approach incorporates methods from patch-based
strategies to modify the tree data structure. The proposed hybrid AMR
strategy requires two means to traverse the grid hierarchy: in addition to
the tree data structure, a linked list of grid pointers is also needed.
To achieve large-scale adaptivity, the adaptivity algorithms must be as
scalable as the numerical algorithms. The key features required for
large-scale adaptivity, as discussed in [11], are efficient mesh
partitioning, access to connectivity information at the local level, and
exchange of connectivity information across processes (i.e. parallel
repartitioning).
In domain decomposition or mesh partitioning, each element must be
assigned to a unique processor. The whole mesh is not stored on any
single processor; the storage of the mesh is distributed. The
connectivity information between elements must be known beforehand to
facilitate the exchange of numerical information with neighbouring
elements. The next requirement for the algorithm is to efficiently find
the neighbouring elements, the owner processor of each neighbour and
the degrees of freedom. The final requirement is a global redistribution
of elements. The main objective of parallel partitioning is to minimize
the run time. In addition to all of the above, the connectivity
information must be encoded with minimal storage.
The cell- or tree-based AMR algorithm is found to satisfy all of the
above requirements exceptionally well [12,13]. The octree algorithm is a
hierarchical AMR scheme that uses space-filling curves (SFC). The SFC
ensures fast parallel partitioning and elementary operations. The SFC
is a direct representation of the octree structure and thus provides a
compact encoding of the connectivity. Also, the SFC maps a 1D curve onto
the 2D or 3D domain, which gives an effective ordering of the elements.
Scaling to large problems is possible because the tree is a recursive
structure, and the simple traversal of the tree leaves makes the scheme
cache-friendly.
The octree-based structure makes the transfer of information within a
process and across processes very effective [11]. The search for a
same-processor neighbour element is essentially a tree search executed
in $O(\log(N/P))$. Similarly, finding the owner processor of a neighbour
is a binary search executed in $O(\log P)$. The search for a parent or
child is a vertical tree step and has runtime $O(1)$.
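The ordering along the space-filling curve and the $O(\log P)$ owner search can be sketched as follows, assuming a Morton (z-order) curve in 2D and assuming that every process knows the first curve index owned by each process. The function names `morton2d` and `find_owner` are hypothetical and do not belong to the p4est API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Morton (z-order) index of the integer cell coordinates (x, y):
// bit i of x goes to bit 2i, bit i of y goes to bit 2i + 1.
std::uint64_t morton2d(std::uint32_t x, std::uint32_t y) {
    std::uint64_t z = 0;
    for (int i = 0; i < 32; ++i) {
        z |= static_cast<std::uint64_t>((x >> i) & 1u) << (2 * i);
        z |= static_cast<std::uint64_t>((y >> i) & 1u) << (2 * i + 1);
    }
    return z;
}

// first_index[p] is the smallest curve index owned by process p.
// The array is sorted ascending with first_index[0] == 0 (assumption).
// A binary search over P entries finds the owner in O(log P).
int find_owner(const std::vector<std::uint64_t>& first_index, std::uint64_t z) {
    const auto it = std::upper_bound(first_index.begin(), first_index.end(), z);
    return static_cast<int>(it - first_index.begin()) - 1;
}
```

Because each process owns one contiguous segment of the curve, one sorted array of P segment starts is enough to locate the owner of any element.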
To summarize, the octree-based AMR algorithm offers:
Optimal cache behaviour due to space-filling curve ordering.
Straightforward expression of hanging nodes for FEM.
Extensibility to complex geometries.
Unrestrained scalability.
All of the functionalities discussed above are available in p4est [1], a
lightweight parallel adaptive mesh library. p4est treats the mesh topology
as a forest of octrees and hence can handle complex geometries. The
connectivity is encoded with a minimal memory footprint: 24 bytes per
element for local storage and 32 bytes per processor for global storage
[11]. p4est can generate and adapt multi-octree meshes with up to
$5.13 \times 10^{11}$ octants on as many as 220,320 CPU cores and execute
the 2:1 balance algorithm in less than 10 seconds per million octants per
process. p4est also provides an interface to these algorithms, which makes
it a suitable candidate for the development of the Parallel Adaptive Mesh
Refinement tool.
CHAPTER 3
3 METHODOLOGY
In this chapter the strategies used in implementing the adaptive mesh
refinement are discussed. To develop the Adaptive Mesh Refinement code,
the following operations must be handled effectively.
Treatment of Hanging Nodes using Constraint Conditions.
Application of Constraint Condition to FEM System
Refinement Indicators
Data Transfer for Non-Linear AMR
The occurrence of these conditions and the specific strategies that were
used to handle them are discussed below.
3.1 Treatment of Hanging Nodes
In Adaptive Mesh Refinement, refining the coarse mesh locally leads to
the occurrence of “Hanging Nodes”, as shown in Figure 3-1. With the
introduction of hanging nodes the mesh is no longer conforming, and the
standard method of basis construction would create discontinuities. The
strategy for maintaining continuity at hanging nodes is to constrain the
basis [6,7].
Figure 3-1: Adaptive Mesh Refinement with Hanging Node
To constrain the basis, the shape functions are defined on each cell as
usual. The functions in the finite element approximation are defined as
linear combinations of the shape functions.
Thereby the functions in the solution space are globally continuous. The
idea is explained in detail in the following example for a piecewise-bilinear
basis but can be extended to higher order polynomials similarly.
Let us consider the edge between vertices 1 and 2 containing a
hanging node at 3 as shown in Figure 3-2. The size of the element is taken
as ‘h’ with the coordinate system located at vertex 2. The elements along
edge 1-2 are indexed E1, E2 and E3 as shown in Figure 3-2.
Figure 3-2: Hanging Node along Edge 1-2
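Firstly, the shape function on each element is constructed as usual.
The shape functions of the elements that are non-zero on the edge 1-2 are
given by the equations below. The second subscript on $N_{je}$ denotes the
element index.
For Element 1 (E1),
$$N_{11} = \left(\frac{h + x}{h}\right)\left(\frac{y}{h}\right), \qquad N_{21} = \left(\frac{h + x}{h}\right)\left(\frac{h - y}{h}\right)$$
For Element 2 (E2),
$$N_{12} = \left(\frac{h/2 - x}{h/2}\right)\left(\frac{y - h/2}{h/2}\right), \qquad N_{32} = \left(\frac{h/2 - x}{h/2}\right)\left(\frac{h - y}{h/2}\right)$$
For Element 3 (E3),
$$N_{23} = \left(\frac{h/2 - x}{h/2}\right)\left(\frac{h/2 - y}{h/2}\right), \qquad N_{33} = \left(\frac{h/2 - x}{h/2}\right)\left(\frac{y}{h/2}\right)$$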
Secondly, the functions in the solution space are taken as linear
combination of shape functions. This ensures the functions in solution space
are globally continuous.
The displacement on Element 1 along edge 1-2 is given as
$$U(x, y) = u_1 N_{11}(x, y) + u_2 N_{21}(x, y) \tag{3.1}$$
Evaluating this at Node 3 yields
$$U(x_3, y_3) = \frac{u_1 + u_2}{2}, \quad x < 0 \tag{3.2}$$
Similarly, the displacement on Elements 2 and 3 along edge 1-2 is given as
$$U(x, y) = \begin{cases} u_1 N_{12}(x, y) + u_3 N_{32}(x, y), & \text{if } y \ge h/2 \\ u_2 N_{23}(x, y) + u_3 N_{33}(x, y), & \text{if } y < h/2 \end{cases}$$
In either case, at Node 3,
$$U(x_3, y_3) = u_3, \quad x > 0 \tag{3.3}$$
Equating Equations 3.2 and 3.3 for $U(x_3, y_3)$ yields the “constraint
condition”:
$$u_3 = \frac{u_1 + u_2}{2}$$
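The effect of the constraint condition can be checked numerically with a small sketch. It evaluates the displacement on both sides of edge 1-2 (x = 0), with the origin at vertex 2 as in Figure 3-2, and takes $N_{21}$ to carry the factor $(h - y)/h$ so that it equals one at vertex 2 (an assumption of this sketch). The function names are hypothetical.

```cpp
#include <cmath>

// Shape functions of the coarse element E1 (x in [-h, 0], y in [0, h]).
// Node 1 sits at (0, h) and node 2 at (0, 0).
double N11(double x, double y, double h) { return (h + x) / h * (y / h); }
double N21(double x, double y, double h) { return (h + x) / h * ((h - y) / h); }

// Displacement seen from the coarse side of edge 1-2 (x = 0).
double coarse_trace(double u1, double u2, double y, double h) {
    return u1 * N11(0.0, y, h) + u2 * N21(0.0, y, h);
}

// Displacement seen from the fine side: E2 for y >= h/2, E3 below.
// The x-dependent factor (h/2 - x) / (h/2) equals one on the edge.
double fine_trace(double u1, double u2, double u3, double y, double h) {
    const double hh = 0.5 * h;
    if (y >= hh) return u1 * (y - hh) / hh + u3 * (h - y) / hh;
    return u2 * (hh - y) / hh + u3 * y / hh;
}

// Constraint condition for the hanging node 3.
double constrained_u3(double u1, double u2) { return 0.5 * (u1 + u2); }
```

With `u3 = constrained_u3(u1, u2)` the two traces agree at every point of the edge; with any other value of `u3` they differ, which is the discontinuity the constraint removes.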
3.2 Application of Constraint Condition to FEM System
In this section, the procedure to apply the constraint conditions to the
finite element system is discussed. A key feature of FEM is that the same
operation is repeated on every element. This property is effective when
assembling large systems, as the process remains the same throughout the
assembly procedure.
This means that the elementary operation must remain the same
irrespective of the presence or absence of hanging nodes. This is achieved
by the following procedure, as outlined in [7,10].
Step 1: Build the local stiffness matrix (Ke) and residual force vector
(Fe) with all DOFs as if there were no hanging nodes.
For instance, for the system shown in Figure 3-3, the residual force
vectors of the elements E1, E2 and E3 are given as follows.
For Element (E1)
$$\begin{bmatrix} F_1^1 \\ F_2^1 \\ F_3^1 \\ F_4^1 \end{bmatrix}$$
For Element (E2)
$$\begin{bmatrix} F_6^2 \\ F_7^2 \\ F_4^2 \\ F_{10}^2 \end{bmatrix}$$
For Element (E3)
$$\begin{bmatrix} F_2^3 \\ F_5^3 \\ F_6^3 \\ F_7^3 \end{bmatrix}$$
Figure 3-3: Sample domain with Hanging Node
Step 2: Modify when copying local contributions into the global matrices
K, F.
In this step, the constraint conditions are applied while the global
stiffness matrix and residual vector are built from the local
contributions. The contribution of the hanging node is distributed to its
support nodes and the hanging node is eliminated from the system.
Figure 3-4 illustrates this operation.
Figure 3-4: Assembly of RHS based on constraint condition
Step 3: Solve the system KU=F.
Step 4: Get all components of U and distribute.
Once the system of equations is solved, the displacements of the
independent nodes are extracted and distributed to the hanging nodes.
In this example, it would be
$$u_6 = \frac{u_2 + u_4}{2}$$
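Steps 2 and 4 can be sketched as follows, assuming dense global storage for brevity and a map from each hanging DOF to its support DOFs and weights. The type and function names (`ConstraintMap`, `assemble`, `distribute`) are hypothetical and are not the interface of the tool described in this thesis; the parallel code would use distributed PETSc matrices and vectors instead.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A hanging DOF written as a weighted sum of independent DOFs.
using Constraint = std::vector<std::pair<int, double>>;  // (support DOF, weight)
using ConstraintMap = std::map<int, Constraint>;         // hanging DOF -> constraint

// Independent DOFs map to themselves with weight 1,
// hanging DOFs map to their support DOFs.
Constraint expand(int dof, const ConstraintMap& cm) {
    const auto it = cm.find(dof);
    if (it == cm.end()) return Constraint{std::make_pair(dof, 1.0)};
    return it->second;
}

// Step 2: add the local Ke, Fe into the global K, F while distributing
// every hanging-node contribution to its support nodes.
void assemble(const std::vector<int>& dofs,
              const std::vector<std::vector<double>>& Ke,
              const std::vector<double>& Fe,
              const ConstraintMap& cm,
              std::vector<std::vector<double>>& K,
              std::vector<double>& F) {
    for (std::size_t a = 0; a < dofs.size(); ++a) {
        const Constraint rows = expand(dofs[a], cm);
        for (const auto& r : rows) {
            F[r.first] += r.second * Fe[a];
            for (std::size_t b = 0; b < dofs.size(); ++b) {
                const Constraint cols = expand(dofs[b], cm);
                for (const auto& c : cols) {
                    K[r.first][c.first] += r.second * c.second * Ke[a][b];
                }
            }
        }
    }
}

// Step 4: recover the hanging-node values from the solved independent DOFs.
void distribute(const ConstraintMap& cm, std::vector<double>& U) {
    for (const auto& entry : cm) {
        double value = 0.0;
        for (const auto& term : entry.second) value += term.second * U[term.first];
        U[entry.first] = value;
    }
}
```

For the example above, the map would hold the single entry 6 -> {(2, 0.5), (4, 0.5)}. In this sketch the rows and columns of the hanging DOF simply stay empty; a real code would drop them from the numbering or place a unit diagonal there before solving.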
3.3 Refinement Indicators
The next key feature for any adaptive refinement code is the
refinement indicator. The refinement indicator marks the elements that need
to be refined in the following iterative step. The choice of the element is to
be made such that the sensitive regions of the domain that impact the
convergence of the solution are covered.
The refinement indicators are problem specific. They are chosen such
that the selection of elements reflects the nature of the problem. The
refinement indicator can be either a physical parameter related to the
problem or an a posteriori local error estimator.
An example of a physics-based indicator is the damage indicator for
problems that model damage. As local error estimators, two types of
error estimator are implemented; they are discussed in detail in the
following sections.
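How an indicator is turned into a set of marked elements is not specified here. One common choice, shown below purely as an assumption, is to mark every element whose indicator exceeds a fraction `theta` of the largest indicator. The function name `mark_for_refinement` and the parameter `theta` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mark element e for refinement when eta[e] > theta * max(eta).
// This maximum-based marking rule is an assumption of this sketch;
// the indicator values may come from an error estimator or from a
// physical quantity such as damage.
std::vector<bool> mark_for_refinement(const std::vector<double>& eta, double theta) {
    std::vector<bool> marked(eta.size(), false);
    if (eta.empty()) return marked;
    const double eta_max = *std::max_element(eta.begin(), eta.end());
    for (std::size_t e = 0; e < eta.size(); ++e) {
        marked[e] = eta[e] > theta * eta_max;
    }
    return marked;
}
```

A smaller `theta` marks more elements per iteration, which reduces the number of adaptive steps at the price of more degrees of freedom.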
3.3.1 Kelly Error Estimator
The Kelly error estimator belongs to the class of explicit error
estimators, which involve the direct computation of the interior element
residuals and the jumps at the element boundaries to calculate the error in
the energy norm. From [3] it can be noted that the contribution of the
interior residual is negligible in comparison to the jump across the
boundary, and the local error is given as
$$\eta^2 = \sum \frac{h}{24a} \int_{\Gamma} J^2 \, d\Gamma$$
where J is the jump of the gradient across the element edge γ, h is the
size of the element and a is a constant related to the material properties.
For instance, for a problem defined by Poisson’s equation with
Dirichlet and Neumann boundary conditions,
$$u = 0 \ \text{on } \Gamma_D; \qquad n \cdot \nabla u - g = 0 \ \text{on } \Gamma_N$$
the jump in the gradient is defined as
$$J = \begin{cases} n \cdot \nabla u_h + n' \cdot \nabla u'_h & \text{if } \gamma \nsubseteq \Gamma \\ g - n \cdot \nabla u_h & \text{if } \gamma \subset \Gamma_N \\ 0 & \text{if } \gamma \subset \Gamma_D \end{cases}$$
where, on inter-element edges γ ⊈ Γ, the edge γ separates the elements.
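A minimal sketch of the element-wise evaluation, assuming that the normal derivatives on each edge have already been computed and that the jump is constant along an edge, so that the edge integral reduces to the squared jump times the edge length (otherwise a quadrature rule would take its place). The names `EdgeKind`, `EdgeJump`, `edge_jump` and `kelly_eta_squared` are hypothetical.

```cpp
#include <cmath>
#include <vector>

enum class EdgeKind { Interior, Neumann, Dirichlet };

// Jump J on one edge. flux_here is n . grad(u_h) of the element itself,
// flux_neighbour is n' . grad(u'_h) of the neighbour with its own outward
// normal n' (unused on boundary edges), g is the prescribed Neumann data.
double edge_jump(EdgeKind kind, double flux_here, double flux_neighbour, double g) {
    switch (kind) {
        case EdgeKind::Interior:  return flux_here + flux_neighbour;
        case EdgeKind::Neumann:   return g - flux_here;
        case EdgeKind::Dirichlet: return 0.0;
    }
    return 0.0;
}

struct EdgeJump {
    double jump;    // J, assumed constant along the edge
    double length;  // length of the edge
};

// eta^2 = sum over the edges of h / (24 a) * J^2 * length.
double kelly_eta_squared(const std::vector<EdgeJump>& edges, double h, double a) {
    double sum = 0.0;
    for (const EdgeJump& e : edges) {
        sum += h / (24.0 * a) * e.jump * e.jump * e.length;
    }
    return sum;
}
```

Because the two outward normals on an interior edge point in opposite directions, a continuous normal gradient gives `flux_here + flux_neighbour == 0` and the edge does not contribute.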
3.3.2 Carstensen Error Estimator
The Carstensen error estimator is also an explicit error estimator, based
on the comparison between the averaged stress field and the discrete
(discontinuous) stress field. The error estimator is based on the local
error indicator
$$\eta_T = \|\sigma_h^* - \sigma_h\|_{L^2(T)}$$
where $\sigma_h^*$ is the averaged stress field and $\sigma_h$ is the discrete stress field.
The error estimator is given as
$$\eta = \left(\sum_T \eta_T^2\right)^{1/2}$$
This error estimator is used in practical computation as a representation of
the discretization error. From [4], the relative quantity
$$\eta = \frac{\sqrt{\sum_T \eta_T^2}}{\sqrt{\sum_T \int_T \sigma_h^* : \mathbb{C}^{-1} \sigma_h^* \, dx}} \quad \text{with} \quad \eta_T^2 = \int_T (\sigma_h - \sigma_h^*) : \mathbb{C}^{-1} (\sigma_h - \sigma_h^*) \, dx$$
is used as the refinement indicator.
In the code, both the physics-based indicator and the indicators based
on error estimators have been implemented and tested. Chapter 5, Numerical
Examples, explains the implementation of the different indicators and their
outputs for various cases. In example 2 of Chapter 5, a displacement error
indicator was implemented since the analytical solution was available
beforehand. The displacement error was calculated as the difference between
the analytical solution and the discrete solution.
3.4 Data Transfer for Non-Linear AMR
In this section, the procedure to transfer data between the iterations is
outlined. This is a vital requirement for problems that involve non-linear
modelling. For adaptive mesh refinement in a non-linear analysis, the data
from the old mesh, such as stresses at Gauss points and other history
variables, must be transferred to the new mesh. The variable transfer
procedure is executed in the following steps, as depicted in Figure 3-5.
Step 1: The variables at the Gauss point locations of the old mesh are
transferred to the nodes using a global L2 projection.
Step 2: Once the mesh is locally refined, the variables are transferred from
the existing nodes to the new nodes based on the constraint condition, which
accounts for hanging nodes as well.
Step 3: Finally, the variables at the nodes are transferred back to the
Gauss points using a standard interpolation procedure.
Figure 3-5: Non-Linear AMR Data Transfer
CHAPTER 4
4 IMPLEMENTATION
This chapter details the implementation of the Parallel Adaptive Mesh
Refinement code, based on the ideas discussed in the previous chapter.
The software and its features are described in two parts. The first
part describes p4est, the tree management library, and the
implementations built on it; the second part covers the software layout
and its features.
The Parallel AMR tool was developed in the C++ programming language.
The two major open source libraries that form the backbone of the tool
are:
PETSc
p4est
Figure 4-1: Parallel AMR Tool components
The Portable, Extensible Toolkit for Scientific Computation
(PETSc) [8] is a library of parallel linear and non-linear solvers for
applications modelled by partial differential equations. The Parallel
AMR tool uses PETSc features such as matrices, vectors and the SNES
solvers to solve the linear and non-linear equations in parallel.
Figure 4-2 shows the salient features of the tool. Features such as
domain decomposition, 2:1 balancing and parallel repartitioning are
achieved using the p4est library. The hanging node constraints are
calculated and applied as described in Chapter 3. The numerical solver
capabilities are inherited from the PETSc library.
Figure 4-2: Parallel AMR Tool features
4.1 p4est and Parallel AMR Implementation
p4est [1] is a software library that enables the dynamic management
of a collection of adaptive octrees, also termed a forest of octrees. The
term “octree” denotes a recursive tree structure where each node is
either a leaf or has eight children. The analogous construction in two
dimensions is termed “quad-tree,” where nodes have four children
instead of eight. Octrees and quad-trees can be associated with 3D and
2D cubic domains, where tree nodes are called octants and quadrants,
respectively, and the root node corresponds to a cubic domain that is
recursively subdivided according to the tree structure. The concept of
the tree is shown in Figure 4-3 [12,13]. In a nutshell, p4est provides
parallel AMR on forests of octrees.
The term “forest” is used to describe a collection of such logical
cubes that are connected conformingly through faces, edges, or corners,
each cube corresponding to an independent tree. Each octant or quadrant
belongs to precisely one process and is stored only there; this
distributed storage is implied by the term “Parallel”.
Figure 4-3: p4est: Tree and Forest terminology [12,13]
4.1.1 p4est Interface Schematics
The p4est library has many application program interfaces to
establish parallel adaptive refinement. The library is a pure AMR
module. This is possible since p4est separates the AMR topology from
any numerical information: The former is stored and modified internal
to the p4est software library, while an application is free to define the
way it uses this information and arranges numerical and other data [2].
Figure 4-4: Adaptive Mesh Refinement Simulation using p4est [2]
A general AMR simulation pipeline using p4est is shown in
Figure 4-4 [2]. The general task flow for an AMR application using
p4est can be divided into three parts:
Create a coarse mesh connectivity.
Modify the mesh by subjecting it to refinement and partitioning.
Relate the mesh information from p4est to the application.
While the first operation is carried out redundantly on all
processors, the second and third strictly implement distributed
parallelism.
The basic structure used in p4est is a connectivity of quad-trees
(2D) or octrees (3D) which covers the domain of interest in a
conforming macro-mesh. Thus, the mesh primitives used in p4est are
quadrilaterals in 2D and hexahedra in 3D. The p4est connectivity is a
data structure that contains numbers and orientations of neighbouring
coarse cells. It can be created by reading in the mesh data using the
built-in functions of p4est.
Once the macro-structure connectivity is created, the distributed
p4est structure can be built and modified in place. The mesh
management capabilities available to the application are:
Refine: Adaptively subdivide octants based on a refinement
marker or callback function, once or recursively.
Coarsen: Replace families of eight child octants by their common
parent octant, once or recursively.
Partition: Redistribute the octants in parallel, according to a given
target number of octants for each process, or weights prescribed for all
octants.
Balance: Ensure at most 2:1 size relations between neighbouring
octants by local refinement where necessary.
Nodes: Create a globally unique numbering of the mesh nodes,
considering the classification into “independent” and “hanging” nodes.
Refinement and coarsening are controlled by callback functions
that usually query flags determined by the application. They provide two
modes of operation, non-recursive and recursive. Non-recursive Refine
replaces an octant with its eight children but does not consider newly
created children for refinement. Non-recursive Coarsen replaces eight
octant siblings by their parent but does not investigate the role of the
newly created parent as a sibling. Recursive mode, on the other hand,
is capable of radically changing the forest within one call, which is
sometimes advantageous for creation of a static or initial mesh
according to physical criteria.
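The difference between the two modes can be illustrated with a self-contained sketch on a list of quadrants. It deliberately does not use the p4est API; the names `Quad`, `Flag` and `refine` are hypothetical, and the callback plays the role of the application-defined refinement flag.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A quadrant identified by its level and its integer coordinates at that level.
struct Quad {
    int level;
    std::uint32_t x, y;
};

using Flag = std::function<bool(const Quad&)>;  // application-defined callback

// Append q or its descendants to out. Newly created children are offered
// to the callback again only in recursive mode.
void refine_into(const Quad& q, const Flag& flag, bool recursive, bool is_new,
                 std::vector<Quad>& out) {
    const bool may_split = !is_new || recursive;
    if (may_split && flag(q)) {
        for (std::uint32_t k = 0; k < 4; ++k) {
            const Quad child{q.level + 1, 2 * q.x + k % 2, 2 * q.y + k / 2};
            refine_into(child, flag, recursive, true, out);
        }
    } else {
        out.push_back(q);
    }
}

// One refinement call over all leaves. In recursive mode the callback
// must eventually return false (for example by limiting the level).
std::vector<Quad> refine(const std::vector<Quad>& leaves, const Flag& flag,
                         bool recursive) {
    std::vector<Quad> out;
    for (const Quad& q : leaves) refine_into(q, flag, recursive, false, out);
    return out;
}
```

With a callback that returns true below level 2, a single root quadrant becomes 4 quadrants in non-recursive mode and 16 quadrants in recursive mode within one call.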
The node enumeration is done using p4est_nodes or
p4est_lnodes. For a given p4est structure at an instance the enumeration
of the nodes is carried out in two steps,
Creation of a ghost layer, it is created using
p4est_ghost_new function.
Enumerate the nodes, the enumeration is carried out using
p4est_ nodes or p4est_lnodes.
Only the independent nodes are numbered; the hanging node
information of each octant is stored separately and can be decoded for
further use. Each independent node is assigned to one owner process,
and the nodes are numbered globally in the sequence of their owner
processes.
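Numbering "in sequence of the owner processes" amounts to an exclusive prefix sum over the number of nodes each process owns. The following sketch shows the arithmetic only; the function names are hypothetical and the per-process counts are assumed to be known already (in a parallel run they would be gathered from all processes).

```python
from itertools import accumulate
from typing import List


def global_offsets(owned_counts: List[int]) -> List[int]:
    """Global number of each process's first node (exclusive prefix sum)."""
    return [0] + list(accumulate(owned_counts))[:-1]


def global_index(rank: int, local_index: int, owned_counts: List[int]) -> int:
    """Global number of the local_index-th node owned by process `rank`."""
    if not 0 <= local_index < owned_counts[rank]:
        raise IndexError("local index outside the nodes owned by this rank")
    return global_offsets(owned_counts)[rank] + local_index
```

With three processes owning 4, 3 and 5 independent nodes, the first node of each process receives the global numbers 0, 4 and 7.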
4.1.2 2:1 Balancing
2:1 balancing ensures that neighbouring mesh elements are either
of the same size or at most half or twice as big. It thereby limits the
number of hanging nodes on an element edge to one: the impending
introduction of a second hanging node on an edge requires refinement
of a neighbouring element. The Refine and Coarsen operations
generally destroy this invariant, so the 2:1 balance has to be
re-established afterwards. This is done using the p4est_balance function.
Figure 4-5: 2:1 Balancing
Figure 4-5 depicts the application of 2:1 balancing during
refinement. The mesh in Figure 4-5 (a) contains one hanging node
along the edge. If the highlighted element were refined on its own, a
second hanging node would appear on that edge. To avoid this, the
element adjacent to the highlighted element is also refined, as shown
in Figure 4-5 (b), so that no edge carries more than one hanging node.
The objective of 2:1 balancing is to avoid the complexities that arise
when there are several hanging nodes on an edge; it also enables an
effective implementation of the hanging node treatment.
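The balancing idea can be demonstrated on a one-dimensional analogue, where each cell of the unit interval is a leaf of a binary tree and "2:1" means that adjacent leaves differ by at most one level. This is a simplified sketch under that assumption, not the parallel algorithm of p4est; `balance_1d` is a hypothetical name.

```python
from typing import List, Tuple

# (level, index): the cell covers [index / 2**level, (index + 1) / 2**level)
Cell = Tuple[int, int]


def balance_1d(leaves: List[Cell]) -> List[Cell]:
    """Refine coarse cells until adjacent leaves differ by at most one level.

    `leaves` must be sorted by position and cover the unit interval.
    """
    cells = list(leaves)
    changed = True
    while changed:
        changed = False
        for i in range(len(cells) - 1):
            (la, ia), (lb, ib) = cells[i], cells[i + 1]
            if la - lb > 1:      # right neighbour is too coarse
                cells[i + 1:i + 2] = [(lb + 1, 2 * ib), (lb + 1, 2 * ib + 1)]
            elif lb - la > 1:    # left neighbour is too coarse
                cells[i:i + 1] = [(la + 1, 2 * ia), (la + 1, 2 * ia + 1)]
            else:
                continue
            changed = True
            break                # restart the sweep after every refinement
    return cells
```

As in the figure, only refinement is used to restore the balance: a coarse cell next to a much finer one is split, and the sweep is repeated because the split may create a new violation further along.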
4.1.3 Parallel Repartitioning
Parallel partitioning generally refers to a redistribution of mesh
primitives among processes without changing the global mesh
topology (i.e., the global number of primitives or their neighbourhood
relations). The objective is most often to achieve load balance (i.e.,
a uniform distribution of the computational work among processes),
which is necessary to ensure parallel scalability of an application.
This is done using the p4est_partition function.
Figure 4-6: Parallel Partitioning [11,12,13]
Each processor in a parallel program stores only a part of the
entire forest. The number of octants on each processor is not much
larger than the total number of octants divided by the number of
processors. For example, in Figure 4-6 [11,12,13] the quad-tree
structure contains sixteen quads that are distributed among three
processors, an average of about five quads per processor.
Partition can also be carried out based on a user-specified weight
function that returns a non-negative integer weight for each octant;
the resulting partition is evenly distributed by weight.
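Both variants can be sketched by cutting the ordered list of octants into contiguous pieces. The cutting rules below are assumptions chosen for illustration and are not claimed to reproduce the exact rule used inside p4est_partition; `partition_counts` and `partition_weighted` are hypothetical names.

```python
from typing import List


def partition_counts(num_octants: int, num_procs: int) -> List[int]:
    """Octants per process when the ordered list is cut into near-equal pieces."""
    cuts = [(num_octants * r) // num_procs for r in range(num_procs + 1)]
    return [cuts[r + 1] - cuts[r] for r in range(num_procs)]


def partition_weighted(weights: List[int], num_procs: int) -> List[int]:
    """Owner rank of each octant, cutting the ordered list by cumulative weight."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    owners, prefix = [], 0
    for w in weights:
        # The weight accumulated before this octant decides its owner.
        owners.append(min(num_procs - 1, (prefix * num_procs) // total))
        prefix += w
    return owners
```

For the sixteen quads and three processes of the example, `partition_counts(16, 3)` gives pieces of 5, 5 and 6 quads. With weights `[1, 1, 1, 1, 2, 2]` on two processes, the cut moves so that each process carries a total weight of 4.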
4.2 Structure of Parallel Adaptive Code
The Parallel AMR tool was programmed following a modular
approach. The various functionalities of the tool were programmed as
separate modules, which were finally linked together to complete the
package. This approach has the advantage that the code can be
extended or modified easily in the future. The reusability of the
modules shortens the development time of an application and makes
debugging easier. A generic layout of the tool is shown in Figure 4-7,
and the modules are briefly described below.
Figure 4-7: Parallel AMR Tool Generic layout
The basic design of a user-defined application starts with the
inclusion of the core module P4_structural_application. This provides
the common functions that are required, such as reading the inputs and
defining the replace functions. This is followed by the addition of the
Material Law module, which defines the system, and the VTK handler
module, which exports the output.
Core Module
The core module of the tool is P4_structural_application. It
groups most of the common functionalities of the finite element
system. These include the parallel assembly functions for the stiffness
matrix and load vector, utilities for reading the input mesh, and
interface functions to the other modules of the tool. These interfaces
give access to the functions in the other modules for various
operations. In this way the core module acts as a central hub in which
the operations are defined in a generic fashion and can be augmented
and enhanced as required. Some key modules and their functions are
discussed below.
Quadrature Module
The quadrature module provides the Gauss points and weights
that are needed for Gauss quadrature, i.e. the numerical approximation
of integrals. The quadrature functions take the dimension and degree
as parameters and return the quadrature points and weights as output.
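A function with this interface might look like the sketch below. It is an illustration rather than the module's actual code: the second parameter is interpreted here as the number of Gauss points per direction (an assumption, since the text only says "degree"), and only the one-, two- and three-point Gauss-Legendre rules on [-1, 1] are tabulated.

```python
from itertools import product
from math import sqrt
from typing import Callable, List, Tuple

# One-dimensional Gauss-Legendre rules on [-1, 1], keyed by number of points.
_RULES_1D = {
    1: ([0.0], [2.0]),
    2: ([-1.0 / sqrt(3.0), 1.0 / sqrt(3.0)], [1.0, 1.0]),
    3: ([-sqrt(0.6), 0.0, sqrt(0.6)], [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]),
}


def gauss_rule(dim: int, npoints: int) -> Tuple[List[Tuple[float, ...]], List[float]]:
    """Tensor-product Gauss points and weights on the reference cube [-1, 1]^dim."""
    xs, ws = _RULES_1D[npoints]
    points, weights = [], []
    for idx in product(range(npoints), repeat=dim):
        points.append(tuple(xs[i] for i in idx))
        w = 1.0
        for i in idx:
            w *= ws[i]  # the weight of a tensor-product point is the product
        weights.append(w)
    return points, weights


def integrate(f: Callable[..., float], dim: int, npoints: int) -> float:
    """Approximate the integral of f over [-1, 1]^dim."""
    points, weights = gauss_rule(dim, npoints)
    return sum(w * f(*p) for p, w in zip(points, weights))
```

The two-point rule integrates polynomials up to degree three exactly in each direction, so `integrate(lambda x, y: x * x * y * y, 2, 2)` reproduces the exact value 4/9 up to round-off.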
Math Utilities Module
This module consists of the specific mathematical operations
that are required by the tool. These operations are used at various
places in the tool.
The notable functions available in the module are:
Calculation of the coordinates of the nodes. The function uses
p4est_quadrant_corner_node from p4est to calculate the node
coordinates.
Calculation of the distance between two points.
Calculation of the resultant of a vector from its components.
Extrapolation of variables from the Gauss points to the corner
nodes. This operation is used in stress recovery [9].
As the module is an independent unit in the tool, it allows for
the inclusion of application-specific functions.
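The extrapolation from Gauss points to corner nodes can be sketched for a four-node quadrilateral with a 2x2 Gauss rule, following the usual "Gauss element" construction used in stress recovery. The node and Gauss point ordering in `CORNERS` and the function name are assumptions made for this illustration.

```python
from math import sqrt
from typing import List, Sequence

# Corner signs (xi, eta) of a bilinear quadrilateral. Gauss point i of the
# 2x2 rule is assumed to sit at CORNERS[i] / sqrt(3).
CORNERS = [(-1, -1), (1, -1), (1, 1), (-1, 1)]


def extrapolate_to_corners(gauss_values: Sequence[float]) -> List[float]:
    """Extrapolate four Gauss-point values to the four corner nodes.

    The Gauss points are treated as the corners of an inner element whose
    bilinear shape functions are evaluated at the real corners, which lie
    at +/- sqrt(3) in the coordinates of that inner element.
    """
    s = sqrt(3.0)
    nodal = []
    for xj, yj in CORNERS:
        value = 0.0
        for (xi, yi), g in zip(CORNERS, gauss_values):
            shape = 0.25 * (1.0 + s * xj * xi) * (1.0 + s * yj * yi)
            value += shape * g
        nodal.append(value)
    return nodal
```

A constant field is reproduced unchanged, and a field that varies linearly with xi, sampled at the Gauss points, is recovered exactly at the corners.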
Replace Functions
This module consists of functions that handle the transfer of data
from the parent element to the child elements during refinement and
balancing, and vice versa during coarsening. The replace function is
passed as a parameter to the following p4est functions:
p4est_refine: when a parent element is replaced by four child
elements, the replace function dictates the transfer of data from
parent to children.
p4est_coarsen: when four child elements are replaced by their
parent element, the replace function dictates the transfer of data
from children to parent.
p4est_balance: when a parent element is replaced by four child
elements to restore the 2:1 balance, the replace function dictates
the transfer of data from parent to children.
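The role of a replace function can be sketched with one scalar value per element. The transfer rules used here (children inherit the parent value, the parent receives the mean of its children) are assumptions for illustration; the rules actually used by the tool are not stated in the text. The outgoing/incoming arguments only loosely mirror the idea of elements leaving and entering the mesh.

```python
from typing import List, Sequence


def replace(outgoing: Sequence[float], num_incoming: int) -> List[float]:
    """Return the data of the incoming elements from the outgoing ones.

    One outgoing element: refinement or balancing (parent -> children).
    One incoming element: coarsening (children -> parent).
    """
    if len(outgoing) == 1:
        # Parent is replaced by its children: copy the parent value.
        return [outgoing[0]] * num_incoming
    if num_incoming == 1:
        # Children are replaced by their parent: average the child values.
        return [sum(outgoing) / len(outgoing)]
    raise ValueError("expected one outgoing or one incoming element")
```

For history-dependent material models the choice of transfer rule matters, which is one reason to keep it in a separate, replaceable module.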
Material Law
This module consists of functions that define the material law and
material properties of the application, i.e. the constitutive law of the
model considered. For instance, the application p4elas models a
linear elasticity problem based on Hooke’s law. The material law
module of p4elas contains a function to calculate the material
stiffness matrix C, a function to calculate the strain vector from the
displacement vector and the B matrix, and a function to calculate the
stress at the Gauss points. Similarly, the applications p4plas and
p4dama have modules to model plasticity and isotropic damage,
respectively.
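The three functions of a linear elastic material module can be sketched as follows. The sketch assumes plane stress and Voigt ordering (xx, yy, xy) with engineering shear strain; the function names are hypothetical and do not refer to the p4elas source.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def plane_stress_matrix(young: float, poisson: float) -> Matrix:
    """Matrix C of isotropic Hooke's law for plane stress."""
    f = young / (1.0 - poisson ** 2)
    return [
        [f, f * poisson, 0.0],
        [f * poisson, f, 0.0],
        [0.0, 0.0, f * (1.0 - poisson) / 2.0],
    ]


def matvec(a: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def strain(b_matrix: Matrix, displacements: Sequence[float]) -> List[float]:
    """Strain vector at a point: epsilon = B u."""
    return matvec(b_matrix, displacements)


def stress(c_matrix: Matrix, strain_vector: Sequence[float]) -> List[float]:
    """Stress vector at a Gauss point: sigma = C epsilon."""
    return matvec(c_matrix, strain_vector)
```

As a check, a uniaxial strain state with E = 2.0 and a Poisson's ratio of 0.3, i.e. the strain vector (0.01, -0.003, 0), gives a stress of E times 0.01 in the x direction and zero stress in the y direction.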
VTK handler
This module contains the helper functions that export the data from
p4est for post-processing and visualization. It makes use of the VTK
interface functions available in the p4est library. The functions write
the p4est data to VTK format files, which can be viewed using the
post-processing tool ParaView.
Hanging Node Decoder
This is one of the principal modules of the tool. It decodes the
hanging node information from the data provided by p4est. The
input of the decoder function is the face_code attribute of the
p4est_lnodes data structure, which encodes the hanging node
information of each element bitwise in an integer.
Figure 4-8: Hanging Node Decoder
The first D bits of the integer (D being the dimension)
represent the child id of the element with respect to its parent element,
and the next D bits represent the status of the potentially hanging faces.
The decode function takes this face code as input and returns an
integer array containing the independent node numbers that
correspond to the hanging nodes. Note that p4est_lnodes
does not assign node numbers to hanging nodes.
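The bit layout described above can be unpacked as in the following sketch. It covers only the first step of a decoder, i.e. splitting the face code into the child id and D hanging flags; mapping each flag to a concrete face and to the independent node numbers is omitted. The function name is hypothetical.

```python
from typing import List, Tuple


def split_face_code(face_code: int, dim: int = 2) -> Tuple[int, List[bool]]:
    """Split a face code into (child id, hanging flags).

    The lowest `dim` bits hold the child id; the next `dim` bits flag
    which of the potentially hanging faces are actually hanging.
    """
    child_id = face_code & ((1 << dim) - 1)
    flags = face_code >> dim
    hanging = [bool((flags >> i) & 1) for i in range(dim)]
    return child_id, hanging
```

For example, in 2D the code `0b0110` describes child 2 of its parent, with the first of its two potentially hanging faces flagged as hanging. A face code of zero means that the element has no hanging faces.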
The applications developed with the tool are used to validate the
correctness of the implementation by simulating various numerical
problems, as described in Chapter 5.
CHAPTER 5
5 NUMERICAL EXAMPLES
In this chapter, the sample problems that were used to test and verify
the code are discussed. The main objective of these tests is to confirm that
the techniques are implemented properly, which is done by validating the
output of the sample problems. The following sample problems were
analysed and their outcomes are discussed below:
Patch test
Infinite plate with a circular hole
Slope stability
Extension of notched beam
These tests verify a range of features, including the proper
treatment of hanging nodes, the effect of different error
indicators for adaptive mesh refinement and the implementation of
physics-based indicators. The results were also compared with the
analytical solution wherever applicable.
5.1 Patch test
In this case, a unit square domain with hanging nodes is considered,
subjected to an enforced displacement along one edge. The nodes along the
bottom edge were fixed in the y direction and the nodes along the left edge
were fixed in the x direction. The right edge was subjected to an enforced
displacement, and plane stress conditions were assumed. The material
properties considered were a Young’s modulus of 2.0 MPa and a Poisson’s
ratio of 0.3. The problem is illustrated in Figure 5-1.
The main objective of this patch test was to ensure that the treatment
of hanging nodes works properly. The domain was refined near the top left
corner. Ideally, the stress distribution across the domain should be
constant, and this was observed in the results of the test. The displacements
of the domain in the x and y directions are shown in Figure 5-2 and the
stress distribution in the x direction is shown in Figure 5-3. The test
performed as expected for both first and second order discretization.
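The expected constant stress state can be written down explicitly. The following is a sketch under two assumptions that are not stated in the text: the top edge is traction free, and the enforced displacement on the right edge, denoted here by the hypothetical symbol $\bar{u}$, acts in the x direction. With the edge length $L = 1$:

```latex
u_x = \bar{u}\,\frac{x}{L}, \qquad
u_y = -\nu\,\bar{u}\,\frac{y}{L}, \qquad
\varepsilon_{xx} = \frac{\bar{u}}{L}, \qquad
\varepsilon_{yy} = -\nu\,\varepsilon_{xx},
\qquad
\sigma_{xx} = E\,\varepsilon_{xx} = E\,\frac{\bar{u}}{L}, \qquad
\sigma_{yy} = \sigma_{xy} = 0
```

Since the displacement field is linear, both first and second order elements can represent it exactly, so any deviation from a uniform $\sigma_{xx}$ in the refined corner would point to an error in the hanging node constraints.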
Figure 5-1: Patch test problem representation
Figure 5-2: Displacement of the domain along x and y direction (panels: displacement in x direction, displacement in y direction)
Figure 5-3: Stress distribution along x direction
5.2 Infinite plate with a circular hole
The next example is an infinite plate with a circular hole of radius ‘a’
subjected to a unidirectional tensile load ‘P’ in the x direction, as shown
in Figure 5-4. Only one quarter of the domain is analysed, owing to the
symmetry about the x and y axes. The analytical stress components for this
problem are [5]
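$$\sigma_{xx}(x, y) = P \left\{ 1 - \frac{a^2}{r^2} \left( \frac{3}{2} \cos 2\theta + \cos 4\theta \right) + \frac{3a^4}{2r^4} \cos 4\theta \right\} \tag{5.1}$$
$$\sigma_{yy}(x, y) = -P \left\{ \frac{a^2}{r^2} \left( \frac{1}{2} \cos 2\theta - \cos 4\theta \right) + \frac{3a^4}{2r^4} \cos 4\theta \right\} \tag{5.2}$$
$$\sigma_{xy}(x, y) = -P \left\{ \frac{a^2}{r^2} \left( \frac{1}{2} \sin 2\theta + \sin 4\theta \right) - \frac{3a^4}{2r^4} \sin 4\theta \right\} \tag{5.3}$$
where r and θ are the polar coordinates of the point (x, y), measured from
the centre of the hole.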
The inner boundary of the hole is traction free, and tractions based on
the analytical solution in Equations (5.1) to (5.3) were imposed on the right
edge. The left edge is constrained in the x direction and the bottom
edge is constrained in the y direction. Plane stress conditions are
considered and the parameters are listed below.
Figure 5-4: Infinite plate with a circular hole
Young modulus      3e07 MPa
Poisson ratio      0.3
Load (P)           10 N/m^2
Hole radius (a)    1
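For comparison with the finite element results, the analytical stress field can be evaluated pointwise. The sketch below implements the standard closed-form solution for an infinite plate with a circular hole under uniaxial tension, with the hole centred at the origin; the function name and the default values (taken from the parameter list) are choices made for illustration.

```python
from math import atan2, cos, hypot, sin
from typing import Tuple


def plate_with_hole_stress(x: float, y: float, load: float = 10.0,
                           radius: float = 1.0) -> Tuple[float, float, float]:
    """Return (sigma_xx, sigma_yy, sigma_xy) at the point (x, y)."""
    r = hypot(x, y)
    if r < radius:
        raise ValueError("point lies inside the hole")
    t = atan2(y, x)            # polar angle theta
    q2 = (radius / r) ** 2     # a^2 / r^2
    q4 = q2 * q2               # a^4 / r^4
    sxx = load * (1.0 - q2 * (1.5 * cos(2 * t) + cos(4 * t)) + 1.5 * q4 * cos(4 * t))
    syy = -load * (q2 * (0.5 * cos(2 * t) - cos(4 * t)) + 1.5 * q4 * cos(4 * t))
    sxy = -load * (q2 * (0.5 * sin(2 * t) + sin(4 * t)) - 1.5 * q4 * sin(4 * t))
    return sxx, syy, sxy
```

Two properties make useful sanity checks: at the top of the hole, (0, a), the stress in the x direction equals three times the applied load, and far away from the hole the field approaches the uniform state with the stress in the x direction equal to P.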
Figure 5-5 illustrates the stress distribution in the x direction over
the domain, and Figure 5-6 shows the parallel repartitioning of the domain
during the course of adaptive refinement.
The adaptive refinement was driven by the refinement indicators
described in the refinement indicator section of this thesis.
Figure 5-5: Stress distribution along x direction
Figure 5-6: Parallel Repartitioning colour coded by MPI rank
Figure 5-7: Comparison of strain energy convergence
Figure 5-7 compares uniform refinement with adaptive refinement
based on different refinement indicators. It can be inferred from the
figure that, with adaptive refinement, the strain energy converges at a
much lower number of degrees of freedom than with uniform refinement.
Adaptive refinement thus achieves a converged solution at reduced
computational cost.
5.3 Slope stability
In this example, the plasticity material model was tested. The
objective of the test was to verify the adaptive mesh refinement of the
domain based on the “plasticity indicator”, a physics-based indicator. The
problem is illustrated in Figure 5-8. The bottom of the domain was
constrained and the model was subjected to gravitational load. The material
properties considered were:
Young modulus 20e3 MPa
Poisson Ratio 0.49
Constitutive law Mohr-Coulomb
Second order discretization was used in the test. The Mohr-Coulomb
parameters are listed below:
Friction angle 20°
Dilatancy angle 20°
Cohesion 50 MPa
Hardening was not considered, i.e. perfect plasticity was used.
Figure 5-8: Slope stability problem representation
The physics-based indicator worked as expected: the sensitive
zones were marked and refined over successive time steps. The adaptive
refinement of the domain is shown in Figure 5-9.
Figure 5-9: Adaptive Refinement based on Plasticity Indicator
5.4 Extension of notched beam
This example is similar to the previous one and tests the adaptive
refinement of the domain based on the “damage indicator”, a
physics-based indicator. The domain consists of a rectangular beam with a
notch in the middle, as shown in Figure 5-10. The left edge of the beam is
constrained in the x direction and the bottom edge is constrained in the y
direction. The right edge of the beam is subjected to an enforced
displacement ‘u’. The material properties considered were:
Young modulus 2.0 MPa
Poisson Ratio 0.3
Constitutive law Isotropic Damage
For damage, an exponential softening rule was used; the other damage
parameters are listed below:
Yield stress 0.91 MPa
E0 $f_t / \sqrt{E}$
Ef 100 E0
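The exact softening law is not given in the text. One common form of exponential softening for isotropic damage is sketched below, purely as an illustration; the correspondence of its thresholds `eps0` and `epsf` to the parameters E0 and Ef listed above is an assumption.

```python
from math import exp


def exponential_damage(kappa: float, eps0: float, epsf: float) -> float:
    """Damage variable for the history variable kappa.

    Assumed law (not confirmed by the source):
        omega = 1 - (eps0 / kappa) * exp(-(kappa - eps0) / (epsf - eps0))
    for kappa > eps0, and omega = 0 otherwise.
    """
    if eps0 <= 0.0 or epsf <= eps0:
        raise ValueError("expected 0 < eps0 < epsf")
    if kappa <= eps0:
        return 0.0  # threshold not reached, material is undamaged
    return 1.0 - (eps0 / kappa) * exp(-(kappa - eps0) / (epsf - eps0))
```

In this form the damage starts at zero when the threshold is reached, grows monotonically with the history variable and approaches, but never reaches, one.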
Figure 5-10: Extension of notched beam problem representation
Figure 5-11: Adaptive Refinement based on Damage Indicator
The damage indicator worked as expected and the damage
propagation was marked and refined in iterative time steps. The
adaptive refinement of the domain is shown in Figure 5-11.
CHAPTER 6
6 CONCLUSIONS
The main objective, to develop a Parallel Adaptive Mesh Refinement
code, was successfully achieved and verified. The Carstensen and Kelly
refinement indicators performed as expected and clearly marked the
sensitive zones. Notably, the predictions of the Kelly indicator were in
line with the conclusions from the literature. Similarly, the physics-based
indicators for plasticity and damage also performed very well. These
outcomes provide a good basis for extending the code to 3D problems and
to further physics-based indicators. The code could also be used as a
base for scalability studies of large-scale simulations in the future.
REFERENCES
1. Carsten Burstedde, Lucas C. Wilcox, and Omar Ghattas, p4est:
Scalable Algorithms for Parallel Adaptive Mesh Refinement on
Forests of Octrees. SIAM Journal on Scientific Computing 33 no. 3
(2011), pages 1103-1133.
2. Carsten Burstedde, Lucas C. Wilcox, and Tobin Isaac, The p4est
software for parallel AMR: Howto document and step-by-step
examples, 2014.
3. D. W. Kelly et al., A posteriori error analysis and adaptive processes
in the finite element method: Part I - Error analysis, Int. J. Numer.
Meth. Engrg.
4. J. Alberty et al, Matlab Implementation of the Finite Element Method
in Elasticity, (DOI) 10.1007/s00607-002-1459-8.
5. H. Nguyen-Xuan, T. Nguyen-Thoi, Kaiyang Zeng, Wu S.C
Assessment of smoothed point interpolation methods for elastic
mechanics, Int. J. Numer. Meth. Biomed. Engng. 2010; 26:1635–
1655.
6. Joseph E. Flaherty, Adaptive Finite Element Techniques,
http://www.cs.rpi.edu/~flaherje/pdf/fea8.pdf
7. Wolfgang Bangerth, Finite Element Methods in Scientific Computing,
http://www.math.tamu.edu/~bangerth/videos/676/slides.15.pdf
8. PETSc, http://www.mcs.anl.gov/petsc, 2016.
9. Chapter 28 Stress Recovery, Introduction to Finite Element Methods,
University of Colorado at Boulder
http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.
Ch28.d/IFEM.Ch28.pdf
10. Wolfgang Bangerth, Carsten Burstedde, Timo Heister, and
Martin Kronbichler, Algorithms and Data Structures for Massively
Parallel Generic Adaptive Finite Element Codes. Published in ACM
Transactions on Mathematical Software 38 No. 2 (2011), pages 14:1-
14:28.
11. Carsten Burstedde, Omar Ghattas, Georg Stadler, Lucas C.
Wilcox, Adaptive Mesh Refinement (AMR), CIG meeting on
Opportunities and Challenges in Computational Geophysics
California Institute of Technology, March 30, 2009.
12. Carsten Burstedde, Forest-of-octrees AMR: algorithms and
interfaces, Second [HPC]3 Workshop KAUST, Saudi Arabia, Feb 05,
2012.
13. Carsten Burstedde, Parallel adaptive mesh refinement using
multiple octrees and the p4est software, August 29th, 2013.
14. Kevin M. Olson, Peter MacNeice, PARAMESH: A Parallel
Adaptive Mesh Refinement Community Toolkit, 1999.
15. B. S. Kirk, J. W. Peterson, R. H. Stogner, and G. F. Carey.
libMesh: A C++ Library for Parallel Adaptive Mesh
Refinement/Coarsening Simulations. Engineering with Computers,
22(3-4):237-254, 2006.
16. Jacob Fish, Ted Belytschko, A First Course in Finite Elements,
ISBN 978-0-470-03580-1
17. Berger, M. J. and Oliger, J. (1984). Adaptive Mesh Refinement
for Hyperbolic Partial Differential Equations. J. Comp. Phy., 53:484–
512.
18. D. L. De Zeeuw. A Quadtree-Based Adaptively-Refined
Cartesian-Grid Algorithm for Solution of the Euler Equations. PhD
thesis, University of Michigan, September 1993.
19. B. van der Holst and R. Keppens. Hybrid block-AMR in
Cartesian and curvilinear coordinates: MHD applications. Journal of
Computational Physics, 226:925-946, 2007.