
Discrete Element Modelling of Granular Snow Particles Using LIGGGHTS

Author

Vinodh Vedachalam

Supervisor

Davy Virdee

EPCC, Edinburgh Parallel Computing Centre
The University of Edinburgh
UK

August 2011


Discrete Element Modelling Of Granular Snow Particles Using LIGGGHTS

Author

Vinodh Vedachalam

A thesis submitted in partial fulfilment of the requirements for the degree of

M.Sc. High Performance Computing

Thesis Supervisor

Davy Virdee

MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2011


Abstract

This thesis investigates and develops a large-scale, three-dimensional Discrete Element Model that can simulate several million snow particles falling under the influence of gravity. The model captures the behaviour of snow particles, allowing for inelastic collisions and cohesion, using high performance computing. The model is then profiled and benchmarked for scalability, and suggestions for optimisation are given for future research. The project studies and discusses in detail the development of the model and the various driving factors behind the high performance computing (HPC) solution.


Contents

Chapter 1 Introduction
1.1 What is a Granular Material?
1.2 Mechanics of Snow
1.3 Computer Simulation of Granular Materials
1.4 Need for High Performance Computing
1.5 Research Objective and Approach
1.6 Literature Review
1.7 Organization of the Report
Chapter 2 Background
2.1 Introduction to Molecular Modelling
2.2 Review of Molecular Dynamics
2.3 Force Calculations and Ensembles
2.4 Interaction
2.5 Integration
2.6 Periodic Boundary Condition
2.7 Neighbour List
2.8 Discrete Element Method
2.9 Parallelisation and Communication of DEM Simulations
2.10 Summary


Chapter 3 Experimental Setup
3.1 The Platforms
3.1.1 HECToR Cray XE6
3.1.2 Ness
3.2 The Software
3.2.1 LAMMPS – Code Introduction
3.2.2 LAMMPS – Installation
3.2.3 LAMMPS – Working
3.2.4 LAMMPS – Input Script Structure
3.2.5 LAMMPS – Input Script Basics
3.2.6 LAMMPS – Parsing Rules
3.2.7 LIGGGHTS
3.3 Visualization Setup
3.4 Description of the Mechanical Properties of Snow
3.4.1 Size and shape of snow particles
3.4.2 Density
3.4.3 Young’s Modulus
3.4.4 Poisson’s Ratio
3.4.5 Coefficient of restitution
3.4.6 Coefficient of kinetic friction
3.5 Summary
Chapter 4 Modelling of Cohesive Interactions


4.1 DEM Revisited
4.2 Defining a particle and particle collision
4.2.1 Particle Definition
4.2.2 Cohesive forces
4.3 Modelling Cohesive Contacts
4.3.1 Contact Point and Collision Normal
4.3.2 Normal Deformation and Contact force
4.3.3 Collision Detection
4.4 Basics of Contact force models
4.5 Physical Models of Cohesive Contact
4.5.1 Linear cohesion model
4.5.2 JKR cohesion model
4.6 Summary
Chapter 5 Implementation Details
5.1 Porting LAMMPS Cohesion Add-on to HECToR
5.1.1 Modifying the fix_cohesive.h header file
5.1.2 Modifying the fix_cohesive.cpp file
5.2 Building the granular module within LAMMPS
5.3 LAMMPS Granular Essentials
5.4 Determination of simulation time-step
5.5 LAMMPS Simulation
5.5.1 Implementation Details
5.5.2 Visualisation


5.5.3 LAMMPS Simulation Results and Discussions
5.6 LIGGGHTS Simulation
5.6.1 Material parameter values
5.6.2 LIGGGHTS Implementation Details
5.6.3 LIGGGHTS Simulation Results
5.6.4 Improved Chute Geometry
5.6.5 Improved Simulation Results
5.7 Summary
Chapter 6 Benchmarking
6.1 Cost of Accessing HECToR
6.2 Performance Benchmarks
6.3 Performance per time-step
6.4 Performance Comparison – Cohesion and Non-Cohesion
6.5 Summary
Chapter 7 Profiling and Performance Analysis
7.1 Description of available profiling tools
7.2 Profiling using CrayPAT
7.2.1 Profiling – User functions
7.2.2 Profiling – Percentage Time of MPI Calls
7.2.3 Profiling – Messages/Sizes
7.2.4 Profiling – Memory Usage
7.3 Timing output directly from the code and its description
7.4 Summary


Chapter 8 Conclusions and Future Work
8.1.1 Summary
8.1.2 Recommendations for future Research
References
Appendix A Project Management
Appendix B Parallel Processing on Ness & HECToR
Appendix C AutoCAD Details


List of Tables

Table 1.1: Estimate of Number of Snow Particles
Table 3.1: Ness Specification August 2011
Table 5.1: Material Parameters
Table 5.2: Chute Specification
Table 7.1: LIGGGHTS timing output
Table A.1: Updated work plan
Table A.2: Risk Assessment


List of Figures

Figure 1.1: A small avalanche slab of 100x50x10 metres
Figure 2.1: Flowchart of Molecular Dynamics Approach
Figure 2.2: 2D MD simulation with periodic images of itself
Figure 2.3: Neighbour list of one particle (in red) drafted in 2D
Figure 2.4: DEM Contact model in normal and tangential direction
Figure 2.5: Spatial decomposition approach
Figure 3.1: Picture of the Cray XE6
Figure 3.2: Magny-Cours architecture diagram
Figure 3.3: HECToR file system
Figure 3.4: Flowchart for LAMMPS execution
Figure 3.5: Graupel snow particles
Figure 4.1: Diagram to illustrate the typical flow of a DEM simulation
Figure 4.2: Definition of a computational particle
Figure 4.3: Different possible collision states
Figure 4.4: Two ways to calculate cohesion normal and contact point
Figure 4.5: Contact zone between two spherical particles
Figure 4.6: Stages of collision detection
Figure 4.7: Hertz contact force model
Figure 5.1: LAMMPS Simulation screenshot 1


Figure 5.2: LAMMPS Simulation screenshot 2
Figure 5.3: LAMMPS Simulation screenshot 3
Figure 5.4: LIGGGHTS Simulation screenshot 1
Figure 5.5: LIGGGHTS Simulation screenshot 2
Figure 5.6: LIGGGHTS Simulation screenshot 3
Figure 5.7: Cross section of the improved chute
Figure 5.8: Improved LIGGGHTS Simulation screenshot 1
Figure 5.9: Improved LIGGGHTS Simulation screenshot 2
Figure 5.10: Improved LIGGGHTS Simulation screenshot 3
Figure 6.1: Performance of the simulation model
Figure 6.2: Benchmark of different system sizes on 480 and 960 processors
Figure 6.3: Execution time of 75,000 and 1,000,000 particles
Figure 6.4: Speedup of 75,000 and 1,000,000 particles
Figure 6.5: Speedup of 75,000 and 1,000,000 particles
Figure 6.6: Speedup of 75,000 and 1,000,000 particles
Figure 7.1: Profiling results of the code by function groups
Figure 7.2: Top time consuming user functions
Figure 7.3: Top time consuming MPI functions
Figure 7.4: Top time consuming MPI_SYNC functions
Figure 7.5: Profiling by message size
Figure 7.6: LIGGGHTS timing output
Figure 8.1: Linear regression analysis of scaled size benchmark


Acknowledgements

I wish to sincerely thank my supervisor Davy Virdee for his guidance and support during this research. He has continuously encouraged me and has contributed greatly to my professional growth. I also thank Dr. Fiona Reid of EPCC for her help in the initial stages of the project in setting up LAMMPS on HECToR and Ness. I wish to thank Dr. Jin Sun of the Institute for Infrastructure and Environment, University of Edinburgh, for providing the cohesion add-on code for LAMMPS. I also wish to thank Dr. Jane Blackford of the Centre for Materials Science and Engineering and the Institute of Materials and Processes, School of Engineering and Electronics, University of Edinburgh, for her input on the physics of snow particles and for her feedback on the simulation model.


Nomenclature

A Hamaker constant
a acceleration (m/s²)
E* reduced elastic modulus (Pa)
E Young’s modulus (Pa)
F force vector acting on particles (N)
Fij force on particle i at contact with neighbouring particle j (N)
fnd repulsive force at contact (N)
fnc cohesion force (N)
fne viscous force (N)
g acceleration due to gravity (m/s²)
k normal spring stiffness
m mass of the particle (kg)
r radius of the particle (m)
rb buffer radius (m)
rcut cut-off radius (m)
rnl neighbour list cut-off radius (m)
U potential energy (J m⁻²)
upq relative velocity between two particles p, q (m s⁻¹)
Vn relative normal velocity of the colliding particle (m s⁻¹)


xpq relative position of two particles p, q (m)
ẍi translational acceleration (m/s²)

Greek Symbols

Δt time-step (s)
δs distance between two particles in the tangential direction (m)
δ body surface normal overlap between particles (µm)
ρ density of particles (kg/m³)
ε coefficient of restitution
θ inclination angle of the chute (degrees)
µ coefficient of friction
γ damping coefficient
ν Poisson’s ratio


Abbreviations

2D two-dimensional
3D three-dimensional
AU accounting unit
CFD computational fluid dynamics
CPU central processing unit
CAD computer-aided design
DEM discrete element method
EAM embedded atom method
EDM event driven method
FEM finite element method
GPU graphics processing unit
HECToR High End Computing Terascale Resource
JKR Johnson–Kendall–Roberts
KB kilobytes
LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator
LIGGGHTS LAMMPS Improved for General Granular and Granular Heat Transfer Simulations
NUMA non-uniform memory access
MB megabytes


MD molecular dynamics
MPP massively parallel processor
MPI message passing interface
PBC periodic boundary conditions
STL stereolithography
TDM time driven method


Chapter 1 Introduction

This chapter introduces the thesis, providing an overview of the objective and approach of this research, and emphasises the need for high performance computing. Snow is introduced as a granular material, and some background on snow avalanches and their formation is discussed. In the past, many researchers have proposed different models to describe the movement of snow and snow mechanics, ranging from simple theoretical models to complicated computational models. Many such techniques for modelling snow particles and snow avalanches are reviewed.


1.1 What is a Granular Material?

A granular material is an assembly of many discrete solid particles that interact with each other through dissipative collisions and are dispersed in a vacuum or an interstitial fluid. Granular matter can be considered a fourth state of matter, very different from solids, liquids or gases. For example, a pile of granular sand particles at rest on an inclined plane behaves like a solid if the angle of inclination is less than a certain angle called the angle of repose. This is due to the static friction between the granular particles and the slope. If the inclined plane is tilted a little above the angle of repose, the grains start to flow, exhibiting fluid-like behaviour (though the flow is very different from the flow of an actual fluid). Under highly agitated conditions, granular particles behave like gaseous particles (Jenkins and Savage, 1983). The force interactions between the particles play a key role in defining the mechanics of granular flows.
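The static-friction argument above can be sketched numerically. This is a minimal illustration assuming the simplest rigid-block friction model; the coefficient mu_s = 0.6 is purely illustrative, not a measured property of snow or sand:

```python
import math

def is_static(theta_deg, mu_s):
    """Pile behaves solid-like while tan(theta) <= mu_s (static friction holds)."""
    return math.tan(math.radians(theta_deg)) <= mu_s

mu_s = 0.6                                   # illustrative static friction coefficient
repose_deg = math.degrees(math.atan(mu_s))   # implied angle of repose, about 31 degrees

print(is_static(25.0, mu_s))   # below the angle of repose: stays at rest
print(is_static(35.0, mu_s))   # above it: grains start to flow
```

Tilting the plane past `repose_deg` flips the condition, which is exactly the solid-to-fluid transition described above.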

Several forms of granular flow exist in nature and in industrial processes, ranging from avalanches and landslides to powder mixing in the chemical industry. Granular materials cover a broad area of research at the intersection of different scientific fields, including soft matter physics, soil mechanics, powder technology and geological processes. Despite the wide variety of properties, the discrete granular structure of these materials leads to rich generic phenomena, which has motivated research into their fundamental understanding.

1.2 Mechanics of Snow

Snow is a form of precipitation made up of crystalline water ice. It is an example of a geo-material whose microstructure plays a significant role in its overall behaviour. After snow falls, the physical structure of the snowpack is affected by factors such as interaction with the ground, temperature and other meteorological conditions. The initial snow crystals are transformed into ice grains through deformation by wind or through melting/freezing processes. The resulting snow cover can be considered a porous granular material made up of ice grains, water droplets and dust. At the macroscopic level snow is treated as a continuous medium, since the granular structure may no longer be visible; at the microscopic level, however, a snow sample can be considered a cohesive granular assembly of elementary particles that are assumed to be rigid. Thus, in the approach presented in this thesis, snow is treated as a granular medium.

1.3 Computer Simulation of Granular Materials

In the past, experimental studies were carried out to study the behaviour of granular particles. In recent years, due to advances in computer processing speed, numerical simulation of granular flow has become an effective alternative tool for studying and understanding its behaviour. Since a granular system is composed of individual particles, each of which moves independently of the others,

Page 19: Vinodh Veda Chalam

3

it is difficult to predict the behaviour of a granular system using continuum models. In this context, the discrete approach developed for particle-scale numerical modelling of granular materials has become a powerful and reliable tool, and is considered an alternative to the continuum approach. This discrete approach is called the Discrete Element Method (DEM). The philosophy behind DEM simulation of granular flows is to model the system at the microscopic, or particle, level and to study its behaviour, including the detection of collisions between particles and with their environment. DEM can efficiently and effectively model the dynamics of assemblages of particles. Technically, the discrete approach requires a time-discretised form of the equations of motion governing particle displacements and rotations, and a force law, or force-displacement relation, describing particle interactions. DEM is particularly useful for modelling materials that undergo discontinuous deformations because of contact with other particles in the system, breakage of contact bonds and compaction of broken fragments. In this thesis, DEM is employed to simulate the flow of snow particles under gravity – a snow avalanche.

Since snow can be considered a granular material (Section 1.2), a DEM approach is chosen instead of a continuum approach; it is, in principle, possible to capture almost all the granular physical phenomena of snow particles using DEM. Chapter 2 discusses the background of the DEM approach in detail.
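To make the DEM ingredients named above concrete – a force-displacement law plus time-discretised equations of motion – here is a minimal one-dimensional sketch using a linear spring-dashpot normal contact. All names and parameter values are illustrative; the actual contact and cohesion models used in this work are those described in Chapter 4:

```python
def normal_force(xi, xj, vi, vj, radius, k, gamma):
    """Linear spring-dashpot normal contact force on particle i due to j (1D)."""
    overlap = 2.0 * radius - abs(xj - xi)    # > 0 when the spheres interpenetrate
    if overlap <= 0.0:
        return 0.0                           # not in contact: no force
    n = -1.0 if xj > xi else 1.0             # unit normal pointing from j towards i
    approach = (vi - vj) * (-n)              # relative approach speed along the normal
    return (k * overlap - gamma * approach) * n

def dem_step(x, v, mass, radius, k, gamma, g, dt):
    """One time-discretised DEM step: sum forces, then integrate (symplectic Euler)."""
    forces = [-m * g for m in mass]          # gravity on every particle
    for i in range(len(x)):                  # O(n^2) pair loop; production codes use
        for j in range(len(x)):              # neighbour lists instead (Section 2.7)
            if i != j:
                forces[i] += normal_force(x[i], x[j], v[i], v[j], radius, k, gamma)
    for i in range(len(x)):
        v[i] += forces[i] / mass[i] * dt     # velocity from net force
        x[i] += v[i] * dt                    # position from the updated velocity
    return x, v

# Two equal particles approaching head-on, gravity switched off:
x, v = [0.0, 1.2], [1.0, -1.0]
for _ in range(5000):                        # 0.5 s of simulated time at dt = 1e-4 s
    x, v = dem_step(x, v, [1.0, 1.0], 0.5, 1000.0, 1.0, 0.0, 1e-4)
# The particles collide, lose a little energy to the dashpot, and move apart.
```

The spring term models the inelastic normal repulsion and the dashpot term the dissipation; cohesion would add an attractive term of the kind discussed in Chapter 4.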

1.4 Need for High Performance Computing

There are two main reasons to use supercomputers to develop a DEM model of snow avalanches. First, the number of snow particles in a real snow avalanche is huge. To give an idea of the number of particles in a powder snow avalanche, Table 1.1 summarises estimates of the number of snow particles of 2 mm and 5 mm diameter in 5-litre and 1000-litre volumes.

Particle Diameter (mm) | Volume of a Spherical Particle (m³) | Fill Volume (m³)  | Number of Particles per Fill Volume
2                      | 4.18 × 10⁻⁹                         | 0.005 (5 litres)  | 1.2 million
2                      | 4.18 × 10⁻⁹                         | 1 (1000 litres)   | 240 million
5                      | 6.54 × 10⁻⁸                         | 0.005 (5 litres)  | 76 thousand
5                      | 6.54 × 10⁻⁸                         | 1 (1000 litres)   | 15 million

Note: 0.005 m³ = 5 litres and 1 m³ = 1000 litres

Table 1.1: Estimate of number of snow particles

If a slab of snow measuring 100 m in length, 50 m in width and 10 m in depth were to slide – see Figure 1.1 – the volume would be 50,000 m³. This would contain roughly 1.2 × 10¹³ snow particles of 2 mm diameter.
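These estimates are a straightforward volume ratio. The following sketch reproduces them, assuming particles pack at solid density with no void fraction (a real packing would reduce the counts somewhat):

```python
import math

def particles_per_volume(diameter_m, fill_volume_m3):
    """Number of spheres of the given diameter that fit in a volume at solid packing."""
    particle_volume = (4.0 / 3.0) * math.pi * (diameter_m / 2.0) ** 3
    return fill_volume_m3 / particle_volume

print(f"{particles_per_volume(0.002, 0.005):.2e}")    # 2 mm in 5 litres: ~1.2 million
print(f"{particles_per_volume(0.005, 0.005):.2e}")    # 5 mm in 5 litres: ~76 thousand
print(f"{particles_per_volume(0.002, 50000.0):.2e}")  # 2 mm in a 50,000 m³ slab: ~1.2e13
```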


Figure 1.1: A small avalanche slab of 100x50x10 metres (Photograph from D. Virdee collection)

Second, in order to accurately simulate the behaviour of the particles, DEM simulations require a very large number of particles in the system. This underlines the importance of high performance infrastructure for our model, not only for computation but also for post-processing and visualisation. Given these numbers, it would clearly be impossible to model such large-scale avalanches in full. Hence this thesis looks at a small-scale snow slide/avalanche of volumes up to 5 litres (0.005 m³), with about a million particles flowing under the influence of gravity, and studies the computational behaviour of the model in order to estimate the resources required to model a much larger number of particles.

1.5 Research Objective and Approach

The research aims to address the following questions: Is it possible to model spherical snow particles using DEM? Can the model be profiled and benchmarked for scalability? Can it be optimised for better performance? The aims of this research encompass both experimental and computational aspects, investigating the discrete numerical simulation of granular (spherical) snow particles flowing down a slope – an avalanche. The experimental aims are to:

1. Design a discrete model of 1000 spherical snow particles of 5 mm diameter surrounded by granular walls on the x and y boundaries.


2. Implement the model using LAMMPS/LIGGGHTS. Validate the model. Extend it to up to 1 million particles – roughly the number in a 5-litre volume.

3. Profile the code to understand the performance bottlenecks and suggest optimisation strategies.

4. Visualise the LAMMPS/LIGGGHTS output. Identify a suitable technique to visualise LAMMPS/LIGGGHTS snapshots of the simulation.

Predicting a snow avalanche is a very complex process, and this project will not attempt to predict or forecast the occurrence of an avalanche. The goal of our model is to capture the movement and deformation of a small loose-snow avalanche with cohesion, as discrete particles, using high performance computing; to benchmark this model; and to examine the HPC artifacts observed after profiling the code. The intention is to model the system approximately in terms of size, shape (preferably) and material characteristics, subject to computational limits. This can be further developed, in the future, into a full-fledged model to investigate the mechanics of granular snow avalanches.

1.6 Literature Review

In this section, theories and computational models for snow avalanches are reviewed as a basis for understanding snow avalanche modelling. Various models, ranging from simple theoretical methods to complicated computational models, can describe the movement of snow and snow mechanics. The basic approaches used for snow modelling employ statistical models, analytical models and numerical models.

Traditional avalanche models use point-wise and piece-wise analytical solutions of the governing differential equations to describe momentum conservation laws. For the past 80 years, only simple models have been used to obtain crude estimates of important avalanche features such as velocity, pressure and run-out distance, yet they are capable of producing accurate results (Christophe, 2002). Lagotala (1924) first computed the velocity of avalanches down a predetermined path. Voellmy, Salm and Gubler (VSG) examined the model proposed by Lagotala and proposed new extensions to increase its accuracy. In the VSG model, the flowing avalanche is treated like a sliding block. The main advantage of this model is that it can very easily predict the maximum velocity, run-out distance and impact pressure of an avalanche. However, according to Mears (2005), this model is not reliable because of the unrealistic assumptions made about the avalanche path. The avalanche path is divided into three segments: the release zone, the avalanche track and the run-out zone. All of these segments are assumed to have constant slope, constant width and uniform flow. Mears (2005) points out that these assumptions are not adequate to model all avalanche terrain, and that dynamic snow modelling cannot be done under such unrealistic assumptions.


Many statistical approaches also describe the run-out distance of an avalanche. Bovis and Mears (1976), Lied and Bakkehøi (1980) and McClung and Lied (1987) proposed a different approach: they predicted avalanche run-out distances from topographic features of a particular avalanche path. The statistical approaches are developed by relating the slope angle of the avalanche path to the run-out distance of the avalanche. The advantages of these models over more complicated ones are their simplicity, their ease of use, and the fact that they are derived from the local history of avalanches in the region of interest. Statistical models are very useful and accurate when historic data on snow avalanches are available. They can be classified into two types: the “alpha-beta” (α–β) model and the “run-out ratio” model (ΔX/Xβ). Both are based on the correlation between run-out distances and topographic features, and on the theory that avalanche dynamics is determined by the longitudinal profile of the avalanche path. Since such models are based on historic data, according to Mears (2005) they have their own limitations: each mountain terrain and run-out path is unique, which makes a model applicable only to the region where the data were gathered, and not generic. Another drawback of such models is that they provide only very limited information about the velocity and impact pressure of an avalanche. Hydraulic models were developed as a means to capture the dynamics of an avalanche, providing accurate information about the velocity, flow depth and impact pressure over the run-out path. A different approach to avalanche modelling is the depth-averaged continuum approach. Voellmy (1955) was the first to develop one such model (the Voellmy-fluid model).
He described dense-snow avalanche movement based on the principles of conservation of mass and momentum, treating the avalanche as a sliding block of snow moving against a drag force proportional to the square of its flow speed. This model is widely accepted as it gives reliable results; however, it gives unsatisfactory results for large-scale avalanches (Bartelt, 1999). Eglit et al. (1960) of Moscow State University developed a model, based on the Voellmy-fluid model, in which there is no upper limit on the sliding friction and no distinction between the active and passive parts of the avalanche flow; this makes the Russian model suitable for large-scale avalanches. Savage and Hutter (1988) modelled the motion of a finite mass of cohesion-less granular material down an inclined plane as a fluid (a continuum approach). This is one of the most advanced concepts for modelling granular motion: it is based on a system of differential equations for the conservation of mass and momentum, with the velocity assumed constant, and it best describes the motion of the front and rear edges of a finite mass of granular material released from rest. Hopkins and Johnson (2005) of the US Army Engineer Research and Development Center (ERDC) have developed a dynamic DEM model (µSNOW) of dry snow deformation in which particles can sinter together and break apart. Their model captures the micro-structural deformation mechanics of snow: the snow particles are represented as randomly oriented cylinders of random length with hemispherical ends, and contact detection is handled by an iterative method based on the dilation operation (Hopkins, 1995). This model accurately represents the mechanics that
control dry-snow deformation. My first interaction with the group, during the literature review phase of the project, provided a few insights into the µSNOW model they have developed. They are working towards a virtual snow laboratory with mechanical, heat-transfer, visible-light interaction, metamorphism and air-flow/permeability modules. During a telephone conversation with A. Hopkins of ERDC, we discussed DEM modelling of snow particles. In his view, DEM modelling of snow mechanics is a mixed bag: with DEM it is possible to model snow particles made of discrete grains of various shapes that have sintered together, but for a more general, metamorphosed sample the finite element method (FEM) may be more capable of resolving the complex structure, which is difficult, if not impossible, to decompose into grains. However, two questions remain unanswered: (a) what to do with the FEM structure once it starts breaking apart into grains and fragments, and (b) whether DEM can take over the problem at that stage. These questions are still unresolved. This discussion provided a lot of useful information for the thesis.

1.7 Organization of the report

This dissertation is organised into eight chapters, following the order in which the work was done. This chapter is the introduction to the thesis. Chapter 2 provides background information describing DEM and some of the Molecular Dynamics (MD) components required to begin simulations; methods of decreasing the computation time are also discussed. The hardware and software requirements are covered in Chapter 3, together with the material properties of snow required for the simulations. Chapter 4 is devoted to more advanced developments dealing with complex particle shapes, cohesion forces, hydrodynamic and thermal interactions and the modelling of complex granular systems; in addition, the two physical cohesion models employed in building the DEM model are explained in detail. The implementation details, numerical results and validation of the DEM model are discussed in Chapter 5. Benchmarking results are presented in Chapter 6. The code is profiled using CrayPAT and the profiling results are discussed in Chapter 7. Chapter 8 offers optimisation suggestions, conclusions, and suggestions for future enhancements to the project.

Chapter 2 Background

This chapter provides background information about MD-DEM and discusses its advantages and disadvantages. Numerical modelling requires not only a simulation method but also a "toolbox" of techniques for managing initial and boundary conditions and choosing parameters; these topics are discussed here in the framework of a minimalistic model. The chapter explains how to calculate the forces and potential energy functions required to predict particle trajectories, and describes in detail the integration method used to calculate particle velocities and positions. Several optimisation techniques, such as neighbour lists and periodic boundary conditions, are also discussed, along with the constraints of DEM simulations and parallelisation and communication strategies.

2.1 Introduction to Molecular Modelling

Snow particles are very much bigger than molecules; nevertheless, molecular modelling is the basis for DEM, and this section and the next discuss it to provide background. Molecular modelling is the computational technique of constructing or mimicking the behaviour of molecules and performing a variety of calculations on them to predict their characteristics and behaviour. In materials science it is used to study three main aspects of an individual molecule or system of molecules: its chemical structure (number and type of particles), its properties (energy) and its characteristic behaviour in the presence of other molecules (electrostatic potentials). These determinations help validate experimental studies or predict experimental results. The key feature of molecular modelling techniques is the atomistic-level description of molecular systems, that is, describing the system at the level of individual atoms (or small groups of atoms). The main advantage of such an approach is that it reduces the complexity of the system, allowing many more particles to be considered during the simulations. Of the several techniques employed in molecular modelling, MD is the most popular and most commonly used.

2.2 Review of Molecular Dynamics

MD is a well-established microscopic computer simulation method for studying the properties and behaviour of complex systems such as solids, liquids and gases by calculating the motion of every particle in the system over a given time. A basic MD simulation contains five steps, as summarised below.

• Initialise - read the initial states of the particles.

• Ensembles and interaction - calculate the forces acting on each particle based on the neighbour list.

• Force calculations - compute the acceleration of each particle.

• Integration - obtain the new velocity and position of the particles after each time step.

• Analyse - compute magnitudes of interest and measure observables.

Steps 2 to 5 are repeated for the required number of time-steps.
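As a concrete illustration, the five steps above can be sketched as a short driver loop. This is a minimal sketch in Python with an illustrative force callback, not LAMMPS code; the function and parameter names are invented for this example.

```python
import numpy as np

def run_md(pos, vel, mass, compute_forces, dt, n_steps):
    """Minimal MD driver following the five steps above."""
    acc = compute_forces(pos) / mass               # steps 2-3: forces -> accelerations
    kinetic = 0.5 * mass * np.sum(vel**2)
    for _ in range(n_steps):
        # step 4: integration (velocity-Verlet style update)
        pos = pos + vel * dt + 0.5 * acc * dt**2
        new_acc = compute_forces(pos) / mass
        vel = vel + 0.5 * (acc + new_acc) * dt
        acc = new_acc
        # step 5: analyse - e.g. kinetic energy as an observable
        kinetic = 0.5 * mass * np.sum(vel**2)
    return pos, vel, kinetic

# demo: single particle in a harmonic well, F(x) = -x, exact x(t) = cos(t)
pos, vel, ke = run_md(np.array([1.0]), np.array([0.0]), 1.0, lambda x: -x, 0.001, 1000)
```

The harmonic demo has the analytic solution x(t) = cos(t), which makes it easy to verify that the loop structure is correct.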

P. Cundall (1979) developed the DEM approach roughly 30 years ago to model granular materials. At that time MD was already used for simulations of molecular systems, with classical schemes that could be applied directly to granular media. For this reason, many authors use the acronyms MD and DEM interchangeably for discrete simulation methods of granular materials.

Figure 2.1: Flowchart of MD approach

Note that the MD and DEM approaches are identical in spirit, but the physics is fundamentally different. Especially for modelling granular materials, the particle properties, force interactions and integration laws often referred to as MD techniques are used in DEM in order to understand the collective behaviour of a dissipative large-particle system. These MD techniques are explained in detail in Sections 2.3 to 2.6 before proceeding to DEM.

2.3 Force Calculations and Ensembles

Particles in an MD simulation move due to the forces acting on them, as governed by Newton's second law of motion, given by equation 2.1:

F = m·a = m (dv/dt) = m (d²r/dt²)        (2.1)

Calculating the potential energy between individual particles determines the atomic interactions, which are described by inter-atomic forces. The sum of the potential energies associated with all types of atomic interaction gives the total potential energy of the system. The force acting between a pair of particles is computed as the negative first-order derivative of the potential energy with respect to the separation distance, given by equation 2.2:

F_i = −∂U/∂r_ij        (2.2)
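As a quick sanity check of equation 2.2, a force can be compared against a numerical derivative of its potential. This is an illustrative sketch: the harmonic potential and the stiffness k are arbitrary choices for the example, not values used in this thesis.

```python
def numerical_force(potential, r, h=1e-6):
    """Approximate F(r) = -dU/dr with a central finite difference."""
    return -(potential(r + h) - potential(r - h)) / (2.0 * h)

# demo: harmonic pair potential U(r) = 0.5*k*r^2, whose exact force is -k*r
k = 3.0
U = lambda r: 0.5 * k * r**2
```

Evaluating `numerical_force(U, 2.0)` should agree closely with the analytic value −k·r = −6.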

There are three commonly used ensembles in MD simulations:

1. NVE - the number of atoms, volume and energy of the system are kept constant (the micro-canonical ensemble).

2. NVT - the number of atoms, volume and temperature of the system are kept constant (the canonical ensemble).

3. NPT - the number of atoms, pressure and temperature of the system are kept constant (the Gibbs ensemble).

The NVE ensemble is used in this thesis.

2.4 Interaction

Potential energy calculations play a key role in MD simulations. Selecting an appropriate potential is the fundamental step in any MD simulation: it must provide useful results without being computationally expensive. The potential used in this thesis is the granular potential, which applies Hertzian interactions when the distance r between two particles of radii Ri and Rj is less than their contact distance d = Ri + Rj; there is no force between the particles when r is greater than d. Refer to Chapter 4 for more details about the interaction potential used in this thesis.
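The cut-off behaviour of the granular potential can be sketched as follows. This is a simplified Hertzian normal force only: the stiffness kn and the omission of damping and tangential terms are illustrative simplifications, not the actual LIGGGHTS contact model, which is described in Chapter 4.

```python
def hertz_normal_force(r, Ri, Rj, kn=1.0e5):
    """Simplified Hertzian normal contact force between two spheres.

    delta = (Ri + Rj) - r is the overlap; the force vanishes when the
    centres are farther apart than the contact distance d = Ri + Rj.
    kn is an illustrative stiffness constant, not a LIGGGHTS parameter.
    """
    delta = (Ri + Rj) - r
    if delta <= 0.0:
        return 0.0                 # no contact, no force
    return kn * delta ** 1.5       # Hertzian scaling F ~ delta^(3/2)
```

For two unit-radius spheres, any separation r ≥ 2 gives zero force, while r = 1.9 gives a small overlap and a finite repulsive force.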

2.5 Integration

The atoms in a molecular dynamics simulation move within the system under the computed forces, and their equations of motion must be integrated. Solving such systems analytically is infeasible, as they usually involve millions of atoms, so a numerical integration method is necessary. Many are available, such as the Verlet, leapfrog, Velocity-Verlet and Beeman algorithms. LAMMPS, the molecular dynamics software used in this thesis, uses the Velocity-Verlet integration scheme. The Velocity-Verlet algorithm, a modified version of the Verlet algorithm, is a numerical integration method that determines the positions of the atoms after every time-step. For a given position, the basic Verlet algorithm is derived from two Taylor expansions, one forward in time and one backward in time, as follows.

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt² + (1/6)b(t)Δt³ + O(Δt⁴)        (2.3)

r(t − Δt) = r(t) − v(t)Δt + (1/2)a(t)Δt² − (1/6)b(t)Δt³ + O(Δt⁴)        (2.4)

Adding equations 2.3 and 2.4 we get

r(t + Δt) = 2r(t) − r(t − Δt) + a(t)Δt² + O(Δt⁴)        (2.5)

Equation 2.5 gives the new position as a function of the current position, the previous time-step position and the acceleration. The truncation error of the Velocity-Verlet algorithm is of order Δt⁴. One constraint of the basic Verlet algorithm is that, for the very first time-step, the previous position r(t − Δt) is not defined. This is fixed in the Velocity-Verlet algorithm by explicitly including the velocities of the atoms, so that velocity and position are calculated at the same time-step:

r(t + Δt) = r(t) + v(t)Δt + (1/2)a(t)Δt²        (2.6)

v(t + Δt) = v(t) + [a(t) + a(t + Δt)]/2 · Δt        (2.7)
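Equations 2.6 and 2.7 translate directly into a single integration step. This is a minimal sketch; the harmonic test force used in the demo is illustrative only.

```python
import numpy as np

def velocity_verlet_step(r, v, a_func, dt):
    """One Velocity-Verlet step, per eqs. (2.6)-(2.7):
    r(t+dt) = r(t) + v(t)*dt + 0.5*a(t)*dt^2
    v(t+dt) = v(t) + 0.5*(a(t) + a(t+dt))*dt
    """
    a_old = a_func(r)
    r_new = r + v * dt + 0.5 * a_old * dt**2
    v_new = v + 0.5 * (a_old + a_func(r_new)) * dt
    return r_new, v_new

# demo: harmonic oscillator a(r) = -r, exact solution r(t) = cos(t)
r, v = 1.0, 0.0
for _ in range(50):
    r, v = velocity_verlet_step(r, v, lambda x: -x, 0.01)
```

After 50 steps of size 0.01 the numerical position should be close to the exact cos(0.5), consistent with the algorithm's small truncation error.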

2.6 Periodic Boundary Condition

Almost all MD simulations take place in a box or container. They aim to model infinite systems at the microscopic level with finite means. If a container with rigid boundaries is chosen, then at the microscopic level many particles are affected by edge and wall effects; according to Rapaport (2004), in a microscopic simulation with 1000 particles, nearly 500 stick to the boundary walls and edges. This situation is avoided by using Periodic Boundary Conditions (PBC), in which the system is treated as an infinite array of identical translated images of itself, as shown in figure 2.2. There are two consequences of the PBC approach. First, a particle moving out through one boundary re-enters at the opposite boundary, creating periodic movement. Second, particles within a distance rc of one boundary interact with particles within the same distance of the opposite boundary. Both are taken into account in LAMMPS during the force calculations and the integration of position and velocity.
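The two consequences of PBC correspond to two small helper routines: wrapping positions back into the box, and computing the shortest ("minimum image") separation between two particles. This is an illustrative sketch for a cubic box of side `box`, not the LAMMPS implementation.

```python
import numpy as np

def wrap(pos, box):
    """Map positions back into the periodic box [0, box)."""
    return pos % box

def minimum_image(ri, rj, box):
    """Shortest separation vector from ri to rj under PBC."""
    d = rj - ri
    return d - box * np.round(d / box)
```

For example, in a box of side 10, a particle at 11.0 wraps to 1.0, and the shortest path from 1.0 to 9.0 goes through the boundary with separation −2.0 rather than +8.0.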

Figure 2.2: A 2D MD simulation with periodic images

2.7 Neighbour List

Figure 2.3 illustrates the neighbour list visually. Since every one of the n particles needs to interact with every other particle in a simulation step, there are

n(n − 1)/2 ∈ O(n²)        (2.8)

interactions/forces to be calculated. Even for short-range forces, O(n²) interactions have to be checked, but only a few of them actually make a contribution. To tackle this problem, LAMMPS implements a neighbour list strategy for the force calculations. A cut-off radius rcut limits the number of interactions associated with each atom: only interactions with atoms in the neighbour list are accounted for in the force calculations, which greatly reduces the simulation time (Subramani, 2008). This approach has a drawback, however: the neighbour list would require updating every time step, which is undesirable because generating the list consumes a considerable amount of time. To avoid very frequent updates, a buffer radius rb is added to the cut-off radius rcut to give a neighbour list cut-off radius rnl greater than the cut-off radius (e.g. rnl = 2rcut). The buffer allows atoms to move beyond the cut-off radius while remaining within the neighbour list radius, reducing the number of times the list must be refreshed. Although this requires additional memory of order O(n), it reduces the complexity of the computation to O(n), which is better than the naive O(n²) for large numbers of particles.

Figure 2.3: Neighbour list for one particle (in red), drawn in 2D
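A neighbour list with the cut-off rnl can be built as follows. This is an illustrative O(n²) construction; production codes such as LAMMPS use cell binning to build the list in closer to O(n) time, and rebuild it only when particles have moved far enough to invalidate the buffer.

```python
import numpy as np

def build_neighbour_list(pos, r_nl):
    """Per-particle neighbour lists with list cut-off
    r_nl = r_cut + buffer (e.g. r_nl = 2 * r_cut).

    Illustrative O(n^2) pairwise scan over an (n, dim) position array.
    """
    n = len(pos)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pos[i] - pos[j]) < r_nl:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours

# demo: three particles on a line, list cut-off 2.0
nb = build_neighbour_list(np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]), 2.0)
```

With a cut-off of 2.0, only the first two particles (separation 1.0) appear in each other's lists; the third, 4.0 away, is excluded from both.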

2.8 Discrete Element Method

A DEM approach is very similar to the standard MD approach, in which the positions of the particles are updated gradually in time using many discrete time steps (Plimpton, 1995). The basic properties of a DEM simulation are:

• There are many particles.

• Particles are relatively stationary.

• Forces between the particles are very short-range.

• The time taken to compute forces dominates the simulation.

A DEM analysis starts with a collection of particles, or by creating particles in a designated region. Each physical particle is represented mathematically by a sphere, another geometrically well-defined volume, or a combination of them. The movement of these spherical particles is based on the corresponding momentum balances. Along with the current position and velocity of a particle, its physical characteristics are used to calculate the current forces upon it. The forces typically include gravity, friction, pressure from contact with other particles and with physical system boundaries, and other effects such as those caused by cohesion. These forces are then used to predict the particle's future location and velocity over some small increment called the time-step, normally on the order of millionths of a second. This process is repeated for every particle in the system at each time step. When particles collide with each other or with other parts of the system, they are modelled with linear springs, dashpots and joints in the normal and tangential directions, as shown in figure 2.4.

Figure 2.4: Contact model for DEM simulations in the (a) normal and (b) tangential directions.

The contact force is the result of elastic, viscous and frictional resistance between the moving particles, which can be modelled as a spring, a dashpot and a shear slider. The spring models the elastic interaction while the dashpot expresses the dissipation of energy in the system. Such a model allows the forces acting on each particle to be calculated. The new position, velocity and acceleration of the particle are estimated by numerical integration of Newton's second law:

m_i ẍ_i = m_i g + Σ_j F_ij

I_i θ̈_i = Σ_j (r_ij × F_ij)        (2.9)

where

ẍ_i  - translational acceleration of particle i
m_i  - mass of particle i
g    - acceleration due to gravity
F_ij - force at contact with neighbouring particle j
I_i  - moment of inertia of particle i
θ̈_i  - rotational acceleration of particle i
r_ij - vector directed from the centre of particle i to the contact point with particle j
The implementation specifics of the deformations enforced by contacts between particles, and the methods of modelling these interactions, are discussed in detail in Chapter 4.
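The spring-dashpot part of this contact model can be sketched for the normal direction as a linear law F = k·δ + c·δ̇, where the spring term (k·δ) models elasticity and the dashpot term (c·δ̇) models viscous energy dissipation. The stiffness k and damping c values below are illustrative, not calibrated snow parameters.

```python
def normal_contact_force(overlap, rel_vel, k=1.0e4, c=5.0):
    """Linear spring-dashpot normal contact force F = k*overlap + c*rel_vel.

    overlap : penetration depth delta at the contact point (> 0 in contact)
    rel_vel : normal relative velocity of the two particles
    k, c    : illustrative stiffness and damping constants
    """
    if overlap <= 0.0:
        return 0.0            # particles not touching: no contact force
    return k * overlap + c * rel_vel
```

A non-linear (Hertzian) variant would replace the spring term with one proportional to δ^(3/2), as in the interaction potential described in Section 2.4.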

2.9 Parallelisation and Communication of DEM Simulations

The main constraint of an MD-DEM simulation is that it requires a great many calculations and a lot of memory and CPU time. The simulation time is usually proportional to the square of the number of particles in the system, so as the system grows the time taken grows rapidly. To overcome these constraints, most MD-DEM simulations are designed to run in parallel. There are a number of ways to do this. One common approach is to run different scenarios, each with different start-up conditions, on independent processors in parallel. There are two ways to implement this parallel strategy: either each processor has access to shared memory, or the processors work on independent memory and communicate data and results among themselves. Since the
shared memory approach is difficult to implement at large scale, as it requires special memory access patterns to avoid synchronisation issues among processors, the communication-based approach is preferred. The simplest way to implement communication-based parallelisation is to replicate all the data to every processor, which requires additional memory and communication time. The challenge is to reduce both by providing each processor with only the minimum data it needs for its computation. Beazley and Lomdahl (1994) discuss three decomposition strategies to address this challenge: force decomposition, atom decomposition and spatial decomposition. The MD simulation package used in this thesis (LAMMPS) employs the spatial decomposition (also known as domain decomposition) technique.

Figure 2.5: Spatial Decomposition approach

The spatial decomposition approach divides the whole simulation domain geometrically into small regions and assigns each region to an individual processor; the regions are further divided into cells. The advantage of this approach is that each processor works only on the particles in its own region (and cells). The only exception is particles close to the boundary of a neighbouring processor's region, in which case message passing between the neighbouring processors communicates the information required for the computation. This limits communication to particles that move in and out of a region and particles on boundaries, keeping the data size small; thus the domain decomposition algorithm works efficiently with minimal communication overhead. If there are N processors in the simulation, each with the same calculation speed, the estimated speed-up S can be found by simply counting the number of instructions I (or processor clock cycles).

S = t_sequential / t_parallel = I_sequential / I_parallel        (2.10)

[Figure 2.5 shows the domain (the whole system) divided into regions (one per node), each subdivided into cells.]
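The mapping of particles to processor regions described above can be sketched as follows. This is an illustrative assignment of particles to a processor grid; the grid shape and rank numbering are assumptions for the example, not LAMMPS's actual mapping.

```python
import numpy as np

def assign_to_regions(pos, box, grid):
    """Map each particle to the processor region owning its position.

    pos  : (n, dim) particle positions
    box  : domain lengths per dimension
    grid : processor counts per dimension (the region grid)
    """
    cell = np.floor(pos / box * grid).astype(int)
    cell = np.clip(cell, 0, np.array(grid) - 1)   # keep boundary particles in range
    # linearise the (ix, iy)-style region index to a processor rank
    return np.ravel_multi_index(cell.T, grid)

# demo: four particles in a 10 x 10 domain split over a 2 x 2 processor grid
ranks = assign_to_regions(
    np.array([[1.0, 1.0], [6.0, 1.0], [1.0, 6.0], [6.0, 6.0]]),
    np.array([10.0, 10.0]),
    (2, 2),
)
```

Each particle lands on exactly one rank, so each processor only computes forces for its own region, exchanging halo data for particles near region boundaries.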

Let α denote the fraction of parallelisable instructions. Then α·I_sequential instructions can be shared among the N processors, while (1 − α)·I_sequential instructions remain serial. Let I_overhead be the overhead spent on initialisation, communication and synchronisation; the speed-up S is then given by

S = I_sequential / [(1 − α)·I_sequential + (α/N)·I_sequential + I_overhead]

  = 1 / [(1 − α) + α/N + γ],  where γ = I_overhead / I_sequential        (2.11)

When γ = 0, equation 2.11 is known as Amdahl's law (Jordan and Alaghband, 2003). It implies that, even with negligible overhead, a program that is 90% parallelisable can be sped up by at most a factor of 10, no matter how many processors are used.
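Equation 2.11 is easy to evaluate directly. The helper below is a small sketch; the example values of α and N are illustrative.

```python
def speedup(alpha, n_procs, gamma=0.0):
    """Amdahl's law with overhead term, per eq. (2.11):
    S = 1 / ((1 - alpha) + alpha / N + gamma),
    where gamma = I_overhead / I_sequential.
    """
    return 1.0 / ((1.0 - alpha) + alpha / n_procs + gamma)
```

With alpha = 0.9 and negligible overhead, the speed-up approaches but never exceeds 10 as the processor count grows, illustrating the 10x ceiling stated above.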

2.10 Summary

With the advent of powerful computational resources, DEM simulations have become a popular method of addressing problems in granular and discontinuous media. Several decomposition techniques are available to help parallelise large DEM simulations. All DEM simulations require potential functions to find the forces on each particle and a numerical integration technique to evaluate particle motion. In this thesis, the granular potential was chosen for its ability to model granular materials, and the motion of particles is evaluated using the Velocity-Verlet algorithm.

Chapter 3 Experimental Setup

This chapter describes the experimental set-up used in the current study, including the salient features of each hardware platform, details of the software packages and the visualisation setup. It also describes the choice of experimental parameters required for the simulations and the ways their values were calculated.

3.1 The Platforms

3.1.1 HECToR Cray XE6

Architecture Overview

The HECToR service (phase 2b) consists of a Cray XE6 massively parallel processor (MPP) distributed-memory system. It uses AMD processors with a custom-built memory and communication system, tightly coupled with the operating and management system, making it a highly scalable and reliable system (cray.com, 2011).

Figure 3.1: The Cray XE6 (cray.com, 2011)

The XE6 component of the HECToR service consists of 1856 compute nodes, each containing two AMD Opteron 6172 2.1 GHz 12-core processors, code-named 'Magny-Cours'; each Magny-Cours package is essentially two hexa-core dies connected within the same socket. This gives 44,544 cores in total, with a theoretical peak performance of 373 Tflops. In addition, there are 64 service nodes, each with six 2.2 GHz cores and 16 GB of memory.

The processors on the XE6 are used as compute nodes or service nodes.

• Compute nodes run Compute Node Linux (CNL), are configured for user applications, and ensure that there is little OS noise during application execution.

• Service nodes run SuSE Linux and can be configured for login, network and system functions.

Memory and Cache

There is 32 GB of main memory per 24-core node, accessible via a Non-Uniform Memory Architecture (NUMA), giving the XE6 a total memory capacity of 58 TB. Each core has 64 KB of dedicated L1 data cache, 64 KB of dedicated L1 instruction cache and 512 KB of dedicated L2 cache, plus 6 MB of shared L3 cache, of which 1 MB is reserved for maintaining cache coherency. On the XE it is possible to allocate all 32 GB of main memory to a single core in a node, which helps when running sparsely populated jobs that need more memory per core, although this requires more compute cycles.

Figure 3.2: Magny-Cours Architecture diagram (hector.ac.uk, 2011)

Communication Network

The key feature of the XE6 is the Gemini interconnect. The Gemini ASIC is capable of handling tens of millions of MPI messages per second, improving performance drastically. Each dual-socket node is interfaced to the Gemini interconnect through a HyperTransport link with very high bandwidth (8 GB/s) and low latency (around 1.5 µs).

Data I/O and Storage

Of the 64 service nodes on Phase 2b, 12 are configured as I/O nodes. These are integrated into the 3D torus communication network via their own Gemini chips, and connect the machine to its 576 TB RAID disk via InfiniBand fibre. A high-performance parallel file system (esFS) allows access to the disk by all the I/O nodes. The phase 2b system has a 70 TB home file system that is backed up to a 168 TB backup system. Figure 3.3 summarises the current file system.

Figure 3.3: HECToR File System (hector.ac.uk, 2011)

Programming Environment

On login to HECToR the Cray XE programming environment is loaded, setting the environment variables for the compilers and parallel libraries. Cray provides wrapper scripts for the compilers - ftn for Fortran and cc/CC for C/C++ - which serve as a single command to compile and link all the necessary parallel libraries.


The Portable Batch System (PBS) batch scheduler is used to run jobs on HECToR. The easiest way to submit jobs is with a batch script that launches the executable using the aprun command. Refer to the Appendix for more details on the PBS submission script and the parameters that need to be specified in it.
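As an illustrative sketch only, a minimal PBS submission script for a Cray XE system might look like the following; the job name, core counts, wall time and budget code are placeholders, and the authoritative parameters are those listed in the Appendix:

```shell
#!/bin/bash --login
#PBS -N lammps_snow        # job name (placeholder)
#PBS -l mppwidth=96        # total number of cores requested
#PBS -l mppnppn=24         # cores per node on the XE6
#PBS -l walltime=01:00:00  # requested wall-clock time
#PBS -A budget-code        # placeholder budget code

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Launch the LAMMPS executable on 96 cores, reading the input script
aprun -n 96 ./lmp_hector < in.snow
```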

3.1.2 Ness

Ness is a much smaller parallel machine that is mostly used by students of EPCC and for general research activities (epcc.ed.ac.uk, 2011). Ness is a Sun Fire system with two 16-core shared-memory (SMP) nodes, at most one of which can be allocated to a single job. It has the same combination of processor technology, operating system and compiler suite as HECToR. It is therefore very useful for inexpensive code development, correctness studies and small performance tests before transferring to HECToR. Table 3.1 summarises the specification of Ness.

Property                               Ness
Machine Type                           Sun Fire X4600
Machine Category                       Shared Memory
Processor Type                         2.6 GHz AMD Opteron (AMD64e)
Theoretical Peak Flop Rate per Core    10.4 Gflop/s
Cores per Node                         16
Total Cores                            32
Maximum Job Size                       16
Memory per Core                        2 GB
Interconnect                           N/A
MPI Bandwidth                          1 GB/s
MPI Latency                            0.8 µs
L1 Data Cache Type                     Private Data
L1 Data Cache Size                     64 KB
L1 Instruction Cache Type              Private Instruction
L1 Instruction Cache Size              64 KB
L2 Cache Type                          Private Unified
L2 Cache Size                          1024 KB

Table 3.1: Ness Specification

3.2 The Software

There are a number of simulation packages available to perform DEM modelling. Two software packages, LAMMPS and LIGGGHTS, are used to carry out the computations for this thesis.


3.2.1 LAMMPS – Code Introduction

LAMMPS (sandia.gov, 2011) stands for Large-scale Atomic/Molecular Massively Parallel Simulator. It is an open-source code written in C++, distributed under the GNU public license and developed at Sandia National Laboratories, USA. LAMMPS can be used to model a wide range of materials, from large-scale atomic to molecular systems, using a variety of force fields and boundary conditions. Given the boundary conditions, LAMMPS simulates the material by integrating Newton's equations of motion for a system of particles interacting via short- or long-range forces. These forces include pairwise potentials, many-body potentials such as the Embedded Atom Method (EAM), and long-range Coulombic solvers such as Particle-Particle Particle-Mesh (PPPM). LAMMPS can run on single-processor as well as multi-processor machines, and even on multiple PCs connected through an Ethernet network. The code is parallelised using MPI and uses a spatial decomposition strategy to divide the domain into small sub-domains that are assigned to each processor of the parallel machine. All of the LAMMPS and LIGGGHTS simulations for this thesis were run on Ness as well as on the HECToR supercomputer to take advantage of the parallel processing capabilities of LAMMPS.

Why LAMMPS

1. LAMMPS is free and easy to use; its source code is well structured and easy to understand and modify.
2. LAMMPS is fast and suitable for massively parallel computing.
3. LAMMPS is well documented and has a large user community.
4. LAMMPS has a good MPI coupling interface.
5. LAMMPS scales well to large numbers of processors; its speedup generally increases with the number of atoms used in the simulation (Reid and Smith, 2005).
6. GPU acceleration is possible in LAMMPS.

3.2.2 LAMMPS – Installation

The latest version (January 2011) of LAMMPS was downloaded from the LAMMPS website as a tar file, which was then transferred to an appropriate directory on HECToR. Extracting the archive automatically creates the LAMMPS directory structure on HECToR. The /src directory contains all the C++ source and header files required for LAMMPS. Given below are the instructions for building the January 2011 version of LAMMPS on the HECToR XE6.

1. Create a Makefile for HECToR

LAMMPS uses a two-level Makefile structure: a top-level Makefile located in the /src directory and a MAKE subdirectory that holds the low-level, machine-specific Makefiles. The first stage in building LAMMPS is to create a Makefile specific to HECToR. A Makefile called Makefile.hector was created using one of the sample makefiles that come with LAMMPS as a starting template. The Makefile contains a number of system-specific settings, rules to compile and link the source files, and the other dependencies required to build the executable.

The first step is to change the first line of the Makefile to list the word HECToR after the '#'. This line is displayed first when building LAMMPS, and it also adds HECToR to the list of available options for the make command.

# HECToR XT4 system

The compiler/linker section lists the compiler and linker settings for HECToR. The PGI compiler is the default on HECToR, but there is an open bug when using the PGI compilers with the new version of LAMMPS, so it was decided to use the GNU compiler instead. Thanks to Dr. Fiona Reid for her help in implementing this change. The following flags were used to compile LAMMPS:

CC = CC
CCFLAGS = -O3 -g -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK
DEPFLAGS = -M
LINK = CC $(CCFLAGS)
USRLIB = -ldfftw
SIZE = size

The location of the FFTW libraries needs to be specified in the CCFLAGS option; the -DFFT_FFTW flag selects the centrally installed one-dimensional FFTW library.

2. Load the appropriate environment

Load the xtpe-mc12 environment and also FFTW 2.1.5.* environment.

module load xtpe-mc12
module load fftw/2.1.5.2

To link the FFTW libraries properly, a few modifications to the source code are required: the header file name referenced in fft3d.h must be changed from fftw.h to dfftw.h so that the code compiles correctly against FFTW 2.1.5, the version of the FFTW library supported by LAMMPS.

3. Build LAMMPS

Now execute the command below from within the /src directory of LAMMPS.

make hector

This will create the executable lmp_hector in the same directory when the build is complete.


3.2.3 LAMMPS – Working

This section provides some background on the way LAMMPS works, which is very different from other commercial MD/DEM simulation packages. There is no need to recompile the code for different MD scenarios. Instead, LAMMPS has its own scripting style in the form of an input file. It works by first reading the input file, which contains the initialization parameters (dimension, units, boundary), the particle definitions (size, shape and initial co-ordinates) and the time-step required for the simulation. LAMMPS then builds the atomic system and periodically writes the thermodynamic information to a log file. Once the simulation is complete, the final state of the system is printed to the log file along with other information such as the total execution time. All of these steps are discussed in detail in the subsequent sections.

Figure 3.4: Execution of LAMMPS flow chart

3.2.4 LAMMPS - Input Script Structure

This section describes the structure of a typical LAMMPS input script. As shown in figure 3.4, the input file of LAMMPS is divided into four parts.

Initialization: The first stage sets the parameters that define the molecular system and the simulation domain, for example the processor topology.

Particle Definition: This sets the positions and forces of the particles in the simulation domain. There are three ways to do this: the details can be read from a new data file, read from a restart file from a previous simulation, or an initial lattice can be created as part of the simulation itself.


Settings: Once the simulation topology and the particles are created, this part sets the various simulation parameters such as boundary conditions, time-steps and force-field options.

Execute: Once all the required properties are set, the simulation is started and run for the desired number of time-steps.
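As an illustration of this four-part structure, a granular input script might be laid out as follows; the commands shown are simplified placeholders, not the input script actually used in this thesis, and the pair style and material coefficients are omitted:

```
# --- Initialization ---
units           si
dimension       3
boundary        f f f
atom_style      granular

# --- Particle definition ---
region          domain block 0 1 0 1 0 1 units box
create_box      1 domain
# (particles would be inserted here, e.g. read from a data file)

# --- Settings ---
# (pair style and material coefficients omitted)
timestep        0.00001
fix             grav all gravity 9.81 vector 0 0 -1

# --- Execute ---
run             10000
```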

3.2.5 LAMMPS - Input Script Basics

LAMMPS reads commands from the input file one at a time. Each command prompts LAMMPS to take some action: setting internal variables, reading a data file, setting material parameters, or running the simulation. Execution stops when LAMMPS reaches the end of the input file. In many cases the order in which the commands are placed in the input file is not important, but the following rules apply.

1. Commands take effect in the order they appear in the input file. Thus the sequence

    timestep 0.5
    run      100
    run      100

does something different from this sequence:

    run      100
    timestep 0.5
    run      100

In the first case, both runs of 100 iterations use a time-step of 0.5 fmsec, but in the second case the default time-step (1.0 fmsec) is used for the first 100 iterations and a time-step of 0.5 fmsec for the second 100 iterations.

2. Sometimes the output of command A might be used in command B. This means command A must precede command B in the input script for the desired effect to happen.

3. Some commands are valid only when they follow other commands. For example, the command to set the temperature of group of particles in the simulation cannot be carried out until the particles are created and the region is defined.

3.2.6 LAMMPS - Parsing Rules

LAMMPS commands are case sensitive: command names are lower case, while upper case may be used in user-defined identifiers or file names. Here is how LAMMPS parses the input script:

• Each non-blank line in the input file is treated as a command by LAMMPS.


• The very first word in any non-blank line is the command name, which is followed by the list of arguments (specified in the same line)

• If the last character of a line is '&', the command is assumed to continue on the next line. The two lines are concatenated after removing the '&'.

• Any line that starts with ‘#’ is treated as comment and is discarded.

• All user-defined variables are referenced with a '$' sign. If the variable name is a single printable character, it immediately follows the '$'; a longer name is enclosed in curly brackets. For example, $x and ${my_var} reference the variables x and my_var.

• Any output text for the screen or the log files is enclosed in double quotes.

3.2.7 LIGGGHTS

LIGGGHTS (liggghts.com, 2011) is an open-source, C++, MPI-parallel DEM code for modelling granular materials. LIGGGHTS stands for LAMMPS Improved for General Granular and Granular Heat Transfer Simulations, developed and distributed by Christoph Kloss of the Christian Doppler Laboratory on Particulate Flow Modelling at Johannes Kepler University, Austria. LIGGGHTS is part of the CFDEM project, whose goal is to develop a new coupled CFD-DEM approach. LIGGGHTS is based on LAMMPS, which provides potentials for modelling soft materials, solid-state materials and coarse-grained granular materials and can model particles at the atomic, meso or continuum scale. DEM involves the simulation of coarse-grained granular particles, and LAMMPS offers both linear and non-linear granular potentials for this purpose. All these granular features of LAMMPS are improved on in LIGGGHTS. The following are some of the new features that LIGGGHTS adds:

• It is possible to import complex geometry from computer-aided design (CAD) into a LIGGGHTS simulation

• Pair-style parameters like stiffness and damping can be linked to material properties that can be derived from lab experiments (e.g. density, Young's modulus, Poisson's ratio and coefficient of restitution)

• It has the potential to model macroscopic cohesion
• LIGGGHTS has dynamic load balancing

All the LAMMPS features, rules, commands and working discussed in previous sections are applicable to LIGGGHTS as well.


3.3 Visualization Setup

LAMMPS does not do any post-processing or visualization of the simulations itself. However, many available visualization tools can be coupled with LAMMPS to visualise the output. The Pizza.py toolkit (sandia.gov, 2011) and ParaView (paraview.org, 2011) are used in this thesis to visualise the LAMMPS output. Pizza.py is an integrated collection of tools that provide post-processing capabilities for the LAMMPS package; the August 2010 version of the toolkit is used in this thesis.

3.4 Description of the Mechanical Properties of Snow

Much of the theory and many of the results discussed in this thesis involve some or all of the mechanical properties discussed in this section, which outlines the physical properties. Refer to section 5.7.2 for the actual parameter values used in the simulation.

3.4.1 Size and shape of snow particles

There are several forms of snow particles, of which three are described here:

Snow Crystals: Typical crystal sizes range from microscopic to at most a few millimetres in diameter.

Snow Flakes: Several snow crystals clump together to form a snowflake. A snowflake can grow up to 10 mm across in some cases when the snow is especially wet and thick.

Graupel: When loose collections of supercooled water droplets coat a snowflake, they form graupel. The typical size of a graupel particle is 2 to 5 millimetres in diameter.

For this thesis, it was decided to use spherical snow particles, like graupel, of size 5 mm, the maximum size of particles observed in the field (in a conversation on 8th February 2011, Virdee stated that he had observed graupel snow particles of size 5 mm while in the field).


Figure 3.5: Graupel Snow particles (Wikipedia.com, 2010, creative commons licence)

3.4.2 Density

Symbol: ρs  Unit: kg m⁻³

Density is a fundamental parameter of any material, calculated as mass per unit volume. For porous materials, density refers to the bulk density, i.e. the total mass per unit volume, which is determined by weighing snow of a known volume. Total snow density includes all constituents of snow: ice, liquid water and air (Armstrong et al., 2009).

3.4.3 Young’s Modulus

Symbol: E  Unit: pascal (N/m²)

Young's modulus is used to characterise the stiffness of an elastic material (Godbout et al., 2000). It is the ratio of stress (measured in units of pressure) to strain (dimensionless). For snow ice, Young's modulus can be estimated from Drouin and Michel (1971):

    E = 5442.3 − 67.3 Ti    (3.1)

where Ti is the temperature of the ice (in °C, at or below 0 °C).
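Equation 3.1 can be evaluated numerically as below; note that the units of the result (assumed here to be MPa) are not stated in the source, so that interpretation is an assumption:

```python
def young_modulus_ice(t_ice):
    """Young's modulus of snow ice from equation 3.1 (Drouin and Michel, 1971).

    t_ice: ice temperature in degrees Celsius (at or below 0).
    Returns the modulus in the units of the original correlation
    (assumed here to be MPa).
    """
    if t_ice > 0:
        raise ValueError("equation 3.1 applies to ice at or below 0 degrees C")
    return 5442.3 - 67.3 * t_ice

# Stiffness increases as the ice gets colder:
print(young_modulus_ice(0))    # 5442.3
print(young_modulus_ice(-10))  # roughly 6115.3
```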

3.4.4 Poisson’s Ratio

Symbol: ν  Unit: dimensionless

In 3D, when an elastic material is stretched in one direction, it tends to get thinner in the other two directions. Poisson's ratio is defined as the ratio of the lateral contraction to the longitudinal extension under the influence of a uniform uni-axial stress. Poisson's ratio is related to the bulk modulus K, the shear modulus G and Young's modulus E as follows (Sinha, 1987):


    ν = (3K − 2G) / (6K + 2G)    (3.2)
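Equation 3.2 in code, with two quick sanity checks on its limiting behaviour:

```python
def poisson_ratio(k, g):
    """Poisson's ratio from bulk modulus k and shear modulus g (equation 3.2)."""
    return (3.0 * k - 2.0 * g) / (6.0 * k + 2.0 * g)

# For k == g the ratio is (3 - 2)/(6 + 2) = 1/8:
print(poisson_ratio(1.0, 1.0))  # 0.125

# A nearly incompressible material (k >> g) approaches 0.5:
print(round(poisson_ratio(1e9, 1.0), 6))  # 0.5
```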

3.4.5 Coefficient of restitution

Symbol: ε  Unit: dimensionless

The coefficient of restitution is defined as the ratio of the rebound velocity (vr) to the impact velocity (vi) in the normal direction (Higa et al., 1995):

    ε = vr / vi    (3.3)

3.4.6 Coefficient of kinetic friction

Symbol: f  Unit: dimensionless

The coefficient of friction is a dimensionless scalar given by the ratio of the force of friction between two bodies to the force pressing them together (Schaerer, n.d.). For flowing avalanches it can be estimated as

    f = 5 / u    (3.4)

where u is the avalanche speed in m/s.

3.5 Summary

This chapter presented both the hardware and software setup used in the project. Both LAMMPS and LIGGGHTS are installed on Ness as well as on HECToR. Taking advantage of the configuration similarities between Ness and HECToR, the model was first developed and tested on Ness before being ported to HECToR. The material parameters discussed in this chapter are used in the actual implementation of the model.


Chapter 4

Modelling of Cohesive Interactions

Snow particles stick together as they flow in an avalanche, so this study needs to investigate cohesive numerical models within DEM. In recent years, DEM of granular materials has developed rapidly and shows great potential in both industrial application and academic research. This chapter presents numerical models of cohesion in a discrete element framework. There are three levels of cohesion: adhesion, cementation and capillarity; the focus of this thesis is on one level only, adhesion. Numerical modelling of cohesive phenomena must take into account the shapes of the particles and their hydrodynamic interactions. The numerical implementation of these interactions depends on the solving strategy: the Molecular Dynamics approach discussed in Chapter 2, which is based on the equations of dynamics and pair-wise contact interactions. These interactions can be extended with cohesive terms, which supplement the repulsive elastic and frictional interactions of cohesionless materials.


4.1 DEM Revisited

In this work, DEM is applied to individual snow particles whose larger-scale bulk behaviour is defined by the way the particles interact with each other. Each particle is defined computationally, along with its shape, initial position, velocity and other physical properties, and the changes in these parameters over time are calculated for each particle as it moves around and interacts or collides with other particles in the simulation domain. There are two ways to resolve collisions between particles: the hard, or event-driven, approach (EDM) and the soft, or time-driven, approach (TDM). Since EDM assumes instantaneous collisions between particles (impulses), it is better suited to dilute granular materials; only the TDM approach resolves the collision forces between particles. In TDM, each particle is treated as a rigid body that can overlap with its neighbours; Cundall and Strack (1979) first developed such an approach. TDM simulations are time-driven: the state of all particles at time t is updated after an adaptive time-step Δt. A more detailed review of the DEM approach followed in this thesis is presented in the rest of this chapter.

Figure 4.1: Diagram to illustrate the typical flow of a DEM simulation


4.2 Defining a particle and particle collision

A DEM algorithm tracks the trajectory of and forces on each particle individually at a microscopic level. This section discusses the definition of a particle and of a collision from a computational perspective.

4.2.1 Particle Definition

In general, a particle can be defined as a small localised object with physical properties such as volume and mass, and with two boundary surfaces. The boundary Sb defines the physical surface of the particle. In practice, particles can interact with other particles in the system even before physical contact occurs between them (e.g. through electrostatic forces). Therefore, each particle is also considered to have a virtual boundary called the effect surface, Se, which defines the boundary at which the particle interacts with its neighbours. The entire body volume of the particle must fall within the effect volume enclosed by Se. Each particle has a co-ordinate origin, Oxyz, and a centre of mass, Ocm; Oxyz and Ocm need not always coincide. Figure 4.2 illustrates the basic computational aspects of a particle.

Figure 4.2: Definition of a computation particle

The body and effect surfaces vary depending on the shape of the particles. The particle shape has a direct impact on the DEM simulation (Cleary and Sawley, 2002; Jensen et al., 2001). The use of complex geometry for particle shapes increases the computation time significantly for collision detection. For this reason, Sb and Se are defined as simple spheres in this thesis, such that Oxyz = Ocm. If rp is the base radius of a particle with density ρ, the particle mass mp is given by

    mp = (4/3) π ρ rp³    (4.1)
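Equation 4.1 in code; the density value used below is an illustrative assumption, not the parameter used in the thesis simulations:

```python
import math

def particle_mass(radius, density):
    """Mass of a spherical particle, equation 4.1: mp = (4/3) * pi * rho * rp^3."""
    return (4.0 / 3.0) * math.pi * density * radius ** 3

# A graupel-like sphere of diameter 5 mm (radius 2.5 mm) with an
# assumed density of 400 kg/m^3:
m = particle_mass(0.0025, 400.0)
print(f"{m:.3e} kg")
```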


4.2.2 Cohesive forces

Each particle has two surface boundaries, the body surface Sb and the effect surface Se, and two corresponding volumes: the body volume and the effect volume. There are three possible collision states between particles, as shown in figure 4.3.

Independent: Two particles are independent of each other when their effect volumes (due to Se) do not overlap.

Interacting: Two particles are interacting when their effect volumes overlap. The particles are not in physical contact, but they interact through some kind of long-range force.

Colliding: The particles are in physical contact with each other; both their effect volumes and their body volumes overlap.

Figure 4.3: Different possible collision states

When two spherical particles p and q are in a collision state, a number of their properties can be combined to describe the pair. The following approach is used in this thesis to calculate the position, velocity and mass of particles in collision. The relative position, xpq, of two particles p and q is the vector from particle q to particle p:

    xpq = xp − xq    (4.2)

The relative velocity, upq, is calculated in the same way:

    upq = up − uq    (4.3)

The reduced mass, mpq, is the effective inertial mass of the two particles:

    mpq = mp mq / (mp + mq)    (4.4)
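As a minimal sketch, the pair quantities of equations 4.2–4.4 can be computed as:

```python
def relative_position(xp, xq):
    """Equation 4.2: vector from particle q to particle p."""
    return tuple(a - b for a, b in zip(xp, xq))

def relative_velocity(up, uq):
    """Equation 4.3: relative velocity of the pair."""
    return tuple(a - b for a, b in zip(up, uq))

def reduced_mass(mp, mq):
    """Equation 4.4: effective inertial mass of the colliding pair."""
    return mp * mq / (mp + mq)

print(relative_position((1.0, 2.0, 0.0), (0.0, 2.0, 0.0)))  # (1.0, 0.0, 0.0)
print(reduced_mass(2.0, 2.0))  # 1.0
```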

4.3 Modelling Cohesive Contacts

This section briefly presents a general framework for contact cohesion in which various cohesion laws can be implemented. The framework is based on the determination of a behaviour law and a failure criterion. The specific physical models implemented in this thesis are discussed in section 4.5.

4.3.1 Contact Point and Collision Normal

There are two commonly used methods to determine the contact point and collision normal between two colliding particles: the common normal approach and the intersection approach (Hogue and Newland, 1994; Dziugys and Peters, 2001). In the common normal approach, a point on each interacting particle's boundary is identified such that the two points share a common normal vector. The contact point is the mid-point of the line joining these two points, and the collision normal is their common normal vector. In the intersection method, the contact point is the mid-point of the line joining the intersection points of the particle boundaries, and the collision normal is defined perpendicular to this line. The same holds in 3D, except that the intersection line becomes a plane, with the contact point at its centre.

Figure 4.4: Two ways to calculate collision normal and contact point

For the spherical particles used in this thesis, both the intersection and the common normal methods give the same normal vectors but different contact points. For contacting particles with similar material parameters, the common normal approach is more realistic (Dziugys and Peters, 2001). For this reason, the common normal approach is used in this thesis.
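For spheres the common normal lies along the line of centres, so the approach can be sketched as follows (an illustrative sketch only, with hypothetical function names):

```python
import math

def sphere_contact(cp, rp, cq, rq):
    """Common-normal contact for two overlapping spheres.

    Returns (normal, contact_point): the unit normal pointing from q to p
    and the mid-point of the two surface points along the line of centres.
    """
    d = [a - b for a, b in zip(cp, cq)]           # centre-to-centre vector
    dist = math.sqrt(sum(x * x for x in d))
    n = [x / dist for x in d]                     # unit normal (q -> p)
    sp = [a - rp * x for a, x in zip(cp, n)]      # surface point on p
    sq = [b + rq * x for b, x in zip(cq, n)]      # surface point on q
    c = [(a + b) / 2.0 for a, b in zip(sp, sq)]   # contact point (mid-point)
    return n, c

# Two unit spheres with centres 1.5 apart along x:
n, c = sphere_contact((1.5, 0.0, 0.0), 1.0, (0.0, 0.0, 0.0), 1.0)
print(n)  # [1.0, 0.0, 0.0]
print(c)  # [0.75, 0.0, 0.0]
```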

4.3.2 Normal Deformation and Contact force

The particles deform in size and shape because of the collision. Although it is possible to model the actual particle deformation (Feng, 2000), in practice the computational effort required makes this impractical. Instead, the deformation process is approximated by an overlap volume or distance between the two colliding particles. The normal overlap distance between the two particles is denoted δ, as shown in figure 4.5.

Figure 4.5: Contact zone between two spherical particles

For two colliding particles p and q with radii rp and rq respectively, the normal body-surface overlap distance δpq is given by

    δpq = rp + rq − |xpq|    (4.5)

The relative velocity is decomposed into normal and tangential components, given in equations 4.6 and 4.7 respectively:

    upq^n = (upq · n̂pq) n̂pq    (4.6)

    upq^t = upq − upq^n    (4.7)

The forces acting between the particles normal to the contact plane are expressed as

    fn = fn^e + fn^d + fn^c    (4.8)

where fn^e is the elastic repulsive force at contact, fn^d the viscous force and fn^c the cohesion force.
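The decomposition of equations 4.6 and 4.7 can be sketched as (n_hat must already be a unit vector):

```python
def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def decompose_velocity(upq, n_hat):
    """Split a relative velocity into its normal (eq. 4.6) and
    tangential (eq. 4.7) components; n_hat is the unit contact normal."""
    un_mag = dot(upq, n_hat)
    un = tuple(un_mag * x for x in n_hat)
    ut = tuple(a - b for a, b in zip(upq, un))
    return un, ut

un, ut = decompose_velocity((1.0, 1.0, 0.0), (1.0, 0.0, 0.0))
print(un)  # (1.0, 0.0, 0.0)
print(ut)  # (0.0, 1.0, 0.0)
```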


For spherical particles, the contact region takes the shape of a disk of radius a, over which the pressure is not uniformly distributed. The contact force is given by the Hertz contact law:

    fn^e = (4/3) E* √(R*) (δn)^(3/2)    (4.9)

with

    1/E* = (1 − ν1²)/E1 + (1 − ν2²)/E2

where E1 and E2 are the Young's moduli of the two particles, ν1 and ν2 their Poisson's ratios, and R* the harmonic mean of the particle radii. In the case of cohesive granular systems, the macroscopic behaviour is dominated much more by the cohesive interactions than by the non-linear behaviour at the contact. In such cases, a linear approximation of the contact force is more relevant:

    fn^e = kn δn    (4.10)
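A sketch of the Hertzian force of equation 4.9 together with the reduced modulus E*; the numeric inputs are illustrative assumptions, not the thesis parameters:

```python
import math

def effective_modulus(e1, nu1, e2, nu2):
    """Reduced elastic modulus E* from the expression below equation 4.9."""
    return 1.0 / ((1.0 - nu1 ** 2) / e1 + (1.0 - nu2 ** 2) / e2)

def hertz_normal_force(e_star, r_star, delta_n):
    """Hertzian elastic normal force, equation 4.9."""
    return (4.0 / 3.0) * e_star * math.sqrt(r_star) * delta_n ** 1.5

# Illustrative values only: two identical particles, small overlap.
e_star = effective_modulus(1.0e6, 0.3, 1.0e6, 0.3)
f = hertz_normal_force(e_star, 0.00125, 1.0e-5)
print(f"{f:.4e} N")
```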

4.3.3 Collision Detection

In DEM, collision detection consists of two phases: neighbour searching and geometric resolution. The neighbour-searching phase identifies the list of particle pairs that might come into physical contact with a given particle, and the geometric resolution phase calculates the exact collision properties. Building a neighbour list helps to optimise the neighbour-searching process (section 2.7).

Figure 4.6: Stages of collision detection

4.4 Basics of Contact force models

The particle interaction forces play a major role in the DEM simulation of granular materials. This section introduces the physical bases that are needed to understand the interaction models.


Direct Contact Interaction

When two particles interact, the elastic deformation of their surfaces produces repulsive forces. This elastic deformation of spherical particles is described by Hertz theory (Hertz, 1882). Direct contact of two particle surfaces also produces frictional forces that resist the sliding (tangential) motion of the particles. The approximation used for the friction forces is known as Coulomb friction, given by the product of the normal force and the friction coefficient µ. The value of µ depends on whether the contact is static or sliding.

Contact-Independent Interaction

Some forces act between particles even when they are not physically in contact. Cohesion between particles arises from the van der Waals forces. For non-deformable spherical particles, the Hamaker constant defines the van der Waals forces (Hamaker, 1937). For deformable spheres using the Hertz contact model, the van der Waals forces are defined by what is known as the JKR model (section 4.5.2).

Defining Force Models

Almost all the particles in the system influence the interacting particles for the reasons discussed above. In the DEM approach, a numerical model is used to evaluate the magnitude of the forces the particles exert upon each other. These models are based on the contact geometry described in section 4.3 and divide the forces into a normal component Fn and a tangential component Ft that act through the contact point C.

4.5 Physical Models of Cohesive Contact

The general framework discussed in the previous section can be used to model several cohesive interactions, and many cohesive contact force models exist. An early model of contact adhesion was developed by Bradley; in it, the adhesion is calculated from the van der Waals interactions and the contact deformation of the surfaces is completely neglected. Since contact deformation is key for cohesive granular particles, this model was not chosen for this thesis. Instead, two popular force models are implemented:

1. Linear cohesion model (LAMMPS simulation) (Aarons et al., 2009)
2. JKR cohesion model (LIGGGHTS simulation) (Radjaï and Dubois, 2008)

4.5.1 Linear cohesion model

Just like any standard DEM approach, the particles are allowed to overlap when they collide, at which point they exert a repulsive force on each other. This repulsive force, acting in the normal direction, is given by the linear spring-dashpot normal force model:

    Fn = k δ − γ vn    (4.11)

where k is the normal spring stiffness, δ the overlap, γ the damping coefficient and vn the relative normal velocity of the colliding particles.

The inelastic behaviour of the model is characterised by the damping term. The elasticity is defined by the coefficient of restitution, which is given by

    ε = exp( −πγ / √(2mk − γ²) )    (4.12)

where m = πρd³/6 is the mass of a particle. A linear spring-slider model gives the force exerted by the colliding particles in the tangential direction:

    Ft = min(kt δs, µFn)    (4.13)

where kt is the tangential spring constant and δs the tangential displacement between the two particles. The cohesive particles interact with each other via the van der Waals forces, whose magnitude is given by

    FvdW = A d⁶ / ( 6 s² (s + 2d)² (s + d)³ )    (4.14)

where A is the Hamaker constant and s the distance between the surfaces of the two colliding particles. As the particles come into contact this force diverges. To avoid this, a minimum cut-off separation smin is used such that for s < smin the van der Waals force remains equal to the force experienced at smin. The strength of the cohesion is expressed as the ratio of the maximum van der Waals force experienced by a particle to its weight.
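The pieces of the linear model in equations 4.11–4.14 can be sketched as below; the form of equation 4.12 follows the reconstruction above, and all numeric values are illustrative assumptions:

```python
import math

def normal_force(k, delta, gamma, vn):
    """Linear spring-dashpot normal force, equation 4.11."""
    return k * delta - gamma * vn

def restitution(gamma, m, k):
    """Coefficient of restitution for the spring-dashpot model, equation 4.12."""
    return math.exp(-math.pi * gamma / math.sqrt(2.0 * m * k - gamma ** 2))

def vdw_force(a_hamaker, d, s, s_min):
    """Van der Waals attraction between equal spheres of diameter d,
    equation 4.14, with the minimum cut-off separation s_min from the text."""
    s = max(s, s_min)  # clamp: below s_min the force stays at its s_min value
    return a_hamaker * d ** 6 / (6.0 * s ** 2 * (s + 2.0 * d) ** 2 * (s + d) ** 3)

# An undamped contact (gamma = 0) is perfectly elastic:
print(restitution(0.0, 1.0, 1.0))  # 1.0
```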

4.5.2 JKR cohesion model

The JKR model, developed by Johnson, Kendall and Roberts, is based on the Hertz elastic model. Figure 4.7 is a pictorial representation of a particle of diameter d attached to a flat surface (in green). Let P be the external force applied to the particle, a the contact radius and Fad the adhesion force between the particle and the surface.


Figure 4.7: Hertz contact force model

In the Hertzian model, the normal pushback force between two particles is proportional to the area of overlap between them. By considering the contact forces between two smooth particle surfaces under the assumptions of the Hertz elastic model, the JKR model leads to the expression:

    a³ = (R*/E*) [ fn + 3πΓR* + √( 6πΓR* fn + (3πΓR*)² ) ]    (4.15)

where E* is the reduced elastic modulus, R* the reduced radius of the particles in contact, Γ the surface energy in J/m² and fn the normal force. The JKR cohesion model is more accurate for systems with large cohesion density and larger particles.
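A sketch of equation 4.15 with a hypothetical function name; setting the surface energy to zero recovers the adhesionless Hertz-like relation a³ = R* fn / E*, which gives a convenient sanity check:

```python
import math

def jkr_contact_radius(fn, e_star, r_star, surface_energy):
    """Contact radius from the JKR model, equation 4.15.

    fn: normal force; e_star: reduced modulus; r_star: reduced radius;
    surface_energy: surface energy (J/m^2).
    """
    g = 3.0 * math.pi * surface_energy * r_star
    a_cubed = (r_star / e_star) * (
        fn + g + math.sqrt(6.0 * math.pi * surface_energy * r_star * fn + g ** 2)
    )
    return a_cubed ** (1.0 / 3.0)

# Zero surface energy recovers the adhesionless limit a^3 = R* fn / E*:
print(jkr_contact_radius(8.0, 1.0, 1.0, 0.0))  # approximately 2.0
```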

4.6 Summary

This chapter introduced DEM as a suitable method for capturing snow-particle behaviour. Its constituent parts – collision detection, contact force evaluation and inter-particle force models – have each been explained and critically reviewed, and details of the physical cohesion models implemented in this thesis were presented.


Chapter 5

Implementation Details

For the purposes of this thesis, two cohesive models were considered: one supplied after discussions with Dr. Jin Sun, a granular materials expert at the Institute for Infrastructure and Environment at the University of Edinburgh, and the built-in model that comes with LIGGGHTS. The implementation details of these two models using LAMMPS/LIGGGHTS are presented in this chapter, along with some discussion of the simulation results.


5.1 Porting LAMMPS Cohesion Add-on to HECToR

The LAMMPS granular package does not have functionality to model cohesion forces between granular materials. Dr. Jin Sun had developed a cohesion potential based on the linear cohesion model, as discussed in section 4.5.1, for a different project of his, and kindly provided the cohesion add-on code for use in this project.

The first step in the implementation phase was to port the cohesion add-on code to HECToR. The add-on consists of two files, fix_cohesive.cpp and fix_cohesive.h. There were two main challenges in porting the code to HECToR. First, the code had been tested on a serial Linux-based machine but had not been ported to any massively parallel machine like HECToR. Second, Dr. Sun's code was developed for a very old version of LAMMPS (June 2007) and might not be compatible with the version currently installed on HECToR (January 2011).

The code was ported to HECToR, LAMMPS was compiled, and the binary executable was built successfully. Nevertheless, it did not execute because of compatibility issues: many of the data structures used in the June 2007 version were deprecated, and new data structures had been added in the new version. It was first considered to use the version of LAMMPS (June 2007) that Dr. Jin Sun used for his code development; however, the new version (January 2011) of LAMMPS has many new features that would be of great help for the project, so it was decided to fix the issue and make the add-on compatible with the new version. The project plan was updated accordingly (see Appendix A). The fix was not as easy as first estimated. After thorough analysis of the code, it was found that the problem had two causes, as discussed in sections 5.1.1 and 5.1.2.

Keeping in mind the budget constraint on HECToR, it was decided to use Ness for development and testing of the code changes before porting them to HECToR. There is no central installation of LAMMPS on Ness, so a local installation of LAMMPS was carried out on Ness, along with the pizza.py toolkit and Paraview, which are required for visualisation.

To port LAMMPS to Ness, a makefile called Makefile.Ness was created using the Makefile for HECToR as a starting template. Due to the configuration of Ness, it is necessary to include the absolute path to the directories containing the compiler and required libraries; refer to Appendix B for the makefile used to build the parallel version of LAMMPS on Ness. Ness has both PGI and GNU compilers. Because of the issue with the PGI compilers and the new version of LAMMPS, the GNU compiler was used on Ness as well. The Ness helpdesk team was contacted for the location of the necessary libraries (the FFTW libraries).


5.1.1 Modifying the fix_cohesive.h header file

The first problem was that the new cohesion add-on was not recognised by the new version of LAMMPS (January 2011): the way header files are implemented in the new version has changed. The fix_cohesive.h file was re-written to match the style followed in the new LAMMPS version; it should be structured as follows.

#ifdef FIX_CLASS
FixStyle(cohesive,FixCohe)
#else
……………… (class definition of FixCohe) ………………
#endif

Here ‘cohesive’ is the new fix style that is added to LAMMPS, and FixCohe is the class name defined in the fix_cohesive.h and fix_cohesive.cpp files. When LAMMPS is re-built, the new fix style (cohesive) becomes part of the executable and can be invoked with the fix command as given below.

fix cohe all cohesive 9.6e-8 0.01 4.0E-5 0.25 1

In this command, cohe is the ID given to the fix, all is the particle group, and cohesive is the new fix style. 9.6e-8 is the Hamaker constant and 0.01 the London retardation wavelength (not used for option 1); 4.0e-5 and 0.25 are the minimum and maximum separation, respectively. Because two formulas for the van der Waals force are implemented, the last parameter selects which one to use: 0 for a slightly more complicated version, while option 1, a common formula, is normally used.
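The mapping from the fix arguments to model parameters can be sketched as below. The struct and helper function are illustrative only (the real FixCohe constructor parses argv-style C strings), but the argument order matches the example command above.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Illustrative mapping of the "fix ID group cohesive A lambda smin smax option"
// arguments onto named parameters. Not part of the LAMMPS source.
struct CoheParams {
    double hamaker;  // Hamaker constant A
    double lambda;   // London retardation wavelength (ignored for option 1)
    double smin;     // minimum separation cut-off
    double smax;     // maximum separation (interaction range)
    int option;      // 0 = retarded formula, 1 = common formula
};

CoheParams parse_cohe(const std::vector<std::string>& arg) {
    // arg holds the tokens that follow the style name "cohesive"
    CoheParams p;
    p.hamaker = std::atof(arg[0].c_str());
    p.lambda  = std::atof(arg[1].c_str());
    p.smin    = std::atof(arg[2].c_str());
    p.smax    = std::atof(arg[3].c_str());
    p.option  = std::atoi(arg[4].c_str());
    return p;
}
```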

5.1.2 Modifying the fix_cohesive.cpp source file

The second issue arose because the way the neighbour list is built has changed significantly in the new version (January 2011) of LAMMPS.

fix_cohesive.cpp:120:21: error: 'class LAMMPS_NS::Neighbor' has no member named 'firstneigh'
fix_cohesive.cpp:121:23: error: 'class LAMMPS_NS::Neighbor' has no member named 'numneigh'

Investigating these error messages showed that the variables 'firstneigh' and 'numneigh', which were declared and initialised in the neighbor.h header file in the old version of LAMMPS (June 2007), are deprecated in the new version (January 2011). Fixing this second issue was a bigger challenge. LAMMPS has more than 350,000 lines of code organised into parent and child classes and virtual functions, and it was a challenge to find out how the variables are declared and used.


In the June 2007 version of LAMMPS, both the neighbour and neighbour-list objects are part of the Neighbor class. In the new version of LAMMPS, a new class called NeighList was created to do all the neighbour-list related processing. In Jin's code the firstneigh and numneigh variables belong to the Neighbor class; they were modified to be accessed through the NeighList class as given below.

neighs = neigh_list->firstneigh[i];
numneigh = neigh_list->numneigh[i];

This did not fix the issue: the code failed with a segmentation fault and core files were created. Two things were tried. Intermediate print statements were placed in the code to print the objects, pointers and other loop variables to the screen, and the compiler's debugging features were turned on so that a debugger could be used to find out where the code crashed. These two steps identified the issue: the segmentation fault occurred because the list object was empty. The two functions below were overridden in the fix_cohesive.cpp class to populate the list object.

void FixCohe::init()
{
  int irequest = neighbor->request((void *) this);
  neighbor->requests[irequest]->pair = 0;
  neighbor->requests[irequest]->fix = 1;
  if (strcmp(update->integrate_style,"respa") == 0)
    nlevels_respa = ((Respa *) update->integrate)->nlevels;
}

void FixCohe::init_list(int id, NeighList *ptr)
{
  list = ptr;
}

The way the force fields are calculated also required some changes so that the cohesive code blends well with the new LAMMPS structure. The modified code is given below.

for (ii = 0; ii < nlocal; ii++) {
  i = ilist[ii];
  if (!(mask[i] & groupbit)) continue;
  xtmp = x[i][0];
  ytmp = x[i][1];
  ztmp = x[i][2];
  radi = radius[i];
  jlist = firstneigh[i];
  jnum = numneigh[i];
  for (jj = 0; jj < jnum; jj++) {
    j = jlist[jj];
    delx = xtmp - x[j][0];
    dely = ytmp - x[j][1];
    delz = ztmp - x[j][2];
    rsq = delx*delx + dely*dely + delz*delz;
    radj = radius[j];
    radsum = radi + radj;
    if (rsq < (radsum + smax)*(radsum + smax)) {
      r = sqrt(rsq);
      del = r - radsum;
      if (del > lam*PInv)
        ccel = - ah*radsum*lam*(6.4988e-3 - 4.5316e-4*lam/del
               + 1.1326e-5*lam*lam/del/del)/del/del/del;
      else if (del > smin)
        ccel = - ah*(lam + 22.242*del)*radsum*lam/24.0
               /(lam + 11.121*del)/(lam + 11.121*del)/del/del;
      else
        ccel = - ah*(lam + 22.242*smin)*radsum*lam/24.0
               /(lam + 11.121*smin)/(lam + 11.121*smin)/smin/smin;
      rinv = 1/r;
      ccelx = delx*ccel*rinv;
      ccely = dely*ccel*rinv;
      ccelz = delz*ccel*rinv;
      f[i][0] += ccelx;
      f[i][1] += ccely;
      f[i][2] += ccelz;
      if (newton_pair || j < nlocal) {
        f[j][0] -= ccelx;
        f[j][1] -= ccely;
        f[j][2] -= ccelz;
      }
    }
  }
}

These changes fixed the compatibility issue and the code ran successfully on Ness. It was tested by executing some of the example simulations that come with LAMMPS and verifying their output for correctness. The code was then ported successfully to HECToR; no issues were encountered, and it was tested on HECToR as well for correctness before proceeding to develop the snow model.


5.2 Building the granular module within LAMMPS

LAMMPS comes with a granular module exclusively for granular DEM simulations. In the LAMMPS distribution, the granular module is distributed as an add-on, which means it is not included in the default compilation of LAMMPS and has to be built separately. In order to do that, go to the LAMMPS source sub-directory (/src) and type

make yes-granular

followed by

make hector

to compile LAMMPS with the granular package on HECToR.

5.3 LAMMPS Granular Essentials

Granular systems are composed of spherical particles of finite diameter, which means they have an angular velocity and can be made to rotate by imparting torque. As a general guideline, running a granular simulation in LAMMPS requires the following commands.

• atom_style sphere
• fix nve/sphere
• fix gravity
• compute erotate/sphere

This compute calculates rotational kinetic energy, which can be output to the dump file.

Use one of the 3 pair potentials that calculate the forces and torques between the particles:

• pair_style gran/history
• pair_style gran/no_history
• pair_style gran/hertzian

Use any of the fix options specific to granular systems:

• fix freeze
• fix pour
• fix viscous
• fix wall/gran


5.4 Determination of simulation time-step

Determining the simulation time-step is one of the key steps in DE modelling, as it determines the computational time of the simulation. If the time-step is very small, the trajectory will cover only a limited proportion of the simulation space, while a larger time-step will result in instabilities due to high-energy overlaps; such instabilities might lead to LAMMPS failure. The disturbances due to the motion of particles propagate in the form of Rayleigh waves along the surface of the solid. The simulation time-step should be short enough that the disturbance of a particle's motion propagates only to its nearest neighbours, and smaller than the critical time increment calculated from theory. The velocity and acceleration are kept constant in the calculation. The time-step is estimated from the natural frequency of a linear spring system (Raji and Favier, 2004).

Δt_c = f √( m_i / k )        (5.1)

where m_i is the particle mass, k is the effective stiffness and f is a factor.

Choosing the correct value for f is not easy, because f depends strongly on the packing configuration, the number of contacts and the properties of the particles.

Different values for the simulation time step were chosen to examine their influence on the accuracy of the results obtained. The time-step value 0.00001 is found to be the optimal value for the simulation.
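A back-of-the-envelope evaluation of equation 5.1 for a single particle can be sketched as follows; the stiffness and safety factor used in the test are illustrative assumptions, not the calibrated thesis values.

```cpp
#include <cmath>

// Critical time-step estimate dt_c = f * sqrt(m/k) (equation 5.1) for one
// spherical particle. The caller supplies the safety factor f and the
// effective contact stiffness k.
double critical_timestep(double diameter, double density, double k, double f) {
    const double pi = 3.14159265358979323846;
    double r = 0.5 * diameter;
    double m = density * (4.0 / 3.0) * pi * r * r * r;  // sphere mass
    return f * std::sqrt(m / k);
}
```

For a 5 mm particle of density 2500 kg/m3 with an assumed k = 2.0e6 N/m and f = 0.2, this gives a critical time-step of order 1e-6 s, below the time-step values used in the simulations.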

5.5 LAMMPS Simulation

5.5.1 Implementation Details

The following scheme is used in this thesis to generate the input file for LAMMPS; the input file is given in Algorithm 1. 1. The first step is to define the simulation domain. This includes the definition of the particle style, the boundary conditions, how the ghost particles are handled, and the units used in the simulation. The simulation requires the ‘granular’ atom style, which is defined in line 2 of the input script. An associated command tells LAMMPS to create a data structure used to index particles; this is specified using the atom_modify command (line 3). The atom_modify command modifies the properties of the particles and determines how the local particle index associated with a global particle ID is found (particle look-up). The array storage technique is used in this simulation because it is the fastest method for particle look-up. The keyword array (line 3) means each processor maintains a look-up table of size N (the number of particles in the simulation), requiring memory proportional to N/P, where P is the total number of processors used. As discussed in section 2.6, periodic boundary conditions are used in this thesis for bulk simulations (line 4). Inter-processor communication is turned on to exchange velocity information with the ghost particles. The ghost particles store this velocity


information since it is needed for calculating the pairwise potential (line 6). For the purpose of this simulation, all quantities specified in the input script and data file, as well as the quantities output to the log and dump files, use SI units (line 7). 2. Next, the inter-atomic potential equations and their parameters are specified. Inter-atomic potential equations describe the forces acting between particles; this simulation uses the Hertz potential (line 12). 3. The third step is particle creation. This can be done in several ways, such as reading from a data file, using the restart file from a previous simulation, or directly using the pour command. In this thesis, the particles are defined using the pour command (line 19), the most important command in the simulation. It inserts granular particles into the system every few time-steps until all the particles have been inserted. The region for insertion (the insertion volume) is specified using the region command in line 18. Inserted particles are of granular type with diameter 0.005 m and density 2500 kg/m3. At each insertion time-step, a fraction of the total particles is inserted inside the simulation region, mimicking a stream of poured particles. The inserted particles flow out of the insertion region under the influence of gravity before the next fraction of particles is inserted. More particles are inserted at any one time-step if the insertion volume specified using the vol keyword is high; however, the insertion volume cannot be more than 0.6 (on a scale of 0 to 1), since beyond this value the particles tend to overlap, which is not physical. For this simulation, the insertion volume is set to 0.5. 4. Then, to stabilise the atomic structure, energy minimisation algorithms are used. Once the system is stable and the initial boundary coordinates are assigned, an initial round of equilibration is performed before starting the simulation. A random number generator is used to set the initial velocities of the particles in the simulation domain, and the time-step and duration of the MD simulation are specified. The NVE (constant number of particles, volume and energy) integration algorithm is used to update the position, velocity and angular velocity of the particles. 5. The simulation is started using the run command (line 25). Initially the simulation is run for only 1 time-step and the thermodynamic details of the particles are written to the dump file (the LAMMPS output file) using the dump command (line 26); this is required for the proper functioning of the visualisation tool. The integration algorithm (the velocity-Verlet algorithm in this thesis) is then used to solve Newton's equations and calculate the new positions and velocities of the particles. The simulation is run for up to 5000 time-steps (line 27). 6. Once particle insertion is complete and the particles have settled in the simulation box, the cohesion potential is applied to all the particles in the system using the fix cohe command (line 31), whose first parameter is the Hamaker constant. Then chute flow is induced under the influence of gravity, with the flow occurring along the X-axis at a specified inclination of 35° (line 32). The simulation is run for another 35000 time-steps to visualise the chute effect.


7. The output file is examined to extract the final positions and velocities of the particles, together with thermodynamic statistics such as energy, pressure and temperature, which are visualised in Paraview.

Algorithm 1: Input Script for LAMMPS Simulation

 1. # Pour granular particles into container, then induce chute flow
 2. atom_style      granular
 3. atom_modify     map array
 4. boundary        p p m
 5. newton          off
 6. communicate     single vel yes
 7. units           si
 8. region          reg block 0 0.15 0 0.15 0 0.15 units box
 9. create_box      1 reg
10. neighbor        0.005 bin
11. neigh_modify    delay 0
12. pair_style      gran/hertz/history 2000000.0 NULL 50.0 NULL 0.5 1
13. pair_coeff      * *
14. timestep        0.00005
15. fix             1 all nve/sphere
16. fix             2 all gravity 9.81 vector 0.0 0.0 -1.0
17. fix             zwalls all wall/gran 2000000.0 NULL 50.0 NULL 0.5 0 zplane 0.0 &
                    0.15
18. region          slab block 0.02 0.14 0.02 0.14 0.13 0.14 units box
19. fix             ins all pour 10000 1 1 vol 0.5 100 diam 0.005 0.005 dens 2500 &
                    2500 vel 0. 0. 0. 0. -1. region slab
20. compute         1 all erotate/sphere
21. thermo_style    custom step atoms ke c_1 vol
22. thermo          1000
23. thermo_modify   lost ignore norm no
24. compute_modify  thermo_temp dynamic yes
25. run             1
26. dump            mydmp all custom 100 dump.pourCohe id type x y z ix iy iz vx vy vz &
                    fx fy fz omegax omegay omegaz radius
27. run             5000 upto
28. unfix           ins
29. dump_modify     mydmp every 10000
30. run             15000 upto
31. fix             cohe all cohesive 9.6e-8 0.01 4.0E-5 0.25 1
32. fix             3 all gravity 9.81 chute 35.0
33. dump_modify     mydmp every 100
34. run             35000 upto


5.5.2 Visualisation

The post-processing stage consists of two separate steps. The first is to extract snapshots of the simulation from the dump files created by LAMMPS. The dump file contains the energy of each particle for every specified frame, and it is tough to read this information from the dump file manually. The pizza.py toolkit provides the dump tool (dump.py) for this purpose: it reads the LAMMPS dump file and stores its contents as snapshots with 2D arrays of atom attributes, which can be accessed and manipulated as required. It is then possible to read the snapshots and convert them to VTK format. The vtk.py tool is used for this purpose; it reads the LAMMPS snapshots and converts them to the VTK format used by various visualisation packages. The VTK files are visualised using Paraview. Below is the usage of the tools:

/* Load pizza.py */
python -i /home/gran_pizza_17Aug10/src/pizza.py

/* Create snapshots. d is the dump object containing particle co-ordinates. */
d = dump("dump_filename")

/* Convert the snapshots to vtk format */
v = vtk(d)

/* Write the snapshots to imag0.vtk, imag1.vtk, etc. */
v.manyGran()

5.5.3 LAMMPS Simulation Results and Discussions

Representative snapshots of the simulation, taken at various time intervals, are shown in figures 5.1 to 5.3. In this simulation, a stream of 10,000 granular snow particles is poured from the top of a 3D box. The particles are coloured by the magnitude of their normal velocity, from slowest (blue) to fastest (red).

In the simulation, there are three qualitatively distinct regimes, determined by the height of the pour/simulation box, the angle of inclination of the lower surface with respect to the direction of gravity, and the inter-particle cohesive stresses set by the value of the Hamaker constant. When the angle of inclination is small (i.e., below the angle of repose), or for a short-ranged cohesive force (i.e., a small value of the Hamaker constant), the granular system is stationary: after the pour completes, the particles settle on the lower surface and do not flow. Below the angle of repose for a given cohesive energy, the particles dissipate more energy than the gravitational potential energy supplies, and hence do not flow; in avalanche terms this is referred to as the no-flow regime. At much larger angles, the energy dissipation of the particles is much lower than the gravitational potential energy supplied, so the particles continue to accelerate along the axis of inclination, resulting in an unstable regime. For angles intermediate between these two values, the particles flow steadily, and this is referred to as steady-state behaviour. The value of the Hamaker constant for steady flow is 9.6x10-6 and the angle of inclination is 35°.
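The three regimes can be summarised in a small classifier. The steady-flow band used below (from the angle of repose up to 15° above it) is purely an illustrative assumption; the real boundaries shift with the cohesive energy.

```cpp
#include <string>

// Toy classifier for the flow regimes described above. The width of the
// steady band (15 degrees) is an assumed, illustrative value.
std::string flow_regime(double incline_deg, double repose_deg) {
    if (incline_deg < repose_deg) return "no flow";    // dissipation exceeds PE gain
    if (incline_deg < repose_deg + 15.0) return "steady";
    return "unstable";                                 // particles keep accelerating
}
```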


Figure 5.1: LAMMPS Simulation Screenshot 1

Figure 5.2: LAMMPS Simulation Screenshot 2


Figure 5.3: LAMMPS Simulation Screenshot 3

This cohesion model is based on Hamaker theory, which produces a force singularity that must be avoided when the surfaces of particles overlap. The particles do stick, because they are cohesive, but they do not bond, sinter or join crystallographically as a grain boundary in the way snow particles do. For this reason, and after viewing the output and having discussions with Dr. Jane Blackford, it was decided to consider the cohesive model provided with LIGGGHTS.

5.6 LIGGGHTS Simulation

The LIGGGHTS simulation is based on the JKR cohesion model (section 4.5.2), which predicts greater cohesion between particles. The material parameters discussed in section 3.4 are used in the LIGGGHTS simulation to determine the pairwise potential of the particles; the actual values used are given in section 5.6.1. Since it is possible to import complex CAD geometry as granular walls in LIGGGHTS, a simple chute geometry is used in the simulation. Particles are poured from the top of the chute and slide down the inclined surface. Details are given in sections 5.6.2 and 5.6.3.

5.6.1 Material parameter values

Table 5.1 summarises the values of the material properties used in the simulation. All the values used are for a constant temperature (5 °C).


Parameter                   Unit      Value      Reference
Shape                       No unit   Spherical
Diameter                    mm        5
Density                     kg/m3     500        Armstrong et al., 2009
Coefficient of Restitution  No unit   0.89       Higa et al., 1995
Coefficient of Friction     No unit   0.1        Schaerer, n.d.
Young's Modulus             Pascal    5e6        Godbout et al., 2000
Poisson's Ratio             No unit   0.32       Sinha, 1987
Gravity Acceleration        m/s2      9.81

Table 5.1: Material Parameters

5.6.2 LIGGGHTS Implementation Details

The input file used for LIGGGHTS is given in Algorithm 2. Since LIGGGHTS is based on LAMMPS, it follows the same implementation style. The key implementation features are given below. 1. Just like the LAMMPS simulation, the first step is to define the simulation domain with granular particles (lines 1 to 3). In this simulation, all three dimensions are defined as non-periodic, so that particles do not interact across the boundary and the position of each face is fixed. Inter-processor communication is turned on to exchange velocity information with the ghost particles; the ghost particles store this velocity information since it is needed for calculating the pairwise potential. This is implemented in line 5 of the algorithm. 2. The material properties required to calculate the stiffness and damping coefficients and other pair potentials are set using the fix style property/global in lines 11 to 16. This fix style is not available in LAMMPS; the command fixes global properties to be accessed by other fix or pair styles. The variables Young's modulus, Poisson's ratio etc. used in the fix are standard C++ variables. 3. The pair potential is defined using the pair_style command (line 17). This simulation uses Hertzian potential forces for the interaction between particles. The mesh/gran fix style (line 22) allows complex wall geometry for granular simulations to be imported from CAD by means of ASCII STL files or legacy ASCII VTK files. It is possible to apply an offset or scaling to the geometry; for this simulation, the geometry is scaled by a factor of 1.0. 4. The imported geometry is used to bound the simulation domain of the granular system with a frictional wall. All particles in the group interact with the wall when they are close enough to touch it. 
The equation for the force between the wall and particles touching it is the same as the Hertzian potential defined using the pair_style command.


5. The particles are poured into the simulation domain using the pour command. This command is the same as in the LAMMPS simulation, with some improvements: particles are now generated such that they are completely located within the insertion region, so it is possible to use the whole simulation box as the insertion region.

Algorithm 2: Input Script for LIGGGHTS Simulation

 1. atom_style      granular
 2. atom_modify     map array
 3. boundary        f f f
 4. newton          off
 5. communicate     single vel yes
 6. units           si
 7. region          reg block 0.00 1.17 -0.3 0.99 -0.1 1.12 units box
 8. create_box      1 reg
 9. neighbor        0.0028 bin
10. neigh_modify    delay 0
11. fix             m1 all property/global youngsModulus peratomtype 5.e6
12. fix             m2 all property/global poissonsRatio peratomtype 0.325
13. fix             m3 all property/global coefficientRestitution peratomtypepair &
                    0.89
14. fix             m4 all property/global coefficientFriction peratomtypepair 1 0.1
15. fix             m5 all property/global characteristicVelocity scalar 100.
16. fix             m6 all property/global cohesionEnergyDensity peratomtypepair &
                    1 120000
17. pair_style      gran/hertz/history 1 1  # Hertzian with cohesion
18. pair_coeff      * *
20. timestep        0.00005
21. fix             gravi all gravity 9.81 vector 0.0 0.0 -1.0
22. fix             cad all mesh/gran mytinymesh.stl 1 1.0 0. 0. 0. 0. 0. 0.
23. fix             granwalls all wall/gran/hertz/history 1 1 mesh/gran 1 cad
24. group           nve_group region reg
25. region          bc cylinder z 0.27 0.21 0.06 0.56 0.58 units box
26. fix             ins nve_group pour 500000 1 29494 region bc diam uniform &
                    0.005 0.005 dens uniform 500 500 region bc
27. fix             integr nve_group nve/sphere
28. compute         1 all erotate/sphere
29. thermo_style    custom step atoms ke c_1 vol
30. thermo          4000
31. thermo_modify   lost ignore norm no
32. compute_modify  thermo_temp dynamic yes
33. dump            dumpstl all stl 100 dump.stl
34. run             1
35. dump            dmp all custom 5000 dump.chute id type type x y z ix iy iz vx vy &
                    vz fx fy fz omegax omegay omegaz radius
36. undump          dumpstl
37. run             100000 upto
38. unfix           ins


5.6.3 LIGGGHTS Simulation Results

Figures 5.4 to 5.6 show snapshots of the first LIGGGHTS simulation, which uses the JKR cohesion model with Hertzian potentials. As seen in the figures, the chute geometry is used as the granular wall. The snow particles are poured from a certain height and flow down the chute until they reach the end of the granular wall. Since the simulation uses non-periodic boundary conditions, once the particles reach the end of the chute they disappear and do not re-enter the domain.

In a realistic scenario, granular snow avalanches start to flow from a point and gather mass progressively in a fan-like shape as they flow down the slope. This effect was attempted in this simulation but did not succeed, as LIGGGHTS does not have a facility to model such a feature. It was first thought to code a new fix style for LIGGGHTS to simulate this kind of behaviour, but this was not carried further due to time constraints. It is, however, possible to use the fix adapt command to change the physical properties of the particles: the diameter of the particles is increased at a certain rate after every certain number of time-steps to create an effect of particle (mass) accumulation, though it is not possible to achieve the fan-like shape of the avalanche flow this way.

Figure 5.4: LIGGGHTS Simulation Screenshot 1


Figure 5.5: LIGGGHTS Simulation Screenshot 2

Figure 5.6: LIGGGHTS Simulation Screenshot 3


5.6.4 Improved Chute Geometry

Most avalanches do not happen on a regular inclined surface: there is an initial inclined path where the avalanche is triggered, followed by a run-out path, so it was decided to modify the chute geometry accordingly. The cross section of the new chute geometry used in the simulation is shown in figure 5.7. The chute is divided into three parts: the upper inclined zone, the circular transition zone and the horizontal run-out zone. The inclination angle θ of the upper inclined zone is fixed at 37°. According to Perla (1999), most avalanches run on slopes between 25° and 45°, with 37° the optimal slope angle for an avalanche; hence the chute inclination angle was fixed at 37°. The dimensions of the chute are summarised in Table 5.2 (Chiou, 2005).

Chute Detail                 Value
Upper inclined part, l1      936 mm
Transition part, l2          144 mm
Horizontal run-out part, l3  835 mm
Chute width                  100 mm
Inclination angle, θ         37 degrees

Table 5.2: Chute Specifications

AutoCAD was used to draw the chute, first as a solid geometry. It was not possible to use the CAD geometry directly in LIGGGHTS as a granular wall, so the solid object was converted into a mesh using the gmsh conversion tool: the AutoCAD geometry was exported as an ASCII stereolithography (STL) file and imported into gmsh to convert the geometry into a mesh. It is then used in the simulation as a granular wall using the fix command. Refer to section 5.8 for more details on the implementation, and to Appendix C for more details on the AutoCAD work.

Figure 5.7: Cross section of the chute


5.6.5 Improved Simulation Results

Figures 5.8 to 5.10 show snapshots of the improved LIGGGHTS simulation. The new chute geometry simulates the flow of an avalanche in a more realistic scenario from the terrain perspective. The avalanche terrain is divided into three parts: the acceleration zone, the steady-flow zone and the deceleration zone. The upper inclined part (l1) of the chute corresponds to the acceleration zone, the transition part (l2) to the steady-flow zone (which is kept very small for computational efficiency), and the horizontal run-out part (l3) to the deceleration zone of an avalanche path. From a computational perspective, this improved LIGGGHTS simulation requires more time-steps to complete (i.e. takes more time) because of the new chute geometry.

Figure 5.8: Improved LIGGGHTS Simulation Screenshot 1


Figure 5.9: Improved LIGGGHTS Simulation Screenshot 2

Figure 5.10: Improved LIGGGHTS Simulation Screenshot 3


5.7 Summary

The implementation details of the cohesion models were presented in this chapter, along with the results of the simulations. From the results it is clear that the Hamaker-theory-based cohesion model is not suitable for modelling snow particles, as it fails to capture their sintering effect; the JKR-based cohesion model is more suitable for the simulation.


Chapter 6

Benchmarking

This chapter presents benchmarks of the implementation, beginning with the most interesting property: speed-up. Afterwards, the performance is investigated for larger numbers of processors, where communication overheads start to increase. The speed-up and scaling are also shown for the different parts of the program. All benchmarks are carried out with the code based on the neighbour-list approach.


6.1 Cost of Accessing HECToR

It was decided not to use Ness to study the performance of the code, for two reasons. First, Ness does not guarantee exclusive access to its nodes, so timings may vary due to factors like system noise and the resource usage of other jobs running on the same node at the same time. Second, with only 32 nodes it is not well suited to scalability tests. HECToR is a massively parallel machine with very fast processors, and so it was used for the performance and scalability tests. Due to the limited time and budget on HECToR (the MSc students' resource group), it was decided to limit the tests to 10 nodes and 1 million particles.

On HECToR, 1 core hour is charged as 7.5 AUs (hector.ac.uk, 2011). The total number of AUs consumed is calculated as:

Total AUs = Number of Processors * Run time in hours * 7.5 (6.1)

A simulation with 1 million particles running on 2400 processors for, say, 1 hour will consume 2400 * 1 * 7.5 = 18,000 AUs. This number increases with the number of processors and the number of particles.
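The AU cost model in equation (6.1) can be sketched as a small helper; the function name is illustrative, and only the 7.5 AU per core-hour rate quoted above is taken from the text.

```python
def total_aus(num_processors, run_time_hours, aus_per_core_hour=7.5):
    """Equation (6.1): allocation units consumed by a HECToR job."""
    return num_processors * run_time_hours * aus_per_core_hour

# The worked example above: 1 hour on 2400 processors
print(total_aus(2400, 1))
```

The same helper can be used to budget a run before submitting it, e.g. to check that a planned job fits inside the 30,000 AU allocation.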

The students' resource group on HECToR has a limited time allocation and is shared by all the other students on the MSc, so a new resource group was created on HECToR exclusively for this project, with 30,000 AUs moved from the student project reserve. All the tests carried out for this project used this new resource group.

6.2 Performance Benchmarks

In this section, the performance of the model is measured. Figure 6.1 shows the performance of the simulation using LIGGGHTS measured on HECToR. Results are shown for simulations ranging from 75,000 to 1,000,000 particles. The performance is measured in steps per second, which allows the different simulation sizes to be compared in the same plot. The timings are taken from the 'loop time' reported by the code, which is the total time spent in the main MD loop. To reduce the start-up cost and the cost of any instability, two runs of each system were performed and the average values recorded. As expected, as the number of particles increases the number of steps computed per second decreases; this is due to the increase in work per time-step with problem size (there is no I/O during the loop).


Figure 6.1: Performance of the model. System size ranges from 75,000 to 1,000,000 particles

Figure 6.2: Benchmark of different system size on 480 and 960 processors


From Figure 6.2 it is clear that the run time decreases as the number of processors increases. This means that for larger numbers of particles, the simulation performs better when run on a large number of processors.

Figures 6.3 and 6.4 show the execution time and speed-up of the model respectively. Since each HECToR node has 24 cores, it was decided to calculate the speed-ups relative to the 24-processor loop times. From figure 6.3, it is clear that the execution time continues to fall up to 2400 processors on HECToR. From figure 6.4, it is evident that the system scales almost linearly with the number of processors, and that the speed-up of the simulation increases with the number of particles.
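As a sketch of how these speed-ups are computed, the following divides the 24-processor baseline loop time by the loop time at each processor count; the timing values below are illustrative placeholders, not the measured results.

```python
# Speed-ups are quoted relative to the one-node (24-processor) loop time.
baseline_procs = 24
loop_times = {24: 2000.0, 480: 1005.0, 960: 795.0, 2400: 520.0}  # seconds, made up

baseline = loop_times[baseline_procs]
speedups = {p: baseline / t for p, t in loop_times.items()}
for p in sorted(speedups):
    print(f"{p:5d} processors: speed-up {speedups[p]:.2f}")
```

The same calculation applied to the real loop times produces the curves in figure 6.4.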

Figure 6.3: Execution time of the 75,000 and 1,000,000 particle simulations on different numbers of processors


Figure 6.4: Speed-up of the 75,000 and 1,000,000 particle simulations on different numbers of processors

6.3 Performance per time-step

The amount of time taken per time-step over the entire simulation was studied. Previous studies have shown that the timings vary over the first hundred time-steps. The time taken per time-step was analysed for all the benchmarking runs discussed in the previous section. Figure 6.5 shows the results for the 40-node, 75,000-particle benchmark run over 500 time-steps on HECToR.

Figure 6.5: Comparison of time taken per time-step


The results show that the performance is almost constant over the first 100 time-steps, with very small deviations around time-steps 250 and 500. The reason for these minute deviations is that these are the points at which a small amount of data is written to the output file.

6.4 Performance Comparison - Cohesion and Non-Cohesion

It is interesting to compare the run time of the simulation with and without cohesion to understand the impact of cohesion on the run time. The model with cohesion between particles takes 15% longer to execute than the model without cohesion for the same system size. Due to the cohesion forces, particles interact with a larger number of neighbouring particles, and hence the simulation takes more time to compute the inter-particle forces than the system without cohesion.

Figure 6.6: Comparison of the model run time with and without cohesion for a system of 10,000 particles run on 24 processors on HECToR

6.5 Summary

A number of timing results have been presented and analysed. Considering all the results, the simulation gives good performance and good scaling. The scalability of the code can be improved further by increasing the problem size. Each result is the average of only two runs, but the results are fully reproducible on HECToR.


Chapter 7

Profiling and Performance Analysis

Profiling helps to identify which parts of a program take most of the execution time, and so explains the performance bottlenecks of the code on a particular system. The traditional way of conducting performance analysis is program counter sampling: interrupting the program at regular intervals and recording the currently executing instruction. From this it is possible to compute the relative amount of time spent in each procedure. A number of performance analysis tools are available to help programmers optimise their applications, ranging from source code profilers to sophisticated tracers for analysing communication and the memory system, or a combination of the two. The purpose of these tools is to help developers identify whether or not an application is running efficiently on the computing resources available. In this chapter the performance of the simulation is analysed in terms of MPI functions and user-defined functions.


7.1 Description about profiling tools available

Several performance analysis tools are installed on HECToR, including Totalview, CrayPAT, TAU and Scalasca. For this thesis, the Cray Performance Analysis Tools (CrayPAT) were used to analyse the performance of the programs. CrayPAT provides an integrated infrastructure for a variety of profiling experiments, including analysis of computation, communication, I/O and memory utilisation, and hardware counter analysis. It supports all programming models, and is currently centrally installed and working correctly on HECToR. CrayPAT typically involves a five-phase cycle:

1. Program instrumentation
2. Data measurement
3. Analysis of the performance data
4. Presentation of the captured data
5. Optimisation of the program

CrayPAT consists of two components: the CrayPAT performance collector and the Cray Apprentice performance analyser. CrayPAT is the data capture tool used to prepare the program for a performance analysis experiment. Cray Apprentice is a post-processing data visualisation tool used to further explore and study the captured data.

Here is an overview of how to use CrayPAT with APA on HECToR.

1. On HECToR, all the performance analysis tools are merged into one module called perftools. The first step is to load it using the module load perftools command.

2. Compile and link the application.

3. The pat_build -Oapa [exe_name] command is used to instrument the program. This inserts the instructions needed for analysis at various points in the program, and creates the instrumented executable exe_name+pat.

4. The instrumented executable is executed as normal using the aprun command. This generates a series of data files of the form exe_name+pat+PID.xf.

5. The pat_report command is used to generate the .apa file from the experimental data file generated in step 4.

6. Build the .apa file using pat_build to generate a new instrumented executable for the tracing experiment.

7. Run the new executable using the aprun command.

8. Generate the new report using the pat_report command.


7.2 Profiling using CrayPAT

Using CrayPAT, statistics for three function groups, namely MPI functions, USER functions and MPI_SYNC functions, were obtained. MPI_SYNC is used in the trace wrapper for each collective subroutine to measure the time spent waiting at the barrier call before entering the subroutine; the MPI_SYNC statistics can therefore be a good indication of load imbalance. The time percentage of each group is shown in figure 7.1.

Figure 7.1: Profile by function group

As the processor count increases from 960 to 2400, the time spent in MPI calls increases from 28.7% to 33.1%, while the time spent in user functions drops from 45.5% to 24.9%. The time spent in MPI_SYNC increases from 25.7% to 42.0%.

7.2.1 Profiling - User functions

Figure 7.2 shows the top time-consuming user functions. According to the CrayPAT tracing results, the speed-up of the bin_atoms function is about 3.5% on 2400 processors compared with 960 processors.


Figure 7.2: Top time consuming user functions obtained from CrayPAT

7.2.2 Profiling - Percentage Time of MPI Calls

The most time consuming of the MPI calls is MPI_Allreduce. This MPI collective operation does not scale well, which is expected behaviour.

Figure 7.3: Top time consuming MPI functions


However, on the XE6 the scaling is relatively good from 1920 to 2400 processors in figure 7.3. From the call graph generated by CrayPAT, it is clear that the MPI_Allreduce bottleneck is in the pre_force function of the FixTriNeighlist class.

Figure 7.4: Top time consuming MPI_SYNC functions

From figure 7.4, it is clear that MPI_Allreduce accounts for most of the waiting time spent in the barrier. It is worth investigating the possibility of combining several of the MPI_Allreduce calls. Compared with the runs on 960 and 1920 processors, the MPI_SCAN and MPI_Bcast calls become more significant on 2400 processors.

7.2.3 Profiling – Messages/Sizes

One advantage of LAMMPS is that most of its messages are small or medium sized; mostly messages of size 64B and 64KB are used. From figure 7.5, it is clear that the number of messages increases dramatically with the number of nodes.


Figure 7.5: Profile by message sizes

7.2.4 Profiling – Memory Usage

LAMMPS prints an estimate of its memory use, but this is a lower limit since it only accounts for large memory allocations. Currently LAMMPS does not output more accurate information on memory usage; this feature would have to be added to LAMMPS. It may not be easy to add, since LAMMPS may reserve memory (via malloc()) but not actually use it, depending on the specific selection of features: this affects the address space but not the physical memory used. On the other hand, allocated and used address space may be paged out to swap space. If LAMMPS is run in parallel over an RDMA architecture (e.g. InfiniBand or Myrinet), things get even more complicated, since it may have "pinned" memory backing device memory that can be accessed by multiple processes at the same time, alongside regular shared memory. To complicate matters further, some MPI implementations use "lazy" memory allocation that keeps allocated memory blocks around for later reuse, since messages of the same size are often sent multiple times; these blocks could be freed if there is no more free address space, or based on some heuristic.

7.3 Timing output directly from the code and its description

The LIGGGHTS code outputs timings from its main functions. Using these output timings, a stacked bar chart showing how the time spent in each section of the code varies with processor count was plotted. Figure 7.6 shows such a chart for the 1000K-particle simulation. The time spent in each section of the code is given as a percentage of the total loop time. Table 7.1 summarises the timings reported by LIGGGHTS and their description.


Name – Description
Pair – Time taken to compute the pairwise interactions between the atoms
Neigh – Time taken to compute new neighbour lists for each atom
Comm – Time spent in communications
Outpt – Time taken to output the restart position, atom position, velocity and force files
Main – Time taken for the main MD loop to execute minus the sum of the above times

Table 7.1: LIGGGHTS timing output
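As an illustration of how the stacked chart is derived from these timings, the sketch below converts a set of section times into percentages of the loop time, computing Main as the remainder as defined in Table 7.1; the numbers are placeholders, not values reported by LIGGGHTS.

```python
# Illustrative section times (seconds) and total loop time, made up for the sketch.
section_times = {"Pair": 310.0, "Neigh": 95.0, "Comm": 60.0, "Outpt": 25.0}
loop_time = 520.0

# 'Main' is defined as the loop time minus the sum of the other sections
section_times["Main"] = loop_time - sum(section_times.values())

percentages = {name: 100.0 * t / loop_time for name, t in section_times.items()}
for name, pct in percentages.items():
    print(f"{name:6s} {pct:5.1f}%")
```

Repeating this at each processor count gives one stacked bar per count, as in the chart.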

From figure 7.6 it is clear that as the processor count increases, the percentage of time spent in Outpt and Comm begins to dominate. These phases involve MPI_Irecv/MPI_Send and MPI_Allreduce, so this is not unexpected behaviour. There is a significant decrease in the Pair and Neigh timings; this is expected because, as the processor count increases, the number of particles each processor handles decreases, and hence these timings reduce.

Figure 7.6: LIGGGHTS timing output


7.4 Summary

Among the MPI functions, MPI_Send, MPI_Allreduce and MPI_Waitany are the most heavily used calls. The majority of the communication time is spent in MPI_Send and MPI_Allreduce, which is the reason for the increase in communication time at higher processor counts. Mostly small messages are used in the communications.


Chapter 8

Conclusions and Future Work

8.1.1 Summary

The main aim of the project was to demonstrate, via a 3D model, the flow of a very large number of snow particles under the influence of gravity, allowing for in-elastic collisions and cohesion, using high performance computing, and to analyse the scalability and performance of the model. Understanding how particles behave under collision and cohesion will help in modelling granular snow avalanches. From an industrial point of view, it will help tire manufacturers understand how snow interacts with tires, and in snow studies the model can be used to study the effects of a micro-penetrometer in snow.

Choosing the appropriate methodology to model the simulation was a big challenge. Existing literature on snow particle modelling and snow avalanche modelling was studied. From this study it was understood that the traditional continuum approach can only represent homogeneous materials effectively. It was clear that the discrete approach, based on an explicit finite difference scheme, is best suited for predicting the motion of individual, independently moving particles, and hence it was decided to implement the model using DEM.

It was also necessary to study the background of the MD-DEM approach and the mechanics of snow particles. Understanding the physics and mathematics behind the DEM approach helped in choosing the most appropriate techniques and algorithms for the model in terms of efficiency and accuracy, for example the velocity-Verlet integration and the PBC approach, and will also greatly help in further optimising the model. The physical constants governing the flow of snow particles, such as density, Young's modulus, friction and the coefficient of restitution, were investigated carefully. These properties greatly affect the flow of particles both in reality and in the simulation, so the choice of these parameters and of the time-step was considered carefully.

Typically, DEM simulations involve a large number of particles, require a very long time to execute, and are thus computationally expensive. The number of particles in an avalanche is also huge. The challenge was how to accommodate the size of the system while meeting the need for speedy results that are computationally affordable to manage and run. The use of high performance computing was the obvious choice, and hence all the simulation runs for the model were done on HECToR. LAMMPS/LIGGGHTS was chosen for the model for two reasons: it can effectively model granular particles, and it has a good MPI coupling interface for parallel implementation.

Benchmarking the model helped in understanding its scalability. It is very important that the model scales well in order to handle large system sizes, and it was found that it does. With an increase in the number of processors, the speed-up achieved is quite good: for example, for 75K particles the speed-up was 1.99 on 480 processors and 2.52 on 960 processors.

Due to the budget constraint on HECToR, the particle count for this thesis was limited to 1 million. It is interesting to extrapolate the scaled size benchmark shown in figure 6.2 to a very large number of particles. Least squares regression analysis was carried out to predict the run time for more than 1 million particles: the least squares technique was applied to the data to arrive at a linear regression model, from which it was easy to extrapolate to very large numbers of particles. Figure 8.1 shows the plot of the regression analysis; the X-axis represents the number of particles (in ten-thousands) and the Y-axis the run time. The actual data was extended to 10^8 particles. The analysis shows that a system with 10^8 particles run on 960 processors will take approximately 4.4 hours to complete, which is equivalent to 31,700 AUs on HECToR. On 480 processors, it will take approximately 5.5 hours and require 20,000 AUs.
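The extrapolation step can be sketched as an ordinary least squares fit of run time against particle count; the data points below are illustrative placeholders, not the measured benchmark values.

```python
# Fit run time as a linear function of particle count, then extrapolate.
particles = [75_000, 150_000, 250_000, 500_000, 1_000_000]
runtimes = [150.0, 300.0, 500.0, 1000.0, 2000.0]  # seconds, made up

n = len(particles)
mean_x = sum(particles) / n
mean_y = sum(runtimes) / n

# Ordinary least squares slope and intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(particles, runtimes)) \
        / sum((x - mean_x) ** 2 for x in particles)
intercept = mean_y - slope * mean_x

# Extrapolate well beyond the measured range, to 10^8 particles
predicted = slope * 10**8 + intercept
print(f"predicted run time for 1e8 particles: {predicted:.0f} s (placeholder data)")
```

The predicted run time, combined with equation (6.1), then yields the AU cost estimates quoted above.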


Figure 8.1: Linear Regression Analysis of Scaled size benchmark

To understand the performance of the model and to identify areas for optimisation, the model was profiled. Vampir was first considered, but technical difficulties in using Vampir on HECToR prevented this, so CrayPAT was used to profile the model. The top time-consuming MPI functions and user functions were identified.

8.1.2 Recommendations for future Research

HPC Perspective – Suggestions for Optimisation

The amount of time spent in MPI calls, particularly in point-to-point communications and collective operations, increases with the number of processors. As the processors need to communicate with each other at every time step, there is little scope to reduce the bottleneck due to point-to-point communications, but there is scope to reduce the bottleneck due to collective operations.

For the user functions, the time taken by the main computational loop increases with the number of processors. Currently the main class methods of the LAMMPS/LIGGGHTS code are not vectorised. It might be worth converting arrays to vectors to see how this improves the performance of the simulation, although the irregular memory access pattern of LAMMPS might hinder vectorisation.


Though there is a bottleneck in the time spent on point-to-point communications, in reality the actual time spent on this seems reasonable. The bottleneck could arise because, as the number of processors increases, the workload of each process is reduced to the point where the amount of computation done by each processor is too small relative to the communication with its neighbours. Investigating load balancing and domain decomposition techniques could help improve this situation. Currently it is not possible to calculate the memory requirements of each processor for the simulation; being able to do so would help in understanding the load balance of the simulation.

Simulation Perspective – Improving the model

Most granular snow avalanches originate at a point, growing wider as they flow down the slope, sweeping up more and more snow particles in their descent. This kind of feature is not currently possible to simulate in LAMMPS/LIGGGHTS. It would be worthwhile to extend the code to make the model closer and more realistic to a real snow avalanche.

In the current model, the density of the particles is fixed. In reality the density of snow particles varies due to several environmental factors such as heat. An attempt was made to vary the density in the simulation, but it resulted in irregular patterns. It would be an interesting study to analyse the effect of the material parameter values on the simulation.

Changes made to the original work plan and the risk assessment details are presented in Appendix A.


References

Ancey, Christophe, 2002. Snow Avalanches, Geomorpholigical Fluid Mechanics.

Beazley, D.M. and Lomdahl, P.S., 1994. "Message-Passing Multi-Cell Molecular Dynamics on the Connection Machine 5", Parallel Computing, 20 (2), pp. 173-195.

Campbell, C.S., P. Cleary, and M.A. Hopkins (1995). Long-runout landslides: a study by computer simulation, J. Geophysical Research, 100, B5, 8267-8283

Cray Inc; 2011; “CRAY XE 6”; http://www.cray.com/Products/XE/CrayXE6System.aspx; accessed on 11th August 2011

Chiou, 2005. M.C.: Modelling dry granular avalanches past different obstructs: numerical simulations and laboratory analyses. Ph. D. Technical University Darmstadt, Germany.

D. C. Rapaport, 2004. The Art of Molecular Dynamics Simulation. Cambridge University Press.

Dziugys, A. and Peters, B. 2001, “An approach to simulate the motion of spherical and non- spherical fuel particles in combustion chambers”, Granular Matter, 3, pp.231–265.

EPCC; 2011; “Ness”; http://www.epcc.ed.ac.uk/facilities/ness/; accessed on 4th August 2011

Farhang Radjaï and Frédéric Dubois. Discrete-element Modelling of Granular Materials. Mechanics and Civil Engineering Laboratory (LMGC), University of Montpellier 2, France ISBN: 9781848212602

Feng, J.Q. 2000, “Contact behavior of spherical elastic particles: a computational study of particle adhesion and deformations”, Colloids and Surfaces A: Physicochemical and Engineering Aspects, 172(1-3), pp.175–198.


Fierz, C., Armstrong, R.L., Durand, Y., Etchevers, P., Greene, E., McClung, D.M., Nishimura, K., Satyawali, P.K. and Sokratov, S.A., 2009. The International Classification for Seasonal Snow on the Ground. IHP-VII Technical Documents in Hydrology N°83, IACS Contribution N°1, UNESCO-IHP, Paris.

"Fredston and Fresla cite research by Perla (1977) stating that the most frequent angle of slope on which avalanches occur is 37 degrees." Freston, J and Fesler, D., 1999. Snow Sense: Alaska Mountain Saftey CEnter Perla, R , Slab avalanche measurements, Canadian Geotechnical Journal, 1977, 14:(2) 206-213, 10.1139/t77-021 in Freston and Fesler (1999)

F. J. L. Reid and L. A. Smith. Performance and profiling of the LAMMPS code on HPCx, Technical report, HPCx Consortium, May 2005.

Godbout, S, Chenard, L. And Marquis, A., 2000, “Instantaneous Young’s modulus of ice from liquid manure”, Canadian Agricultural Engineering, 42 (2), pp. 6.1-6.14

Hamaker, H.C., 1937. The London–van der Waals attraction between spherical particles. Physica, 4(10), pp.1058–1072.

H. Hertz, 1882. Uber die beruhrung fester elastischer korper (On the contact of elastic solids). J. Reine Angew. Math., 92, pp.156–171.

Harry F. Jordan and Gita Alaghband, 2003. Fundamentals of Parallel Processing, Chapter 2.7.2 A Simple Performance Model - Amdahls’s Law. Pearson Education, Inc.

Hector UK; 2010; “Architecture details”; http://www.hector.ac.uk/cse/documentation/Phase2b/#arch; accessed on 15th August 2011

Hector UK; 2011; “Cost of Access to hector”; http://www.hector.ac.uk/howcan/admin/costs/index.php; accessed on 5th August 2011

Hector UK; 2011; “Hector Hardware”; http://www.hector.ac.uk/support/documentation/userguide/hardware.php; accessed on 8th August 2011

Higa, M. Arakawa, M. and N. Maeno, 1995. Measurements of restitution coefficients of ice at low temperatures, Institute of Low Temperature Science, Hokkaido University, Kita-ku Kita-Nishi-8, Sapporo 060, Japan, Received 5 May 1995; revised 6 October 1995; accepted 6 October 1995

Hogue, C. and Newland, D. 1994. Efficient computer simulation of moving granular particles. Powder Technology, 78(1), pp.51–66

Jenkins, J.T. and Savage, S.B., 1983. A theory for rapid flow of identical, smooth, mearly elastic spherical particles. J. Fluid Mech. 130, pp.187-202


Jensen, R., Edil, T., Bosscher, P., Plesha, M. and Kahla, N. 2001, “Effect of particle shape on interface behaviour of DEM-simulated granular materials”, International Journal of Geomechanics, 1(1), pp.1–19

Lee R. Aarons, Jin Sun, and Sankaran Sundaresan, 2009. Unsteady Shear of Dense Assemblies of Cohesive Granular Materials under Constant Volume Conditions. Department of Chemical Engineering. Princeton University, Princeton, New Jersey 08544

Liggghts.com, 2011, “LIGGGHTS Open Source Discrete Element Method Particle Simulation Code”, http://www.liggghts.com/, accessed on 7th August 2011

P.A. Cundall and O.D.L. Strack, 1979. A discrete numerical model for granular assemblies. Geotechnique, 29(1), pp.47–65.

Paraview.org, 2011;”Paraview”, http://www.paraview.org/, accessed on 9th August 2011

Peter A. Schaerer, n.d, Friction coefficients and speed of flowing avalanches.

P.W. Cleary and M.L. Sawley, 2002. DEM modelling of industrial granular flows: 3D case studies and the effect of particle shape on hopper discharge. Applied Mathematical Modelling, 26, pp.89–111

Raji A.O., Favier J.F., 2004. Model for the deformation in agricultural and food particulate materials under bulk compressive loading using discrete element method. i: Theory, model development and validation, 64, pp. 359-371

Sandia.gov, “LAMMPS Molecular Dynamics Simulator”, http://lammps.sandia.gov/, accessed on 4th August 2011

Sandia.gov, “Pizza.py Toolkit”; http://www.sandia.gov/~sjplimp/pizza.html; accessed on 8th August 2011

Sinha, N.K, 1987, “Effective Poisson's Ratio of Isotropic Ice”, Reprinted from Proceedings of the Sixth International Offshore Mechanics and Arctic Engineering Symposium Houston, TX. March 1-5,1987 Vol. IV, p. 189-195 (IRC Paper No. 1472)

S. J. Plimpton, 1995. "Fast Parallel Algorithms for Short-Range Molecular Dynamics." Journal of Comp Phys, pp. 1-19

Subramani, V. S. 2008. Potential Applications of Nanotechnology for improved performance of Cement based materials. M.S Thesis, University of Arkansas

Wikipedia, 2010; “File:Graupel, Westwood”, http://en.wikipedia.org/wiki/File:Graupel,_Westwood,_MA_2010-02-02.jpg, accessed on 12th August 2011


Appendix A Project Management

A.1 Work Plan

Overall the project was on schedule without any major delays. The anticipated work plan was submitted along with the project preparation course report. There were some slight deviations from it, mainly in the implementation phase. Initially it was thought that the cohesion add-on provided for the LAMMPS code would suffice for modelling the cohesive behaviour of snow particles, but only after implementation was it found to be unsuitable. The LIGGGHTS cohesion model was therefore considered. The installation of LIGGGHTS and the implementation using it were not in the initial plan, so the work plan was modified to extend the implementation phase. Details are summarised in table A.1 below.

Task (Planned Start, Planned End, Actual Start, Actual End)

Phase 1 – Background Research (26/01/11, 28/02/11, 26/01/11, 28/02/11)
Background reading (26/01/11, 02/04/11, 26/01/11, 02/04/11)
Literature Review (07/02/11, 28/02/11, 07/02/11, 28/02/11)
Phase 2 – Experimental Setup (01/03/11, 11/03/11, 01/03/11, 30/06/11)
Installing LAMMPS/LIGGGHTS (01/03/11, 03/03/11, 01/03/11, 30/06/11)
Granular Simulation (03/04/11, 03/11/11, 03/04/11, 03/11/11)
Phase 3 – Project Presentation (03/14/11, 03/22/11, 03/14/11, 03/22/11)
Phase 4 – Design (01/06/11, 15/06/11, 01/06/11, 18/06/11)
Construct the DEM Model (01/06/11, 15/06/11, 01/06/11, 18/06/11)
Phase 5 – Implementation (16/06/11, 09/08/11, 16/06/11, 13/08/11)
Implementation of the Model (16/06/11, 18/07/11, 16/06/11, 18/07/11)
Profiling & Analysis (18/07/11, 27/07/11, 25/07/11, 03/08/11)
Benchmarking (28/07/11, 09/08/11, 03/08/11, 13/08/11)
Visualization of the simulation (10/08/11, 17/08/11, 16/06/11, 13/08/11)
Phase 6 – Improvements (17/08/11, 19/08/11, 18/07/11, 30/07/11)
Changes to the model/code (17/08/11, 19/08/11, 18/07/11, 30/07/11)
Phase 7 – Final Write-up (15/08/11, 08/30/11, 06/20/11, 08/18/11)

Table A.1: Work Plan



Z[@ 75,E*Z,,",,&"(-*

The table below summarises the risk assessment details. The severity has been categorised as high, medium or low. In addition to the severity, the likelihood of each risk occurring is also considered, categorised from high (more likely to happen) to low (less likely to happen).

Risk: Open bug with the PGI compiler flags for the latest version of LAMMPS (January 2011)
Severity: High; Likelihood: Low
Mitigation Plan: Used the GNU compiler instead of the PGI compiler, with the flags modified appropriately
Status: Mitigated

Risk: Porting Dr. Jin Sun's cohesive add-on to HECToR
Severity: High; Likelihood: High
Mitigation Plan: Try to fix the code if it is a minor issue; if not, consider using LIGGGHTS
Status: Mitigated

Risk: Installing LIGGGHTS on HECToR
Severity: High; Likelihood: Low
Mitigation Plan: The LAMMPS makefile was used as a template and a new Makefile for LIGGGHTS was created
Status: Mitigated

Risk: The work plan was modified to include the LIGGGHTS implementation, which had an impact on the overall project schedule
Severity: High; Likelihood: N/A
Mitigation Plan: It was possible to overlap many of the stages and still complete the project on time; for example, once a cohesion model was developed, code profiling and benchmarking could be done in parallel
Status: Mitigated

Table A.2: Risk Assessment

Appendix B Parallel Processing on Ness & HECToR

B.1 Batch script on HECToR

#!/bin/bash --login
#PBS -N lammps
#PBS -v DISPLAY
#PBS -l mppwidth=48
#PBS -l mppnppn=24
#PBS -l walltime=00:20:00
#PBS -A d04

# Change to the directory that the job was submitted from
echo "PBS_O_WORKDIR =" $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

# Launch the parallel job
aprun -n 48 -N 24 lmp_hector < in.chute
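As a quick sanity check before submission, the node count implied by the mppwidth and mppnppn resource requests can be derived with a little shell arithmetic. This is an illustrative sketch, not part of the original batch script; the 24-tasks-per-node figure matches the request above.

```shell
# Sketch: derive the node count implied by the PBS resource request
# above (48 MPI tasks, 24 tasks per node). Adjust for other job sizes.
mppwidth=48
mppnppn=24
nodes=$(( (mppwidth + mppnppn - 1) / mppnppn ))   # ceiling division
echo "aprun -n ${mppwidth} -N ${mppnppn}  # uses ${nodes} node(s)"
```

Keeping the aprun -n and -N arguments equal to mppwidth and mppnppn ensures the launched job matches the reserved resources.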

B.2 Makefile for LAMMPS/LIGGGHTS on HECToR

# HECToR XT4 system
SHELL = /bin/sh
.SUFFIXES: .cpp .d
.IGNORE:

# System-specific settings
CC = CC
CCFLAGS = -O3 -g -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK
DEPFLAGS = -M
LINK = CC $(CCFLAGS)
USRLIB = -ldfftw
SIZE = size

# Link rule
$(EXE): $(OBJ)
	$(LINK) $(LINKFLAGS) $(OBJ) $(USRLIB) $(SYSLIB) -o $(EXE)
	$(SIZE) $(EXE)

# Library target
lib: $(OBJ)
	$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

# Compilation rules
.cpp.o:
	$(CC) $(CCFLAGS) -c $<

# Individual dependencies
$(OBJ): $(INC)

B.3 Makefile for LAMMPS/LIGGGHTS on Ness

# Ness system
SHELL = /bin/sh
.SUFFIXES: .cpp .d
.IGNORE:

# System-specific settings
CC = g++
CCFLAGS = -O3 -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK -DMPICH_SKIP_MPICXX \
          -I/opt/local/packages/fftw/fftw-2.1.5-gcc/include \
          -I/opt/local/packages/mpich2/1.0.5p4-ch3_sock-gcc4/include
DEPFLAGS = -M
LINK = g++ $(CCFLAGS)
LINKFLAGS = -L/opt/local/packages/fftw/fftw-2.1.5-gcc/lib \
            -L/opt/local/packages/mpich2/1.0.5p4-ch3_sock-gcc4/lib \
            -L/usr/local/cuda3.2.16/lib64
USRLIB = -lfftw -lmpich -lpthread -lssl
SIZE = size

# Link rule
$(EXE): $(OBJ)
	$(LINK) $(LINKFLAGS) $(OBJ) $(USRLIB) $(SYSLIB) -o $(EXE)
	$(SIZE) $(EXE)

# Library target
lib: $(OBJ)
	$(ARCHIVE) $(ARFLAGS) $(EXE) $(OBJ)

# Compilation rules
.cpp.o:
	$(CC) $(CCFLAGS) -c $<

# Individual dependencies
$(OBJ): $(INC)
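The long package paths above are written out twice, once in CCFLAGS and once in LINKFLAGS, which makes typos easy to introduce. One way to reduce this duplication is to derive both flag sets from a single prefix per package; the sketch below shows the idea in shell, using the Ness-specific paths assumed in the Makefile above.

```shell
# Sketch: build compiler and linker flag strings from one prefix per
# package, so each Ness installation path is written only once.
FFTW=/opt/local/packages/fftw/fftw-2.1.5-gcc
MPICH=/opt/local/packages/mpich2/1.0.5p4-ch3_sock-gcc4
CCFLAGS="-O3 -DFFT_FFTW -DMPICH_IGNORE_CXX_SEEK -I${FFTW}/include -I${MPICH}/include"
LINKFLAGS="-L${FFTW}/lib -L${MPICH}/lib -L/usr/local/cuda3.2.16/lib64"
echo "CCFLAGS   = $CCFLAGS"
echo "LINKFLAGS = $LINKFLAGS"
```

The same pattern works inside the Makefile itself by defining FFTW and MPICH as make variables and referencing them with $(FFTW) and $(MPICH).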

Appendix C AutoCAD Details

A chute was drawn using AutoCAD 2011. The commands used were line, fillet, pedit, offset and extrude. The steps followed in drawing the chute are given below.

1. Units: (Dialogue box opens; in the Insertion Scale, choose Millimeters)

2. Limits: Specify lower left corner or [ON/OFF] <0.0000,0.0000>: 0,0 Specify upper right corner <12.0000,9.0000>: 1.2,0.9

3. Grid: Specify grid spacing(X) or [ON/OFF/Snap/Major/aDaptive/Limits/Follow/Aspect] <0.5000>: 0.01

4. Zoom: Specify corner of window, enter a scale factor (nX or nXP), or [All/Center/Dynamic/Extents/Previous/Scale/Window/Object] <real time>: a

5. Snap: Specify grid spacing(X): 0.01

6. Menu → View → Viewports → click Named (dialogue box opens) → New Viewport → choose Three: Left → click OK.

7. (Click the cursor in the left screen.) Menu → View → Views → choose SE Isometric.

8. (Click the cursor in the right bottom screen.) Menu → View → Views → choose Top.

9. (Click the cursor in the right top screen.) Menu → View → Views → choose Front.

10. (Click the cursor in the right top screen.) Line: Specify first point: 0.5026,0.2498 (Point C) (Press function key F8 for Ortho on) Specify next point or [Undo]: <Ortho on> 0.03 (Point D) Specify next point or [Undo]: 0.8 (Point E) Specify next point or [Close/Undo]: mouse right click → choose Cancel.

11. Line: Specify first point: (choose Point C) (Press function key F8 for Ortho off) Specify next point or [Undo]: <Ortho off> @.03<143 (Point B) Specify next point or [Undo]: @.41<143 (Point A) Specify next point or [Close/Undo]: mouse right click → choose Cancel.

12. Zoom: Specify corner of window, enter a scale factor (nX or nXP), or [All/Center/Dynamic/Extents/Previous/Scale/Window/Object] <real time>: a

13. Pan: (Move the object in such a way that points B, C and D were visible)

14. Fillet: Select first object or [Undo/Polyline/Radius/Trim/Multiple]: Radius Specify fillet radius <0.0000>: 0.03 Select first object or [Undo/Polyline/Radius/Trim/Multiple]: (choose the line BC) Select second object or shift-select to apply corner: (choose the line CD) (The lines BC and CD were transformed into a curve)

15. Pedit: Select polyline or [Multiple]: (choose the line AB) Do you want to turn it into one? <Y>: Y Enter an option [Close/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype gen/Reverse/Undo]: Join Select objects: (choose all the other lines) Enter an option [Close/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype gen/Reverse/Undo]: mouse right click → choose Cancel (All the lines became a single line)

16. Offset: Specify offset distance or [Through/Erase/Layer] <Through>: 0.001 Select object to offset or [Exit/Undo] <Exit>: (choose the line ABCDE) Specify point on side to offset or [Exit/Multiple/Undo] <Exit>: (click above the line ABCDE) Select object to offset or [Exit/Undo] <Exit>: Exit (A line parallel to ABCDE was created, i.e. A'B'C'D'E')

17. Pan: (Move the object in such a way that Point A was clearly seen)

18. Line: Specify first point: (choose Point A) Specify next point or [Undo]: (choose Point A') Specify next point or [Undo]: mouse right click → choose Cancel

19. Pan: (Move the object in such a way that Point E was clearly seen)

20. Line: Specify first point: (choose Point E) Specify next point or [Undo]: (choose Point E') Specify next point or [Undo]: mouse right click → choose Cancel

21. Pedit: Select polyline or [Multiple]: (choose the line AA') Do you want to turn it into one? <Y>: Y Enter an option [Close/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype gen/Reverse/Undo]: Join Select objects: (select all the other lines) Enter an option [Open/Join/Width/Edit vertex/Fit/Spline/Decurve/Ltype gen/Reverse/Undo]: mouse right click → choose Cancel (ABCDE and A'B'C'D'E' became a single line)

22. Extrude: Select objects to extrude: (choose the line ABCDEA'B'C'D'E') Specify height of extrusion or [Direction/Path/Taper angle]: 0.41 (A solid tiny chute was created)

To convert the solid geometry to a mesh using AutoCAD:

23. Meshsmooth: Select objects to convert: (choose the solid) (Dialogue box opens; choose Create Mesh → OK) (The solid was converted to a mesh) (Left-click the object; a dialogue box opens; change the smoothness to None)

To convert the AutoCAD file to STL format:

24. AutoCAD file menu → Export → Other Format → File Types → choose *.stl and type the file name. Select solids or watertight meshes: (select the solid or watertight mesh) (The .stl file was created)
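The polar inputs in step 11 (@.03<143 and @.41<143) specify a displacement by length and angle rather than by coordinates. As a cross-check on the geometry, the coordinates AutoCAD computes for point B can be reproduced with basic trigonometry; the sketch below uses awk and assumes point C at (0.5026, 0.2498), as entered in step 10.

```shell
# Sketch: reproduce AutoCAD's polar input @.03<143 from point C,
# i.e. move 0.03 drawing units at 143 degrees to reach point B.
b=$(awk 'BEGIN {
  cx = 0.5026; cy = 0.2498            # point C (entered in step 10)
  a  = 143 * atan2(0, -1) / 180       # 143 degrees in radians
  printf "%.4f %.4f", cx + 0.03*cos(a), cy + 0.03*sin(a)
}')
echo "Point B = ($b)"
```

Since cos(143°) is negative and sin(143°) positive, point B lies above and to the left of point C, matching the uphill slope of the chute wall.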