

Contents lists available at ScienceDirect

Journal of Fluids and Structures

journal homepage: www.elsevier.com/locate/jfs

Journal of Fluids and Structures 67 (2016) 60–84

http://dx.doi.org/10.1016/j.jfluidstructs.2016.07.006
0889-9746/© 2016 Elsevier Ltd. All rights reserved.

Kernel-independent fast multipole method within the framework of regularized Stokeslets

Minghao W. Rostami a,*, Sarah D. Olson b

a Department of Mathematics, Syracuse University, 215 Carnegie Building, Syracuse, NY 13244, USA
b Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA

* Corresponding author. E-mail addresses: [email protected] (M.W. Rostami), [email protected] (S.D. Olson).

Article info

Article history:
Received 21 December 2015
Received in revised form 30 June 2016
Accepted 11 July 2016
Available online 28 September 2016

Keywords:
Kernel-independent
Fast multipole method
Regularized Stokeslets
Fluid–structure interaction
Kirchhoff rod

Abstract

The method of regularized Stokeslets (MRS) uses a radially symmetric blob function of infinite support to smooth point forces and allows for evaluation of the resulting flow field. This is a common method to study swimmers at zero Reynolds number where the Stokeslet is the fundamental solution corresponding to the kernel of the single layer potential. Simulating the collective motion of $N$ micro-swimmers using the MRS results in at least $N^2$ pair-wise interactions. Efficient simulation of a large number of swimmers in free space is observed with the implementation of the kernel-independent fast multipole method (FMM) for radial basis functions. We illustrate the complexity of the algorithm on a simple test case where we study regularized point forces, showing that the method is of order $N$. Additionally, we explore accuracy in time for the MRS where the swimmers are modeled as Kirchhoff rods and the kernel-independent FMM is compared to the direct calculation using the standard MRS. Optimal hydrodynamic efficiency is also explored for different configurations of swimmers.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

The collective motion of microorganisms will lead to self-organization into larger scale groups. At high enough density, experiments have observed the formation of vortices of both bacteria and sperm when confined by a surface (Riedel et al., 2005; Wioland et al., 2013). Sperm at high density have also been observed to line up and aggregate to form ‘sperm trains’ (Moore et al., 2002) that are 10 times the length of a single sperm and several sperm wide. Other experiments of bacteria have revealed self-organization into veils, vortices, and jets (Cisneros et al., 2011; Mendelson et al., 1999; Thar and Kuhl, 2002). More generally, we can view active matter to include the collective motion of microorganisms, molecular motors, and active colloids (Marchetti et al., 2013; Ramaswamy, 2010). The collective motion of microtubules and kinesin also form asters and vortices (Sanchez et al., 2012; Surrey et al., 2001). It is of great interest to understand biological implications for collective dynamics as well as to understand the role of hydrodynamic interactions. Comprehensive computational models and analysis have been completed to understand the role of hydrodynamic, steric, and chemical interactions on collective motion of a large number of structures (Baskaran and Marchetti, 2009; Hernandez-Ortiz et al., 2005; Hohenegger and Shelley, 2010; Ishikawa et al., 2008; Lushi et al., 2014; Saintillan and Shelley, 2008, 2011; Simha and Ramaswamy, 2002; Wolgemuth, 2008).


These examples of active matter are at the scale of zero Reynolds number, where viscous effects dominate. The incompressible Stokes equations for a Newtonian fluid are given as

\[ -\mu \Delta \mathbf{u} + \nabla p = \mathbf{F}, \tag{1a} \]

\[ \nabla \cdot \mathbf{u} = 0, \tag{1b} \]

where $\mu$ is the viscosity, $p$ is the pressure at position $\mathbf{x}$, $\mathbf{u}$ is the velocity at $\mathbf{x}$, and $\mathbf{F}$ is the force exerted on the fluid at $\mathbf{x}$. When there is only one point force $\mathbf{f}_o$ which is located at position $\mathbf{y}_o$, $\mathbf{F}$ can be represented as $\mathbf{f}_o \delta(\mathbf{x} - \mathbf{y}_o)$ where $\delta(\cdot)$ is the Dirac delta distribution. The Stokeslet (Green's function) is the fundamental solution to Eqs. (1a) and (1b) for a point force and is singular at the location of the point force. Note that for active matter located at $N$ distinct spatial locations in an infinite fluid domain, we can write

\[ \mathbf{F} = \sum_{j=1}^{N} \mathbf{f}_j \delta(\mathbf{x} - \mathbf{y}_j) \]

for $N$ point forces and solve for the fluid flow as a superposition of fundamental solutions.

Several fluid–structure interaction methods have been developed to handle point forces or slender elastic structures in a fluid. The immersed boundary method requires both a Lagrangian and a Cartesian grid, which can be computationally expensive (Peskin, 2002). With the boundary integral equation method, one can recast the governing equations for smooth objects immersed in a Stokesian fluid as integral equations over the boundaries of the objects (Pozrikidis, 1992). This reduces the complexity, solving 2D surface integrals instead of computing 3D flows. When close to a boundary, this method will involve discretization and evaluation of integrals with singular kernels. The approach that we use in this study is the method of regularized Stokeslets (MRS) (Cortez, 2001; Cortez et al., 2005), a Lagrangian numerical algorithm that has been developed to regularize point forces and remove the singularity in the resulting fluid flow. The MRS has been utilized to model spherical particles as well as micro-swimmers and involves a tensor based on regularized Green's functions. A slightly different tensor, the Rotne–Prager–Yamakawa (RPY) tensor (Rotne and Prager, 1969; Yamakawa, 1970), is also regularized and was developed to accurately model interactions of polymers, also allowing for implementation of Brownian fluctuations (Liang et al., 2013).

As will be shown in Sections 4 and 5, under the framework of the MRS, computing the velocities at $N$ locations in the fluid caused by $N$ point forces boils down to an $N$-body problem, the computational complexity of which is $\mathcal{O}(N^2)$ if done exactly. In the current paper, we focus on the efficient numerical solution of this problem for a large $N$. Fast algorithms for the summation of a large number of pairwise interactions have been the subject of research activity over the last 40 years; among them are the tree codes (Barnes and Hut, 1986; Phalzner and Gibbon, 1996), the fast multipole method (FMM) (Beatson and Greengard, 1997; Cheng et al., 1999; Greengard and Rokhlin, 1987; Rokhlin, 1985; Tornberg and Greengard, 2008; Ying, 2006; Ying et al., 2004) and the panel clustering method (Sauter, 2000; Ying, 1989), whose computational complexities range from $\mathcal{O}(N)$ to $\mathcal{O}(N \log^5 N)$. Techniques based on Ewald summation (Beenakker, 1986; Darden et al., 1993; Deserno and Holm, 1998; Ewald, 1921; Toukmaji and Board, 1996) have also been developed for a different problem where the interactions between $N$ particles and all their periodic images need to be computed. The FMM, first proposed by Greengard and Rokhlin (1987), achieves optimal complexity $\mathcal{O}(N)$ and is considered to be one of the top algorithms of the 20th century (Cipra, 2000). The classic FMM requires an analytic factorization of the kernel and has been successfully applied to the incompressible Stokes equations (1a) and (1b) where the Stokeslet and Stresslet kernels have been determined based on expansions of Green's function for the biharmonic function (Tornberg and Greengard, 2008). However, for many kernels including the ones arising from the MRS, such a factorization can be difficult to find. The kernel-independent FMM (KIFMM) (Ying et al., 2004) was developed with the aim of handling a broader range of problems and only requires a numerical factorization of the kernel. It is applicable to the kernel of any second-order elliptic PDE and has also been extended in Ying (2006) to handle kernels that are radial basis functions. Recently, both the classic FMM and the KIFMM have been extended and studied for the case of the RPY tensor (Liang et al., 2013).

Our main contribution is extending the kernel-independent FMM to solve the resulting fluid flow due to the collective motion of a large number of microorganisms modeled by the MRS. In Section 2, we review the MRS and show how the regularized kernels using the MRS result in radial basis functions. Next, we summarize the numerical algorithm for the kernel-independent FMM in Section 3. To test this algorithm, in Section 4 we report the computational time of the algorithm as well as explore how its error varies with different parameters. As an application to collective motion, we explore the swimming efficiency of different configurations of swimmers using a Kirchhoff rod model formulation in Section 5. We show that the speed, efficiency and tilt of a swimmer in a large group of swimmers depend highly on their initial placement in the group.

2. Method of regularized Stokeslets

The method of regularized Stokeslets (MRS) has been used to successfully model swimmers or structures that can be represented as curves or scattered points in space (Cortez, 2001; Cortez et al., 2005). The structures are assumed to be neutrally buoyant and immersed in a viscous, incompressible 3D unbounded fluid at zero Reynolds number. The point force $\mathbf{f}$ (and possibly point torque $\mathbf{n}$) that a structure exerts on the fluid will be regularized or smoothed by a blob function $\varphi(r)$ that is a radially symmetric approximation to a 3D delta distribution satisfying $4\pi \int_0^\infty r^2 \varphi(r)\,dr = 1$. Here, $r = \|\mathbf{x} - \mathbf{y}_o\|$ is the distance between the location of the point force $\mathbf{y}_o$ and any point $\mathbf{x}$ in the fluid. Two example blob functions are (Nguyen and Cortez, 2013; Olson et al., 2013):

\[ \psi(r) = \frac{15}{8\pi (r^2 + 1)^{7/2}}, \qquad \phi(r) = \frac{(5 - 2r^2)\, e^{-r^2}}{2\pi^{3/2}}. \tag{2} \]

It is easy to see that the following are two families of blob functions parametrized by $\varepsilon$ and can be obtained by scaling $\psi(r)$ or $\phi(r)$ given in Eq. (2) by $1/\varepsilon^3$ and evaluating them at $r/\varepsilon$:

\[ \psi_\varepsilon(r) = \frac{15\,\varepsilon^4}{8\pi (r^2 + \varepsilon^2)^{7/2}}, \tag{3} \]

\[ \phi_\varepsilon(r) = \frac{(5\varepsilon^2 - 2r^2)\, e^{-r^2/\varepsilon^2}}{2\pi^{3/2}\,\varepsilon^5}. \tag{4} \]

Fig. 1. Regularization function or blob in Eq. (3) shown for several regularization parameters $\varepsilon$.

The regularization parameter $\varepsilon > 0$ controls the width or spreading of the point force or point torque (Nguyen and Cortez, 2013; Olson et al., 2013). The blob in Eq. (3) is shown for several values of $\varepsilon$ in Fig. 1.
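As a quick sanity check on Eqs. (3) and (4), the normalization condition $4\pi\int_0^\infty r^2\varphi(r)\,dr = 1$ can be verified numerically. The sketch below is illustrative only: the function names `psi_eps`, `phi_eps` and `blob_mass`, the truncation radius of $200\varepsilon$, and the midpoint rule are all assumptions made for this example and are not taken from the paper.

```python
import math


def psi_eps(r, eps):
    """Algebraic blob of Eq. (3)."""
    return 15.0 * eps**4 / (8.0 * math.pi * (r * r + eps * eps) ** 3.5)


def phi_eps(r, eps):
    """Gaussian-type blob of Eq. (4)."""
    gauss = math.exp(-((r / eps) ** 2))
    return (5.0 * eps**2 - 2.0 * r * r) * gauss / (2.0 * math.pi**1.5 * eps**5)


def blob_mass(blob, eps, r_max_over_eps=200.0, n=100_000):
    """Midpoint-rule estimate of 4*pi * int_0^R r^2 blob(r) dr, R = r_max_over_eps * eps.

    The truncation radius and the number of cells are illustrative choices.
    """
    h = r_max_over_eps * eps / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h  # midpoint of the k-th cell
        total += r * r * blob(r, eps)
    return 4.0 * math.pi * h * total
```

Calling `blob_mass(psi_eps, 0.02)` or `blob_mass(phi_eps, 0.02)` should return a value very close to 1 for any $\varepsilon > 0$, since both families are obtained from Eq. (2) by the same mass-preserving rescaling.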

We wish to note the similarities between the radially symmetric blob functions and radial basis functions. Radial basis functions that are often used are Laguerre–Gaussians and generalized inverse multi-quadrics that can be shifted (Fasshauer and Zhang, 2009). The blob in Eq. (4) corresponds to a Laguerre–Gaussian satisfying two continuous moment conditions (Fasshauer and Zhang, 2009; Nguyen and Cortez, 2013). We note that the kernel-independent FMM for radial basis functions previously looked at kernels of the form $1/r^2$, $1/r$, and $1/\sqrt{r^2 + \varepsilon^2}$ (Ying, 2006). The blobs under consideration are examples of real analytic radial basis functions.

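We now briefly describe the MRS given a point force $\mathbf{f}_o$ in an unbounded 3D fluid domain. The singular point force in Eq. (1a) is replaced by a regularized force $\mathbf{F} = \mathbf{f}_o \psi_\varepsilon(r)$ where $r = \|\mathbf{x} - \mathbf{y}_o\|$ as before and $\psi_\varepsilon(r)$ is a specified radially symmetric blob function such as the ones given in Eqs. (3) and (4). For a given $\psi_\varepsilon(r)$, we define the regularized (and radially symmetric) Green's function (for Poisson's equation) and biharmonic function as $\Delta G_\varepsilon(r) = \psi_\varepsilon(r)$ and $\Delta B_\varepsilon(r) = G_\varepsilon(r)$, respectively. After taking the divergence of Eq. (1a) and using (1b) to simplify, the particular solution for the pressure is given as $p = \mathbf{f}_o \cdot \nabla G_\varepsilon$. The pressure can now be used to determine the resulting flow due to the point force given as:

\[ \mathbf{u}(\mathbf{x}) = \frac{1}{\mu} \Big[ (\mathbf{f}_o \cdot \nabla) \nabla B_\varepsilon(\mathbf{x} - \mathbf{y}_o) - \mathbf{f}_o\, G_\varepsilon(\mathbf{x} - \mathbf{y}_o) \Big], \tag{5} \]

the regularized Stokeslet. The solution for the pressure and velocity can then be written as follows:

\[ p(\mathbf{x}) = [\mathbf{f}_o \cdot (\mathbf{x} - \mathbf{y}_o)] \left( \frac{G_\varepsilon'(r)}{r} \right) = [\mathbf{f}_o \cdot (\mathbf{x} - \mathbf{y}_o)]\, Q(r), \tag{6} \]

\[ \mathbf{u}(\mathbf{x}) = \frac{1}{\mu} \left[ \mathbf{f}_o \left( \frac{B_\varepsilon'(r)}{r} - G_\varepsilon(r) \right) + [\mathbf{f}_o \cdot (\mathbf{x} - \mathbf{y}_o)](\mathbf{x} - \mathbf{y}_o) \left( \frac{r B_\varepsilon''(r) - B_\varepsilon'(r)}{r^3} \right) \right] = \frac{1}{\mu} \Big( \mathbf{f}_o H_1(r) + [\mathbf{f}_o \cdot (\mathbf{x} - \mathbf{y}_o)](\mathbf{x} - \mathbf{y}_o)\, H_2(r) \Big). \tag{7} \]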


Fig. 2. The partition of a 2D domain at three levels.


We emphasize that the above formulas are defined at any $\mathbf{x} \in \mathbb{R}^3$, including $\mathbf{y}_o$. The terms $Q(r)$, $H_1(r)$ and $H_2(r)$ will be radially symmetric by definition and their exact form will depend on the specific blob function $\psi_\varepsilon$. In the case of $N$ point forces $\{\mathbf{f}_j\}_{j=1}^{N}$ located at $\{\mathbf{y}_j\}_{j=1}^{N}$, Eq. (7) can then be written as

\[ \mathbf{u}(\mathbf{x}) = \frac{1}{\mu} \sum_{j=1}^{N} \Big( \mathbf{f}_j H_1(r_j) + [\mathbf{f}_j \cdot (\mathbf{x} - \mathbf{y}_j)](\mathbf{x} - \mathbf{y}_j)\, H_2(r_j) \Big), \tag{8} \]

for $r_j = \|\mathbf{x} - \mathbf{y}_j\|$.

In some applications, a swimmer may induce point torques in addition to point forces (Flores et al., 2005; Olson et al., 2013), which result in fundamental solutions corresponding to Stokeslets and rotlets for the resulting fluid flow. In the case of a swimmer represented as a Kirchhoff rod (Dill, 1992) with regularized point forces and torques, we can write the solution to the local linear velocity $\mathbf{u}$ and angular velocity $\mathbf{w}$ (see footnote 1) as:

\[ \mathbf{u}(\mathbf{x}) = \frac{1}{\mu} \sum_{j=1}^{N} \Big( \mathbf{f}_j H_1(r_j) + [\mathbf{f}_j \cdot (\mathbf{x} - \mathbf{y}_j)](\mathbf{x} - \mathbf{y}_j)\, H_2(r_j) + \frac{1}{2} [\mathbf{n}_j \times (\mathbf{x} - \mathbf{y}_j)]\, Q(r_j) \Big), \tag{9} \]

\[ \mathbf{w}(\mathbf{x}) = \frac{1}{\mu} \sum_{j=1}^{N} \Big( \frac{1}{2} [\mathbf{f}_j \times (\mathbf{x} - \mathbf{y}_j)]\, Q(r_j) + \frac{1}{4} \mathbf{n}_j D_1(r_j) + \frac{1}{4} [\mathbf{n}_j \cdot (\mathbf{x} - \mathbf{y}_j)](\mathbf{x} - \mathbf{y}_j)\, D_2(r_j) \Big), \tag{10} \]

where $\{\mathbf{n}_j\}_{j=1}^{N}$ are the point torques, $Q(r)$, $H_1(r)$, $H_2(r)$ are as defined in Eqs. (6) and (7), $D_1(r) = \psi_\varepsilon(r) - G_\varepsilon'(r)/r$, and $D_2(r) = G_\varepsilon'(r)/r^3 - G_\varepsilon''(r)/r^2$. We note that the angular velocity corresponds to regularized rotlets and dipoles and a detailed derivation can be found in Olson et al. (2013).
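To make the force-only formula in Eq. (7) concrete, the sketch below evaluates the velocity induced by a single regularized point force. It assumes the algebraic blob of Eq. (3), for which the closed forms $H_1(r) = (r^2 + 2\varepsilon^2)/(8\pi (r^2+\varepsilon^2)^{3/2})$ and $H_2(r) = 1/(8\pi (r^2+\varepsilon^2)^{3/2})$ are commonly quoted (Cortez et al., 2005); these expressions and all function names are assumptions of this example, not formulas stated in the text above. The classical singular Stokeslet is included only for comparison away from the source.

```python
import math


def h1_h2(r, eps):
    """H_1(r) and H_2(r) for the blob of Eq. (3); closed forms are an assumption."""
    d = (r * r + eps * eps) ** 1.5
    h1 = (r * r + 2.0 * eps * eps) / (8.0 * math.pi * d)
    h2 = 1.0 / (8.0 * math.pi * d)
    return h1, h2


def regularized_stokeslet(x, y, f, eps, mu=1.0):
    """Velocity at x due to one regularized point force f located at y, as in Eq. (7)."""
    dx = [x[k] - y[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in dx))
    h1, h2 = h1_h2(r, eps)
    f_dot_dx = sum(f[k] * dx[k] for k in range(3))
    return [(f[k] * h1 + f_dot_dx * dx[k] * h2) / mu for k in range(3)]


def singular_stokeslet(x, y, f, mu=1.0):
    """Classical Stokeslet; only valid for x != y because it is singular at the source."""
    dx = [x[k] - y[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in dx))
    f_dot_dx = sum(f[k] * dx[k] for k in range(3))
    return [(f[k] / r + f_dot_dx * dx[k] / r**3) / (8.0 * math.pi * mu) for k in range(3)]
```

With these assumed forms, the regularized velocity stays finite at the source (it equals $\mathbf{f}_o/(4\pi\varepsilon\mu)$ at $\mathbf{x} = \mathbf{y}_o$) and approaches the singular Stokeslet once $r \gg \varepsilon$.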

3. Review of the kernel-independent fast multipole method

Consider the following many-body problem: given $N_s$ source points $\{\mathbf{y}_j\}_{j=1}^{N_s}$ that exert forces $\{\mathbf{f}_j\}_{j=1}^{N_s}$ on $N_e$ evaluation points $\{\mathbf{x}_i\}_{i=1}^{N_e}$, find the velocities

\[ \mathbf{u}_i = \mathbf{u}(\mathbf{x}_i) = \sum_{j=1}^{N_s} \Phi(\mathbf{x}_i, \mathbf{y}_j)\, \mathbf{f}_j, \qquad i = 1, 2, \ldots, N_e, \tag{11} \]

at the evaluation points. In (11), $\Phi$ is the kernel (or Green's function) of an underlying partial differential equation (PDE). When (11) arises from the simulation of fluid–structure interactions using the method of regularized Stokeslets (MRS), $\{\mathbf{y}_j\}_{j=1}^{N_s}$ are points on the structures that exert forces on the surrounding fluid, and $\{\mathbf{x}_i\}_{i=1}^{N_e}$ can include structure points as well as fluid points that move according to these forces. Typically, $\{\mathbf{x}_i\}_{i=1}^{N_e}$ consists of $\{\mathbf{y}_j\}_{j=1}^{N_s}$ and a number of markers in the fluid.

Footnote 1. The angular velocity $\mathbf{w}(\mathbf{x})$ is a vector whose direction and magnitude specify respectively the axis and speed of rotation of the fluid element located at $\mathbf{x}$. It satisfies the relation $\mathbf{w}(\mathbf{x}) = \frac{1}{2} \nabla \times \mathbf{u}(\mathbf{x})$, i.e., it is half the vorticity at $\mathbf{x}$.

Fig. 3. $\mathcal{N}(B)$ (all the boxes labeled with NB and $B$ itself), $\mathcal{F}(B)$ (all the boxes labeled with FB or IB) and $\mathcal{I}(B)$ (all the boxes labeled with IB) for a box $B$ on level 3.

Fig. 4. The equivalent surfaces and coronas of a box in 2D and the quadrature points on them. Left plot: the upward equivalent surface (solid line) and the downward equivalent surface (dashed line, ■). Right plot: the upward equivalent corona (the region between the two solid boundaries) and the downward equivalent corona (the region between the two dashed boundaries). A $6 \times 6$ uniform Cartesian grid is used to discretize each equivalent surface or corona. The quadrature points on the equivalent surfaces or coronas are marked by the symbols on the respective boundaries.

We are particularly interested in the 3D case of (11) where $\{\mathbf{x}_i\}_{i=1}^{N_e}$, $\{\mathbf{f}_j\}_{j=1}^{N_s}$ and $\{\mathbf{y}_j\}_{j=1}^{N_s}$ are in $\mathbb{R}^3$. In this case, the problem of computing $\{\mathbf{u}_i\}_{i=1}^{N_e}$ can also be viewed as a matrix–vector product

\[
\begin{bmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_i \\ \vdots \\ \mathbf{u}_{N_e} \end{bmatrix}
=
\begin{bmatrix}
\Phi(\mathbf{x}_1, \mathbf{y}_1) & \cdots & \Phi(\mathbf{x}_1, \mathbf{y}_j) & \cdots & \Phi(\mathbf{x}_1, \mathbf{y}_{N_s}) \\
\vdots & & \vdots & & \vdots \\
\Phi(\mathbf{x}_i, \mathbf{y}_1) & \cdots & \Phi(\mathbf{x}_i, \mathbf{y}_j) & \cdots & \Phi(\mathbf{x}_i, \mathbf{y}_{N_s}) \\
\vdots & & \vdots & & \vdots \\
\Phi(\mathbf{x}_{N_e}, \mathbf{y}_1) & \cdots & \Phi(\mathbf{x}_{N_e}, \mathbf{y}_j) & \cdots & \Phi(\mathbf{x}_{N_e}, \mathbf{y}_{N_s})
\end{bmatrix}
\begin{bmatrix} \mathbf{f}_1 \\ \vdots \\ \mathbf{f}_j \\ \vdots \\ \mathbf{f}_{N_s} \end{bmatrix},
\tag{12}
\]

where the matrix is $3N_e \times 3N_s$. It is obvious that the complexity of computing (11) or (12) is $\mathcal{O}(N_e N_s)$, which quickly becomes prohibitive as $N_e$ and $N_s$ grow.
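The direct evaluation of (11), viewed block by block as in (12), can be sketched as follows. Each pair of points contributes one $3 \times 3$ block, so the double loop makes the $\mathcal{O}(N_e N_s)$ cost explicit. The kernel used here is an assumption for illustration: the force-only regularized Stokeslet for the blob of Eq. (3), with the commonly quoted closed form $\Phi(\mathbf{x},\mathbf{y}) = [(r^2 + 2\varepsilon^2) I + \mathbf{d}\mathbf{d}^T] / (8\pi\mu (r^2+\varepsilon^2)^{3/2})$, $\mathbf{d} = \mathbf{x} - \mathbf{y}$. The names `kernel_block` and `direct_sum` are hypothetical.

```python
import math


def kernel_block(x, y, eps, mu=1.0):
    """3x3 block Phi(x, y) for the blob of Eq. (3) (closed form assumed, see lead-in)."""
    d = [x[k] - y[k] for k in range(3)]
    r2 = sum(c * c for c in d)
    denom = 8.0 * math.pi * mu * (r2 + eps * eps) ** 1.5
    return [
        [((r2 + 2.0 * eps * eps) * (a == b) + d[a] * d[b]) / denom for b in range(3)]
        for a in range(3)
    ]


def direct_sum(targets, sources, forces, eps, mu=1.0):
    """Exact evaluation of Eq. (11): one kernel block per (target, source) pair."""
    velocities = []
    for x in targets:  # N_e iterations
        u = [0.0, 0.0, 0.0]
        for y, f in zip(sources, forces):  # N_s iterations per target
            phi = kernel_block(x, y, eps, mu)
            for a in range(3):
                u[a] += sum(phi[a][b] * f[b] for b in range(3))
        velocities.append(u)
    return velocities
```

Doubling both point sets quadruples the number of calls to `kernel_block`, which is the cost the FMM is designed to avoid.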

The fast multipole method (FMM) approximates (11) with complexity $\mathcal{O}(N_e + N_s)$. The original FMM (Beatson and Greengard, 1997; Cheng et al., 1999; Greengard and Rokhlin, 1987; Rokhlin, 1985; Tornberg and Greengard, 2008) requires analytic factorization of the kernel:

\[ \Phi(\mathbf{x}, \mathbf{y}) = \sum_{n=0}^{\infty} R_n(\mathbf{x})\, S_n(\mathbf{y}). \tag{13} \]

Factorization techniques based on Taylor series, Laurent series and spherical harmonics have been used (see Beatson and Greengard, 1997 and the references therein). However, they are problem-specific and can be difficult to find for an arbitrary kernel, such as the ones arising from the MRS. The kernel-independent version of FMM (Ying, 2006; Ying et al., 2004), on the other hand, factors $\Phi(\mathbf{x}, \mathbf{y})$ numerically and can be adapted to handle the kernel of any second-order elliptic PDE as well as a kernel in the form of a radial basis function. Below we give a brief review of this method based on Beatson and Greengard (1997), Ying (2006) and Ying et al. (2004).

3.1. Hierarchical decomposition of the computational domain

This part is the same for both the classic and kernel-independent FMM and has been described in detail in Beatson and Greengard (1997). Let $\Omega = [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}] \times [z_{\min}, z_{\max}]$ be the computational domain that contains all the evaluation points $\{\mathbf{x}_i\}_{i=1}^{N_e}$ and source points $\{\mathbf{y}_j\}_{j=1}^{N_s}$. FMM builds an octant tree of boxes with $\Omega$ being its root node and each level being a partition of $\Omega$. We call $\Omega$ itself the level-0 partition. For $L = 0, 1, 2, \ldots$, if there is a level-$L$ box that contains more than $s$ source points where $s \geq 1$ is a prescribed number, then we create level $L+1$ in the tree by subdividing every level-$L$ box uniformly into eight child boxes. In the resulting octant tree, there are $8^L$ boxes on the $L$th level and each leaf box (i.e., a box without any child) contains no more than $s$ source points. In Fig. 2, we illustrate the first two iterations of this process applied to a 2D domain. (The main difference when it is applied to a 3D domain is that every box is subdivided uniformly into eight instead of four child boxes.) The four boxes $B_1$, $B_2$, $B_3$, and $B_4$ in Fig. 2(b) are the children of $\Omega$ and each of them also has four children as shown in Fig. 2(c). For example, the four boxes labeled with $CB_1$ are the children of $B_1$. Alternatively, we can perform the partition adaptively by refining only the level-$L$ boxes that contain more than $s$ source points instead of every one of them (see Beatson and Greengard, 1997; Ying et al., 2004). The adaptive approach is more efficient when the distribution of the points is not uniform. In this paper, we focus on the uniform partition for simplicity.
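The uniform refinement rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: boxes are identified by integer index triples, the names `box_index` and `build_uniform_tree` are hypothetical, and the `max_level` safeguard is an added assumption to guarantee termination when several points coincide.

```python
def box_index(p, lo, hi, level):
    """Integer (i, j, k) index of the level-`level` box that contains point p."""
    n = 2**level  # number of boxes per dimension on this level
    idx = []
    for k in range(3):
        t = (p[k] - lo[k]) / (hi[k] - lo[k])
        idx.append(min(int(t * n), n - 1))  # clamp points lying on the upper face
    return tuple(idx)


def build_uniform_tree(points, lo, hi, s, max_level=10):
    """Refine every box uniformly until no box holds more than s points.

    Returns (leaf_level, boxes), where boxes maps a box index to the list of
    point indices it contains. Empty boxes are simply not stored.
    """
    level = 0
    while True:
        boxes = {}
        for q, p in enumerate(points):
            boxes.setdefault(box_index(p, lo, hi, level), []).append(q)
        if max(len(v) for v in boxes.values()) <= s or level == max_level:
            return level, boxes
        level += 1
```

For example, 64 points placed at the centers of a $4 \times 4 \times 4$ lattice in the unit cube with $s = 1$ lead to a leaf level of 2, with one point in each of the $8^2 = 64$ boxes.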

For any box $B$ in the octant tree, the boxes that are on the same level as $B$ are classified as follows based on their location relative to $B$:

• The neighborhood of $B$ (denoted by $\mathcal{N}(B)$): the set of all the boxes that are on the same level as $B$ and share at least one vertex with $B$. (Therefore, $\mathcal{N}(B)$ includes $B$ itself.) In 3D, a box usually has $3 \times 3 \times 3 = 27$ neighbors. (In 2D, this number becomes $3 \times 3 = 9$.)

• The far field of $B$ (denoted by $\mathcal{F}(B)$): the set of all the boxes that are on the same level as $B$ and do not belong to $\mathcal{N}(B)$. In 3D, there are usually $8^L - 27$ boxes in $\mathcal{F}(B)$ for a level-$L$ box $B$. (In 2D, this number is $4^L - 9$.)

• The interaction list of $B$ (denoted by $\mathcal{I}(B)$): the subset of $\mathcal{F}(B)$ that consists of the children of the neighbors of $B$'s parent. In 3D, there are usually $27 \times 8 - 27 = 189$ boxes in $\mathcal{I}(B)$. (This number is $9 \times 4 - 9 = 27$ in 2D.)

In Fig. 3, we show the neighborhood, far field and interaction list of a level-3 box in 2D. For any evaluation point $\mathbf{x}_i$, assuming that it belongs to a leaf box $B$, our strategy for evaluating $\mathbf{u}_i$ is the following: the part of $\mathbf{u}_i$ induced by the source points in $\mathcal{N}(B)$ will be computed directly, and the other part of it which is induced by the source points in $\mathcal{F}(B)$ will be approximated by FMM.
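The three box sets defined above can be enumerated directly on a uniform partition, which also reproduces the counts quoted in the list. In this sketch a box is an integer index tuple on its level; the function names and this indexing scheme are assumptions made for illustration.

```python
from itertools import product


def neighborhood(box, level, dim=3):
    """Same-level boxes sharing at least one vertex with `box` (including `box`)."""
    n = 2**level
    result = set()
    for offset in product((-1, 0, 1), repeat=dim):
        cand = tuple(b + o for b, o in zip(box, offset))
        if all(0 <= c < n for c in cand):  # drop candidates outside the domain
            result.add(cand)
    return result


def far_field(box, level, dim=3):
    """All same-level boxes that are not in the neighborhood of `box`."""
    n = 2**level
    everything = set(product(range(n), repeat=dim))
    return everything - neighborhood(box, level, dim)


def interaction_list(box, level, dim=3):
    """Children of the neighbors of the parent of `box` that lie in its far field."""
    if level == 0:
        return set()
    parent = tuple(b // 2 for b in box)
    children = set()
    for nb in neighborhood(parent, level - 1, dim):
        for offset in product((0, 1), repeat=dim):
            children.add(tuple(2 * c + o for c, o in zip(nb, offset)))
    return children - neighborhood(box, level, dim)
```

For an interior level-3 box such as `(3, 3, 3)`, these functions give 27 neighbors, $8^3 - 27 = 485$ far-field boxes and 189 boxes in the interaction list; boxes touching the domain boundary have fewer, which is why the text says "usually".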

3.2. Equivalent surfaces/coronas

The original kernel-independent FMM (Ying et al., 2004) assigns each box $B$ in the octant tree two “equivalent surfaces”: the upward equivalent surface which is taken to be the boundary of $B$ (denoted by $\partial B$) and the downward equivalent surface which is taken to be the boundary of the neighborhood of $B$ (denoted by $\partial \mathcal{N}(B)$). Fictitious force fields are also imposed on them with respective densities $\mathbf{f}^{BU}(\mathbf{y})$ and $\mathbf{f}^{BD}(\mathbf{y})$, which are defined as follows: for every point $\mathbf{x} \in \mathcal{F}(B)$, the velocity at $\mathbf{x}$ induced by the upward equivalent surface equals that induced by the source points in $B$, i.e.,

\[ \forall\, \mathbf{x} \in \mathcal{F}(B): \quad \int_{\partial B} \Phi(\mathbf{x}, \mathbf{y})\, \mathbf{f}^{BU}(\mathbf{y})\, d\mathbf{y} = \sum_{\mathbf{y}_j \in B} \Phi(\mathbf{x}, \mathbf{y}_j)\, \mathbf{f}_j; \tag{14} \]

and for every point $\mathbf{x} \in B$, the velocity at $\mathbf{x}$ induced by the downward equivalent surface equals that induced by the source points in $\mathcal{F}(B)$, or equivalently,

\[ \forall\, \mathbf{x} \in B: \quad \int_{\partial \mathcal{N}(B)} \Phi(\mathbf{x}, \mathbf{y})\, \mathbf{f}^{BD}(\mathbf{y})\, d\mathbf{y} = \sum_{\mathbf{y}_j \in \mathcal{F}(B)} \Phi(\mathbf{x}, \mathbf{y}_j)\, \mathbf{f}_j. \tag{15} \]

In other words, the upward and downward equivalent surfaces serve as “proxies” of the source points in $B$ and those in $\mathcal{F}(B)$, respectively. In Fig. 4(a), we plot both equivalent surfaces for a box $B$ in 2D.

A modified kernel-independent FMM has been developed in Ying (2006) whose main improvement over the original method is the use of equivalent “coronas” instead of surfaces. More precisely, the upward equivalent corona of $B$ is defined to be the space between $\partial B$ and a surface strictly contained in $B$, and the downward equivalent corona is the space between $\partial \mathcal{N}(B)$ and a surface strictly contained in $\mathcal{N}(B)$. We can again define the force densities supported on the coronas to satisfy two integral equations very similar to (14) and (15). (The only difference is that both integrals will be over the equivalent coronas instead of the equivalent surfaces $\partial B$ and $\partial \mathcal{N}(B)$.) The modified version has been shown in Ying (2006) to be more accurate for approximating (11) when $\Phi$ is a radial basis function. Fig. 4(b) displays the equivalent coronas of the same box shown in Fig. 4(a). In all of our numerical experiments, we use the equivalent coronas for their accuracy; nonetheless, since the change to the original method caused by the use of coronas is only technical, we continue to use the equivalent surfaces in the rest of this review for simplicity.
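One way to picture a discretized corona is as the outer layers of a uniform Cartesian grid laid over the box. The sketch below keeps the grid points that lie within a given number of layers of the cube boundary. The two-layer choice is an assumption made here, not something the text states; it was picked because it reproduces the quadrature-point counts $N_q = 64$, $124$ and $208$ reported for the $4 \times 4 \times 4$, $5 \times 5 \times 5$ and $6 \times 6 \times 6$ grids in Table 4. The function name and signature are hypothetical.

```python
from itertools import product


def corona_grid_points(center, half_width, n, layers=2):
    """Points of an n x n x n uniform grid over a cube that lie within
    `layers` grid layers of the cube boundary.

    layers=1 keeps only the boundary points (an equivalent "surface");
    layers=2 is the corona thickness assumed in this example.
    """
    pts = []
    for idx in product(range(n), repeat=3):
        depth = min(min(i, n - 1 - i) for i in idx)  # 0 on the outermost layer
        if depth < layers:
            pts.append(
                tuple(
                    center[k] - half_width + 2.0 * half_width * idx[k] / (n - 1)
                    for k in range(3)
                )
            )
    return pts
```

With `layers=1` the same routine returns only the boundary grid points, for example 56 points for a $4 \times 4 \times 4$ grid, which corresponds to the equivalent-surface variant of the method.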

3.3. Numerical approximation of the force densities

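The integral equations (14) and (15) will be solved numerically for the force densities $\mathbf{f}^{BU}$ and $\mathbf{f}^{BD}$ supported on the equivalent surfaces. First, we discretize the integrals on the left-hand sides of both equations using a quadrature rule. Let $\{\mathbf{q}_m^{BU}\}_{m=1}^{N_q}$ be the quadrature points on the upward equivalent surface $\partial B$ and $\{\mathbf{q}_m^{BD}\}_{m=1}^{N_q}$ be the quadrature points on the downward equivalent surface $\partial \mathcal{N}(B)$ (see again Fig. 4), where $N_q$ denotes the number of quadrature points. Next, we choose a finite number of samples for $\mathbf{x}$ in both (14) and (15). For convenience, the samples of $\mathbf{x}$ in (14) are chosen to be $\{\mathbf{q}_m^{BD}\}_{m=1}^{N_q}$ whereas the samples of $\mathbf{x}$ in (15) are chosen to be $\{\mathbf{q}_m^{BU}\}_{m=1}^{N_q}$. As a result, we obtain the following two $3N_q \times 3N_q$ linear systems of equations of $\{\mathbf{f}^{BU}(\mathbf{q}_m^{BU})\}_{m=1}^{N_q}$ and $\{\mathbf{f}^{BD}(\mathbf{q}_m^{BD})\}_{m=1}^{N_q}$, respectively:

\[ \sum_{m=1}^{N_q} \omega_m^{BU}\, \Phi(\mathbf{q}_k^{BD}, \mathbf{q}_m^{BU})\, \mathbf{f}^{BU}(\mathbf{q}_m^{BU}) = \sum_{\mathbf{y}_j \in B} \Phi(\mathbf{q}_k^{BD}, \mathbf{y}_j)\, \mathbf{f}_j, \qquad k = 1, 2, \ldots, N_q, \tag{16} \]

and

\[ \sum_{m=1}^{N_q} \omega_m^{BD}\, \Phi(\mathbf{q}_k^{BU}, \mathbf{q}_m^{BD})\, \mathbf{f}^{BD}(\mathbf{q}_m^{BD}) = \sum_{\mathbf{y}_j \in \mathcal{F}(B)} \Phi(\mathbf{q}_k^{BU}, \mathbf{y}_j)\, \mathbf{f}_j, \qquad k = 1, 2, \ldots, N_q, \tag{17} \]

where $\{\omega_m^{BU}\}_{m=1}^{N_q}, \{\omega_m^{BD}\}_{m=1}^{N_q} \in \mathbb{R}^{N_q}$ are coefficients determined by the quadrature rule used. Once (16) and (17) are solved, the velocity at any point $\mathbf{x} \in \mathcal{F}(B)$ induced by the source points in $B$ can be estimated by

\[ \sum_{m=1}^{N_q} \omega_m^{BU}\, \Phi(\mathbf{x}, \mathbf{q}_m^{BU})\, \mathbf{f}^{BU}(\mathbf{q}_m^{BU}) \tag{18} \]

and similarly, the velocity at any point $\mathbf{x} \in B$ induced by the source points in $\mathcal{F}(B)$ is approximately

\[ \sum_{m=1}^{N_q} \omega_m^{BD}\, \Phi(\mathbf{x}, \mathbf{q}_m^{BD})\, \mathbf{f}^{BD}(\mathbf{q}_m^{BD}). \tag{19} \]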
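For readers who prefer matrix notation, system (16) can be written in block form as shown below. The symbols $K^{BU}$, $\boldsymbol{\phi}^{BU}$, $\mathbf{b}^{BU}$ and $\alpha$ are hypothetical names introduced only for this illustration. Because discretized first-kind integral equations of this type tend to be ill-conditioned, a regularized solve is one common option; the Tikhonov form in the second line is an assumption of this sketch and is not claimed to be the solver used in the paper.

```latex
% Block form of system (16) for one box B (K, \phi, b are hypothetical names)
K^{BU}\,\boldsymbol{\phi}^{BU} = \mathbf{b}^{BU},
\qquad
K^{BU}_{km} = \omega_m^{BU}\,\Phi(\mathbf{q}_k^{BD}, \mathbf{q}_m^{BU}) \in \mathbb{R}^{3\times 3},
\qquad
\mathbf{b}^{BU}_k = \sum_{\mathbf{y}_j \in B} \Phi(\mathbf{q}_k^{BD}, \mathbf{y}_j)\,\mathbf{f}_j,
\qquad k, m = 1, \dots, N_q .

% One possible regularized solve (assumed Tikhonov parameter \alpha > 0)
\boldsymbol{\phi}^{BU} \approx \left[ (K^{BU})^{T} K^{BU} + \alpha I \right]^{-1} (K^{BU})^{T}\,\mathbf{b}^{BU},
\qquad
\boldsymbol{\phi}^{BU} = \big( \mathbf{f}^{BU}(\mathbf{q}_1^{BU}), \dots, \mathbf{f}^{BU}(\mathbf{q}_{N_q}^{BU}) \big)^{T} .
```

When the kernel depends only on $\mathbf{x} - \mathbf{y}$ and the partition is uniform, $K^{BU}$ depends on the box size but not on the box position or on the sources, so under these assumptions it could be factored once per level and reused for every box on that level. System (17) has the same structure with the roles of the upward and downward quadrature points exchanged.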

In FMM terminology, (18) and (19) are called the multipole expansion and local expansion of box $B$, respectively. The advantage of computing the two velocities using (18) and (19) is to avoid direct interaction between the source points and the evaluation points, which is the key to reducing the computational complexity of (11). Instead, they will interact indirectly through the quadrature points.

Table 1. A summary of the parameters used in the complexity test.

| Parameter | Description | Value |
|---|---|---|
| $\Omega$ | The computational domain | $[-5, 5] \times [-5, 5] \times [0, 10]$ |
| $N_o$ | Number of microorganisms | 5000, 40 000, 320 000, 2 560 000, or 20 480 000 |
| $N_s$ | Number of particles | 10 000, 80 000, 640 000, 5 120 000, or 40 960 000 |
| $L_{\max}$ | Maximum level of refinement of $\Omega$ in the FMM | 2, 3, 4, 5, or 6 |
| $N_q$ | Number of quadrature points (a $4 \times 4 \times 4$ uniform Cartesian grid is used) | 64 |
| $\ell$ | Length of each microorganism | 0.02 |
| $\varepsilon$ | The regularization parameter in the MRS | 0.02 |
| $N_c$ | Number of CPU cores | 1 |

Table 2. Comparison of the kernel-independent FMM and direct summation for (11).

| $N_o$ | $N_s$ | $L_{\max}$ | $T_{fmm}$ (s) | $T_{fmm}$ ratio | $T_{dir}^{est}$ (s) | $T_{dir}^{est}$ ratio | $E_u$ |
|---|---|---|---|---|---|---|---|
| 5000 | 10 000 | 2 | 5 | – | 12 | – | $4.23 \times 10^{-5}$ |
| 40 000 | 80 000 | 3 | 53 | 10.60 | 494 | 41.17 | $2.10 \times 10^{-4}$ |
| 320 000 | 640 000 | 4 | 537 | 10.13 | 20 390 | 41.28 | $7.33 \times 10^{-4}$ |
| 2 560 000 | 5 120 000 | 5 | 3776 | 7.03 | 1 490 300 | 73.09 | $2.18 \times 10^{-3}$ |
| 20 480 000 | 40 960 000 | 6 | 28 727 | 7.61 | 93 653 000 | 62.84 | $4.88 \times 10^{-3}$ |

Fig. 5. The loglog plot of run time vs. number of particles. (Horizontal axis: number of particles, $N_s = N_e$; vertical axis: run time in seconds; curves: run time of KI FMM, estimated run time of direct summation, line with slope 1, line with slope 2.)

It remains to show how to compute the right-hand sides of (16) and (17) efficiently, a detailed description of which is deferred to Appendix A due to its technicality. Here we only wish to provide a short summary:

• The right-hand side of (16) is first computed exactly for each leaf box. As we traverse the octant tree from the leaf level to level 2, the right-hand side of (16) of each non-leaf box is then approximated using the information already obtained for its children. This step is called upward passing in FMM.

• The right-hand side of (17) is first estimated for each box on level 2 using the computational results from the upward passing step. As we traverse the octant tree in the reverse order (that is, from level 2 to the leaf level), the right-hand side of (17) of each descendant box is then approximated using our knowledge about its parent. This step is referred to as downward passing in FMM.

• It can be shown that the complexity for the above two steps is $\mathcal{O}(N_s)$ when the distribution of the source points is close to being uniform.

3.4. Evaluation

In this step, the velocities $\{u_i\}_{i=1}^{N_e}$ at the evaluation points are finally computed. Note that unlike in the upward and downward passing steps where we need to traverse the entire octant tree, evaluation is only performed at the leaf level. For

Table 3. The relative error as the regularization parameter $\varepsilon$ varies ($N_o$ = 320 000).

$\varepsilon$ and $\ell$   $E_u$
0.005                      9.12 × 10^−5
0.01                       2.62 × 10^−4
0.02                       7.33 × 10^−4
0.04                       1.76 × 10^−3
0.08                       2.84 × 10^−3


Table 4. The relative error and run time of the kernel-independent FMM associated with different grids for the equivalent coronas ($N_o$ = 320 000).

Cartesian grid   $N_q$   $T_{fmm}$ (s)   $E_u$
4 × 4 × 4        64      537             7.33 × 10^−4
5 × 5 × 5        124     1145            2.95 × 10^−4
6 × 6 × 6        208     2485            1.07 × 10^−4


each evaluation point $x_i$ in a leaf box $B$, we first calculate directly the part of $u_i$ induced by the source points in the near field $\mathcal{N}(B)$, i.e.,

$$\sum_{y_j \in \mathcal{N}(B)} \Phi(x_i, y_j)\, f_j; \tag{20}$$

and then we use (19) to estimate the part of $u_i$ induced by the source points in the far field $\mathcal{F}(B)$:

$$\sum_{m=1}^{N_q} \omega_m^{B,D}\, \Phi(x_i, q_m^{B,D})\, f^{B,D}(q_m^{B,D}), \tag{21}$$

where $\{f^{B,D}(q_m^{B,D})\}_{m=1}^{N_q}$ has been produced during the downward passing step. It is straightforward from their definitions that $\mathcal{N}(B) \cup \mathcal{F}(B)$ is the entire computational domain and $\mathcal{N}(B) \cap \mathcal{F}(B) = \emptyset$; therefore, $u_i$ is approximately the sum of (20) and (21).

Since there are at most $27s$ source points in $\mathcal{N}(B)$ by the construction of the octant tree, the cost of computing (20) is bounded above by a constant $\gamma_N$. The cost of evaluating (21) only depends on $N_q$ and is therefore a fixed constant $\gamma_F$. Since there are a total of $N_e$ evaluation points, the cost of the evaluation step is no more than

$$(\gamma_N + \gamma_F) \cdot N_e = \mathcal{O}(N_e).$$

Summing up this cost and the ones calculated in Appendix A, we conclude that the total cost of kernel-independent FMM is $\mathcal{O}(N_e + N_s)$.

4. Interactions of a large number of swimming particles

The “dumbbell” model was first introduced in Hernandez-Ortiz et al. (2005) as a minimal model for swimming microorganisms. In this model, each microorganism is represented by a pair of beads or particles (hence the name “dumbbell”) which mimic its cell body and flagella, respectively. Such an oversimplification drastically reduces the computational expense but is only appropriate for capturing far-field interactions. A more realistic model will be considered in Section 5. In the original dumbbell model, force balance between the flagellar force, drag forces, and a force due to rigidity of the rod connecting the beads was accounted for. The MRS has been applied to study the flow field generated by the collective swimming of microorganisms described by a simplified version of the dumbbell model (Ainley et al., 2008).

We will test the performance of the kernel-independent FMM at approximating (11) arising from the MRS applied to an example considered in Ainley et al. (2008). In this example, a large number of microorganisms are uniformly but randomly distributed in a 3D Stokes fluid; and the two particles representing each microorganism exert forces of unit norm and opposite directions along its length, pushing them away from each other. Let $N_o$ denote the number of microorganisms. Then there are $N_s = 2N_o$ source points in total. By Eq. (7), the kernel for this example is

$$\Phi(x, y) = \frac{1}{\mu}\left( H_1(r)\, I + H_2(r)\,(x - y)(x - y)^T \right), \tag{22}$$

where $r = \|x - y\|$, $\mu$ is the viscosity of the fluid, and $I$ is the $3 \times 3$ identity matrix. The scalar functions $H_1(r)$, $H_2(r)$ are as defined in Eq. (7) and their exact forms depend on the blob function used in the MRS. In our numerical experiments, the blob function given by Eq. (3) is used at every source point $y$, which gives

$$H_1(r) = \frac{2\varepsilon^2 + r^2}{8\pi (r^2 + \varepsilon^2)^{3/2}}, \qquad H_2(r) = \frac{1}{8\pi (r^2 + \varepsilon^2)^{3/2}}. \tag{23}$$
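As a minimal sketch of how the kernel in Eqs. (22) and (23) enters the direct summation (11), the following Python code evaluates the velocity induced by regularized point forces. The function names and the default viscosity `mu=1.0` are illustrative assumptions, not part of the MATLAB implementation used for the experiments.

```python
import math

def h1(r, eps):
    """H_1(r) from Eq. (23)."""
    return (2.0 * eps**2 + r**2) / (8.0 * math.pi * (r**2 + eps**2) ** 1.5)

def h2(r, eps):
    """H_2(r) from Eq. (23)."""
    return 1.0 / (8.0 * math.pi * (r**2 + eps**2) ** 1.5)

def stokeslet_velocity(x, y, f, eps, mu=1.0):
    """Phi(x, y) f from Eq. (22): velocity at x due to a regularized force f at y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    r = math.sqrt(sum(di * di for di in d))
    f_dot_d = sum(fi * di for fi, di in zip(f, d))
    return [(h1(r, eps) * fi + h2(r, eps) * f_dot_d * di) / mu
            for fi, di in zip(f, d)]

def direct_sum(targets, sources, forces, eps, mu=1.0):
    """O(N_e * N_s) evaluation of the sum over all source points."""
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for y, f in zip(sources, forces):
            du = stokeslet_velocity(x, y, f, eps, mu)
            u = [ui + dui for ui, dui in zip(u, du)]
        velocities.append(u)
    return velocities

# One "dumbbell": two particles a distance 0.02 apart, pushing away from each other.
sources = [[0.0, 0.0, -0.01], [0.0, 0.0, 0.01]]
forces = [[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]]
u = direct_sum([[1.0, 0.0, 0.0]], sources, forces, eps=0.02)[0]
```

For this pusher-type dumbbell, the velocity at a point beside it is directed towards the dumbbell and has no component along its axis, as expected by symmetry.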

Table 5. Run time of the kernel-independent FMM when multiple cores are used ($N_o$ = 320 000).

$N_c$   $T_{fmm}$ (s)   Optimal $T_{fmm}$ (s)   Strong scaling efficiency (%)
1       537             537                     100
2       307             268                     87
4       176             134                     76
8       100             67                      67


In order to explore the performance of the kernel-independent FMM applied to this problem, we perform a series of numerical experiments where we vary the number of microorganisms ($N_o$), the number of quadrature points ($N_q$) used to discretize the equivalent coronas, the regularization parameter $\varepsilon$ in the MRS as well as the number of CPU cores ($N_c$) used. The computational domain is a 10 × 10 × 10 box and stays fixed, and the trapezoidal rule over a uniform grid of quadrature points is used to discretize both integral equations (14) and (15). All the numerical experiments are run in MATLAB (version 2014a) on an Intel Xeon E5-2680 v2 CPU. The run time is calculated using the tic and toc command pair in MATLAB.

4.1. Computational complexity

As analyzed in Section 3, direct summation has computational complexity $\mathcal{O}(N_e N_s)$ whereas the kernel-independent FMM has complexity $\mathcal{O}(N_s + N_e)$. In the special case where $N_e = N_s$ (i.e., if we only compute the velocities for the particles and no extra fluid markers), the two complexities become $\mathcal{O}(N_s^2)$ and $\mathcal{O}(N_s)$, respectively. The aim of the following set of experiments is to verify both computational complexities by increasing $N_o$. The parameters used are summarized in Table 1.

As shown in Table 1, starting from the second $N_o$, each $N_o$ is 8 times as large as the previous one; and every time $N_o$ increases by a factor of 8, we also increase $L_{max}$ by 1 accordingly so that the number of particles $s$ in each leaf box stays roughly the same. We compute the velocities of all the particles using the kernel-independent FMM and record its run time $T_{fmm}$, which includes the time required for constructing the octant tree, sorting all the particles into the boxes, the upward and downward passing as well as the actual evaluation. Since the run time of direct summation increases drastically as $N_o$ grows, we only use it to calculate the velocities of 1000 randomly sampled particles, record the run time $T_{dir}^{sample}$ and estimate the run time of the full simulation as

$$T_{dir}^{est} = T_{dir}^{sample} \cdot \frac{N_s}{1000}. \tag{24}$$

In Fig. 5, we plot both $T_{fmm}$ and $T_{dir}^{est}$ corresponding to the increasing sequence of $N_o$ listed in Table 1. (Note that this is a log-log plot.) Two lines with respective slopes 1 and 2 are also displayed in the same figure for reference. Fig. 5 shows that the curve representing $T_{dir}^{est}$ is almost parallel to the line with slope 2, confirming that the complexity of direct summation is $\mathcal{O}(N_s^2)$. The curve corresponding to $T_{fmm}$, on the other hand, is close to being parallel to the line with slope 1, which verifies that the complexity of the kernel-independent FMM is $\mathcal{O}(N_s)$. The run time of both methods is also reported in Table 2. Based on the complexity analysis, we expect $T_{fmm}$ to grow by a factor of 8 and $T_{dir}^{est}$ by a factor of 64 as $N_o$ grows by a factor of 8, which can be seen in Table 2 in the asymptotic regime.
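The growth factors and the slopes of the two curves can be recomputed from the run times in Table 2. The short Python sketch below does this with a least-squares fit in log-log coordinates; the helper names are illustrative, and the fit is only one reasonable way to summarize the slopes.

```python
import math

# Data copied from Table 2.
n_s = [10_000, 80_000, 640_000, 5_120_000, 40_960_000]
t_fmm = [5, 53, 537, 3776, 28_727]
t_dir_est = [12, 494, 20_390, 1_490_300, 93_653_000]

def growth_ratios(values):
    """Ratio of each entry to the previous one."""
    return [b / a for a, b in zip(values, values[1:])]

def loglog_slope(x, y):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

fmm_ratios = growth_ratios(t_fmm)        # compare with the "T_fmm ratio" column
dir_ratios = growth_ratios(t_dir_est)    # compare with the "T_dir^est ratio" column
slope_fmm = loglog_slope(n_s, t_fmm)     # close to 1 for an O(N_s) method
slope_dir = loglog_slope(n_s, t_dir_est) # close to 2 for an O(N_s^2) method
```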

4.2. Accuracy

We also monitor the errors of the kernel-independent FMM in the sequence of experiments described above. To account for the accuracy of this method at approximating the velocities of both the microorganisms and the fluid, we compute the following relative error:

$$E_u = \frac{\sum_{n=1}^{2000} \| u_{i_n}^{fmm} - u_{i_n} \|_2}{\sum_{n=1}^{2000} \| u_{i_n} \|_2}, \tag{25}$$

where the subscripts $\{i_n\}_{n=1}^{2000}$ correspond to an ensemble of evaluation points consisting of the 1000 particles sampled in the previous set of experiments and 1000 random points in the fluid, and $\{u_{i_n}\}_{n=1}^{2000}$ are the exact velocities of these points obtained by direct summation. (The reason that we do not include all the particles in (25) is because direct summation is applied to the sampled particles only.) As shown in Table 2, the relative error (25) is between $\mathcal{O}(10^{-5})$ and $\mathcal{O}(10^{-3})$ in our experiments and grows with the density of the microorganisms.
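The error measure (25) is a ratio of summed Euclidean norms over the sampled evaluation points. A minimal Python sketch, with illustrative names and toy data rather than the actual sampled velocities, is:

```python
import math

def norm2(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def relative_error(u_fmm, u_exact):
    """Eq. (25): sum of ||u_fmm - u_exact||_2 divided by sum of ||u_exact||_2."""
    num = sum(norm2([a - b for a, b in zip(uf, ue)])
              for uf, ue in zip(u_fmm, u_exact))
    den = sum(norm2(ue) for ue in u_exact)
    return num / den

# Toy data: two evaluation points with small perturbations.
u_exact = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
u_fmm = [[1.001, 0.0, 0.0], [0.0, 2.0, 0.002]]
e_u = relative_error(u_fmm, u_exact)
```

Note that the norms are summed before the ratio is taken, so points with larger velocities carry more weight than they would in an average of pointwise relative errors.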

We conclude that the kernel-independent FMM is able to achieve significant savings in computational cost if the number of microorganisms is large enough and if the requirement on accuracy is not very stringent. We note that as the number of microorganisms is increased, the computational domain is fixed; different error may be observed for inhomogeneous source distributions in space. Since the relative error will depend on a number of different parameters, in the following numerical experiments, we fix $N_o$ = 320 000 (and thus $N_s$ = 640 000) and explore the impacts of other parameters on the performance of the kernel-independent FMM.

4.3. Regularization parameter

We vary the regularization parameter $\varepsilon$ in (22) and always make the length $\ell$ of each microorganism equal $\varepsilon$. The relative error (25) associated with each $\varepsilon$ is reported in Table 3, which illustrates that the kernel-independent FMM becomes more accurate as $\varepsilon$ decreases (that is, when the shape of the blob function $\psi_\varepsilon(r)$ gets “taller”).


Table 6. A summary of the parameters used in the Kirchhoff rod example.

Parameter       Description                                                              Value
                The computational domain                                                 [−8, 8] × [−8, 8] × [0, 16]
$L_{max}$       Maximum level of refinement of the domain in the FMM                     3
$N_q$           Number of quadrature points (a 5 × 5 × 5 uniform Cartesian grid is used) 124
$N_r$           Number of rods                                                           225
$M$             Number of segments on each rod                                           150
$N_s$           Number of grid points on the rods                                        225 × 151 = 33 975
$L$             Length of each rod                                                       9 μm
$\Delta s$      Length of each segment on the rods                                       $L/M$ = 0.06 μm
$\Delta t$      Size of a time step                                                      10^−6 s
$N_t$           Number of time steps                                                     20 000
$\varepsilon$   The regularization parameter in the MRS                                  $5\Delta s$, $6\Delta s$, $7\Delta s$


4.4. Number of quadrature points

We also examine the performance of kernel-independent FMM as the Cartesian grids used to discretize the equivalent coronas are refined. More specifically, we continue to use the trapezoidal rule and consider three uniform meshes: 4 × 4 × 4, 5 × 5 × 5 and 6 × 6 × 6, which consist of 64, 124 and 208 quadrature points, respectively. The corresponding dimensions of the linear systems (16) and (17) are 192, 372 and 624. As shown in Table 4, the kernel-independent FMM becomes more costly when a finer mesh is used because the dimension of the linear systems that need to be solved becomes larger; in the meantime, it gets more accurate since the two integrals in (14) and (15) are better approximated by the quadrature rule. Note that even when the finest grid is used, the kernel-independent FMM is still significantly more efficient than direct summation for this particular choice of $N_o$.

4.5. Number of CPU cores

So far all the numerical results are obtained on a single CPU core. We also parallelize the kernel-independent FMM code using the Parallel Computing Toolbox in MATLAB. In the upward or downward passing step, we need to traverse the octant tree in a particular order; however, the loop over all the boxes on the same level is parallelizable. In the evaluation step, the loop over all the leaf boxes can also be parallelized. We use the parfor command to parallelize these loops. The numbers of core(s) that we use are 1, 2, 4, and 8, and the run time of each case is shown in Table 5. We also compute the strong scaling efficiency for each case, which is defined to be the following percentage:

Fig. 6. The projections of the 225 straight rods onto the xy-plane at t = 0 s. (Axes: x and y, each ranging from −8 to 8.)


Fig. 7. Left: 225 straight rods placed on the plane z = 0.2. Right: the same set of free swimming rods at 0.02 s ($\varepsilon = 5\Delta s$).


$$\text{strong scaling efficiency} = \frac{\text{run time when 1 core is used}}{N_c \cdot \text{run time when } N_c \text{ core(s) are used}} \cdot 100\%.$$

Table 5 demonstrates that the efficiency decreases by around 10% every time we double the number of cores. The main obstacle to increasing the efficiency of the parfor command lies in the way memory is accessed by the cores: instead of accessing the data stored in a shared memory via pointers, every core has to make a local copy of the data that it will be using. This increases the overhead of parallel computing.
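The efficiencies in Table 5 follow directly from the definition above. As a small check, the Python sketch below recomputes them from the reported run times; the function name is illustrative.

```python
def strong_scaling_efficiency(t_one_core, t_n_cores, n_cores):
    """Run time on 1 core divided by (N_c times the run time on N_c cores), in percent."""
    return t_one_core / (n_cores * t_n_cores) * 100.0

# Run times (in seconds) copied from Table 5.
cores = [1, 2, 4, 8]
t_fmm = [537, 307, 176, 100]

efficiencies = [strong_scaling_efficiency(t_fmm[0], t, n)
                for n, t in zip(cores, t_fmm)]
rounded = [round(e) for e in efficiencies]
```

Rounded to the nearest integer, these values reproduce the last column of Table 5.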

5. Interactions of multiple free swimming flagella

In this example, we simulate the collective swimming of a large number of elastic rods immersed in a 3D Stokes fluid. The elastic rods are modeled using an unconstrained version of the Kirchhoff rod theory (Dill, 1992), and they translate, bend and twist in the fluid. We continue to investigate the effectiveness of the kernel-independent FMM applied to the MRS.

Compared to the previous example, this is a more challenging problem in the following ways. In the “dumbbell” example, translation is the only type of motion present since the flagella of each microorganism are simply represented by a point. Here, the flagella are modeled as elastic rods that will bend and twist. Thus, besides forces and linear velocities, torques and angular velocities also need to be considered. This results in a more complicated kernel. Furthermore, unlike in the previous example where we only compute the linear velocities at one instant, in this example, we simulate the motion of the rods over a period of time, which in turn entails solving a sequence of N-body problems. This not only significantly increases the run time regardless of whether direct summation or the FMM is used but also imposes a more stringent requirement on the accuracy of the latter since the numerical errors will propagate.

5.1. Description of the method

In our simulation, each rod is represented by its centerline which is a 3D parametric curve of finite length and is initially straight. The points on the rod are Lagrangian variables parametrized by the arclength from the base point of the rod. To capture the twisting of the rod, we associate each point on the rod with an orthonormal triad of three vectors: one is tangent to the rod whereas the other two lie in the cross section at that point. In the previous example, each source point exerts on the surrounding fluid a force whose magnitude is 1 and whose direction is along the length of the microorganism that it belongs to. In this example, the forces and torques exerted by the rods are computed as follows. We assume that each rod has an intrinsic curvature that it tries to achieve by constantly adjusting its internal force and torque. In order to compute them, we discretize each rod into a number of segments and treat them as small springs connected by the grid points. At every time step, the elastic energy stored in each segment due to the difference between its current and preferred shapes can be computed, from which we can then derive the internal force and torque of that segment. The force and torque exerted by the fluid on the rods at each grid point can be derived by balancing the linear and angular momenta at that point. Finally, the same force and torque in the opposite directions are applied by the grid point to the fluid. This method has been used in previous work (Lim, 2010; Lim et al., 2008; Olson et al., 2013) where a detailed description of it can be found.


Fig. 8. Relative errors of the kernel-independent FMM applied to the Kirchhoff rod example ($\varepsilon = 5\Delta s$, $6\Delta s$ or $7\Delta s$).


Fig. 9. Left: 225 straight rods placed on the paraboloid. Right: the same set of free swimming rods at 0.02 s ($\varepsilon = 5\Delta s$).


Let there be $M + 1$ uniformly spaced grid points on each rod dividing it into $M$ equal segments. Let $L$ be the length of the rods, then $\Delta s = L/M$ gives the length of each segment. Let $\{y_j^n\}_{j=1}^{N_s}$ denote the locations of the grid points at the nth time step. They are arranged in such a way that $\{y_j^n\}_{j=(k-1)(M+1)+1}^{k(M+1)}$ correspond to the grid points on the kth rod in increasing order of the arclength from the base of the rod. If there are $N_r$ rods, then $N_s = N_r(M + 1)$. Let $\{b_k^n\}_{k=1}^{N_r}$ denote the base points of the rods at the nth time step, i.e., $b_k^n = y_{(k-1)(M+1)+1}^n$. Then the initial locations of the grid points $\{y_j^0\}_{j=(k-1)(M+1)+1}^{k(M+1)}$ on the kth rod can be written as

$$\left\{ b_k^0 + q \cdot \Delta s \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}_{q=0}^{M}.$$
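The indexing convention above can be illustrated with a short Python sketch. It assumes that the rods are initially vertical, i.e., that the grid points of each rod are stacked in the z direction above its base point; the function names and the toy base points are hypothetical.

```python
def initial_rod_points(base_points, length, n_segments):
    """Stack M + 1 grid points above each base point (rods assumed vertical)."""
    ds = length / n_segments  # segment length, Delta s = L / M
    points = []
    for bx, by, bz in base_points:
        for q in range(n_segments + 1):
            points.append((bx, by, bz + q * ds))
    return points

def rod_slice(k, n_segments):
    """0-based indices of the grid points on the k-th rod (k is 1-based as in the text)."""
    start = (k - 1) * (n_segments + 1)
    return range(start, start + n_segments + 1)

# 225 hypothetical base points on the plane z = 0.2, rods of length 9 with M = 150.
bases = [(float(i), float(j), 0.2) for i in range(15) for j in range(15)]
points = initial_rod_points(bases, length=9.0, n_segments=150)
second_rod = rod_slice(2, 150)
```

With $N_r = 225$ and $M = 150$, the list has $225 \times 151 = 33\,975$ entries, which matches the value of $N_s$ in Table 6.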

Let $\{f_j^n\}_{j=1}^{N_s}$ and $\{n_j^n\}_{j=1}^{N_s}$ be the forces and torques exerted by the grid points to the fluid, the computation of which has been described above. By Eqs. (9) and (10), at the nth time step, the linear velocities $\{u_i^n\}_{i=1}^{N_e}$ and angular velocities $\{w_i^n\}_{i=1}^{N_e}$ at the evaluation points $\{x_i^n\}_{i=1}^{N_e}$ given by the MRS are

$$\begin{bmatrix} u_i^n \\ w_i^n \end{bmatrix} = \sum_{j=1}^{N_s} \Psi(x_i^n, y_j^n) \begin{bmatrix} f_j^n \\ n_j^n \end{bmatrix}, \quad i = 1, 2, \ldots, N_e, \tag{26}$$

where the kernel

$$\Psi(x, y) = \frac{1}{\mu} \begin{bmatrix} H_1(r)\, I + H_2(r)\,(x - y)(x - y)^T & \frac{1}{2} Q(r)\, [x - y]_\times^T \\ \frac{1}{2} Q(r)\, [x - y]_\times^T & \frac{1}{4} D_1(r)\, I + \frac{1}{4} D_2(r)\,(x - y)(x - y)^T \end{bmatrix}. \tag{27}$$

The scalar functions $Q(r)$, $D_1(r)$, $D_2(r)$ in (27) are defined in Section 2 and depend on the blob function used; and for any vector $v = [v_1, v_2, v_3]^T \in \mathbb{R}^3$, the matrix $[v]_\times$ denotes the $3 \times 3$ skew-symmetric matrix

$$\begin{bmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{bmatrix},$$

that is, the cross product of $v$ and any vector $r \in \mathbb{R}^3$ is $v \times r = [v]_\times \cdot r$ whereas that of $r$ and $v$ is $r \times v = [v]_\times^T \cdot r$. Computing (26) is equivalent to computing the matrix–vector product


Fig. 10. Projection of the rods onto the xy-plane at t = 0.02 s.


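$$\begin{bmatrix} u_1^n \\ w_1^n \\ \vdots \\ u_i^n \\ w_i^n \\ \vdots \\ u_{N_e}^n \\ w_{N_e}^n \end{bmatrix} = \begin{bmatrix} \Psi(x_1^n, y_1^n) & \cdots & \Psi(x_1^n, y_j^n) & \cdots & \Psi(x_1^n, y_{N_s}^n) \\ \vdots & & \vdots & & \vdots \\ \Psi(x_i^n, y_1^n) & \cdots & \Psi(x_i^n, y_j^n) & \cdots & \Psi(x_i^n, y_{N_s}^n) \\ \vdots & & \vdots & & \vdots \\ \Psi(x_{N_e}^n, y_1^n) & \cdots & \Psi(x_{N_e}^n, y_j^n) & \cdots & \Psi(x_{N_e}^n, y_{N_s}^n) \end{bmatrix} \begin{bmatrix} f_1^n \\ n_1^n \\ \vdots \\ f_j^n \\ n_j^n \\ \vdots \\ f_{N_s}^n \\ n_{N_s}^n \end{bmatrix}, \tag{28}$$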


Fig. 11. The z coordinates of the mid-point of the 15 rods on the center row at t = 0 s, 0.005 s, 0.010 s, 0.015 s and 0.02 s.


where the block matrix in (28) is $6N_e \times 6N_s$. In this section, we continue to use the blob function (3), which gives (23) and (29):

$$Q(r) = \frac{5\varepsilon^2 + 2r^2}{8\pi (r^2 + \varepsilon^2)^{5/2}}, \qquad D_1(r) = \frac{10\varepsilon^4 - 7\varepsilon^2 r^2 - 2r^4}{8\pi (r^2 + \varepsilon^2)^{7/2}}, \qquad D_2(r) = \frac{21\varepsilon^2 + 6r^2}{8\pi (r^2 + \varepsilon^2)^{7/2}}. \tag{29}$$
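To make the action of the kernel in (26) and (27) concrete, the Python sketch below applies it to a single force and torque pair using the scalar functions in (23) and (29). It is written with explicit cross products, under the assumption that the off-diagonal blocks of (27) act as $n \times (x - y)$ and $f \times (x - y)$; the function names and the default `mu=1.0` are illustrative.

```python
import math

def blob_functions(r, eps):
    """H_1, H_2 from Eq. (23) and Q, D_1, D_2 from Eq. (29)."""
    s = r * r + eps * eps
    h1 = (2 * eps**2 + r**2) / (8 * math.pi * s**1.5)
    h2 = 1 / (8 * math.pi * s**1.5)
    q = (5 * eps**2 + 2 * r**2) / (8 * math.pi * s**2.5)
    d1 = (10 * eps**4 - 7 * eps**2 * r**2 - 2 * r**4) / (8 * math.pi * s**3.5)
    d2 = (21 * eps**2 + 6 * r**2) / (8 * math.pi * s**3.5)
    return h1, h2, q, d1, d2

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def apply_kernel(x, y, f, n, eps, mu=1.0):
    """Linear velocity u and angular velocity w at x due to force f and torque n at y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    h1, h2, q, d1, d2 = blob_functions(math.sqrt(dot(d, d)), eps)
    n_x_d, f_x_d = cross(n, d), cross(f, d)
    fd, nd = dot(f, d), dot(n, d)
    u = [(h1 * f[i] + h2 * fd * d[i] + 0.5 * q * n_x_d[i]) / mu for i in range(3)]
    w = [(0.5 * q * f_x_d[i] + 0.25 * d1 * n[i] + 0.25 * d2 * nd * d[i]) / mu
         for i in range(3)]
    return u, w

# A torque about the z axis at the origin, evaluated at a point on the x axis.
u, w = apply_kernel([1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                    f=[0.0, 0.0, 0.0], n=[0.0, 0.0, 1.0], eps=1e-3)
```

For a small regularization parameter, the result approaches the flow of a singular rotlet: a torque about the z axis drives a counterclockwise swirl, so the velocity at a point on the positive x axis points in the positive y direction.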

We will focus on the dynamics of the rods and thus choose $\{x_i^n\}_{i=1}^{N_e} = \{y_j^n\}_{j=1}^{N_s}$, though extra evaluation points can easily be included as in the previous example. At the nth time step, let $\{(D_i^1)^n, (D_i^2)^n, (D_i^3)^n\}$ denote the orthonormal triad associated with $x_i^n$, where $(D_i^3)^n$ is tangent to the rod at $x_i^n$ and $(D_i^1)^n$, $(D_i^2)^n$ are parallel to the cross section at $x_i^n$. Let $\Delta t$ be the time step. The method proposed in Olson et al. (2013) for tracking the movement of Kirchhoff rods with an intrinsic curvature and twist immersed in a 3D Stokes fluid is outlined in Algorithm 1.

Algorithm 1. The forward Euler time stepping for the Kirchhoff rod model.

For $n = 0, 1, 2, \ldots, N_t - 1$:

1. Compute using finite difference the forces $\{f_i^n\}_{i=1}^{N_s}$ and the torques $\{n_i^n\}_{i=1}^{N_s}$ based on the difference between the current shapes of the rods given by $\{x_i^n\}_{i=1}^{N_s}$ and $\{(D_i^k)^n\}_{i=1}^{N_s}$ ($k = 1, 2, 3$) and their desired shape at time step $n + 1$ specified by an intrinsic curvature and twist.

2. Compute the linear velocities $\{u_i^n\}_{i=1}^{N_s}$ and the angular velocities $\{w_i^n\}_{i=1}^{N_s}$ from (26), or equivalently, (28).

3. Compute the locations of the grid points and the orientation of the triads as follows: for $i = 1, 2, \ldots, N_s$,

$$x_i^{n+1} \leftarrow x_i^n + u_i^n \cdot \Delta t, \qquad (D_i^k)^{n+1} \leftarrow R\left( \frac{w_i^n}{\|w_i^n\|},\ \|w_i^n\| \cdot \Delta t \right) \cdot (D_i^k)^n, \tag{30}$$

where $R(e, \theta) = \cos(\theta)\, I + (1 - \cos\theta)\, e e^T + \sin(\theta)\, [e]_\times$ is a $3 \times 3$ orthogonal matrix which, when applied to any vector $v \in \mathbb{R}^3$, rotates it about the unit vector $e \in \mathbb{R}^3$ through an angle of $\theta$.
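Step 3 of Algorithm 1 for a single grid point can be sketched in Python as follows. The function names are illustrative, and the early return for a vanishing angular velocity is an added safeguard against division by zero that is not discussed in the text.

```python
import math

def rotation_matrix(e, theta):
    """R(e, theta) = cos(theta) I + (1 - cos(theta)) e e^T + sin(theta) [e]_x."""
    c, s = math.cos(theta), math.sin(theta)
    skew = [[0.0, -e[2], e[1]],
            [e[2], 0.0, -e[0]],
            [-e[1], e[0], 0.0]]
    return [[c * (i == j) + (1.0 - c) * e[i] * e[j] + s * skew[i][j]
             for j in range(3)] for i in range(3)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def euler_step(x, triad, u, w, dt):
    """Advance one grid point and its triad by one forward Euler step, as in (30)."""
    x_new = [xi + ui * dt for xi, ui in zip(x, u)]
    speed = math.sqrt(sum(wi * wi for wi in w))
    if speed == 0.0:
        # No rotation: the triad is unchanged (added safeguard, see lead-in).
        return x_new, [list(d) for d in triad]
    axis = [wi / speed for wi in w]
    rot = rotation_matrix(axis, speed * dt)
    return x_new, [mat_vec(rot, d) for d in triad]

# Rotate the standard basis by a quarter turn about the z axis while translating.
x_new, triad_new = euler_step([0.0, 0.0, 0.0],
                              [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                              u=[1.0, 2.0, 3.0], w=[0.0, 0.0, math.pi / 2], dt=1.0)
```

Because $R$ is orthogonal, the updated triad remains orthonormal up to rounding errors, which is why the rotation is applied instead of adding $\Delta t\,(w \times D)$ to each triad vector.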


Fig. 12. Mean efficiencies of the rods from t = 0 s to 0.02 s (the first four beats).


The computational complexity of both steps 1 and 3 of Algorithm 1 is $\mathcal{O}(N_s)$ and that of step 2 is $\mathcal{O}(N_s^2)$ if (26) is computed directly. We want to increase the efficiency of this algorithm by reducing the complexity of step 2 to $\mathcal{O}(N_s)$ using the kernel-independent FMM. The changes that need to be made to the FMM algorithm are only technical when it is applied to (26) instead of (11). The main difference is that besides the two fictitious force fields, we also need to impose two fictitious torque fields on every box. The two integral Eqs. (14) and (15) become

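$$\forall x \in \mathcal{F}(B): \quad \int_{\text{upward corona of } B} \Psi(x, y) \begin{bmatrix} f^{B,U}(y) \\ n^{B,U}(y) \end{bmatrix} dy = \sum_{y_j \in B} \Psi(x, y_j) \begin{bmatrix} f_j \\ n_j \end{bmatrix}, \tag{31}$$

and

$$\forall x \in B: \quad \int_{\text{downward corona of } B} \Psi(x, y) \begin{bmatrix} f^{B,D}(y) \\ n^{B,D}(y) \end{bmatrix} dy = \sum_{y_j \in \mathcal{F}(B)} \Psi(x, y_j) \begin{bmatrix} f_j \\ n_j \end{bmatrix}, \tag{32}$$

which lead to linear systems

$$\sum_{m=1}^{N_q} \omega_m^{B,U}\, \Psi(q_k^{B,D}, q_m^{B,U}) \begin{bmatrix} f^{B,U}(q_m^{B,U}) \\ n^{B,U}(q_m^{B,U}) \end{bmatrix} = \sum_{y_j \in B} \Psi(q_k^{B,D}, y_j) \begin{bmatrix} f_j \\ n_j \end{bmatrix}, \quad k = 1, 2, \ldots, N_q, \tag{33}$$

and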


$$\sum_{m=1}^{N_q} \omega_m^{B,D}\, \Psi(q_k^{B,U}, q_m^{B,D}) \begin{bmatrix} f^{B,D}(q_m^{B,D}) \\ n^{B,D}(q_m^{B,D}) \end{bmatrix} = \sum_{y_j \in \mathcal{F}(B)} \Psi(q_k^{B,U}, y_j) \begin{bmatrix} f_j \\ n_j \end{bmatrix}, \quad k = 1, 2, \ldots, N_q, \tag{34}$$

of dimension $6N_q \times 6N_q$ instead of $3N_q \times 3N_q$ as in the “dumbbell” example. Eqs. (18)–(21) and (40)–(43) need to be adjusted in a similar fashion to account for the new kernel. Note that all the fictitious force and torque fields will vary with time since the locations of the source points as well as the forces and torques that they apply to the fluid are time-dependent. In Eqs. (31)–(34), we deliberately drop the superscript $n$ that specifies the time step to simplify the notation.

5.2. Set-up of the numerical experiments

A summary of the values of the parameters used in our simulation can be found in Table 6. The computational domain is chosen to be large enough so that none of the grid points on the rods will move outside of it throughout all the simulations. In the kernel-independent FMM, three levels of partition are used for the domain and the trapezoidal rule with a 5 × 5 × 5 uniform grid is used to discretize each equivalent corona, which results in $N_q = 124$ grid points and linear systems (33) and (34) of dimension 744 × 744. The computational domain, its hierarchical decomposition as well as the discretization of each corona will remain fixed for the duration of each simulation, though they can easily be adapted at each time step to account for the movement of the rods. Initially, we place 225 straight rods of length 9 μm in the computational domain. Their projections onto the xy-plane form a uniform 15 × 15 grid and are shown in Fig. 6. In our numerical experiments, we will vary the z coordinates of the base points $\{b_k^0\}_{k=1}^{N_r}$ of the rods to investigate how the swimming pattern of the rods will change with their initial geometric configuration.

The number of source (or evaluation) points in this example is 33 975, which is on the small end of the range of problems considered in the previous section. This choice is made based on practical considerations. The current simulation will be over a period of time which entails computing (26) or (28) a large number of times. One of our goals is to investigate how the error of the FMM propagates when applied to a time-dependent problem. Therefore, we need to solve this problem using both the FMM and direct summation and compare their results at each time step. In the previous example, we only apply direct summation to compute the linear velocities for a sample of particles to save time. Here, since we need to update the locations and triads of all the source points before moving on to the next time step, direct summation has to be applied to the full problem. The number of rods is chosen so that the simulation with direct summation for (26) can be performed in a reasonable amount of time.

Recall that what drives the rods to move is their desire to achieve an intrinsic shape. In our simulations, the desired shape is chosen to be a traveling spiral wave determined by a time-dependent strain-twist vector $\Omega$:

$$\Omega(s, t) = \{\Omega_1, \Omega_2, \Omega_3\} = \{\kappa \cos(\tau s - \sigma t),\ \kappa \sin(\tau s - \sigma t),\ 0\}, \tag{35}$$

where $s$ is the arclength between any point on a rod and the base point of the same rod, $t$ is the time, $\kappa$, $\tau$, $\sigma$ are the curvature, torsion and frequency of the traveling spiral wave, respectively. This choice for the desired shape of the rods is motivated by the observation that sperm flagella propagate planar or helical waves in experiments (Woolley, 2003; Woolley and Vernon, 2001). A variety of planar and helical waves including the spiral wave given by (35) have been considered in Olson (2014) to study the motion of elastic rods in viscous fluids. The negative sign in (35) indicates that the spiral wave will propagate downward causing the rod to move upward in the fluid when no other rods are present. The curvature $\kappa$ and torsion $\tau$ can be computed as

$$\kappa = \frac{b}{b^2 + \left( \frac{\lambda}{2\pi} \right)^2} \quad \text{and} \quad \tau = \frac{\frac{\lambda}{2\pi}}{b^2 + \left( \frac{\lambda}{2\pi} \right)^2},$$

where $b$ is the amplitude and $\lambda$ is the wavelength of the spiral wave. We choose $\sigma = 1257\ \mathrm{s}^{-1}$, $\lambda = 2.25$ μm and $b = 0.09$ μm, which implies that the period of the wave is $2\pi/\sigma = 0.005$ s (or equivalently, $5000\Delta t$) and there are $L/\lambda = 9/2.25 = 4$ wavelengths along the length of each rod.
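The wave parameters quoted above can be checked with a few lines of Python. The sketch uses the standard curvature and torsion of a helix of radius $b$ and pitch $\lambda$; the function and variable names are illustrative.

```python
import math

def helix_curvature_torsion(b, wavelength):
    """Curvature and torsion of a helix with amplitude b and the given wavelength."""
    c = wavelength / (2.0 * math.pi)
    denom = b * b + c * c
    return b / denom, c / denom

sigma = 1257.0     # frequency, in 1/s
wavelength = 2.25  # in micrometers
b = 0.09           # amplitude, in micrometers
rod_length = 9.0   # in micrometers

kappa, tau = helix_curvature_torsion(b, wavelength)
period = 2.0 * math.pi / sigma             # approximately 0.005 s
n_wavelengths = rod_length / wavelength    # number of wavelengths along each rod
```

The period evaluates to slightly less than 0.005 s, so the stated value of $5000\Delta t$ with $\Delta t = 10^{-6}$ s is a rounded figure.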

Since we are using the unconstrained version of the Kirchhoff rod model, inextensibility of the rods is not guaranteed but can be satisfied approximately if a small enough time step is used for a given set of material parameters of the rod. Note that this is a result of the way we model the rods regardless of whether direct summation or FMM is used for computing (26). We choose $\Delta t = 10^{-6}$ s to ensure that the length of all the 225 rods will not grow or shrink by more than 1% of their initial length (9 μm) at any time step throughout the simulations.

5.3. Propagation of the errors in time

In the first set of experiments, we place 225 straight rods on the plane $z = 0.2$, i.e., $(b_k^0)_3 = 0.2$ for $k = 1, 2, \ldots, N_r$ (see Fig. 7(a)). The x and y coordinates of $\{b_k^0\}_{k=1}^{N_r}$ are shown in Fig. 6. We simulate the motion of these rods from 0 s to 0.02 s, i.e., for four beats of the spiral wave specified by (35). Fig. 7(b) is a snapshot of the same 225 rods at 0.02 s when $\varepsilon = 5\Delta s$ is used as


Fig. 13. Comparison of the kernel-independent FMM errors when forward Euler ($\Delta t = 10^{-6}$ s) is used and those when RK2 ($\Delta t = 3 \times 10^{-6}$ s) is used ($\varepsilon = 7\Delta s$).


the regularization parameter in (3). Since the size of each time step is $\Delta t = 10^{-6}$ s, 20 000 time steps are needed which then lead to 20 000 evaluations of (26) or (28).

To investigate the accuracy of the FMM and how it progresses with time, we introduce the following notation for the relative error in a quantity $v$ at the nth time step:

$$E_v^n = \frac{\sum_{i=1}^{N_s} \| (v_i^{fmm})^n - v_i^n \|_2}{\sum_{i=1}^{N_s} \| v_i^n \|_2}, \tag{36}$$

where $\{v_i^n\}_{i=1}^{N_s}$ are the exact values calculated using direct summation. In Fig. 8, we display the sequences of $E_u^n$, $E_w^n$, $E_x^n$, $E_{D^1}^n$, $E_{D^2}^n$ and $E_{D^3}^n$ corresponding to three different values of the regularization parameter $\varepsilon$: $5\Delta s$, $6\Delta s$ and $7\Delta s$. As can be seen from this figure, the errors in all the quantities grow fairly slowly for the duration of the simulation, especially when $\varepsilon = 5\Delta s$ or $7\Delta s$; and all the errors are between $\mathcal{O}(10^{-6})$ and $\mathcal{O}(10^{-4})$ at the end of the simulation. In addition, each time step of our simulation takes approximately 175 s to run on a single core when FMM is used for (26), which is about 50% of the run time when direct summation is used. As shown in the previous example, the advantage of the FMM will be more pronounced when more structures are included. This set of experiments demonstrates the potential of the kernel-independent FMM in simulating the long-term behavior of a large collection of microorganisms modeled by the MRS.

We note that improvements can be made to Algorithm 1 so that a larger time step may be taken while the length of the rods is still maintained. In particular, we can use a more accurate model for the forces and torques and/or a higher order time stepping scheme such as one of the Runge–Kutta methods instead of forward Euler. In Appendix B, the behavior of the kernel-independent FMM applied to the same example is explored with the second-order Runge–Kutta method (RK2) as the temporal discretization scheme. We observe that using an RK method allows for a much larger time step which in turn leads to considerable savings in simulation time, and moreover, it does not degrade the accuracy of the kernel-independent FMM; however, the extent to which we can relax the time step is limited by the accuracy of the force/torque model.

5.4. Swimming pattern and efficiency as the geometric configuration varies

In the following experiments, we examine how the swimming pattern and efficiency of the rods vary with their initial placement. The regularization parameter $\varepsilon$ is fixed to be $5\Delta s$. We consider the following four initial placements of the 225 straight rods: on the plane $z = 0.2$ as in the previous experiments, and on the following paraboloid:

$$p(x, y) = -\frac{h}{2 \cdot 7.5^2}\,(x^2 + y^2) + h, \tag{37}$$

where $h = 0.5$, 1 or 2. The family of paraboloids (37) satisfies the following properties: $p(x, y) = 0$ at the points $(-7.5, -7.5)$, $(-7.5, 7.5)$, $(7.5, -7.5)$ and $(7.5, 7.5)$, and $p(0, 0) = h$. The projections of the rods are again given by Fig. 6. In Fig. 9(a), we plot 225 straight rods placed on the paraboloid (37) with $h = 2$, and Fig. 9(b) is a snapshot of the same set of rods at $t = 0.02$ s.
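The two stated properties of the paraboloid (37) can be checked numerically. The sketch below is only illustrative: the function name `paraboloid_height` and the parameter name `half_width` (set to 7.5, the value in (37)) are assumptions introduced here.

```python
def paraboloid_height(x, y, h, half_width=7.5):
    """Height p(x, y) of the paraboloid (37) with apex height h."""
    return -h / (2.0 * half_width**2) * (x**2 + y**2) + h


corners = [(-7.5, -7.5), (-7.5, 7.5), (7.5, -7.5), (7.5, 7.5)]
for h in (0.5, 1.0, 2.0):
    # p vanishes (up to rounding) at the four corners and equals h at the origin.
    corner_values = [paraboloid_height(x, y, h) for x, y in corners]
    print(h, corner_values, paraboloid_height(0.0, 0.0, h))
```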

When there is only one rod propagating a spiral wave (35), it will keep moving upward and mostly maintain its upright position. However, when there is a pack of them interacting with each other through the fluid while trying to achieve the same desired curvature, very different behaviors can be observed depending on their initial placement and locations relative to each other. Since each rod propagates a spiral wave, the more upright it is, the more circular its projection onto the xy-plane will be. Therefore, we use this projection as an indicator of how upright a rod is swimming. Each subplot of Fig. 10 illustrates the projection of all 225 rods onto the xy-plane at $t = 0.02$ s under one of the four initial placements. As can be seen from this figure, in all four cases, the closer a rod is placed towards the center initially, the more likely it is to stay upright during the simulation. In particular, the rod at the center stays upright regardless of the initial placement. Moreover, the more separated the rods are in their initial elevation, the more tilted they will become while swimming in the fluid.

To visualize the vertical movement of the rods as time progresses, we choose the 15 rods in the center row to be representatives and plot in Fig. 11 the z coordinates of their midpoints from $t = 0$ s to $t = 0.02$ s. Each curve in this figure connects the midpoints of these 15 rods at the same time step. As shown in Fig. 11(a), when the rods are placed on the plane $z = 0.2$ initially, they will keep moving upward throughout the simulation, with the ones on the two ends moving slightly faster than the ones in the middle. On the other hand, when the rods are initially placed on a paraboloid, they will try to become aligned by swimming upward, staying where they are or even swimming backward depending on their positions relative to their neighbors, as depicted in Fig. 11(b), (c), and (d). In the case where $h = 0.5$, for example, the midpoints of the rods are almost at the same height already when $t = 0.02$ s.

We also monitor the hydrodynamic efficiency of vertical swimming. For the $k$th rod at the $n$th time step, it is defined as

$$\eta_k^n = \frac{\left( \dfrac{1}{M+1} \displaystyle\sum_{i=(k-1)(M+1)+1}^{k(M+1)} (u_i^n)_3 \right)^{2}}{\displaystyle\sum_{i=(k-1)(M+1)+1}^{k(M+1)} f_i^n \cdot u_i^n}, \tag{38}$$

where the numerator is the squared mean vertical speed of all the grid points and the denominator is the total power generated by them (Lighthill, 1975). In other words, Eq. (38) measures how efficiently the swimmers gain kinetic energy in the vertical direction. The average efficiency of the $k$th rod from time steps $n = 0$ to $N_t - 1$ can then be defined as

$$\bar{\eta}_k = \frac{1}{N_t} \sum_{n=0}^{N_t - 1} \eta_k^n. \tag{39}$$

Fig. 12 displays the average efficiencies of all 225 rods under the four initial placements. The gray scale of each small square represents the average efficiency of the rod initially placed at its center: the brighter the square is, the larger the average efficiency is. In all four cases, the maximum average efficiency occurs on the four edges and this maximum increases as the paraboloid becomes taller. When $h = 0.5$ or $h = 1$, as can be seen from Fig. 12(b) and (c), the rods in the middle are the least efficient. This is because they move very slowly or even just stay where they are vertically, as shown in Fig. 11(b) and (c). When $h = 2$, however, these rods become more efficient because they move downward quickly to stay aligned with the other rods, which has been illustrated in Fig. 11(d). The ring of minimum efficiency in Fig. 12(d) is also expected from the stagnation of the rods at those locations observed in Fig. 11(d).
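The efficiency measures (38) and (39) can be sketched for a single rod as follows. This is an illustrative example only: the function names and the array layout (one row per grid point of the rod, with the vertical component in the third column) are assumptions made here, and the numbers are hypothetical.

```python
import numpy as np


def rod_efficiency(u, f):
    """Efficiency (38) of one rod at one time step.

    u, f: arrays of shape (M + 1, 3) holding the velocities and forces
    at the grid points of the rod.
    """
    mean_vertical_speed = u[:, 2].mean()   # (1 / (M + 1)) * sum of (u_i)_3
    power = np.sum(f * u)                  # sum of f_i . u_i
    return mean_vertical_speed**2 / power


def average_efficiency(u_history, f_history):
    """Average efficiency (39) of one rod over N_t time steps.

    u_history, f_history: arrays of shape (N_t, M + 1, 3).
    """
    values = [rod_efficiency(u, f) for u, f in zip(u_history, f_history)]
    return float(np.mean(values))


# Hypothetical rod with two grid points over two time steps.
u_hist = np.array([[[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]],
                   [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
f_hist = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
                   [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]])
print(average_efficiency(u_hist, f_hist))
```

A rod that stagnates vertically has a small numerator in (38) while still expending power, which is consistent with the low efficiencies reported for the rods that barely move in Fig. 11.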

6. Conclusion

The method of regularized Stokeslets (MRS) gives rise to an N-body problem whose computational cost is $\mathcal{O}(N^2)$. This becomes a bottleneck of MRS in the simulation of active matter where N is typically very large. We demonstrate the effectiveness of the kernel-independent fast multipole method (FMM) at accelerating this computation by considering both a simple model where the microorganisms are represented by pairs of particles and a much more sophisticated model where they are modeled as elastic rods that will bend and twist. In our numerical experiments, significant savings in run time and satisfactory numerical errors are achieved for N varying from tens of thousands to tens of millions. Swimming speeds and directions as well as hydrodynamic efficiencies of the microorganisms have been investigated, which indicate a strong dependency on their initial placement. Additionally, the slow progression of the errors in the time-dependent simulations for a range of regularization parameters shows the robustness of the KIFMM in studying the self-organization of micro-swimmers. The results show that when a simulation has a sufficient number of points, one could use the same parameters and models (for forces, regularization, and time stepping) when using the KIFMM and observe a significant acceleration in the computation time.

When simulating swimming microorganisms, one has many choices in terms of the modeling and computational approaches to be used. The MRS is a well-established framework for studying fluid–structure interactions at zero Reynolds number. The “dumbbell” example studied here is a simplified model that is only valid in the far field. The Kirchhoff rod model is also an idealized case of a slender swimmer ($r_a \ll L$, where $r_a$ is the radius of the rod and $L$ is the length of the rod). We note that the corresponding regularized boundary integral equation results in a Stokeslet and a Stresslet, where we omit the Stresslet term in the MRS under the assumption of slender bodies. Many extensions of the original method have been developed, including different regularization functions, inclusion of shear flows, boundary integral formulations, and the presence of a wall (Ainley et al., 2008; Cortez and Varela, 2015; Nguyen and Cortez, 2013; O'Malley and Bees, 2012; Smith, 2009). We believe that each of these is feasible when using the KIFMM. The RPY FMM/KIFMM would be a better choice to handle Brownian fluctuations since the corresponding matrices are guaranteed to be symmetric positive definite (Liang et al., 2013; Rotne and Prager, 1969; Yamakawa, 1970). In addition, it would be the method of choice when one wants or needs to include the Stresslet term that is omitted in the MRS (Liang et al., 2013; Rotne and Prager, 1969; Yamakawa, 1970).

There are many future directions of this work that we would like to explore in both applications and numerical methods. For applications, we plan to increase the length of the simulations presented here until large-scale, recognizable patterns in the microorganisms emerge and compare the simulation results with the experimental results; it is also of great interest to us to examine how various factors, such as the viscosity of the fluid and the waveform, frequency, elasticity and density of the micro-swimmers, affect the formation of patterns. As for numerical methods, we want to understand how the accuracy of FMM depends on the choice of blob function and regularization parameter in MRS, to explore solution methods for the integral equations arising from the numerical factorization of the MRS kernels as well as ways to factor them analytically, and to compare the performance of FMM and other fast summation methods such as tree codes.

Acknowledgement

M. Rostami and S. Olson were supported, in part, by NSF DMS Grant 1455270. Simulations at WPI were run on a recently acquired cluster, supported by NSF MRI DMS Grant 1337943.

Appendix A. The upward and downward passing in kernel-independent FMM

A.1. Upward passing

In this step, we traverse the octant tree from the leaf level up to level 2 and solve (16) for every box on these levels. For every leaf box, we simply evaluate the right-hand side of (14) directly. For a non-leaf box $B$, let $\{C_\ell\}_{\ell=1}^{8}$ be the children of $B$ and assume that $\{f^{C_\ell U}(q_m^{C_\ell U})\}_{m=1}^{N_q}$ are known for all $\ell$. Since $\{q_k^{BD}\}_{k=1}^{N_q} \subset \mathcal{F}(C_\ell)$, the far field of $C_\ell$, for every $\ell$, according to (18), the right-hand side of (16) can be approximated by

$$\sum_{\ell=1}^{8} \sum_{m=1}^{N_q} \omega_m^{C_\ell U}\, \Phi\!\left(q_k^{BD}, q_m^{C_\ell U}\right) f^{C_\ell U}\!\left(q_m^{C_\ell U}\right). \tag{40}$$

In FMM terminology, this process of “merging” the information of the children to gather information about their parent is referred to as the multipole-to-multipole translation (or M2M translation). We summarize the upward passing step in Algorithm 2. Assume level $L_{\max}$ is the leaf level in the octant tree.
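The double sum (40) can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper: the kernel is treated as a scalar radial function `phi(r)` of the distance between two points, the densities are scalars, and the names `m2m_rhs`, `check_pts` and `children` are hypothetical. For a matrix-valued kernel the same structure applies with a matrix-vector product in place of the scalar product.

```python
import numpy as np


def m2m_rhs(check_pts, children, phi):
    """Evaluate the sum (40) at every point q_k of the parent box.

    check_pts: array (N_q, 3) of the parent's points q_k.
    children:  iterable of (points, weights, densities), one tuple per child,
               with shapes (N_q, 3), (N_q,) and (N_q,).
    phi:       scalar radial kernel, applied elementwise to distances.
    """
    rhs = np.zeros(len(check_pts))
    for points, weights, densities in children:
        # Pairwise distances between parent points and child points.
        diff = check_pts[:, None, :] - points[None, :, :]
        r = np.linalg.norm(diff, axis=2)
        # Inner sum over m for this child: sum_m w_m * phi(|q_k - q_m|) * f_m.
        rhs += phi(r) @ (weights * densities)
    return rhs


# Hypothetical usage with a Gaussian radial kernel and two children.
check = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
child_a = (np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
           np.array([0.5, 0.5]), np.array([2.0, 4.0]))
child_b = (np.array([[0.0, 1.0, 0.0]]), np.array([1.0]), np.array([1.5]))
print(m2m_rhs(check, [child_a, child_b], lambda r: np.exp(-(r / 10.0) ** 2)))
```

Since each child contributes a fixed number $N_q$ of terms per parent point, the cost of this evaluation is independent of the number of source points, which is what makes it a constant in the complexity count.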

Algorithm 2. The upward passing step.

For $L = L_{\max}, L_{\max} - 1, \ldots, 2$
  For every box $B$ on level $L$:
    1. If $L = L_{\max}$:
         Compute the right-hand side of (16) directly.
    2. Else:
         Compute the right-hand side of (16) by (40).
    3. Compute $\{f^{BU}(q_m^{BU})\}_{m=1}^{N_q}$ by solving the linear system (16).

The complexity of the upward passing step can be calculated as follows. Recall that $s$ is a fixed number that represents the maximum number of source points allowed in each leaf box. Assume that the distribution of the source points is close to being uniform in the following sense: the average number of source points in each box on level $L_{\max} - 1$ (the second finest level, which is also the parent level of the leaf level) is strictly greater than $s$, i.e., $s \cdot 8^{L_{\max} - 1} < N_s$. This implies that

$$L_{\max} < \log_8\!\left(\frac{N_s}{s}\right) + 1 = \log_8 N_s + 1 - \log_8 s \le \lceil \log_8 N_s \rceil + 1,$$

where $\lceil r \rceil$ is the smallest integer greater than or equal to a real number $r$. For each leaf box, since both $s$ and $N_q$ are constants independent of $N_e$ and $N_s$, the cost of computing the right-hand side of (16) directly is also a constant $\gamma_1$. Likewise, for every non-leaf box, the cost of computing the right-hand side of (16) by (40) is a constant $\gamma_2$ as well. In addition, the cost of solving (16) or (17) for any box is a constant $\gamma_3$ determined by $N_q$. To sum up, the total cost of Algorithm 2 is

$$\begin{aligned}
(\gamma_1 + \gamma_3) \cdot 8^{L_{\max}} + \sum_{L=2}^{L_{\max}-1} (\gamma_2 + \gamma_3) \cdot 8^{L}
&\le \left(\max\{\gamma_1, \gamma_2\} + \gamma_3\right) \cdot \sum_{L=2}^{L_{\max}} 8^{L} \\
&< \left(\max\{\gamma_1, \gamma_2\} + \gamma_3\right) \cdot \sum_{L=2}^{\lceil \log_8 N_s \rceil + 1} 8^{L} \\
&< \left(\max\{\gamma_1, \gamma_2\} + \gamma_3\right) \cdot \frac{64 N_s - 1}{8 - 1} = \mathcal{O}(N_s).
\end{aligned}$$
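The linear growth of the number of boxes can be checked with a few lines of arithmetic. The sketch below assumes the uniformity condition $s \cdot 8^{L_{\max}-1} < N_s$ stated above and takes $L_{\max}$ to be the largest level satisfying it; the value `s=60` and the function names are hypothetical choices made for this example. It compares the number of boxes on levels 2 through $L_{\max}$ with the bound $(64 N_s - 1)/(8 - 1)$.

```python
def leaf_level(n_sources, s):
    """Largest L_max >= 2 with s * 8**(L_max - 1) < n_sources."""
    level = 2
    # Refine while the next level would still satisfy the uniformity condition.
    while s * 8 ** level < n_sources:
        level += 1
    return level


def total_boxes(l_max):
    """Number of boxes on levels 2, ..., l_max of a full octant tree."""
    return sum(8 ** level for level in range(2, l_max + 1))


def box_bound(n_sources):
    """The bound (64 * N_s - 1) / (8 - 1) on the number of boxes."""
    return (64 * n_sources - 1) / (8 - 1)


for n_sources in (10**4, 10**5, 10**6, 10**7):
    l_max = leaf_level(n_sources, s=60)
    boxes = total_boxes(l_max)
    print(n_sources, l_max, boxes, boxes < box_bound(n_sources))
```

Because every box incurs at most a constant cost, a box count that is linear in $N_s$ gives the linear cost of the upward pass.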

A.2. Downward passing

Contrary to the previous step, in this step, we traverse the octant tree from level 2 down to the leaf level and solve (17) for every box on these levels. Note that for any box $B$ with parent $P$, by the definitions of the far field $\mathcal{F}(B)$ and the interaction list $\mathcal{I}(B)$:

$$\mathcal{F}(B) = \mathcal{I}(B) \cup \mathcal{F}(P) \quad \text{and} \quad \mathcal{I}(B) \cap \mathcal{F}(P) = \emptyset.$$

This indicates that the right-hand side of (17) can be rewritten as the following sum:

$$\sum_{y_j \in \mathcal{I}(B)} \Phi\!\left(q_k^{BU}, y_j\right) f_j + \sum_{y_j \in \mathcal{F}(P)} \Phi\!\left(q_k^{BU}, y_j\right) f_j. \tag{41}$$

Since $\{q_k^{BU}\}_{k=1}^{N_q} \subset \mathcal{F}(A)$ for any box $A \in \mathcal{I}(B)$, by (18), the first sum in (41) can be approximated by

$$\sum_{A \in \mathcal{I}(B)} \sum_{m=1}^{N_q} \omega_m^{AU}\, \Phi\!\left(q_k^{BU}, q_m^{AU}\right) f^{AU}\!\left(q_m^{AU}\right), \tag{42}$$

where $\{f^{AU}(q_m^{AU})\}_{m=1}^{N_q}$ have already been found in the upward passing step. The process of passing information from the interaction list of a box to the box itself is called the multipole-to-local translation (or M2L translation) in FMM. For any box $B$ on level 2, $\mathcal{F}(P) = \emptyset$ and thus $\mathcal{F}(B) = \mathcal{I}(B)$ (see Fig. 2). Consequently, the right-hand side of (17) can be approximated by (42) alone. For any box $B$ on a level greater than 2, assume $\{f^{PD}(q_m^{PD})\}_{m=1}^{N_q}$ are known. The first sum in (41) can again be computed by (42); and since $\{q_k^{BU}\}_{k=1}^{N_q} \subset P$, by (19), the second sum in (41) can be estimated by

$$\sum_{m=1}^{N_q} \omega_m^{PD}\, \Phi\!\left(q_k^{BU}, q_m^{PD}\right) f^{PD}\!\left(q_m^{PD}\right). \tag{43}$$

Then the right-hand side of (17) can be estimated by the sum of (42) and (43). A box “inheriting” information from its parent like this is referred to as the local-to-local translation (or L2L translation) in the language of FMM. The downward passing step is summarized in Algorithm 3.

Algorithm 3. The downward passing step.

For $L = 2, 3, \ldots, L_{\max}$
  For every box $B$ on level $L$:
    1. If $L = 2$:
         Compute the right-hand side of (17) by (42).
    2. Else:
         Compute (42), (43) and sum them up to get the right-hand side of (17).
    3. Compute $\{f^{BD}(q_m^{BD})\}_{m=1}^{N_q}$ by solving the linear system (17).

We now analyze the complexity of the downward passing step. The cost of computing (42) for any box is bounded above by a constant $\gamma_4$ since the interaction list of a box contains at most 189 boxes,² and computing (43) entails a fixed cost $\gamma_5$ for any box on level $L$ where $3 \le L \le L_{\max}$. The total cost of Algorithm 3 is therefore no more than

$$\begin{aligned}
(\gamma_3 + \gamma_4) \cdot 8^{2} + \sum_{L=3}^{L_{\max}} (\gamma_3 + \gamma_4 + \gamma_5) \cdot 8^{L}
&< (\gamma_3 + \gamma_4 + \gamma_5) \cdot \sum_{L=2}^{L_{\max}} 8^{L} \\
&< (\gamma_3 + \gamma_4 + \gamma_5) \cdot \sum_{L=2}^{\lceil \log_8 N_s \rceil + 1} 8^{L} \\
&< (\gamma_3 + \gamma_4 + \gamma_5) \cdot \frac{64 N_s - 1}{8 - 1} = \mathcal{O}(N_s),
\end{aligned}$$

where $\gamma_3$ is again the cost of solving (16) or (17) for any box.
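The bound of 189 boxes in an interaction list can be confirmed by brute-force enumeration on a uniform octant tree. In the sketch below, boxes on a level are labeled by integer triples, which is an indexing convention assumed for this example; the function name `interaction_list` is likewise hypothetical.

```python
from itertools import product


def interaction_list(box, level):
    """Children of the neighbors of the parent of `box` that are not adjacent to `box`.

    `box` is an integer triple (i, j, k) with 0 <= i, j, k < 2**level.
    The parent counts among its own neighbors, as in the 27 * 8 - 27 count.
    """
    n_parent = 2 ** (level - 1)          # boxes per dimension on the parent level
    parent = tuple(c // 2 for c in box)
    result = []
    for offset in product((-1, 0, 1), repeat=3):
        neighbor = tuple(p + o for p, o in zip(parent, offset))
        if any(c < 0 or c >= n_parent for c in neighbor):
            continue                      # falls outside the root box
        for child_offset in product((0, 1), repeat=3):
            child = tuple(2 * c + o for c, o in zip(neighbor, child_offset))
            # Keep the child only if it is not `box` or one of its neighbors.
            if max(abs(a - b) for a, b in zip(child, box)) > 1:
                result.append(child)
    return result


level = 3
sizes = [len(interaction_list(box, level))
         for box in product(range(2 ** level), repeat=3)]
print(max(sizes), min(sizes))
```

Boxes near the boundary of the root box have smaller interaction lists, so 189 is attained only by interior boxes.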

Appendix B. Simulating the interaction of swimming flagella using RK2 in time

Instead of forward Euler, higher order methods such as RK2 can be used for time marching in the simulation of swimming flagella described in Section 5. Algorithm 4 below outlines RK2 for the Kirchhoff rod model, which, compared to Algorithm 1, requires an extra matrix–vector product (28) per iteration.

Consider again the case where 225 rods are initially placed on a horizontal plane and propagate spiral waves for 0.02 s. Recall that a time step of size $10^{-6}$ s is used for Algorithm 1 in order to approximately maintain the inextensibility of the rods, since it is not imposed explicitly by the rod model. If RK2 and the same method for computing the forces and torques are used, a time step as large as $3 \times 10^{-6}$ s can be taken instead while still guaranteeing that the length of the rods will not vary by more than 1% throughout the simulation. With this time step, Algorithm 4 entails only 6667 iterations, 13 334 matrix–vector products (28) and consequently 67% of the run time of Algorithm 1. We also monitor the errors of the kernel-independent FMM as time progresses and compare them with those observed for Algorithm 1. In Fig. 13, both sets of errors are plotted for the regularization parameter $\varepsilon = 7\Delta s$ and they are almost indistinguishable, suggesting that taking a larger time step does not harm the accuracy of FMM. The errors behave very similarly when other values of $\varepsilon$ are used.
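The iteration counts and the 67% figure quoted above follow from simple arithmetic, sketched below. The sketch assumes that Algorithm 1 performs one matrix–vector product (28) per iteration, that Algorithm 4 performs two, and that the run time is dominated by these products; exact rational arithmetic is used only to avoid rounding effects in the step counts.

```python
from fractions import Fraction
from math import ceil

final_time = Fraction(2, 100)        # 0.02 s
dt_euler = Fraction(1, 10**6)        # time step of Algorithm 1
dt_rk2 = Fraction(3, 10**6)          # time step of Algorithm 4

steps_euler = ceil(final_time / dt_euler)
steps_rk2 = ceil(final_time / dt_rk2)

matvecs_euler = 1 * steps_euler      # one product (28) per forward Euler step
matvecs_rk2 = 2 * steps_rk2          # two products (28) per RK2 step

print(steps_euler, steps_rk2, matvecs_rk2, matvecs_rk2 / matvecs_euler)
```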

Algorithm 4. The RK2 time stepping for the Kirchhoff rod model.

For $n = 0, 1, 2, \ldots, N_t - 1$:

Steps 1 and 2 are the same as in Algorithm 1.

3. Compute the locations of the grid points and the orientation of the triads at time step $n + \frac{1}{2}$ as follows: for $i = 1, 2, \ldots, N_s$,

$$x_i^{n+\frac{1}{2}} \leftarrow x_i^{n} + u_i^{n} \cdot \frac{\Delta t}{2}, \qquad (D_i^k)^{n+\frac{1}{2}} \leftarrow R\!\left(\frac{w_i^{n}}{\|w_i^{n}\|},\ \|w_i^{n}\| \cdot \frac{\Delta t}{2}\right) \cdot (D_i^k)^{n},$$

where the matrix $R$ has been defined in Algorithm 1.

4. Compute using finite differences the forces $\{f_i^{n+\frac{1}{2}}\}_{i=1}^{N_s}$ and the torques $\{n_i^{n+\frac{1}{2}}\}_{i=1}^{N_s}$ based on the difference between the shapes of the rods at time step $n + \frac{1}{2}$, given by $\{x_i^{n+\frac{1}{2}}\}_{i=1}^{N_s}$ and $\{(D_i^k)^{n+\frac{1}{2}}\}_{i=1}^{N_s}$ ($k = 1, 2, 3$), and their desired shape at time step $n + 1$.

5. Compute the linear velocities $\{u_i^{n+\frac{1}{2}}\}_{i=1}^{N_s}$ and the angular velocities $\{w_i^{n+\frac{1}{2}}\}_{i=1}^{N_s}$ from (26) or (28). (All the superscripts in these two equations need to be changed from $n$ to $n + \frac{1}{2}$ accordingly.)

6. Compute the locations of the grid points and the orientation of the triads at time step $n + 1$ as follows: for $i = 1, 2, \ldots, N_s$,

$$x_i^{n+1} \leftarrow x_i^{n} + u_i^{n+\frac{1}{2}} \cdot \Delta t, \qquad (D_i^k)^{n+1} \leftarrow R\!\left(\frac{w_i^{n+\frac{1}{2}}}{\|w_i^{n+\frac{1}{2}}\|},\ \|w_i^{n+\frac{1}{2}}\| \cdot \Delta t\right) \cdot (D_i^k)^{n}.$$

² Recall that the interaction list of a box $B$ consists of the children of the neighbors of $B$'s parent that are not $B$'s neighbors, and therefore has size at most $27 \times 8 - 27 = 189$.
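The structure of one iteration of Algorithm 4 can be sketched as follows. Several parts are assumptions made for this illustration and are not taken from the paper: the matrix $R$ (defined in Algorithm 1, which is not reproduced here) is assumed to be the Rodrigues rotation matrix about a unit axis by a given angle; the callback `velocities` is a hypothetical stand-in for the force/torque computation together with the evaluation of (26) or (28); and each triad is stored as a 3 × 3 array whose columns are $D^1$, $D^2$, $D^3$.

```python
import numpy as np


def rotation(axis, angle):
    """Rodrigues rotation matrix about the unit vector `axis` (assumed form of R)."""
    e = np.asarray(axis, dtype=float)
    cross = np.array([[0.0, -e[2], e[1]],
                      [e[2], 0.0, -e[0]],
                      [-e[1], e[0], 0.0]])
    return (np.cos(angle) * np.eye(3)
            + (1.0 - np.cos(angle)) * np.outer(e, e)
            + np.sin(angle) * cross)


def advance(x0, D0, u, w, dt):
    """Translate points x0 with velocity u and rotate triads D0 with angular velocity w."""
    x_new = x0 + u * dt
    D_new = np.empty_like(D0)
    for i, w_i in enumerate(w):
        speed = np.linalg.norm(w_i)
        # No rotation when the angular velocity vanishes (avoids dividing by zero).
        R = rotation(w_i / speed, speed * dt) if speed > 0.0 else np.eye(3)
        D_new[i] = R @ D0[i]
    return x_new, D_new


def rk2_step(x, D, velocities, dt):
    """One RK2 (midpoint) step: x has shape (N_s, 3), D has shape (N_s, 3, 3)."""
    u, w = velocities(x, D)                            # velocities at step n
    x_half, D_half = advance(x, D, u, w, dt / 2.0)     # step 3
    u_half, w_half = velocities(x_half, D_half)        # steps 4 and 5
    return advance(x, D, u_half, w_half, dt)           # step 6
```

The second call to `velocities` is the extra matrix–vector product per iteration; the update in step 6 starts again from the state at step $n$ but uses the midpoint velocities.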

References

Ainley, J., Durkin, S., Embid, R., Boindala, P., Cortez, R., 2008. The method of images for regularized Stokeslets. J. Comput. Phys. 227, 4600–4616.
Barnes, J., Hut, P., 1986. A hierarchical O(N log(N)) force-calculation algorithm. Nature 324, 446–449.
Baskaran, A., Marchetti, M., 2009. Statistical mechanics and hydrodynamics of bacterial suspensions. Proc. Natl. Acad. Sci. U. S. A. 106, 15567–15572.
Beatson, R., Greengard, L., 1997. A short course on fast multipole methods. In: Wavelets, Multilevel Methods and Elliptic PDEs. Oxford University Press, Oxford, UK, pp. 1–37.
Beenakker, C., 1986. Ewald sum of the Rotne–Prager tensor. J. Chem. Phys. 85, 1581–1582.
Cheng, H., Greengard, L., Rokhlin, V., 1999. A fast adaptive multipole algorithm in three-dimensions. J. Comput. Phys. 155, 468–498.
Cisneros, L., Kessler, J., Ganguly, S., Goldstein, R., 2011. Dynamics of swimming bacteria: transition to directional order at high concentration. Phys. Rev. E 83, 061907.
Cipra, B., 2000. The best of the 20th century: Editors name top 10 algorithms. SIAM News 33.
Cortez, R., 2001. The method of regularized Stokeslets. SIAM J. Sci. Comput. 23, 1204–1225.
Cortez, R., Fauci, L., Medovikov, A., 2005. The method of regularized Stokeslets in three dimensions: analysis, validation, and application to helical swimming. Phys. Fluids 17, 031504.
Cortez, R., Varela, D., 2015. A general system of images for regularized Stokeslets and other elements near a plane wall. J. Comput. Phys. 285, 41–54.
Darden, T., York, D., Pedersen, L., 1993. Particle mesh Ewald: an O(N log N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092.
Deserno, M., Holm, C., 1998. How to mesh up Ewald sums. I. A theoretical and numerical comparison of various particle mesh routines. J. Chem. Phys. 109, 7678–7693.
Dill, E., 1992. Kirchhoff's theory of rods. Arch. Hist. Exact Sci. 44, 1–23.
Ewald, P., 1921. Evaluations of optical and electrostatic lattice potentials. Ann. Phys. – Leipzig 64, 253–287.
Fasshauer, G., Zhang, J., 2009. Preconditioning of radial basis function interpolation systems via accelerated iterated approximate moving least squares approximation. In: Ferreira, A., Kansa, E., Fasshauer, G., Leitao, V. (Eds.), Progress on Meshless Methods, vol. 11. Springer, Berlin, Heidelberg, Germany, pp. 57–75.
Flores, H., Lobaton, E., Mendez-Diez, S., Tlupova, S., Cortez, R., 2005. A study of bacterial flagellar bundling. Bull. Math. Biol. 65, 137–168.
Greengard, L., Rokhlin, V., 1987. A fast algorithm for particle simulations. J. Comput. Phys. 73, 325–348.
Hernandez-Ortiz, J., Stoltz, C., Graham, M., 2005. Transport and collective dynamics in suspensions of confined swimming particles. Phys. Rev. Lett. 95, 204501.
Hohenegger, C., Shelley, M., 2010. Stability of active suspensions. Phys. Rev. E 81, 046311.
Ishikawa, T., Locsei, J., Pedley, T., 2008. Development of coherent structures in concentrated suspensions of swimming model micro-organisms. J. Fluid Mech. 615, 401–431.
Liang, Z., Gimbutas, Z., Greengard, L., Huang, J., Jiang, S., 2013. A fast multipole method for the Rotne–Prager–Yamakawa tensor and its applications. J. Comput. Phys. 234, 133–139.
Lighthill, J., 1975. Mathematical Biofluid Dynamics. SIAM, Philadelphia, USA.
Lim, S., 2010. Dynamics of an open elastic rod with intrinsic curvature and twist in a viscous fluid. Phys. Fluids 22, 024104.
Lim, S., Ferent, A., Wang, X., Peskin, C., 2008. Dynamics of a closed rod with twist and bend in fluid. SIAM J. Sci. Comput. 31, 273–302.
Lushi, E., Willard, H., Goldstein, R., 2014. Fluid flows created by swimming bacteria drive self-organization in confined suspensions. Proc. Natl. Acad. Sci. U. S. A. 111, 9733–9738.
Marchetti, M., Joanny, J., Ramaswamy, S., Liverpool, T., Prost, J., Rao, M., Simha, R., 2013. Hydrodynamics of soft active matter. Rev. Mod. Phys. 85, 1143–1189.
Mendelson, N., Bourque, A., Wilkening, K., Anderson, K., Watkins, J., 1999. Organized cell swimming motions in Bacillus subtilis colonies: patterns of short-lived whirls and jets. J. Bacteriol. 181, 600–609.
Moore, H., Dvorakova, K., Jenkins, N., Breed, W., 2002. Exceptional sperm cooperation in the wood mouse. Nature 418, 174–177.
Nguyen, H., Cortez, R., 2013. Reduction of the regularization error of the method of regularized Stokeslets for a rigid object immersed in a three-dimensional Stokes flow. Commun. Comput. Phys. 15, 126–152.
O'Malley, S., Bees, M., 2012. The orientation of swimming biflagellates in shear flows. Bull. Math. Biol. 74, 232–255.
Olson, S., 2014. Motion of filaments with planar and helical bending waves in a viscous fluid. In: Layton, A., Olson, S. (Eds.), Biological Fluid Dynamics: Modeling, Computations, and Applications. Contemporary Mathematics, AMS eBook Collections, vol. 628.
Olson, S., Lim, S., Cortez, R., 2013. Modeling the dynamics of an elastic rod with intrinsic curvature and twist using a regularized Stokes formulation. J. Comput. Phys. 283, 169–187.
Peskin, C., 2002. The immersed boundary method. Acta Numer. 11, 459–517.
Phalzner, S., Gibbon, P., 1996. Many Body Tree Methods in Physics. Cambridge University Press, Cambridge, UK.
Pozrikidis, C., 1992. Boundary Integral and Singularity Methods for Linearized Viscous Flow. Cambridge University Press, Cambridge, UK.
Ramaswamy, R., 2010. The mechanics and statistics of active matter. Annu. Rev. Condens. Matter Phys. 1, 323–345.
Riedel, I., Kruse, K., Howard, J., 2005. A self-organized vortex array of hydrodynamically entrained sperm cells. Science 309, 300–303.
Rokhlin, V., 1985. Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60, 187–207.
Rotne, J., Prager, S., 1969. Variational treatment of hydrodynamic interaction in polymers. J. Chem. Phys. 50, 4831–4837.
Saintillan, D., Shelley, M., 2008. Instabilities, pattern formation, and mixing in active suspensions. Phys. Fluids 20, 123304.
Saintillan, D., Shelley, M., 2011. Emergence of coherent structures and large-scale flows in motile suspensions. J. R. Soc. Interface 9, 571.
Sanchez, T., Chen, D., DeCamp, S., Heymann, M., Dogic, Z., 2012. Spontaneous motion in hierarchically assembled active matter. Nature 491, 431–435.
Sauter, S., 2000. Variable order panel clustering. Computing 64, 223–261.
Simha, R., Ramaswamy, S., 2002. Hydrodynamic fluctuations and instabilities in ordered suspensions of self-propelled particles. Phys. Rev. Lett. 89, 058101.
Smith, D., 2009. A boundary element regularized Stokeslet method applied to cilia- and flagella-driven flow. Proc. R. Soc. A 465, 3605–3626.
Surrey, T., Nedelec, F., Leibler, S., Karsenti, E., 2001. Physical properties determining self-organization of motors and microtubules. Science 292, 1167–1171.
Thar, R., Kuhl, M., 2002. Conspicuous veils formed by vibrioid bacteria on sulfidic marine sediment. Appl. Environ. Microbiol. 68, 6310–6320.
Tornberg, A., Greengard, L., 2008. A fast multipole method for the three-dimensional Stokes equations. J. Comput. Phys. 227, 1613–1619.
Toukmaji, A., Board Jr, J., 1996. Ewald summation techniques in perspective: a survey. Comput. Phys. Commun. 95, 73–92.
Wioland, H., Woodhouse, F., Dunkel, J., Kessler, J., Goldstein, R., 2013. Confinement stabilizes a bacterial suspension into a spiral vortex. Phys. Rev. Lett. 110, 268102.
Wolgemuth, C., 2008. Collective swimming and the dynamics of bacterial turbulence. Biophys. J. 95, 1564–1574.
Woolley, D., 2003. Motility of spermatozoa at surfaces. Reproduction 126, 259–270.
Woolley, D., Vernon, G., 2001. A study of helical and planar waves on sea urchin sperm flagella, with a theory of how they are generated. J. Exp. Biol. 204, 1333–1345.
Yamakawa, H., 1970. Transport properties of polymer chains in dilute solution: hydrodynamic interaction. J. Chem. Phys. 53, 436–443.
Ying, L., 1989. On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math. 54, 463–491.
Ying, L., 2006. A kernel independent fast multipole algorithm for radial basis functions. J. Comput. Phys. 213, 451–457.
Ying, L., Biros, G., Zorin, D., 2004. A kernel-independent adaptive fast multipole algorithm in two and three dimensions. J. Comput. Phys. 196, 591–626.