finite element solution strategies for large-scale flow ... · pdf filefinite element solution...

Computer Methods in Applied Mechanics and Engineering 112 (1994) 3-24 North-Holland

Finite element solution strategies flow simulations*

M. Behr and T.E. Tezduyar

for large-scale

~e~ar~rne~ of Aerospace Engitzeering and ~e~h~~i~s~ Army ~~~~-Ferfor~ce ~o~p~~i~~ Research Center and Supercomputer Institute, University of Minnesota, 1200 Waskington Avenue South,

Minneapolis, MN 55415, USA

Received 4 December 1992 Revised manuscript received 10 December 1992

Large-scale flow simulation strategies involving implicit finite element formulations are described in the context of incompressible flows. The stabilized space-time formulation for problems involving moving boundaries and interfaces is presented, followed by a discussion of mesh moving schemes. The methods of solution of large linear systems of equations are reviewed, and an implementation of the entire finite element code, permitting the use of totally unstructured meshes, on a massively parallel supercomputer is considered. As an example, this methodology is applied to a flow problem involving three-dimensional simulation of liquid sloshing in a tank subjected to vertical vibrations.

1. Introduction

Fuelfed by advances in supercomputer technology, the popularity of numerical simulation as a way of investigating physical phenomena has been growing steadily. Fluid dynamics research in particular is benefitting greatly from the new methodology. Yet, numerical approximation of physical systems, and especially the necessary replacement of continuum with a finite number of variables, is rarely straightforward. The possibility of approximation errors obliterating the actual physical phenomena is a constant danger. A combination of mathematical analysis and careful comparison with the experimentat and analytic results is necessary to validate any new numerical method. Even after such verification, many obstacles remain on the road to realistic large-scale simulations. Chief among them is the need to solve large systems of linear equations, which is inherent in the implicit methods considered here. A more recent challenge is the need to consider parallelization potential of every algorithm employed. These issues are among those addressed in this article. Even though we consider a formulation for incompressible flows, many of the procedures and remarks are equally applicable to simulation of compressible flows.

We discuss the application of finite element tools to flow problems which involve moving boundaries and interfaces. The domain over which such problems have to be solved evolves in time, requiring modifications of the usual. simulation techniques. We address this problem with a stabilized finite element formulation, which incorporates the motion of the domain automatically by virtue of deforming space-time elements.

Any discussion of numerical techniques cannot be easily separated from implementational issues. Practical application of finite element methods on scalar, and even vector, architectures has been

Correspondence to: Professor ‘I’ayfun E. Tezduyar, Supercomputer Institute, 1200 Washington Avenue South, Minneapolis, MN 55415, USA.

* This research was sponsored by NASA-JSC under grant NAG 9-449, by NSF under grants MSM-8796352 and ASC-9211083, by ALCOA Foundation and by ARPA under NISI’ contract GONANB2D1272. Partial support for this work has also come from the Army Research Office contract number DAAL03-89-C-0038 with the Army High Performance Computing Research Center at the University of Minnesota. The authors are thankful for the technical support they received from the Thinking Machines Corporation, in particular from Dr. John Kennedy.

~45-~SZj/94/$~.~ @ 1994 Elsevier Science B.V. AH rights reserved

3 M. Behr. 7‘.E. Tezduyzr, Finite element solution strategies

extensively discussed in many textbooks and tutorials. The advent of massively parallel computers, while promising continuing advances in computational speed, rendered most of these classical finite element programming techniques obsolete. Also, the range of best to worst code performance dramatically increased with the shift away from traditional sup~rcomputers to machines with hundreds or thousands of processing units. Both the data structure and the algorithm design have to be changed extensively to take advantage of the speed-ups offered by parallelism. An implementational framework suitable for unstructured finite element grids. yet resulting in good and scalable parallel performance. is also discussed,

We begin by introducing the equations of the fluid motion for the incompressible case in Section 2. In Section 3, we open with a general discussion of numericat methods used to solve flow problems involving moving boundaries and interfaces. Subsequently, we present the space-time velocity-pressure formulation which takes such deformations into account in an elegant and straightforward manner. We describe the stabilization technique which will endow the method with necessary robustness. In Section 4 we discuss the applicability of the method to problems involving deforming domains, and related issues such as mesh movement options.

The formulations such as the one described in Section 3 give rise to large coupled linear systems of equations, which can be solved by using either a direct or an iterative solution technique. The merits and disadvantages of both approaches are discussed in Section 5. The iterative scheme under consideration is based on the preconditioned Generalized Minimum Residual (GMRES) method.

The large-scale flow problems which motivate us to develop the finite element formulations in the first place, prompt us also to pay particular attention to the usability of the related algorithms on the computer architectures that offer highest computational speeds. Today, that is synonymous with implementing the concepts included in following sections on massively parallel distributed-memory computers. Section 6 relates our experience in that area, which is so far restricted to the Single- Instruction-Multiple-Data (SIMD), or data parallel, style of programming, which is accessible on the Connection Machine CM-200 and CM-5 computers.

A numerical example is presented in Section 7. Here, the space-time velocity-pressure formulation is used in a three-dimensional simulation of sloshing in a tank subjected to vertical vibrations.

2. Problem statement

We consider a viscous, incompressible fluid occupying at an instant t E (0, T) a bounded region 0, c [w”\d, with boundary &, where rqd is the number of space dimensions. The velocity and pressure, u(x, t) and p(r, t), are governed by the momentum and mass balance equations:

! du p ,t+u-Vu-f -V.u=O onf2,

1 Vff(0, T) * (1)

V.u=O on 4 VtE(0, T), (2)

where p is the fluid density, assumed to be constant, andf(x, t) is an external, e.g., gravitational, force field. The closure is obtained with a constitutive equation relating the stress tensor u to velocity and pressure fields. In the constitutive equation for a Newtonian fluid, the deviatoric stress is assumed to be proportional to the fluid rate of strain tensor which is defined as

E(U) = $(Vu + (Vu)‘) . (3)

Under this assumption, the stress tensor for a fluid with dynamic viscosity p is defined as follows:

cr=-pl+T, T = 2&u) . (41

Both the Dirichlet and Neumann-type boundary conditions are taken into account, represented as

M. B&r, T.E. Tezduyar, Finite element solution strategies 5

n - u - ed = h, on (Qh,d, d = 1, . . . , nsd , (6)

where (&d and (%d are complementary subsets of the boundary r], and {e,>ze, is a basis in lPsd. Note that the decomposition of 4 may be different for each of the basis vectors ed. In practice, the bases often coincide with the local directions normal and tangential to the boundary. All of the commonly used inflow, no-slip, symmetry and outflow traction-free boundary conditions can be represented in this way.

The initial condition consists of a divergence-free velocity field specified over the entire domain:

4-v 0) = uo , V-u,=0 onGo. (7)

Of interest here is the case where the domain fz,, and consequently its boundaries 4, evolve in time.

3. Space-time velocity-pressure formulation

When faced with fluid dynamics problems involving free surfaces, moving interfaces and deforming domains in general, a decision tree regarding the choice of a numerical method shown in Fig. 1 may be the one to follow.

The Marker-And-Cell (MAC) method developed by Harlow and Welch, and described in an exhaustive report [l], has stood its own against the more established finite element and finite difference approaches in this particular area. The numerous examples presented in [l] attest to the method’s extraordinary flexibility. The concept of a mesh covering the maximum possible domain, but with computations being performed only for the cells intoning marker particles, leads to efficient implementations, Since the tracer particles are indistinguishable from one another, the joining and separation of fluid parts - a source of mesh generation nightmares in the finite difference/element approaches - can be effortlessly simulated. Yet the MAC method is not free of shortcomings. Various stress effects at free surfaces, such as surface tension and contact angles, have been notoriously difficult to incorporate, requiring complex surface fitting techniques such as those introduced in [2]. The method can suffer from stability problems which may be di~cult to detect. That is, apparently reasonable particle distribution may be an artifact of gross approximation errors. Some modification were proposed to stabilize the MAC method in [3]. Another variation of the MAC concept replaces the discrete marker particles with a continuous fractional volume of fluid (VOF) function, as described in [4].

The extensive pool of knowledge accumulated about the finite difference, element and volume methods in the context of fixed domain problems makes these more standard approaches also attractive. Here, we concentrate on the finite element approach. It should be noted that some of the finite element methods reviewed in this section possess a finite difference counterpart. For a finite element method to be applicable to problems involving deforming domains, it needs to have one

Marker-And-Cull Finite Element, Difference, , . .

I Lagrangian

I Lagrangian-Eulerian

ALE Space-Time

I I Continuous-In-Time Discontinuous-In-Time

Fig. 1. Numerical methods applicable to problems involving deforming domains: authors’ decision tree (see text for explanation of terms).

6 M. Behr. T.E. Tezduyar, Finite element solution strategies

important feature which may be omitted in the case of problems involving fixed domains only. The necessity of accurately following the deforming boundary of the domain requires some degree of

Lagrangian description to be present in the formulation, at least in the vicinity of the boundary. Since the use of a Lagrangian viewpoint is unavoidable, some researchers consider fully Lagrangian

formulation for the entire domain [5, 61. The simplicity of such a solution is offset by several difficulties.

The flow field in the interior of the domain may exhibit a large amount of circulation, such as local vortices, which is often unrelated to the motion of the boundary. The fully Lagrangian approach may

result in unnecessary mesh distortion and tangling in such interior regions, even for moderate or null displacements of the boundary. On the other hand, internal circulation and shear layers are handled without difficulty by an Eulerian description.

These facts provided strong motivation for the development of methods capable of combining the Lagrangian and Eulerian approaches in the same domain. The Arbitrary Lagrangian-Eulerian (ALE)

formulation was initially stated in a finite difference context by Hirt [7], and eventually adopted also in the finite element community (see [8, 91 and references contained therein for a detailed description). In

the modern ALE approach, velocities of the nodes of the computational mesh explicitly enter the

momentum equation, which is written over a reference domain. This domain may or may not coincide with either material or spatial domains. Nodal velocities may be either prescribed a priori, or be a

function of the fluid velocity (most notably at the free surface). The ALE method has been used to

solve a number of interesting problems, including those quoted in [9] and [lo]. Another avenue was pioneered by Jamet and Bonnerot [ 11, 121 as a means of solution of Euler

equations and Stefan-type problems. The method uses finite element discretization in both space and time to automatically account for the deformation of the computational domain. This approach was later applied to shallow water equations by Lynch and Gray [13] and flows of incompressible fluids by Frederiksen and Watts [14]. Improving on the continuous-in-time interpolations used in the applications mentioned above, Jamet [15] proposed the use of discontinuous-in-time functions for a model parabolic

problem. Here for the first time the numerical method was presented together with analytic proofs of stability and estimates of accuracy.

In a parallel development, the discontinuous-in-time space-time methods have been gaining popularity for fixed domain problems, as an alternative to the more common semidiscrete (finite

elements in space, finite differences in time) approach. The advantages, such as potential for selective temporal refinement and possibility of obtaining rigorous convergence predictions, were discussed by

Hughes and Hulbert in [16] in the context of elastodynamics. Considerable attention was given to the space-time method for fixed domains by Hansbo et al. [17]. Their characteristic streamline diffusion method takes advantage of the possibility of varying node distribution inside the domain to reduce

advective effects by moving interior nodes approximately along the characteristic directions. followed

by remeshing. The potential of the discontinuous-in-time space-time formulation as a means of simulating flows of

fluids in a deforming domain has been exploited by Tezduyar et al., in [ 18, 191. Later applications of the

technique are described in [20, 211. It has also been used to conduct large-scale studies of incompressible flows past oscillating airfoils and cylinders [22, 231. More recently, the same method, with a particular mesh moving scheme conforming to the characteristic streamline diffusion ideology. has been applied to deforming domain problems by Hansbo [24]. Finally let us note that a similar approach can be equally successful for compressible flow simulation, as shown by Aliabadi and Tezduyar in [25]. The remainder of this chapter recapitulates the method introduced in [18, 191, and discusses mesh motion and related aspects. The discontinuous-in-time space-time technique considered here may be expected to retain the detailed accuracy analyses carried out for its fixed domain counterpart, while providing an extremely elegant way of accounting for the displacements of the reference domain. i.e., the finite

element mesh. Note that in addition to the direct modeling of the governing equations, the free-surface flows arc

amenable to treatment with specialized methods, which may use specific assumptions about the flow to produce an approximate set of governing equations. As is the case with the Shallow Water Equations (SWE) [26], this modified problem is often significantly simpler to model than the original one. The simplified problem is in turn solved with a suitable numerical method. These types of approximations

are not considered here.

M. Behr, T.E. Tezduyar, Finite element solution strategies 7

3.1. Variational formulation

In order to construct the finite element function spaces for the space-time method, we partition the time interval (0, T) into subintervals Z, = (t,, t,, 1), where t, and t,, 1 belong to an ordered series of time levels 0 = t, < t, < * . . < t, = T. Let C& = 4, and Z, = & . We define the space-time slab Q, as the domain enclosed by the surfaces a,, , fin + 1 and P,,, where P,, i”s the surface described by the boundary 4 as t traverses I,,. These concepts are sketched in Fig. 2. As it is the case with r], surface P,, can be decomposed into (P,), and (P,),, with respect to the type of boundary condition (Dirichlet or Neumann) being applied, again with a possibly different decomposition for each of the velocity degrees of freedom. For each space-time slab, we define the following finite element interpolation function spaces for the velocity and pressure:

(Yi), = {u” 1 uh E [H’h(Q,)]“sd, uh A gh on (P,),} , (8)

(Vi), = {u” 1 uh E [H'"(Q,)]"sd, uh G 0 on (P,),} , (9)

($2, = P;), = {P” 1 ph E H’“(Q,)> . (10)

Over the element domain, the interpolation is constructed by using first-order polynomials in space and, depending on our choice, zeroth- or first-order polynomials in time. Globally, the interpolation functions are continuous in space but discontinuous in time. However, for two-liquid flows, the solution and variational function spaces for pressure should include the functions that are discontinuous across the interface. It should be remarked that when required, the method can employ higher-order interpolation functions in both space and time, to provide necessary accuracy.

The stabilized space-time formulation for deforming domains can be written as follows: given (u”), , find uh E (Yi), and ph E (9’:), such that Vwh E (YE),, Vqh E (Yi),:

9) dQ + I, 4wh) : 4ph, u”) dQ n

+ qhV-uh dQ + I R (w”),’ . P((u”),’ - @“),I dfl

n n

Fig. 2. Space-time discretization: concept of a space-time slab.

M. Behr. T.E. Tezduyar, Finite element solution srrategie.\

-/) -V.a(ph, u”)] dQ

The notational conventions used in (3 1) are shown below:

(zf”),’ =iim ~(t,, t F) , r-0 (12)

The solution to (11) is obtained sequentially for all space-time slabs 9, , Q2, . . . * Q,v , . The computations start with

In the variational formulation given by (ll), the first three terms of the left-hand side, together with the right-hand side, constitute the standard Galerkin formulation of the problem. The fourth integral enforces, in a weak sense, the continuity of the velocity in time over the lower boundary O,, of the space-time slab Q,. The fifth term in (11) is a least-squares form of the momentum equation added to the formulation. This term provides crucial stability characteristics and is discussed in more detail, including the definition of rMoM, later in this section. At high Reynolds numbers, the stability is further enhanced by the sixth term present in the formuIation (ll), which includes a least-squares form of the continuity equation. The coefficient rconr is aIso defined later in this section.

The deformation of the mesh is reflected in the deformation of space-time elements, and directly affects the computation of the transport terms. This effect is illustrated for a typical deformed space-time element shown in Fig. 3. Let 5 and 8 serve as a set of standard reference coordinates for that element. In the special, yet common, case when the 8 = const levels coincide with the t = const levels, the Jacobian matrix of the transformation between the reference and physical domains can be written as

I

X

Fig. 3. Space-time discretization: deforming elements.

M. Behr, T. E. Tezduyar, Finite element solution strategies

and its inverse as

ax ae at ’ - ae 1

where At = t, + 1 - t, and vh is the local mesh velocity defined as

ax vh = at g=,,“st

Now the transport term can be computed as

auh auh a8 aUh ag i)t+Uh.v*h=~j-+--++h*vuh

ag at

auh 2 auh =___--_h .vg + uh .VUh ae At a&

h

= S$ + (u” - v”) *Vuh .

9

(16)

(17)

(18)

(19)

This is equivalent to modifying the advective velocity by subtracting the mesh velocity. If the element does not deform, as in a fully Eulerian description, the at/at term vanishes and the transport terms retain full advective velocity. On the other hand, in an approximately Lagrangian description, the node movement follows the fluid velocity field, and the effective advection velocity (u” - v”) is close to zero.

The mixed Galerkin formulation, i.e., formulation (11) with 7MoM = 0 and 7coNT = 0, suffers from two well known stability problems. The first of the two is observed when a solution of an advection- dominated problem is attempted. The spurious oscillations arise in the vicinity of the boundary layers, and pollute the entire domain. As a remedy, the streamline upwind / Petrov-Galerkin (SUPG) technique introduced by Brooks and Hughes [27] has been widely accepted in the fluid dynamics community, and is a part of the stabilization terms of (11). The second stability problem is unique to mixed methods, and manifests itself for only certain combinations of the interpolations for the velocity and pressure fields. Simply stated, the danger stems from the fact that the pressure field enters the mixed Galerkin variational formulation, whph dQ. For not sufficiently rich (Yi),, p

i.e., (11) with rMoM = 0, only through the term la V* s urious pressure fields, or modes, may ‘slip through’ a ne”t of

velocity weighting functions and remain undetected by the variational formulation. A consistent approach for the Stokes problem was introduced by Hughes et al. in [28]. Here the weighted residual character of the Galerkin formulation was preserved, with added stabilizing terms which vanish when the solution approaches the exact one. A generalization of that work for the finite Reynolds number flows led to pressure stabilized/Petrov-Galerkin (PSPG) formulation developed by Tezduyar et al. [29]. As seen in [29], the PSPG formulation has been successfully applied to different equal-order interpolations in the context of Navier-Stokes equations. The Galerkin / Least-Squares technique used here combines the SUPG and PSPG stabilizations in one conceptually simple term. As seen in (ll), the stabilizing term has the form of the sum of element-level inner products of the momentum equation and its variation.

The TCONT- term has been proposed by Franca and Hughes [30], and provides stability to the velocity

10 M. Behr. T, E. Terduyw, Finite element solution srraregieJ

field at high Reynolds numbers. This term can dramatically enhance the convergence of the nonlinear iterative algorithm, without compromising the consistency of the original method. The standard stability analysis for the advection-diffusion equation with SUPC or Galerkin / Least-Squares stabilization relies on the divergence-free property of the advection velocity field. In general, the velocity field obtained by solving (11) will become free of divergence only in the h --+ 0 timit, where h is the mesh parameter. This potential for instability is eliminated with the addition of the consistent rcoN., -term to the formulation.

A parameter rMoM designed specifically for use with bilinear interpolations has been given in, c.g.. [EC], and is repeated here:

or

(20,

in the steady-state case. Here laJij!, = (Z>?,l~~l(‘)“” is the pointwise velocity norm, h,, is the element length and v = p/p is the kincmattc viscosity. The second design (21) coincides in the advective limit with a definition used earlier for non-space-time formulations (see e.g. [29]):

Re,, < 3

Re,>3’

(22)

where Re, is an element level Reynolds number. The definition of ~~o,.,~ is the same as that introduced in [31]:

with Re, and t(Re,) defined as in (22). Here 9 is a positive parameter, usuahy taken as unity. The definition (23) is a smoothed version of the parameter initially used in [32].

Special remarks apply to the formulation (11) when it is used with first-order interpolation functions for the velocity field. Although the formulation appears to be consistent, the least-squares stabilization terms involve an unmodified form of the momentum equation, and hence second derivatives of the velocity field. These derivatives vanish identically for linear (e.g., triangle) velocity interpolations, and are incomplete for bi- or trinlinear (e.g., quadrilatera1) velocity elements. Thus, while the full viscous momentum equation is represented in the Galerkin part, only an inviscid, or nearly so, version of it is present in the least-squares contribution. The imbalance created by the absence of the viscous terms will normally be absorbed by the pressure and convective terms. This effect is hardly noticeable for the rMoM definitions in use here. But in the space-time formulation, the presence of the time evolution terms in the least-squares contribution may lead to a non-negligible and surprising phenomenon. When a steady flow of a fluid is computed with a time-dependent procedure, the convergence is achieved when (Us),,, = (a”),, yet (u”), need not be equal to (u”),:. When this criterion is used, there exists the possibility of a non-zero C&/at and a corresponding non-zero jump-term. Indeed, the already mentioned inconsistency seen in the least-squares form of the momentum equation, will feed such a saw-tooth pattern in the time behavior of the solution. By varying the time step size, the weighting expression &vh/dt is changed, leading to relative shifts in weight between the Galerkin and Least-


Squares forms of the momentum equation. This accounts for the visible dependence of the steady-state solution on the time step, when the bilinear or trilinear velocity interpolation is used.

Another low order element anomaly appears in the context of temporal discretization. In the applications of space-time methods to the fixed domain problems, a satisfactory solutions may be obtained with the use of constant-in-time interpolation functions. When such interpolation is applied to the deforming domain problems, it falls into the subparametric category, which may invalidate some of the assumptions of convergence analysis, such as completeness of the shape function sets.

4. Moving boundary treatment

Similar to other Lagrangian-Eulerian formulations, the current formulation gives us nearly complete freedom of the mesh movement within the domain, and to some extent also at the boundary. In the particular case of free surfaces and moving no-flux interfaces, the only requirement for the velocity u of the nodes on that boundary is v*n = u*n, i.e., the normal components of the nodal and fluid velocities must match. The tangential component of u remains arbitrary, with some common choices depicted in Fig. 4. Boundary nodes can be made to move in a prescribed direction (e.g., horizontal or vertical), in a direction normal to the free surface, or in the direction of local fluid velocity vector. In the last case, we obtain a locally Lagrangian description at the boundary.

In problems involving mild or cyclic deformations only, the initial mesh can be designed to provide the necessary flexibility, and remain topologically unchanged throughout the simulation. In problems involving large deformations, on the other hand, it may be necessary to introduce new meshes during the course of the computations, and project the solution from the old set of nodes to the new one for every new mesh. There are situations in which the deformation of the mesh is known in advance, either precisely, or qualitatively. Such is the case when computing flows around bodies moving in a prescribed manner. Thus the initial mesh can be made to absorb certain types of deformations without significant element distortion or the need for remeshing, as shown schematically in Fig. 5. This approach is successfully used in conjunction with the present formulation to compute flows past oscillating cylinders and pitching airfoils (see [22, 231). In general, such techniques will be successful when the deformation of the domain is either cyclic or tends to a steady-state.

In many cases, however, the distortion of the domain is irregular or unpredictable, and therefore the design of an explicit mesh moving method is difficult. More flexible techniques have been developed, in which the domain is treated as an elastic solid under given boundary displacements [21]. The movement of the nodes is determined by solving the elasticity equations to obtain the displacement field in the interior of the domain. The shape of the critical elements can be improved by adjusting the (fictitious)

t I

Fig. 4. Mesh moving options at the free surface: prescribed

direction (left), normal direction (center) and local velocity direction (right).

Fig. 5. Mesh moving options: movement with no remeshing.

12 M. Behr. T. E. Tezduyar, Finite element solution strategies

f

\ \ \i i __ \

X

Fig. 6. Mesh moving options: movement with discrete re-

meshing.

X

Fig. 7. Mesh moving options: movement with continuous

remeshing.

material properties for these elements. Only the initial mesh needs to be explicitly generated, possibly by an automatic mesh generator.

In some other cases, dramatic changes in the shape and volume of the domain may cause both of the aforementioned techniques to break down. Our approach here is to use an automatic mesh generator, the INRIA-developed Emc* [33], to discretize anew the domain deformed during the previous time step, as shown in Fig. 6. The solution has to be projected to the new mesh via an interpolation technique. The jump term integral in (11) can also serve as a projection mechanism. The ability of this scheme to handle arbitrary domains is offset by the diffusive effect of the projection, which grows with increasing time resolution, as the number of projections increases with decreasing time step size.

The ideal approach should integrate the elastic treatment of the interior domain, with infrequent remeshing when unacceptable mesh distortions are reached. It is also desirable to isolate, if possible, a section of the domain in which remeshing should be avoided, and preserve the mesh in this region even as the mesh in the rest of the computational domain undergoes restructuring and data there are projected. It is clear that the linear projection will not affect a flow field with uniform velocity or uniform shear rate. On the other hand, the adverse effects of projection are most pronounced in regions where the shear rate varies rapidly, e.g., in boundary layers of internal shear layers. Sometimes, the location of such regions is predictable. For example, computation of fluid flow around a solid object might involve a structured, stiff mesh fragment around the object which is never subjected to remeshing and the associated projection process, thus preserving the flow field within its boundary layer. Outside this small area, even frequent remeshing may be allowed, as it will not degrade the quality of a slowly varying flow field.

Yet another possibility is available to us with the use of a completely unstructured mesh within the space-time slab. In the approaches discussed so far, the space-time mesh was formed by a straightforward extension of the spatial mesh in the time dimension. Such strategy yields three-dimensional six-node or eight-node prisms for two-dimensional problems discretized with triangular or quadrilateral elements, and four-dimensional bricks for three-dimensional problems. The important point is that the mesh generation burden is restricted to the spatial domain only. With n,,-simplex mesh in the spatial domain, it is also possible to connect two dissimilar meshes at the two time levels of a space-time slab with an unstructured (nsd + 1)-simplex mesh. The remeshing in this case is achieved in a continuous manner, as shown in Fig. 7. Unfortunately with the present state of affairs in the area of mesh generation, this strategy is not yet viable for three-dimensional problems. The mesh motion with no remeshing (Fig. 5) is utilized in the sloshing problem in Section 7.

5. Solution techniques

At each nonlinear iteration step of a finite element procedure, we have the task of solving a large linear equation system,

M. Behr, T.E. Tezcluyar, Finite element solution strategies 13

Ax=b. (24)

Initially A is assumed to be an N X N general matrix, while later we discuss special properties of the matrices arising from the finite element discretizations. The size of the system in our applications ranges from N = 5000 for small space-time test problems to N = 300 000 for some of the large-scale problems. Only in the lower part of this spectrum are the direct solution techniques feasible on contemporary computers. For larger problems the memory limitations, as well as rapidly increasing computing cost, make iterative solution techniques a necessity. It is therefore tempting to abandon the direct solver entirely, for the sake of code uniformity. Yet time and again, the direct solvers prove to be a necessary tool in at least the development stage of a finite element implementation. For example, to properly diagnose instabilities of a new variational form, we may have to approach and pinpoint the borders of the nonconvergence of the nonlinear iteration process. The robustness of the direct technique, which is lacking from the practical iterative techniques, makes such studies of nearly ill-posed problems possible. Iterative solvers are usually applied after the basic properties of the system matrix, often predicted by analysis, are confirmed numeri~lly with the aid of a direct solver.

Of particular interest here is the LU decomposition of the matrix A, and the Crout algorithm for achieving such decomposition. This method gives an option of efficient solution of a system with multiple right-hand sides, yet having a low operations count similar to the methods of Gauss and Gauss-Jordan. The multiple right-hand side feature can be useful when the nonlinear iterations are performed with a frozen left-hand side matrix to save matrix formation expense.

While the direct solution methods offer much better tolerance of unfavorable properties of the linear system than their iterative counterparts, they are not immune to failure. Aside from the obvious requirement of the nonsingularity, matrix A may also require pivoting to correct certain orderings of the equations, which otherwise would lead to numerical overflows. In the case of the stabilized methods discussed here, we are guaranteed that the symmetric part of the matrix A is positive definite. The design of the stabilization parameters ensures that the positive definiteness constant is greater than machine precision zero. Thus pivoting is not necessary in our implementations.

Another important effect of the equation ordering is the bandwidth of the matrix A. Figure 8 shows the position of nonzero entries of a typical matrix arising from a finite element discretization. It is natural to take advantage of the sparsity of such a matrix by employing the skyline matrix storage scheme, such as that described in Section 11.2 of [34]. In that scheme, only the entries below the first nonzero entry in each column of the upper-triangular part of A, and the entries to the right of the first nonzero entry in each row of the lower-triangular part, are stored. Some zero entries of the initial matrix A are still stored, as they become nonzero, or filled-in the process of the LU decomposition. The resulting arrangement of stored and ignored entries is often called a skyline profile of the matrix. The

Fig. 8. Typical finite element matrix: location of non-zero entries (left) and skyline profile (right) for a quasi-structured mesh (inset).

14 M. Be/w, T.E. Tezduyar, Finite element solution strategies

skyline profile of the example matrix is also illustrated in Fig. 8. Twice the average height of the column in a skyline matrix is called the mean bandwidth. The storage required for a skyline N x’N matrix with a mean bandwidth b is Nb. Significant changes in the bandwidth may result from simple reordering of the equations in the linear system. It is therefore worthwhile to seek an equation numbering that produces an optimal bandwidth for a given system. Normally, the initial equation numbering will correspond to the node numbering produced by the mesh generator. When a structured mesh generator is used, the equation ordering can be nearly optimal without any additional effort. To illustrate, the system pictured in Fig. 8 has been constructed from a quasi-structured finite element mesh. Un- fortunately, the more automatic the mesh generator becomes, the less control is exercised over the node ordering. This may have a disastrous effect, as the mean bandwidth b may approach N for randomly ordered meshes, leading to unmanageable storage requirements. The remedy is to use an empirical node renumbering scheme as an adjunct to the mesh generation step. Our implementations which rely on an automatic mesh generation incorporate a reverse Cuthill-McKee [35, 361 renumbering algorithm. Figure 9 shows the skyline profile of a typical finite eiement matrix resulting from a discretization via an unstructured grid. The nonzero entries, including the fill-in, account for 37% of the entire matrix. Application of the reverse Cuthill-McKee reordering yields a matrix representing the same equation system, but with dramatically different skyline profile, also shown in Fig. 9. Now the stored entries take up only 9% of the full matrix. Thus, an important four-fold reduction in storage requirement is accomplished.

In an effort to reduce the cost required to solve a large linear system, we often lower our sights from the exact solution and become content with a solution which approximates the exact one with a prescribed accuracy. Such approximation can often be found at a greatly reduced expense. The iterative methods achieve this aim by constructing a series of guesses approximating the true solution. The conjugate gradient (CG) method has been extremely successful when applied to symmetric linear systems. Discounting the truncation errors, this method is known to produce an exact solution in at most N iterations, where N is the order of the linear system. For a review of this and earlier iterative methods see, e.g., [37]. Although several generalizations of the CG algorithm to nonsymmetric linear systems exist, the Generalized Minimum Residual (GMRES) method introduced by Saad and Schultz [38] has proven to be the method of choice for many finite element researchers. The algorithm that follows was introduced by Saad in [39].

Fig. 9. Finite element matrix for an unstructured grid: skyline (right).

profile for the mesh (inset) before (left) and after reordering

M. Behr, T. E. Tezduyar, Finite element solution strategies 15

5.1. GMRES algorithm

The outline of the GMRES algorithm is shown in Box 1. Given the equation system (24) an initial guess n, and a pre~onditioner matrix M (to be discussed later), the process leads to an approximate solution vector x which minimizes the residual of the initial system over the preconditioned Krylov subspace. Each outer iteration involves solution of the linear system projected to a lower-dimensional subspace. The successive outer iterations differ in the quality of the initial guess x0 only.

The inner iteration loop constructs the Krylov space and projects the original equation system to this space, producing the matrix a which is by definition upper-Hessenberg. The modified Gramm-Schmidt orthogonalization of the basis vectors is typically the most ~omputationally expensive part of the algorithm for moderate values of m.

The solution of the reduced system takes advantage of the upper-Hessenberg form of a, as shown in Box 2. First the matrix H is reduced to an upper-triangular form by a series of Givens rotations. The

Box 1 GMRES algorithm: control flow

forI=l,...,n,,,,, r,:=b-Ax,

z, := M,?u, w :=Az, fori=l,...,j

h,.i := (w, u,)

x:=x* + EF=, yiz,

if @e, - Hy/, s E exit

I elsex,:=s

GMRES outer iterations compute initial residual compute initiaI residual norm define first Krylov vector GMRES inner iteration preconditioning step matrix-vector product Gramm-Schmidt orthogonaliation

define next Krylov vector define reduced system matrix solve reduced system - see Box 2

form approximate solution

convergence check restart

Box 2 ZMRES aigorithm: solution of the reduced system

p:=pe, initialize right-hand side forj=l,...,m

r:=qm

loop over columns of ti

c:=h,,,/y s:=b. ,+JY

:= -sh,,, + chj+l,j = 0 Givens rotation on I?

16 M. Eehr, T.E. Tezduyar, Finite elemenr solution strategies

optimal y is then found through a simple back-substitution on the transformed system. Note that the operations enclosed in frames in Box 2 are omitted from practical implementations.

It should be noted that the algorithm shown in Box 1 differs from the classical GMRES in two ways. In the classical GMRES, the Krylov space is continually expanded until convergence is reached. This leads to excessive memory requirements, and the advantage over a direct method of solution is lost. To alleviate this problem, in a practical GMRES algorithm, the dimension of the Krylov space m is limited a priori, and if this dimension proves insufficient, the algorithm is restarted in the next outer iteration. Alternatively, the orthogonalization process can be truncated to include only a given number of previous basis vectors, but this option is not pursued here. Unfortunately, these modified methods arc not guaranteed to converge, and the choice of the restart frequency has to be made carefully, usually by numerical experimentation. The current algorithm also differs from other GMRES derivatives in that it allows variable preconditioning at each inner iteration. This feature enables us to, e.g.. freely mix two different but complementary preconditioners, as described in [40].

In a more efficient (but less intuitive) version of the GMRES algorithm, the Givens rotations are computed and applied to the reduced system even as it is being constructed. This modi~cation gives us an option of discontinuing the inner GMRES iterations when proper convergence criterion is met within the inner iteration loop. Here, we use the fact that after j rotations applied to the reduced system, the current value of the norm II/3 e, - Hyll z is equal to the absolute value 1 pj +_, 1, as shown in [38]. The final algorithm adopted for present computations is thus presented in Box 3.

Box 3 GMRES algorithm: modified control flow

for I= 1, . , , noutcr r,, = 6 - Ax,,

P := l/~ul/Z u, = rOlp p:=pe, forj=l,...,m

z,:=M;‘v I w:=Az, fori=l,...,j

h ‘,, := fW? Vi) w := w - hf,,u

/ h , i l,ip;lI: v, / 1 111.1 fori=l,...,j-1

{

hi., := cA, + sA+ I.,

y :=~~~:;h::.r:.,+c.h,,,.,

c, : = ,,,,I;; ~,:=h,+,,,h

GMRES outer iterations compute initial residual compute initial residual norm define first Krylov vector initialize right-hand side GMRES inner iteration preconditioning step matrix-vector product Gramm-Schmidt orthogonalization

define next Krylov vector previous Givens rotations on fi

compute next rotation

h,,, := Y

h ,<,,,:=o

{

P, := C,P,

P,+1 := -s,p,

Givens rotation on J?

Givens rotation on p

if IJ~,+,(S’E exit loop inner loop convergence check

{~~1:= [“~ I:; :::1-‘{~:i backsubstitution

x:=x0 + c;_, yz if \~,+,ls: exi;iLop else x0:=x

form approximate solution outer loop convergence check restart


For equation systems stemming from the increment form of the variational formulation, the initial guess x0 is normally taken as zero.

In contrast to the direct methods of solution, the iterative schemes can be extremely sensitive to the particular numerical properties of the system matrix A, and the GMRES algorithm is no exception. The rate of convergence of Krylov subspace methods is influenced by the condition number of the matrix A, as shown in [41]. The analysis contained therein indicates also that the order of convergence will degrade as eccentricity of the ellipse containing the eigenvalues of the linear system decreases. In practical terms, this measure is smaller for problems dominated by advection effects, as indicated by the Reynolds number.

In most cases, a preconditioning of the original system will be needed to achieve reasonable convergence rates. Preconditioning involves a matrix M resembling in some sense the matrix A, but whose inverse is easy to compute. Assuming a constant preconditioner matrix for the duration of the iterative process (as mentioned earlier, variable preconditioning is also possible), the (right) preconditioned system becomes

AM-‘@&-) = b . (25)

The closer the matrix AM-’ is to identity, the better convergence one may expect from the iterative soIver. Once the solution Mx of (25) is obtained, the solution of the original system, i.e., the x vector itself, is easily computed.

Scaling of the linear system is a related concept. Here we construct a system

The matrix of the iteratively solved system becomes thus W-“2AW-“2. In contrast with the one-sided preconditioning, the scaled matrix retains the symmetry characteristics of the original system.

The simplest preconditioning method uses the diagonal part of the system matrix M = diag A. Such preconditioning removes the worst effects of the large variations in the magnitude of diagonal entries, which for diagonally dominant matrices is closely related to a high condition number. A diagonal scaling may be constructed for positive definite matrices with W = diag A as well. Diagonal preconditioning or scaling become less effective as the Reynolds number increases and off-diagonal entries assume importance. The more advanced preconditioning techniques such as the Element-By-Element (EBE) [42], Cluster-Element-By-Element (CEBE) [43, 441 and CEBE /Cluster Companion (CEBE /CC) [40] methods are not discussed here. The iterative implementation used here uses the diagonal scaling exclusively.

6. Parallel impleme~~tion

Implementation notes contained in this section follow the parallel program design outlined in [20]. We restrict the discussion to distributed memory Single Instruction/Multiple Data (SIMD) architectures, such as the Connection Machine range of supercomputers. It should be noted that the latest model in this hardware family is also capable of sustaining the Multiple Inst~ction/MuItip~e Data (MIMI)) type of control flow.

The starting point in the design of a massively parallel implementation is the decision regarding the distribution of variables among the processors of the computer. Since most of massively parallel architectures are distributed memory machines, the placement of the data is of paramount importance and can radically affect the performance of the implementation. We assume that the architecture that is used has a developed concept of virtual processors, so that each of the data entries that are subject to a parallel operation may be assumed to reside on its own processor. In reality, each physical processor takes over the duties of several virtual processors.

We use two primary data storage modes termed ELEM and EQN. The ELEM mode is used for storage of element level data, with one element and its degrees of freedom associated with exactly one virtual

18 M. Behr. T. E. Tezduyar. Finite element solution strategies

processor. On the other hand, the EQN mode will hold variables at the level of the global equation system. These two arrangements are shown schematically in Fig. 10 for a typical subset of a finite element mesh. The nodal data, coordinates, element-level properties and element level matrices are stored in the ELEM mode. The global increment vector, global residual and certain intermediate variables are kept in the EQN form. The mapping between the two datasets is denoted LM : ELEMw EQN. Communication operation ELEM +EQN is called a gather, while movement of the data in the opposite direction, ELEM + EQN, is known as a scatter. The scatter is usually coupled with a combining operation at the destination, such as addition or overwriting. Both gather and scatter may be implemented efficiently on the Connection Machine computers, provided they are done repeatedly with a static communication pattern as defined by LM. In that case, the communication trace may he optimized and saved the first time communication is performed, resulting in extremely fast subsequent gathers and scatters. The general nature of the scatter and gather operations sets no conditions on the connectivity of the mesh, allowing the use of totally unstructured meshes characteristic of the finite element method.

The bulk of the computations in an implicit finite element program will typically come from two stages. One is the formation of the element level matrices and residual vectors. Second is the solution of a linear system of equations, the components of which were constructed in the first stage. Efficient implementation of these two stages on a massively parallel architecture is the most important, as well as difficult, task.

The process of solution of the global linear system used here does not require the assembfy of the global matrix in any form. Instead, it operates on unassembled element level matrices. Therefore, apart from a simple assembly, or scatter, of the global residual vector from element-level residuals, the task of forming the equation system takes place entirely at the element level. Consequently. in the matrix formation phase, no inter-processor communication is involved and the parallelism of the operations

ELEM EQN

$Ni oo[IIo. 0000, 0000

\ processing node data element

Fig. 10. Parallel implementation: element-level (left) and equation-level (right) data storage modes


may be fully exploited. A more detailed description of the matrix formation phase, including a pseudocode fragment, is found in [20].

The solution of the linear system (24) typically consumes most of the computational resources in an implicit finite element program, and is also the most challenging part from the point of view of parallel implementation. The direct solution technique are extremely hard to implement efficiently on massively parallel computers, except for the special cases of, e.g., a uniformly banded matrix with very small bandwidth of 3-5 elements. Such matrices rarely appear in finite element codes. Efforts are underway to create more general parallel direct solvers employing the MIMD paradigm which by nature is more flexible than the SIMD model.

In contrast, the iterative solution techniques have proven significantly more adaptable to parallel architectures. The example in Section 7 employs a data parallel implementation of the GMRES algorithm described previously. The bulk of the computations in the GMRES algorithm shown in Box 3 will involve EQN structures only. These include the set of Krylov vectors (original and preconditioned) and the entries of the diagonal preconditioner. The computationally intensive task of Gramm-Schmidt orthogonalization of the Krylov vectors involves only on-processor operations and communication operations of the scan/reduce type. The latter are performed very efficiently on the architectures used here, being in fact one of the fastest inter-processor communication operations. The only interplay between the ELEM and EQN occurs when the matrix-vector product is to be computed. Such product involves a gather of EQN-level values into the ELEM dataset, followed by a communication-free on-processor matrix-vector product, and finally a scatter back into the EQN data set. The gather and scatter operations utilize the highly optimized Connection Machine Scientific Software Library (CMSSL) available on the CM-200 and CM-S.

In the current implementation, all variables related to the reduced system are stored on the scalar front-end processor, and the factorization and back substitution of that system is also performed in a scalar fashion. For moderate values of the Krylov space dimension m 6 50, it was observed that the solution of the reduced system consumed less than 3% of the time spent on the GMRES iterations, so the use of the comparatively slow front-end processor here is not an impediment.

See also the Ph.D. thesis of Johan [45] for another discussion of data parallel solution techniques and a description of a memory-saving matrix-free GMRES implementation.

It is important to note the rapid advancements in computational speed which are still seen in the universe of parallel computing. Most of the performance measurements quoted in the next section come from either a CM-200 computer, or an early version of CM-5. These early CM-5 machines are now being equipped with Vector Execution Units (VEU), which dramatically improve the pace of floating point operations. We have compared the speed of the VEU version of CM-5 to that of a CM-200 in a preliminary three-dimensional computation of an incompressible flow past a sphere at Reynolds number

Fig.11. Flow past a sphere: mesh surface for the benchmark problem.


100. Figure 11 shows the surface portions of the mesh for this problem, which uses 16 384 elements and 134 542 unknowns. On a per-node basis, we have obtained a speed-up of over 19 for the computational-

ly intensive system formation stage, and of 7 for the communication intensive GMRES stage. This resulted in an overall speed-up by a factor of almost 9. Note that ‘node’ refers to either a Weitek

floating point unit on CM-200, or a 4-VEU processing node of a CM-5. It is also important to realize, that while the communication libraries on the CM-200 were subjected to optimization efforts lasting several years, the development of their CM-S/VEU counterparts has just began. These results are

based upon a test version of the Thinking Machines Corporation software where the emphasis was on providing functionality and the tools necessary to begin testing the CM-5 vector units. This software

release has not had the benefit of optimization or performance tuning and, consequently, is not necessarily representative of the performance of the full version of this software.

7. Numerical example: sloshing in a tank subjected to vertical excitation

In this section, we perform a three-dimensional simulation of sloshing in a tank subjected to external

excitation. The objective of this exercise is to assess the capabilities of the space-time finite element

formulation as a method for handling deforming domain problems. Experimental and theoretical evidence [46, 471 indicates the existence of multiple solution branches when the horizontal cross-section of the tank is nearly square. Depending on the excitation frequency, the competing wave modes

interact generating complex periodic, as well as chaotic, wave behavior. The particular case considered here is based on the experiment performed by Feng and Sethna [47].

Water fills a container with a rectangular cross-section. The initial dimensions of the domain are

W X H X D with W= 0.1778 m (7 in), H = 0.18034 m (7.1 in) and D = 0.127 m (5 in). Side and bottom boundaries allow slip in the direction tangent to the surface. The top boundary is left free, and moves

with the normal component of the fluid velocity at the surface. The normal directions are computed using the consistent normal approach of Engelman et al. [48]. The external forces acting on the fluid

consist of a constant gravitational acceleration of magnitude g = 9.81 m ss2 and of a sinusoidal vertical excitation Ag sin ot with w = 2nf, f = 4.00 Hz and A such that the amplitude of the oscillations remains

at 1 mm. The problem is nondimensionalized, with the dimensional scales chosen so that the X- dimension length %’ and vertical acceleration g become unity. The nondimensional time step is

At”= 0.093, leading to 20 time steps per excitation period. At each time level, the domain is discretized using 7056 nodes and 6000 hexahedral trilinear elements. The space-time mesh for each time slab

consists of 14 112 nodes and 6000 quadrilinear four-dimensional brick elements. A GMRES solver with diagonal scaling was used to solve the linear system of equations at each

nonlinear iteration. Krylov space size of 40 was chosen, and maximum number of outer GMRES

iterations was 5 initially, and 10 for larger fluid motions. At each time step. an average of 3 nonlinear iterations were needed. The simulations were done in 64-bit precision on the CM-200 computer.

The computations start with a perturbed solution which has been obtained by carrying out the

simulation for 20 time steps (one excitation period), with additional force components applied in the x- and y-directions. These components have the same amplitude and frequency as the vertical one. At the end of the perturbation period, the surface of the fluid becomes deflected by about 7% from the original flat position. The additional forces are removed, and motion of the fluid continues influenced by the vertical excitation only. To get an idea of how much of the wave motion is contributed by the two competing modes (1,O) and (0, l), we show the wave height reading at points (x, y) = (W, H/2)

and (x, y) = (W/2, H). These points correspond to the location of probes in the experiment of Feng et al. [47], and are referred to as Al and A2, respectively. The Al amplitude identifies the (1, 0)-mode component and amplitude at A2 indicates the strength of the (0, I)-mode. The time histories of these two amplitudes are shown in Figs. 12 and 13. Even though the perturbation exhibits no preference for either of the two competing modes. i.e., (0,l) and (l,O), after its removal, we witness the growth of the (0, I)-mode.

A period of the wave motion is shown in Fig. 14. The elevation plots are viewed from the negative x-direction. In the z-direction, the visible bounding box extends from 0.5 to 0.9. Here the (0. l)-mode

M. Be&-, T.E. Tezduyar, Finite element solution strategies 21

0 20 40 60 80 100 120

Fig. 12. Vertically excited tank: time history of the wave height at point Al.

I i . . .,....._,_.,.. _.; .,.,.:_ ..,.

0 _ 5 -. _i ..‘....... i... :. < ._. _... i_ I

0.4 I , I I t 0 20 40 60 80 100 120

Fig. 13. Vertically excited tank: time history of the wave height at point A2.

is dominant, with a small, but non-zero, (l,O)-mode component. The two competing modes have a slight phase difference, leading to a small ‘rotating’ effect. The experimental data predicts the presence of a mixed mode (called MS1 in [47]) at this frequency, which contains a large (0, 1)-mode component and a small (1, 0)-mode component.

Performance tests were conducted for the space-time implementation, with the sloshing problem serving as a benchmark, yielding the following results. On the 32K-processor CM-200 computer, the formation of the element level matrices and residuals took place at 800 million 64-bit floating point operation per second (megaflops). The GMRES solution phase speed was observed as 872 megaflops. For a three-dimensional space-time formulation, the bulk of time taken by the GMRES solution process is consumed by the matrix-vector product. As discussed in Section 6, this product involves three components: gather, on-processor matrix vector multiplication and scatter. Only the second component contributes to the floating point operation count. The ratio of the imputation time consumed by the three stages was found to be 2 : 1: 2. Thus the 872 megaflops speed for the entire GMRES solver indicates that the on-processor matrix vector product is being computed near its peak speed of 5 gigaflops. The performance of the entire code, including the parallel input/output operations and problem setup, was 794 megafiops. As the speeds of the two major code parts, i.e., the matrix

22 M. Behr. T. E. Tezduyar, Finite element solution strategies

! I i

Fig. 14. Vertically excited tank: free surface view at I ~71.29, 71.76. 72.22 (top row), t=72.69, 73.15, 73.61 (middie row), t = 74.08, 74.54, 75.00 (bottom row).

formation stage and the GMRES solution stage, are comparable, the overall performance of the program should not be sensitive to the choice of parameters such as the number of outer or nonlinear iterations. Nevertheless, such sensitivity may exist in other situations.

8. Conclusions

We outlined the stabilized space-time velocity-pressure formulation which is applicable to problems involving moving boundaries and interfaces. In an effort to communicate our philosophy regarding the movement of the mesh, we discussed several distinct mesh moving schemes. Subsequently, we concentrated on the solution methods for the large linear systems of equations which arise from finite element dicretizations. Noting the continuing need for direct solution techniques, we entrusted the large scale simulations to the GMRES iterative method. The simulation of three-dimensional phenomena places a big strain on computing resources, and the efficient implementation of the finite element methods on the latest generation of computers becomes an important issue. To this end, we successfully employ massively parallel CM-200 and CM-5 machines, in conjunction with arbitrary unstructured linite element meshes. Here, we discussed some of the programming issues which arise from the use of distributed memory hardware. Finally, we presented the results from a simulation, which involved three-dimensional sloshing phenomena in a vertically excited tank.

References

[l] F.H. Harlow, J.E. Welch, J.P. Shannon and B.J. Daly, The MAC method, Report LA-3425, Los Alamos Scientific Laboratory, 1965.

]2] B.J. Daty, A technique for including surface tension effects in hydrodynamics calculations, J. Comput. Phys. 4 (1969) 97- 117.

[3] R.K.C. Chan and R.L. Street, A computer study of finite-amplitude water waves, J. Comput. Phys. 6 (1970) 68-94. [4] C.W. Hirt and B.D. Nichols, Volume of Fluid (VOF) method for the dynamics of free boundaries, J. Comput. Phys. 39

(1981) 201-225.


[5] C.W. Hirt, J.L. Cook and T.D. Butler, A Lagrangian method for calculating the dynamics of an incompressible fluid with free surface, J. Comput. Phys. 5 (1970) 103-124.

[6] T. Okamoto and M. Kawahara, Two-dimensional sloshing analysis by Lagrangian finite element method, Internat. J. Numer. Methods Fluids, 11 (1990) 453-477.

[7] C.W. Hirt, A.A. Amsden and J.L. Cook, An arbitrary Lagrangian Euierian computing method for all flow speeds, J. Comput. Phys. 14 (1974) 227-253.

[8] T.J.R. Hughes, W.K. Liu and T.K. Zimmermann, Lagrangian-Eulerian finite element formulation for incompressible viscous flows, Comput. Methods Appl. Mech. Engrg. 29 (1981) 329-349.

[9] A. Huerta and W.K. Liu, Viscous flow with large free surface motion, Comput. Methods Appl. Mech. Engrg. 69 (1988) 277-324.

[lo] A. Soulaimani, M. Fortin, G. Dhatt and Y. Ouellet, Finite element simulation of two- and three-dimensional free surface tlows, Comput. Methods Appl. Mech. Engrg. 86 (1991) 265-296.

(111 P. Jamet and R. Bonnerot, Numerical solution of the Eulerian equations of compressible flow by a finite element method which follows the free boundary and the interfaces, J. Comput. Phys. 18 (1975) 21-45.

(121 R. Bonnerot and P. Jamet, Numerical computation of the free boundary for the two-dimensional Stefan problem by space-time finite elements, J. Comput. Phys. 25 (1977) 163-181.

1131 D.R. Lynch and W.G. Gray, Finite element simulation of flow in deforming regions, J. Comput. Phys. 36 (1980) 135-153. [14] C.S. Frederiksen and A.M. Watts, Finite-element method for time-de~ndent incompressible free surface flows, J. Comput.

Phys. 39 (1981) 282-304. [15] P. Jamet, Galerkin-type approximations which are discontinuous in time for parabolic equations in a variable domain, SIAM

J. Numer. Anal. 15 (1978) 912-928. [16] T.J.R. Hughes and G.M. Hulbert, Space-time finite element methods for elastodynamics: Formulations and error estimates,

Comput. Methods Appl. Mech. Engrg. 66 (1988) 339-363. [17] P. Hansbo and A. Szepessy, A velocity-pressure streamline diffusion finite element method for the incompressible

Navier-Stokes equations, Comput. Meth~s Appt. Mech. Engrg. 84 (1990) 175-192. [lS] T.E. Tezduyar, M. Behr and J. Liou, A new strategy for finite element computations involving moving boundaries and -.

interfaces --The deforming-spatial-domain/space-time procedure: I. The concept and the preliminary numerical tests, Comput. Methods Appl. Mech. Engrg. 94 (1992) 339-351.

]I91

]201

WI

La

f231

v41

[251

WI

2271

WI

1291

[301

[311

T.E. Tezduyar, M. Behr, S. Mittal and J. Liou, A new strategy for finite element computations involving moving boundaries and interfaces - The deforming-spatial-domain/space-time procedure: II. Computation of free-surface flow, two-liquid flows, and flows with drifting cylinders, Comput. Methods Appi. Mech. Engrg. 94 (1992) 353-371. M. Behr, A. Johnson, J. Kennedy, S. Mittal and T.E. Tezduyar, Computation of incompre~ible flows with implicit finite element implementations on the Connection Machine, Comput. Methods Appl. Mech. Engrg. 108 (1993) 99-118. T.E. Tezduyar, M. Behr, S. Mittal and A.A. Johnson, Computation of unsteady incompressible flows with the finite element methods - Space-time formulations, iterative strategies and massively parallel implementations, in: New Methods in Transient Analysis, PVP, Vol. 246 (ASME, New York, 1992) 7-24. S. Mittal and T.E. Tezduyar, A finite element study of incompressible flows past oscillating cylinders and airfoils. Internat. J. Numer. Methods Fluids 15 (1992) 1073-1118. S. Mittal, Stabilized space-time finite element fo~ulations for unsteady incompressible flows involving fluid-body interactions, PhD thesis, University of Minnesota, 1992. P. Hansbo, The characteristic streamline diffusion method for the time-dependent incompressible Navier-Stokes equations, Comput. Methods Appi. Mech. Engrg. 99 (1992) 171-186. S. Aliabadi and T.E. Tezduyar, Space-time finite element computation of compressible flows involving moving boundaries and interfaces, Comput. Methods Appi. Mech. Engrg. 107 (1993) 209-223. J.P. Benque, A. Haugei and P.L. Viollet, Numerical methods in environmental fluid mechanics, in: M.B. Abbott and J,A. Cunge, eds., Engineering Applications of Computational Hydraulics, Vol. II (Pitman, London, 1982). A.N. Brooks and T.J.R. Hughes, Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations, Comput. Methods Appl. Mech. Engrg. 32 (1982) 199-259. T.J.R. Hughes, L.P. Franca and M. Balestra, A new finite element formulation for computational fluid dynamics: V. Circumventing the BabuSka-Brezzi condition: A stable Petrov-Galerkin formulation of the Stokes problem accommodating equal-order inte~olations, Comput. Methods in Appl. Mech. Engrg. 59 (1986) 85-99. T.E. Tezduyar, S. Mittai, S.E. Ray and R. Shih, In~mpressible flow computations with stabilized bilinear and linear equal-order-interpolation velocity-pressure elements, Comput. Methods Appl. Mech. Engrg. 95 (1992) 221-242. L.P. Franca and T.J.R. Hughes, Two classes of mixed finite element methods, Comput. Methods Appl. Mech. Engrg. 69 (1988) 89-129. M. Behr, L.P. Franca and T.E. Tezduyar, Stabilized finite element methods for the velocity-pressure-stress formulation of incompressible flows, Comput. Methods Appl. Mech. Engrg. 104 (1993) 31-48.

[32] L.P. Franca and S.L. Frey, Stabilized finite element methods: II. The incompressible Navier-Stokes equations, Comput. Methods Appl. Mech. Engrg. 99 (1992) 209-233.

1331 F. Hecht and E. Satei, Emc’ un iogiciel d’edition de maiilages et de contours bidimensionneis, RT no. 118, INRIA, 1990. [34] T.J.R. Hughes, The Finite Element Method. Linear Static and Dynamic Finite Element Analysis (Prentice Hall, Englewood

Cliffs, NJ 1987).


[3S] E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in: Proc. 24th National Conf. ACM, Vol. P-69 (1969) 157-172.

(361 J.A. George, Computer implementation of the finite element method. Technical Report STAN-CS-71-208, Computer Science Department, Stanford University. Stanford, California, 1971.

(371 D.M. Young, A historical overview of iterative methods, Comput. Phys. Comm. 53 (1989) 1-17. [38] Y. Saad and M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM

J. Sci. Statist. Comput. 7 (1986) 856-869. [39] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Statist. Comput. 14 (1993) 461-469. (401 T.E. Tezduyar, M. Behr, S.K. Abadi. S. Mittal and S.E. Ray, A new mixed preconditioning method for finite element

computations, Computer Methods Appl. Mech. Engrg. 99 (1992) 27-42. [41] Y. Saad, Krylov subspace methods for solving large unsymmetric linear systems, Math. Comp. 37 (1981) 105-126. [42] T.J.R. Hughes, J. Winget, I. Levit and T.E. Tezduyar, New alternating direction procedures in finite element analysis based

upon EBE approximate factorizations, in: S. Atluri and N. Perrone, eds., Recent Developments in Computer Methods for Nonlinear Solid and Structural Mechanics (ASME, New York, 1983) 75-109.

[43] J. Liou and T.E. Tezduyar. A clustered element-by-element iteration method for finite element computations, in: R. Glowinski et al., eds., Domain Decomposition Methods for Partial Differential Equations (SIAM. Philadelphia, PA. 1991) 140-150.

1441 J. Liou and T.E. Tezduyar, Clustered element-by-element computatj~~ns for fluid flow. in: Horst D. Simon, ed.. Parallel Computational Fluid Dynamics - Implementation and Results, Scientific and Engineering Computation Series (MIT Press. Cambridge, MA, 1992) 167-187.

(451 Z. Johan. Data Parallel Finite Element Techniques for Large-Scale Computational Fluid Dynamics. PhD thesis. Stanford University, 1992.

(461 F. Simonelli and J.P. Gollub. Surface wave mode interactions: Effects of symmetry and degeneracy. J. Fluid Mech. I99 (1989) 472-494.

[47] Z.C. Feng and P.R. Sethna, Symmetry-breaking bifurcations in resonant surface waves, J. Fluid Mech. I99 (1989) 495-518. 1481 M.S. Engelman, R.L. Sani and P.M. Gresho, The implementation of normal and/or tangential boundary conditions in finite

element codes for incompressible fluid flow. Internat. J. Numer. Methods Fluids 2 (1982) 225-238.

finite element solution strategies for large-scale flow ... · pdf filefinite element solution...

Documents