

Performance improvements and new solution strategies of Actran TM for nacelle simulations

Bernard Van Antwerpen∗ Yves Detandt † Diego Copiello ‡ Eveline Rosseel § and Eloi Gaudry ¶

Free Field Technologies www.fft.be

Axis Park Louvain-la-Neuve, Rue E.Francqui 9, B1435 Mont-St-Guibert, Belgium

Following two papers,¹,² new advances now allow higher frequencies to be reached in less time. The finite element code whose performance is benchmarked in this paper is dedicated software for solving the acoustic propagation and radiation of turbomachinery noise. The code allows the injection of duct modes representing the turbomachinery noise, and computes their propagation and free field radiation in the presence of a non-uniform mean flow. The modal duct basis enables the handling of reflected modes. In addition, acoustic treatments in the presence of a mean flow are accurately accounted for using the classical Myers boundary condition.⁶ Such treatments can be represented using admittance boundary conditions, or alternatively using transfer admittances to couple with treated back cavities, as done for non-locally reacting acoustic treatments. The code has already been presented and validated against theoretical results⁷,⁸ and against measurements acquired in fan rigs⁹ or on a real engine during ground tests.¹⁰

This paper presents improvements and different acoustic solution strategies that improve the performance of such computations. All improvements and solution strategies are demonstrated on a representative model of a nacelle intake including acoustic treatment, in realistic flow conditions (approach). All improvements are shown while ensuring constant or comparable accuracy. The performance improvements concern both the code and the computational input (mesh type, interpolation order, ...). This paper is divided into 3 sections. The first section reviews the influence of different meshing strategies (linear versus quadratic elements, use of hybrid meshes) on the existing solution. The influence on both the accuracy of the results and the performance is assessed. The second section reviews different performance improvements brought to the computational software without affecting the accuracy of the results. The last section reviews the newly implemented solution strategies and their influence on both accuracy and performance.

During the last 3 years, and in particular in the context of the development of Actran 15, many different alternatives have been implemented in order to improve the global efficiency of the computations. The integration of the MKL PARDISO⁵ solver offers an interesting alternative to the MUMPS solver. The differences in memory consumption, scalability and performance between the two solvers will be assessed. The MKL PARDISO solver is currently not available with MPI parallelism, but offers an interesting multithreading capability that is compared with the matrix parallelism capability of the MUMPS solver. The efficiency of the matrix parallelism available with the MUMPS implementation will be compared using different strategies, such as centralized versus distributed matrix analysis or different reordering tools. The influence of the different reordering tools used by both algebraic solvers on memory consumption and efficiency will be compared. The integration of new BLAS libraries used by the different algebraic solvers and their efficiency in multithreaded computations on different architectures will be evaluated. Finally, different performance improvements brought to matrix assembly will be evaluated in comparison with previous revisions.

∗ Validation and QA Manager, Free Field Technologies.
† Aeroacoustics Technology Manager, Free Field Technologies.
‡ Senior Application Engineer, Free Field Technologies.
§ Senior Product Development Engineer, Free Field Technologies.
¶ Senior Product Development Engineer, Free Field Technologies.


American Institute of Aeronautics and Astronautics

Downloaded by STELLENBOSCH UNIVERSITY on October 9, 2014 | http://arc.aiaa.org | DOI: 10.2514/6.2014-2315

20th AIAA/CEAS Aeroacoustics Conference

16-20 June 2014, Atlanta, GA

AIAA 2014-2315

Copyright © 2014 by Free Field Technologies, MSC Software Company. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.

AIAA Aviation


Besides HPC improvements, different alternative strategies for solving nacelle computations are presented and evaluated in this paper. A first, fully automated strategy decomposes symmetric models into symmetric and anti-symmetric solutions, which decreases the model size by a factor of 2 without affecting the accuracy. The handling of this decomposition on modal duct representations is presented. Finally, the use of Perfectly Matched Layer techniques for convected wave propagation will be introduced, which decreases the global size of the handled models and removes the need for the convergence analyses inherent to the use of infinite elements.

The evaluation of the above-mentioned alternative strategies for solving nacelle computations will be performed on a basic nacelle geometry featuring all the characteristics present in such applications. Specifically, the nacelle will be computed by considering the "Approach" certification condition prescribed by FAR 36, since this is typically the most demanding condition for nacelle inlets from an acoustic point of view. The "flight test" in this condition will be investigated. To impose this condition, the Mach number at the fan face is assumed equal to 0.35, whereas the nacelle is assumed to be flying at 70 m/s (i.e. in the far field the flow speed is set to a constant value of 70 m/s). The geometry selected corresponds to a mid-size aircraft. The fan internal and external diameters are 0.46 m and 0.85 m respectively. Moreover, to increase the complexity of the simulations, a scarf angle of 3° is considered in the geometry and a spliced acoustic liner is applied at the inner barrel. All above-mentioned improvements will be tested at the first blade passing frequency, which is assumed to be 2.5 kHz. The final objective will be to increase the computational frequency stepwise, using the retrieved guidelines, up to the second blade passing frequency, 5 kHz. This should ensure the validity of the efficiency indicators retrieved on smaller models.

This paper is an occasion to review classic computation guidelines. Its perspective is to enlarge the frequency range in which calculations on realistic models can be made. The final goal is to reach higher frequencies in an adapted HPC environment.

I. Short description of the code

A. Governing equations

Let us consider acoustic perturbations propagating on top of an irrotational homentropic mean flow in an unbounded region $\Omega$, external to a region $\Gamma = \Gamma_{\text{fan}} \cup \Gamma_{\text{soft}} \cup \Gamma_{\text{hard}}$. In the context of a turbofan nacelle, $\Gamma_{\text{fan}}$ is the fan face, $\Gamma_{\text{soft}}$ is the liner region, and $\Gamma_{\text{hard}}$ is a solid boundary. The unbounded region $\Omega$ is divided into an inner region, $\Omega_i$, discretised using finite elements and an outer region, $\Omega_o$, discretised with infinite elements. The problem is governed by the convected wave equation

$$ \frac{D_0}{Dt}\left(\frac{\rho_0}{a_0^2}\,\frac{D_0\phi}{Dt}\right) - \nabla \cdot \left(\rho_0 \nabla \phi\right) = 0 , \tag{1} $$

where $\phi$, $a_0$, $\rho_0$ and $\mathbf{v}_0$ denote the acoustic velocity potential, the local speed of sound, the local density and the mean flow velocity, respectively. The operator $D_0/Dt$ is defined as

$$ \frac{D_0}{Dt} = \frac{\partial}{\partial t} + \mathbf{v}_0 \cdot \nabla . \tag{2} $$

The problem is solved in the frequency domain assuming that the potential is harmonic

$$ \phi(\mathbf{x}, t) = \mathrm{Re}\left(\phi(\mathbf{x}, \omega)\, e^{i\omega t}\right) , \tag{3} $$

where $\omega = 2\pi f$ and $f$ is the frequency. The acoustic pressure, $p$, is related to $\phi$ through (see Eversman,¹¹ for example)

$$ p = -\rho_0 \left( i\omega\phi + \mathbf{v}_0 \cdot \nabla\phi \right) . \tag{4} $$

Equation (1) is solved in $\Omega_i$ with a finite element method (FEM), and in $\Omega_o$ with an infinite element method. The infinite element method implemented in Actran is an extension of a variable order Legendre polynomial formulation whose numerical performance has been extensively studied.⁷,⁸ The description of this extension for the convected case can be found in the Actran manual,³ for example.
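
As an illustrative consequence of Eqs. (1)-(3), and not a statement of how the code discretises the problem, consider the special case of a uniform mean flow ($\rho_0$, $a_0$ and $\mathbf{v}_0$ constant), an assumption made here only for the sake of the example. With the $e^{i\omega t}$ convention of Eq. (3), the operator of Eq. (2) becomes $i\omega + \mathbf{v}_0 \cdot \nabla$, and Eq. (1) reduces to a convected Helmholtz equation:

```latex
\frac{D_0}{Dt} \;\longrightarrow\; i\omega + \mathbf{v}_0 \cdot \nabla ,
\qquad
\frac{1}{a_0^2}\left(i\omega + \mathbf{v}_0 \cdot \nabla\right)^2 \phi - \nabla^2 \phi = 0 .
```

For $\mathbf{v}_0 = \mathbf{0}$ this is the classical Helmholtz equation $\nabla^2\phi + (\omega/a_0)^2\,\phi = 0$.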


B. Boundary conditions

1. Radiation boundary condition

The acoustic problem must satisfy the Sommerfeld radiation condition at large distance from the inlet. This is enforced through the use of infinite elements, which are based on the multipole expansion of the solution of Eq. (1) for a uniform mean flow. The order of the expansion directly governs the accuracy with which the boundary condition is enforced. More details on the numerical implementation can be found in the Actran manual.³

2. Admittance boundary condition

If acoustic liners are present, they are accounted for with a normal admittance boundary condition (also called soft-wall boundary condition) applied on $\Gamma_{\text{soft}}$, which relates the wall normal velocity, $i\omega u_n$, to the acoustic pressure:

$$ i\omega u_n = A\,p . \tag{5} $$

The relation between the normal wall displacement, $u_n$, and the normal acoustic velocity in the presence of flow is:⁶

$$ v_n = i\omega u_n + \mathbf{v}_0 \cdot \nabla u_n - u_n\, \mathbf{n} \cdot (\mathbf{n} \cdot \nabla)\mathbf{v}_0 . \tag{6} $$

The combination of Eqs. (5) and (6) in a finite element model is detailed in several documents.³,¹¹
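
As a short worked step, given only to illustrate how the two relations fit together (the cited documents give the actual finite element treatment), Eq. (5) yields $u_n = A\,p/(i\omega)$, which can be substituted into Eq. (6). The admittance $A$ is kept inside the gradient because it may vary along the wall:

```latex
v_n = A\,p
      + \frac{1}{i\omega}\,\mathbf{v}_0 \cdot \nabla (A\,p)
      - \frac{A\,p}{i\omega}\,\mathbf{n} \cdot (\mathbf{n} \cdot \nabla)\mathbf{v}_0 .
```

Without mean flow ($\mathbf{v}_0 = \mathbf{0}$) this reduces to the familiar relation $v_n = A\,p$.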

3. Fan face boundary condition

At the fan face, $\Gamma_{\text{fan}}$, the acoustic potential is expressed in terms of duct modes. For an annular duct, we have

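$$ \phi(r, \theta, z) = \sum_{m,n} \left( J_m(k_{rmn} r) + C\,Y_m(k_{rmn} r) \right) e^{im\theta} \left( A^{+}_{mn}\, e^{-i k^{+}_{zmn} z} + A^{-}_{mn}\, e^{-i k^{-}_{zmn} z} \right) , \tag{7} $$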

where $J_m$ and $Y_m$ are Bessel and Neumann functions of order $m$, respectively, $k_{rmn}$ is the radial wavenumber, and $k^{+}_{zmn}$ and $k^{-}_{zmn}$ are the longitudinal wavenumbers (associated with the so-called incident and reflected modes, respectively).

The values of $k_{rmn}$ and $C$ are such that Eq. (7) satisfies the hard wall boundary condition, $v_n = 0$. In this approach, the amplitudes of the incident duct modes are specified, and acoustic perturbations propagating from the nacelle aperture back to the fan face can pass through $\Gamma_{\text{fan}}$ without generating any reflection. They correspond to reflected modes whose amplitudes are found as part of the discrete solution.
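
To make the role of the longitudinal wavenumbers concrete, the sketch below evaluates the two roots for a single mode under an assumption that is not stated in the text: a uniform axial mean flow of Mach number `mach` in the duct. Inserting one term of Eq. (7) into Eq. (1) with the $e^{i\omega t}$ convention then gives $(1 - M^2)\,k_z^2 + 2 M k\,k_z + k_r^2 - k^2 = 0$ with $k = \omega/a_0$. The function names, argument names and sample values are hypothetical, and which root is the "incident" one depends on the orientation of the $z$ axis, which the sketch does not decide.

```python
import cmath
import math


def axial_wavenumbers(k, k_r, mach):
    """Return both roots k_z of (1 - M^2) k_z^2 + 2 M k k_z + k_r^2 - k^2 = 0.

    k    : free-field wavenumber omega / a_0 [rad/m]
    k_r  : radial wavenumber of the duct mode [rad/m]
    mach : uniform axial Mach number (assumption of this sketch), |mach| < 1
    """
    beta2 = 1.0 - mach**2
    # Complex square root so that cut-off modes (negative argument) are handled.
    root = cmath.sqrt(k**2 - beta2 * k_r**2)
    return (-mach * k + root) / beta2, (-mach * k - root) / beta2


def is_cut_on(k, k_r, mach):
    """A mode propagates without decay when the square-root argument is positive."""
    return k**2 > (1.0 - mach**2) * k_r**2


# Hypothetical sample values: 2.5 kHz, a_0 = 340 m/s, Mach 0.35.
k = 2.0 * math.pi * 2500.0 / 340.0
kz_a, kz_b = axial_wavenumbers(k, k_r=0.0, mach=0.35)
```

For the plane wave ($k_r = 0$) the two roots reduce to $k/(1+M)$ and $-k/(1-M)$, i.e. waves travelling with and against the flow.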

II. Strategy of inlet modelling

A. Studied model

The numerical model exploited in this paper is a basic nacelle geometry featuring all the characteristics present in such applications. Specifically, the nacelle is considered at the "Approach" certification condition prescribed by FAR 36, since this is typically the most demanding condition for nacelle inlets from an acoustic point of view. To impose this condition, the Mach number at the fan face is assumed equal to 0.35, whereas the nacelle is assumed to be flying at 70 m/s (i.e. in the far field the flow speed is set to a constant value of 70 m/s). The geometry selected corresponds to a mid-size aircraft. The fan internal and external diameters are 0.46 m and 0.85 m respectively. Moreover, to increase the complexity of the simulations, a scarf angle of 3° is considered in the geometry and a spliced acoustic liner is applied at the inner barrel. All these features are depicted in Figure 1.

B. Modelling strategy

The simulation chain is composed of two main steps: the flow field computation and the acoustic radiation computation. This is schematically depicted in Figure 2.


Figure 1. Acoustic model.

Figure 2. Actran TM simulation chain.

In this paper, the flow field is computed with the compressible flow solver available in the Actran software suite. Specifically, the solver computes the mean flow field for a compressible, stationary and irrotational flow, treating air as a perfect gas with constant heat capacity. Moreover, the far field condition is set at a distance of about 18 times the nacelle inlet diameter.

The computed mean flow provides important data needed for correct acoustic modelling. Indeed, the acoustic mesh must be locally refined in order to correctly capture the local wavelength; more detail is provided in the next section. Moreover, the infinite elements used for the far field propagation require an almost constant velocity at their location. The analysis of the flow field allowed the identification of a spherical region on which the local velocity deviates from the mean flow field by at most about 10 %. Therefore, this spherical region can be used as the far field boundary for the acoustic model, as depicted in Figure 1. It is worth noting that the 10 % deviation is mainly present at angles above 140° from the inlet axis and thus mainly affects the propagation at these angles. Nevertheless, the propagation at such angles is not considered in this paper. Indeed, to the authors' knowledge, the propagation at angles above 120° from the inlet axis is never considered for nacelle inlet acoustic design.


Finally, once the acoustic mesh is set up, the flow field must be interpolated onto the acoustic mesh in order to finalize the acoustic model. This is done using the ICFD utility available in the Actran software suite. The interpolated flow field is depicted in Figure 3.

Figure 3. Flow field computed by means of the Actran compressible flow solver and interpolated into the acoustic mesh.

III. Meshing strategy and influence

A. Introduction

The previous paper² concerning the performance of the software already covered the question of meshing strategy and its influence. It concluded that quadratic meshes are preferable to linear ones in terms of memory consumption, computational performance and accuracy of the solution. The use of hybrid meshes, made of a hexa core linked to a tetrahedral layer through pyramidal elements, also provided a slight improvement. This section aims to provide an accurate methodology for generating a valid mesh and to compare the different strategies of interest at higher frequencies.

All models investigated in this first section correspond to the nacelle inlet in the approach condition. Only a symmetric model is analyzed, computed at 1500 Hz.

B. Methodology

The generation of the different meshes is based on the model and strategy described in Section II. This means that the boundary of the volumes to be meshed remains constant. In this study, only unstructured meshes are handled, which can be produced with various commercial software packages. Structured meshes provide increased accuracy and tend to require fewer elements, but they take considerable effort to produce and are thus not considered. In order to compare the different meshing strategies on an equal footing, the meshes are generated with the same meshing tool under equivalent meshing conditions:

• Convected wavelength is accounted for;

• 8 grid points should be specified per convected wavelength. This corresponds to 4 quadratic elements or 8 linear elements per convected wavelength;

• Only a restrictive growth ratio of 1.1 is introduced in all algorithms;


• Conservative tetrahedral and triangular algorithms are used, respecting the provided element sizes;

• All configurations are compared with a converged element size (for higher frequencies), with an accepted difference of 0.1 dB.

In practice the model is divided into 5 different sections, corresponding to different thermodynamic situations in the approach condition. For each section, the most critical convection point is used for computing the convected wavelength, which is a conservative assumption. The following sections are thus derived:

• The fan entry;

• The inner barrel and the lip;

• The lip of the inlet;

• The surrounding acoustic environment;

• The far field condition.

The convected wavelength is computed individually for each section and each frequency of interest. Given that the total speed of sound $c_t$ and total fluid density $\rho_t$ are assumed constant and equal to 340 m/s and 1.225 kg/m³, the local static speed of sound $c_s$ can be computed using:

$$ c_s = \sqrt{c_t^2 - \frac{(\gamma - 1)\,v^2}{2}} \tag{8} $$

where $v$ corresponds to the local maximal mean velocity. Based on the local static speed of sound, the local convected wavelength is thus assumed to be:

$$ \lambda = \frac{c_s - v}{\mathrm{freq}} \tag{9} $$
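
Eqs. (8) and (9), combined with the rule of 8 grid points per convected wavelength, can be sketched as follows. This is a minimal illustration rather than the meshing tool's implementation: the value $\gamma = 1.4$ for air is an assumption (the text does not give it), the function names are hypothetical, and the far-field speed of 70 m/s is used as the local maximal velocity only for the sake of the example.

```python
import math


def static_speed_of_sound(c_total, v, gamma=1.4):
    """Eq. (8): local static speed of sound from the total speed of sound.

    gamma = 1.4 (air) is an assumption of this sketch; the text does not give it.
    """
    return math.sqrt(c_total**2 - (gamma - 1.0) * v**2 / 2.0)


def convected_wavelength(c_total, v, freq, gamma=1.4):
    """Eq. (9): shortest (upstream-convected) wavelength at frequency freq."""
    return (static_speed_of_sound(c_total, v, gamma) - v) / freq


def element_size(wavelength, order, points_per_wavelength=8):
    """Element edge length for a given number of grid points per wavelength.

    A linear element (order 1) spans one grid interval and a quadratic element
    (order 2) spans two, giving 8 linear or 4 quadratic elements per wavelength.
    """
    return wavelength * order / points_per_wavelength


# Values quoted in the text: c_t = 340 m/s, far-field speed 70 m/s, 1500 Hz.
lam = convected_wavelength(340.0, 70.0, 1500.0)
h_quadratic = element_size(lam, order=2)
h_linear = element_size(lam, order=1)
print(f"lambda = {lam:.4f} m, h_quad = {h_quadratic:.4f} m, h_lin = {h_linear:.4f} m")
```

Evaluating the wavelength per section with the largest local velocity, as described above, gives the smallest and therefore most conservative element size.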

C. Models

For the handled configuration, 5 different mesh types will be investigated:

• Linear tetrahedral mesh;

• Quadratic tetrahedral mesh;

• Quadratic tetrahedral mesh with alternative less conservative meshing algorithm;

• Quadratic hybrid mesh with a hexa core and a mix of pyramidal and tetrahedral elements to fill in theborders, involving only the exterior section;

• Quadratic hybrid mesh including the fan entry.

The same meshing requirements are imposed on the different meshing algorithms, both in terms ofelement size and aspect ratio. The different obtained meshes are shown in Figure 4.

D. Results

The different models are first compared in terms of accuracy. For this purpose, the sound pressure level (SPL), expressed in dB, is plotted along the symmetry plane over an angle of 120 degrees, at a distance of 15 m. All handled models are computed at 1500 Hz and compared to a converged model, using a mesh generated for 2000 Hz.

Figure 4. Different mesh types: linear tetrahedrons, quadratic tetrahedrons, second quadratic algorithm, hybrid mesh covering the exterior field only, hybrid mesh covering the entire model.

The results for the plane wave and the azimuthal order 19 for all models are shown in Figures 5 and 6 respectively. Results obtained using a linear mesh can be considered as not converged, since a difference higher than 0.5 dB is observed along the directivity plot. All quadratic models, however, show good convergence for both excitations, which indicates that the selected meshing rule of 4 quadratic elements per convected wavelength is sufficiently accurate in the far field. For both load cases, the different quadratic models are within 0.1 dB of the reference at the peak directivity. The coarser tetrahedral model and both hybrid models show lower convergence compared to the reference, but remain in an acceptable range.
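
The comparison criterion used here can be sketched as follows. The snippet converts complex pressure amplitudes to SPL and reports the largest deviation from a reference directivity over the sampled angles. It is an illustration only: the reference pressure of 20 µPa and the peak-to-RMS conversion are the usual conventions for airborne sound and are assumptions here, and the function names and sample values are hypothetical rather than taken from the paper's post-processing.

```python
import math

P_REF = 20e-6  # Pa, conventional reference pressure in air (assumption)


def spl_db(p_amplitude):
    """SPL in dB from a complex harmonic pressure amplitude (peak value)."""
    p_rms = abs(p_amplitude) / math.sqrt(2.0)
    return 20.0 * math.log10(p_rms / P_REF)


def max_spl_difference(p_model, p_reference):
    """Largest absolute SPL difference (dB) between two directivities
    sampled at the same observation angles."""
    return max(abs(spl_db(a) - spl_db(b)) for a, b in zip(p_model, p_reference))


# Hypothetical pressures at three angles; the model is 1 % above the reference.
reference = [1.0 + 0.5j, 0.8 - 0.2j, 0.3 + 0.1j]
model = [1.01 * p for p in reference]
converged = max_spl_difference(model, reference) <= 0.1  # 0.1 dB acceptance
```

A 1 % amplitude error corresponds to slightly less than 0.1 dB, which gives a feel for how strict the acceptance threshold is.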

Figure 5. Result comparison between the different models for a plane wave excitation

The computational performance for each handled model is shown in Figure 7. The comparison of linear and quadratic models shows that, for an equivalent number of grid points or degrees of freedom, the overall computational time is longer for a linear model. Given the lack of convergence obtained with linear elements, they are not recommended. The use of hybrid meshes reduces the model size for an equivalent accuracy. In this case, a 35% reduction in degrees of freedom is obtained in comparison with a tetrahedral model. This is explained by the hexa core, which allows an optimal filling of the interior volume while maintaining accuracy, since hexahedral elements offer a lower relative error than tetrahedral elements. Increasing the number of sections meshed with a hybrid mesh inherently reduces the model size and thus the total memory consumption and computational time.

All further computations in this paper will be performed using a hybrid mesh including the fan entry.


Figure 6. Result comparison between the different models for an azimuthal mode of order 19

Figure 7. Performance comparison of different meshing methodologies

IV. High performance computing

A. Introduction

The performance of an acoustic computation depends on various parameters, including modelling strategy, meshing strategy and computational aspects. This section of the paper focuses on the solver and software improvements brought to the code and evaluates their respective performance. The following parameters are assessed:

• Solver choice;

• BLAS and CPU affinity;

• Use of Multithreading for the different solvers;

• Use of matrix parallelism.

All these parameters are tested on an approach model valid up to 2000 Hz, using a hybrid quadratic mesh over the complete model. This model contains 1.2 million degrees of freedom. For each configuration, it is ensured that the obtained solutions are strictly identical.


B. Solver comparison

Up to now, only the MUMPS direct solver was available within the software to address large model sizes. MUMPS is a MUltifrontal Massively Parallel sparse direct Solver. The version used is a public domain package based on public domain software developed since 1999 by CERFACS, ENSEEIHT-IRIT, and INRIA Rhone-Alpes. This project is partially funded by CEC ESPRIT IV long term research project – No. 20160 (PARASOL).

The next release of the software also provides an interesting alternative to the MUMPS solver: the MKL PARDISO solver (Parallel Direct Sparse Solver Interface). This solver has the advantage of including a multithreaded implementation, whereas the MUMPS solver provides an MPI implementation but can use multithreading in BLAS calls during the factorization stage. It can thus be considered a complementary alternative to the MUMPS solver. Moreover, the PARDISO solver is known to be memory efficient, and this point will be considered during the performance comparison. Both solvers offer an in-core (IC) and an out-of-core (OOC) implementation, each with a minimal required memory estimated during the analysis phase of the solver.

When handling unsymmetric linear systems of equations, a reordering tool is also used prior to the direct solver. This tool renumbers and reorders the system provided to the direct solver and thus affects its memory consumption and computational time. For the MUMPS solver, different reordering tools are available: PORD, METIS and SCOTCH. For the PARDISO solver, only the METIS reordering tool is available. Additional reordering tools are made available when running in parallel or multithreaded mode, but will be discussed later.

The performance of both solvers in sequential configurations is shown in Figure 8. The reported computational time is the total time for the computation, including assembly and post-processing of the output quantities. The reported memory consumption is the highest memory consumption of the computation.

The computational time and memory consumption of the MUMPS solver are highly dependent on the reordering tool used, both in IC and OOC configurations. The SCOTCH reordering tool appears to be the best choice for such acoustic problems, offering a reduction of the computational time of 37% or 19% compared to PORD and METIS respectively. The total memory consumption is also reduced, which supports the view that this reordering tool is the best suited to inlet problems. The single-thread computational time of the PARDISO solver is higher than that of the MUMPS solver using the SCOTCH reordering tool, but remains below that obtained with the two other reordering tools. In terms of memory consumption, however, the PARDISO solver consumes less memory, both in IC and OOC configurations. In OOC configuration, the total memory consumed (including an identical model assembly) is 3 times lower using the PARDISO solver than with the MUMPS solver. Despite its higher computational time, this higher memory efficiency can become very important when handling larger model sizes.

Figure 8. Performance comparison of different solvers and re-ordering tools


C. CPU affinity and BLAS

An important aspect of high performance computing is the use of BLAS (Basic Linear Algebra Subprograms) libraries during CPU-demanding steps such as the factorization and back-transformation steps. These libraries are used within the direct solver for vector-vector, matrix-vector and matrix-matrix operations, and are adapted to different processor architectures. In this section, the interaction of these BLAS libraries with different Intel architectures and their efficiency are evaluated.

The BLAS used within the software is the Intel MKL version. Its efficiency on different Intel architectures is shown in Figure 10. The different architectures tested in this section are part of a 2-way server (using 2 multicore processors) and are shown in Figure 9.

As the different machines involved in this section have different clock speeds, the obtained solution times are scaled to the maximal clock speed of 3 GHz in Figure 11. Note that sequential performance cannot be compared easily, as the newer architectures (starting from Sandybridge) use a "Turbo" mode that increases the core frequency when not all cores are working simultaneously.
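The clock-speed normalization can be sketched as follows. This is only an illustration: it assumes that the solution time is inversely proportional to the clock frequency, which is our assumption and not a rule stated in this paper, and the function name and sample timing are hypothetical.

```python
REFERENCE_CLOCK_GHZ = 3.0  # reference clock speed used for the scaled comparison


def scale_time_to_reference(measured_time_s, clock_ghz, reference_ghz=REFERENCE_CLOCK_GHZ):
    """Rescale a measured wall time to the reference clock speed.

    Assumption: the time is inversely proportional to the clock frequency,
    so a run on a slower machine is credited with a proportionally shorter time.
    """
    if clock_ghz <= 0 or reference_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return measured_time_s * clock_ghz / reference_ghz


# Hypothetical example: 1200 s measured on a 2.5 GHz machine.
scaled = scale_time_to_reference(1200.0, 2.5)
print(f"time scaled to 3 GHz: {scaled:.0f} s")
```

Such a linear rescaling ignores the "Turbo" mode mentioned above, which is one reason why sequential timings remain difficult to compare.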

The first observation is that recent architectures evolve through an increase in the number of cores, while the maximal clock speed is either reduced or left unchanged. When using multithreading, this inherently improves the speedup of the computation. This is observed on Ivybridge and Sandybridge, two machines of similar generation, where the multithreaded solution on Ivybridge is faster than on Sandybridge, which is not the case in sequential mode. The evolution of the architecture also shows a very positive improvement in terms of computational time, even with a reduced clock speed. The speedup observed between Clovertown and Ivybridge or Sandybridge is about a factor of 2. The same factor can be observed between Ivybridge and Haswell. Using all threads, for both the MUMPS and the PARDISO solver, shows an increasing speedup on more recent machines due to the larger number of available processors, despite their lower efficiency.

D. Usage of Multithreading

Multithreading consists in the parallel use of different cores of different processors sharing memory within the same instance of the OS. It is thus restricted to intra-node (or intra-machine) operations. In practice, multithreading is used only within the direct solver. The PARDISO solver is itself multithreaded and uses all available cores during the factorization and backtransformation steps. The MUMPS solver does not itself provide a multithreaded implementation, but the BLAS operations called by the solver can use multithreading. The speedup obtained is thus expected to be larger with the PARDISO solver.

In practice, multithreading is used to speed up the computations using a large number of cores without affecting the memory consumption. In order to evaluate the performance of a solution, the two following quantities are used in this paper, both for multithreading and parallelism:

Speedup:

$$ S(n) = \frac{\mathrm{time}(1\,\mathrm{CPU})}{\mathrm{time}(n\,\mathrm{CPU})} \tag{10} $$

Efficiency:

$$ E(n) = \frac{S(n)}{n} \tag{11} $$

The speedup indicates the reduction in computational time as the number of cores increases. The efficiency indicates the overall workload of each processor. As the speedup increases when using more threads or processes, the useful workload handled by each process decreases. When optimizing the computational time of a single computation, it is mainly the speedup that is of interest, while the efficiency is a good indicator for choosing the type of computation to launch when multiple computations are concerned.
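As an illustration of the two indicators of Eqs. (10) and (11), the following minimal sketch evaluates them on a set of wall times. The timings are hypothetical sample values chosen for the example, not measurements from this paper.

```python
def speedup(time_1cpu, time_ncpu):
    """Speedup S(n) = time(1 CPU) / time(n CPU), Eq. (10)."""
    return time_1cpu / time_ncpu


def efficiency(time_1cpu, time_ncpu, n):
    """Efficiency E(n) = S(n) / n, Eq. (11)."""
    return speedup(time_1cpu, time_ncpu) / n


# Hypothetical wall times in seconds, indexed by the number of threads.
timings = {1: 4000.0, 2: 2200.0, 4: 1250.0, 8: 750.0, 20: 400.0}

for n, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, n)
    print(f"n={n:2d}  S(n)={s:5.2f}  E(n)={e:4.2f}")
```

With these sample values, a speedup of 10 on 20 threads corresponds to an efficiency of 0.5: the computation is ten times faster, but each thread is only used at half of its ideal capacity.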

Figure 9. The different processor architectures used (columns: machine, configuration, architecture).

The computation time and maximal memory consumption for a varying number of threads with the MUMPS and PARDISO solvers are shown in Figure 12. For all computations, the memory consumption does not vary significantly when increasing the number of threads. The same conclusions as before are confirmed: the memory consumption of the PARDISO solver is much smaller than that of the MUMPS solver, which is itself most efficient when using the SCOTCH reordering tool.

Figure 10. Performance comparison on different architectures

Figure 11. Performance comparison on different architectures at constant speed

For in-core computations, the overall speedup is much larger for the PARDISO solver (about 10 times faster on 20 threads) than for the MUMPS solver (5.7 times faster). The efficiency decreases in all configurations as the number of threads increases. With the PARDISO solver, the final efficiency is about 0.5, while it reaches 0.3 with the MUMPS solver.

For out-of-core computations, the speedup obtained using the PARDISO solver decreases above 8 threads, while it remains of the same magnitude using the MUMPS solver. For a computation requiring the entire memory of a particular machine, it can usually be recommended to use the maximum number of available threads in order to obtain the best speedup.

E. Usage of Parallelism

An alternative (or complement) to multithreading consists in using an MPI implementation of the direct solver, such as the matrix parallelism available within the MUMPS solver. While multithreading easily decreases the computational time on a given machine at constant memory consumption, it does not allow the use of several machines, nor does it reduce the local memory consumption on a particular machine. The current MPI implementation consists in a domain decomposition of the model, with a distributed assembly of the impedance matrix on the different processes. The distributed matrix is provided to the direct solver, which solves it dynamically on the different processes using matrix parallelism. More information on the parallel implementation of the MUMPS solver can be found on the MUMPS webpage.4

The reordering tools available for parallel computations are SCOTCH and METIS. As these reordering tools are sequential, the analysis phase of the direct solver can only run sequentially. An additional parallel implementation of the SCOTCH reordering tool, PT-SCOTCH, allows this step to be performed in parallel.

Figure 12. Performance comparison using different numbers of threads

This method decreases the memory consumption both during the assembly phase of the solver and during the factorization and backtransformation steps. It can also reduce the computational time, but this inherently depends on the type of switch connecting the different machines and on the model balancing. In our tests, matrix parallelism is used on a single machine to simplify the comparison with multithreading. Finally, through its reduced memory consumption per process, this methodology makes it possible to run large models on machines with less memory.

The computation time and maximal memory consumption when increasing the number of processes using the SCOTCH and METIS reordering tools are shown in Figure 13. As for the sequential computations, the SCOTCH reordering tool is the best suited for solving this type of problem. Using the SCOTCH reordering tool, the computational time shows the same trend as when using multithreading. Below 8 processes, the total computational time using multithreading is lower than with parallelism. Above 8 processes, the computational time, both in IC and OOC implementations, is smaller using the MPI implementation. The memory consumption per process is reduced by approximately 33% each time the number of processes is doubled, up to 8 processes. Above 8 processes, the memory reduction per process is less pronounced. The overall memory consumption, summed over all processes, is however larger than for a sequential computation, as seen in Figure 15. This indicates that increasing the number of processes on a single machine leads to a higher total memory consumption and will not be practical for larger models.
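The interplay between the per-process and the total memory can be made explicit with a small idealized model. The sketch below assumes that every doubling of the number of processes removes exactly 33% of the per-process memory, which is only an approximation of the trend reported up to 8 processes; the sequential footprint of 100 units is a hypothetical value.

```python
import math


def memory_per_process(mem_sequential, n_processes, reduction_per_doubling=0.33):
    """Per-process memory if each doubling of the process count removes ~33%."""
    doublings = math.log2(n_processes)
    return mem_sequential * (1.0 - reduction_per_doubling) ** doublings


def total_memory(mem_sequential, n_processes, reduction_per_doubling=0.33):
    """Memory summed over all processes running on the same machine."""
    return n_processes * memory_per_process(
        mem_sequential, n_processes, reduction_per_doubling
    )


mem_seq = 100.0  # hypothetical sequential memory footprint, arbitrary units
for p in (1, 2, 4, 8):
    per_proc = memory_per_process(mem_seq, p)
    total = total_memory(mem_seq, p)
    print(f"processes={p}  per process={per_proc:6.1f}  total={total:6.1f}")
```

Under this idealized model, the per-process footprint at 8 processes is about 30% of the sequential one, while the total is about 2.4 times larger. This is why matrix parallelism helps on distributed machines but increases the memory pressure when all processes share a single machine.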

The comparison between a sequential and a parallel analysis of the linear algebraic system performed by MUMPS prior to the factorization is shown in Figure 14. The objective of performing a parallel analysis is to reduce the memory consumption and computational time of this analysis step. The resulting load balancing between the different processors however does not seem as good as with the sequential analysis, and the parallel analysis will not be considered in further computations.

F. Intermediate Conclusions

Different computational parameters have been evaluated on a typical model: solver choice, reordering tool choice, CPU affinity, usage of multithreading or matrix parallelism.

Figure 13. Performance comparison using different numbers of processes with the SCOTCH and METIS reordering tools

Figure 14. Performance comparison using different numbers of processes with centralized or parallel analysis

The two direct solvers used, MUMPS and PARDISO, offer two distinct alternatives. The MUMPS solver provides an MPI implementation and can also run multithreaded, while the current implementation of the PARDISO solver is restricted to multithreading (and thus runs on a single machine only). To the authors' knowledge, an MPI implementation of the MKL PARDISO solver could be made available in the future. The computational times observed using the MUMPS and PARDISO solvers in the same multithreading configuration are of the same order, the PARDISO solver offering better scalability with more threads in-core, but poorer scalability when running out-of-core. For running larger models on a single machine however, the PARDISO solver has a clear advantage in terms of memory consumption. For a distributed IT environment with smaller machines (typically, a CFD cluster), the MPI implementation of the MUMPS solver makes it possible to run larger models through the use of parallelism.

The different reordering tools have been tested with the MUMPS solver. For nacelle inlet models, the SCOTCH reordering tool provides the best efficiency and memory consumption. Using the parallel implementation of the SCOTCH reordering tool (PT-SCOTCH) could provide an interesting alternative, but it is too unstable in the current implementation.

The comparison of different architectures has shown an important improvement of the solver performance on more recent architectures, despite their lower clock speed. In single-thread configurations, the computation time is reduced compared to older architectures. Since these more recent architectures provide more cores, the speedup obtained with multithreading is also increased, despite the lower efficiency of the computation.

Figure 15. Total memory consumption as a function of the number of processes

The usage of multithreading and matrix parallelism has been tested and compared. In terms of CPU time, both solutions scale with the same order of magnitude. On a large environment, if enough memory is available, the best configuration is to run in parallel on a number of processes equal to the number of sockets, with a number of threads per process corresponding to the number of cores of each socket. This can be seen in Figure 16, where the lowest computational time on an architecture with 2 sockets is obtained using two MPI processes. While the memory consumption using multithreading remains almost constant, the memory consumption per process when using matrix parallelism is reduced by about 33% each time the number of processes is doubled, up to 8 processes. Depending on the type of environment (large machine versus distributed systems), both solutions can be of interest.
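The guideline of one MPI process per socket and one thread per core can be sketched as follows. The helper names are hypothetical, and the machine with 2 sockets of 10 cores each is an assumed example; how the process and thread counts are actually passed to the solver is not covered here.

```python
def candidate_layouts(total_cores):
    """All (processes, threads per process) pairs that use every core exactly once."""
    return [
        (p, total_cores // p)
        for p in range(1, total_cores + 1)
        if total_cores % p == 0
    ]


def recommended_layout(n_sockets, cores_per_socket):
    """One MPI process per socket, one thread per core of that socket."""
    if n_sockets < 1 or cores_per_socket < 1:
        raise ValueError("socket and core counts must be positive")
    return n_sockets, cores_per_socket


# Assumed example machine: 2 sockets with 10 cores each.
sockets, cores = 2, 10
best = recommended_layout(sockets, cores)
for layout in candidate_layouts(sockets * cores):
    marker = "  <- recommended" if layout == best else ""
    print(f"{layout[0]:2d} processes x {layout[1]:2d} threads{marker}")
```

The enumeration lists the combinations that fill the machine; among these, the pure multithreading case (1 process) keeps the memory constant, while the layouts with many processes trade a lower memory per process for a higher total memory.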

Figure 16. Performance when combining matrix parallelism with multithreading

V. Solution strategies

A. Introduction

This section presents new solution strategies for solving nacelle computations. In a standard nacelle inlet model, the entire volume is modeled using acoustic elements while the far field propagation is ensured using infinite elements, both supporting convected wave propagation. Depending on the model characteristics, two strategies can be used to reduce its size:

• If the model exhibits higher-order periodicities, a periodic boundary condition reduces the model size by up to the order of periodicity. Because the model considered here is assumed to be scarfed, this strategy will not be presented in this paper.

• If the model is symmetric, an automated method has been implemented that exploits this property to reduce the final model size; this methodology is presented here.

Besides the strategies accounting for inherent symmetries or periodicities of the model, a further alternative to the classic modeling strategy consists in replacing the infinite elements with a different non-reflective boundary condition that is valid for convected wave propagation. A new type of component, the Perfectly Matched Layer (PML), has been upgraded in the next version of the software to account for convected wave propagation. This new component will be presented and compared to the infinite elements.

B. Symmetric and anti-symmetric recombination

1. Strategy

For models that show planar symmetry, it is often beneficial to compute separately the symmetric and anti-symmetric parts of the solution. This allows the use of half-models, which reduces memory requirements and also total CPU time for methods whose CPU time scales more than linearly with the number of degrees of freedom (DOF). This is the case for direct solvers, whose memory requirements typically scale like $N \cdot B$ and whose CPU time scales like $N \cdot B^2$, with $N$ the number of DOFs and $B$ the bandwidth. Cutting the model in two along the symmetry plane typically divides both the number of DOFs and the bandwidth by a factor of two, so performing two computations on a half-model typically reduces the memory usage by a factor of four and the CPU time by a factor of two, compared to a single simulation on the full model. In order to obtain the full solution for an excitation such as an incident rotating duct mode, it is necessary to split the excitation into a symmetric part and an anti-symmetric part. Assuming that y-z is the symmetry plane, the duct modes (7) can be expressed as

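$$ \phi(r, \theta, z) = \sum_{m,n} \left( J_m(k_{rmn} r) + C\, Y_m(k_{rmn} r) \right) \left( \cos(m\theta) + i \sin(m\theta) \right) \left( A^{+}_{mn} e^{-i k^{+}_{zmn} z} + A^{-}_{mn} e^{-i k^{-}_{zmn} z} \right) \tag{12} $$

with

$$ x,\ y,\ z = r \cos(\theta),\ r \sin(\theta),\ z . \tag{13} $$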

Due to the symmetry of the model, considering only the cosine part of the excitation ensures that the solution will be symmetric ($\phi(x, y, z) = \phi(-x, y, z)$), while considering only the sine part ensures that the solution will be anti-symmetric ($\phi(x, y, z) = -\phi(-x, y, z)$). We thus solve the cosine part and the sine part independently on a half-model ($x \geq 0$):

• symmetric computation: the duct modes contain only the cosine part, and the natural boundary condition on the symmetry plane ensures that the normal derivative of the velocity potential field vanishes. The solution is called $\phi_{\mathrm{sym}}(x, y, z)$;

• anti-symmetric computation: the duct modes contain only the sine part, and a Dirichlet boundary condition $\phi = 0$ is enforced on the symmetry plane $x = 0$. The solution is called $\phi_{\mathrm{asym}}(x, y, z)$.

When the two half-model velocity potential fields are known, linearity allows the full velocity potential field to be reconstructed:

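$$
\begin{aligned}
\phi(x, y, z) &= \phi_{\mathrm{sym}}(x, y, z) + \phi_{\mathrm{asym}}(x, y, z) && \forall x > 0 \\
\phi(0, y, z) &= \phi_{\mathrm{sym}}(0, y, z) \\
\phi(x, y, z) &= \phi_{\mathrm{sym}}(-x, y, z) - \phi_{\mathrm{asym}}(-x, y, z) && \forall x < 0
\end{aligned}
\tag{14}
$$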
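The recombination of Eq. (14) can be checked on a simple one-dimensional example. In the sketch below, the even and odd parts of a hypothetical test field stand in for the results of the symmetric and anti-symmetric half-model computations; the test field and function names are illustrative assumptions and do not represent the actual duct-mode solution.

```python
import cmath


def full_field(x, k=3.0):
    """Hypothetical 1D test field containing both even and odd parts in x."""
    return cmath.exp(-1j * k * x) + 0.5 * x


def phi_sym(x):
    """Even part, standing in for the symmetric half-model result (x >= 0)."""
    return 0.5 * (full_field(x) + full_field(-x))


def phi_asym(x):
    """Odd part, standing in for the anti-symmetric half-model result (x >= 0)."""
    return 0.5 * (full_field(x) - full_field(-x))


def recombine(x):
    """Rebuild the full field from half-model data following Eq. (14)."""
    if x > 0:
        return phi_sym(x) + phi_asym(x)
    if x == 0:
        return phi_sym(0.0)
    # For x < 0 the half-model fields are read at the mirrored point -x.
    return phi_sym(-x) - phi_asym(-x)


points = [-1.0, -0.25, 0.0, 0.25, 1.0]
max_error = max(abs(recombine(x) - full_field(x)) for x in points)
print(f"max reconstruction error: {max_error:.2e}")
```

Note that `recombine` only ever evaluates the two half-model fields at non-negative coordinates, which mirrors the fact that the results are only available on the half-model.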

2. Results

All models shown previously only accounted for the symmetric contribution of the symmetric and anti-symmetric decomposition. This is acceptable as we are mainly interested in the computational time and convergence of the studied model. It is however only justified as long as this decomposition is a valid substitute for a complete 3D solution. Figure 17 shows the comparison of a symmetric and anti-symmetric decomposition with a complete 3D model. In order to ensure a correct comparison, the meshes have been mirrored and pasted along the symmetry plane. The flow field has also been symmetrized. The directivity shown is the directivity along the plane perpendicular to the symmetry plane, for 2 azimuthal modes (11 and 15). In practice, a very slight difference of about 0.01 dB is noticed, which is due to the regularization process of the flow, which has been symmetrized and thus slightly modified. In general, this decomposition is thus valid.

The computational performance of the decomposition strategy and of the complete model is shown in Figure 18. As the number of degrees of freedom is almost divided by a factor of 2 (the symmetric case accounting for the points on the symmetry plane, while the anti-symmetric solution constrains them), we observe almost a factor of 2 difference in the total computational time. The memory consumption shows a reduction of 63%. This solution strategy should always be considered when the model shows a planar symmetry.

Figure 17. Directivity comparison between symmetric decomposition and full 3d model for 2 modes

Figure 18. Computational time and memory for decomposed vs complete model

C. Convected Perfectly Matched Layers

1. Strategy

Infinite elements make it possible to model an unbounded domain, accounting for a constant flow within the infinite elements. This strategy has been used and validated in many different papers, both theoretically7,8 and against measurements acquired in fan rigs9 or detected on a real engine during ground tests.10 Despite its simplicity, a particular problem with the use of infinite elements is the requirement for an increased interpolation order at higher frequencies, which can drastically increase the number of degrees of freedom. An alternative that avoids this uncertainty on the interpolation order is the Perfectly Matched Layer technique.

Perfectly Matched Layers achieve absorption of incident acoustic waves by stretching the coordinate aligned with the outgoing direction by a complex factor. This achieves two goals:

• Acoustic waves traveling perpendicular to the stretching direction remain undamped and propagate with the same sound speed as in the finite element domain;

• Acoustic waves traveling parallel to the stretching direction are damped, but the acoustic impedance in the PML domain exactly matches the impedance of the finite element domain, ensuring minimal reflection at the interface for acoustic waves penetrating the PML domain.

Those two features, coupled with a PML thickness of at least one wavelength and a well-chosen stretching function, ensure an excellent behavior in terms of wave absorption and provide an efficient Non-Reflecting Boundary implementation. However, the fact that the stretching is done along a specific coordinate makes the method more cumbersome to use, as the coordinate system has to be carefully chosen and the PML region is usually limited to boxes in cartesian coordinates. Moreover, it has been shown that in the presence of mean flow, PML methods may lead to the amplification of inverse upstream modes, i.e. modes whose phase and group velocities are in opposite directions.12 To alleviate those two drawbacks, the classical PML method has been extended into a more flexible formulation:

• The stretching is done in local curvilinear coordinates, then transformed back into the global cartesian coordinate system. This allows the stretching to be performed in an arbitrary direction, which may vary spatially. The need for a carefully chosen global coordinate system and a special geometry for the PML domains is avoided.

• The stretching is done in a local Prandtl-Glauert transformed space, where the mean flow is eliminated from the acoustic propagation operator (which is transformed into a classical Helmholtz operator). The velocity potential and coordinate system are then transformed back into the original coordinate system using an inverse Prandtl-Glauert transformation. This eliminates inverse upstream mode amplification, and allows an arbitrary PML stretching direction to be combined with an arbitrary mean flow direction.

This new formulation is extremely flexible and allows a PML absorbing layer to be used in complex geometric regions with a completely arbitrary mean flow, while ensuring a non-reflecting behavior equivalent to that of the typical PML using cartesian stretching in a medium at rest.

In comparison with infinite elements, a varying mean flow can thus be modeled within the PML components, which may allow the acoustic component to be shortened. Because the non-reflective condition is ensured at a constant number of degrees of freedom, the interpolation order does not need to be adapted for a particular frequency.
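The principle of complex coordinate stretching described in this subsection can be illustrated on a one-dimensional outgoing plane wave in a medium at rest. The sketch below is not the convected, curvilinear formulation presented here: it uses a quadratic absorption profile, a common textbook choice, and the wavelength and absorption strength are hypothetical values.

```python
import cmath
import math


def stretched_coordinate(x, k, thickness, sigma0):
    """Complex coordinate x~ = x - (i / k) * integral_0^x sigma(s) ds.

    Assumption: quadratic profile sigma(s) = sigma0 * (s / thickness)**2,
    whose integral from 0 to x is sigma0 * x**3 / (3 * thickness**2).
    """
    integral_sigma = sigma0 * x**3 / (3.0 * thickness**2)
    return x - 1j * integral_sigma / k


def wave_in_pml(x, k, thickness, sigma0):
    """Outgoing plane wave exp(-i k x~) evaluated on the stretched coordinate."""
    return cmath.exp(-1j * k * stretched_coordinate(x, k, thickness, sigma0))


wavelength = 0.17                # hypothetical acoustic wavelength in metres
k = 2.0 * math.pi / wavelength   # wavenumber
thickness = wavelength           # PML thickness of one wavelength
sigma0 = 30.0 / thickness        # hypothetical absorption strength

for fraction in (0.0, 0.5, 1.0):
    x = fraction * thickness
    amplitude = abs(wave_in_pml(x, k, thickness, sigma0))
    print(f"x/L={fraction:.1f}  |wave|={amplitude:.2e}")
```

The wave enters the layer with unit amplitude, so no jump occurs at the interface, and it decays smoothly inside the layer; with the values assumed here the amplitude at the outer edge is exp(-10) of its incident value.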

2. Results

In order to compare the infinite element formulation with the PML, an additional layer of elements surrounding the acoustic component is generated by extrusion, as shown in Figure 19. The same meshing rules as derived before are used. The layer is built by extrusion using 4 quadratic elements. The main advantage of this meshing procedure is that it ensures a constant number of degrees of freedom along the extrusion for each computed frequency. This model, including the PML, can now be compared with the infinite element method.

The comparison between infinite elements and PML elements is made on an approach model valid up to 2000 Hz, using a hybrid quadratic mesh. Far field results are compared, which ensures that near field results are similar. Three different outputs are compared:

• Using infinite elements for near field and far field;

• Using perfectly matched layers for near field, and FWH formulation for far field;

• Using infinite elements for near field, and FWH formulation for far field.

Figure 19. Extra layer of elements supporting the PML component

As the FWH and infinite element formulations for the far field are very different in their assumptions, the standard infinite element solution and the PML solution are not expected to be identical in the far field. However, the FWH solution can be computed for both computations and compared. This is shown in Figures 20 and 21.

Both figures show the difference between the FWH and infinite element formulations for the far field. When using the FWH method, both techniques, infinite elements and PMLs, provide very similar results, thus validating the approach.

Figure 20. Result comparison between the different non reflective boundary conditions for a plane wave mode

The comparison of the PML methodology with the infinite elements for a 3000 Hz computation is shown in Figure 22. At this frequency, it is known, for this particular case, that an infinite element interpolation order of 10 is sufficient to obtain a converged solution. This rather low interpolation order is due to the conservative acoustic mesh and the rather large acoustic domain used. At higher frequencies, the order will need to be increased. Even at this frequency however, the PML component already provides a reduced memory consumption and CPU time in comparison with infinite elements. For larger models at higher frequencies, the gain can be expected to become larger.

A last improvement offered by the presented PML methodology is the support of a non-uniform mean flow, which should allow the size of the acoustic domain to be reduced, and thus the size of the global matrix to be solved.

Figure 21. Result comparison between the different non reflective boundary conditions for an azimuthal mode of order 19

Finally, an additional implementation of the PML, the Adaptive PML (APML), is able to generate the PML mesh by extrusion or using a tetrahedral mesh.

Figure 22. Infinite Elements vs PML performance comparison

VI. Enlarging the frequency range

Based on the guidelines derived above, the computational frequency can now be increased. Given the available infrastructure (a few HPC machines with large memory), the following guidelines are followed:

• One particular mesh is made for each frequency of interest;

• The mesh is made of quadratic hybrid elements;

• Convected PML components are used to ensure a non-reflective boundary condition;

• A symmetric model decomposition is used (only the symmetric part is actually modelled);

• Both the MUMPS and PARDISO solvers are used, in order to observe whether the inherent differences between them still apply to larger models;

• IC computations are used as long as sufficient memory is available. For OOC computations, all the available memory is assigned to the direct solver. In practice, only the minimal amount of memory to solve the computation using the OOC solver is shown;

• The MUMPS solver uses the SCOTCH reordering tool;

• Multithreading over 20 processors is used.

Using these guidelines, the number of degrees of freedom over the frequency range is shown in Figure 23. The first blade passing frequency, which is in our case equal to 2.5 kHz, requires 1.9 million degrees of freedom. At the second blade passing frequency, this number reaches 11.9 million degrees of freedom. Following the same trend, the number of degrees of freedom should rise to about 20 million at 6 kHz.
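The extrapolation to 6 kHz can be reproduced with a simple two-point power-law fit. This is only a sketch under two assumptions of ours: that the number of degrees of freedom follows a law of the form a f^p, and that the second blade passing frequency is 5 kHz (twice the first one). The function name is hypothetical.

```python
import math


def fit_power_law(f1, n1, f2, n2):
    """Return (a, p) such that n = a * f**p passes through both points."""
    p = math.log(n2 / n1) / math.log(f2 / f1)
    a = n1 / f1**p
    return a, p


# Values quoted in the text: 1.9 million DOFs at 2.5 kHz (first BPF) and
# 11.9 million DOFs at the second BPF, assumed here to be 5 kHz.
a, p = fit_power_law(2.5, 1.9e6, 5.0, 11.9e6)
dofs_6khz = a * 6.0**p

print(f"fitted exponent p = {p:.2f}")
print(f"estimated DOFs at 6 kHz = {dofs_6khz / 1e6:.1f} million")
```

The fitted exponent is a little above 2.6, slightly below the cubic growth one would expect from a purely volumetric mesh refinement, and the estimate at 6 kHz is a little over 19 million degrees of freedom, consistent with the figure of about 20 million quoted above.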

The computational time and required memory for each computation are shown in Figure 24. The memory consumption shows that both solvers have sufficient resources to run IC up to 2.5 kHz (about 2 million DOFs), while both need to use the OOC implementation for higher frequencies. As shown previously, the computation time using the MUMPS solver is smaller than with the PARDISO solver when running OOC (3 kHz and 4 kHz), while the opposite holds at lower frequencies. In terms of memory consumption, the estimated memory required by the MUMPS solver for the highest frequency exceeded the available memory (310 GB against 248 GB available), which did not allow this last frequency to be run with the MUMPS solver. The minimum memory required by the PARDISO solver is much lower than that of the MUMPS solver: the memory required to run OOC is two times smaller at 3 kHz, and 3.4 times smaller at 5 kHz. In such an environment, the PARDISO solver should be considered as the default for very large models, despite its slower computational time in OOC mode.

The obtained computational times are also acceptable: the maximal frequency is solved in less than 24 hours.

Figure 23. Number of degrees of freedom following the computational frequency.

VII. Conclusion

Various performance improvements and new solution strategies have been reviewed and compared in this paper, as well as the influence of different meshing strategies. Based on the comparisons made, guidelines for modeling nacelle inlets can be derived.

Figure 24. Computational time and memory for different computational frequencies.

The use of quadratic meshes instead of linear meshes is highly recommended, as it improves the convergence of the solution while reducing the computational time. Using structured or partly structured meshes, by means of a hybrid mesh with a hexahedral core, optimizes the element filling and thus greatly reduces the number of degrees of freedom compared to a standard tetrahedral mesh. In addition, it has been shown that 4 quadratic elements per convected acoustic wavelength are sufficient to obtain converged results in the far field.

The integration of the PARDISO solver provides a very interesting alternative to the MUMPS solver. On a single machine, its use is recommended for large models due to its memory efficiency, even if its OOC implementation is less efficient than that of the MUMPS solver. The MUMPS solver remains a very robust solution that allows the computation to be distributed over a large number of machines, if available.

The positive influence of using recent architectures has been assessed. These provide a large number of threads, which should be used when available to speed up the computations accordingly. The usage of multithreading and matrix parallelism has been shown and quantified. While multithreading speeds up the computation on a single machine without increasing memory, matrix parallelism provides an equivalent speedup with an important memory reduction for each process. Depending on the available computational architecture, each solution can be used on its own; where possible, their combined usage is recommended.

Two new solution strategies, the symmetric and anti-symmetric decomposition and the convected Perfectly Matched Layer (PML), have been presented and compared to a standard modeling process. When a model shows plane symmetry or periodicity, the use of an (automated) recombination procedure is recommended. The use of perfectly matched layers to ensure a non-reflective boundary condition has been compared to infinite elements. The PML is recommended at higher frequencies, as it avoids increasing the interpolation order of the infinite elements and provides a constant number of degrees of freedom along its direction for all frequencies. In addition, the PML allows a heterogeneous flow field to be specified within the PML itself, relaxing the requirement of an exterior boundary at (almost) constant mean velocity.
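The principle behind the symmetric and anti-symmetric decomposition can be illustrated on a toy problem. The sketch below is not the Actran implementation: it assumes a small 1D finite-difference Helmholtz operator that is invariant under mirroring of the grid, splits an arbitrary load into symmetric and anti-symmetric parts, solves two half-size systems and recombines them. All names (`helmholtz_1d`, `solve_by_symmetry`) and numerical values are hypothetical.

```python
import numpy as np


def helmholtz_1d(n_total, k, h):
    """Toy 1D Helmholtz operator (-d2/dx2 - k^2) with Dirichlet ends.

    The tridiagonal Toeplitz matrix is unchanged when the grid is mirrored.
    """
    main = (2.0 / h**2 - k**2) * np.ones(n_total, dtype=complex)
    off = (-1.0 / h**2) * np.ones(n_total - 1, dtype=complex)
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)


def solve_by_symmetry(A, f):
    """Solve A p = f using two half-size solves (A must be mirror-symmetric)."""
    n = A.shape[0] // 2
    A11, A12 = A[:n, :n], A[:n, n:]
    J = np.fliplr(np.eye(n))  # exchange matrix: mirrors the right half

    f_left = f[:n]
    f_right_mirrored = f[n:][::-1]
    f_sym = 0.5 * (f_left + f_right_mirrored)   # symmetric part of the load
    f_anti = 0.5 * (f_left - f_right_mirrored)  # anti-symmetric part

    # Half-size problems: p_right = +J p_left (sym) or -J p_left (anti).
    u_sym = np.linalg.solve(A11 + A12 @ J, f_sym)
    u_anti = np.linalg.solve(A11 - A12 @ J, f_anti)

    # Recombination on both sides of the symmetry plane.
    p_left = u_sym + u_anti
    p_right = (u_sym - u_anti)[::-1]
    return np.concatenate([p_left, p_right])


if __name__ == "__main__":
    n_total = 40
    A = helmholtz_1d(n_total, k=5.0 + 0.2j, h=1.0 / (n_total + 1))
    rng = np.random.default_rng(0)
    f = rng.standard_normal(n_total) + 1j * rng.standard_normal(n_total)
    error = np.max(np.abs(solve_by_symmetry(A, f) - np.linalg.solve(A, f)))
    print(f"max difference with the direct solve: {error:.2e}")
```

Each half-size system has half the unknowns of the full model, which is where the gain for a direct solver comes from; the recombination step itself is a simple sum and difference.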

These guidelines have been applied to larger models, increasing the computation size up to the second BPF with about 12 million degrees of freedom. The advantage of using the PARDISO solver for the largest models has been assessed. Finally, the computational times obtained (between 5 and 6 hours for solving 6 million degrees of freedom, about 20 hours for 12 million) are acceptable for such large models and make it possible to industrialize the modeling of realistic nacelle inlets on a suitable environment.
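As a rough reading of the two timings quoted above, one can estimate how the computational time grows with the number of degrees of freedom by fitting a power law T ~ N^p through the two points. This is only back-of-the-envelope arithmetic on two data points, under the assumption that both runs used comparable hardware and solver settings; the function name is hypothetical.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Exponent p of a power law T ~ N**p passing through two (N, T) points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


if __name__ == "__main__":
    # Timings quoted in the text: 5 to 6 h for 6 million DOF, ~20 h for 12 million.
    for hours_small in (5.0, 6.0):
        p = scaling_exponent(6e6, hours_small, 12e6, 20.0)
        print(f"{hours_small:.0f} h -> 20 h: exponent p = {p:.2f}")
```

With these figures the exponent falls roughly between 1.7 and 2.0, that is, doubling the model size multiplies the run time by a factor of about 3.3 to 4.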

Acknowledgments

Various hardware components (especially pre-release platforms) were made available to us for performing these benchmarks thanks to Intel Corporation© and Micron Corporation©.

References

1 Mosson A., New advances in the use of Actran/TM for nacelle simulations and optimisation of IBM clusters for Actran parallel computations, AIAA 2006-2558, 12th AIAA/CEAS Aeroacoustics Conference, 8-10 May 2006, Cambridge, MA, USA.

2 New advances in the use of Actran/TM for nacelle simulations, AIAA 2008-2827.

3 Actran 14.0 User's Guide, Free Field Technologies S.A.

4 MUMPS: a multifrontal massively parallel sparse direct solver, http://mumps.enseeiht.fr/index.php

5 PARDISO: Parallel Direct Sparse Solver Interface, http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/index.htm#GUID-7E829836-0FEF-46B2-8943-86A022193462.htm

6 M.K. Myers, On the acoustic boundary condition in the presence of flow, J. Sound and Vibration, 71:429-434, 1980.

7 R.J. Astley and J.P. Coyette, Conditioning of infinite element schemes for wave problems, Commun. Numer. Meth. Engng., 17:31-41, 2001.

8 R.J. Astley and J.P. Coyette, The performance of spheroidal infinite elements, Int. J. Numer. Meth. Engng., 52:1379-1396, 2001.

9 P. Ploumhans, K. Meerbergen, T. Knapen, X. Gallez, G. Lielens, and J.P. Coyette, Development and validation of a parallel out-of-core propagation and radiation code with validation on a turbofan application, The 18th International Congress on Acoustics, April 4-9, Kyoto, Japan, 2004.

10 B. Schuster, L. Lieber, A. Vavalle, Optimization of a Seamless Inlet Liner Using an Empirically Validated Prediction Method, AIAA 2010-3824.

11 W. Eversman, The boundary condition at an impedance wall in a non-uniform duct with potential mean flow, J. Sound and Vibration, 246(1):63-69, 2001.

12 E. Bécache, A.S. Bonnet-Ben Dhia and G. Legendre, Perfectly matched layers for the convected Helmholtz equation, SIAM J. Numer. Anal., 42(1):409-433, 2004.
